Yes, I actually noted that about the filter vs. tokenizer. It's easy to get confused if you don't have a good understanding of the differences between tokenizers and filters.
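To make the filter-vs-tokenizer distinction concrete, here is a minimal sketch that only approximates the two analysis chains with plain Java; it is not the Lucene API. The class and method names are hypothetical. Chain A mimics `KeywordTokenizerFactory` followed by `LowerCaseFilterFactory`, which keeps the whole input as one token and lowercases it. Chain B mimics `LowerCaseTokenizerFactory`, which splits at every non-letter and lowercases as it goes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, stdlib-only simulation (NOT the Lucene API) of two analysis chains.
public class TokenizerVsFilter {

    // Chain A: keyword tokenizer + lowercase filter.
    // The tokenizer emits the entire input as a single token;
    // the filter then lowercases that one token without splitting it.
    static List<String> keywordTokenizerPlusLowercaseFilter(String input) {
        List<String> tokens = new ArrayList<>();
        tokens.add(input);
        tokens.replaceAll(t -> t.toLowerCase(Locale.ROOT));
        return tokens;
    }

    // Chain B: a letter-based tokenizer that lowercases while tokenizing.
    // Any non-letter (space, digit, punctuation) ends the current token
    // and is itself dropped.
    static List<String> lowercaseTokenizer(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "FE 009";
        System.out.println(keywordTokenizerPlusLowercaseFilter(input)); // [fe 009]
        System.out.println(lowercaseTokenizer(input));                  // [fe]
    }
}
```

The difference is where the lowercasing happens: as a filter it only changes case, while as a tokenizer it also decides token boundaries, which is why it would have to replace `KeywordTokenizerFactory` rather than sit after it.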
As for the query parser problem, there's always a workaround, but it was nice to be made aware of it. Before, it was sort of a ghost-like problem. Although it would be great to have the option to "disable" the splitting on whitespace even for DisMax, I understand that it is probably not the most wanted feature for the next Solr release :)

*Aleksander Akerø*
Systems Consultant
Mobile: 944 89 054
Email: aleksan...@gurusoft.no

*Gurusoft AS*
Phone: 92 44 09 99
Østre Kullerød
www.gurusoft.no

2014-01-30 Erick Erickson <erickerick...@gmail.com>:

> Note, the comments about lowercasetokenizer were a red herring. You were
> using LowerCaseFilterFactory. Note "Filter" rather than "Tokenizer". So it
> would just do what you expected: lowercase the entire input. You would have
> used LowerCaseTokenizerFactory in place of KeywordTokenizerFactory, not as a
> Filter.
>
> As for the rest, I expect Jack is right: it's the query parsing that happens
> above the field input.
>
> Best
> Erick
>
> On Thu, Jan 30, 2014 at 6:29 AM, Aleksander Akerø
> <aleksan...@gurusoft.no> wrote:
> > Hi Srinivasa
> >
> > Yes, I've come to understand that the analyzers will never "see" the
> > whitespace, so there is no need for pattern replacement, as Jack points out.
> > So the solution would be to set which parser to use for the query. Jack has
> > also pointed out that the "field" query parser should work in this particular
> > setting -> http://wiki.apache.org/solr/QueryParser
> >
> > My problem, though, was that I needed this for only one of the fields in the
> > schema, while for all the other fields, e.g. name, description etc., I would
> > very much like to make use of the eDisMax functionality. And it seems that
> > only one query parser can be defined per query, in other words: for all
> > fields. Jack, you may correct me if I'm wrong here :)
> >
> > This particular customer wanted a wildcard search at both ends of the
> > phrase, which made the problem less clear-cut. Therefore I chose to replace
> > all whitespace for this field in SQL at index time, using the DIH, and then
> > to use EdgeNGramFilterFactory on both sides of the keyword, as in the config
> > below. That seemed to work pretty nicely.
> >
> > <!-- #### WildCard search number #### -->
> > <fieldType name="keyword" class="solr.TextField" positionIncrementGap="100">
> >   <analyzer type="index">
> >     <tokenizer class="solr.KeywordTokenizerFactory"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25" side="front"/>
> >     <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25" side="back"/>
> >   </analyzer>
> >   <analyzer type="query">
> >     <tokenizer class="solr.KeywordTokenizerFactory"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >   </analyzer>
> > </fieldType>
> >
> > I also added a bit of extra weighting for the "keyword" field so that exact
> > matches received a higher score.
> >
> > What this solution doesn't do is exclude values like "EE 009" when
> > searching for "FE 009", but they appear far down the list, which is OK for
> > the customer, because these results are usually somewhat related and within
> > the same category.
> >
> > 2014-01-30 Jack Krupansky <j...@basetechnology.com>
> >
> >> The standard, keyword-oriented query parsers will all treat unquoted,
> >> unescaped white space as term delimiters and ignore the white space. There
> >> is no way to bypass that behavior. So, your regex will never even see the
> >> white space - unless you enclose the text and white space in quotes or use
> >> a backslash to quote each white space character.
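Aleksander's observation that "EE 009" still shows up for "FE 009", only lower in the list, can be reproduced with a rough simulation. This is a stdlib-only sketch, not Lucene code; the class name is hypothetical. It assumes: whitespace was already stripped in SQL (so "FE 009" is indexed as `FE009`), the Solr/Lucene version in use still accepts `side="back"`, the back filter runs on every token the front filter emits (which is how filter chains work), and the query clauses are combined with OR (this depends on the `q.op`/`mm` settings). Because the back filter sees every front gram, the chain ends up indexing every substring of at least two characters (for inputs up to 25 characters), which is what lets a "wildcard at both ends" match work.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical, stdlib-only simulation (NOT the Lucene API) of the "keyword" field type.
public class EdgeNGramDemo {
    static final int MIN_GRAM = 2;
    static final int MAX_GRAM = 25;

    // Front edge n-grams: prefixes of length MIN_GRAM..MAX_GRAM.
    // Tokens shorter than MIN_GRAM yield no grams.
    static List<String> frontGrams(String token) {
        List<String> grams = new ArrayList<>();
        for (int n = MIN_GRAM; n <= Math.min(MAX_GRAM, token.length()); n++) {
            grams.add(token.substring(0, n));
        }
        return grams;
    }

    // Back edge n-grams: suffixes of length MIN_GRAM..MAX_GRAM.
    static List<String> backGrams(String token) {
        List<String> grams = new ArrayList<>();
        for (int n = MIN_GRAM; n <= Math.min(MAX_GRAM, token.length()); n++) {
            grams.add(token.substring(token.length() - n));
        }
        return grams;
    }

    // Index side: keyword tokenizer -> lowercase -> front grams -> back grams,
    // where the back filter is applied to each gram the front filter produced.
    static Set<String> indexTerms(String strippedValue) {
        String token = strippedValue.toLowerCase(Locale.ROOT);
        Set<String> terms = new LinkedHashSet<>();
        for (String front : frontGrams(token)) {
            terms.addAll(backGrams(front));
        }
        return terms;
    }

    // Query side: the parser splits on whitespace BEFORE analysis, then each
    // piece goes through keyword tokenizer + lowercase filter on its own.
    static List<String> queryTerms(String userQuery) {
        List<String> terms = new ArrayList<>();
        for (String piece : userQuery.trim().split("\\s+")) {
            terms.add(piece.toLowerCase(Locale.ROOT));
        }
        return terms;
    }

    // How many of the query's terms occur among a document's indexed terms.
    static long matchingTerms(String userQuery, String strippedValue) {
        Set<String> indexed = indexTerms(strippedValue);
        return queryTerms(userQuery).stream().filter(indexed::contains).count();
    }

    public static void main(String[] args) {
        System.out.println(queryTerms("FE 009"));             // [fe, 009]
        System.out.println(matchingTerms("FE 009", "FE009")); // 2
        System.out.println(matchingTerms("FE 009", "EE009")); // 1
    }
}
```

"FE009" matches both query terms while "EE009" matches only `009`, which fits the reported behaviour of related codes ranking lower rather than disappearing.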
> >>
> >> You can use the "field" and "term" query parsers to pass a query string as
> >> if it were fully enclosed in quotes, but that only handles a single term
> >> and does not allow for multiple terms or any query operators. For example:
> >>
> >> {!field f=myfield}Foo Bar
> >>
> >> See:
> >> http://wiki.apache.org/solr/QueryParser
> >>
> >> You can also pre-configure the field query parser with the defType=field
> >> parameter.
> >>
> >> -- Jack Krupansky
> >>
> >>
> >> -----Original Message-----
> >> From: Srinivasa7
> >> Sent: Thursday, January 30, 2014 6:37 AM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: KeywordTokenizerFactory - trouble with "exact" matches
> >>
> >> Hi,
> >>
> >> I have a similar kind of problem: I want to search for words that contain
> >> spaces, and I want to search with all the spaces stripped.
> >>
> >> I have used the following schema for that:
> >>
> >> <fieldType name="nospaces" class="solr.TextField" autoGeneratePhraseQueries="true">
> >>   <analyzer type="index">
> >>     <tokenizer class="solr.KeywordTokenizerFactory"/>
> >>     <filter class="solr.LowerCaseFilterFactory"/>
> >>     <filter class="solr.PatternReplaceFilterFactory" pattern="[^\w]+" replacement="" replace="all"/>
> >>   </analyzer>
> >>   <analyzer type="query">
> >>     <tokenizer class="solr.KeywordTokenizerFactory"/>
> >>     <filter class="solr.LowerCaseFilterFactory"/>
> >>     <filter class="solr.PatternReplaceFilterFactory" pattern="[^\w]+" replacement="" replace="all"/>
> >>   </analyzer>
> >> </fieldType>
> >>
> >> And:
> >>
> >> <field name="text_nospaces" type="nospaces" indexed="true" stored="true" omitNorms="true"/>
> >> <copyField source="text" dest="text_nospaces"/>
> >>
> >> But it is not finding the right terms. We strip the spaces and index
> >> lowercase values when we do that.
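The index side of the "nospaces" field type can be approximated as below. This is a stdlib-only sketch with a hypothetical class name, not the Lucene API. It assumes the filter compiles the pattern with default `java.util.regex` flags, in which case `\w` covers only ASCII letters, digits and underscore.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical, stdlib-only approximation (NOT the Lucene API) of the "nospaces" analyzer.
public class NoSpacesAnalyzer {
    // Same regex as the PatternReplaceFilterFactory in the field type.
    private static final Pattern NON_WORD = Pattern.compile("[^\\w]+");

    // KeywordTokenizer: the whole input is one token.
    // LowerCaseFilter: lowercase it.
    // PatternReplaceFilter (replace="all"): remove every run of non-word characters.
    static String analyze(String input) {
        String lowered = input.toLowerCase(Locale.ROOT);
        return NON_WORD.matcher(lowered).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(analyze("East Enders"));  // eastenders
        System.out.println(analyze("East-Enders!")); // eastenders
        // With default flags, a non-ASCII letter such as "ø" is not \w and gets stripped too.
        System.out.println(analyze("\u00D8stre"));   // stre
    }
}
```

So "East Enders" is indexed as the single term `eastenders`; the question is only what the query side hands to this same chain.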
> >>
> >> For example: East Enders
> >>
> >> When I search for the text 'east end ers', it doesn't return any values,
> >> saying no documents were found.
> >>
> >> I realised that Solr runs the query parser before passing the query string
> >> to the query analyzer defined in the schema.
> >>
> >> And the query parser tokenizes the query string, so it sends each token
> >> separately to the query analyzer defined in the schema.
> >>
> >> So is there any way I can bypass this query parser, or use a query parser
> >> that treats the entire string as a single phrase?
> >>
> >> At the moment I am using the dismax query parser.
> >>
> >> Any suggestion would be much appreciated.
> >>
> >> Thanks
> >> Srinivasa
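Srinivasa's failing search, and why Jack's `{!field f=...}` suggestion avoids it, can be sketched with a small comparison. This is a simplified, stdlib-only model with a hypothetical class name. "Per-term" stands in for a whitespace-splitting parser such as dismax: split first, then analyze each piece. "Whole-string" stands in for the field query parser as Jack describes it: the entire query text goes to the analyzer at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified model (NOT Solr's parser code) of split-then-analyze vs whole-string analysis.
public class SplitVsWhole {
    // Simplified "nospaces" analyzer: lowercase, then remove runs of non-word characters.
    static String analyze(String text) {
        return text.toLowerCase(Locale.ROOT).replaceAll("[^\\w]+", "");
    }

    // Whitespace-splitting parser: each piece is analyzed on its own.
    static List<String> perTermQuery(String userQuery) {
        List<String> terms = new ArrayList<>();
        for (String piece : userQuery.trim().split("\\s+")) {
            terms.add(analyze(piece));
        }
        return terms;
    }

    // Field-parser style: the whole query string is analyzed as one unit.
    static String wholeStringQuery(String userQuery) {
        return analyze(userQuery);
    }

    public static void main(String[] args) {
        String indexed = analyze("East Enders");  // eastenders
        String query = "east end ers";

        List<String> split = perTermQuery(query); // [east, end, ers]
        System.out.println(split + " matches? " + split.contains(indexed));

        String whole = wholeStringQuery(query);   // eastenders
        System.out.println(whole + " matches? " + whole.equals(indexed));
    }
}
```

None of the split terms equals the indexed `eastenders`, while the whole-string path produces exactly that term. As Jack notes, the trade-off is that the field parser handles only that one field and no query operators.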