Hi Jack,

thanks for your reply. OK, in this case I agree that "enriching" the query in the application layer is a good idea. We are still a bit puzzled about what the enriched query should look like. I'll post here once we have found a solution. If somebody has suggestions, I'd be happy to hear them.
Mirko

2013/11/21 Jack Krupansky <j...@basetechnology.com>

> The query parser does its own tokenization and parsing before your
> analyzer tokenizer and filters are called, assuring that only one white
> space-delimited token is analyzed at a time.
>
> You're probably best off having an application layer preprocessor for the
> query that "enriches" the query in the manner that you're describing.
>
> Or, simply settle for a "heuristic" approach that may give you 70% of what
> you want using only existing Solr features on the server side.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Mirko
> Sent: Thursday, November 21, 2013 5:30 AM
> To: solr-user@lucene.apache.org
> Subject: Parse eDisMax queries for keywords
>
> Hi,
> We would like to implement special handling for queries that contain
> certain keywords. Our particular use case:
>
> In the example query "Footitle season 1" we want to discover the keywords
> "season", get the subsequent number, and boost (or filter for) documents
> that match "1" on field name="season".
>
> We have two fields in our schema:
>
> <!-- "titles" contains titles -->
> <field name="title" type="text" indexed="true" stored="true" multiValued="false"/>
>
> <fieldType name="text" class="solr.TextField" omitNorms="true">
>   <analyzer>
>     <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <!-- ... -->
>   </analyzer>
> </fieldType>
>
> <field name="season" type="season_number" indexed="true" stored="false" multiValued="false"/>
>
> <!-- "season" contains season numbers -->
> <fieldType name="season_number" class="solr.TextField" omitNorms="true">
>   <analyzer type="query">
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.PatternReplaceFilterFactory" pattern=".*(?:season) *0*([0-9]+).*" replacement="$1"/>
>   </analyzer>
> </fieldType>
>
> Our idea was to use a Keyword tokenizer and a Regex on the "season" field
> to extract the season number from the complete query.
>
> However, we use a ExtendedDisMax query parser in our search handler:
>
> <requestHandler name="/select" class="solr.SearchHandler">
>   <lst name="defaults">
>     <str name="defType">edismax</str>
>     <str name="qf">
>       title season
>     </str>
>   </lst>
> </requestHandler>
>
> The problem is that the eDisMax tokenizes the query, so that our field
> "season" receives the tokens ["Foo", "season", "1"] without any order,
> instead of the complete query.
>
> How can we pass the complete query (untokenized) to the season field? We
> don't understand which tokenizer is used here and why our "season" field
> received tokens instead of the complete query.
>
> Or is there another approach to solve this use case with Solr?
>
> Thanks,
> Mirko
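
For the example query from the original message ("Footitle season 1"), one possible shape for the application-layer enrichment could look like the Python sketch below. It is only a sketch under several assumptions that are not from the thread: the function name `enrich_query`, the default boost of 5.0, and the decision to strip the "season N" phrase from `q` are all illustrative. It also assumes the `season` field is indexed with the bare season number. `bq` and `fq` are the standard eDisMax boost-query and filter-query request parameters.

```python
import re

# Matches "season 1", "Season 01" or "season1"; leading zeros are dropped,
# mirroring the pattern used in the PatternReplaceFilterFactory above.
SEASON_RE = re.compile(r"\bseason\s*0*([0-9]+)\b", re.IGNORECASE)


def enrich_query(user_query, boost=5.0, as_filter=False):
    """Build Solr request parameters from a raw user query (hypothetical helper).

    If the query contains "season <number>", the phrase is removed from q and
    turned into either a boost query (bq) or a filter query (fq) on the
    "season" field. The boost value is an arbitrary assumption.
    """
    params = {"defType": "edismax", "q": user_query}

    match = SEASON_RE.search(user_query)
    if match is None:
        return params  # nothing to enrich

    season = match.group(1)

    # Drop the "season N" phrase so it is not also matched against "title".
    remaining = " ".join(SEASON_RE.sub(" ", user_query, count=1).split())
    if remaining:
        params["q"] = remaining

    if as_filter:
        params["fq"] = f"season:{season}"          # restrict to that season
    else:
        params["bq"] = f"season:{season}^{boost}"  # only boost that season

    return params


if __name__ == "__main__":
    print(enrich_query("Footitle season 1"))
    print(enrich_query("Footitle Season 02", as_filter=True))
```

The returned dictionary can be sent as request parameters to the `/select` handler. With this approach the `season` field would probably no longer need to be listed in `qf`, since the season number reaches it through `bq` or `fq` instead of through the tokenized main query.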