Re: term position question from analyzer stack for WordDelimiterFilterFactory

Otis Gospodnetic Tue, 26 Apr 2011 14:37:32 -0700

Hi Robert,

I'm no WDFF expert, but all these zeros look suspicious:


org.apache.solr.analysis.WordDelimiterFilterFactory {preserveOriginal=0,
generateNumberParts=0, catenateWords=0, generateWordParts=0,
catenateAll=0, catenateNumbers=0}

A quick visit to 
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
 makes me think you want:

splitOnCaseChange=1  (if you want Mc Afee for some reason?)
generateWordParts=1 (if you want Mc Afee for some reason?)
preserveOriginal=1
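
As a rough sketch only (untested against your schema), the query-side filter
with those three settings switched on might look like the fragment below. The
other attributes are simply carried over at 0 from your current config, which
is an assumption on my part rather than a recommendation:

```xml
<!-- Sketch only: query-side WDF with the three suggested settings enabled.
     All other attributes are copied unchanged from the existing config;
     check them against your index-side analyzer before relying on this. -->
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1"
        generateNumberParts="0"
        catenateWords="0"
        catenateNumbers="0"
        catenateAll="0"
        splitOnCaseChange="1"
        preserveOriginal="1"
        />
```

Run McAfee through the analysis page again after the change to confirm the
term now survives the WDF step.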


Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message ----
> From: Robert Petersen <[email protected]>
> To: [email protected]; [email protected]
> Sent: Tue, April 26, 2011 4:39:49 PM
> Subject: RE: term position question from analyzer stack for 
>WordDelimiterFilterFactory
> 
> OK this is even more weird... everything is working much better except
> for one thing: I was testing use cases with our top query terms to make
> sure the below query settings wouldn't break any existing behavior, and
> got this most unusual result.  The analyzer stack completely eliminated
> the word McAfee from the query terms!  I'm like huh?  Here is the
> analyzer page output for that search term:
> 
> Query  Analyzer
> org.apache.solr.analysis.WhitespaceTokenizerFactory {}
> term  position     1
> term text     McAfee
> term  type     word
> source start,end      0,6
> payload     
> org.apache.solr.analysis.SynonymFilterFactory
> {synonyms=query_synonyms.txt,  expand=true, ignoreCase=true}
> term position     1
> term  text     McAfee
> term type     word
> source  start,end     0,6
> payload     
> org.apache.solr.analysis.StopFilterFactory  {words=stopwords.txt,
> ignoreCase=true}
> term position      1
> term text     McAfee
> term type      word
> source start,end     0,6
> payload     
> org.apache.solr.analysis.WordDelimiterFilterFactory  {preserveOriginal=0,
> generateNumberParts=0, catenateWords=0,  generateWordParts=0,
> catenateAll=0, catenateNumbers=0}
> term  position
> term text
> term type
> source  start,end
> payload
> org.apache.solr.analysis.LowerCaseFilterFactory  {}
> term position
> term text
> term type
> source  start,end
> payload
> com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory
> {protected=protwords.txt}
> term  position
> term text
> term type
> source  start,end
> payload
> org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory  {}
> term position
> term text
> term type
> source  start,end
> payload
> 
> 
> 
> -----Original Message-----
> From: Robert  Petersen [mailto:[email protected]] 
> Sent: Monday, April 25,  2011 11:27 AM
> To: [email protected]; [email protected]
> Subject:  RE: term position question from analyzer stack  for
> WordDelimiterFilterFactory
> 
> Aha!  I knew something must be awry, but when I looked at the analysis
> page output, well it sure looked like it should match.  :)
> 
> OK here is the query side WDF that finally works, I just turned
> everything off.  (yay)  First I tried just completely removing WDF from
> the query side analyzer stack but that didn't work.  So anyway I suppose
> I should turn off the catenate all plus the preserve original settings,
> reindex, and see if I still get a match huh?  (PS  thank you very much
> for the help!!!)
> 
>            <filter  class="solr.WordDelimiterFilterFactory"
>                  generateWordParts="0"
>                  generateNumberParts="0"
>                  catenateWords="0"
>                  catenateNumbers="0"
>                  catenateAll="0"
>                  preserveOriginal="0"
>                  />    
> 
> 
> 
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of  Yonik
> Seeley
> Sent: Monday, April 25, 2011 9:24 AM
> To: [email protected]
> Subject:  Re: term position question from analyzer stack  for
> WordDelimiterFilterFactory
> 
> On Mon, Apr 25, 2011 at 12:15 PM,  Robert Petersen <[email protected]>
> wrote:
> > The  search and index analyzer stack are the same.
> 
> Ahhh, they should not be!
> Using both generate and catenate in WDF at query time is a no-no.
> Same reason you can't have multi-word synonyms at query time:
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
> 
> I'd recommend going back to the WDF settings in the solr example
> server as a starting point.
> 
> 
> -Yonik
> http://www.lucenerevolution.org -- Lucene/Solr User  Conference, May
> 25-26, San Francisco
>
