Re: WordDelimiterFilterFactory - tokenizer question

Mike L. Sun, 05 Apr 2015 09:44:26 -0700

Thanks Jack! That was an oversight on my end - I also assumed that 
splitOnNumerics="1" and LowerCaseFilterFactory would be breaking out the 
tokens. I tried again with generateWordParts="1" generateNumberParts="1" and it 
seemed to work. Appreciate it.
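
For anyone finding this thread later, the change amounts to roughly the 
following. This is only a sketch: it assumes both generate* flags were flipped 
on the index and the query analyzer alike (only the two attribute values are 
stated above), and every other attribute is copied unchanged from the original 
definition quoted below.

```xml
<!-- Sketch only: generateWordParts / generateNumberParts changed from "0" to "1". -->
<!-- All other attributes are copied from the original fieldType definition.      -->

<!-- index analyzer -->
<filter class="solr.WordDelimiterFilterFactory" splitOnNumerics="1"
        preserveOriginal="1" generateWordParts="1" generateNumberParts="1"
        catenateWords="0" catenateNumbers="0" catenateAll="0"/>

<!-- query analyzer (assumed to receive the same change) -->
<filter class="solr.WordDelimiterFilterFactory" splitOnNumerics="1"
        generateWordParts="1" generateNumberParts="1"
        catenateWords="0" catenateNumbers="0" catenateAll="0"/>
```

With these settings a lowercased token such as us100a should be split at the 
letter/digit boundaries into us, 100 and a, with the original us100a also kept 
at index time because of preserveOriginal="1".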

Mike

From: Jack Krupansky <jack.krupan...@gmail.com>
To: solr-user@lucene.apache.org; Mike L. <javaone...@yahoo.com>
Sent: Sunday, April 5, 2015 8:23 AM
Subject: Re: WordDelimiterFilterFactory - tokenizer question

You have to tell the filter what types of tokens to generate - words, numbers. 
You told it to generate... nothing. You did tell it to preserve the original, 
unfiltered token though, which is fine.
-- Jack Krupansky

On Sun, Apr 5, 2015 at 3:39 AM, Mike L. <javaone...@yahoo.com.invalid> wrote:

Solr User Group,
    I have a non-multivalued field which contains stored values similar to this:

US100A
US100B
US100C
US100-D
US100BBA

My assumption was that if I tokenized with the fieldType definition below, 
specifically the WDF splitOnNumerics option and the LowerCaseFilterFactory, I 
would get Solr matches on the following query words:

?q=US 100
?q=US100

across the field values. In other words, US100A, US100B, US100C and US100-D 
would all have matched and been scored against my qf weights. However, I'm not 
seeing that sort of behavior. I have tried various combinations and am starting 
to question my assumptions about the tokenizer.

Ideally, I would like to return all values (US100A, US100B, US100C, US100-D) 
when, for example, q=US100A is searched on this field.

I know I should probably provide the debugQuery results, but I was hoping this 
would be a quick hit for somebody, and I'm also reindexing at the moment. 
WordDelimiterFilterFactory doesn't seem to be working as expected. I'm hoping to 
get some clarification, or to hear if something sticks out here.

Below is the field type definition being used:
<fieldType name="field_tokenized" class="solr.TextField" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" splitOnNumerics="1"
            preserveOriginal="1" generateWordParts="0" generateNumberParts="0"
            catenateWords="0" catenateNumbers="0" catenateAll="0"/>
  </analyzer>

  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" splitOnNumerics="1"
            generateWordParts="0" generateNumberParts="0" catenateWords="0"
            catenateNumbers="0" catenateAll="0"/>
  </analyzer>
</fieldType>

Thanks in advance.
Mike
