Look at the admin/analysis page and be sure to check the "verbose"
checkboxes. That'll show you what each filter does to the input. My
guess is that WordDelimiterFilterFactory has different parameters
than you expect, and that's what you're seeing. WDFF can be tricky
to understand...
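
For illustration only, here is a rough sketch of a field type where WDFF
splits on underscores. This is not your actual definition (I haven't seen
it); the name "text_split" is made up and the parameter values are just
one plausible combination:

```xml
<!-- Hypothetical field type for illustration; "text_split" is a made-up name. -->
<fieldType name="text_split" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- The whitespace tokenizer leaves "20111213_solr_apache" as a single token. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- WDFF then splits that token at non-alphanumeric delimiters such as "_". -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="0"
            catenateNumbers="0"
            catenateAll="0"
            splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With settings like these, "20111213_solr_apache" should come out as
20111213, solr and apache. Note also that a stop filter drops whole
tokens, so listing "_" in stopwords.txt only removes a token that is
exactly "_", not an underscore inside a longer token.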

If that's not helpful, you need to provide your field definition.

Best
Erick

On Fri, Mar 2, 2012 at 10:52 PM, Floyd Wu <floyd...@gmail.com> wrote:
> Hi there,
>
> I have a document and its title is "20111213_solr_apache conference report".
>
> When I use the analysis web interface to see exactly which tokens Solr
> produces, the following is the result:
>
> term text:  20111213_solr  apache      conference  report
> term type:  <NUM>          <ALPHANUM>  <ALPHANUM>  <ALPHANUM>
>
>
> Why is 20111213_solr tokenized as <NUM>, and why isn't the "_" char
> removed? (I've added "_" as a stop word in stopwords.txt)
>
> I did another test with "20111213_solr_apache conference_report".
> As you can see, the difference is that I added an underscore char between
> conference and report. Analyzing this string gives:
> term text:  20111213_solr  apache      conference  report
> term type:  <NUM>          <ALPHANUM>  <ALPHANUM>  <ALPHANUM>
> This time the underscore char between conference and report is removed!
>
> Why? How can I make Solr remove the underscore char and behave
> consistently? Please help with this.
>
> Thanks in advance.
>
> Floyd
