Re: StopWords coming in Top 10 terms despite using StopFilterFactory

Pranav Prakash Fri, 23 Sep 2011 00:46:08 -0700

> You've got CommonGramsFilterFactory and StopFilterFactory both using
> stopwords.txt, which is a confusing configuration.  Normally you'd want one
> or the other, not both ... but if you did legitimately have both, you'd want
> them to each use a different wordlist.
>


Maybe I am wrong, but my intention in using both is this: first, I want to
support phrase queries, so I used CommonGramsFilterFactory. Second, I don't
want those stopwords in my index, so I used StopFilterFactory to remove
them.



>
> The commongrams filter turns each found occurrence of a word in the file
> into two tokens - one prepended with the token before it, one appended with
> the token after it.  If it's the first or last term in a field, it only
> produces one token.  When it gets to the stopfilter, the combined terms no
> longer match what's in stopwords.txt, so no action is taken.
>
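To make the behaviour described above concrete, here is a rough Python sketch
of that filter chain. It is only a simplified model, not the Lucene code: the
word list, the sample text, and both function names are made up for
illustration, and it assumes the common-grams step keeps each original token
and adds a joined "a_b" token whenever either neighbour is in the word list.

```python
# Simplified model of an index-time CommonGrams -> Stop filter chain.
# Assumption: original tokens are kept and joined tokens are added next to them.

STOPWORDS = {"of", "the", "to"}  # hypothetical stand-in for stopwords.txt


def common_grams(tokens, common):
    """Emit each token, plus "a_b" when a or b is a common word."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            if tok in common or nxt in common:
                out.append(tok + "_" + nxt)
    return out


def stop_filter(tokens, stopwords):
    """Drop tokens that exactly match an entry in the stopword list."""
    return [t for t in tokens if t not in stopwords]


tokens = "king of the hill".split()
grams = common_grams(tokens, STOPWORDS)
indexed = stop_filter(grams, STOPWORDS)
print(grams)
print(indexed)
```

Under this model the joined tokens such as "of_the" pass through the stop
filter untouched, because they no longer match any entry in the word list,
while the bare stopwords are removed.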
> If I had to guess, what you are seeing in the top 10 terms is the
> concatenation of your most common stopword with another word.  If it were
> English, I would guess that to be "of_the" or something similar.  If my
> guess is wrong, then I'm not sure what's going on, and some cut/paste of
> what you're actually seeing might be in order.


term    frequency
to      26164
and     25804
the     25566
of      25022
a       24918
in      24590
for     23646
n       23588
with    23055
is      22510



> Did you delete and do a full reindex after you changed your schema?
>

Yup, I did that a couple of times.


>
> Thanks,
> Shawn
>
>
*Pranav Prakash*

"temet nosce"

Twitter <http://twitter.com/pranavprakash> | Blog <http://blog.myblive.com/>
 | Google <http://www.google.com/profiles/pranny>
