hossman wrote:
> 
> 
> I'm not sure i understand what you mean.  Why would protected words (in 
> regards to the stemmer) reduce recall? ... i guess it depends on the 
> words you are protecting, right ... but why would you want to reduce 
> recall?  isn't the goal usually to increase recall while keeping 
> precision high?
> 
> (disclaimer: i'm not very smart when it comes to theoretical IR, i'm more 
> of a hands-on "practicalist" .. i try stuff, i draw on past experience to
> decide if it's "better" and then i deploy it and if my user satisfaction 
> numbers go down i roll back.)
> 
> : It could be that a parallel approach using dismax boosting for fields such
> : as "product name" and "category" will, besides increasing precision, also
> : reduce false-hit recall?
> 
> Hmmm... i think it's safe to say that an intelligent choice of qf, pf, 
> bf, and bq values (based on inherent knowledge of the corpus) can increase 
> precision; but unless you use prohibitive fq clauses, i don't know that 
> you will actually be reducing your false hit rate ... you're just making 
> their scores very small relative to the top scoring docs.  a strict "mm" is 
> your best bet for reducing the number of "false hits" (because things that 
> don't match "enough" of the input terms will be weeded out)
> 
> -Hoss
> 
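To make the difference between boosting and "mm" concrete: boosting only changes scores, while "mm" decides whether a document is returned at all. Below is a minimal Java sketch, assuming a simplified reading of the dismax "mm" syntax (plain integers such as "2" or "-1", and percentages such as "75%" or "-25%", rounded down). The class and method names are hypothetical, and Solr's conditional forms such as "3<90%" are deliberately not handled.

```java
import java.util.List;
import java.util.Set;

/**
 * Hypothetical, simplified sketch of dismax "mm" (minimum should match).
 * Supports only plain integers ("2", "-1") and percentages ("75%", "-25%").
 * Conditional specs such as "3<90%" are NOT handled here.
 */
public class MinShouldMatchSketch {

    /** How many of the optional query terms must match for a doc to be returned. */
    static int minShouldMatch(String spec, int optionalTerms) {
        String s = spec.trim();
        int result;
        if (s.endsWith("%")) {
            int pct = Integer.parseInt(s.substring(0, s.length() - 1));
            // Percentage of the optional terms, rounded down
            int calc = (optionalTerms * Math.abs(pct)) / 100;
            // A negative percentage says how many terms may be MISSING
            result = pct < 0 ? optionalTerms - calc : calc;
        } else {
            int n = Integer.parseInt(s);
            // A negative integer means "all but n" terms are required
            result = n < 0 ? optionalTerms + n : n;
        }
        // Clamp into [0, optionalTerms]
        return Math.max(0, Math.min(result, optionalTerms));
    }

    /** True if the document contains at least the required number of query terms. */
    static boolean matches(Set<String> docTerms, List<String> queryTerms, String mmSpec) {
        long hits = queryTerms.stream().filter(docTerms::contains).count();
        return hits >= minShouldMatch(mmSpec, queryTerms.size());
    }

    public static void main(String[] args) {
        List<String> query = List.of("crema", "sole", "protezione");
        Set<String> doc = Set.of("sole", "mare");          // matches 1 of 3 query terms
        System.out.println(matches(doc, query, "1"));      // loose mm: true
        System.out.println(matches(doc, query, "100%"));   // strict mm: false
    }
}
```

With the loose setting the document containing only "sole" is still returned (under boosting alone it would merely rank low); with "100%" it is filtered out, which is the "weeding out" described above.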

Yes, I'd like to protect some words (at least among the most frequently queried) from
being stemmed, but since this requires some custom work at the Lucene Java class
level (as I said before for Italian), I was looking for possible alternative
approaches.


A practical example: the words "sole" ("sun", but also "lonely") and
"solo" ("only", but also "alone") both stem to "sol". I'd like to protect
"sole" from being stemmed. Maybe a solution would be to add it to
stopwords.txt?
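The sole/solo collision can be illustrated with a protected-words check placed in front of the stemmer. This is a minimal Java sketch under stated assumptions: `toyStem` is a made-up stand-in that only strips one final vowel (it is not the Snowball Italian stemmer), and the class and method names are hypothetical. It also contrasts protection with a stop list: a stop filter drops the token entirely, so putting "sole" in stopwords.txt would likely make it unsearchable rather than merely unstemmed. Solr's example configuration ships a protwords.txt file for this kind of list; whether a given stemming filter factory honors it depends on the Solr version in use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of a protected-words check in front of a stemmer.
 * toyStem() is a made-up stand-in that strips one final vowel; it is NOT
 * the Snowball Italian algorithm, it only reproduces the sole/solo collision.
 */
public class ProtectedStemSketch {

    static String toyStem(String token) {
        if (token.length() > 3 && "aeiou".indexOf(token.charAt(token.length() - 1)) >= 0) {
            return token.substring(0, token.length() - 1);
        }
        return token;
    }

    /** Stem every token except protected ones, which pass through unchanged. */
    static List<String> analyze(List<String> tokens, Set<String> protectedWords) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(protectedWords.contains(t) ? t : toyStem(t));
        }
        return out;
    }

    /** For contrast: a stop filter removes listed tokens instead of keeping them unstemmed. */
    static List<String> removeStopwords(List<String> tokens, Set<String> stopwords) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (!stopwords.contains(t)) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("sole", "solo");
        System.out.println(analyze(tokens, Set.of()));               // [sol, sol]
        System.out.println(analyze(tokens, Set.of("sole")));         // [sole, sol]
        System.out.println(removeStopwords(tokens, Set.of("sole"))); // [solo]
    }
}
```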

I think I still have to tune and play with all the dismax parameters and
set "mm" as strictly as possible, as you said. I'm trying to strike a
good balance among the boosting options.
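Balancing boosts is easier to reason about with the combination rule in mind. The sketch below is a hedged approximation of how a DisjunctionMaxQuery combines per-field scores: the best-scoring field counts fully and every other field adds `tie` times its score, with `qf` boosts roughly multiplying each field's contribution. The field names, boosts, and raw scores are made-up inputs, and real Lucene scores also depend on tf-idf, norms, and query normalization.

```java
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch of DisjunctionMaxQuery-style scoring:
 * score = max(boosted field scores) + tie * (sum of the other boosted scores).
 * Raw field scores here are made-up inputs, not real Lucene scores.
 */
public class DismaxScoreSketch {

    static double dismaxScore(Map<String, Double> rawFieldScores,
                              Map<String, Double> qfBoosts,
                              double tie) {
        double max = 0.0;
        double sum = 0.0;
        for (Map.Entry<String, Double> e : rawFieldScores.entrySet()) {
            // Apply this field's qf boost (1.0 if the field has no explicit boost)
            double boosted = e.getValue() * qfBoosts.getOrDefault(e.getKey(), 1.0);
            max = Math.max(max, boosted);
            sum += boosted;
        }
        // Best field counts fully; the others contribute tie * their score
        return max + tie * (sum - max);
    }

    public static void main(String[] args) {
        Map<String, Double> boosts = Map.of("product_name", 3.0, "category", 1.0);
        double docA = dismaxScore(Map.of("product_name", 1.0, "category", 2.0), boosts, 0.1);
        double docB = dismaxScore(Map.of("product_name", 0.0, "category", 3.0), boosts, 0.1);
        System.out.printf(Locale.ROOT, "docA=%.2f docB=%.2f%n", docA, docB); // docA=3.20 docB=3.00
    }
}
```

Raising the product_name boost pushes name matches up, but docB, which matches only in category, still gets a nonzero score: boosting reorders results rather than removing them.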

Anyway, in most cases Solr works very well.

I'll let u know!


-- 
View this message in context: 
http://www.nabble.com/SnowballPorterFilterFactory-and-protected-words-tp15042758p15132455.html
Sent from the Solr - User mailing list archive at Nabble.com.
