The problem here is defining "irrelevant". There's nothing in Solr
that can magically determine "this term is irrelevant in this doc, but
this other one isn't".
Best,
Erick
On Sat, Apr 23, 2016 at 11:08 AM, GW wrote:
No. My project is retail based. I mean people putting in a slew of
irrelevant keywords in addition to relevant keywords in an attempt to get
hits on searches and hits outside of context.
I used a filter factory to remove duplicates.
On 23 April 2016 at 11:30, Doug Turnbull <
dturnb...@opensourcec
By keyword spamming, do you mean stuffing the same term over and over to
game term frequency?
If so, you might want to try tuning BM25 similarity for your needs. It has a
saturation point for term frequency.
http://opensourceconnections.com/blog/2015/10/16/bm25-the-next-generation-of-lucene-releva
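
To illustrate the saturation point, here is a minimal sketch of the term-frequency part of the textbook BM25 formula. It is not Lucene's actual code: the function name bm25_tf and the fixed document lengths are assumptions made for the example, and k1=1.2, b=0.75 are the usual BM25 defaults. The point is only that repeating a term has diminishing returns, with the contribution bounded by k1 + 1.

```python
def bm25_tf(tf, k1=1.2, b=0.75, doc_len=100, avg_doc_len=100):
    """Term-frequency component of textbook BM25 (hypothetical helper).

    k1 controls how quickly repeated terms stop adding score;
    b controls how strongly longer documents are penalized.
    """
    # Length normalization: equals k1 when doc_len == avg_doc_len.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return tf * (k1 + 1) / (tf + norm)


if __name__ == "__main__":
    # Each extra repetition adds less; the value never reaches k1 + 1.
    for tf in (1, 2, 5, 10, 50, 100):
        print(tf, round(bm25_tf(tf), 3))
```

Lowering k1 makes the curve flatten sooner, so stuffing the same term many times gains even less. Note that the sketch holds document length fixed; in practice stuffing also lengthens the field, which the b term penalizes further.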