No. My project is retail-based. I mean people putting in a slew of
irrelevant keywords alongside relevant ones in an attempt to get hits on
searches, including hits outside of the item's context.

I used a filter factory to remove duplicates.
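A minimal sketch of the idea in Python. It assumes whitespace tokenization and case-insensitive matching, and `dedupe_tokens` is a hypothetical helper, not the factory mentioned above. Keeping only the first occurrence of each token means repeating a keyword cannot push its term frequency above 1. If the factory in question is Lucene's RemoveDuplicatesTokenFilterFactory, note that it only drops duplicates that sit at the same token position, so repetitions spread across the field would still be counted.

```python
def dedupe_tokens(text: str) -> list[str]:
    """Lowercase and whitespace-split text, keeping only the first occurrence of each token."""
    seen = set()
    kept = []
    for token in text.lower().split():
        if token not in seen:
            seen.add(token)
            kept.append(token)
    return kept


# "shoes" appears three times in the input but only once in the output.
print(dedupe_tokens("Red shoes red SHOES running shoes"))
```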

On 23 April 2016 at 11:30, Doug Turnbull <dturnb...@opensourceconnections.com> wrote:

> By keyword spamming, do you mean stuffing the same term over and over to
> game term frequency?
>
> If so, you might want to try tuning BM25 similarity for your needs. It has a
> saturation point for term frequency.
>
>
> http://opensourceconnections.com/blog/2015/10/16/bm25-the-next-generation-of-lucene-relevation/
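To make the saturation concrete, here is a minimal Python sketch of the textbook BM25 term-frequency component, using k1 = 1.2 and b = 0.75 (the defaults of Lucene's BM25Similarity). When the document is of average length, the contribution is tf * (k1 + 1) / (tf + k1), which can never exceed k1 + 1 = 2.2 however often a term is stuffed. Lucene's actual scoring differs in details such as how document length is encoded, so treat this as an illustration rather than a reproduction of Lucene's numbers.

```python
def bm25_tf(tf: float, k1: float = 1.2, b: float = 0.75,
            doc_len: float = 100.0, avg_doc_len: float = 100.0) -> float:
    """Textbook BM25 term-frequency component (the IDF factor is left out)."""
    # Length normalization: longer-than-average documents get a larger denominator.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return tf * (k1 + 1) / (tf + norm)


# Each extra repetition adds less than the one before; the curve flattens toward k1 + 1.
for tf in (1, 2, 5, 20, 100):
    print(tf, round(bm25_tf(tf), 3))
```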
>
> You can also write your own similarity that sets a max for term frequency.
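One way to picture a similarity with a term-frequency ceiling: a Python sketch of a classic TF-IDF style tf (the square root of the raw frequency, as in Lucene's DefaultSimilarity), with the frequency clipped at a hypothetical `MAX_TF`. In Lucene itself this would roughly mean subclassing the similarity and overriding its `tf(float freq)` method, but class names and signatures vary between Lucene/Solr versions, so check the API docs for your version first.

```python
import math

MAX_TF = 3  # hypothetical ceiling; tune against real data


def capped_tf(freq: int, max_tf: int = MAX_TF) -> float:
    """sqrt(tf) as in classic TF-IDF scoring, but repetitions beyond max_tf are ignored."""
    return math.sqrt(min(freq, max_tf))


# A record that repeats "shoes" 50 times scores the same as one that repeats it 3 times.
print(capped_tf(1), capped_tf(3), capped_tf(50))
```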
>
> I'd also consider figuring out whether you can build a PageRank-like measure
> that signals content trustworthiness. Spammer sites won't be linked to very
> heavily by trusted sites.
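A minimal power-iteration sketch of a PageRank-style score in Python. It assumes you can build a link graph (which site links to which) from the spider's crawl data; the graph and site names below are made up, and a damping factor of 0.85 is the conventional choice. A site that nobody links to ends up with only the baseline score, which is the kind of trust signal described above.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Power-iteration PageRank over a small adjacency list (source -> linked targets)."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the "random jump" share.
        new_rank = {node: (1 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this node's rank evenly across the pages it links to.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node (no outgoing links): spread its rank over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph: trusted sites link to shop1, nobody links to spam_shop.
graph = {
    "trusted_a": ["shop1"],
    "trusted_b": ["shop1", "shop2"],
    "shop1": ["trusted_a"],
    "shop2": [],
    "spam_shop": ["shop1"],
}
for site, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
    print(f"{site}: {score:.3f}")
```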
>
> If you just mean spamming with lots of unique keywords, length normalization
> was built for exactly this reason: to bias relevance toward less verbose,
> more specific matches.
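A rough Python sketch of why that works, using the classic Lucene TF-IDF length norm of 1 / sqrt(number of terms in the field). Index-time boosts and Lucene's lossy one-byte norm encoding are ignored. The two fields are hypothetical: both contain "shoes" once, but the stuffed one buries it among 40 terms.

```python
import math


def length_norm(num_terms: int) -> float:
    """Classic TF-IDF length norm: longer fields are scaled down."""
    return 1.0 / math.sqrt(num_terms)


def field_score(tf: int, num_terms: int) -> float:
    """sqrt(tf) * lengthNorm for one term in one field (IDF and query norms omitted)."""
    return math.sqrt(tf) * length_norm(num_terms)


concise = field_score(tf=1, num_terms=3)    # e.g. "red running shoes"
stuffed = field_score(tf=1, num_terms=40)   # "shoes" plus 39 unrelated keywords
print(f"concise={concise:.3f} stuffed={stuffed:.3f}")
```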
>
> Hope that helps
>
> Doug
> On Sat, Apr 23, 2016 at 10:02 AM GW <thegeofo...@gmail.com> wrote:
>
> > Hey all,
> >
> > I'm just finishing up a project and I'm hoping for some direction on
> > dealing with keyword spamming.
> >
> > I don't have any urgent issues, but I can foresee some bumps in the road.
> >
> > I'm using a custom spider that pulls inventory data from several dozen
> > sources into a single document schema, with one record per item per location.
> >
> > Data from several sources already has a keyword field, but some incoming
> > records have empty or null keyword data.
> >
> > I concatenated my category and keyword data into the keyword field so that
> > no record has empty keyword data, which my query builder requires.
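A minimal Python sketch of that fallback, with hypothetical argument names (`category`, `keywords`). The category is always included, and null or blank source keywords are skipped, so the resulting field is never empty as long as a category exists.

```python
from typing import Optional


def build_keyword_field(category: Optional[str], keywords: Optional[str]) -> str:
    """Join category and source keywords, skipping values that are None or blank."""
    parts = [value.strip() for value in (category, keywords) if value and value.strip()]
    return " ".join(parts)


print(repr(build_keyword_field("Footwear", None)))         # category only
print(repr(build_keyword_field("Footwear", "red shoes")))  # category plus keywords
```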
> >
> > I have a recommended keyword list I could use to count hits before I
> > index. It's a painful thought.
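Counting hits against that list may be less painful than it sounds. Here is a minimal Python sketch, assuming each record's keywords arrive as a list of strings and the recommended list fits in memory as a set. It flags a record when too many of its distinct keywords are off the list, or when there are simply too many of them. The thresholds `max_off_ratio` and `max_keywords` are hypothetical and would need tuning against real inventory data.

```python
def off_list_ratio(keywords: list[str], recommended: set[str]) -> float:
    """Fraction of a record's distinct keywords that are NOT on the recommended list."""
    allowed = {word.lower() for word in recommended}
    distinct = {word.strip().lower() for word in keywords if word.strip()}
    if not distinct:
        return 0.0
    return len(distinct - allowed) / len(distinct)


def looks_spammy(keywords: list[str], recommended: set[str],
                 max_off_ratio: float = 0.5, max_keywords: int = 30) -> bool:
    """Flag records with too many off-list keywords or simply too many keywords."""
    distinct_count = len({word.strip().lower() for word in keywords if word.strip()})
    return (off_list_ratio(keywords, recommended) > max_off_ratio
            or distinct_count > max_keywords)


recommended = {"shoes", "running", "sneakers"}
record = ["Shoes", "running", "iphone", "cheap", "free shipping", "laptop"]
print(off_list_ratio(record, recommended), looks_spammy(record, recommended))
```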
> >
> > I want to be able to detect people who are trying to do keyword spamming.
> >
> > So my question is: Is there some kind of FM that I'm not aware of?
> >
> > Thanks in advance,
> >
> > GW
> >
>
