I don't completely understand. I think you may have replaced your
domain-specific details with another example, either to be more general
or to avoid revealing your business, but that just made your explanation
even more confusing!
But: at the point you are indexing, is it possible to know that "shoes"
should not be indexed for that record?
If so, then the best bet is indeed to keep it out of the index at that
point.
Depending on exactly what rule you can describe for deciding that a
given term should NOT have been included in the record, and so should
not be indexed, there may be ways to do it with Solr analysis.
Otherwise, you'd have to preprocess the data before even giving it to
Solr for indexing.
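
For what it's worth, here is a rough sketch of what that preprocessing
could look like in Python, assuming you can write the rule down as "if
keyword X is present, drop keywords Y, Z". The INCOMPATIBLE table and the
filter_keywords name are made up for illustration; nothing here is a
Solr API, it just cleans the keyword list before you build the document
you send to Solr.

```python
# Hypothetical rule table: if the key is among a record's keywords,
# the listed terms are dropped before the record is indexed.
INCOMPATIBLE = {
    "glasses": {"shoes"},
}


def filter_keywords(keywords, incompatible=INCOMPATIBLE):
    """Return keywords minus any term blocked by another keyword present."""
    present = {k.lower() for k in keywords}
    blocked = set()
    for term in present:
        blocked |= incompatible.get(term, set())
    # Preserve the original order and casing of the surviving keywords.
    return [k for k in keywords if k.lower() not in blocked]


# The cleaned list is what would go into the document handed to Solr.
doc = {"id": "1", "keywords": filter_keywords(["women", "glasses", "shoes"])}
```

A record that really is about shoes has no "glasses" keyword, so its
"shoes" term is left alone and it still matches a search for "shoes".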
Hope this gives you some ideas at least; I don't entirely understand
what you're trying to do.
On 4/5/2011 4:05 PM, Octavian Covalschi wrote:
Hi there,
I'm trying to use Solr in one of my projects and I've got a small problem
that I can't figure out.
Basically our application is collecting data submitted by users. Now the
problem is that submitted data may contain some incorrect info, like some
keywords that will mess up search results. A simple example:
I've got an article about a pair of glasses. The title of the page where
that product is located also contains the keyword "shoes", which is
irrelevant to glasses but relevant to the website as a whole. So in this
case I need to exclude "shoes" when I have the "glasses" keyword. We're
using the page's title and other content to generate keywords. If I
search for "women shoes" or even "shoes", I get those glasses as well,
which is not what I need.
Does that make sense? Does anyone have an idea, or has anyone had a
similar problem and found a decent solution? It looks to me like I need a
list of ignored keywords for certain terms, i.e. keywords that are
semantically incompatible with them, but I'm not sure that's the best way
to do it. Even so, I'm not sure how to blacklist terms relative to other
terms.
Thank you in advance.