In my application, I have documents like:

DOCUMENT 1:
part_num: ABC123 Spark Plug
application: 2008 Toyota Corolla
application: 2007 Honda Civic

DOCUMENT 2:
part_num: FGH234 Spark Plug
application: 2007 Toyota Corolla
application: 2008 Honda Civic

The "application" field is a multi-valued field, and I am using the DisMax
request handler.

My goal is to be able to have the user search for something like:

2008 Toyota Corolla Spark Plug

and have it match Document 1 in this case. This currently works by using
DisMax and having it search both the part_num and application fields.
However, this search also finds Document 2 because the terms "2008",
"Toyota", and "Corolla" all appear in its application fields, even though
they do not belong together in this case.
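
To make the problem concrete, here is a toy Python model of the two
documents. It is not Solr code and does not reproduce Solr's analysis or
scoring; the function names are made up for illustration. It only shows why
term-by-term matching accepts both documents, while a phrase match confined
to a single field value accepts only Document 1.

```python
# Toy model only: plain Python, not Solr. Field values are taken from the
# two example documents; the helper names are hypothetical.
DOCS = {
    "Document 1": ["2008 Toyota Corolla", "2007 Honda Civic"],
    "Document 2": ["2007 Toyota Corolla", "2008 Honda Civic"],
}


def all_terms_match(values, query):
    """True if every query term occurs in some value of the field."""
    indexed = {tok.lower() for value in values for tok in value.split()}
    return all(term.lower() in indexed for term in query.split())


def phrase_match(values, phrase):
    """True if the terms occur adjacently, in order, inside ONE value."""
    wanted = phrase.lower().split()
    for value in values:
        toks = value.lower().split()
        for i in range(len(toks) - len(wanted) + 1):
            if toks[i:i + len(wanted)] == wanted:
                return True
    return False


for name, values in DOCS.items():
    print(name,
          all_terms_match(values, "2008 Toyota Corolla"),
          phrase_match(values, "2008 Toyota Corolla"))
# Document 1 True True
# Document 2 True False
```

The second column is the distinction I would like the scoring to reflect.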

I understand that it may be hard to eliminate Document 2 from the search
results because the search has to be allowed to be a little fuzzy, but if I
check the scores of the documents, Document 1 is just barely ahead of
Document 2 in its score. I would like to figure out a way to get Document 1
to score higher in this case, since part of the query matches the phrase in
its application exactly.

I've been playing around with the phrase fields (pf) and phrase slop (ps)
parameters to get Solr to recognize that "2008 Toyota Corolla" is a phrase
in this example and weight it higher for Document 1, but so far without
success. Looking at the debug query, Solr does treat it as a phrase if the
user only types in something like:

2008 Toyota Corolla

but as soon as the Spark Plug terms are added, it looks like Solr is trying
to make the entire search expression into one long phrase.
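
As a rough sketch of what I think I am seeing, the Python below imitates how
the phrase clause appears to be derived: the whole user query becomes one
quoted phrase per pf field, with the ps value as slop. The function name is
hypothetical and the output format only approximates the debug query; real
Solr output depends on the field analysis (lowercasing, for example).

```python
def phrase_boost_clauses(user_query, pf_fields, ps=0):
    """Approximate the pf clauses: one whole-query phrase per field."""
    phrase = user_query.strip()
    clauses = []
    for field in pf_fields:
        clause = f'{field}:"{phrase}"'
        if ps:
            clause += f"~{ps}"  # phrase slop, as set by the ps parameter
        clauses.append(clause)
    return clauses


# Short query: the phrase is exactly the vehicle description.
print(phrase_boost_clauses("2008 Toyota Corolla", ["application"]))
# ['application:"2008 Toyota Corolla"']

# Longer query: the part terms are pulled into the same phrase.
print(phrase_boost_clauses("2008 Toyota Corolla Spark Plug",
                           ["application"], ps=2))
# ['application:"2008 Toyota Corolla Spark Plug"~2']
```

With the longer query, the phrase can no longer match any single application
value, because "Spark Plug" only occurs in part_num.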

Does anyone have a recommendation for how this can be done, so that Solr
breaks the search expression down and automatically makes a phrase out of
part of it? Or should I approach this whole problem from a different angle?
Thanks.
