On Oct 13, 2008, at 9:34 PM, abhishek007 wrote:
Svein Parnas-2 wrote:
One way to boost exact match of one occurrence of a multivalued field
is to add some kind of special start-of-field token and end-of-field
token in the data, e.g.:
<document>
<field name="professor">John Dane</field>
<field name="course">softok Algorithms eoftok</field>
<field name="course">softok Theory eoftok</field>
<field name="course">softok Computability, Complexity and Algorithms eoftok</field>
</document>
Then, in your query you can boost hits with the complete phrase
"softok queryword eoftok" by doing something like
queryword OR "softok queryword eoftok"^10
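A minimal sketch of the idea, in Python, might look like the following. The helper names (wrap_value, boosted_query) are hypothetical and not part of Solr; the sketch assumes single-word queries and does no escaping of query-syntax characters.

```python
# Sentinel tokens taken from the example document above.
START, END = "softok", "eoftok"


def wrap_value(value: str) -> str:
    """Surround one field value with the start/end sentinel tokens before indexing."""
    return f"{START} {value} {END}"


def boosted_query(word: str, boost: int = 10) -> str:
    """Match the word anywhere, but boost values that consist of exactly that word."""
    return f'{word} OR "{START} {word} {END}"^{boost}'


if __name__ == "__main__":
    print(wrap_value("Theory"))
    print(boosted_query("Algorithms"))
```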
I see what you are saying, but what if the query string itself contains
multiple synonyms, for example something like "Algorithms, Theory"? With
this I would end up with "softok Algorithms, Theory eoftok", which would
not match the indexed data.
I was just trying to point you in a direction, not to give a complete
solution. For multiword queries, the solution will depend on the query
syntax you are going to support and how you want the ranking to be
performed. For instance, if the interpretation of a simple two-word
query is "both words required, boost short field occurrences over long
ones, but rank hits where both words occur in the same field occurrence
first", the query could be rewritten to
+"softok wordA eoftok"~<x> +"softok wordB eoftok"~<x> "wordA wordB"~<x>^50
where <x> is roughly the number of tokens in the longest occurrence of
the field in the index, but less than the field's positionIncrementGap.
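As a rough sketch, the rewrite for the two-word case could be generated like this in Python. The function name and its parameters are hypothetical; the slop and positionIncrementGap values are whatever fits your own schema, and the sketch assumes plain words with no query-syntax characters in them.

```python
def rewrite_two_word_query(word_a: str, word_b: str, slop: int,
                           position_increment_gap: int, boost: int = 50) -> str:
    """Build the rewritten query for a simple two-word query.

    Both words are required inside a sentinel-delimited field occurrence,
    and hits with both words in the same occurrence get an extra boost.
    """
    # The slop must stay below the gap, otherwise a sloppy phrase could
    # match across two different occurrences of the multivalued field.
    if slop >= position_increment_gap:
        raise ValueError("slop must be less than positionIncrementGap")

    required_a = f'+"softok {word_a} eoftok"~{slop}'
    required_b = f'+"softok {word_b} eoftok"~{slop}'
    same_occurrence = f'"{word_a} {word_b}"~{slop}^{boost}'
    return f"{required_a} {required_b} {same_occurrence}"


if __name__ == "__main__":
    # Example values only: slop 20 with a gap of 100.
    print(rewrite_two_word_query("wordA", "wordB", 20, 100))
```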
The query parsing might get a bit messy if you are going to support
advanced syntax. If the syntax you are going to support is about the
same as DisMax, it could be an idea to modify DisMaxRequestHandler.
Another way to go would be to use DisMax as is, find all query terms
not prefixed with - in the query and add "softok word eoftok"~<x> to
the bq parameter.
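A rough Python sketch of that last approach follows. The function name build_bq is hypothetical, and the term extraction is deliberately naive: it splits on whitespace, so it assumes the query contains no quoted phrases. It also leaves out any field qualification of the boost phrases.

```python
def build_bq(query: str, slop: int) -> str:
    """Build a bq value from all query terms that are not prefixed with '-'."""
    clauses = []
    for term in query.split():
        if term.startswith("-"):
            continue  # prohibited term: no boost clause for it
        word = term.lstrip("+")  # drop a leading '+' (required-term marker)
        if word:
            clauses.append(f'"softok {word} eoftok"~{slop}')
    return " ".join(clauses)


if __name__ == "__main__":
    # Example values only: the slop of 20 is an arbitrary choice.
    print(build_bq("algorithms -theory +complexity", 20))
```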
Svein