: Hmmm, was my mail so weird or my question so stupid ... or is there simply
: no one with an answer? Not even a hint? :(

Patience, my friend. I've got a backlog of ~500 Lucene-related messages in
my INBOX, and I was just reading your original email when this reply came
in.

In general this is a fairly hard problem ... the easiest solution I know
of that works in most cases is to do index-time expansion using the
SynonymFilter, so regardless of whether a document contains "usbcable",
"usb-cable" or "usb cable", all three variants get indexed, and then the
user can search for any of them.
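To make the index-time expansion idea concrete, here is a rough plain-Java model of it. It is NOT the Lucene/Solr API: the synonym group, the lowercasing whitespace tokenizer and all class/method names are assumptions for illustration. A real filter works on a positioned token stream; this sketch simply reduces each document to the set of terms that would end up in the index.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Plain-Java model of index-time synonym expansion (NOT the Lucene API).
// Positions are ignored: a document is reduced to its set of indexed terms.
public class IndexTimeExpansion {

    // Hypothetical synonym group: equivalent spellings of one compound.
    // A multi-word variant is written with spaces, as in a synonyms file.
    static final List<List<String>> GROUPS =
        List.of(List.of("usbcable", "usb-cable", "usb cable"));

    // Assumed tokenizer: lowercase, split on whitespace (hyphenated words stay whole).
    static List<String> tokenize(String text) {
        return Arrays.asList(text.toLowerCase().trim().split("\\s+"));
    }

    // Terms that end up in the index for one document.
    static Set<String> indexedTerms(String text) {
        List<String> tokens = tokenize(text);
        Set<String> terms = new LinkedHashSet<>(tokens);
        for (List<String> group : GROUPS) {
            // Does any spelling of this compound occur as a contiguous run of tokens?
            boolean present = group.stream()
                .anyMatch(v -> Collections.indexOfSubList(tokens, tokenize(v)) >= 0);
            if (present) {
                // Index every spelling, so a query for any of them will match.
                for (String variant : group) {
                    terms.addAll(tokenize(variant));
                }
            }
        }
        return terms;
    }

    // A query matches if all of its (unexpanded) terms were indexed.
    static boolean matches(String query, String document) {
        return indexedTerms(document).containsAll(tokenize(query));
    }

    public static void main(String[] args) {
        for (String doc : List.of("cheap usbcable", "usb-cable 2m", "usb cable", "power cable")) {
            System.out.println(doc + " -> " + matches("usbcable", doc));
        }
    }
}
```

All the work happens at index time here; the query side stays dumb, which is why any of the three spellings typed by the user finds all three kinds of documents.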

The downside is that it can throw off your tf/idf stats for some terms (if
they appear by themselves and as part of a compound), and it can result in
false positives for esoteric phrase searches (but that tends to be more of
a theoretical problem than an actual one).
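A small worked example of the tf/idf side effect, assuming three hypothetical documents ("usbcable", "power cable", "usb hub") whose indexed terms are written out by hand. The idf formula used is Lucene's classic 1 + ln(numDocs / (docFreq + 1)); other similarity implementations differ in detail, but the direction of the effect is the same.

```java
import java.util.List;
import java.util.Set;

// Worked example: index-time expansion raises document frequency (df) of
// the compound's parts, which lowers their idf for every query.
// Term sets are hand-written; idf is the classic 1 + ln(numDocs / (df + 1)).
public class IdfSkew {

    static int docFreq(List<Set<String>> docs, String term) {
        int df = 0;
        for (Set<String> doc : docs) {
            if (doc.contains(term)) df++;
        }
        return df;
    }

    static double idf(int numDocs, int docFreq) {
        return 1 + Math.log((double) numDocs / (docFreq + 1));
    }

    // Hypothetical documents without expansion.
    static final List<Set<String>> PLAIN = List.of(
        Set.of("usbcable"),
        Set.of("power", "cable"),
        Set.of("usb", "hub"));

    // Same documents after expansion: the first one also gets "usb" and "cable".
    static final List<Set<String>> EXPANDED = List.of(
        Set.of("usbcable", "usb-cable", "usb", "cable"),
        Set.of("power", "cable"),
        Set.of("usb", "hub"));

    public static void main(String[] args) {
        for (String term : List.of("usb", "cable")) {
            int before = docFreq(PLAIN, term);
            int after = docFreq(EXPANDED, term);
            System.out.printf("%s: df %d -> %d, idf %.3f -> %.3f%n",
                term, before, after, idf(3, before), idf(3, after));
        }
    }
}
```

Both "usb" and "cable" go from df 1 to df 2, so their idf drops from about 1.405 to 1.0. Since idf is global, a plain search for "cable" now weighs that word less even in the "power cable" document, which never contained the compound at all.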

: > But this never happens since with the DisMax Searcher the parser produces a
: > query like this:
: > 
: > ((category:blue | name:blue)~0.1 (category:tooth | name:tooth)~0.1)
        ...
: > to deal with this compound word problem? Is there another query parser that
: > already does the trick?

Take a look at the FieldQParserPlugin ... it passes the raw query string
to the analyzer of a specified field. This would let your TokenFilters
see the "stream" of tokens (which isn't possible with the conventional
QueryParser tokenization rules), but it doesn't have any of the
"field/query matrix cross product" goodness of dismax -- you'd only be
able to query the one field.
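Why seeing the whole token stream matters can be shown without Lucene at all. The sketch below assumes a hypothetical compound-joining filter (joinCompounds, with a made-up dictionary) that merges two adjacent tokens when their concatenation is a known compound. Fed the whole string, as the FieldQParserPlugin does, it can turn "blue tooth" into "bluetooth"; fed one whitespace-split chunk at a time, as the conventional QueryParser does, it never sees the two halves together.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Plain-Java sketch (NOT a Lucene TokenFilter). joinCompounds is a
// hypothetical filter that merges two adjacent tokens into a known compound.
public class StreamVsPerTerm {

    static final Set<String> COMPOUNDS = Set.of("bluetooth", "usbcable"); // hypothetical dictionary

    static List<String> joinCompounds(List<String> tokens) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            if (i + 1 < tokens.size() && COMPOUNDS.contains(tokens.get(i) + tokens.get(i + 1))) {
                out.add(tokens.get(i) + tokens.get(i + 1)); // merge the adjacent pair
                i += 2;
            } else {
                out.add(tokens.get(i));
                i++;
            }
        }
        return out;
    }

    // Whole query string handed to one analyzer (the FieldQParserPlugin approach).
    static List<String> analyzeWholeString(String q) {
        return joinCompounds(Arrays.asList(q.toLowerCase().trim().split("\\s+")));
    }

    // Whitespace split first, then each chunk analyzed on its own,
    // so the filter only ever sees a single token.
    static List<String> analyzePerTerm(String q) {
        List<String> out = new ArrayList<>();
        for (String chunk : q.toLowerCase().trim().split("\\s+")) {
            out.addAll(joinCompounds(List.of(chunk)));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyzeWholeString("blue tooth headset"));
        System.out.println(analyzePerTerm("blue tooth headset"));
    }
}
```

The first call yields [bluetooth, headset] and the second [blue, tooth, headset]; the filter logic is identical, only the amount of context it gets differs.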

(Hmmm.... I wonder if DisMaxQParser 2.0 could have an option to let you
specify a FieldType whose analyzer is used to tokenize the query string,
instead of using the Lucene QueryParser JavaCC tokenization, and *then*
the tokens resulting from that initial analyzer could be passed to the
analyzers of the various qf fields ... hmmm, that might be just crazy
enough to be too crazy to work.)
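A rough sketch of that speculative idea, purely for illustration: run one query analyzer over the whole string, then build the dismax field x token cross product from whatever tokens come out. The query string format mirrors the one quoted earlier in the thread; the analyzer is a hypothetical stand-in (lowercase, join "blue tooth", split on whitespace), and the per-field analysis of each token by the qf fields' own analyzers is left out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Sketch of the hypothetical "analyze once, then cross product" idea.
// Not a real Solr parser: it only builds the query string.
public class DisMaxCrossProduct {

    // One DisjunctionMax clause per token, one field:token entry per qf field.
    static String buildDisMax(List<String> tokens, List<String> fields, double tie) {
        StringBuilder sb = new StringBuilder("(");
        for (int t = 0; t < tokens.size(); t++) {
            if (t > 0) sb.append(' ');
            sb.append('(');
            for (int f = 0; f < fields.size(); f++) {
                if (f > 0) sb.append(" | ");
                sb.append(fields.get(f)).append(':').append(tokens.get(t));
            }
            sb.append(")~").append(tie);
        }
        return sb.append(')').toString();
    }

    static String parse(String q, Function<String, List<String>> queryAnalyzer,
                        List<String> fields, double tie) {
        return buildDisMax(queryAnalyzer.apply(q), fields, tie);
    }

    // Hypothetical whole-string analyzer that knows one compound.
    static final Function<String, List<String>> JOINING_ANALYZER =
        s -> Arrays.asList(s.toLowerCase().replace("blue tooth", "bluetooth").trim().split("\\s+"));

    // Roughly what whitespace-first tokenization gives today.
    static final Function<String, List<String>> WHITESPACE =
        s -> Arrays.asList(s.toLowerCase().trim().split("\\s+"));

    public static void main(String[] args) {
        List<String> qf = List.of("category", "name");
        System.out.println(parse("blue tooth", WHITESPACE, qf, 0.1));
        System.out.println(parse("blue tooth", JOINING_ANALYZER, qf, 0.1));
    }
}
```

With whitespace tokenization this reproduces the ((category:blue | name:blue)~0.1 (category:tooth | name:tooth)~0.1) shape from the original question; with the joining analyzer the same cross product collapses to a single clause over "bluetooth", while still searching every qf field.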

-Hoss