: Hmmm was my mail so weird or my question so stupid ... or is there simply
: no one with an answer? Not even a hint? :(

Patience, my friend. I've got a backlog of ~500 Lucene-related messages in my INBOX, and I was just reading your original email when this reply came in.

In general this is a fairly hard problem ... the easiest solution I know of that works in most cases is to do index-time expansion using the SynonymFilter, so that regardless of whether a document contains "usbcable", "usb-cable" or "usb cable", all three variants get indexed and the user can search for any of them. The downside is that it can throw off your tf/idf stats for some terms (if they appear both by themselves and as part of a compound), and it can result in false positives for esoteric phrase searches (but that tends to be more of a theoretical problem than an actual one).

: > But this never happens since with the DisMax Searcher the parser produces a
: > query like this:
: >
: > ((category:blue | name:blue)~0.1 (category:tooth | name:tooth)~0.1)

	...

: > to deal with this compound word problem? Is there another query parser that
: > already does the trick?

Take a look at the FieldQParserPlugin ... it passes the raw query string to the analyzer of a specified field. This would let your TokenFilters see the "stream" of tokens (which isn't possible with the conventional QueryParser tokenization rules), but it doesn't have any of the "field/query matrix cross product" goodness of dismax: you'd only be able to query the one field.

(Hmmm.... I wonder if DisMaxQParser 2.0 could have an option to let you specify a FieldType whose analyzer was used to tokenize the query string instead of using the Lucene QueryParser JavaCC tokenization, and *then* the tokens resulting from that initial analyzer could be passed to the analyzers of the various qf fields ... hmmm, that might be just crazy enough to be too crazy to work)

-Hoss
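
For illustration, here is a rough sketch of the index-time expansion idea in plain Python. It is a toy and not Lucene's SynonymFilter: the VARIANTS group and the index_terms, build_index and search helpers are all made-up names, matching is a crude substring check, and there are no token positions or scoring.

```python
# Toy sketch of index-time expansion. NOT Lucene's SynonymFilter:
# all names here are hypothetical and matching is deliberately crude.

VARIANTS = ("usbcable", "usb-cable", "usb cable")  # assumed synonym group


def index_terms(text, variants=VARIANTS):
    """Return the set of terms indexed for one document."""
    normalized = " ".join(text.lower().split())
    terms = set(normalized.split())
    # If any variant occurs in the text, index every variant,
    # both as a whole and as its individual words.
    if any(v in normalized for v in variants):
        for v in variants:
            terms.add(v)
            terms.update(v.split())
    return terms


def build_index(docs):
    """Map each document id to its expanded term set."""
    return {doc_id: index_terms(text) for doc_id, text in docs.items()}


def search(index, query):
    """Return sorted ids of documents containing every query word."""
    words = query.lower().split()
    return sorted(d for d, terms in index.items()
                  if all(w in terms for w in words))
```

With documents "usbcable 2m", "black usb-cable", "long usb cable" and "hdmi cable", a search for any of the three variants finds the first three documents. A search for just "cable" finds all four, which is the kind of side effect on term statistics mentioned above: documents that only ever said "usbcable" now also count as containing "cable".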