Sorry, I missed this. We have the same problem.

None of our customers use query syntax, so I have considered making a
full-text query parser. It would run the whole query string through the
analyzer chain, convert the resulting tokens into one big OR query, and
then pass that to the rest of Dismax. Shingles and synonyms should work
at query time with that approach.
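
Roughly what I have in mind, as a Python sketch rather than real Solr
code. The analyze() function is only a stand-in for the field's analyzer
chain (lowercase, split on non-alphanumerics, add concatenated two-word
shingles), the function names are made up, and the output syntax is
illustrative, not something Dismax emits today:

```python
import re

def analyze(text):
    """Hypothetical stand-in for the analyzer chain: lowercase, split on
    non-alphanumerics, then add concatenated two-word shingles."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    shingles = [a + b for a, b in zip(words, words[1:])]
    return words + shingles

def full_text_query(user_input, fields):
    """Analyze the whole input once, then OR all resulting terms.
    Each term is expanded across the fields, as Dismax does per term."""
    clauses = []
    for term in analyze(user_input):
        per_field = " | ".join(f"{field}:{term}" for field in fields)
        clauses.append(f"({per_field})")
    return " OR ".join(clauses)

print(full_text_query("blue tooth", ["category", "name"]))
```

Because the analyzer sees "blue tooth" as one string, the shingle
"bluetooth" ends up as its own clause next to "blue" and "tooth", which
is the part that is lost when each whitespace-separated word is analyzed
separately.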

This question should probably go to a Lucene list, too.

wunder

On 3/11/09 2:54 AM, "Tobias Dittrich" <dittr...@wave-computer.de> wrote:

> Hmmm was my mail so weird or my question so stupid ... or is
> there simply no one with an answer? Not even a hint? :(
> 
> Tobias Dittrich wrote:
>> Hi all,
>> 
>> I know there are a lot of topics about compound word search already but
>> I haven't found anything for my specific problem yet. So if this is
>> already answered (which would be nice :)) then any hints or search
>> phrases for the mail archive would be appreciated.
>> 
>> Basically I want users to be able to search my index for compound words
>> that are not really compounds but merely terms that can be written in
>> several ways.
>> 
>> For example I have the categories "usb" and "cable" in my index and I
>> want the user to be able to search for "usbcable" or "usb-cable" etc.
>> Also there is "bluetooth" in the index and I want the search for "blue
>> tooth" to return the corresponding documents.
>> 
>> My approach is to use ShingleFilterFactory followed by
>> WordDelimiterFilterFactory to index all possible combinations of words
>> and get rid of intra-word delimiters. This nicely covers the first part
>> of my requirements since the terms "usb" and "cable" somewhere along the
>> process get concatenated and "usbcable" is in the index.
>> 
>> Now I also want to use this on the query side, so the user input "blue
>> tooth" (not as phrase) would become "bluetooth" for this field and
>> produce a hit. But this never happens since with the DisMax Searcher the
>> parser produces a query like this:
>> 
>> ((category:blue | name:blue)~0.1 (category:tooth | name:tooth)~0.1)
>> 
>> And the filters and analysers for this field never get to see the whole
>> user query and cannot perform their shingle and delimiter tasks :(
>> 
>> So my question now is: how can I get this working? Is there a preferable
>> way to deal with this compound word problem? Is there another query
>> parser that already does the trick?
>> 
>> Or would it make sense to write my own query parser that passes the user
>> query "as is" to the several fields?
>> 
>> Any hints on this are welcome.
>> 
>> Thanks in advance
>> Tobias
>> 
