On Apr 19, 2007, at 6:56 AM, Michael Kimsal wrote:
It's bugged us a little bit, because it's something that we need
(to be able to emulate the previous foo LIKE '%bar%' SQL behaviour we're
replacing), but can't offer our users yet.

I have also run into this issue and have intended to fix up Solr to allow configuring that switch on QueryParser. I'll eventually get to this, but someone supplying a patch with a test case would get it done sooner.

I must, however, caveat any discussion of leading wildcards with the underlying cost. If you use standard analysis and run a leading wildcard query, you take a possibly dramatic performance hit, because Lucene has to scan *every* term in the specified field. In fact, with my 3.7M index, a fuzzy query kills the search for the very same reason. Fuzzy query also has a switch, which needs to be made configurable through Solr, that sets how many leading characters are fixed so that this all-term scan can be avoided.
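
To make the cost concrete, here is a rough sketch in Python rather than Lucene code. It assumes a field's terms are held in one sorted list (a simplification of a real term dictionary), and the function names are made up for illustration. A fixed prefix lets the lookup jump straight to the matching range, while a leading wildcard has to test every term:

```python
import bisect

def prefix_terms(sorted_terms, prefix):
    """Terms matching prefix*: seek to the first candidate, stop at the first miss."""
    start = bisect.bisect_left(sorted_terms, prefix)
    hits, examined = [], 0
    for term in sorted_terms[start:]:
        examined += 1
        if not term.startswith(prefix):
            break  # sorted order: nothing further can match
        hits.append(term)
    return hits, examined

def leading_wildcard_terms(sorted_terms, suffix):
    """Terms matching *suffix: sort order does not help, so every term is tested."""
    hits = [term for term in sorted_terms if term.endswith(suffix)]
    return hits, len(sorted_terms)

terms = sorted(["apple", "bar", "barn", "crowbar", "zebra"])
print(prefix_terms(terms, "bar"))            # examines only part of the list
print(leading_wildcard_terms(terms, "bar"))  # examines all of it
```

With five terms the difference is trivial; with millions of unique terms the second function is the one that hurts.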

There are techniques that can improve the performance of in-string queries like this, at the expense of indexing time, index size, and some clever query creation. One such technique I've used successfully is term rotation enumeration (cat => cat$, at$c, t$ca). This involves custom analyzers and query creation.
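
A minimal sketch of the rotation idea, again in plain Python and not the custom analyzer itself. It assumes "$" never occurs inside a real term, and the helper names are hypothetical. Each term is indexed under all of its rotations, so an in-string query such as *bar* turns into a cheap prefix lookup on the rotated forms:

```python
import bisect

END = "$"  # end-of-term marker; assumed never to appear in real terms

def rotations(term):
    """cat -> cat$, at$c, t$ca (one rotation per starting character)."""
    marked = term + END
    return [marked[i:] + marked[:i] for i in range(len(term))]

def build_index(terms):
    """Sorted (rotation, original term) pairs; roughly len(term) entries per term."""
    return sorted((rot, term) for term in set(terms) for rot in rotations(term))

def infix_search(index, fragment):
    """Find terms containing fragment by prefix-matching their rotations."""
    start = bisect.bisect_left(index, (fragment,))
    matches = set()
    for rot, term in index[start:]:
        if not rot.startswith(fragment):
            break  # rotations sharing a prefix are contiguous in sorted order
        matches.add(term)
    return sorted(matches)

index = build_index(["foobar", "barfoo", "baz", "rebarb"])
print(rotations("cat"))
print(infix_search(index, "bar"))
```

The trade-off is visible in build_index: every term is stored once per character, which is where the extra indexing time and index size come from.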

Once Solr supports this switch, you may find performance fine with leading wildcard queries, but at least be forewarned that there are scalability skeletons in this closet.

        Erik
