Re: Customizing Solr to handle Leading Wildcard queries

Neal Richter Wed, 28 Jan 2009 00:05:32 -0800

Leading wildcard search is called grep ;-)

Ditto on the indexing reversed words suggestion.


Can you create a second field in Solr that contains /only/ the reversed
words from the fields you care about?  Once you do that, you could
pre-process the query, look for leading wildcards, and run those terms
(after reversing them) only against your special
reverse-meta-data field.
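
A minimal sketch of that pre-processing step, in Python purely for illustration. The field names `metadata` and `metadata_rev` and the helper `rewrite_leading_wildcard` are hypothetical, and it assumes the reversed field holds each token reversed character by character. It only handles a single bare term; anything that is not a pure leading-wildcard term is passed through unchanged.

```python
def rewrite_leading_wildcard(term, field="metadata", rev_field="metadata_rev"):
    """Rewrite '*abc' into a trailing-wildcard query on the reversed field.

    Field names are hypothetical. Terms without a leading wildcard, or with
    additional wildcards (e.g. '*abc*'), are sent to the normal field as-is.
    """
    body = term[1:]
    if term.startswith("*") and body and "*" not in body and "?" not in body:
        # Reverse the remaining characters and move the wildcard to the end.
        return f"{rev_field}:{body[::-1]}*"
    return f"{field}:{term}"


print(rewrite_leading_wildcard("*abc"))
```

With these assumptions, "*abc" becomes a cheap prefix query on the reversed field, while "*abc*" is left alone because reversing does not remove its leading wildcard.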

The *foo* case really is grep! Almost by definition, you have to
scan the index linearly unless some magic is added.

One option is to extend Otis' n-gram suggestion and turn a word like
"baffoonery"
into each of its suffixes:

(stored in "meta field")
baffoonery
affoonery
ffoonery
foonery
oonery
onery
nery
ery
ry

Now you can take a query like "*foo*" and drop the leading wildcard
and it will hit on 'foonery'.
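
A rough sketch of the idea in Python, not a Solr implementation. The helpers `suffixes` and `prefix_matches` are made up for this example, the minimum suffix length of 2 simply mirrors the list above, and a sorted Python list stands in for the index's term dictionary.

```python
from bisect import bisect_left


def suffixes(word, min_len=2):
    """All suffixes of `word` with at least `min_len` characters, longest first."""
    return [word[i:] for i in range(len(word) - min_len + 1)]


def prefix_matches(sorted_terms, prefix):
    """Terms starting with `prefix`, located by binary search instead of a full scan."""
    start = bisect_left(sorted_terms, prefix)
    hits = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can match
        hits.append(term)
    return hits


terms = sorted(suffixes("baffoonery"))
# "*foo*" with the leading wildcard dropped becomes the prefix "foo".
print(prefix_matches(terms, "foo"))
```

The lookup touches only the terms that share the prefix, which is the point of the trick: the infix query has turned into an ordinary trailing-wildcard query.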

Make sense?  You are trading index size for not doing a linear scan
like grep.  It's not advisable to do this for every word in your
document set ;-)
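
To put a rough number on the index-size cost, here is a small Python calculation. The helper `suffix_cost` is hypothetical, and it assumes every suffix of at least 2 characters is indexed as its own term; real index overhead depends on how Lucene stores terms and postings.

```python
def suffix_cost(word, min_len=2):
    """Return (number of terms, total characters) if every suffix is indexed."""
    lengths = range(min_len, len(word) + 1)
    return len(lengths), sum(lengths)


# One 10-character word turns into 9 terms totalling 54 characters.
print(suffix_cost("baffoonery"))
```

The total grows roughly with the square of the word length, which is why this is better kept to a small metadata field than applied to full document content.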

- Neal Richter

On Wed, Jan 28, 2009 at 12:19 AM, Jana, Kumar Raja <kj...@ptc.com> wrote:
> Hi,
>
> Thanks Otis, Newton and everyone else for the help on this issue.
>
> Most of the data I index are documents like pdfs, word Docs, open office
> documents, etc. I store the content of the document in a field called
> content and the remaining metadata of the document like name, id,
> created by, modified by, created on, etc in a copy field called
> metadata. I am not particularly interested in enabling leading wildcard
> characters in the content (although such a possibility would be a
> bonus). For this, I've tried implementing the suggestion to store
> reverse strings as well as the correct strings for the metadata field.
> All leading wildcard queries like "*abc" are searched as "cba*" against
> the reversed metadata field. So far so good. Thank you :)
>
> But now, I ran into the scenario where the query string is *abc* :( and
> the whole thing came down crashing again. I cannot ignore such queries.
> I would rather take the risk of Solr OOMing by enabling the leading
> wildcard query searches.
>
> Can someone please tell me the steps to turn on this feature in Lucene
> QueryParser? I am sure it will be helpful to many to document such a
> procedure on the Wiki or somewhere else. (I am definitely going to do
> that once I fix this. Too much trouble this seems to be)
> Also, which queryParser does Solr use by default?
>
> Thanks,
> Kumar
>
>
>
>
> -----Original Message-----
> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
> Sent: Thursday, January 15, 2009 10:18 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Customizing Solr to handle Leading Wildcard queries
>
> Hi ramuK,
>
> I believe you can turn that "on" via the Lucene QueryParser, but of
> course such searches will be slo(oo)w.  You can also index reversed
> tokens (e.g. *kumar --> rakum*) or you could index n-grams with
> begin/end delim characters (e.g. kumar -> ^ k u m a r $, *kumar -> "k u
> m a r $")
>
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> ----- Original Message ----
>> From: "Jana, Kumar Raja" <kj...@ptc.com>
>> To: solr-user@lucene.apache.org
>> Sent: Thursday, January 15, 2009 9:49:24 AM
>> Subject: RE: Customizing Solr to handle Leading Wildcard queries
>>
>> Hi Erik,
>>
>> Thanks for the quick reply.
>> I want to enable leading wildcard query searches in general. The case
>> mentioned in the earlier mail is just one of the many instances I use
>> this feature.
>>
>> -Kumar
>>
>>
>>
>>
>> -----Original Message-----
>> From: Erik Hatcher [mailto:e...@ehatchersolutions.com]
>> Sent: Thursday, January 15, 2009 7:59 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Customizing Solr to handle Leading Wildcard queries
>>
>>
>> On Jan 15, 2009, at 8:23 AM, Jana, Kumar Raja wrote:
>> > Not being able to perform Leading Wildcard queries is a major
>> > handicap.
>> > I want to be able to perform searches like *.pdf to fetch all pdf
>> > documents from Solr.
>>
>> For this particular case, I recommend indexing the document type as a
>> separate field.  Something like type:pdf (or use a MIME type string).
>> Then you can do a very direct and fast query to search or facet by
>> document types.
>>
>>     Erik
>
>
