Re: Customizing Solr to handle Leading Wildcard queries

Neal Richter Wed, 28 Jan 2009 00:11:02 -0800

Oh wait, it looks like Otis' suggestion to "index n-grams with begin/end
delim characters" and rely on phrase searching to link the chains of
characters is logically a better version of what I described in my previous email.


- Neal

On Wed, Jan 28, 2009 at 1:04 AM, Neal Richter <nrich...@gmail.com> wrote:
> Leading wildcard search is called grep ;-)
>
> Ditto on the indexing reversed words suggestion.
>
> Can you create a second field in Solr that contains /only/ the words
> from the fields you want reversed? Once you do that, you could
> pre-process the query, look for leading wildcards, and run those terms
> (after reversing them) only against your special reversed-metadata
> field.
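
A minimal Python sketch of that pre-processing step, not Solr code. The field names `metadata` and `metadata_rev` are hypothetical, whitespace tokenization is an assumption, and matching uses Python's `fnmatch` as a stand-in for Solr's wildcard handling. It only handles a single leading `*`; a term with more than one `*` (such as `*abc*`) is rejected, because reversing `*abc*` just gives `*cba*`, which still starts with a wildcard.

```python
from fnmatch import fnmatchcase

# Hypothetical field names, for illustration only.
FIELD = "metadata"
REVERSED_FIELD = "metadata_rev"


def index_tokens(text):
    """Index side: store each token as-is, and reversed in a second field."""
    tokens = text.lower().split()
    return {FIELD: tokens, REVERSED_FIELD: [t[::-1] for t in tokens]}


def rewrite_term(term):
    """Query side: route a term to the field that can answer it with a
    trailing-wildcard (prefix) pattern.

    Returns (field, pattern), or None for terms with more than one '*',
    such as '*abc*', which this sketch does not handle.
    """
    term = term.lower()
    if not term.startswith("*"):
        return FIELD, term                    # "abc*" or plain terms: unchanged
    rest = term[1:]
    if "*" in rest:
        return None                           # e.g. "*abc*" is still "*cba*" reversed
    return REVERSED_FIELD, rest[::-1] + "*"   # "*abc" -> "cba*"


def matches(doc, term):
    """Check one indexed document against a single (possibly wildcard) term."""
    routed = rewrite_term(term)
    if routed is None:
        raise ValueError("unsupported wildcard term: %r" % term)
    field, pattern = routed
    return any(fnmatchcase(tok, pattern) for tok in doc[field])
```

For example, with `doc = index_tokens("Quarterly Report.pdf")`, the query `*.pdf` is rewritten to `fdp.*` on the reversed field and matches the stored token `fdp.troper`.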
>
> The *foo* case really is grep! Nearly by definition you have to
> linearly scan every term in the index unless some magic is added.
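
To make the "grep" point concrete, here is a toy model in Python. It assumes the terms are kept in sorted order, as in an inverted index's term dictionary, and it is not how Lucene actually implements wildcard queries. A trailing wildcard can binary-search straight to the first candidate, while a leading wildcard gets no help from the sort order and has to test every term.

```python
import bisect
from fnmatch import fnmatchcase


def prefix_terms(sorted_terms, prefix):
    """Trailing wildcard ("foo*"): binary-search to the first term >= prefix,
    then walk forward only while terms still share the prefix."""
    i = bisect.bisect_left(sorted_terms, prefix)
    hits = []
    while i < len(sorted_terms) and sorted_terms[i].startswith(prefix):
        hits.append(sorted_terms[i])
        i += 1
    return hits


def leading_wildcard_terms(sorted_terms, pattern):
    """Leading wildcard ("*foo*"): the sort order gives no starting point,
    so every term in the dictionary is tested, just like grep."""
    return [t for t in sorted_terms if fnmatchcase(t, pattern)]


terms = sorted(["baffoonery", "bar", "foo", "food", "football", "grep"])
print(prefix_terms(terms, "foo"))              # ['foo', 'food', 'football']
print(leading_wildcard_terms(terms, "*foo*"))  # ['baffoonery', 'foo', 'food', 'football']
```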
>
> One option is to extend Otis' n-gram suggestion and turn a word like
> "baffoonery" into:
>
> (stored in "meta field")
> baffoonery
> affoonery
> ffoonery
> foonery
> oonery
> onery
> nery
> ery
> ry
>
> Now you can take a query like "*foo*", drop the leading wildcard, run
> the remaining "foo*" as a prefix query against the meta field, and it
> will hit on 'foonery'.
>
> Makes sense? You are trading index size for avoiding a grep-like
> linear scan. It's not advisable to do this for every word in your
> document set ;-)
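
A rough Python sketch of the suffix ("meta field") approach. A sorted list of `(suffix, word)` pairs stands in for the extra field, and the names `suffixes`, `build_suffix_index` and `infix_search` are hypothetical. `min_len=2` mirrors the example above, which stops at "ry", so a one-character query will miss words that contain that character only at the very end. Only a single leading `*`, optionally followed by a trailing `*`, is handled.

```python
import bisect


def suffixes(word, min_len=2):
    """Every suffix of word down to min_len characters ("baffoonery" ... "ry")."""
    return [word[i:] for i in range(len(word) - min_len + 1)]


def build_suffix_index(words, min_len=2):
    """Sorted (suffix, original_word) pairs: a stand-in for the extra meta field."""
    return sorted((s, w) for w in set(words) for s in suffixes(w, min_len))


def infix_search(index, query):
    """Answer "*foo*" (or "*foo") without testing every entry.

    Drop the leading wildcard, binary-search to the first suffix >= "foo",
    and read forward while suffixes still start with "foo". With no
    trailing "*", the suffix must equal "foo" exactly (word ends in "foo").
    """
    if not query.startswith("*"):
        raise ValueError("expected a leading wildcard, e.g. '*foo*'")
    core = query[1:]
    trailing = core.endswith("*")
    core = core.rstrip("*")
    i = bisect.bisect_left(index, (core,))
    hits = set()
    while i < len(index) and index[i][0].startswith(core):
        suffix, word = index[i]
        if trailing or suffix == core:
            hits.add(word)
        i += 1
    return sorted(hits)
```

For example, `infix_search(build_suffix_index(["baffoonery", "football", "grep"]), "*foo*")` finds both "baffoonery" (via the suffix "foonery") and "football". Each word adds roughly one entry per character, which is the index-size cost mentioned above.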
>
> - Neal Richter
>
> On Wed, Jan 28, 2009 at 12:19 AM, Jana, Kumar Raja <kj...@ptc.com> wrote:
>> Hi,
>>
>> Thanks Otis, Newton and everyone else for the help on this issue.
>>
>> Most of the data I index are documents such as PDFs, Word docs, OpenOffice
>> documents, etc. I store the content of each document in a field called
>> content, and the remaining metadata (name, id, created by, modified by,
>> created on, etc.) in a copy field called metadata. I am not particularly
>> interested in enabling leading wildcard queries on the content
>> (although that would be a bonus). For the metadata field, I've tried
>> implementing the suggestion to store reversed strings as well as the
>> original strings. All leading wildcard queries like "*abc" are searched
>> as "cba*" against the reversed metadata field. So far so good. Thank you :)
>>
>> But now I have run into the scenario where the query string is *abc* :( and
>> the whole thing came crashing down again. I cannot ignore such queries.
>> I would rather take the risk of Solr OOMing by enabling leading
>> wildcard searches.
>>
>> Can someone please tell me the steps to turn on this feature in the Lucene
>> QueryParser? I am sure it would help many people to document such a
>> procedure on the Wiki or somewhere else. (I am definitely going to do
>> that once I fix this; it has been far too much trouble.)
>> Also, which query parser does Solr use by default?
>>
>> Thanks,
>> Kumar
>>
>>
>>
>>
>> -----Original Message-----
>> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
>> Sent: Thursday, January 15, 2009 10:18 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Customizing Solr to handle Leading Wildcard queries
>>
>> Hi ramuK,
>>
>> I believe you can turn that "on" via the Lucene QueryParser, but of
>> course such searches will be slo(oo)w. You can also index reversed
>> tokens (e.g. *kumar --> rakum*), or you could index n-grams with
>> begin/end delimiter characters (e.g. kumar -> ^ k u m a r $,
>> *kumar -> "k u m a r $").
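
A small Python sketch of that n-gram idea, assuming one-character grams and assuming the analyzer keeps `^` and `$` as tokens of their own. The function names are hypothetical. A phrase query matches when the grams occur at consecutive positions, and including or leaving out the delimiters decides whether the match is anchored to the start or end of the word. That is what lets `*abc*` work without any leading wildcard.

```python
def char_grams(word):
    """Index side: "kumar" -> ["^", "k", "u", "m", "a", "r", "$"]."""
    return ["^", *word, "$"]


def query_to_phrase(query):
    """Query side: turn a query with wildcards only at its ends into a phrase.

    "*kumar" -> k u m a r $    (must end the word)
    "kum*"   -> ^ k u m        (must start the word)
    "*uma*"  -> u m a          (anywhere in the word)
    """
    phrase = list(query.strip("*"))
    if not query.startswith("*"):
        phrase.insert(0, "^")
    if not query.endswith("*"):
        phrase.append("$")
    return phrase


def phrase_match(tokens, phrase):
    """A phrase query matches when its tokens occur at consecutive positions."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))
```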
>>
>>
>> Otis
>> --
>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>
>>
>>
>> ----- Original Message ----
>>> From: "Jana, Kumar Raja" <kj...@ptc.com>
>>> To: solr-user@lucene.apache.org
>>> Sent: Thursday, January 15, 2009 9:49:24 AM
>>> Subject: RE: Customizing Solr to handle Leading Wildcard queries
>>>
>>> Hi Erik,
>>>
>>> Thanks for the quick reply.
>>> I want to enable leading wildcard query searches in general. The case
>>> mentioned in the earlier mail is just one of many instances where I use
>>> this feature.
>>>
>>> -Kumar
>>>
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Erik Hatcher [mailto:e...@ehatchersolutions.com]
>>> Sent: Thursday, January 15, 2009 7:59 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Customizing Solr to handle Leading Wildcard queries
>>>
>>>
>>> On Jan 15, 2009, at 8:23 AM, Jana, Kumar Raja wrote:
>>> > Not being able to perform leading wildcard queries is a major handicap.
>>> > I want to be able to perform searches like *.pdf to fetch all PDF
>>> > documents from Solr.
>>>
>>> For this particular case, I recommend indexing the document type as a
>>> separate field. Something like type:pdf (or use a MIME type string).
>>> Then you can do a very direct and fast query to search or facet by
>>> document types.
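
A minimal Python sketch of deriving such a field at index time, assuming the file name is available when the document is indexed. It uses the standard-library `mimetypes` module; the extension-based `type` value and the `mime_type` field name are assumptions for illustration. Once the type has its own field, "all PDFs" is an exact match and faceting is a simple tally, with no wildcard involved.

```python
import mimetypes
from collections import Counter


def type_fields(filename):
    """Fields to add to a document: a short type (file extension) plus a MIME type."""
    ext = filename.rsplit(".", 1)[1].lower() if "." in filename else ""
    mime, _encoding = mimetypes.guess_type(filename)
    return {"type": ext, "mime_type": mime or "application/octet-stream"}


def filter_by_type(docs, doc_type):
    """In-memory analogue of the query type:pdf, an exact term match."""
    return [d for d in docs if d["type"] == doc_type]


def facet_by_type(docs):
    """In-memory analogue of faceting on the type field: count docs per value."""
    return Counter(d["type"] for d in docs)
```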
>>>
>>>     Erik
>>
>>
>
