Re: Customizing Solr to handle Leading Wildcard queries

Otis Gospodnetic Wed, 28 Jan 2009 13:08:56 -0800

Yeah, I think the begin/end chars are very helpful here.  But I like the 
suggestion of figuring out which words really need to support leading 
wildcards...although that's usually impossible to predict, since people are 
free to enter whatever queries they feel like.
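
A minimal sketch of the begin/end character idea, in plain Python rather than
Solr analyzer configuration. It assumes each word is indexed as
single-character tokens wrapped in ^ and $, and that a phrase query amounts to
finding a contiguous run of those tokens. The function names are hypothetical,
and only wildcards at the very start or end of the pattern are handled.

```python
def delimited_chars(word):
    """Index-time form: 'kumar' -> ['^', 'k', 'u', 'm', 'a', 'r', '$']."""
    return ["^"] + list(word) + ["$"]


def wildcard_to_phrase(pattern):
    """Query-time form: '*kumar' -> ['k', 'u', 'm', 'a', 'r', '$'].

    A missing leading '*' anchors the phrase with '^'; a missing
    trailing '*' anchors it with '$'. Interior wildcards are not handled.
    """
    phrase = list(pattern.strip("*"))
    if not pattern.startswith("*"):
        phrase.insert(0, "^")
    if not pattern.endswith("*"):
        phrase.append("$")
    return phrase


def phrase_matches(tokens, phrase):
    """Stand-in for a phrase query: is `phrase` a contiguous run in `tokens`?"""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))
```

For example, phrase_matches(delimited_chars("ramkumar"),
wildcard_to_phrase("*kumar")) is true, while the same word does not match
"kumar*" because the ^ marker is not adjacent to the k.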



Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Neal Richter <[email protected]>
> To: [email protected]
> Sent: Wednesday, January 28, 2009 3:10:29 AM
> Subject: Re: Customizing Solr to handle Leading Wildcard queries
> 
> Oh wait.. it looks like Otis' suggestion of "index n-grams with begin/end
> delim characters", relying on phrase-searching to link the chains
> of characters, is logically a better version of what I described in my
> previous email.
> 
> - Neal
> 
> On Wed, Jan 28, 2009 at 1:04 AM, Neal Richter wrote:
> > leading wildcard search is called grep ;-)
> >
> > Ditto on the indexing reversed words suggestion.
> >
> > Can you create a second field in solr that contains /only/ the words
> > from the fields you care to reverse?  Once you do that you could
> > pre-process the query and look for leading wildcards and address those
> > (after reversing the query) only against your special
> > reverse-meta-data field.
> >
> > The *foo* case really is grep! Almost by definition, you have to
> > linearly scan the index unless some magic is added.
> >
> > Your options are to extend Otis' ngram suggestion and turn a word like
> > "baffoonery"
> > into:
> >
> > (stored in "meta field")
> > baffoonery
> > affoonery
> > ffoonery
> > foonery
> > oonery
> > onery
> > nery
> > ery
> > ry
> >
> > Now you can take a query like "*foo*", drop the leading wildcard so it
> > becomes the prefix query "foo*", and it will hit on 'foonery'.
> >
> > Make sense?  You are trading index size for not doing a linear scan
> > like grep.  It's not advisable to do this for every word in your
> > document set ;-)
> >
> > - Neal Richter
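
A rough sketch of the suffix expansion described above, in plain Python rather
than Solr. It assumes the suffixes stop at two characters (as in the
"baffoonery" list) and that a pattern like "*foo*" is answered by a prefix
lookup for "foo" over the sorted suffixes. The function names are hypothetical,
and the pattern is always treated as an infix search.

```python
import bisect


def suffixes(word, min_len=2):
    """'baffoonery' -> ['baffoonery', 'affoonery', ..., 'ery', 'ry']."""
    return [word[i:] for i in range(len(word) - min_len + 1)]


def build_suffix_index(words, min_len=2):
    """Sorted (suffix, original word) pairs; stands in for the meta field."""
    return sorted((s, w) for w in words for s in suffixes(w, min_len))


def infix_search(entries, pattern):
    """Strip the wildcards and run a prefix lookup over the sorted suffixes."""
    term = pattern.strip("*")
    keys = [s for s, _ in entries]
    start = bisect.bisect_left(keys, term)  # first suffix >= term
    hits = set()
    for suffix, word in entries[start:]:
        if not suffix.startswith(term):
            break  # sorted order: no later suffix can start with term
        hits.add(word)
    return hits
```

The index holds roughly one entry per character of every word, which is the
index-size cost mentioned above.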
> >
> > On Wed, Jan 28, 2009 at 12:19 AM, Jana, Kumar Raja wrote:
> >> Hi,
> >>
> >> Thanks Otis, Newton and everyone else for the help on this issue.
> >>
> >> Most of the data I index are documents like PDFs, Word docs, OpenOffice
> >> documents, etc. I store the content of the document in a field called
> >> content and the remaining metadata of the document like name, id,
> >> created by, modified by, created on, etc in a copy field called
> >> metadata. I am not particularly interested in enabling leading wildcard
> >> characters in the content (although such a possibility would be a
> >> bonus). For this, I've tried implementing the suggestion to store
> >> reverse strings as well as the correct strings for the metadata field.
> >> All leading wildcard queries like "*abc" are searched as "cba*" against
> >> the reversed metadata field. So far so good. Thank you :)
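
A small sketch of the query rewrite described above, in plain Python. The
reversed field name "metadata_rev" and the function names are hypothetical.
Only patterns with a leading wildcard and no trailing wildcard are redirected
to the reversed field; everything else, including "*abc*", is passed through
unchanged.

```python
def reverse_token(token):
    """Index-time form for the reversed field: 'abc' -> 'cba'."""
    return token[::-1]


def rewrite_query(pattern, field="metadata", reversed_field="metadata_rev"):
    """Turn '*abc' into a trailing-wildcard query on the reversed field."""
    if pattern.startswith("*") and not pattern.endswith("*"):
        # Drop the leading '*', reverse the rest, and append '*'.
        return f"{reversed_field}:{reverse_token(pattern[1:])}*"
    # Trailing-only wildcards work as-is; '*abc*' is left untouched
    # and still needs some other strategy.
    return f"{field}:{pattern}"
```

With these assumed names, rewrite_query("*abc") gives "metadata_rev:cba*".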
> >>
> >> But now, I ran into the scenario where the query string is *abc* :( and
> >> the whole thing came crashing down again. I cannot ignore such queries.
> >> I would rather take the risk of Solr OOMing by enabling the leading
> >> wildcard query searches.
> >>
> >> Can someone please tell me the steps to turn on this feature in Lucene
> >> QueryParser? I am sure it will be helpful to many to document such a
> >> procedure on the Wiki or somewhere else. (I am definitely going to do
> >> that once I fix this. Too much trouble this seems to be)
> >> Also, which queryParser does Solr use by default?
> >>
> >> Thanks,
> >> Kumar
> >>
> >>
> >>
> >>
> >> -----Original Message-----
> >> From: Otis Gospodnetic [mailto:[email protected]]
> >> Sent: Thursday, January 15, 2009 10:18 PM
> >> To: [email protected]
> >> Subject: Re: Customizing Solr to handle Leading Wildcard queries
> >>
> >> Hi ramuK,
> >>
> >> I believe you can turn that "on" via the Lucene QueryParser, but of
> >> course such searches will be slo(oo)w.  You can also index reversed
> >> tokens (e.g. *kumar --> rakum*) or you could index n-grams with
> >> begin/end delim characters (e.g. kumar -> ^ k u m a r $, *kumar -> "k u
> >> m a r $")
> >>
> >>
> >> Otis
> >> --
> >> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >>
> >>
> >>
> >> ----- Original Message ----
> >>> From: "Jana, Kumar Raja" 
> >>> To: [email protected]
> >>> Sent: Thursday, January 15, 2009 9:49:24 AM
> >>> Subject: RE: Customizing Solr to handle Leading Wildcard queries
> >>>
> >>> Hi Erik,
> >>>
> >>> Thanks for the quick reply.
> >>> I want to enable leading wildcard query searches in general. The case
> >>> mentioned in the earlier mail is just one of the many instances I use
> >>> this feature.
> >>>
> >>> -Kumar
> >>>
> >>>
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: Erik Hatcher [mailto:[email protected]]
> >>> Sent: Thursday, January 15, 2009 7:59 PM
> >>> To: [email protected]
> >>> Subject: Re: Customizing Solr to handle Leading Wildcard queries
> >>>
> >>>
> >>> On Jan 15, 2009, at 8:23 AM, Jana, Kumar Raja wrote:
> >>> > Not being able to perform Leading Wildcard queries is a major
> >>> > handicap.
> >>> > I want to be able to perform searches like *.pdf to fetch all pdf
> >>> > documents from Solr.
> >>>
> >>> For this particular case, I recommend indexing the document type as a
> >>> separate field.  Something like type:pdf (or use a MIME type string).
> >>> Then you can do a very direct and fast query to search or facet by
> >>> document types.
> >>>
> >>>     Erik
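
One way the separate type field could be filled before indexing, sketched in
plain Python. It assumes each document is a dict with a "name" field holding
the file name; the function name is hypothetical, and the type is taken from
the file extension rather than from a real MIME type lookup.

```python
import os


def add_type_field(doc):
    """Return a copy of `doc` with a lowercase 'type' field such as 'pdf'."""
    _, ext = os.path.splitext(doc["name"])
    enriched = dict(doc)
    enriched["type"] = ext.lstrip(".").lower()
    return enriched
```

A search for all PDF documents then becomes the exact-match query type:pdf
instead of the wildcard query *.pdf.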
> >>
> >>
> >
