Oh wait, it looks like Otis' suggestion to "index n-grams with begin/end delim characters", relying on phrase searching to link the chains of characters, is logically a better version of what I described in my previous email.
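A rough sketch of that delimiter idea in plain Python, just to show the logic. This is not Solr or Lucene code: the function names are made up for illustration, and a list comparison stands in for a real phrase query. It only handles a leading and/or trailing '*'.

```python
def char_tokens(word):
    """Index-time tokens: one token per character, wrapped in ^ and $ markers."""
    return ["^"] + list(word) + ["$"]


def wildcard_to_phrase(pattern):
    """Turn 'foo', '*foo', 'foo*' or '*foo*' into a token phrase.

    Only leading/trailing '*' are handled; wildcards in the middle are not.
    """
    phrase = list(pattern.strip("*"))
    if not pattern.startswith("*"):
        phrase.insert(0, "^")  # no leading wildcard: anchor at word start
    if not pattern.endswith("*"):
        phrase.append("$")     # no trailing wildcard: anchor at word end
    return phrase


def phrase_matches(tokens, phrase):
    """True if `phrase` occurs as a contiguous run inside `tokens`."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


if __name__ == "__main__":
    indexed = char_tokens("baffoonery")
    print(phrase_matches(indexed, wildcard_to_phrase("*foo*")))
    print(phrase_matches(indexed, wildcard_to_phrase("foo*")))
```

The trade-off is the same as before: every character becomes a token with a position, so the index grows, but the *foo* case turns into a phrase lookup instead of a scan.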
- Neal

On Wed, Jan 28, 2009 at 1:04 AM, Neal Richter <nrich...@gmail.com> wrote:
> leading wildcard search is called grep ;-)
>
> Ditto on the indexing reversed words suggestion.
>
> Can you create a second field in solr that contains /only/ the words
> from the fields you care to reverse? Once you do that you could
> pre-process the query and look for leading wildcards and address those
> (after reversing the query) only against your special
> reverse-meta-data field.
>
> The *foo* case really is grep! You nearly by definition have to
> linearly scan the index unless some magic is added.
>
> Your options are to extend Otis' ngram suggestion and turn a word like
> "baffoonery" into:
>
> (stored in "meta field")
> baffoonery
> affoonery
> ffoonery
> foonery
> oonery
> onery
> nery
> ery
> ry
>
> Now you can take a query like "*foo*" and drop the leading wildcard
> and it will hit on 'foonery'.
>
> Make sense? You are trading index size for not doing a linear scan
> like grep. It's not advisable to do this for every word in your
> document set ;-)
>
> - Neal Richter
>
> On Wed, Jan 28, 2009 at 12:19 AM, Jana, Kumar Raja <kj...@ptc.com> wrote:
>> Hi,
>>
>> Thanks Otis, Newton and everyone else for the help on this issue.
>>
>> Most of the data I index are documents like pdfs, word Docs, open office
>> documents, etc. I store the content of the document in a field called
>> content and the remaining metadata of the document like name, id,
>> created by, modified by, created on, etc in a copy field called
>> metadata. I am not particularly interested in enabling leading wildcard
>> characters in the content (although such a possibility would be a
>> bonus). For this, I've tried implementing the suggestion to store
>> reverse strings as well as the correct strings for the metadata field.
>> All leading wildcard queries like "*abc" are searched as "cba*" against
>> the reversed metadata field. So far so good. Thank you :)
>>
>> But now, I ran into the scenario where the query string is *abc* :( and
>> the whole thing came crashing down again. I cannot ignore such queries.
>> I would rather take the risk of Solr OOMing by enabling the leading
>> wildcard query searches.
>>
>> Can someone please tell me the steps to turn on this feature in Lucene
>> QueryParser? I am sure it will be helpful to many to document such a
>> procedure on the Wiki or somewhere else. (I am definitely going to do
>> that once I fix this. Too much trouble this seems to be)
>> Also, which queryParser does Solr use by default?
>>
>> Thanks,
>> Kumar
>>
>> -----Original Message-----
>> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
>> Sent: Thursday, January 15, 2009 10:18 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Customizing Solr to handle Leading Wildcard queries
>>
>> Hi ramuK,
>>
>> I believe you can turn that "on" via the Lucene QueryParser, but of
>> course such searches will be slo(oo)w. You can also index reversed
>> tokens (e.g. *kumar --> ramuk*) or you could index n-grams with
>> begin/end delim characters (e.g. kumar -> ^ k u m a r $,
>> *kumar -> "k u m a r $")
>>
>> Otis
>> --
>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>
>> ----- Original Message ----
>>> From: "Jana, Kumar Raja" <kj...@ptc.com>
>>> To: solr-user@lucene.apache.org
>>> Sent: Thursday, January 15, 2009 9:49:24 AM
>>> Subject: RE: Customizing Solr to handle Leading Wildcard queries
>>>
>>> Hi Erik,
>>>
>>> Thanks for the quick reply.
>>> I want to enable leading wildcard query searches in general. The case
>>> mentioned in the earlier mail is just one of the many instances I use
>>> this feature.
>>>
>>> -Kumar
>>>
>>> -----Original Message-----
>>> From: Erik Hatcher [mailto:e...@ehatchersolutions.com]
>>> Sent: Thursday, January 15, 2009 7:59 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Customizing Solr to handle Leading Wildcard queries
>>>
>>> On Jan 15, 2009, at 8:23 AM, Jana, Kumar Raja wrote:
>>> > Not being able to perform Leading Wildcard queries is a major
>>> > handicap.
>>> > I want to be able to perform searches like *.pdf to fetch all pdf
>>> > documents from Solr.
>>>
>>> For this particular case, I recommend indexing the document type as a
>>> separate field. Something like type:pdf (or use a MIME type string).
>>> Then you can do a very direct and fast query to search or facet by
>>> document types.
>>>
>>> Erik
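
For reference, the two index-side tricks quoted above (a reversed field for "*abc", suffix expansion for "*foo*") can be sketched in plain Python. This only illustrates the logic and is not Solr or Lucene code: the field names metadata_reversed and metadata_suffixes, the min_len cutoff, and both function names are assumptions made up for the example.

```python
def suffixes(word, min_len=2):
    """All suffixes of `word` down to `min_len` characters, longest first."""
    return [word[i:] for i in range(len(word) - min_len + 1)]


def rewrite_query(pattern):
    """Map a wildcard pattern to a (field, prefix_query) pair.

    Field names are hypothetical. Only leading/trailing '*' are handled.
    """
    core = pattern.strip("*")
    leading = pattern.startswith("*")
    trailing = pattern.endswith("*")
    if leading and trailing:
        # *foo*  ->  prefix query foo* against the suffix-expanded field
        return ("metadata_suffixes", core + "*")
    if leading:
        # *abc   ->  prefix query cba* against the reversed field
        return ("metadata_reversed", core[::-1] + "*")
    # abc* or abc: the normal field already handles these
    return ("metadata", pattern)


if __name__ == "__main__":
    for s in suffixes("baffoonery"):
        print(s)
    print(rewrite_query("*abc"))
    print(rewrite_query("*foo*"))
```

Both rewrites end up as ordinary prefix queries, which is the point: the cost moves from query time (a linear scan) to index size.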