Re: Searching "inside of words"

Otis Gospodnetic Mon, 19 May 2008 08:39:38 -0700

You are doing the right thing.  If you are creating n-grams at index time, you 
have to match that at query time.  If the query is "monitor", you need to pass 
that through the n-gram tokenizer, too.  n-grams of length 18 look a little 
weird, though.
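
To make the matching concrete, here is a rough sketch in plain Python. It is
only a model of the idea, not Lucene or Solr code. The function names (ngrams,
matches_keyword, matches_ngram_and) are made up for illustration, and the gram
sizes 2 to 18 are taken from the description quoted below. It compares two
ways of handling the query: leaving it as a single token, as Hoss suggests in
the quoted message, and n-gramming it too, then requiring every gram to be
present (roughly what q.op=AND would ask for).

```python
def ngrams(text, min_gram, max_gram):
    """Return the set of character n-grams of text, min_gram <= n <= max_gram."""
    grams = set()
    for size in range(min_gram, max_gram + 1):
        # No grams are produced once size exceeds len(text).
        for start in range(len(text) - size + 1):
            grams.add(text[start:start + size])
    return grams


# Index time: store every 2- to 18-character gram of the field value.
indexed = ngrams("monitor", 2, 18)


def matches_keyword(query, index_terms):
    # Query kept as one token (keyword-style analysis at query time).
    return query in index_terms


def matches_ngram_and(query, index_terms, min_gram=2, max_gram=18):
    # Query is n-grammed as well; every gram must exist in the index (AND).
    query_grams = ngrams(query, min_gram, max_gram)
    return bool(query_grams) and query_grams <= index_terms


for q in ("onit", "rint", "monitor"):
    print(q, matches_keyword(q, indexed), matches_ngram_and(q, indexed))
```

In this model "onit" and "monitor" match under both strategies and "rint"
matches under neither. How real Solr combines the grams of a single query term
depends on the query parser, so treat this as an illustration only.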



Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


----- Original Message ----
> From: Daniel Löfquist <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Monday, May 19, 2008 7:14:52 AM
> Subject: Re: Searching "inside of words"
> 
> Thank you for your reply.
> I've been trying some things out this morning but I'm still not getting 
> it to work properly. I have a feeling that I'm on the right track 
> somewhat though.
> 
> The type in my schema.xml looks like this:
> 
> 
>     
>         
>         
>     
> 
>     
>         
>         
>     
> 
> 
> If I'm understanding everything correctly this should create tokens with 
> the size of 2 to 18 letters at the time of indexing, right?
> 
> However, I can't search properly now. I have to slice my search-string 
> up into 2-letter chunks. So if I'm searching for "monitor" I have to 
> send "mo+ni+to+r" to Solr. Like this:
> http://localhost:8080/solrtest/select/?q=mo+ni+to+r&q.op=AND
> when I want it to be like this:
> http://localhost:8080/solrtest/select/?q=monitor&q.op=AND
> 
> I'm sure I'm doing something completely wrong. I just need someone wiser 
> in the ways of Lucene and Solr to point directly at what it is 
> that's wrong ;-)
> 
> //Daniel
> 
> Chris Hostetter wrote:
> > : so the only ones I can utilize are EdgeNGramTokenizerFactory and
> > : NGramTokenizerFactory.
> > : 
> > : I've done some playing around with them but the best result I've gotten
> > : so far is a field-type that enables searching for specific letters, for
> > : example I can search for an item that contains the letters a and x, but
> > : it returns a hit no matter where these letters are in the text, they
> > : don't have to be next to each other, and that's not the result I was
> > : going for. If the field contains "monitor" I want a hit on a search for
> > : "onit" but not on "rint" for example.
> > 
> > NGramTokenizerFactory should work fine for this ... the key is to use it 
> > at indexing time with the appropriate min and max gram sizes to meet your 
> > needs -- at query time, don't use it at all (use keyword or 
> > whitespace tokenizer)
> > 
> > so the word "monitor" will be indexed as these tokens (but not 
> > necessarily in this order)...
> > 
> >   m o n i t o r mo on ni it to or mon oni nit ... onit ...
> > 
> > and at search time when the user gives you "onit" that term will exist.
> > 
> > : I've never attempted to construct a new field-type of my own before and
> > : I'm finding the available documentation somewhat incomplete and not very
> > : helpful
> > 
> > FWIW: creating a new FieldType is almost never what you need if you 
> > are dealing with text .. creating new FieldTypes is something that 
> > typically only needs to be done in cases where you want specialized 
> > encoding or sorting.
> > 
> > -Hoss
> > 
> 
> -- 
> Daniel Löfquist
> Application Manager / Software Engineer
> 
> CDON.COM
> Bergsgatan 20, Box 385, SE 201 23 Malmö, Sweden
> 
> Office: +46 40 601 61 00
> Direct: +46 40 601 61 16
> Mobile: +46 702 92 21 75
> Fax: +46 40 601 61 20
> E-mail: [EMAIL PROTECTED] 
> 
