You are doing the right thing. If you are creating n-grams at index time, you have to match that at query time: if the query is "monitor", it needs to pass through the n-gram tokenizer, too. A maximum n-gram length of 18 looks a little unusual, though.
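To make the query-side point concrete, here is a toy model in plain Python. It is not Lucene's tokenizer or Solr's query parser; `char_ngrams` and `matches_all` are hypothetical helper names, and the gram sizes 2 to 18 are taken from the quoted message below. It only illustrates why running the query through the same n-gram step removes the need to split "monitor" by hand.

```python
def char_ngrams(text, min_gram, max_gram):
    """Return every character n-gram of text with min_gram <= length <= max_gram."""
    grams = []
    for size in range(min_gram, min(max_gram, len(text)) + 1):
        for start in range(len(text) - size + 1):
            grams.append(text[start:start + size])
    return grams


MIN_GRAM, MAX_GRAM = 2, 18  # sizes mentioned in the thread

# Index side: the field value "monitor" becomes a set of gram tokens.
indexed = set(char_ngrams("monitor", MIN_GRAM, MAX_GRAM))


def matches_all(query):
    """Model q.op=AND: every gram produced from the query must be indexed."""
    query_grams = char_ngrams(query, MIN_GRAM, MAX_GRAM)
    return bool(query_grams) and all(g in indexed for g in query_grams)


print(matches_all("monitor"))
print(matches_all("onit"))
print(matches_all("rint"))
```

In this model the first two queries match and "rint" does not, because grams such as "ri" never occur in "monitor".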
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Daniel Löfquist <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Monday, May 19, 2008 7:14:52 AM
> Subject: Re: Searching "inside of words"
>
> Thank you for your reply.
> I've been trying some things out this morning but I'm still not getting
> it to work properly. I have a feeling that I'm on the right track
> somewhat though.
>
> The type in my schema.xml looks like this:
>
> [field type XML not preserved in the list archive]
>
> If I'm understanding everything correctly this should create tokens with
> the size of 2 to 18 letters at the time of indexing, right?
>
> However, I can't search properly now. I have to slice my search-string
> up into 2-letter chunks. So if I'm searching for "monitor" I have to
> send "mo+ni+to+r" to Solr. Like this:
> http://localhost:8080/solrtest/select/?q=mo+ni+to+r&q.op=AND
> when I want it to be like this:
> http://localhost:8080/solrtest/select/?q=monitor&q.op=AND
>
> I'm sure I'm doing something completely wrong. I just need someone more
> wise to the ways of Lucene and Solr to point directly at what it is
> that's wrong ;-)
>
> //Daniel
>
> Chris Hostetter wrote:
> > : so the only ones I can utilize are EdgeNGramTokenizerFactory and
> > : NGramTokenizerFactory.
> > :
> > : I've done some playing around with them but the best result I've gotten so far
> > : is a field-type that enables searching for specific letters, for example I can
> > : search for an item that contains the letters a and x, but it returns a hit no
> > : matter where these letters are in the text, they don't have to be next to each
> > : other, and that's not the result I was going for. If the field contains
> > : "monitor" I want a hit on a search for "onit" but not on "rint" for example.
> >
> > NGramTokenizerFactory should work fine for this ... the key is to use it
> > at indexing time with the appropriate min and max gram sizes to meet your
> > needs -- at query time, don't use it at all (use keyword or
> > whitespace tokenizer)
> >
> > so the word "monitor" will be indexed as these tokens (but not
> > necessarily in this order)...
> >
> >    m o n i t o r mo on ni it to or mon oni nit ... onit ...
> >
> > and at search time when the user gives you "onit" that term will exist.
> >
> > : I've never attempted to construct a new field-type of my own before and I'm
> > : finding the available documentation somewhat incomplete and not very helpful
> >
> > FWIW: creating a new FieldType is almost never what you need if you
> > are dealing with text .. creating new FieldTypes is something that
> > typically only needs done in cases where you want specialized encoding or
> > sorting.
> >
> > -Hoss
> >
>
> --
> Daniel Löfquist
> Application Manager / Software Engineer
> CDON.COM
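Hoss's suggestion in the quoted reply (n-grams at index time only, the whole query term at search time) can be sketched the same way. This is a simplified model, not Solr code: `index_tokens` is a hypothetical helper, and the gram sizes are illustrative values rather than anything from the original schema. It also shows why the min and max gram sizes have to fit the queries you expect.

```python
def index_tokens(word, min_gram, max_gram):
    """Model index-time n-gramming: all substrings with length in [min_gram, max_gram]."""
    return {
        word[start:start + size]
        for size in range(min_gram, max_gram + 1)
        for start in range(len(word) - size + 1)
    }


# Grams of length 1 to 7 cover every substring of the 7-letter word "monitor".
full = index_tokens("monitor", 1, 7)
print("onit" in full)   # the unanalyzed query term exists in the index
print("rint" in full)   # no such token was ever produced

# With a smaller (assumed) maximum of 4, longer query terms can no longer match.
short = index_tokens("monitor", 1, 4)
print("onit" in short)
print("monit" in short)
```

Because the query is looked up as a single term, the letters must be adjacent and in order, which is the behaviour asked for in the original question. The trade-off is that a query longer than the maximum gram size finds nothing.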