Re: Searching "inside of words"

Chris Hostetter Sat, 17 May 2008 10:09:08 -0700

: so the only ones I can utilize are EdgeNGramTokenizerFactory and
: NGramTokenizerFactory.
: 
: I've done some playing around with them but the best result I've gotten so far
: is a field-type that enables searching for specific letters, for example I can
: search for an item that contains the letters a and x, but it returns a hit no
: matter where these letters are in the text, they don't have to be next to each
: other, and that's not the result I was going for. If the field contains
: "monitor" I want a hit on a search for "onit" but not on "rint" for example.


NGramTokenizerFactory should work fine for this ... the key is to use it 
at indexing time only, with min and max gram sizes that meet your 
needs -- at query time, don't use it at all (use the keyword or 
whitespace tokenizer instead)

so the word "monitor" will be indexed as these tokens (but not 
necessarily in this order)...

  m o n i t o r mo on ni it to or mon oni nit ... onit ...

and at search time when the user gives you "onit" that exact term will 
already exist in the index, so it matches.
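
To make the index-time vs. query-time split concrete, here is a rough 
sketch in plain Python. It is not Solr code and not how Lucene implements 
the tokenizer; the function names (ngrams, matches) and the gram sizes 
(min 1, max 4) are assumptions picked for illustration only.

```python
def ngrams(text, min_gram, max_gram):
    """Return every character n-gram of text whose length is in [min_gram, max_gram]."""
    grams = set()
    for size in range(min_gram, max_gram + 1):
        # Slide a window of this size across the text.
        for start in range(len(text) - size + 1):
            grams.add(text[start:start + size])
    return grams


# "Index time": the field value is broken into n-grams.
# The sizes 1 and 4 are assumed values, not a recommendation.
indexed_terms = ngrams("monitor", 1, 4)


def matches(query, terms):
    """'Query time': the query stays one whole term (keyword-style lookup)."""
    return query in terms


print(matches("onit", indexed_terms))  # True: "onit" is a 4-gram of "monitor"
print(matches("rint", indexed_terms))  # False: those letters are never adjacent
```

In this sketch a query longer than the max gram size (e.g. "monit" with a 
max of 4) finds nothing, because no indexed term is that long, so the max 
gram size should cover the longest substring you expect users to type.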

: I've never attempted to construct a new field-type of my own before and I'm
: finding the available documentation somewhat incomplete and not very helpful

FWIW: creating a new FieldType is almost never what you need if you 
are dealing with text .. creating new FieldTypes is something that 
typically only needs to be done in cases where you want specialized 
encoding or sorting.

-Hoss
