Thank you for your reply.
I've been trying some things out this morning but I'm still not getting
it to work properly. I have a feeling that I'm on the right track
somewhat though.
The type in my schema.xml looks like this:
<fieldtype name="searchfield" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.NGramTokenizerFactory" minGram="2" maxGram="18"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>
If I understand everything correctly, this should create tokens of
2 to 18 characters at indexing time, right?
However, I can't search properly now. I have to slice my search-string
up into 2-letter chunks. So if I'm searching for "monitor" I have to
send "mo+ni+to+r" to Solr. Like this:
http://localhost:8080/solrtest/select/?q=mo+ni+to+r&q.op=AND
when I want it to be like this:
http://localhost:8080/solrtest/select/?q=monitor&q.op=AND
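One thing worth keeping in mind about the two URLs above: in a URL query string a "+" decodes to a space, so the first request carries four separate terms rather than one. A small sketch using Python's standard urllib.parse (the host and path are just the ones from the URLs above, nothing is actually sent to Solr) shows what each request contains:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

chunked = "http://localhost:8080/solrtest/select/?q=mo+ni+to+r&q.op=AND"
wanted = "http://localhost:8080/solrtest/select/?q=monitor&q.op=AND"


def query_params(url):
    """Return the decoded query parameters of a URL as a dict of lists."""
    return parse_qs(urlsplit(url).query)


chunked_params = query_params(chunked)
wanted_params = query_params(wanted)

# "+" is decoded as a space, so q holds four whitespace-separated terms
print(chunked_params["q"][0].split())  # ['mo', 'ni', 'to', 'r']
print(wanted_params["q"][0].split())   # ['monitor']

# Building the desired query string from plain values
print(urlencode({"q": "monitor", "q.op": "AND"}))  # q=monitor&q.op=AND
```

So with q.op=AND the first request asks for documents containing all of "mo", "ni", "to" and "r", while the second asks for the single term "monitor".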
I'm sure I'm doing something completely wrong. I just need someone
wiser in the ways of Lucene and Solr to point directly at what it is
that's wrong ;-)
//Daniel
Chris Hostetter wrote:
: so the only ones I can utilize are EdgeNGramTokenizerFactory and
: NGramTokenizerFactory.
:
: I've done some playing around with them but the best result I've gotten so far
: is a field-type that enables searching for specific letters, for example I can
: search for an item that contains the letters a and x, but it returns a hit no
: matter where these letters are in the text, they don't have to be next to each
: other, and that's not the result I was going for. If the field contains
: "monitor" I want a hit on a search for "onit" but not on "rint" for example.
NGramTokenizerFactory should work fine for this ... the key is to use it
at indexing time with the appropriate min and max gram sizes to meet your
needs -- at query time, don't use it at all (use keyword or
whitespace tokenizer)
so the word "monitor" will be indexed as these tokens (but not
necessarily in this order)...
m o n i t o r mo on ni it to or mon oni nit ... onit ...
and at search time when the user gives you "onit" that term will exist.
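A rough sketch of that idea in Python, as a simplified model only and not Lucene's actual tokenizer code (the function names ngrams and matches are made up for illustration): build every substring between a minimum and maximum length at index time, then look up the lowercased, whitespace-split query terms unchanged.

```python
def ngrams(text, min_gram=1, max_gram=18):
    """Return the set of all character n-grams of text whose length is
    between min_gram and max_gram (lowercased, order ignored)."""
    text = text.lower()
    grams = set()
    for size in range(min_gram, min(max_gram, len(text)) + 1):
        for start in range(len(text) - size + 1):
            grams.add(text[start:start + size])
    return grams


def matches(indexed_text, query, min_gram=1, max_gram=18):
    """Hypothetical AND search: every whitespace-separated, lowercased
    query term must be one of the indexed n-grams."""
    index_terms = ngrams(indexed_text, min_gram, max_gram)
    query_terms = query.lower().split()
    return bool(query_terms) and all(t in index_terms for t in query_terms)


print(matches("monitor", "onit"))           # True: "onit" is a 4-gram
print(matches("monitor", "rint"))           # False: letters are not adjacent
print(matches("monitor", "m", min_gram=2))  # False: 1-grams are not indexed
```

In this model, a minimum gram size of 2 means single-character query terms can never match, since no 1-character tokens are produced at index time.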
: I've never attempted to construct a new field-type of my own before and I'm
: finding the available documentation somewhat incomplete and not very helpful
FWIW: creating a new FieldType is almost never what you need if you
are dealing with text ... creating new FieldTypes is something that
typically only needs to be done in cases where you want specialized
encoding or sorting.
-Hoss
--
Daniel Löfquist