Thank you for your reply.
I've been trying some things out this morning, but I still can't get it to work properly. I have a feeling that I'm somewhat on the right track, though.

The type in my schema.xml looks like this:

<fieldtype name="searchfield" class="solr.TextField">
        <analyzer type="index">
                <tokenizer class="solr.NGramTokenizerFactory" minGram="2" maxGram="18"/>
                <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>

        <analyzer type="query">
                <tokenizer class="solr.WhitespaceTokenizerFactory"/>
                <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
</fieldtype>

If I understand everything correctly, this should create tokens of 2 to 18 characters at indexing time, right?

However, I can't search properly now. I have to slice my search-string up into 2-letter chunks. So if I'm searching for "monitor" I have to send "mo+ni+to+r" to Solr. Like this:
http://localhost:8080/solrtest/select/?q=mo+ni+to+r&q.op=AND
when I want it to be like this:
http://localhost:8080/solrtest/select/?q=monitor&q.op=AND
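
To make the symptom concrete, here is a rough Python sketch of what the two requests amount to. It is not Solr code: `ngrams`, `query_terms` and `matches_all` are hypothetical helpers, the query side is modelled as a plain lowercase whitespace split, and the idea that the index might only hold grams of length 1 and 2 is purely an assumption used to reproduce the behaviour, not something confirmed in this thread.

```python
from urllib.parse import urlsplit, parse_qs


def ngrams(text, min_gram, max_gram):
    """Return every lowercase substring of text with length min_gram..max_gram."""
    text = text.lower()
    return {
        text[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(text) - n + 1)
    }


def query_terms(url):
    """Decode the q parameter ('+' becomes a space) and split it on whitespace."""
    params = parse_qs(urlsplit(url).query)
    return params["q"][0].lower().split()


def matches_all(url, indexed_tokens):
    """Model q.op=AND: every query term must be present among the indexed tokens."""
    return all(term in indexed_tokens for term in query_terms(url))


chunked = "http://localhost:8080/solrtest/select/?q=mo+ni+to+r&q.op=AND"
whole = "http://localhost:8080/solrtest/select/?q=monitor&q.op=AND"

# Assumption: the index only contains 1- and 2-character grams.
short_grams = ngrams("monitor", 1, 2)
# What the schema above is intended to produce: grams of 2 to 18 characters.
intended_grams = ngrams("monitor", 2, 18)

print(query_terms(chunked))                # the four terms actually sent
print(matches_all(chunked, short_grams))   # chunked query against short grams
print(matches_all(whole, short_grams))     # whole word against short grams
print(matches_all(whole, intended_grams))  # whole word against intended grams
```

If the index really contained grams of 2 to 18 characters, the single term "monitor" should match and the lone "r" should not, which is the opposite of what I'm seeing.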

I'm sure I'm doing something completely wrong. I just need someone wiser in the ways of Lucene and Solr to point directly at what it is that's wrong ;-)

//Daniel

Chris Hostetter wrote:
: so the only ones I can utilize are EdgeNGramTokenizerFactory and
: NGramTokenizerFactory.
:
: I've done some playing around with them but the best result I've gotten so far
: is a field-type that enables searching for specific letters, for example I can
: search for an item that contains the letters a and x, but it returns a hit no
: matter where these letters are in the text, they don't have to be next to each
: other, and that's not the result I was going for. If the field contains
: "monitor" I want a hit on a search for "onit" but not on "rint" for example.

NGramTokenizerFactory should work fine for this ... the key is to use it at indexing time with the appropriate min and max gram sizes to meet your needs -- at query time, don't use it at all (use keyword or whitespace tokenizer)

so the word "monitor" will be indexed as these tokens (but not necessarily in this order)...

  m o n i t o r mo on ni it to or mon oni nit ... onit ...

and at search time when the user gives you "onit" that term will exist.
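
The idea above can be sketched in a few lines of Python. This is only an illustration of n-gram indexing, not the Lucene implementation: `ngrams` and `matches` are hypothetical helpers, and the minimum gram size of 1 is assumed here only because the token list above includes single letters.

```python
def ngrams(text, min_gram, max_gram):
    """Return every lowercase substring of text with length min_gram..max_gram."""
    text = text.lower()
    return {
        text[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(text) - n + 1)
    }


def matches(query, indexed_tokens):
    """Query side: the whole (lowercased) query is looked up as one term."""
    return query.lower() in indexed_tokens


# Index time: break the field value into grams (sizes are assumptions).
indexed = ngrams("monitor", 1, 18)

print(sorted(indexed, key=lambda t: (len(t), t)))
print(matches("onit", indexed))  # contiguous substring of "monitor"
print(matches("rint", indexed))  # same letters, but not contiguous
```

Because only contiguous substrings are ever emitted as tokens, "onit" is found while "rint" is not, even though all of its letters occur in "monitor".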

: I've never attempted to construct a new field-type of my own before and I'm
: finding the available documentation somewhat incomplete and not very helpful

FWIW: creating a new FieldType is almost never what you need if you are dealing with text ... creating new FieldTypes is something that typically only needs to be done in cases where you want specialized encoding or sorting.

-Hoss


--
Daniel Löfquist