On 5/31/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
You say the "tokenized" attribute is not settable from the schema, but the
output from IndexSchema.readConfig shows that the properties are indeed
read, and the resulting SchemaField object retains these properties: are
they then ignored?

Not sure off the top of my head, but don't use it... it shouldn't be
documented anywhere.
It probably slipped through as part of generic options parsing.

> "untokenized" means don't use the analyzer.   If you don't want an
> analyzer, then use the "string" type.
>
This is true only in the simplest of cases. An analyzer can do far more
than tokenize: it can stem, change to lower case, etc. What if you want
one or more of these things to happen, but you don't want tokenization?

From a Lucene perspective, if you create an untokenized field, the
analyzer will not be used at all.  It probably should have been named
"unanalyzed", as that's more accurate.

KeywordTokenizer (via KeywordTokenizerFactory) is probably what you
are looking for.
Create a new text field type with that as the tokenizer, followed by
whatever filters you want (like lowercasing).
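
As a rough sketch of what that could look like in schema.xml (the type
name "string_lowercase" and the field name "title_exact" are made up for
illustration; the tokenizer and filter factories are the standard Solr
classes mentioned above):

```xml
<!-- Hypothetical field type: KeywordTokenizerFactory emits the entire
     field value as a single token, so nothing is split on whitespace
     or punctuation, but the filters that follow still run. -->
<fieldType name="string_lowercase" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- Lowercases the single token; add other filters here as needed. -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field using the type above. -->
<field name="title_exact" type="string_lowercase" indexed="true" stored="true"/>
```

With a type like this, a value such as "New York" should be indexed as the
single term "new york" rather than as two separate terms.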

-Yonik
