On 5/31/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
You say the "tokenized" attribute is not settable from the schema, but the output from IndexSchema.readConfig shows that the properties are indeed read, and the resulting SchemaField object retains these properties: are they then ignored?
Not sure off the top of my head, but don't use it... it shouldn't be documented anywhere. It probably slipped through as part of generic options parsing.
> "untokenized" means don't use the analyzer. If you don't want an
> analyzer, then use the "string" type.

This is true only in the simplest of cases. An analyzer can do far more than tokenize: it can stem, change to lower case, etc. What if you want one or more of these things to happen, but you don't want tokenization?
From a Lucene perspective, if you create an untokenized field, the analyzer will not be used at all. It should have probably been named unanalyzed, as that's more accurate.

KeywordTokenizer (via KeywordTokenizerFactory) is probably what you are looking for. Create a new text field type with that as the tokenizer, followed by whatever filters you want (like lowercasing).

-Yonik
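
For illustration, the suggestion above might look roughly like this in schema.xml. This is only a sketch: the type name "lowercase_keyword" and the field name "title_exact" are made up for the example, and the exact element casing and supported attributes may differ between Solr versions.

```xml
<!-- Hypothetical field type: the whole value is kept as one token, then lowercased -->
<fieldType name="lowercase_keyword" class="solr.TextField">
  <analyzer>
    <!-- KeywordTokenizer emits the entire field value as a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- Filters still run on that single token; add others here as needed -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field using the type above -->
<field name="title_exact" type="lowercase_keyword" indexed="true" stored="true"/>
```

With a type like this, a value such as "New York" would be indexed as the single term "new york" rather than being split on whitespace, so the filters apply without any tokenization taking place.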