: > it also starts to get into the realm of "arbitrary processing of values
: > prior to storing/indexing" ... which could be useful in other ways (ie:
: > parsing alternate date formats) and for other field types (ie: limit
: > numeric fields to a certain range, round float input to an int, etc...)
: > which is something i've been hoping to work on for a while now ... let any
: > FieldType have an analyzer, and add/abuse a new <analyzer
: > type="preprocess"> for modifying the values before they are stored or
: > analyzed by the "index" analyzer.
: >
:
: We can use UpdateRequestProcessor for arbitrary processing before storing. I
: believe that is one of the use-cases for that API.
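To make the quoted "preprocess" idea a little more concrete, here is a minimal plain-Java sketch of a chain of value-rewriting steps run before a value is stored or handed to the "index" analyzer. It is not Solr code: `PreprocessSketch`, `ROUND_TO_INT`, `clamp` and `preprocess` are hypothetical names, and a real version would presumably be configured declaratively (like analyzer factories) rather than hard-coded.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch only: none of these names are Solr APIs.
// Each step rewrites a raw field value before it is stored or analyzed.
public class PreprocessSketch {

    // Round a float input to the nearest integer, e.g. "3.7" -> "4".
    static final UnaryOperator<String> ROUND_TO_INT =
        v -> Long.toString(Math.round(Double.parseDouble(v)));

    // Limit an integer value to the range [min, max].
    static UnaryOperator<String> clamp(long min, long max) {
        return v -> Long.toString(Math.max(min, Math.min(max, Long.parseLong(v))));
    }

    // Apply the steps in order, much like a chain of token filters.
    static String preprocess(String raw, List<UnaryOperator<String>> steps) {
        String value = raw;
        for (UnaryOperator<String> step : steps) {
            value = step.apply(value);
        }
        return value;
    }

    public static void main(String[] args) {
        List<UnaryOperator<String>> steps = List.of(ROUND_TO_INT, clamp(0, 100));
        System.out.println(preprocess("3.7", steps));   // 4
        System.out.println(preprocess("250.2", steps)); // 100
        System.out.println(preprocess("-5.5", steps));  // 0
    }
}
```

Ordering matters here: rounding runs first so that the clamp step always receives an integer string.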
Some of it can be done using an UpdateRequestProcessor, assuming a Processor was created that exposed Analysis Factory-like configuration to end users (w/o needing to write Java), but that only happens when adding docs. It wouldn't let you automatically round dates submitted at query time to the granularity you know you are using, and it also wouldn't let you convert "yes" to "true" for a boolean field, etc...

There's also a subtle but important distinction between the nature of the data in the index, which should be expressed in the schema.xml via fields and fieldTypes, and how the person who is responsible for this Solr installation wants the data to be used, which should be expressed in the solrconfig.xml.

You could imagine having two pieces of code that achieve very similar things (e.g. rounding dates): one configured in the schema.xml as a fieldType attribute, and one in the solrconfig.xml as an update processor option for index time and/or a query component option.

The schema.xml fieldType option would be a way to say "in this schema about books, fields of this type must never contain anything more granular than days" (or minutes, or hours, or what have you). It doesn't matter who uses that index, how they get to it, whether they are updating or querying the index, or whether it's a master index or a slave index: that field type is not a "date" field type, it is now a "day" field type.

The solrconfig.xml options, however, would be a way to say "in this *instance* of an index using a schema about books, I don't want to deal with dates more granular than a day." This option might be different between two different instances (e.g. an instance containing data for publish-on-demand books vs. an instance containing data about medieval books) or even between two copies of the same index (e.g. on the master you can index any granularity, on slaveA internal users can query with any granularity, and on slaveB the general public can only query with day granularity to improve caching).

Both approaches have their uses.

-Hoss
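A rough illustration of why a fieldType-level rule behaves differently from an index-time-only processor: if the field type itself owns the normalization, the same rounding runs on values being indexed and on values arriving in queries. This is a hypothetical plain-Java sketch, not Solr's `FieldType` API; `GranularitySketch`, `FieldTypeSketch`, `toIndexed`, `toQueryTerm`, `DAY` and `BOOL` are illustrative names only, and it assumes dates arrive as ISO-8601 UTC strings.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch only: these types are NOT Solr's FieldType API.
public class GranularitySketch {

    // A field type owns one normalization rule; both the indexing path and the
    // query-parsing path call it, so the rule holds no matter who uses the index.
    interface FieldTypeSketch {
        String normalize(String external);

        default String toIndexed(String external) { return normalize(external); }

        default String toQueryTerm(String external) { return normalize(external); }
    }

    // A "day" type: ISO-8601 UTC instants are truncated to midnight UTC.
    static final FieldTypeSketch DAY =
        v -> Instant.parse(v).truncatedTo(ChronoUnit.DAYS).toString();

    // A boolean type that also accepts "yes"/"no" (case-insensitive).
    static final Map<String, String> BOOL_SYNONYMS = Map.of("yes", "true", "no", "false");
    static final FieldTypeSketch BOOL = v -> {
        String lower = v.toLowerCase(Locale.ROOT);
        return BOOL_SYNONYMS.getOrDefault(lower, lower);
    };

    public static void main(String[] args) {
        String indexed = DAY.toIndexed("2007-03-14T15:09:26Z");
        String queried = DAY.toQueryTerm("2007-03-14T08:00:00Z");
        System.out.println(indexed);                 // 2007-03-14T00:00:00Z
        System.out.println(indexed.equals(queried)); // true
        System.out.println(BOOL.toQueryTerm("Yes")); // true

        // Rounding only at index time (like an update processor) leaves the raw
        // query value untouched, so it no longer matches the stored term.
        System.out.println(indexed.equals("2007-03-14T08:00:00Z")); // false
    }
}
```

The last comparison in `main` shows the gap with index-time-only rounding: the stored term is rounded, but an unrounded query value never matches it.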