: > it also starts to get into the realm of "arbitrary processing of values
: > prior to storing/indexing ... which could be useful in other ways (ie:
: > parsing alternate date formats) and for other field types (ie: limit
: > numeric fields to a certain range, round float input to an int, etc...)
: > which is something i've been hoping to work on for a while now ... let any
: > FieldType have an analyzer, and add/abuse a new <analyzer
: > type="preprocess"> for modifying the values before they are stored or
: > analyzed by the "index" analyzer.
: >
: 
: We can use UpdateRequestProcessor for arbitrary processing before storing. I
: believe that is one of the use-cases for that API.

Some of it could be done using an UpdateRequestProcessor, assuming a 
Processor was created that exposed Analysis Factory like configuration to 
end users (w/o needing to write java) but that only happens when adding 
docs -- it wouldn't let you automatically round dates submitted at query 
time to the granularity you know you are using -- it also wouldn't let you 
convert "yes" to "true" for a boolean field, etc...
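
To make that concrete, here's a rough sketch in plain Java of the kind of 
value cleanup i mean.  It deliberately uses no Solr classes: the class and 
method names are made up for illustration, and it assumes dates arrive as 
ISO-8601 UTC strings.  The point is only that the same function would need 
to be reachable from both the update path and the query path.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Locale;

// Hypothetical sketch, not a Solr API: one place for value cleanup that
// could be called both when adding docs and when parsing query input.
final class ValuePreprocessSketch {

    // Map loose boolean input ("yes", "no", ...) to "true"/"false".
    static String normalizeBoolean(String raw) {
        String v = raw.trim().toLowerCase(Locale.ROOT);
        switch (v) {
            case "yes": case "y": case "true":
                return "true";
            case "no": case "n": case "false":
                return "false";
            default:
                throw new IllegalArgumentException("not a boolean: " + raw);
        }
    }

    // Round an ISO-8601 UTC timestamp down to the start of its day.
    static String roundToDay(String isoInstant) {
        return Instant.parse(isoInstant).truncatedTo(ChronoUnit.DAYS).toString();
    }
}
```

An UpdateRequestProcessor could call something like this when docs are 
added, but nothing on the query side would call it unless it were wired in 
there separately.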

There's also a subtle but important distinction between the nature of 
the data in the index, which should be expressed in the schema.xml 
via fields and fieldtypes; and how the person who is responsible for 
this solr installation wants the data to be used, which should be 
expressed in the solrconfig.xml.

you could imagine having two pieces of code that achieve very similar 
things (ie: rounding dates) -- one of which could be configured in the 
schema.xml as a fieldType attribute, and one in the solrconfig.xml as an 
update processor option for index time, and/or a query component option.

the schema.xml fieldType option would be a way to say "in this schema 
about books, fields of this type must never contain anything more granular 
than days." (or minutes, or hours, or what have you) and it doesn't matter 
who uses that index, or how they get to it, or whether they are updating 
the index or querying the index, or whether it's a master index or a slave 
index: that field type is not a "date" field type, it is now a "day" 
fieldtype.

the solrconfig.xml options however would be a way to say "in this 
*instance* of an index using a schema about books, i don't want to deal 
with dates more granular than a day" ... this option might be different 
between two different instances (ie: an instance containing data 
for publish-on-demand books vs. an instance containing data about 
medieval books) or even between two copies of the same index (ie: on the 
master you can index any granularity, on slaveA internal users can query 
with any granularity, on slaveB the general public can only query with day 
granularity to improve caching)
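
As a rough illustration of the difference, here's a plain Java sketch (not 
Solr code: DayFieldType and InstanceOptions are hypothetical names, and 
dates are assumed to be ISO-8601 UTC strings) with the same rounding logic 
wired up both ways.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical sketch, not Solr code: the same rounding logic, wired up
// two different ways.
final class GranularitySketch {

    // Schema-style: the granularity is part of the type itself, so every
    // user of the index (update or query, master or slave) gets it.
    static final class DayFieldType {
        String toInternal(String isoInstant) {
            return Instant.parse(isoInstant).truncatedTo(ChronoUnit.DAYS).toString();
        }
    }

    // Config-style: each instance picks its own granularity, or none.
    static final class InstanceOptions {
        private final ChronoUnit queryGranularity; // null means "leave as-is"

        InstanceOptions(ChronoUnit queryGranularity) {
            this.queryGranularity = queryGranularity;
        }

        String prepareQueryDate(String isoInstant) {
            Instant t = Instant.parse(isoInstant);
            return (queryGranularity == null ? t : t.truncatedTo(queryGranularity)).toString();
        }
    }
}
```

In this sketch slaveA would be built with new InstanceOptions(null) and 
slaveB with new InstanceOptions(ChronoUnit.DAYS), while DayFieldType 
offers no such choice to anyone.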

both approaches have their uses.

-Hoss
