: Is there a way to prepare a document the described way with Lucene/Solr,
: before I analyze it?
: My use case is to categorize several documents in an automatic way, which
: includes that I have to "create" data from the given input doing some
: information retrieval.

As Ryan mentioned earlier: this is what the UpdateRequestProcessor API 
is for -- it allows you to modify Documents (regardless of how they were 
added: csv, xml, dih) prior to Solr processing them...

http://old.nabble.com/Custom-Analyzer-Tokenizer-works-but-results-were-not-saved-to27026739.html

Personally, i think you may be looking at your problem from the wrong 
direction...

: >> Imagine you would analyze, index and store them like you normally do and
: >> afterwards you want to set, whether the document belongs to the expensive
: >> item-group or not.
: >> If the price for the item is higher than 500$, it belongs to the
: >> expensive
: >> ones, otherwise not.

...for a situation like that, i wouldn't attempt to "classify" the docs as 
"expensive" or "cheap" when adding them.  instead i would use numeric 
ranges for faceting and filtering to show me how many docs were 
"expensive" or "cheap" at query time -- that way when the economy tanks i 
can redefine my definition of "expensive" on the fly w/o needing to 
reindex a million documents.
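
A rough sketch of what the query-time approach could look like, in plain 
Java with no Solr dependency. The class name PriceFacetParams, the field 
name "price" and the 500 threshold are made up for illustration; 
facet=true and facet.query are standard Solr parameters, and the mixed 
{500 TO *] bounds assume a Solr version that accepts them (4.0 or later, 
as far as i know).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds query-time facet parameters instead of
// storing an "expensive" flag in the index.
class PriceFacetParams {

    // Range query matching docs at or below the threshold ("cheap").
    static String cheapQuery(String field, int threshold) {
        return field + ":[* TO " + threshold + "]";
    }

    // Range query matching docs strictly above the threshold ("expensive").
    // The exclusive lower bound "{" mixed with "]" is an assumption about
    // the Solr version in use.
    static String expensiveQuery(String field, int threshold) {
        return field + ":{" + threshold + " TO *]";
    }

    // Facet portion of a Solr query string: one facet.query per bucket.
    static String build(String field, int threshold) {
        return "facet=true"
            + "&facet.query=" + encode(cheapQuery(field, threshold))
            + "&facet.query=" + encode(expensiveQuery(field, threshold));
    }

    // URL-encodes a parameter value as UTF-8.
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Count cheap vs. expensive docs without returning any rows.
        System.out.println("q=*:*&rows=0&" + build("price", 500));
    }
}
```

Changing the definition of "expensive" is then just a different argument 
(say build("price", 300)), and the same range string could be sent as an 
fq parameter to filter results rather than count them.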



-Hoss
