: Is there a way to prepare a document the described way with Lucene/Solr,
: before I analyze it?
: My use case is to categorize several documents in an automatic way, which
: includes that I have to "create" data from the given input doing some
: information retrieval.
As Ryan mentioned earlier: this is what the UpdateRequestProcessor API is
for -- it allows you to modify Documents (regardless of how they were
added: csv, xml, dih) prior to Solr processing them...

http://old.nabble.com/Custom-Analyzer-Tokenizer-works-but-results-were-not-saved-to27026739.html

Personally, i think you may be looking at your problem from the wrong
direction...

: >> Imagine you would analyze, index and store them like you normally do and
: >> afterwards you want to set, whether the document belongs to the expensive
: >> item-group or not.
: >> If the price for the item is higher than 500$, it belongs to the
: >> expensive ones, otherwise not.

...for a situation like that, i wouldn't attempt to "classify" the docs as
"expensive" or "cheap" when adding them. instead i would use numeric ranges
for faceting and filtering to show me how many docs were "expensive" or
"cheap" at query time -- that way when the economy tanks i can redefine my
definition of "expensive" on the fly w/o needing to reindex a million
documents.

-Hoss
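
A minimal sketch of the query-time approach described above: rather than
storing an "expensive" flag at index time, the request asks Solr to count
docs on each side of a threshold using facet.query range clauses. The field
name "price", the helper name price_facet_params, and the 500 cutoff are
assumptions for illustration only; the code just builds the request
parameters and does not talk to a server.

```python
from urllib.parse import urlencode


def price_facet_params(threshold, field="price"):
    """Build Solr request parameters that count docs at or below, and above, a threshold.

    The field name and threshold are hypothetical; adjust them to the schema.
    """
    return [
        ("q", "*:*"),      # match every document
        ("rows", "0"),     # only the facet counts are wanted, not the docs
        ("facet", "true"),
        # "cheap": price up to and including the threshold
        ("facet.query", f"{field}:[* TO {threshold}]"),
        # "expensive": price strictly above the threshold (exclusive range)
        ("facet.query", f"{field}:{{{threshold} TO *}}"),
    ]


# Changing the definition of "expensive" is a matter of changing one argument;
# nothing has to be reindexed.
query_string = urlencode(price_facet_params(500))
```

The same range clause can be passed as an fq parameter to restrict results
to one group instead of just counting it.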