On Sep 6, 2007, at 2:56 PM, Matthew Runo wrote:
On a related note, it'd be great if we could set up a series of
transformations to be applied to data as it comes into Solr, before
it is indexed. I guess a custom tokenizer might be the best way to
do this, though?
i.e.:
- Client POSTs the data
- Data is cleaned up, properly escaped, etc.
- Data is then passed to whatever tokenizer we want to use.
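For the single-field case, something like the following could hang
that clean-up step onto the analyzer chain. This is only a rough
sketch against the Lucene 2.x TokenFilter API of the time;
CleanupFilter is a hypothetical name, and the trim/lower-case body
is a stand-in for whatever clean-up is actually wanted:

import java.io.IOException;

import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

// Hypothetical filter: "cleans up" each token before it is indexed.
public class CleanupFilter extends TokenFilter {
  public CleanupFilter(TokenStream input) {
    super(input);
  }

  // Pull the next token from the wrapped stream and return a
  // cleaned-up copy; null signals the end of the stream.
  public Token next() throws IOException {
    Token t = input.next();
    if (t == null) return null;
    String cleaned = t.termText().trim().toLowerCase();
    return new Token(cleaned, t.startOffset(), t.endOffset());
  }
}

Hooked up through a factory in schema.xml, a filter like this would
run on every value of the fields it is attached to, which is also
exactly why it can only operate one field at a time.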
Solr should do more work on the data-indexing side, to allow clients
to hand documents to it more easily and have them modified along the
way. XML isn't necessarily the prettiest format, and we already see
other formats being supported with the CSV and rich-document indexing.
A custom tokenizer or token filter makes great sense for single-field
data transformation, but parsing request data into multiple fields
must be done at a higher level.
Erik
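As a minimal illustration of that higher-level parsing (purely a
hypothetical client-side sketch, not anything Solr provides), one raw
value could be split into several fields before the document is ever
handed to Solr:

import java.util.HashMap;
import java.util.Map;

// Hypothetical pre-processing step: split a raw "Last, First" name
// into separate fields before building the Solr document.
public class FieldSplitter {
  public static Map<String, String> parse(String rawName) {
    Map<String, String> fields = new HashMap<String, String>();
    int comma = rawName.indexOf(',');
    if (comma >= 0) {
      fields.put("last_name", rawName.substring(0, comma).trim());
      fields.put("first_name", rawName.substring(comma + 1).trim());
    } else {
      fields.put("name", rawName.trim());
    }
    return fields;
  }
}

Each entry in the returned map would become its own <field/> in the
update XML, which is the kind of one-value-to-many-fields
transformation a per-field tokenizer can't express.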