On Sep 6, 2007, at 2:56 PM, Matthew Runo wrote:
On a related note, it'd be great if we could set up a series of transformations to be applied to data as it arrives, before it is indexed. I guess a custom tokenizer might be the best way to do this, though?

i.e.:

- Post
- Data is cleaned up, properly escaped, etc.
- Then the data is passed to whatever tokenizer we want to use.
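The per-field part of that chain is already expressible in schema.xml. A minimal sketch (the field type name is made up, and the particular tokenizer/filter are just placeholders; swap in whatever fits):

  <fieldType name="text_cleaned" class="solr.TextField">
    <analyzer type="index">
      <!-- split on whitespace, then normalize case -->
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>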

Solr should do more work on the data-indexing side, to let clients hand documents to it and modify them more easily. XML isn't necessarily the prettiest way in, and we're already seeing other formats supported with the CSV and rich-document indexing work.
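For instance, CSV can be posted straight to the CSV update handler (path and params as in the example docs; adjust the URL and file name for your setup):

  curl 'http://localhost:8983/solr/update/csv?commit=true' \
       --data-binary @books.csv \
       -H 'Content-type: text/plain; charset=utf-8'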

A custom tokenizer or token filter makes great sense for single-field data transformation, but parsing incoming request data into multiple fields has to be done at a higher level.
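For the single-field case, a bare-bones filter might look like this against the Lucene 2.x TokenStream API (the class name and the trim/lowercase "cleanup" are purely illustrative):

  import java.io.IOException;

  import org.apache.lucene.analysis.Token;
  import org.apache.lucene.analysis.TokenFilter;
  import org.apache.lucene.analysis.TokenStream;

  /** Trims and lowercases each token -- one field's worth of cleanup. */
  public class CleanupFilter extends TokenFilter {

    public CleanupFilter(TokenStream input) {
      super(input);
    }

    public Token next() throws IOException {
      Token t = input.next();   // pull the next token from upstream
      if (t == null) {
        return null;            // end of stream
      }
      t.setTermText(t.termText().trim().toLowerCase());
      return t;
    }
  }

Wrapped in a TokenFilterFactory it can be referenced from schema.xml like the stock filters, but it only ever sees one field's token stream, which is exactly the limitation above.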

        Erik
