On 1/21/07, Ryan McKinley <[EMAIL PROTECTED]> wrote:
> > >
> > > I want something that is equivalent to splitting the string on the
> > > client side and filling multiple *fields* not just tokens.
> >
> > Oh, I was talking about indexing only.
> >
> aaah.
> > Why is it that multiple fields are needed? Multiple tokens are
> > indistinguishable from multiple fields during search.
> >
> When the app displays search results, it shows a list of subjects.
> (from the returned doc list). That should be split properly.
> (Ideally without knowledge of the schema)
> > Actually splitting things into different fields normally happens in
> > the client (outside Solr), or in a specialized handler (like CSV, SQL,
> > etc).
> >
> In the case I'm looking at, it would be cleaner and more safe to have
> it on the server side...

Safer? It precludes adding a subject with a ';' in it...

Solr currently assumes your data is structured. Lucene does too... an
analyzer in Lucene can't create more fields or take info from one
field and add it to another.
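
To make the client-side option concrete, here is a rough sketch of
splitting a ';'-delimited string outside Solr and sending each piece as
its own value of a multi-valued field in an XML <add> message. The
field names "id" and "subject", the helper names, and the sample data
are assumptions for illustration only, not anything from a real schema.

```python
from xml.sax.saxutils import escape


def split_subjects(raw, delimiter=";"):
    """Split a delimited string into trimmed, non-empty values."""
    return [part.strip() for part in raw.split(delimiter) if part.strip()]


def to_solr_add_xml(doc_id, subjects):
    """Build an XML <add> message with one <field> element per subject.

    "id" and "subject" are hypothetical field names; "subject" would
    need to be declared multi-valued in the schema.
    """
    fields = ['<field name="id">%s</field>' % escape(doc_id)]
    fields += ['<field name="subject">%s</field>' % escape(s) for s in subjects]
    return "<add><doc>%s</doc></add>" % "".join(fields)


subjects = split_subjects("History; Maps;Travel")
xml = to_solr_add_xml("doc1", subjects)

# The delimiter problem: a single subject that legitimately contains ';'
# is broken into several values, whichever side does the splitting.
broken = split_subjects("Rock; paper; scissors (game)")
```

Note that `broken` ends up with three values even though the input was
meant as one subject, which is the concern with picking ';' as a
separator.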

An aside: your need sounds like part of the much bigger issue of
processing documents and splitting them up into multiple fields, or at
least processing certain fields in a way that can add other fields.
I'm not sure what a general solution would look like in that case.
For example, you might have a field called "mail-headers", and want
that split up into multiple fields.
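
As a sketch of what that "mail-headers" case might look like if done in
the client today, the following uses Python's standard email parser to
turn one raw header blob into several named fields. The "hdr_" prefix,
the field-naming rule, and the sample headers are all made up for
illustration; this is not a proposal for how Solr itself would do it.

```python
from email.parser import HeaderParser


def headers_to_fields(raw_headers, prefix="hdr_"):
    """Map each mail header to a (hypothetical) field name and value list.

    Repeated headers such as Received become multiple values of the
    same field rather than overwriting each other.
    """
    msg = HeaderParser().parsestr(raw_headers)
    fields = {}
    for name, value in msg.items():
        key = prefix + name.lower().replace("-", "_")
        fields.setdefault(key, []).append(value)
    return fields


raw = (
    "From: a@example.com\n"
    "To: b@example.com\n"
    "Subject: splitting fields\n"
    "Received: by host1\n"
    "Received: by host2\n"
)
fields = headers_to_fields(raw)
```

Here one input field produces four output fields, one of them
multi-valued, which is exactly the kind of thing an analyzer cannot do.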

Another longer-term thing to keep our eye on is UIMA (added to the
Apache Incubator not that long ago).

-Yonik