Hello,
I've got a stored, indexed field that contains some actual text, and some
metainfo, like this:

   one two three four [METAINFO] oneprime twoprime threeprime fourprime

I have written a Tokenizer that skips past the [METAINFO] marker and uses
the last four words as the tokens for the field, mapping to the first four
words.  E.g. "twoprime" is the second token, with start offset 4 and end
offset 7 (the span of "two" in the visible text).

When someone searches for "twoprime", therefore, they get back a highlighted
result like

   one <em>two</em> three ...
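
To make the mapping concrete, here is a rough standalone sketch of the idea
in plain Java.  It is not my actual Tokenizer and uses no Lucene classes; the
names MetaTokens, Token, tokenize and highlight are made up for illustration,
and it assumes whitespace-separated words and an exclusive end offset.

```java
import java.util.ArrayList;
import java.util.List;

public class MetaTokens {
    static final String MARKER = "[METAINFO]";

    /** An indexed term plus the span of visible text it should highlight. */
    static final class Token {
        final String term;
        final int start; // inclusive character offset into the visible text
        final int end;   // exclusive character offset into the visible text

        Token(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    /** Pairs the n-th word after the marker with the offsets of the n-th word before it. */
    static List<Token> tokenize(String field) {
        int m = field.indexOf(MARKER);
        if (m < 0) {
            throw new IllegalArgumentException("no " + MARKER + " marker in field");
        }
        String visible = field.substring(0, m);
        String[] terms = field.substring(m + MARKER.length()).trim().split("\\s+");

        List<Token> tokens = new ArrayList<>();
        int pos = 0;
        for (String term : terms) {
            if (term.isEmpty()) {
                continue; // nothing after the marker
            }
            // Skip whitespace to reach the next visible word.
            while (pos < visible.length() && Character.isWhitespace(visible.charAt(pos))) {
                pos++;
            }
            if (pos >= visible.length()) {
                break; // more meta terms than visible words
            }
            int start = pos;
            while (pos < visible.length() && !Character.isWhitespace(visible.charAt(pos))) {
                pos++;
            }
            tokens.add(new Token(term, start, pos));
        }
        return tokens;
    }

    /** Wraps the token's span of the visible text in <em> tags, roughly as a highlighter would. */
    static String highlight(String field, Token t) {
        String visible = field.substring(0, field.indexOf(MARKER));
        return (visible.substring(0, t.start)
                + "<em>" + visible.substring(t.start, t.end) + "</em>"
                + visible.substring(t.end)).trim();
    }

    public static void main(String[] args) {
        String field = "one two three four [METAINFO] oneprime twoprime threeprime fourprime";
        for (Token t : tokenize(field)) {
            System.out.println(t.term + " [" + t.start + "," + t.end + ") -> " + highlight(field, t));
        }
    }
}
```

The point is only that the indexed terms come from the part after the marker,
while the offsets point into the part before it.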

This is great and serves my needs, but I hate that I'm storing all that
METAINFO uselessly (there's actually a good deal more than in this
simplified example).  After I've used it to make my tokens, I'd really like
to convert the stored field to just

   one two three four

and store that.

I thought about using an UpdateRequestProcessor to do this, but that happens
*before* the Analyzers run, so if I strip the [METAINFO] there I can't use
it to build my tokens.  I also thought about sending the data in as two
fields, like

   f1: one two three four
   f1_meta: oneprime twoprime threeprime fourprime

but I can't figure out a way for f1's analyzer to grab the stream from
f1_meta.

Is there some clever way I'm missing to build my token stream outside of
Solr, so that Solr stores just the original text but indexes my token stream?

Thanks in advance!
