On Fri, Jul 10, 2009 at 5:56 PM, Michael _ <solrco...@gmail.com> wrote:
> Hello,
> I've got a stored, indexed field that contains some actual text, and some
> metainfo, like this:
>
>   one two three four [METAINFO] oneprime twoprime threeprime fourprime
>
> I have written a Tokenizer that skips past the [METAINFO] marker and uses
> the last four words as the tokens for the field, mapping to the first four
> words. E.g. "twoprime" is the second token, with startposition=4 and
> endposition=8.
>
> When someone searches for "twoprime", therefore, they get back a highlighted
> result like
>
>   one <em>two</em> three ...
>
> This is great and serves my needs, but I hate that I'm storing all that
> METAINFO uselessly (there's actually a good deal more than in this
> simplified example). After I've used it to make my tokens, I'd really like
> to convert the stored field to just
>
>   one two three four
>
> and store that.
>
> I thought about using an UpdateRequestProcessor to do this, but that happens
> *before* the Analyzers run, so if I strip the [METAINFO] there I can't use
> it to build my tokens. I also thought about sending the data in in two
> fields, like
>
>   f1: one two three four
>   f1_meta: oneprime twoprime threeprime fourprime
>
> but I can't figure out a way for f1's analyzer to grab the stream from
> f1_meta.
>
> Is there some clever way that I'm missing to build my token stream outside
> of Solr, and store just the original text and index my token stream?
>

Can't you have two fields like this?

f1 (indexed, not stored) -> one two three four [METAINFO] oneprime twoprime threeprime fourprime
f2 (not indexed, stored) -> one two three four

--
Regards,
Shalin Shekhar Mangar.
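
For illustration, here is a minimal client-side sketch of the two-field idea: before the document is sent, the raw value is kept whole for f1 and cut at the [METAINFO] marker for f2. The class and method names (TwoFieldSplit, storedText, toFields) are hypothetical, and the code only builds the two values; how they are posted to Solr, and whether highlighting then behaves as wanted, is not covered here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Solr: prepares the values for f1 and f2.
final class TwoFieldSplit {
    // Marker taken from the example in the question.
    static final String MARKER = "[METAINFO]";

    /** Returns the trimmed text before the marker, or the whole trimmed input if no marker is present. */
    static String storedText(String raw) {
        int idx = raw.indexOf(MARKER);
        return (idx < 0 ? raw : raw.substring(0, idx)).trim();
    }

    /** Maps the field names used in the reply to the values each would receive. */
    static Map<String, String> toFields(String raw) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("f1", raw);             // indexed, not stored: full value for the custom Tokenizer
        fields.put("f2", storedText(raw)); // stored, not indexed: text without the metainfo
        return fields;
    }

    public static void main(String[] args) {
        String raw = "one two three four [METAINFO] oneprime twoprime threeprime fourprime";
        toFields(raw).forEach((name, value) -> System.out.println(name + ": " + value));
    }
}
```

In the schema, f1 would then be declared indexed but not stored, and f2 stored but not indexed, as described above.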