I decided against using an XML attribute for the payload, since I need to return the payload in the query response. So I went with:
<field name="title">2.0|Solr In Action</field>

My payload is numeric, so I can pick a non-numeric delimiter (i.e. '|'). Putting the payload in front means I don't have to worry about the delimiter appearing in the value: the payload is required in my case, so I can simply split on the first occurrence of the delimiter.

I ended up writing a custom Tokenizer, plus a copy field with a PatternTokenizerFactory to filter out the delimiter and payload. That is straightforward in terms of implementation. On top of that, I can still use the CSV loader, which I really like because of its speed.

Bill

On Thu, Aug 20, 2009 at 10:36 PM, Chris Hostetter <hossman_luc...@fucit.org> wrote:

> : of the field are correct but the delimiter and payload are stored so they
> : appear in the response also. Here is an example:
> ...
> : I am thinking maybe I can do this instead when indexing:
> :
> : XML for indexing:
> : <field name="title" payload="2.0">Solr In Action</field>
> :
> : This will simplify indexing as I don't have to repeat the payload for each
>
> but now you're into a custom request handler for the updates to deal with
> the custom XML attribute so you can't use DIH, or CSV loading.
>
> It seems like it might be simpler to have two new (generic) UpdateProcessors:
> one that can clone fieldA into fieldB, and one that can do regex mutations
> on fieldB ... neither needs to know about payloads at all, but the first
> can make a copy of "2.0|Solr In Action" and the second can strip off the
> "2.0|" from the copy.
>
> then you can write a new NumericPayloadRegexTokenizer that takes in two
> regex expressions -- one that knows how to extract the payload from a
> piece of input, and one that specifies the tokenization.
>
> those three classes seem easier to implement, easier to maintain, and more
> generally reusable than a custom xml request handler for your updates.
>
> -Hoss
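
For anyone following along, here is a minimal sketch of the leading-payload split described at the top of this message. It is plain Java with no Lucene or Solr dependencies, and it is not the actual Tokenizer: the class name LeadingPayload and its parse method are hypothetical, chosen only for illustration. It assumes the payload is always present and parses as a float.

```java
// Hypothetical helper for illustration only; not part of Solr or Lucene.
final class LeadingPayload {
    final float payload;
    final String value;

    private LeadingPayload(float payload, String value) {
        this.payload = payload;
        this.value = value;
    }

    // Splits "payload<delimiter>value" on the FIRST delimiter only, so any
    // later delimiter characters remain part of the value.
    static LeadingPayload parse(String raw, char delimiter) {
        int idx = raw.indexOf(delimiter);
        if (idx < 0) {
            // Assumption: the payload is required, so a missing delimiter is an error.
            throw new IllegalArgumentException("missing payload delimiter: " + raw);
        }
        float payload = Float.parseFloat(raw.substring(0, idx));
        String value = raw.substring(idx + 1);
        return new LeadingPayload(payload, value);
    }

    public static void main(String[] args) {
        LeadingPayload p = parse("2.0|Solr In Action", '|');
        System.out.println(p.payload + " -> " + p.value); // prints "2.0 -> Solr In Action"
    }
}
```

Because only the first delimiter is significant, an input such as "1.5|A|B" yields the payload 1.5 and the value "A|B", which is why putting the payload in front avoids any escaping of the value.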