On Sun, May 27, 2012 at 11:54:02PM -0400, Jack Krupansky wrote:
> You can create your own "update processor" that gets control between the
> output of Tika and the indexing of the document.
>
> See:
> http://wiki.apache.org/solr/UpdateRequestProcessor
Seems to be exactly what I was looking for, thanks a lot!

I've just started an (almost working) implementation, but I have one remark.

Let's read a field's values (in my `processAdd(AddUpdateCommand cmd)`):

> Collection<Object> v = doc.getFieldValues("author");

and push a document, say using:

> `curl -F content=@my.pdf -F literal.author=a -F literal.author=b -F
> literal.author="c d"`

Then `log.warn("author: " + v + ":" + v.size());` prints:

> WARN: author: [pdfauthor, a b c d] : 2

It's not (yet) a blocker in my case, but I fear it's important enough to be noted: in a custom UpdateRequestProcessor, access to individual literal field values seems (currently) very limited, because they appear to be flattened. The three values a, b and "c d" arrive as the single value "a b c d", next to "pdfauthor". I suspect there is already a bug report about this hidden somewhere.

Other than that, and unless I hit some other unexpected issue, this way of customizing the update processing perfectly suits my needs.

Thanks!
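For reference, here is a minimal sketch of such a processor. It assumes a Solr 4.x-or-later API (where `AddUpdateCommand.getSolrInputDocument()` is available) and SLF4J for logging; the class names and the `authorValues` helper are hypothetical. It logs each value of the multi-valued `author` field on its own line, which makes a flattened value like "a b c d" easy to spot, and then passes the document down the chain unchanged. It does not work around the flattening itself.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical factory; it must be registered in an updateRequestProcessorChain to be used.
public class AuthorLoggingProcessorFactory extends UpdateRequestProcessorFactory {

    private static final Logger log =
            LoggerFactory.getLogger(AuthorLoggingProcessorFactory.class);

    @Override
    public UpdateRequestProcessor getInstance(SolrQueryRequest req,
                                              SolrQueryResponse rsp,
                                              UpdateRequestProcessor next) {
        return new AuthorLoggingProcessor(next);
    }

    /** Returns every value of the "author" field as a separate string (hypothetical helper). */
    static List<String> authorValues(SolrInputDocument doc) {
        List<String> values = new ArrayList<>();
        Collection<Object> raw = doc.getFieldValues("author");
        if (raw != null) {            // getFieldValues returns null when the field is absent
            for (Object v : raw) {
                values.add(String.valueOf(v));
            }
        }
        return values;
    }

    static class AuthorLoggingProcessor extends UpdateRequestProcessor {
        AuthorLoggingProcessor(UpdateRequestProcessor next) {
            super(next);
        }

        @Override
        public void processAdd(AddUpdateCommand cmd) throws IOException {
            List<String> authors = authorValues(cmd.getSolrInputDocument());
            // One log line per value, so a flattened value shows up as a single entry.
            for (int i = 0; i < authors.size(); i++) {
                log.warn("author[{}] = {}", i, authors.get(i));
            }
            super.processAdd(cmd);    // hand the unchanged document to the next processor
        }
    }
}
```

To try it, the factory would still have to be listed in an `updateRequestProcessorChain` in solrconfig.xml, and that chain selected for the extracting request handler (not shown here).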