On Sun, May 27, 2012 at 11:54:02PM -0400, Jack Krupansky wrote:
> You can create your own "update processor" that gets control between the 
> output of Tika and the indexing of the document.
> 
> See:
> http://wiki.apache.org/solr/UpdateRequestProcessor

Seems to be exactly what I was looking for, thanks a lot !

I just started an (almost working) implementation, but I have one observation:

Let's get a field's values:
> Collection<Object> v = doc.getFieldValues("author");
( in my `processAdd(AddUpdateCommand cmd)` )

and push a doc, say using:
> `curl -F content=@my.pdf -F literal.author=a -F literal.author=b -F 
> literal.author="c d"`

Then `log.warn("author: " + v + ":" + v.size());` logs:
> WARN: author: [pdfauthor, a b c d] : 2

It's not (yet) a blocker in my case, but I think it's important
enough to note: from a custom UpdateRequestProcessor, access to the
individual literal field values seems (currently) very limited, as they
appear to be flattened into a single value. I suspect there is already a
bug report about this somewhere.
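
For anyone who wants to check this on their side, here is a minimal sketch of a
helper that formats each value of a field on its own line, so a single flattened
value ("a b c d") is easy to tell apart from separate values. It is plain Java
and only an illustration: `FieldValueDump` and `describeValues` are hypothetical
names, not part of Solr, and the collection passed in is assumed to be whatever
`doc.getFieldValues("author")` returned inside `processAdd`.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical helper (not Solr API): describes each field value separately
// so that flattened values can be spotted in the log.
final class FieldValueDump {
    private FieldValueDump() {}

    static List<String> describeValues(String field, Collection<?> values) {
        List<String> lines = new ArrayList<>();
        // Assumption: the caller may pass null when the field is absent.
        if (values == null) {
            lines.add(field + ": <no values>");
            return lines;
        }
        int i = 0;
        for (Object value : values) {
            // Show the runtime type too, in case a value is not a String.
            String type = (value == null) ? "null" : value.getClass().getSimpleName();
            lines.add(field + "[" + i + "] (" + type + ") = \"" + value + "\"");
            i++;
        }
        return lines;
    }
}
```

In the processor, each returned line could be passed to `log.warn(...)`. With
the values reported above, the second entry would show up as one string,
`author[1] (String) = "a b c d"`, rather than as three separate entries.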


Other than that and unless I hit some other unexpected issue, this way
to customize the request processor perfectly suits my needs.


thanks !
