Glad to hear it's working. The trick (as you've probably discovered) is to properly map the meta-data to Solr fields. The extracting request handler does this, but the underlying issue is that there's no standard for meta-data names. Word docs might have "last_editor", while PDFs might have just "author". And on and on.
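As a rough illustration of that mapping step, here is a minimal sketch in plain Java (no Tika or SolrJ dependency). It assumes the extracted meta-data has already been copied into a Map of strings. The class name MetadataMapper, the alias table, and the Solr field names author_s and title_s are all hypothetical; substitute whatever keys your documents actually produce and whatever fields your schema defines.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class MetadataMapper {
    // Hypothetical alias table: raw meta-data key (lowercased) -> Solr field.
    // Both the keys and the field names are assumptions for illustration.
    private static final Map<String, String> ALIASES = Map.of(
        "author", "author_s",
        "last_editor", "author_s",
        "dc:creator", "author_s",
        "title", "title_s",
        "dc:title", "title_s");

    // Collapse format-specific keys into one set of Solr fields.
    // Unknown keys and blank values are dropped; when several raw keys
    // map to the same field, the first one encountered wins.
    static Map<String, String> toSolrFields(Map<String, String> raw) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : raw.entrySet()) {
            String field = ALIASES.get(e.getKey().toLowerCase(Locale.ROOT));
            String value = e.getValue();
            if (field != null && value != null && !value.isBlank()) {
                out.putIfAbsent(field, value.trim());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> wordDoc = new LinkedHashMap<>();
        wordDoc.put("last_editor", "alice");
        wordDoc.put("Page-Count", "3"); // not in the alias table, so ignored

        Map<String, String> pdf = new LinkedHashMap<>();
        pdf.put("Author", "bob");

        System.out.println(toSolrFields(wordDoc));
        System.out.println(toSolrFields(pdf));
    }
}
```

Keeping the aliases in one table means a newly discovered key only needs one extra entry, which fits the "dump everything first, then decide" approach.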
Anyway, sounds like you're on your way. The code snippet Shawn referenced dumps all the meta-data Tika finds so you can figure out what you need.

Best,
Erick

On Thu, Mar 2, 2017 at 11:56 AM, Phil Scadden <p.scad...@gns.cri.nz> wrote:
> Got it all working with Tika and SolrJ. (Got the correct artifacts). Much
> faster now too which is good. Thanks very much for your help.