I am evaluating Solr to see whether it would be a good fit for a document
search service I am helping to develop.  One of our requirements is the
ability to customize how file contents are parsed, beyond the default
configuration that Tika offers out of the box.  For example, we know we will
be indexing .pdf files whose cover page contains a project start date, and
we would like to pull that date out into a searchable field separate from
the file content.  I have seen several sources say this can be done by
overriding the ExtractingRequestHandler.createFactory() method, but I have
not been able to find much documentation on how to implement a new parser.
Can someone point me in the right direction, or let me know whether the
scenario described above is even possible?
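
To make the scenario concrete, here is a rough sketch of the extraction step
I have in mind, written as plain Java with no Solr or Tika dependencies.  It
assumes the cover page text has already been extracted to a String, and that
the date appears as "Project Start Date: MM/dd/yyyy".  The label, the date
format, and the class and method names are all placeholders of mine, not
anything Solr or Tika provides.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls a project start date out of already-extracted text.
final class CoverPageDateExtractor {

    // Assumed cover page label and date layout; adjust to the real documents.
    private static final Pattern START_DATE = Pattern.compile(
            "Project Start Date:\\s*(\\d{2}/\\d{2}/\\d{4})",
            Pattern.CASE_INSENSITIVE);

    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("MM/dd/yyyy");

    private CoverPageDateExtractor() {
    }

    // Returns the first start date found, or empty if none is present or parseable.
    static Optional<LocalDate> extractStartDate(String text) {
        Matcher m = START_DATE.matcher(text);
        if (!m.find()) {
            return Optional.empty();
        }
        try {
            return Optional.of(LocalDate.parse(m.group(1), FORMAT));
        } catch (DateTimeParseException e) {
            // Digits matched the pattern but do not form a valid date.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        String text = "Acme Bridge Project\n"
                + "Project Start Date: 03/15/2013\n\n"
                + "Chapter 1 ...";
        // Prints 2013-03-15
        System.out.println(extractStartDate(text)
                .map(LocalDate::toString)
                .orElse("no date found"));
    }
}
```

The returned value is what I would want to end up in its own Solr field
(something like project_start_date, again a made-up name), separate from the
main content field.  What I cannot work out is where a step like this would
hook into the extraction pipeline.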



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Tika-Override-tp4053552.html
Sent from the Solr - User mailing list archive at Nabble.com.