I am evaluating Solr to see whether it would be a good fit for a document search service I am helping to develop. One of our requirements is the ability to customize how file contents are parsed, beyond the default behavior that Tika provides out of the box. For example, we know we will be indexing PDF files whose cover page contains a project start date, and we would like to pull that date out into its own searchable field, separate from the file content.

I have seen several sources say this can be done by overriding the ExtractingRequestHandler.createFactory() method, but I have not been able to find much documentation on how to implement a new parser. Can someone point me to where I should look, or tell me whether the scenario described above is even possible?
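To make the scenario concrete, here is a rough sketch of the extraction step I have in mind, written in plain Java with no Solr or Tika classes. It assumes the cover-page text has already been extracted as a string, that the date is introduced by the label "Project Start Date:" in M/d/yyyy form, and that the target fields are named content and project_start_date. All of these (and the class name CoverPageDateExtractor) are assumptions for illustration, not anything taken from Solr or Tika. How to hook logic like this into ExtractingRequestHandler is the part I am unsure about.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: not part of Solr or Tika.
class CoverPageDateExtractor {

    // Assumed cover-page label and date layout, e.g. "Project Start Date: 4/1/2013".
    private static final Pattern START_DATE = Pattern.compile(
            "Project Start Date:\\s*(\\d{1,2}/\\d{1,2}/\\d{4})",
            Pattern.CASE_INSENSITIVE);

    private static final DateTimeFormatter INPUT =
            DateTimeFormatter.ofPattern("M/d/yyyy");

    // Returns the fields to index: the full text under "content" and, when a
    // valid date is found, an ISO 8601 UTC timestamp under "project_start_date".
    static Map<String, String> extractFields(String text) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("content", text);

        Matcher m = START_DATE.matcher(text);
        if (m.find()) {
            try {
                LocalDate date = LocalDate.parse(m.group(1), INPUT);
                // Solr date fields expect the form yyyy-MM-ddTHH:mm:ssZ.
                fields.put("project_start_date", date + "T00:00:00Z");
            } catch (DateTimeParseException e) {
                // Label matched but the value is not a real date: leave the field unset.
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String sample = "Acme Report\nProject Start Date: 4/1/2013\nBody text";
        System.out.println(extractFields(sample).get("project_start_date"));
    }
}
```

Keeping the date in a separate field like this is what would let us run range queries and sort on it independently of the full-text content.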
-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Tika-Override-tp4053552.html Sent from the Solr - User mailing list archive at Nabble.com.