Has anybody had experience bypassing ExtractingRequestHandler and managing Tika manually? I want to make a small modification to Tika to extract and save additional data from my PDFs, but I have been procrastinating, in no small part because of the unpleasant prospect of setting up a development environment where I can compile and debug modifications that might run through PDFBox, Tika, and ExtractingRequestHandler. It occurs to me that it would be much easier if extraction and indexing were separate, so that I could have direct control over Tika and just submit the text to Solr after extraction. Am I going to regret this approach? I'm not sure what ExtractingRequestHandler really does for me that Tika doesn't already do.
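To make the "submit the text to Solr after extraction" half concrete, here is a minimal sketch using only the JDK (Java 11+). It assumes the text and any extra values have already been pulled out of the PDF by my own Tika code, and that the target core accepts JSON documents on its update handler. The class name, the field names (`content_txt`, `pdf_extra_s`), and the example values are all hypothetical placeholders, not anything Solr or Tika prescribes.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.LinkedHashMap;
import java.util.Map;

public class SolrTextPoster {

    /** Escapes a string so it can sit inside a JSON string literal. */
    public static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters need a \\uXXXX escape.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Builds a JSON array holding one document, with every value as a string. */
    public static String toUpdateJson(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("[{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            sb.append('"').append(jsonEscape(e.getKey())).append("\":\"")
              .append(jsonEscape(e.getValue())).append('"');
            first = false;
        }
        return sb.append("}]").toString();
    }

    /** POSTs the JSON to the given update URL and returns the HTTP status code. */
    public static int post(String updateUrl, String json) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(updateUrl))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical values; in practice these come from my own Tika code.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", "report-2015.pdf");
        fields.put("content_txt", "Body text extracted by Tika");
        fields.put("pdf_extra_s", "custom value pulled from the PDF");

        String json = toUpdateJson(fields);
        System.out.println(json);

        // Only contact Solr when an update URL is passed on the command line.
        if (args.length > 0) {
            System.out.println("HTTP " + post(args[0], json));
        }
    }
}
```

Run with an update URL such as `http://localhost:8983/solr/mycore/update?commit=true` as the only argument (the core name `mycore` is an assumption); without an argument it only prints the JSON. SolrJ would presumably be the more usual way to do this, but the plain HTTP version shows how little is left once extraction lives outside Solr.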
Also, I was reading this Stack Overflow entry <http://stackoverflow.com/questions/33292776/solr-tika-processor-not-crawling-my-pdf-files-prefectly>, and someone offhandedly mentioned that ExtractingRequestHandler might be separated in the future anyway. Is there a public roadmap for the project, or does one have to follow the developers' mailing list and hunt through JIRA entries to keep up with the pulse of the project?

Thanks,
Justin