Hi all, I'm wondering if there are any knobs or levers I can set in solrconfig.xml that affect how the extraction handler performs PDFBox text extraction. I would like to take advantage of PDFBox's ability to normalize diacritics and ligatures [1], but that doesn't seem to be the default behavior. Is there a way to enable this?
Thanks,
--jay

[1] http://pdfbox.apache.org/apidocs/index.html?org/apache/pdfbox/util/TextNormalize.html