Hi all,

I'm wondering if there are any knobs or levers I can set in
solrconfig.xml that affect how PDFBox text extraction is performed by
the extraction handler. I would like to take advantage of PDFBox's
ability to normalize diacritics and ligatures [1], but that doesn't
seem to be the default behavior. Is there a way to enable this?
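
To illustrate the kind of output I'm after, here is a minimal sketch
that uses only the JDK's java.text.Normalizer, not PDFBox or Solr. The
class and method names are made up for the example, and I'm assuming
(without having confirmed it) that Unicode NFKC normalization is a fair
stand-in for what TextNormalize does: ligatures expanded to plain
letters, and a base letter plus combining accent folded into a single
character.

```java
import java.text.Normalizer;

// Hypothetical illustration only: this is plain JDK code, not PDFBox's API.
public class NormalizeSketch {

    // NFKC (compatibility decomposition followed by composition) expands
    // ligature code points such as U+FB01 into their component letters and
    // composes "e" + U+0301 (combining acute) into the single char U+00E9.
    static String normalize(String extracted) {
        return Normalizer.normalize(extracted, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // "fi" ligature followed by "le"
        System.out.println(normalize("\uFB01le"));
        // "resume" with combining acute accents after each "e"
        System.out.println(normalize("re\u0301sume\u0301"));
    }
}
```

Ideally the text that reaches the index would already look like the
normalized strings above, without a separate post-processing step.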

Thanks,
--jay

[1] http://pdfbox.apache.org/apidocs/index.html?org/apache/pdfbox/util/TextNormalize.html
