tika/pdfbox knobs & levers

Jay Luker Wed, 13 Apr 2011 13:53:46 -0700

Hi all,

I'm wondering if there are any knobs or levers I can set in
solrconfig.xml that affect how pdfbox text extraction is performed by
the extraction handler. I would like to take advantage of pdfbox's
ability to normalize diacritics and ligatures [1], but that doesn't
seem to be the default behavior. Is there a way to enable this?


Thanks,
--jay

[1] http://pdfbox.apache.org/apidocs/index.html?org/apache/pdfbox/util/TextNormalize.html
