Hi,
I have checked the PDFBox Jira issue, but there isn't a solution in it, because some users experienced the same issue with different CMAP entries. Would it be possible to update the PDFBox library in the Solr installation?

Thanks,
Marcello

On 11/15/2013 06:27 PM, Furkan KAMACI wrote:
You should check the Apache PDFBox project. A similar question:
https://issues.apache.org/jira/browse/PDFBOX-940


2013/11/15 Marcello Lorenzi <mlore...@sorint.it>

Hi,
during our testing of Apache Solr 4.3, we noticed some errors
that occurred during PDF indexing:

ERROR - 2013-11-15 15:14:26.248; org.apache.pdfbox.pdmodel.font.PDCIDFont;
Error: Could not parse predefined CMAP file for 'PDFXC30-Indentity0-UCS2'
ERROR - 2013-11-15 15:14:36.108; org.apache.pdfbox.pdmodel.font.PDCIDFont;
Error: Could not parse predefined CMAP file for '--UCS2'

and

ERROR - 2013-11-15 15:12:18.928; org.apache.pdfbox.filter.FlateFilter;
FlateFilter: stop reading corrupt stream due to a DataFormatException

Could these errors be related to the PDF file format?

Thanks,
Marcello

