Hi,
I have checked the PDFBox Jira issue, but it does not offer a solution:
some users experienced the same issue with different CMAP entries.
Would it be possible to update the PDFBox library in the Solr
installation?
Thanks,
Marcello
On 11/15/2013 06:27 PM, Furkan KAMACI wrote:
You should check the Apache PDFBox project. A similar question:
https://issues.apache.org/jira/browse/PDFBOX-940
2013/11/15 Marcello Lorenzi <mlore...@sorint.it>
Hi,
during our testing of Apache Solr 4.3, we noticed some errors
during PDF indexing:
ERROR - 2013-11-15 15:14:26.248; org.apache.pdfbox.pdmodel.font.PDCIDFont;
Error: Could not parse predefined CMAP file for 'PDFXC30-Indentity0-UCS2'
ERROR - 2013-11-15 15:14:36.108; org.apache.pdfbox.pdmodel.font.PDCIDFont;
Error: Could not parse predefined CMAP file for '--UCS2'
and
ERROR - 2013-11-15 15:12:18.928; org.apache.pdfbox.filter.FlateFilter;
FlateFilter: stop reading corrupt stream due to a DataFormatException
Could these errors be related to the PDF file format?
Thanks,
Marcello