Thank you for your reply.
I had assumed Tika could also extract text content from various
document types, not only metadata. I'll use the CLI tools from
http://www.foolabs.com/xpdf/ to extract the text manually.
-
Markus Jelsma Buyways B.V.
Technical Architect
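
The manual extraction mentioned above could be scripted roughly as follows.
This is only a sketch, assuming the xpdf pdftotext tool is installed and on
the PATH; the helper names and the file name report.pdf are hypothetical and
not from this thread. It relies on pdftotext accepting "-" as the output file
to write to stdout, and on the -enc option to select UTF-8.

```python
import subprocess


def build_pdftotext_command(pdf_path, binary="pdftotext"):
    """Build the argument list for an xpdf pdftotext call.

    Passing "-" as the output file asks pdftotext to write the
    extracted text to stdout instead of a .txt file.
    """
    return [binary, "-enc", "UTF-8", str(pdf_path), "-"]


def extract_text(pdf_path, binary="pdftotext"):
    """Run pdftotext and return the extracted text as a string.

    Raises subprocess.CalledProcessError if pdftotext exits with a
    non-zero status (for example on a damaged or encrypted file).
    """
    result = subprocess.run(
        build_pdftotext_command(pdf_path, binary),
        capture_output=True,
        check=True,
    )
    # Replace undecodable bytes rather than failing on odd PDFs.
    return result.stdout.decode("utf-8", errors="replace")
```

Calling extract_text("report.pdf") would return the text, which could then be
sent to Solr as a plain field value instead of the raw PDF.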
What I am trying to say is that if you want to index a PDF, you should
use a PDF extractor. A PDF extractor can extract both the text content and
the metadata of the files. I suppose you just opened and indexed the
PDF as is, so you stored binary data and nothing more. For my application I've u
Does anyone have a clue?
> List,
>
>
> I somehow fail to index certain PDF files using the
> ExtractingRequestHandler in Solr 1.4 with default solrconfig.xml but
> modified schema. I have a very simple schema for this case using only
> an ID field, a timestamp field and two dynamic fields; ignored_
List,
I somehow fail to index certain PDF files using the
ExtractingRequestHandler in Solr 1.4 with default solrconfig.xml but
modified schema. I have a very simple schema for this case using only
an ID field, a timestamp field and two dynamic fields; ignored_* and
attr_* both indexed, stored an