Re: Tika trouble

2009-11-16 Markus Jelsma - Buyways B.V.
Thank you for your reply. I had assumed Tika could also extract text content from various document types, not only metadata. I'll use the CLI tools from http://www.foolabs.com/xpdf/ to extract the text manually. - Markus Jelsma, Buyways B.V., Technical Architect
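A minimal sketch of the manual route, assuming xpdf's `pdftotext` binary is installed and on the PATH. Passing `-` as the output file makes `pdftotext` write to stdout, and `-enc UTF-8` selects the output encoding. The helper names `build_pdftotext_cmd` and `extract_text` are hypothetical, chosen for illustration only.

```python
import subprocess
from typing import List


def build_pdftotext_cmd(pdf_path: str) -> List[str]:
    """Build the argument list for xpdf's pdftotext.

    "-enc UTF-8" requests UTF-8 output; "-" as the output file
    sends the extracted text to stdout instead of a .txt file.
    """
    return ["pdftotext", "-enc", "UTF-8", pdf_path, "-"]


def extract_text(pdf_path: str) -> str:
    """Run pdftotext and return the extracted text.

    Assumes pdftotext is on PATH: raises FileNotFoundError if it is
    missing and CalledProcessError if it exits with a non-zero status.
    """
    result = subprocess.run(
        build_pdftotext_cmd(pdf_path),
        capture_output=True,
        check=True,
    )
    # Decode explicitly; replace undecodable bytes rather than failing.
    return result.stdout.decode("utf-8", errors="replace")
```

The returned string can then be sent to Solr as an ordinary field value, which sidesteps Tika entirely at the cost of losing its metadata extraction.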

Re: Tika trouble

2009-11-16 Antonio Calò
What I can tell you is that if you want to index a PDF, you should use a PDF extractor. A PDF extractor is able to extract both the text content and the metadata of the file. I suppose you just opened and indexed the PDF as is, so you stored binary data and nothing more. For my application I've u
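Within Solr 1.4 itself, the extractor in question is Tika, reached through the ExtractingRequestHandler (Solr Cell) at `/update/extract`. The sketch below builds such a request and POSTs the raw PDF bytes with a `Content-Type` header, assuming a Solr instance at a base URL you supply. `literal.id`, `fmap.content`, `uprefix` and `commit` are real handler parameters; mapping the body text to `attr_content` is an assumption chosen to fit an `attr_*` dynamic field, and the helper names are hypothetical.

```python
import urllib.parse
import urllib.request


def build_extract_url(solr_base: str, doc_id: str, commit: bool = True) -> str:
    """Build a Solr Cell (/update/extract) URL.

    literal.id supplies the uniqueKey value; fmap.content maps Tika's
    extracted body text onto a schema field (attr_content is an assumed
    name matching an attr_* dynamic field); uprefix sends any metadata
    field not in the schema to attr_<name>.
    """
    params = {
        "literal.id": doc_id,
        "fmap.content": "attr_content",
        "uprefix": "attr_",
        "commit": "true" if commit else "false",
    }
    query = urllib.parse.urlencode(params)
    return f"{solr_base.rstrip('/')}/update/extract?{query}"


def post_pdf(url: str, pdf_path: str) -> bytes:
    """POST the raw PDF bytes; Solr treats the request body as the content stream."""
    with open(pdf_path, "rb") as fh:
        body = fh.read()
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

For example, `post_pdf(build_extract_url("http://localhost:8983/solr", "doc1"), "report.pdf")` would index the extracted text rather than the binary file.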

Re: Tika trouble

2009-11-16 Markus Jelsma - Buyways B.V.
Does anyone have a clue?

> List,
>
> I somehow fail to index certain PDF files using the ExtractingRequestHandler in Solr 1.4 with the default solrconfig.xml but a modified schema. I have a very simple schema for this case using only an ID field, a timestamp field and two dynamic fields; ignored_

Tika trouble

2009-11-12 Markus Jelsma - Buyways B.V.
List, I somehow fail to index certain PDF files using the ExtractingRequestHandler in Solr 1.4 with the default solrconfig.xml but a modified schema. I have a very simple schema for this case using only an ID field, a timestamp field and two dynamic fields, ignored_* and attr_*, both indexed, stored an
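One way to narrow down where such a failure happens is to ask Solr Cell to run Tika without indexing, via `extractOnly=true`, and inspect what comes back. If the text and metadata look right, the problem more likely lies in how the extracted fields map onto the schema. A minimal sketch, assuming a reachable Solr instance; `extractOnly`, `resource.name` (a hint Tika can use for type detection) and `wt=json` are real parameters, while the helper names are hypothetical and the structure of the returned JSON is not assumed here.

```python
import json
import os
import urllib.parse
import urllib.request


def build_extract_only_url(solr_base: str, resource_name: str) -> str:
    """extractOnly=true makes Solr Cell return Tika's output without indexing."""
    params = {"extractOnly": "true", "resource.name": resource_name, "wt": "json"}
    query = urllib.parse.urlencode(params)
    return f"{solr_base.rstrip('/')}/update/extract?{query}"


def preview_extraction(solr_base: str, pdf_path: str) -> dict:
    """POST a PDF and return the parsed JSON response for manual inspection."""
    url = build_extract_only_url(solr_base, os.path.basename(pdf_path))
    with open(pdf_path, "rb") as fh:
        req = urllib.request.Request(
            url,
            data=fh.read(),
            headers={"Content-Type": "application/pdf"},
            method="POST",
        )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Comparing the preview for a PDF that indexes correctly with one that fails should show whether Tika extracts any text at all from the problem files.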