RE: Solr 3.1 and ExtractingRequestHandler resulting in blank content

2010-07-28 Thread David Thibault
From: Lance Norskog [mailto:goks...@gmail.com]
Sent: Tuesday, July 27, 2010 8:09 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr 3.1 and ExtractingRequestHandler resulting in blank content

There are two different datasets that Solr (Lucene really) saves from a document: raw storage and the in

Re: Solr 3.1 and ExtractingRequestHandler resulting in blank content

2010-07-27 Thread Lance Norskog
There are two different datasets that Solr (Lucene really) saves from a document: raw storage and the indexed terms. I don't think the ExtractingRequestHandler ever automatically stored the raw data; in fact Lucene works in Strings internally, not raw byte arrays (this is changing). It should be i
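
The distinction between indexed terms and raw storage can be illustrated with a small toy model. This is only a sketch in plain Python, not Solr or Lucene code: the `ToyIndex` class, its `stored_fields` parameter, and the field names `title` and `content` are all hypothetical and chosen for illustration. It shows how a field can be searchable and still come back blank when its raw value was never stored.

```python
from collections import defaultdict


class ToyIndex:
    """Toy model (hypothetical, not the Solr API) of the two datasets
    kept per document: indexed terms and stored raw values."""

    def __init__(self, stored_fields):
        # Only fields named here keep their raw text for retrieval.
        self.stored_fields = set(stored_fields)
        self.postings = defaultdict(set)  # (field, term) -> set of doc ids
        self.stored = {}                  # doc id -> {field: raw text}

    def add(self, doc_id, fields):
        self.stored[doc_id] = {}
        for name, text in fields.items():
            # Every field is indexed: its terms become searchable.
            for term in text.lower().split():
                self.postings[(name, term)].add(doc_id)
            # Raw text is kept only for fields marked as stored.
            if name in self.stored_fields:
                self.stored[doc_id][name] = text

    def search(self, field, term):
        ids = sorted(self.postings.get((field, term.lower()), set()))
        # Fields that were indexed but not stored come back as "".
        return [(doc_id, self.stored[doc_id].get(field, "")) for doc_id in ids]


if __name__ == "__main__":
    idx = ToyIndex(stored_fields=["title"])
    idx.add("doc1", {"title": "Report", "content": "extracted pdf text"})
    print(idx.search("content", "pdf"))   # match found, but content is blank
    print(idx.search("title", "report"))  # match found, raw title returned
```

In this model the search on `content` finds `doc1`, yet the returned value is an empty string because only `title` was stored. Whether that is the cause of the blank content reported in this thread is not established by the excerpt.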

Solr 3.1 and ExtractingRequestHandler resulting in blank content

2010-07-26 Thread David Thibault
Hello all, I’m working on a project with Solr. I had 1.4.1 working OK using ExtractingRequestHandler except that it was crashing on some PDFs. I noticed that Tika bundled with 1.4.1 was 0.4, which was kind of old. I decided to try updating to 0.7 as per the directions here: http://wiki.apac