Try setting the StreamType to application/pdf; that way Tika doesn't have to infer it. By the way, the second argument to ExtractParameters is the document's unique key, so a value of "*" probably doesn't make sense.
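For reference, here is a rough sketch of what the request could look like at the HTTP level with both suggestions applied. It is Python rather than SolrNet, and it only builds the URL (posting the file body is left out). It assumes that `stream.type` is the Solr Cell request parameter behind StreamType; the host and port in the base URL and the id scheme (file name without extension) are made up for illustration, while the other parameters are copied from the log lines quoted below.

```python
from pathlib import PureWindowsPath
from urllib.parse import urlencode


def build_extract_url(base_url, pdf_path, stream_type="application/pdf"):
    """Build an /update/extract URL with an explicit stream type and a per-file id."""
    path = PureWindowsPath(pdf_path)
    params = {
        "extractOnly": "true",
        "extractFormat": "text",
        "resource.name": str(path),
        # Explicit MIME type, so Tika does not have to detect it per request.
        "stream.type": stream_type,
        # Hypothetical id scheme: one id per file instead of "*".
        "literal.id": path.stem,
    }
    return base_url.rstrip("/") + "/update/extract?" + urlencode(params)


# Assumed base URL; only "/Solr" and "core1" come from the quoted logs.
url = build_extract_url("http://localhost:8080/Solr/core1", r"C:\XXX\10310.pdf")
print(url)
```

Any stable per-document value would do for the id; the file name is just the easiest one to show here.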
--
Mauricio

On Wed, Dec 7, 2011 at 5:50 PM, Soumitra Banerjee <soumitrabaner...@gmail.com> wrote:

> All -
>
> I am using Solr 3.5, SolrNet 0.4.0.2001, and Tomcat 7.0, and am running a job
> to extract the text from PDFs stored on my local hard disk.
>
> *Tomcat StdErr log Shows:*
>
> INFO: [core1] webapp=/Solr path=/update/extract params={extractOnly=true&literal.id=*&resource.name=C:\XXX\10310.pdf&extractFormat=text&version=2.2} status=0 QTime=125
> Dec 7, 2011 12:29:36 PM org.apache.solr.update.processor.LogUpdateProcessor finish
> INFO: {} 0 141
> Dec 7, 2011 12:29:36 PM org.apache.solr.core.SolrCore execute
> INFO: [core1] webapp=/Solr path=/update/extract params={extractOnly=true&literal.id=*&resource.name=C:XXX\10311.pdf&extractFormat=text&version=2.2} status=0 QTime=141
> Dec 7, 2011 12:29:36 PM org.apache.solr.update.processor.LogUpdateProcessor finish
> INFO: {} 0 125
> Dec 7, 2011 12:29:36 PM org.apache.solr.core.SolrCore execute
> INFO: [core1] webapp=/Solr path=/update/extract params={extractOnly=true&literal.id=*&resource.name=C:\XXX\3M_US_EN_10313.pdf&extractFormat=text&version=2.2} status=0 QTime=125
>
> *Catalina Log Shows:*
>
> INFO: {} 0 281
> Dec 7, 2011 12:29:04 PM org.apache.solr.core.SolrCore execute
> INFO: [core1] webapp=/Solr path=/update/extract params={extractOnly=true&literal.id=*&resource.name=C:\XXX\11511.pdf&extractFormat=text&version=2.2} status=0 QTime=281
> Dec 7, 2011 12:29:05 PM org.apache.solr.update.processor.LogUpdateProcessor finish
> INFO: {} 0 391
> Dec 7, 2011 12:29:05 PM org.apache.solr.core.SolrCore execute
> INFO: [core1] webapp=/Solr path=/update/extract params={extractOnly=true&literal.id=*&resource.name=C:XXX\_11513.pdf&extractFormat=text&version=2.2} status=0 QTime=391
> Dec 7, 2011 12:29:05 PM org.apache.solr.update.processor.LogUpdateProcessor finish
> INFO: {} 0 328
> Dec 7, 2011 12:29:05 PM org.apache.solr.core.SolrCore execute
> INFO: [core1] webapp=/Solr path=/update/extract params={extractOnly=true&literal.id=*&resource.name=C:\XXX\11514.pdf&extractFormat=text&version=2.2} status=0 QTime=328
>
> The average PDF file size is around 50 KB. My questions are as follows:
>
> 1. Can I improve performance by updating any configuration file
>    (SolrConfig, Tomcat, others)?
> 2. Since I am using:
>
> var response = solr.Extract(new ExtractParameters(pdffile, "*")
>
> from SolrNet 0.4.0.2001, which just came out (beta), is this a known issue
> to be fixed in upcoming versions?
>
> Any help/pointers from the experts will be highly appreciated. Also, let me
> know if you need additional information; I will be more than happy
> to provide it.
>
> Regards, Soumitra