Hi,
I need to submit thousands of online PDF/HTML files to Solr. I can submit
one file using SolrJ (StreamingUpdateSolrServer and
..solr.common.util.ContentStreamBase.URLStream), setting the literal.id
parameter to the URL. I can't do the same with a batch of multiple files, as
their 'id' should be un
Hi,
While posting a sample PDF (one that ships with the Solr distribution) to
Solr, I'm getting a TikaException.
I'm using Solr 1.4 and SolrJ (StreamingUpdateSolrServer) to post the PDF.
Other sample PDFs can be parsed and indexed successfully. I'm getting the
same error with some other PDFs as well (but adobe read
Lance,
I can submit PDFs and extract their contents using Solr and SolrJ, as I
indicated earlier.
I've made 'id' a mandatory field, so I have to supply its value when
submitting (request.addParams("literal.id",url)).
If I put multiple files/streams in the request, then I can't set 'id' this
way as the
Can somebody suggest something similar, or is it not possible to autofill
'id' using configuration only?
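
One possible workaround, sketched here rather than taken from the thread, is
to send one extract request per file so that each request carries its own
literal.id. The sketch below only builds the request URLs with the JDK; it
assumes the extracting handler is mapped at /update/extract and that remote
streaming is enabled in solrconfig.xml so that stream.url is honoured. The
class name, method names, Solr base URL, and document URLs are all
hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ExtractRequestUrls {

    // Percent-encode a query parameter value as UTF-8.
    public static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 should always be available", e);
        }
    }

    // Build one extract request URL for a single remote document.
    // The document's own URL is used both as the stream source and as
    // the value of literal.id, so every document gets a distinct id.
    // Assumption: the handler path is /update/extract.
    public static String buildExtractUrl(String solrBase, String docUrl) {
        return solrBase + "/update/extract"
                + "?stream.url=" + encode(docUrl)
                + "&literal.id=" + encode(docUrl);
    }

    // One request per document, instead of one request with many streams.
    public static List<String> buildExtractUrls(String solrBase, List<String> docUrls) {
        List<String> requests = new ArrayList<String>();
        for (String docUrl : docUrls) {
            requests.add(buildExtractUrl(solrBase, docUrl));
        }
        return requests;
    }

    public static void main(String[] args) {
        // Hypothetical Solr location and document URLs.
        List<String> docs = Arrays.asList(
                "http://example.com/docs/a.pdf",
                "http://example.com/docs/b.html");
        for (String request : buildExtractUrls("http://localhost:8983/solr", docs)) {
            System.out.println(request);
        }
    }
}
```

Each generated URL could then be requested in turn, followed by a single
commit at the end. This trades the convenience of a multi-stream request for
a unique 'id' per document; it does not answer whether 'id' can be filled in
through configuration alone.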
Mark,
Did you manage to get it to work?
I tried the latest Tika (0.7) command line and it successfully parsed the
earlier problematic PDF. Then I replaced the Tika-related jars in the
Solr 1.4 contrib/extraction/lib folder with the new ones. Now it doesn't
throw any exception, but there is no content extraction, only metadata!
Peter,
It seems that your solution (SOLR-1872) requires authentication too (with
users tracked via your UUID), but my users will be the general public using
browsers, and I can't impose any such auth restrictions. Also, you didn't
mention whether you are already persisting the audit data. Or I may need to extend