The DataImportHandler has a tool for PDF extraction, the
TikaEntityProcessor. It lets you create new fields, process multiple
files, and supply lists of locations from which to fetch those files.

http://wiki.apache.org/solr/TikaEntityProcessor
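
If you would rather stay with the extract handler from the quoted message,
one possible approach is to send one request per file, so that each request
carries its own literal.id. The sketch below only builds the request URLs; it
does not talk to Solr. It assumes the extract handler is mapped at
/update/extract on http://localhost:8983/solr and that remote streaming
(the stream.url parameter) is enabled in solrconfig.xml. The class and
method names are made up for illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ExtractRequests {
    // Assumed handler location; adjust host, port and path to your install.
    static final String EXTRACT_HANDLER =
            "http://localhost:8983/solr/update/extract";

    // Percent-encode a value so it can be used as a query parameter.
    static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // One request per file: the file's URL is both the document id
    // (literal.id) and the content Solr should fetch (stream.url).
    static String requestFor(String fileUrl) {
        String e = enc(fileUrl);
        return EXTRACT_HANDLER + "?literal.id=" + e + "&stream.url=" + e;
    }

    // Build the request URLs for a whole batch of files.
    static List<String> build(List<String> fileUrls) {
        List<String> requests = new ArrayList<>();
        for (String u : fileUrls) {
            requests.add(requestFor(u));
        }
        return requests;
    }

    public static void main(String[] args) {
        List<String> files = List.of(
                "http://example.com/a.pdf",
                "http://example.com/b.html");
        for (String r : build(files)) {
            System.out.println(r);
        }
    }
}
```

Because every file gets its own request, the ids never collide. Adding a
second parameter such as literal.url with the same value would also store the
URL in a separate field, provided the schema defines one.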

On Sun, Apr 18, 2010 at 9:52 AM, pk <pkal...@gmail.com> wrote:
>
> Hi,
> I need to submit thousands of online PDF/html files to Solr. I can submit
> one file using SolrJ (StreamingUpdateSolrServer and
> ..solr.common.util.ContentStreamBase.URLStream), setting literal.id
> parameter to the url. I can't do the same with a batch of multiple files, as
> their 'id' should be unique (set to their urls).
>
> I couldn't get this to work. Is there a way to have the 'id' field
> set automatically to the URL of each file posted to Solr (something like
> 'stream_name')? Can this be set in solrconfig.xml or schema.xml, or in some
> other way?
>
> If their URL can be put in some other field (like 'url' itself), that will
> also serve my purpose.
>
> Thanks for your help.
> --
> View this message in context: 
> http://n3.nabble.com/Autofill-id-field-with-the-URL-of-files-posted-to-Solr-tp727985p727985.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Lance Norskog
goks...@gmail.com
