The DataImportHandler (DIH) has a tool for PDF extraction, the TikaEntityProcessor. It lets you create new fields, process multiple files, and supply a list of file locations from which to fetch those files.
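The following is a rough illustration of that approach. The sketch uses Python's `xml.etree.ElementTree` to emit a hypothetical DIH `data-config.xml`. An outer `LineEntityProcessor` entity reads one URL per line from a list file, and an inner `TikaEntityProcessor` entity fetches each URL and extracts its text. The URL itself is copied into the `id` field. The list-file path, the entity names (`urls`, `tika`), the data-source names, and the `text` field are all assumptions. Check the processor and data-source attributes against the TikaEntityProcessor wiki page for your Solr version before relying on them.

```python
import xml.etree.ElementTree as ET


def build_data_config(url_list_path: str) -> str:
    """Build a hypothetical DIH data-config.xml that indexes every URL
    listed (one per line) in url_list_path, using each URL as the id."""
    root = ET.Element("dataConfig")
    # Reader-based source for the URL list; binary source for the fetched files.
    ET.SubElement(root, "dataSource", name="lines", type="FileDataSource")
    ET.SubElement(root, "dataSource", name="bin", type="BinURLDataSource")
    document = ET.SubElement(root, "document")

    # Outer entity: one row per line of the list file, exposed as "rawLine".
    # rootEntity="false" so that Solr documents are created by the inner entity.
    urls = ET.SubElement(
        document, "entity",
        name="urls", processor="LineEntityProcessor",
        url=url_list_path, dataSource="lines", rootEntity="false",
    )
    # Copy the URL itself into the unique key field.
    ET.SubElement(urls, "field", column="rawLine", name="id")

    # Inner entity: fetch each URL and let Tika extract its body text.
    tika = ET.SubElement(
        urls, "entity",
        name="tika", processor="TikaEntityProcessor",
        url="${urls.rawLine}", dataSource="bin", format="text",
    )
    ET.SubElement(tika, "field", column="text", name="text")

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_data_config("/path/to/urls.txt"))
```

Because the id comes from the same line that drives the fetch, every document gets a unique key without any client-side code. This covers the "autofill id with the URL" case from the question below.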
http://wiki.apache.org/solr/TikaEntityProcessor

On Sun, Apr 18, 2010 at 9:52 AM, pk <pkal...@gmail.com> wrote:
>
> Hi,
> I need to submit thousands of online PDF/HTML files to Solr. I can submit
> one file using SolrJ (StreamingUpdateSolrServer and
> ..solr.common.util.ContentStreamBase.URLStream), setting the literal.id
> parameter to the URL. I can't do the same with a batch of multiple files, as
> their 'id' should be unique (set to their URLs).
>
> I couldn't get this to work. Is there a way to somehow get the 'id' field
> set automatically to the URL of the files posted to Solr (something like
> 'stream_name')? How do I set this in solrconfig.xml or schema.xml? Or is
> there any other way?
>
> If their URL can be put in some other field (like 'url' itself), that will
> also serve my purpose.
>
> Thanks for your help.
> --
> View this message in context:
> http://n3.nabble.com/Autofill-id-field-with-the-URL-of-files-posted-to-Solr-tp727985p727985.html
> Sent from the Solr - User mailing list archive at Nabble.com.

--
Lance Norskog
goks...@gmail.com
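The question can also be handled without DIH by looping on the client. This is a swapped-in approach, not the one suggested in the reply. The sketch sends one ExtractingRequestHandler request per URL, so each document gets its own `literal.id`. It makes several assumptions:

* Solr is running at `http://localhost:8983/solr`.
* The handler is mapped to `/update/extract`, as in the example `solrconfig.xml`.
* Remote streaming is enabled (`enableRemoteStreaming="true"`), so that Solr fetches `stream.url` itself.

The function names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def extract_request_url(solr_base: str, doc_url: str, commit: bool = False) -> str:
    """Build one ExtractingRequestHandler request that makes Solr fetch
    doc_url itself (stream.url) and stores that URL as the document id."""
    params = {
        "stream.url": doc_url,  # Solr fetches the file; needs remote streaming enabled
        "literal.id": doc_url,  # unique key = the file's own URL
    }
    if commit:
        params["commit"] = "true"
    return f"{solr_base}/update/extract?{urlencode(params)}"


def index_all(solr_base: str, doc_urls: list[str]) -> None:
    """Send one request per URL so each document gets its own literal.id;
    commit only with the last request."""
    for i, doc_url in enumerate(doc_urls):
        last = i == len(doc_urls) - 1
        with urlopen(extract_request_url(solr_base, doc_url, commit=last)) as resp:
            resp.read()  # drain the response; raises HTTPError on failure
```

Committing only on the final request avoids one commit per file, which matters when indexing thousands of documents. The trade-off against the DIH approach is one HTTP round trip per file, initiated from the client.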