On 10 June 2013 18:53, Roland Everaert <reveatw...@gmail.com> wrote:
> Sorry if it was not clear.
>
> What I would like is to know how to construct an XML/JSON request that
> provide any necessary information (supposedly the full path on disk) to
> solr to retrieve and index a pdf/ms word document.
>
> So, an XML request could look like this:
>
> <add>
> <doc>
> <field name="id">doc10</field>
> <field name="name">BLAH</field>
> <field name="path">/path/to/file.pdf</field>
> </doc>
> </add>
[...]

You cannot directly do this with the ExtractingRequestHandler. One
possibility is to use the DataImportHandler, with XPathEntityProcessor
or FileListEntityProcessor to get the filename, and then use
TikaEntityProcessor to actually process the file.

Please see http://wiki.apache.org/solr/DataImportHandler and the
various sections within it.

Regards,
Gora
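
For illustration, a minimal sketch of a DataImportHandler configuration along the lines described above, using FileListEntityProcessor to enumerate files and a nested TikaEntityProcessor to extract their text. The base directory, the file-name pattern, and the schema field names ("id", "text") are placeholders assumed for the example, not taken from the original poster's setup. It also assumes that the DataImportHandler extras and Tika jars are on Solr's classpath.

```xml
<dataConfig>
  <!-- Binary data source, used by Tika to read the raw file contents -->
  <dataSource type="BinFileDataSource" name="bin"/>
  <document>
    <!-- Outer entity: lists files under baseDir (placeholder path).
         rootEntity="false" so that each file, not the listing,
         becomes a Solr document. -->
    <entity name="files" processor="FileListEntityProcessor"
            baseDir="/path/to" fileName=".*\.(pdf|doc|docx)"
            recursive="true" rootEntity="false" dataSource="null">
      <!-- Assumed schema field: use the absolute path as the unique key -->
      <field column="fileAbsolutePath" name="id"/>
      <!-- Inner entity: runs Tika on each file found by the outer entity -->
      <entity name="tika" processor="TikaEntityProcessor"
              url="${files.fileAbsolutePath}" format="text"
              dataSource="bin">
        <!-- Assumed schema field: extracted body text -->
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

If the file paths come from an XML file rather than a directory listing, the outer entity could use XPathEntityProcessor instead, with the inner TikaEntityProcessor entity left unchanged apart from the variable it reads the path from.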