Re: Adding pdf/word file using JSON/XML

Gora Mohanty Mon, 10 Jun 2013 06:57:47 -0700

On 10 June 2013 18:53, Roland Everaert <reveatw...@gmail.com> wrote:
> Sorry if it was not clear.
>
> What I would like is to know how to construct an XML/JSON request that
> provides any necessary information (supposedly the full path on disk) to
> Solr to retrieve and index a PDF/MS Word document.
>
> So, an XML request could look like this:
>
> <add>
> <doc>
> <field name="id">doc10</field>
> <field name="name">BLAH</field>
> <field name="path">/path/to/file.pdf</field>
> </doc>
> </add>
[...]


You cannot directly do this with the ExtractingRequestHandler.
One possibility is to use the DataImportHandler, with
XPathEntityProcessor or FileListEntityProcessor to get the filename,
and then use TikaEntityProcessor to actually process the file.
Please see http://wiki.apache.org/solr/DataImportHandler and
the various sections within it.
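
For illustration, the FileListEntityProcessor + TikaEntityProcessor route
might look roughly like the data-config.xml sketch below. It is untested and
assumes a schema with id, name and text fields, documents under
/path/to/docs (a placeholder), and that the DataImportHandler and
dataimporthandler-extras jars plus the Tika libraries from contrib/extraction
are on Solr's classpath.

```xml
<!-- Hypothetical data-config.xml: baseDir and Solr field names are placeholders -->
<dataConfig>
  <!-- Tika reads raw bytes, so it needs a binary data source -->
  <dataSource type="BinFileDataSource" name="bin"/>
  <document>
    <!-- Outer entity: lists matching files and only supplies their paths;
         rootEntity="false" makes each inner (Tika) row a Solr document,
         with the outer entity's fields merged into it -->
    <entity name="files" processor="FileListEntityProcessor"
            dataSource="null" rootEntity="false"
            baseDir="/path/to/docs" fileName=".*\.(pdf|doc|docx)$"
            recursive="true">
      <field column="fileAbsolutePath" name="id"/>
      <field column="file" name="name"/>
      <!-- Inner entity: extract the body text of each file with Tika -->
      <entity name="tika" processor="TikaEntityProcessor"
              dataSource="bin" url="${files.fileAbsolutePath}"
              format="text">
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

With the DataImportHandler registered in solrconfig.xml (e.g. at /dataimport,
pointing at this file via its "config" default), a request such as
/solr/<core>/dataimport?command=full-import should then walk the directory
and index each file. If the paths instead come from an XML file like the one
quoted above, XPathEntityProcessor would take the place of the outer entity.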

Regards,
Gora