Consider writing a SolrJ program that extracts the text from the
PDF file and combines it with the XML data. Here's an example
to get you started; it at least shows how to do the PDF extraction.
The other part of the code deals with a database connection, which
you can ignore.

You'll have to read in the XML, parse it, extract the relevant bits,
and add them to the SolrInputDocument (see the example):

http://www.lucidimagination.com/blog/2012/02/14/indexing-with-solrj/
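
For the XML side, here is a rough sketch of the "parse it and pull out
the relevant bits" step, using only the JDK's DOM parser. I'm guessing
at your XML layout: the element names (id, title, author, pdf) and the
class name BiblioXmlFields are made up for illustration, so substitute
whatever your bibliographic records actually contain. It only builds a
map of field names to values; in the real SolrJ program you would copy
each entry into the SolrInputDocument with addField(name, value),
alongside the text extracted from the PDF.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BiblioXmlFields {
    // Hypothetical element names; replace with the ones in your XML.
    private static final String[] FIELDS = {"id", "title", "author", "pdf"};

    // Parses one bibliographic record and returns the first value found
    // for each element listed in FIELDS, in that order.
    public static Map<String, String> extract(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document dom = builder.parse(new InputSource(new StringReader(xml)));

        Map<String, String> fields = new LinkedHashMap<>();
        for (String name : FIELDS) {
            NodeList nodes = dom.getElementsByTagName(name);
            if (nodes.getLength() > 0) {
                fields.put(name, nodes.item(0).getTextContent().trim());
            }
        }
        return fields;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<record><id>42</id><title>Solr Basics</title>"
                   + "<author>A. Writer</author><pdf>docs/42.pdf</pdf></record>";
        Map<String, String> fields = extract(xml);
        // With SolrJ on the classpath, each entry would instead go into
        // the document: doc.addField(entry.getKey(), entry.getValue()).
        fields.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

The "pdf" value is the reference you would use to locate the PDF file,
so the extracted PDF text and the XML fields end up in the same Solr
document.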

Best
Erick

On Mon, Mar 26, 2012 at 9:25 AM, Anupam Bhattacharya
<anupam...@gmail.com> wrote:
> I have a set/group of documents of XML and PDF type.
>
> Each XML document contains the bibliographic information and has a
> reference to the supporting PDF document.
> How can I index these parent-child document types in the Solr schema as
> one document? The PDF should be full-text indexed for searching, and only
> the corresponding parent XML details should be shown if the PDF contains
> the searched keyword.
>
> How should I design this kind of functionality in Solr?
>
> Appreciate any help on this.
>
> Regards
> Anupam
