Thanks Alex. Indeed, the relative path to each PDF document is stored in the 
database. I will try your approach.

Regards,

Yusniel Hidalgo

----- Original message -----
From: "Alexandre Rafalovitch" <arafa...@gmail.com>
To: "solr-user" <solr-user@lucene.apache.org>
Sent: Saturday, 24 January 2015 18:19:48
Subject: Re: How to index data from multiple data source

You could use nested entities in DIH.

So if, for example, you store the path to the PDF in the database, you
could add a nested entity that uses TikaEntityProcessor to load the content.
Just make sure the field names of the two entities do not conflict.
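
A minimal sketch of what such a DIH configuration might look like. The
table and column names (papers, pdf_path), the JDBC driver and URL, the
credentials, the base directory /data/pdfs/ and the Solr field name
fulltext are all assumptions for illustration, not details from this
thread. The outer entity reads one metadata row per paper; the inner
entity runs once per row and uses the stored path to extract the PDF
text, so both end up in the same Solr document under the same id.

```xml
<dataConfig>
  <!-- JDBC source for the metadata; driver, url, user and password are placeholders -->
  <dataSource name="db" type="JdbcDataSource"
              driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost:5432/papers"
              user="solr" password="secret"/>
  <!-- Binary file source that TikaEntityProcessor reads the PDFs from -->
  <dataSource name="files" type="BinFileDataSource"/>

  <document>
    <!-- Outer entity: one row (and one Solr document) per paper.
         Table and column names are hypothetical. -->
    <entity name="paper" dataSource="db"
            query="SELECT id, title, authors, abstract, pdf_path FROM papers">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
      <field column="authors" name="authors"/>
      <field column="abstract" name="abstract"/>

      <!-- Inner entity: resolves the relative path from the current row.
           /data/pdfs/ is an assumed base directory. -->
      <entity name="pdf" processor="TikaEntityProcessor" dataSource="files"
              url="/data/pdfs/${paper.pdf_path}" format="text" onError="skip">
        <!-- Tika exposes the extracted body as column "text"; mapping it to a
             separate field (here "fulltext") avoids clashing with the database fields -->
        <field column="text" name="fulltext"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

With a setup along these lines, schema.xml would need a matching text
field for fulltext, and the ${paper.pdf_path} reference may need to match
the column name case that the JDBC driver returns.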

Regards,
   Alex.

----
Sign up for my Solr resources newsletter at http://www.solr-start.com/


On 24 January 2015 at 18:11, Yusniel Hidalgo <yhdelg...@uci.cu> wrote:
> Dear Solr community,
>
> I have recently started working with Solr and need help with the following 
> scenario. I am working on a project to extract and search bibliographic 
> metadata from PDF files. First, the PDF files are processed to extract 
> bibliographic metadata such as title, authors, affiliations, keywords and 
> abstract. This metadata is stored in a relational database and then 
> indexed in Solr via DIH. However, I also need to index the full text of each 
> PDF and keep the same ID for the metadata indexed via DIH and the full text 
> of the PDF in the Solr index. How can I do that? How should I configure 
> solrconfig.xml and schema.xml for this?
>
> Thanks in advance.
>
> Best regards.
>
> Yusniel Hidalgo
>
>
> ---------------------------------------------------
> 12th anniversary of the founding of the Universidad de las Ciencias 
> Informáticas. 12 years of history alongside Fidel. 12 December 2014.
>


