You could use nested entities in DIH.

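So if you store, for example, the path to each PDF in the database, you
could add a nested entity with TikaEntityProcessor to load its content.
The nested entity's fields go into the same Solr document as the outer
entity's, so the full text ends up under the same ID as the metadata.
Just make sure the field names do not conflict.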
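A minimal data-config.xml sketch of that idea follows. It assumes a hypothetical "papers" table with id, title, authors, abstract and pdf_path columns and a hypothetical "fulltext" field in schema.xml. The JDBC driver, URL and credentials are placeholders (PostgreSQL is used purely for illustration). The inner entity runs once per database row and reads the file named in ${paper.pdf_path}.

```xml
<dataConfig>
  <!-- Placeholder JDBC settings (hypothetical): replace driver, url, user, password -->
  <dataSource name="db" type="JdbcDataSource"
              driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost/biblio"
              user="solr" password="secret"/>
  <!-- Binary data source through which TikaEntityProcessor reads the PDF files -->
  <dataSource name="bin" type="BinFileDataSource"/>
  <document>
    <!-- Outer entity: one Solr document per row of the hypothetical papers table -->
    <entity name="paper" dataSource="db"
            query="SELECT id, title, authors, abstract, pdf_path FROM papers">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
      <field column="authors" name="authors"/>
      <field column="abstract" name="abstract"/>
      <!-- Inner entity: runs once per paper row; its fields are added to the same document -->
      <entity name="pdf" processor="TikaEntityProcessor" dataSource="bin"
              url="${paper.pdf_path}" format="text">
        <!-- Tika returns the extracted body in the "text" column; map it to a
             field name that does not clash with the metadata fields above -->
        <field column="text" name="fulltext"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

No Tika metadata columns are declared (no meta="true" fields), so Tika's own title and author values are not mapped onto the database ones. On the solrconfig.xml side you would still need a /dataimport request handler (class org.apache.solr.handler.dataimport.DataImportHandler) pointing at this file. You would also need <lib> entries for the DIH jars, the dataimporthandler-extras jar (which contains TikaEntityProcessor) and the Tika jars from contrib/extraction. The exact paths depend on your installation. In schema.xml, the only addition is the fulltext field.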

Regards,
   Alex.

----
Sign up for my Solr resources newsletter at http://www.solr-start.com/


On 24 January 2015 at 18:11, Yusniel Hidalgo <yhdelg...@uci.cu> wrote:
> Dear Solr community,
>
> I have recently started diving into Solr and I need help with the following
> usage scenario. I am working on a project to extract and search
> bibliographic metadata from PDF files. First, my PDF files are processed to
> extract bibliographic metadata such as title, authors, affiliations,
> keywords and abstract. This metadata is stored in a relational database and
> then indexed in Solr via DIH. However, I also need to index the full text of
> each PDF and keep the same ID for the metadata indexed via DIH and the PDF
> full text in the Solr index. How do I do that? How should I configure
> solrconfig.xml and schema.xml to do it?
>
> Thanks in advance.
>
> Best regards.
>
> Yusniel Hidalgo
>
>
> ---------------------------------------------------
> 12th anniversary of the founding of the Universidad de las Ciencias
> Informáticas. 12 years of history alongside Fidel. December 12, 2014.
>
