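You could use nested entities in DIH. If, for example, you store the path to each PDF in the database, you can add a nested entity that uses TikaEntityProcessor to load the file's content. The nested entity runs once per row of the parent entity, so the extracted text goes into the same Solr document, under the same ID, as the metadata. Just make sure the field names do not conflict.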
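As a rough sketch of what that could look like in the DIH data-config.xml: the table name (papers), the column names (id, title, authors, abstract, pdf_path), the JDBC driver and URL, and the Solr field name fulltext are all hypothetical placeholders, so adjust them to your database and schema.xml. The outer entity reads the metadata over JDBC; the inner one hands the stored path to Tika and maps the extracted body, which TikaEntityProcessor exposes as the "text" column, to its own field.

```xml
<dataConfig>
  <!-- Hypothetical JDBC connection; replace driver, url, user and password -->
  <dataSource name="db" type="JdbcDataSource"
              driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost:5432/biblio"
              user="solr" password="secret"/>
  <!-- Binary file source that TikaEntityProcessor reads the PDFs through -->
  <dataSource name="bin" type="BinFileDataSource"/>

  <document>
    <!-- Outer entity: one Solr document per metadata row -->
    <entity name="paper" dataSource="db"
            query="SELECT id, title, authors, abstract, pdf_path FROM papers">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
      <field column="authors" name="authors"/>
      <field column="abstract" name="abstract"/>

      <!-- Nested entity: runs once per row and adds the PDF text
           to the same document as the metadata above -->
      <entity name="pdf" processor="TikaEntityProcessor" dataSource="bin"
              url="${paper.pdf_path}" format="text" onError="skip">
        <!-- "text" is the extracted body; map it to a field that does
             not collide with the metadata fields -->
        <field column="text" name="fulltext"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

The fulltext field would also have to be declared in schema.xml. Depending on the JDBC driver, the column name in ${paper.pdf_path} may need to match the case the driver returns. If you later map Tika's own metadata (for example its title) with meta="true", give those fields different names from the database ones, which is the conflict mentioned above.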
Regards,
   Alex.
----
Sign up for my Solr resources newsletter at http://www.solr-start.com/

On 24 January 2015 at 18:11, Yusniel Hidalgo <yhdelg...@uci.cu> wrote:
> Dear Solr community,
>
> I have recently started working with Solr and I need help with the
> following usage scenario. I am working on a project to extract and search
> bibliographic metadata from PDF files. First, my PDF files are processed
> to extract bibliographic metadata such as title, authors, affiliations,
> keywords and abstract. This metadata is stored in a relational database
> and then indexed in Solr via DIH. However, I also need to index the full
> text of each PDF and keep the same ID between the metadata indexed through
> DIH and the full text of the PDF in the Solr index. How can I do that? How
> should I configure solrconfig.xml and schema.xml for it?
>
> Thanks in advance.
>
> Best regards.
>
> Yusniel Hidalgo