Hi Yusniel,

Solr manages documents as a whole, so updating an existing document means replacing it. You should (or at least could) index the metadata and the full text in one step, as one Solr document under one unique ID. That would be the simplest case.

You could also use nested child documents with block joins (depending on which version of Solr you are using; more info here: http://blog.griddynamics.com/2013/09/solr-block-join-support.html), but in my opinion that would be overkill.

We also mimic a kind of "semantic / linked data" model using additional fields, named after real ontology predicate/property names, to join documents that are related (see https://wiki.apache.org/solr/Join). So you could add the full text as an additional document with its own ID and fill a field of that Solr document with the ID of the parent metadata document. Then at query time you can join them. Joins in Solr always return only the joined-to document (the "to" side), not both: it is not like a SQL join, more like an inner query. To work around that we experimented with self joins (the field holding the parent document's ID also holds the document's own ID), but as you can understand this is in no way optimal.
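To make the two options above a bit more concrete, here is a rough Python sketch. It only builds the documents and the query parameters, it does not talk to Solr. The field names `fulltext` and `parent_id` and the "-fulltext" ID suffix are my own assumptions for illustration, not something from your schema; the only Solr syntax used is the standard join query parser form `{!join from=... to=...}`.

```python
def merged_doc(doc_id, metadata, fulltext):
    """Simplest case: metadata and full text in ONE Solr document, one ID."""
    doc = dict(metadata)
    doc["fulltext"] = fulltext   # hypothetical field name
    doc["id"] = doc_id           # set last so metadata cannot override the ID
    return doc


def linked_docs(doc_id, metadata, fulltext):
    """Alternative: two documents, the full-text one pointing at its parent."""
    meta = dict(metadata)
    meta["id"] = doc_id
    text = {
        "id": doc_id + "-fulltext",  # hypothetical ID convention
        "parent_id": doc_id,         # hypothetical field holding the parent ID
        "fulltext": fulltext,
    }
    return [meta, text]


def join_params(term):
    """Query parameters for a join: match full-text documents on `term`,
    then return the metadata documents whose `id` equals their `parent_id`.
    Only the "to" side (the metadata documents) comes back."""
    return {
        "q": "{!join from=parent_id to=id}fulltext:" + term,
        "wt": "json",
    }


if __name__ == "__main__":
    print(merged_doc("pdf-1", {"title": "Some paper"}, "full text here"))
    print(linked_docs("pdf-1", {"title": "Some paper"}, "full text here"))
    print(join_params("ontology"))
```

With the first function you would need both fields (metadata and `fulltext`) declared in schema.xml; with the second you would additionally need `parent_id` to be an indexed field so the join can use it.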
Related: we are using a digital objects repository (Fedora Commons + Islandora) to achieve exactly what you want to do. Our PDF files, and also many other types of data and metadata, are ingested as objects inside the repository, including technical metadata, MODS, DC, the binary stream and the full text. Then the whole object (as FOXML) goes through an XSLT transformation and into Solr.

If you are interested you can browse Islandora's Google group (https://groups.google.com/forum/#!forum/islandora) and visit Islandora's wiki (https://wiki.duraspace.org/display/ISLANDORA714/Islandora). There is a lot of documentation under the fedoragsearch module, which does the real indexing. You can see our schemas and Solr config there.

Feel free to write me if you need or want more data.

Cheers

Diego Pino Navarro
Krayon Media
Pedro de Valdivia 575
Pucón - Chile
F:+56-45-2442469

On Jan 21, 2015, at 2:43 AM, Yusniel Hidalgo Delgado <yhdelg...@uci.cu> wrote:

> Dear Solr community,
>
> I have started diving into Solr recently and I need help with the following
> usage scenario. I am working on a project to extract and search bibliographic
> metadata from PDF files. First, my PDF files are processed to extract
> bibliographic metadata such as title, authors, affiliations, keywords and
> abstract. This metadata is stored in a relational database and then
> indexed in Solr via DIH. However, I also need to index the full text of each
> PDF and keep the same ID between the indexed metadata and the indexed full
> text in the Solr index. How do I do that? How should I configure
> solrconfig.xml and schema.xml to do it?
>
> Thanks in advance.
>
> Best regards
>
> Yusniel Hidalgo Delgado
> Semantic Web Research Group
> University of Informatics Sciences
> http://gws-uci.blogspot.com/
> Havana, Cuba