Hi,

You can find several examples online of configuring Tika together with DIH to index PDF files (e.g. https://tuxdna.wordpress.com/2013/02/04/indexing-the-documents-stored-in-a-database-using-apache-solr-and-apache-tika/ ).
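As a rough sketch of that approach (not a tested configuration), a DIH data-config.xml can nest a TikaEntityProcessor entity inside the database entity, so that the metadata row and the extracted full text end up in the same Solr document under the same ID. Everything specific here is an assumption for illustration: the table name papers, the pdf_path column holding the file location, the JDBC driver and URL, and a fulltext field that would have to be declared in schema.xml. TikaEntityProcessor ships in the dataimporthandler-extras jar, which needs to be on the classpath along with the Tika libraries.

```xml
<dataConfig>
  <!-- Hypothetical JDBC connection; replace driver, url, user and password -->
  <dataSource name="db" type="JdbcDataSource"
              driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost/biblio"
              user="solr" password="secret"/>
  <!-- Binary data source used by Tika to read the PDF files from disk -->
  <dataSource name="bin" type="BinFileDataSource"/>

  <document>
    <!-- Outer entity: one row of bibliographic metadata per paper.
         Table and column names are assumed for this example. -->
    <entity name="paper" dataSource="db"
            query="SELECT id, title, authors, abstract, pdf_path FROM papers">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
      <field column="authors" name="authors"/>
      <field column="abstract" name="abstract"/>

      <!-- Inner entity: extract the text of the PDF referenced by the row.
           Its output is added to the same Solr document as the metadata. -->
      <entity name="pdf" processor="TikaEntityProcessor" dataSource="bin"
              url="${paper.pdf_path}" format="text">
        <field column="text" name="fulltext"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

Because the unique key comes from the outer entity, no separate ID mapping is needed for the full text. The file would be referenced from the DataImportHandler request handler defined in solrconfig.xml.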
Regards.

On Jan 21, 2015 6:54 AM, "Yusniel Hidalgo Delgado" <yhdelg...@uci.cu> wrote:
> Dear Solr community,
>
> I started working with Solr recently and I need help with the following
> usage scenario. I am working on a project to extract and search
> bibliographic metadata from PDF files. First, my PDF files are processed
> to extract bibliographic metadata such as title, authors, affiliations,
> keywords and abstract. This metadata is stored in a relational database
> and then indexed in Solr via DIH. However, I also need to index the full
> text of each PDF and keep the same ID for the indexed metadata and the
> indexed full text in the Solr index. How can I do that? How should I
> configure solrconfig.xml and schema.xml to do it?
>
> Thanks in advance.
>
> Best regards
>
> Yusniel Hidalgo Delgado
> Semantic Web Research Group
> University of Informatics Sciences
> http://gws-uci.blogspot.com/
> Havana, Cuba