Hi,

You can find several examples online of configuring Tika with DIH to index
PDF files, e.g.
https://tuxdna.wordpress.com/2013/02/04/indexing-the-documents-stored-in-a-database-using-apache-solr-and-apache-tika/
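
As a rough sketch of the idea (not a tested configuration): a DIH
data-config.xml can nest a TikaEntityProcessor entity inside the SQL entity,
so the metadata row and the extracted full text end up in the same Solr
document under the same ID. The JDBC driver and URL, the table name "papers",
the column "file_path" and the Solr field names below are all assumptions for
illustration; adapt them to your database and schema.xml.

```xml
<dataConfig>
  <!-- Assumed JDBC settings: replace driver, url, user and password with your own -->
  <dataSource name="db" type="JdbcDataSource"
              driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost/biblio"
              user="solr" password="secret"/>
  <!-- Binary data source used by Tika to read the PDF files from disk -->
  <dataSource name="bin" type="BinFileDataSource"/>

  <document>
    <!-- Outer entity: one row per PDF, holding the metadata already extracted.
         "papers" and its columns are hypothetical names. -->
    <entity name="paper" dataSource="db"
            query="SELECT id, title, authors, abstract, file_path FROM papers">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
      <field column="authors" name="authors"/>
      <field column="abstract" name="abstract"/>

      <!-- Inner entity: Tika parses the file referenced by the current row,
           so its text is added to the same document (same id). -->
      <entity name="fulltext" processor="TikaEntityProcessor"
              dataSource="bin" url="${paper.file_path}"
              format="text" onError="skip">
        <field column="text" name="fulltext"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

With this layout schema.xml would need a matching field for each name used
above (here a hypothetical "fulltext" text field next to the metadata fields),
and solrconfig.xml would need the DIH request handler pointing at this file,
plus the DIH and Tika extraction jars on the classpath.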

Regards.
On Jan 21, 2015 6:54 AM, "Yusniel Hidalgo Delgado" <yhdelg...@uci.cu> wrote:

>
> Dear Solr community,
>
> I have been diving into Solr recently and I need help with the following
> usage scenario. I am working on a project to extract and search bibliographic
> metadata from PDF files. First, my PDF files are processed to extract
> bibliographic metadata such as title, authors, affiliations, keywords and
> abstract. This metadata is stored in a relational database and then
> indexed in Solr via DIH. However, I also need to index the full text of each
> PDF and keep the same ID for the indexed metadata and the indexed full text
> in the Solr index. How can I do that? How should I configure solrconfig.xml
> and schema.xml for this?
>
> Thanks in advance.
>
> Best regards
>
> Yusniel Hidalgo Delgado
> Semantic Web Research Group
> University of Informatics Sciences
> http://gws-uci.blogspot.com/
> Havana, Cuba
