Thanks, Alex. Indeed, the relative path to the PDF document is stored in the
database. I will try your approach.
Regards,
Yusniel Hidalgo
- Original message -
From: "Alexandre Rafalovitch"
To: "solr-user"
Sent: Saturday, January 24, 2015 18:19:48
Subject: R
You could use nested entities in DIH.
So, if you store, for example, the path to the PDF in the database, you
could use a nested entity with TikaEntityProcessor to load the content.
Just make sure the field names do not conflict.
Regards,
Alex.
Dear Solr community,
I have recently started working with Solr and I need help with the following
usage scenario. I am working on a project to extract and search bibliographic
metadata from PDF files. First, my PDF files are processed to extract
bibliographic metadata such as title, authors, affiliations
Hi Yusniel,
Solr manages documents as a whole, so updating an existing document means
replacing it. So you should (or at least could) index metadata and full text
in one step, as one Solr document under one unique ID. That would be the
simplest case. You could also use nested child documents to use bloc
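
The single-document approach described above could be sketched roughly as
follows. This is only an illustration: the field names (content, title,
authors), the ID "paper-001", and both helper functions are hypothetical
and would have to match your own schema. It builds one document per PDF and
serializes it as a JSON array, which is one of the forms Solr's JSON update
handler accepts.

```python
import json


def build_solr_doc(doc_id, metadata, full_text):
    """Combine metadata and extracted text into one Solr document.

    Field names here are assumptions, not part of any real schema.
    """
    doc = {"id": doc_id, "content": full_text}
    for name, value in metadata.items():
        # Refuse to silently overwrite "id" or "content" with a metadata field.
        if name in doc:
            raise ValueError(f"field name conflict: {name}")
        doc[name] = value
    return doc


def to_update_payload(docs):
    """Serialize a list of documents as a JSON array for the update handler."""
    return json.dumps(docs, ensure_ascii=False)


# Hypothetical example record; real values would come from the metadata
# extraction step and from the text extracted from the PDF.
doc = build_solr_doc(
    "paper-001",
    {"title": "An Example Title", "authors": ["A. Author", "B. Author"]},
    "Full text extracted from the PDF ...",
)
payload = to_update_payload([doc])
```

The payload would then be POSTed to the core's /update handler with the
Content-Type application/json. Sending a document with the same id again
replaces the earlier one, which is why metadata and full text go into the
same document here.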
On 1/20/2015 10:43 PM, Yusniel Hidalgo Delgado wrote:
> I have recently started working with Solr and I need help with the
> following usage scenario. I am working on a project to extract and search
> bibliographic metadata from PDF files. First, my PDF files are processed
> to extract bibliographic metada
Hi,
You can find several examples online of configuring Tika with DIH to index
PDF files, e.g.
https://tuxdna.wordpress.com/2013/02/04/indexing-the-documents-stored-in-a-database-using-apache-solr-and-apache-tika/
Regards.
On Jan 21, 2015 6:54 AM, "Yusniel Hidalgo Delgado" wrote:
>
>
> Dear Solr co