On 11/06/2015 14:19, Paden wrote:
I'm trying to figure out if Solr is a good fit for my project.
I have two sets of data. On the one hand, there is a bunch of files sitting
in a local Linux file system. On the other is a set of
metadata FOR the files that is stored in a MySQL database.
I need a program that can merge BOTH sets of data into one index, meaning
that the metadata in the database will attach/merge with the file data (the
text) from the file system to create one searchable indexed item for each
document in the file system. The metadata located in the database contains
information that is vital to a faceted search of the documents located in
the file system.
Would Solr accomplish my goals? And if so, what tools can it provide to do
so?
If you can link the files and the metadata easily (i.e. you have some
common identifier), then this shouldn't be hard. We would write an
indexer in Python that extracts the data from MySQL, crawls the filesystem,
uses Apache Tika to extract plain text from the files, and then
submits a combined record to Solr for indexing. You'll need to decide
on a schema for the combined record, of course.
There are alternatives (DataImportHandler for the database, SolrCell for
submitting the files directly), but we prefer to keep the file handling
in particular outside of Solr, as large PDFs, for example, can kill Tika
and thus Solr itself when Tika runs inside it.
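A minimal sketch of the external-indexer approach, for illustration only
and not a finished tool. It assumes a standalone Tika server listening on
localhost:9998, a Solr core named "docs" (hypothetical), and metadata rows
that carry a "filename" column as the common identifier (also
hypothetical). The field names "id" and "content" are placeholders for
whatever schema you settle on. Fetching the rows from MySQL is left to
whichever DB-API driver you use.

```python
import json
import os
import urllib.request

# Assumed addresses: adjust to your own Tika server and Solr core.
TIKA_URL = "http://localhost:9998/tika"
SOLR_UPDATE_URL = "http://localhost:8983/solr/docs/update?commit=true"


def extract_text(path, tika_url=TIKA_URL):
    """Send one file to a standalone Tika server and return its plain text."""
    with open(path, "rb") as handle:
        request = urllib.request.Request(
            tika_url,
            data=handle.read(),
            method="PUT",
            headers={"Accept": "text/plain"},
        )
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read().decode("utf-8", errors="replace")


def build_records(metadata_rows, root_dir, extractor=extract_text):
    """Yield one combined record per file that has matching metadata.

    metadata_rows is an iterable of dicts (e.g. rows fetched from MySQL),
    each with a 'filename' key plus whatever facet fields you need.
    """
    by_name = {row["filename"]: row for row in metadata_rows}
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            row = by_name.get(name)
            if row is None:
                continue  # no metadata for this file, so skip it
            path = os.path.join(dirpath, name)
            try:
                text = extractor(path)
            except Exception as exc:
                # A file that breaks extraction is skipped, not fatal.
                print(f"skipping {path}: {exc}")
                continue
            record = dict(row)        # metadata fields from the database
            record["id"] = name       # placeholder unique key
            record["content"] = text  # placeholder full-text field
            yield record


def post_to_solr(records, solr_url=SOLR_UPDATE_URL):
    """POST the combined records to Solr's update handler as a JSON array."""
    body = json.dumps(list(records)).encode("utf-8")
    request = urllib.request.Request(
        solr_url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.status
```

With rows in hand, the whole run would be something like
post_to_solr(build_records(rows, "/path/to/files")). Because extraction
happens in a separate process, a file that crashes or hangs Tika only
costs you that one document and Solr is unaffected.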
Cheers
Charlie
--
Charlie Hull
Flax - Open Source Enterprise Search
tel/fax: +44 (0)8700 118334
mobile: +44 (0)7767 825828
web: www.flax.co.uk