Re: Merging Sets of Data from Two Different Sources

2015-06-12 Thread Alessandro Benedetti
is achievable using > DataImportHandler. Probably worth a try before writing code... > > -Original Message- > From: Paden [mailto:rumsey...@gmail.com] > Sent: Thursday, June 11, 2015 4:14 PM > To: solr-user@lucene.apache.org > Subject: Re: Merging Sets of Data from Two D

RE: Merging Sets of Data from Two Different Sources

2015-06-11 Thread Reitzel, Charles
result is achievable using DataImportHandler. Probably worth a try before writing code... -Original Message- From: Paden [mailto:rumsey...@gmail.com] Sent: Thursday, June 11, 2015 4:14 PM To: solr-user@lucene.apache.org Subject: Re: Merging Sets of Data from Two Different Sources So you

Re: Merging Sets of Data from Two Different Sources

2015-06-11 Thread Paden
The filepath is the key in both the filesystem and the database -- View this message in context: http://lucene.472066.n3.nabble.com/Merging-Sets-of-Data-from-Two-Different-Sources-tp4211166p4211253.html Sent from the Solr - User mailing list archive at Nabble.com.
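Since the file path is the shared key, the merge itself can be small. Below is a minimal Python sketch of the "crawl the files, then look up the database" direction. It assumes a hypothetical `file_metadata(filepath, title, author)` table whose `filepath` values match the paths `os.walk` produces. `sqlite3` stands in for MySQL so the example is self-contained. The result is one combined record per file, ready to send to Solr.

```python
import os
import sqlite3


def load_metadata(conn):
    """Return a dict mapping file path -> metadata fields.

    The table and column names are hypothetical; sqlite3 stands in
    for the MySQL database described in the thread.
    """
    rows = conn.execute("SELECT filepath, title, author FROM file_metadata")
    return {path: {"title": title, "author": author} for path, title, author in rows}


def merge_by_path(root, metadata):
    """Walk `root` and yield one combined record per file, keyed by its path."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            doc = {"id": path, "filepath": path}
            # Files without a matching database row are still emitted,
            # just without the metadata fields.
            doc.update(metadata.get(path, {}))
            yield doc
```

If every indexed document must carry metadata, skip files whose path is missing from the dict instead of emitting them bare.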

Re: Merging Sets of Data from Two Different Sources

2015-06-11 Thread Paden
Both sources, the filesystem and the database, contain the file path for each individual file

Re: Merging Sets of Data from Two Different Sources

2015-06-11 Thread Paden
So you're saying I could merge the metadata in the database and the files in the file system into one queryable item in Solr just by customizing the DIH correctly and getting the right schema? (I'm sorry if this sounds like a redundant question, but I've been trying to find an answer for the

Re: Merging Sets of Data from Two Different Sources

2015-06-11 Thread Jack Krupansky
One question is which source defines the key: do you crawl the files and then look up each file name in the database, or do you scan the database, where a field specifies the file name? In other words, given a database key, is there a fixed method to determine the file path? And vice versa. -- Jack Kru
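For the opposite direction, the database drives the merge and a column names the file. The main pitfall there is that both sources must agree on the exact key string. Below is a minimal Python sketch. It assumes a hypothetical `file_metadata(id, filepath, title)` table that stores paths relative to a base directory, again with `sqlite3` standing in for MySQL.

```python
import os
import sqlite3


def normalize(path, base_dir):
    """Map a stored path to one canonical absolute form so both sources agree on the key."""
    # os.path.join keeps `path` unchanged if it is already absolute.
    return os.path.normpath(os.path.join(base_dir, path))


def docs_from_database(conn, base_dir):
    """Scan the database and attach the file each row points to.

    Returns (docs, missing): records whose file exists on disk, and
    normalized paths that the database names but the disk lacks.
    Table and column names are hypothetical.
    """
    docs, missing = [], []
    for row_id, path, title in conn.execute("SELECT id, filepath, title FROM file_metadata"):
        full = normalize(path, base_dir)
        if os.path.isfile(full):
            docs.append({"id": full, "db_id": row_id, "title": title})
        else:
            missing.append(full)
    return docs, missing
```

`os.path.normpath` collapses `./` segments and duplicate separators but does not resolve symlinks. If symlinked directories are involved, use `os.path.realpath` on both sides instead.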

Re: Merging Sets of Data from Two Different Sources

2015-06-11 Thread Alessandro Benedetti
I agree with all the ideas explained so far, but I would actually have suggested the DIH (Data Import Handler) as the first plan. It already allows out-of-the-box indexing from different data sources. It supports JDBC data sources with extensive processors, and it also supports a file system d

Re: Merging Sets of Data from Two Different Sources

2015-06-11 Thread Erick Erickson
Here's a skeleton that uses Tika from a SolrJ client. It mixes in a database too, but the parts are pretty separate. https://lucidworks.com/blog/indexing-with-solrj/ Best, Erick On Thu, Jun 11, 2015 at 7:14 AM, Paden wrote: > You were very VERY helpful. Thank you very much. If I could bug you f
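Erick's skeleton uses Java and SolrJ. For comparison, the final step of sending merged records to Solr can also be sketched in Python with only the standard library, by posting a JSON array of documents to the core's `/update` handler. The core URL `http://localhost:8983/solr/files` is an assumption, and the field names must exist in your schema.

```python
import json
import urllib.parse
import urllib.request


def build_update_request(solr_core_url, docs, commit=True):
    """Build (but do not send) a POST of `docs` to Solr's JSON update handler."""
    params = {"wt": "json"}  # ask Solr for a JSON response
    if commit:
        params["commit"] = "true"  # make the documents searchable right away
    url = solr_core_url.rstrip("/") + "/update?" + urllib.parse.urlencode(params)
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def send(request):
    """Send the request; urlopen raises urllib.error.HTTPError if Solr rejects it."""
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

`wt=json` is set explicitly because older Solr releases answered in XML by default. Committing on every batch is simple but slow for a large crawl. For bulk loads, it is common to commit once at the end.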

Re: Merging Sets of Data from Two Different Sources

2015-06-11 Thread Paden
You were very VERY helpful. Thank you very much. If I could bug you with one last question: do you know where the documentation is that would help me write my own indexer?

Re: Merging Sets of Data from Two Different Sources

2015-06-11 Thread Charlie Hull
On 11/06/2015 14:57, Paden wrote: > So you're saying that Tika can parse the text OUTSIDE of Solr. So I would still be able to process my PDFs with Tika, just outside of Solr specifically, correct? Yes. Charlie
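In practice, Charlie's "Yes" can mean running the Tika application jar as a separate process to print a PDF's plain text. The indexer then stores that text in a field of the combined record. Below is a minimal Python sketch. It assumes Java is installed and that `tika-app.jar` (a placeholder path) has been downloaded separately. `tika_command` and `extract_text` are hypothetical helper names.

```python
import subprocess


def tika_command(pdf_path, tika_jar="tika-app.jar"):
    """Command line asking the Tika app to print a file's plain text.

    The jar location is a placeholder; point it at your downloaded tika-app jar.
    """
    return ["java", "-jar", tika_jar, "--text", pdf_path]


def extract_text(pdf_path, tika_jar="tika-app.jar"):
    """Run Tika in its own JVM, outside Solr, and return the extracted text.

    check=True raises subprocess.CalledProcessError on a non-zero exit.
    """
    result = subprocess.run(
        tika_command(pdf_path, tika_jar),
        capture_output=True,
        check=True,
        text=True,
    )
    return result.stdout
```

Parsing outside Solr keeps slow or failing parses of very large PDFs out of the Solr JVM, at the cost of starting a JVM per file. For many files, the Tika server or calling Tika directly from Java, as in Erick's SolrJ example, avoids that overhead.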

Re: Merging Sets of Data from Two Different Sources

2015-06-11 Thread Paden
So you're saying that Tika can parse the text OUTSIDE of Solr. So I would still be able to process my PDFs with Tika, just outside of Solr specifically, correct?

Re: Merging Sets of Data from Two Different Sources

2015-06-11 Thread Charlie Hull
On 11/06/2015 14:38, Paden wrote: > I do have a link between both sets of data and that would be the filepath that could be indexed by both. Great. > I do, however, have large PDFs that do need to be indexed. So just for clarification, I could write an indexer that used both the DIH and SolrCell

Re: Merging Sets of Data from Two Different Sources

2015-06-11 Thread Paden
I do have a link between both sets of data, and that would be the file path, which could be indexed by both. I do, however, have large PDFs that need to be indexed. So just for clarification, I could write an indexer that used both the DIH and SolrCell to submit a combined record to Solr or would
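Solr Cell itself offers one way to get the combined record without a separate extraction step. Post the file to the `/update/extract` handler and pass the database columns as `literal.<field>` parameters. Tika's extracted text and the metadata then land in the same document. Below is a minimal Python sketch of building that request. It assumes the schema's unique key is `id`, the metadata field names exist in the schema, and the core URL is hypothetical. The helper names are illustrative.

```python
import urllib.parse
import urllib.request


def extract_url(solr_core_url, filepath, metadata, commit=True):
    """Build a Solr Cell URL that attaches database metadata as literal fields.

    Each literal.<field> parameter becomes a field value on the indexed
    document; using the file path as literal.id is an assumption.
    """
    params = {"literal.id": filepath}
    for field, value in metadata.items():
        params["literal." + field] = value
    if commit:
        params["commit"] = "true"
    query = urllib.parse.urlencode(params)
    return solr_core_url.rstrip("/") + "/update/extract?" + query


def extract_request(url, pdf_bytes):
    """Build (but do not send) a POST of the raw PDF bytes to Solr Cell."""
    return urllib.request.Request(
        url,
        data=pdf_bytes,
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )
```

The metadata dict would come from the database row matched on the file path. The request is sent with `urllib.request.urlopen`. Note that extraction of large PDFs then runs inside Solr. That is the trade-off against parsing with Tika outside Solr.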

Re: Merging Sets of Data from Two Different Sources

2015-06-11 Thread Charlie Hull
On 11/06/2015 14:19, Paden wrote: > I'm trying to figure out if Solr is a good fit for my project. I have two sets of data. On the one hand, there is a bunch of files sitting in a local Linux file system. On the other is a set of metadata FOR the files that is located in a MySQL da