Re: Some sort of join in SOLR?

Michael Lackhoff Thu, 17 Jan 2008 08:42:52 -0800

On 17.01.2008 16:53 Erick Erickson wrote:

I would *strongly* encourage you to store them together
as one document. There's no real method of doing
DB like joins in the underlying Lucene search engine.


Thanks, that was also my preference.

But that's generic advice. The question I have for you is
"What's the big deal about coordinating the sources?"
That is, you have to have something that allows you to
make a 1:1 correspondence between your data sources
or you couldn't relate them in the first place. Is it really
that onerous to check?

I don't have an index to check. Both sources come in huge text files, one of them daily, the other irregularly. One has the ID, the other has a different ID that must first be mapped to the ID of the first source. So there is no easy way of saying: "Give me the record for this ID from the other set of records". It is all buried in plain text files.

If it is, why not build an index and search it when you
want to know?

That is what I will do now: build a SQLite database with just two columns, ID and contents, with an index on the ID. Then, when I rebuild the SOLR index by processing the other data, I will look up in the SQLite DB whether there is a corresponding record from the other source.
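
A minimal sketch of that lookup step, in Python with the standard sqlite3 module. The table name `lookup`, the function names, and the sample records are all made up for illustration, and it assumes the IDs of the second source have already been mapped to the IDs of the first. Parsing the text files and posting the merged documents to SOLR are left out.

```python
import sqlite3


def build_lookup(conn, records):
    """Load (id, contents) pairs from the second source into SQLite."""
    # PRIMARY KEY gives the index on the ID column.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS lookup (id TEXT PRIMARY KEY, contents TEXT)"
    )
    # INSERT OR REPLACE: a later record with the same ID overwrites the earlier one.
    conn.executemany(
        "INSERT OR REPLACE INTO lookup (id, contents) VALUES (?, ?)", records
    )
    conn.commit()


def find_contents(conn, record_id):
    """Return the contents stored for record_id, or None if there is no match."""
    row = conn.execute(
        "SELECT contents FROM lookup WHERE id = ?", (record_id,)
    ).fetchone()
    return row[0] if row else None


def merge_records(conn, primary_records):
    """Yield one combined document per record of the first source."""
    for record_id, fields in primary_records:
        doc = dict(fields, id=record_id)
        extra = find_contents(conn, record_id)
        if extra is not None:
            doc["contents"] = extra
        yield doc


# Hypothetical sample data standing in for the two parsed text files.
conn = sqlite3.connect(":memory:")
build_lookup(conn, [("a1", "extra text for a1"), ("b2", "extra text for b2")])
docs = list(
    merge_records(conn, [("a1", {"title": "First"}), ("c3", {"title": "Third"})])
)
```

Records of the first source without a counterpart simply come out without a `contents` field, so the nightly rebuild does not fail on them.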

My hope was that I could avoid this intermediate database.

You haven't described enough of your problem
space for me to render any opinion of whether
this is premature optimization or not, but it
sure smells like it from a distance <G>...

I don't think it was premature optimization. It was just the attempt to keep the nightly rebuild of the index as easy as possible and to avoid unnecessary complexity. But if it is necessary I will go this way.


-Michael
