I am trying to integrate Solr search results with results from an RDBMS query. It works, but it is fairly complicated because the result sets from the database are large and there are many different sort requirements.

I know that Solr/Lucene was not designed to handle multiple document types in the same collection intelligently, i.e. to provide join features, but I'm wondering if anyone on this list has thoughts on how to do it in Lucene, and how it might be integrated into a custom Solr deployment. I can't see going back to vanilla Lucene after Solr!

My basic idea is to add an objType field that would be used to define a "table". There would be one main objType; any related objTypes would have a field pointing back to the main objects via id, like a foreign key.
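
To make the layout concrete, here is a rough sketch in plain Python (no Solr or Lucene involved). Only objType and id come from the idea above; the mainId field, the type names and the sample values are made up for illustration.

```python
# Hypothetical flat documents sharing one index.
# "mainId" plays the role of the foreign key back to a main object.
docs = [
    {"id": "m1", "objType": "main", "text": "flood report"},
    {"id": "t1", "objType": "time", "mainId": "m1", "year": 2007},
    {"id": "p1", "objType": "place", "mainId": "m1", "lat": 40.7, "lon": -74.0},
    {"id": "p2", "objType": "place", "mainId": "m1", "lat": 41.2, "lon": -73.5},
]

def by_type(all_docs, obj_type):
    """Select the documents of one "table"."""
    return [d for d in all_docs if d["objType"] == obj_type]

def children_of(all_docs, main_id):
    """Select every related document that points back to one main object."""
    return [d for d in all_docs if d.get("mainId") == main_id]
```

One main object can have any number of time or place documents attached, which is what avoids flattening every combination into a single record.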

I'd run multiple parallel searches and merge the results based on foreign keys, either using a Filter or just using custom code. I'm anticipating that iterating through the results to retrieve the foreign key values will be too slow.
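
The custom-code variant of that merge might look roughly like the sketch below. It is plain Python over in-memory dicts, not Solr or Lucene API; the search function is a stand-in for a real query, and mainId, the type names and the sample data are assumptions for illustration.

```python
# Hypothetical sample index: three main objects plus related time/place docs.
docs = [
    {"id": "m1", "objType": "main", "text": "flood in the valley"},
    {"id": "m2", "objType": "main", "text": "flood downtown"},
    {"id": "m3", "objType": "main", "text": "drought"},
    {"id": "t1", "objType": "time", "mainId": "m1", "year": 2006},
    {"id": "t2", "objType": "time", "mainId": "m2", "year": 2007},
    {"id": "t3", "objType": "time", "mainId": "m3", "year": 2007},
    {"id": "p1", "objType": "place", "mainId": "m1", "region": "north"},
    {"id": "p2", "objType": "place", "mainId": "m2", "region": "north"},
]

def search(all_docs, obj_type, predicate):
    """Stand-in for one search restricted to a single objType."""
    return [d for d in all_docs if d["objType"] == obj_type and predicate(d)]

def join_on_main(main_hits, *related_hit_lists, fk="mainId"):
    """Keep main hits referenced by at least one hit in every related list."""
    allowed = None
    for hits in related_hit_lists:
        keys = {h[fk] for h in hits}  # foreign keys seen in this result list
        allowed = keys if allowed is None else allowed & keys
    if allowed is None:  # no related searches: nothing to restrict
        return list(main_hits)
    # Preserve the order (and so the sort) of the main result list.
    return [m for m in main_hits if m["id"] in allowed]

main_hits = search(docs, "main", lambda d: "flood" in d["text"])
time_hits = search(docs, "time", lambda d: d["year"] == 2007)
place_hits = search(docs, "place", lambda d: d["region"] == "north")
joined = join_on_main(main_hits, time_hits, place_hits)
```

The set-building loop is exactly the step I expect to be expensive on a real index, since it touches the foreign key of every related hit.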

Our data is highly textual, temporal and spatial, and those three aspects pretty much correspond to the three tables I would have. I can denormalize a lot of the data, but the combination of times, locations and textual representations would be far too large to flatten fully.

I'm about to start experimenting with different strategies, and I would appreciate any insight anyone can provide. Would the faceting code help here somehow?

Thanks --Joachim




