Chris,
I think what I am trying to do is actually much simpler than what you
are talking about here.
I do plan on returning document ids and retrieving full entity data from
the database; Solr would just be used for the search, not for results
display.
The problem is that some data cannot be "flattened", for example when a
document has repeating fields that are complex types, such as address.
The best example I can think of is a resume database. You could
certainly just put the whole resume document into the text index and do
full text searches. But to answer the question of which people received
a Harvard MBA in the last 10 years and have worked at Intel in the last
5 years, you have to correlate the years of attendance with the
schoolName entry. Otherwise you might be getting years from some other
education/work history entry.
By adding an objType field and combining search results, you can be sure
that the year/schoolName query matched a unique education record. The
tricky bit is in getting a list of field values (e.g. foreign keys, which
are essentially facets) for a result set very quickly.
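
To make the correlation problem concrete, here is a minimal sketch in plain Python. It uses an in-memory list as a stand-in for the index and makes no Solr or Lucene calls. Apart from objType and schoolName, every name in it (personId, degree, employer, year, NOW, fk_set) is a hypothetical choice for illustration.

```python
# Hypothetical stand-in for an index holding one document per record.
# personId plays the role of the foreign key back to the main object.
DOCS = [
    {"id": "e1", "objType": "education", "personId": "p1",
     "schoolName": "Harvard", "degree": "MBA", "year": 2001},
    {"id": "e2", "objType": "education", "personId": "p2",
     "schoolName": "Harvard", "degree": "MBA", "year": 1985},
    {"id": "e3", "objType": "education", "personId": "p2",
     "schoolName": "MIT", "degree": "BS", "year": 2001},
    {"id": "w1", "objType": "work", "personId": "p1",
     "employer": "Intel", "year": 2004},
    {"id": "w2", "objType": "work", "personId": "p2",
     "employer": "Intel", "year": 2005},
]
NOW = 2006  # assumed "current" year for the example


def fk_set(obj_type, predicate):
    """Foreign keys of all records of one objType that match predicate."""
    return {d["personId"] for d in DOCS
            if d["objType"] == obj_type and predicate(d)}


# One search per objType; each predicate is evaluated against a single record,
# so school, degree and year are guaranteed to come from the same entry.
edu = fk_set("education", lambda d: d["schoolName"] == "Harvard"
             and d["degree"] == "MBA" and d["year"] >= NOW - 10)
work = fk_set("work", lambda d: d["employer"] == "Intel"
              and d["year"] >= NOW - 5)
joined = edu & work  # the "join": intersect on the foreign key


def flattened_hits():
    """Same question against one flattened multi-valued document per person."""
    people = {}
    for d in DOCS:
        person = people.setdefault(d["personId"], {
            "schools": set(), "degrees": set(), "eduYears": set(),
            "employers": set(), "workYears": set()})
        if d["objType"] == "education":
            person["schools"].add(d["schoolName"])
            person["degrees"].add(d["degree"])
            person["eduYears"].add(d["year"])
        else:
            person["employers"].add(d["employer"])
            person["workYears"].add(d["year"])
    return {pid for pid, p in people.items()
            if "Harvard" in p["schools"] and "MBA" in p["degrees"]
            and any(y >= NOW - 10 for y in p["eduYears"])
            and "Intel" in p["employers"]
            and any(y >= NOW - 5 for y in p["workYears"])}


print(sorted(joined))            # per-record join
print(sorted(flattened_hits()))  # flattened documents
```

With this data the per-record join returns only p1, while the flattened version also returns p2, whose 2001 year belongs to the MIT entry rather than the Harvard MBA. The sketch says nothing about cost: collecting the foreign key for every hit is trivial on a five-element list, and that is exactly the step likely to be slow on a large index.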
If this can be done, figuring out a generic way of specifying multiple
searches and relationships between result sets (without reinventing SQL)
becomes the challenge.
We'll see. I have my doubts that it will work for any but the smallest
of collections, which ours certainly isn't.
Thanks --Joachim
Chris Hostetter wrote:
While it's certainly possible to "join" the results of multiple indexes, I
would do so only when absolutely necessary -- in my experience the only
time I've found that it makes sense is when one aspect of the data
changes extremely rapidly compared to everything else, making complex
reindexing a pain, but reindexing just the changed data in its own index
is a lot more feasible.
As a rule of thumb, when building "paginated" style search applications, I
would advise people to try and flatten their index as much as possible, so
that the application can do one "user query" (based on the user's input)
to get a single page of results, and then use the uniqueKeys from that
page of results to look up ancillary data from any other indexes (or
databases) that you need -- the key being that all the data you want to
search on, and all the data you need to sort on, are in the index, but
other data you need to return to the user can come from other sources.
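
As a minimal sketch of that pattern, assuming plain Python with in-memory data: the INDEX list and DATABASE dict below are hypothetical stand-ins for a Solr index and an RDBMS, and search_page and fetch_details are made-up helper names, not Solr API calls.

```python
# Hypothetical index: holds only the uniqueKey plus the fields needed for
# matching ("text") and sorting ("date").
INDEX = [
    {"id": "d1", "text": "solr faceting basics", "date": 20060301},
    {"id": "d2", "text": "lucene filters", "date": 20060115},
    {"id": "d3", "text": "solr and rdbms joins", "date": 20060210},
    {"id": "d4", "text": "solr sorting", "date": 20060105},
]

# Hypothetical database: display-only data, keyed by the same uniqueKey.
DATABASE = {
    "d1": {"title": "Faceting basics", "author": "A"},
    "d2": {"title": "Lucene filters", "author": "B"},
    "d3": {"title": "Joins", "author": "C"},
    "d4": {"title": "Sorting", "author": "D"},
}


def search_page(term, sort_field, page, page_size):
    """One 'user query': match, sort, and return the uniqueKeys of one page."""
    hits = [d for d in INDEX if term in d["text"]]
    hits.sort(key=lambda d: d[sort_field])
    start = page * page_size
    return [d["id"] for d in hits[start:start + page_size]]


def fetch_details(ids):
    """Look up ancillary data for just the ids on the current page."""
    return [DATABASE[i] for i in ids]


page_ids = search_page("solr", "date", page=0, page_size=2)
page_rows = fetch_details(page_ids)
print(page_ids, page_rows)
```

The second lookup touches only page_size rows, no matter how many documents matched, which is why it stays cheap as the index grows.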
If you find yourself wanting to "join" two indexes for the purposes of
matching or sorting, the amount of work you wind up doing tends to be
prohibitive on really large indexes -- and if your indexes aren't that
large, it would probably just be easier to put everything in one index and
rebuild it frequently.
: I am trying to integrate solr search results with results from a rdbms
: query. It's working ok, but fairly complicated due to large size of
: the results from the database, and many different sort requirements.
:
: I know that solr/lucene was not designed to intelligently handle
: multiple document types in the same collection, i.e. provide join
: features, but I'm wondering if anyone on this list has any thoughts on
: how to do it in lucene, and how it might be integrated into a custom
: solr deployment. I can't see going back to vanilla lucene after solr!
:
: My basic idea is to add an objType field that would be used to define a
: "table". There would be one main objType, any related objTypes would
: have a field pointing back to the main objs via id, like a foreign key.
:
: I'd run multiple parallel searches and merge the results based on
: foreign keys, either using a Filter or just using custom code. I'm
: anticipating that iterating through the results to retrieve the foreign
: key values will be too slow.
:
: Our data is highly textual, temporal and spatial, which pretty much
: correspond to the 3 tables I would have. I can de-normalize a lot of
: the data, but the combination of times, locations and textual
: representations would be way too large to fully flatten.
:
: I'm about to start experimenting with different strategies, and I would
: appreciate any insight anyone can provide. Would the faceting code help
: here somehow?
-Hoss