Re: relational design in solr?

Joachim Martin Fri, 22 Sep 2006 08:04:42 -0700

Chris,

I think what I am trying to do is actually much simpler than what you
are talking about here. I do plan on returning document ids and
retrieving full entity data from the database -- solr would just be
used for the search, not for results display.

The problem is that some data cannot be "flattened", for example when a
document has repeating fields that are complex types, such as address.

The best example I can think of is a resume database. You could
certainly just put the whole resume document into the text index and do
full text searches. But to answer the question of what people received a
Harvard MBA in the last 10 years and have worked at Intel in the last 5
years, you have to correlate the years of attendance with the schoolName
entry. Otherwise you might be getting years for some other
education/work history entry.
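
To make the mismatch concrete, here is a minimal sketch in plain Python
(not solr). The data, the field names other than schoolName, and the
helper functions are all made up for illustration. It assumes a
flattened document keeps schoolName and year as independent multi-valued
fields:

```python
# Hypothetical resume data: each person has repeating education records.
resumes = {
    "alice": [
        {"schoolName": "Harvard", "year": 1990},
        {"schoolName": "MIT", "year": 2001},
    ],
    "bob": [
        {"schoolName": "Harvard", "year": 2002},
    ],
}


def flatten(records):
    """Collapse repeating records into independent multi-valued fields."""
    return {
        "schoolName": {r["schoolName"] for r in records},
        "year": {r["year"] for r in records},
    }


def flat_match(doc, school, min_year):
    # Each condition is checked against the whole document, so the school
    # and the year may come from different education records.
    return school in doc["schoolName"] and any(y >= min_year for y in doc["year"])


def record_match(records, school, min_year):
    # Both conditions must hold within the same education record.
    return any(r["schoolName"] == school and r["year"] >= min_year for r in records)


flat_hits = sorted(
    p for p, recs in resumes.items() if flat_match(flatten(recs), "Harvard", 1996)
)
record_hits = sorted(
    p for p, recs in resumes.items() if record_match(recs, "Harvard", 1996)
)
```

Here flat_hits contains both people, because alice's 2001 entry for a
different school satisfies the year condition, while record_hits contains
only bob.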

By adding an objType field and combining search results, you can be sure
that the year/schoolName query matched a unique education record. The
tricky bit is in getting a list of field values (e.g. foreign keys,
which are essentially facets) for a result set very quickly.
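
Roughly what I have in mind, sketched in plain Python with an in-memory
list standing in for the index. The personId and employer fields, the
sample documents and the search() helper are assumptions for
illustration, not solr API:

```python
# One collection holding several "tables", distinguished by objType.
# Related records point back to the main record via personId.
docs = [
    {"id": "p1", "objType": "person", "name": "Alice"},
    {"id": "p2", "objType": "person", "name": "Bob"},
    {"id": "e1", "objType": "education", "personId": "p1",
     "schoolName": "Harvard", "year": 1990},
    {"id": "e2", "objType": "education", "personId": "p2",
     "schoolName": "Harvard", "year": 2002},
    {"id": "j1", "objType": "job", "personId": "p1",
     "employer": "Intel", "year": 2004},
    {"id": "j2", "objType": "job", "personId": "p2",
     "employer": "Intel", "year": 2003},
    {"id": "j3", "objType": "job", "personId": "p2",
     "employer": "Acme", "year": 1995},
]


def search(obj_type, predicate):
    """Stand-in for one index query restricted to a single objType."""
    return [d for d in docs if d["objType"] == obj_type and predicate(d)]


def foreign_keys(hits):
    """Collect the distinct foreign key values of a result set."""
    return {d["personId"] for d in hits}


# Each search matches within a single record, so year and schoolName
# (or year and employer) are guaranteed to belong together.
edu = search("education", lambda d: d["schoolName"] == "Harvard" and d["year"] >= 1996)
jobs = search("job", lambda d: d["employer"] == "Intel" and d["year"] >= 2001)

# "Join" the two result sets by intersecting their foreign keys.
matching_people = sorted(foreign_keys(edu) & foreign_keys(jobs))
```

The intersection is cheap. The expensive step on a real index is
foreign_keys(), which has to read a field value for every hit in each
result set.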

If this can be done, figuring out a generic way of specifying multiple
searches and relationships between result sets (without reinventing SQL)
becomes the challenge.

We'll see. I have my doubts that it will work for any but the smallest
of collections, which ours certainly isn't.

Thanks --Joachim

Chris Hostetter wrote:

While it's certainly possible to "join" the results of multiple indexes, i
would do so only when absolutely necessary -- in my experience the only
time i've found that it makes sense, is when one aspect of the data
changes extremely rapidly compared to everything else, making complex
reindexing a pain, but reindexing just the changed data in its own index
is a lot more feasible.

As a rule of thumb, when building "paginated" style search applications, I
would advise people to try and flatten their index as much as possible, so
that the application can do one "user query" (based on the user's input)
to get a single page of results, and then use the uniqueKeys from that
page of results to lookup ancillary data from any other indexes (or
databases that you need) -- the key being that all the data you want to
search on, and all the data you need to sort are in the index, but other
data you need to return to the user can come from other sources.

If you find yourself wanting to "join" two indexes for the purposes of
matching or sorting, the amount of work you wind up doing tends to be
prohibitive on really large indexes -- and if your indexes aren't that
large, it would probably just be easier to put everything in one index and
rebuild it frequently.

: I am trying to integrate solr search results with results from a rdbms
: query.  It's working ok, but fairly complicated  due to large size of
: the results from the database, and many different sort requirements.
:
: I know that solr/lucene was not designed to intelligently handle
: multiple document types in the same collection, i.e. provide join
: features, but I'm wondering if anyone on this list has any thoughts on
: how to do it in lucene, and how it might be integrated into a custom
: solr deployment.  I can't see going back to vanilla lucene after solr!
:
: My basic idea is to add an objType field that would be used to define a
: "table".  There would be one main objType, any related objTypes would
: have a field pointing back to the main objs via id, like a foreign key.
:
: I'd run multiple parallel searches and merge the results based on
: foreign keys, either using a Filter or just using custom code.  I'm
: anticipating that iterating through the results to retrieve the foreign
: key values will be too slow.
:
: Our data is highly textual, temporal and spatial, which pretty much
: correspond to the 3 tables I would have.  I can de-normalize a lot of
: the data, but the combination of times, locations and textual
: representations would be way too large to fully flatten.
:
: I'm about to start experimenting with different strategies, and I would
: appreciate any insight anyone can provide.  Would the faceting code help
: here somehow?



-Hoss
