On Dec 14, 2007, at 9:55 AM, Stuart Sierra wrote:
On Dec 13, 2007 9:20 PM, solruser2 <[EMAIL PROTECTED]> wrote:
Let's say I have a database containing people, groups, and projects
(these all have different fields). I want to index these different
kinds of objects with a view to eventually presenting search results
from all three types mashed together and sorted by relevance. Using
separate indices (and thus separate Solr processes) would make mashing
the results together very difficult, so I'm guessing I just add the
separate fields to the schema along with an 'object_type' field or
equivalent?
That is the approach I would take. Having three separate indices
would make your searches slower and more complicated.
I agree.
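A minimal sketch of the single-index approach, assuming a uniqueKey
field named "id" plus hypothetical "object_type" and "db_id" fields
(none of these names come from the thread). Each database row becomes
one flat document; prefixing the key with the type keeps person 7 and
project 7 from overwriting each other. Each type-specific field (name,
title, ...) would still need to be declared in schema.xml or matched
by a dynamic field. The second function only builds the query string
for Solr's /select handler: "fq" narrows results to one type without
changing relevance scores, and leaving it off mixes all three types
in one relevance-ranked list.

```python
from urllib.parse import urlencode


def to_solr_doc(object_type, row):
    """Map one database row (a dict) to a flat document for a shared index.

    Field names are illustrative assumptions, not a required schema.
    """
    doc = {
        "id": f"{object_type}-{row['id']}",  # unique across all types
        "object_type": object_type,          # lets queries filter by type
        "db_id": row["id"],                  # original row ID, for DB lookups
    }
    for field, value in row.items():
        if field != "id":
            doc[field] = value
    return doc


def search_params(user_query, object_type=None):
    """Build a query string for a /select request against the shared index."""
    params = [("q", user_query), ("fl", "id,object_type,db_id,score")]
    if object_type is not None:
        # A filter query restricts the result set but does not affect scoring.
        params.append(("fq", f"object_type:{object_type}"))
    return urlencode(params)


print(to_solr_doc("person", {"id": 7, "name": "Ada"}))
print(search_params("lucene", "project"))
```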
Secondly, should I just store the database row ID for each object
(while still indexing the field contents) so that a query on the index
returns a list of IDs that I can then fetch from the database?
It depends. :) If you want highlighted snippets in your search
results, then you have to store the field contents in the index. In
some situations you can make your search pages faster by storing all
the critical fields (the ones you want to appear in search results) in
the index, so that you don't have to fetch a dozen records from the
database just to display a list of search results. On the other hand,
if your database records are small and you don't need highlighting, it
may be faster to store only database IDs in the index.
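If you do store only IDs, a minimal sketch of the fetch step (using
sqlite3 and a hypothetical "people" table purely for illustration) is
to look up all hits with a single IN (...) query rather than one query
per hit. SQL does not return rows in the order of the IN list, so the
sketch re-sorts them into the relevance order the index returned, and
it silently skips IDs whose rows were deleted after indexing. The
ranked ID list below stands in for a real search result.

```python
import sqlite3


def fetch_in_rank_order(conn, ranked_ids):
    """Fetch rows for IDs returned by the search index, keeping their order."""
    if not ranked_ids:
        return []
    placeholders = ",".join("?" for _ in ranked_ids)
    rows = conn.execute(
        f"SELECT id, name FROM people WHERE id IN ({placeholders})", ranked_ids
    ).fetchall()
    by_id = {row[0]: row for row in rows}
    # Restore relevance order; drop IDs no longer present in the database.
    return [by_id[i] for i in ranked_ids if i in by_id]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)", [(1, "Ada"), (2, "Grace"), (3, "Linus")]
)

# Hypothetical search result: best match first, 99 no longer exists.
print(fetch_in_rank_order(conn, [3, 1, 99]))  # [(3, 'Linus'), (1, 'Ada')]
```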
I agree with this also. However, I've never seen a case where a
separate database query to retrieve metadata would be faster than
storing the necessary fields directly in the search index and
retrieving them with the search results. I've found it helpful to
think of the full-text index as a very simple, very fast, very flat
database engine. You may not be able to do outer joins and correlated
subqueries on it, but you can get a list of documents and titles
really fast.
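To illustrate the "flat database" view, here is a rough sketch of
rendering a results page straight from the index with no database
query. It asks for stored fields via "fl" and for highlighted snippets
via "hl"/"hl.fl" (highlighting only works on stored fields). The field
names "title" and "body" are assumptions, and the sample response is
made up but follows the shape of Solr's JSON response writer: matching
documents under response.docs and snippets under "highlighting", keyed
by unique key.

```python
from urllib.parse import urlencode


def stored_fields_params(user_query):
    """Request the stored fields needed for display, plus highlighting."""
    return urlencode([
        ("q", user_query),
        ("fl", "id,object_type,title,score"),  # stored fields shown in results
        ("hl", "true"),
        ("hl.fl", "title,body"),               # must be stored to highlight
        ("wt", "json"),
    ])


def render_results(response):
    """Build result lines from a Solr-style JSON response alone."""
    highlights = response.get("highlighting", {})
    lines = []
    for doc in response["response"]["docs"]:
        snippets = highlights.get(doc["id"], {}).get("body", [])
        snippet = snippets[0] if snippets else ""
        lines.append(f"[{doc['object_type']}] {doc['title']}: {snippet}")
    return lines


# Made-up response for illustration only.
sample = {
    "response": {
        "numFound": 1,
        "docs": [{"id": "project-3", "object_type": "project",
                  "title": "Search revamp", "score": 1.2}],
    },
    "highlighting": {"project-3": {"body": ["moving to <em>Solr</em>"]}},
}

print(render_results(sample))  # ['[project] Search revamp: moving to <em>Solr</em>']
```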
Hope this sheds some light,
-Stuart Sierra
AltLaw.org