In the library subject heading context, I wonder if a layered approach
would bring performance into the acceptable range. Since Library of
Congress Subject Headings break into standard parts, you could have
first-tier facets representing the main heading, second-tier facets with
the main heading and first subdivision, etc. So to extract the subject
headings from a given result set, you'd first test all the first-tier
facets like "Body, Human", then where warranted test the associated
second-tier facets like "Body, Human--Social aspects". If the
first-tier facets represent a small enough subset of the set of subject
headings as a whole, that might be enough to reduce the total number of
facet tests.
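
Roughly the shape of the logic I have in mind, as a standalone Java
sketch (hypothetical class and field names, in-memory data rather than
real Solr/Lucene calls):

    import java.util.*;

    // Sketch of the layered idea: count first-tier facets (main headings)
    // for a result set, then only test second-tier facets
    // (heading--subdivision) under headings that actually matched.
    public class LayeredLcshFacets {

        // docId -> full headings, e.g. "Body, Human--Social aspects"
        private final Map<Integer, List<String>> docHeadings;

        public LayeredLcshFacets(Map<Integer, List<String>> docHeadings) {
            this.docHeadings = docHeadings;
        }

        // First tier: everything before the first "--".
        public Map<String, Integer> firstTier(Set<Integer> results) {
            Map<String, Integer> counts = new HashMap<>();
            for (int doc : results) {
                for (String h : docHeadings.getOrDefault(doc, Collections.emptyList())) {
                    counts.merge(h.split("--", 2)[0], 1, Integer::sum);
                }
            }
            return counts;
        }

        // Second tier: only tested for a main heading deemed "warranted"
        // by the first-tier counts.
        public Map<String, Integer> secondTier(Set<Integer> results, String main) {
            Map<String, Integer> counts = new HashMap<>();
            for (int doc : results) {
                for (String h : docHeadings.getOrDefault(doc, Collections.emptyList())) {
                    if (h.startsWith(main + "--")) {
                        counts.merge(h, 1, Integer::sum);
                    }
                }
            }
            return counts;
        }
    }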

I'm told by our metadata librarian, by the way, that there are 280,000
subject headings defined in LCSH at the moment (including
cross-references), so that provides a rough upper limit on distinct
values...

Peter 

-----Original Message-----
From: Chris Hostetter [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, February 07, 2007 2:02 PM
To: solr-user@lucene.apache.org
Subject: Re: facet optimizing


: Andrew, I haven't yet found a successful way to implement the SOLR
: faceting for library catalog data.  I developed my own system, so for

Just to clarify: the "out of the box" faceting support Solr has at the
moment is very deliberately referred to as "SimpleFacets" ... it's
intended to solve Simple problems where you want Facets based on all of
the values in a field, or on specific hardcoded queries.  It was
primarily written as a demonstration of what is possible when writing a
custom SolrRequestHandler.

When you start talking about really large data sets, with an extremely
large volume of unique field values for the fields you want to facet on,
"generic" solutions stop being very feasible, and you have to start
looking at solutions more tailored to your dataset.  At CNET, when
dealing with Product data, we don't make any attempt to use the Simple
Facet support Solr provides to facet on things like Manufacturer or
Operating System, because enumerating through every Manufacturer in the
catalog on every query would be too expensive -- instead we have
structured metadata that drives the logic: only compute the constraint
counts for this subset of manufacturers when looking at the Desktops
category, only look at the Operating System facet when in these
categories, etc.  Rules like these need to be defined based on your
user experience, and it can be easy to build them using the metadata in
your index -- but they really need to be precomputed, not calculated on
the fly every time.
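
A very rough sketch of what that kind of precomputed rule table could
look like (hypothetical Java, invented names, nothing Solr-specific):

    import java.util.*;

    // Metadata-driven rules: for each category, precompute which facet
    // fields are worth computing and which values to test, so a query
    // never has to enumerate every value in the index.
    public class FacetRules {

        // category -> facet field -> values worth counting (null = all values)
        private final Map<String, Map<String, Set<String>>> rules = new HashMap<>();

        public void addRule(String category, String field, Set<String> values) {
            rules.computeIfAbsent(category, c -> new HashMap<>()).put(field, values);
        }

        // Which facet fields should be computed at all for this category.
        public Set<String> facetFieldsFor(String category) {
            return rules.getOrDefault(category, Collections.emptyMap()).keySet();
        }

        // Candidate values to test for one field, or null to test them all.
        public Set<String> candidateValues(String category, String field) {
            return rules.getOrDefault(category, Collections.emptyMap()).get(field);
        }

        public static void main(String[] args) {
            FacetRules r = new FacetRules();
            // Only these manufacturers get counted in the Desktops category,
            // and the Operating System facet is only shown there at all.
            r.addRule("Desktops", "manufacturer",
                    new HashSet<>(Arrays.asList("Dell", "HP", "Lenovo")));
            r.addRule("Desktops", "operating_system", null);
            System.out.println(r.facetFieldsFor("Desktops"));
        }
    }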

For something like a Library system, where you might want to facet on
Author but have way too many authors for that to be practical, a system
that either requires a category to be picked first (allowing you to
constrain the list of authors you need to worry about) or precomputes
the top 1000 authors for initial display (when the user hasn't provided
any other constraints) is an example of the type of thing a
RequestHandler Solr Plugin might do -- but the logic involved would
probably be domain specific.
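
As a hypothetical sketch of the "precomputed top authors" half of that
(plain Java, invented names, no Solr APIs):

    import java.util.*;

    // Walk all author values once, offline, keep the N most frequent, and
    // show only those when the user hasn't constrained the result set yet.
    public class TopAuthors {

        public static List<String> topN(Iterable<String> authorOccurrences, int n) {
            Map<String, Integer> counts = new HashMap<>();
            for (String author : authorOccurrences) {
                counts.merge(author, 1, Integer::sum);
            }
            List<Map.Entry<String, Integer>> sorted = new ArrayList<>(counts.entrySet());
            sorted.sort((a, b) -> b.getValue() - a.getValue());
            List<String> top = new ArrayList<>();
            for (Map.Entry<String, Integer> e : sorted.subList(0, Math.min(n, sorted.size()))) {
                top.add(e.getKey());
            }
            return top;
        }

        public static void main(String[] args) {
            List<String> occurrences = Arrays.asList(
                    "Austen, Jane", "Twain, Mark", "Austen, Jane", "Dickens, Charles");
            System.out.println(topN(occurrences, 2)); // prints the two most frequent authors
        }
    }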



-Hoss
