Re: facet optimizing

2007-03-01 Thread Yonik Seeley
On 3/1/07, Gunther, Andrew <[EMAIL PROTECTED]> wrote: Can someone post their magic formula for filterCache (Erik?) We've hit a plateau around 1.7 million docs and my response times have suffered when filtering. Is this for field faceting (facet.field)? Have adjusted filterCache up and down all d

RE: facet optimizing

2007-03-01 Thread Gunther, Andrew
Subject: Re: facet optimizing On Feb 7, 2007, at 4:42 PM, Yonik Seeley wrote: > Solr relies on the filter cache for faceting, and if it's not big > enough you're going to get a near 0% hit rate. Check the statistics > page and make sure there aren't any evictions after

Re: facet optimizing

2007-02-09 Thread Yonik Seeley
On 2/9/07, Yonik Seeley <[EMAIL PROTECTED]> wrote: If you exclude both the high df counts from the tree, and the "bits" they contribute, then it becomes mandatory to calculate the intersections for those high df terms. It also will hopefully act as a good bootstrap to raise the min_df of the queu

Re: facet optimizing

2007-02-09 Thread Yonik Seeley
On 2/9/07, Chris Hostetter <[EMAIL PROTECTED]> wrote: I freely admit that I'm totally lost on most of what you're suggesting ... it seems like you're suggesting that organizing the terms in a facet field into a tree structure would help us know which terms to compute the counts for first for a gi

Re: facet optimizing

2007-02-08 Thread Chris Hostetter
I freely admit that I'm totally lost on most of what you're suggesting ... it seems like you're suggesting that organizing the terms in a facet field into a tree structure would help us know which terms to compute the counts for first for a given query -- but it's not clear to me why that would be

Re: facet optimizing

2007-02-08 Thread Chris Hostetter
: > query would be too expensive -- instead we have structured metadata that : > drives the logic: only compute the constraint counts for this subset of : > manufacturers when looking at the Desktops category, only look at the : > Operating System facet when in these categories, etc... rules like

Re: facet optimizing

2007-02-08 Thread Erik Hatcher
And to add some fuel to this fire, I'm seeing in the (first 100k of UVa MARC records) data I'm processing that the facets are sparse with documents. There are a lot of documents that simply don't have a subject genre on them, for example... like almost 50%. Maybe the data will get cleaner

Re: facet optimizing

2007-02-08 Thread Yonik Seeley
A little more brainstorming on this... pruning by df is going to be one of the most important features here... so a variation (or optimization) would be to keep a list of the highest terms by df, and then build the facet tree excluding those top terms. That should lower the dfs in the tree nodes
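
The reason df matters for pruning can be shown with a small sketch. This is not the facet tree discussed in the thread and not Solr's implementation; it only illustrates the underlying bound, namely that a term's count within a result set can never exceed its document frequency. The function name `top_facets` and the in-memory sets of doc ids are assumptions made for the illustration.

```python
import heapq


def top_facets(term_docs, base_docs, limit):
    """Return the `limit` highest facet counts, skipping hopeless terms.

    term_docs: dict mapping term -> set of doc ids containing it (hypothetical index)
    base_docs: set of doc ids matching the current query
    """
    # Visit terms from highest df to lowest.
    by_df = sorted(term_docs.items(), key=lambda kv: len(kv[1]), reverse=True)
    heap = []  # min-heap of (count, term) holding the best `limit` terms so far
    intersections = 0
    for term, docs in by_df:
        df = len(docs)
        # count <= df, so once df cannot beat the weakest kept count, stop.
        if len(heap) == limit and df <= heap[0][0]:
            break
        count = len(docs & base_docs)
        intersections += 1
        if len(heap) < limit:
            heapq.heappush(heap, (count, term))
        elif count > heap[0][0]:
            heapq.heapreplace(heap, (count, term))
    ranked = sorted(heap, key=lambda ct: (-ct[0], ct[1]))
    return [(term, count) for count, term in ranked], intersections


if __name__ == "__main__":
    index = {"a": {1, 2, 3, 4, 5}, "b": {1, 2, 3}, "c": {4}, "d": {5}}
    print(top_facets(index, {1, 2, 3, 4}, limit=2))
```

In this toy index only two of the four terms are ever intersected with the result set; the low-df terms are skipped because they cannot reach the top two.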

RE: facet optimizing

2007-02-08 Thread Binkley, Peter
Yonik wrote: Thinking all this stuff up from scratch seems like the hard way... Does anyone know how other people have implemented this stuff? It's not really what Yonik was asking for, but on the semantic front, one thing that might help is OCLC's FAST project (Faceted Application of Subject

Re: facet optimizing

2007-02-07 Thread Yonik Seeley
On 2/7/07, Erik Hatcher <[EMAIL PROTECTED]> wrote: Yonik - I like the way you think Yeah! It's turtles (err, trees) all the way down. Heh... I'm still thinking/brainstorming about it... it only helps if you can effectively prune though. Each node in the tree could also keep the max d

Re: facet optimizing

2007-02-07 Thread Erik Hatcher
Yonik - I like the way you think Yeah! It's turtles (err, trees) all the way down. Erik /me Pulling the Algorithms book off my shelf so I can vaguely follow along. On Feb 7, 2007, at 8:22 PM, Yonik Seeley wrote: On 2/7/07, Binkley, Peter <[EMAIL PROTECTED]> wrote: In the

Re: facet optimizing

2007-02-07 Thread Yonik Seeley
On 2/7/07, Binkley, Peter <[EMAIL PROTECTED]> wrote: In the library subject heading context, I wonder if a layered approach would bring performance into the acceptable range. Since Library of Congress Subject Headings break into standard parts, you could have first-tier facets representing the ma

Re: facet optimizing

2007-02-07 Thread rubdabadub
Hi: when you start talking about really large data sets, with an extremely large volume of unique field values for fields you want to facet on, then "generic" solutions stop being very feasible, and you have to start looking at solutions more tailored to your dataset. At CNET, when dealing with

RE: facet optimizing

2007-02-07 Thread Chris Hostetter
: headings from a given result set, you'd first test all the first-tier : facets like "Body, Human", then where warranted test the associated : second-tier facets like "Body, Human--Social aspects.". If the : first-tier facets represent a small enough subset of the set of subject : headings as a w
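
The layered approach quoted above might look roughly like the following sketch. It assumes the subject headings have already been split into a first tier and a second tier and that each heading maps to a set of doc ids; the function name, the data layout, and the "Cookery" headings are hypothetical and only there to make the example run.

```python
def tiered_facet_counts(first_tier, second_tier, result_docs, min_count=1):
    """Count first-tier headings, then expand only those that matched.

    first_tier:  dict heading -> set of doc ids
    second_tier: dict first-tier heading -> dict of sub-heading -> set of doc ids
    Returns (counts, number_of_headings_tested).
    """
    counts = {}
    tested = 0
    for heading, docs in first_tier.items():
        tested += 1
        n = len(docs & result_docs)
        if n < min_count:
            continue  # skip every second-tier heading under this one
        counts[heading] = n
        for sub, sub_docs in second_tier.get(heading, {}).items():
            tested += 1
            m = len(sub_docs & result_docs)
            if m >= min_count:
                counts[sub] = m
    return counts, tested


if __name__ == "__main__":
    first = {"Body, Human": {1, 2, 3}, "Cookery": {7, 8}}
    second = {
        "Body, Human": {"Body, Human--Social aspects": {2, 3}},
        "Cookery": {"Cookery--History": {7}, "Cookery--France": {8}},
    }
    print(tiered_facet_counts(first, second, {2, 3, 4}))
```

Here three headings are tested instead of five, because nothing under "Cookery" is examined once the first-tier heading fails to match.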

RE: facet optimizing

2007-02-07 Thread Binkley, Peter
provides a rough upper limit on distinct values... Peter -----Original Message----- From: Chris Hostetter [mailto:[EMAIL PROTECTED] Sent: Wednesday, February 07, 2007 2:02 PM To: solr-user@lucene.apache.org Subject: Re: facet optimizing : Andrew, I haven't yet found a successful way t

Re: facet optimizing

2007-02-07 Thread Chris Hostetter
: Is it just that the cache size needs to be bigger than the number of : distinct values for a field? Basically yes, but the cache is going to be used for all filters -- not just those for a single facet (so your cache might be big enough that faceting on fieldA or fieldB is fine, but if you face
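
Because the cache is shared by all filters, a rough lower bound on its size can be estimated by adding up the distinct values of every field faceted in the same request, plus whatever other filters are in use. The helper below is a back-of-the-envelope sketch under that assumption, not a Solr formula; the field names and every count except the roughly 39k subject values mentioned elsewhere in the thread are made up.

```python
def min_filter_cache_entries(distinct_values_per_field, other_filters=0):
    """Rough lower bound: one cache entry per distinct facet value, plus other filters."""
    return sum(distinct_values_per_field.values()) + other_filters


if __name__ == "__main__":
    # "subject" uses the ~39k figure from the thread; the rest are hypothetical.
    fields = {"subject": 39000, "format": 12, "language": 150}
    print(min_filter_cache_entries(fields, other_filters=50))
```

A cache sized for the subject field alone would still be too small once the other fields and ordinary filter queries compete for the same entries.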

Re: facet optimizing

2007-02-07 Thread Ryan McKinley
Are there any simple automatic tests we can run to see what fields would support fast faceting? Is it just that the cache size needs to be bigger than the number of distinct values for a field? If so, it would be nice to add an /admin page that lists each field, the distinct value count and a gre

Re: facet optimizing

2007-02-07 Thread Erik Hatcher
On Feb 7, 2007, at 4:42 PM, Yonik Seeley wrote: Solr relies on the filter cache for faceting, and if it's not big enough you're going to get a near 0% hit rate. Check the statistics page and make sure there aren't any evictions after you do a query with facets. If there are, make the cache lar
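
Why an undersized cache collapses to a near 0% hit rate, rather than degrading gradually, can be seen in a toy simulation. The sketch below is a plain LRU model and not Solr's cache; it assumes each faceted request looks up one entry per distinct term, always in the same order, and the function name `lru_hit_rate` is invented for the illustration.

```python
from collections import OrderedDict


def lru_hit_rate(cache_size, num_terms, num_requests):
    """Simulate an LRU cache where each request looks up every term once."""
    cache = OrderedDict()
    hits = lookups = evictions = 0
    for _ in range(num_requests):
        for term in range(num_terms):
            lookups += 1
            if term in cache:
                hits += 1
                cache.move_to_end(term)  # mark as most recently used
            else:
                cache[term] = True
                if len(cache) > cache_size:
                    cache.popitem(last=False)  # evict least recently used
                    evictions += 1
    return hits / lookups, evictions


if __name__ == "__main__":
    print(lru_hit_rate(cache_size=100, num_terms=101, num_requests=10))
    print(lru_hit_rate(cache_size=101, num_terms=101, num_requests=10))
```

With the cache one entry too small, every lookup evicts the entry that will be needed next, so no lookup ever hits. With room for all 101 terms there are no evictions and only the first pass misses.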

Re: facet optimizing

2007-02-07 Thread Yonik Seeley
On 2/7/07, Gunther, Andrew <[EMAIL PROTECTED]> wrote: Any suggestions on how to optimize the loading of facets? My index is roughly 35,000 35,000 documents? That's not that big. and I am asking Solr to return six facet fields on every query. On large result sets with facet params set to

Re: facet optimizing

2007-02-07 Thread Chris Hostetter
: Andrew, I haven't yet found a successful way to implement the SOLR : faceting for library catalog data. I developed my own system, so for Just to clarify: the "out of the box" faceting support Solr has at the moment is very deliberately referred to as "SimpleFacets" ... it's intended to solve S

Re: facet optimizing

2007-02-07 Thread Mike Klaas
On 2/7/07, Gunther, Andrew <[EMAIL PROTECTED]> wrote: Yes most all terms are multi-valued which I can't avoid. Since the data is coming from a library catalogue I am translating a subject field to make a subject facet. That facet alone is the biggest, hovering near 39k. If I remove this facet.f

Re: facet optimizing

2007-02-07 Thread Andrew Nagy
Gunther, Andrew wrote: Yes most all terms are multi-valued which I can't avoid. Since the data is coming from a library catalogue I am translating a subject field to make a subject facet. That facet alone is the biggest, hovering near 39k. If I remove this facet.field things return faster. So a

RE: facet optimizing

2007-02-07 Thread Gunther, Andrew
Subject: Re: facet optimizing How many unique values do you have for those 6 fields? And are those fields multiValued or not? Single valued facets are much faster (though not realistic in my domain). Lots of values per field do not good facets make. Erik On Feb 7, 2007, at 11:10 AM, Gu

Re: facet optimizing

2007-02-07 Thread Erik Hatcher
How many unique values do you have for those 6 fields? And are those fields multiValued or not? Single valued facets are much faster (though not realistic in my domain). Lots of values per field do not good facets make. Erik On Feb 7, 2007, at 11:10 AM, Gunther, Andrew wrote: