On 3/1/07, Gunther, Andrew <[EMAIL PROTECTED]> wrote:
Can someone post their magic formula for filterCache (Erik?) We've hit
a plateau around 1.7 million docs and my response times have suffered when
filtering.
Is this for field faceting (facet.field)?
Have adjusted filtercache up and down all d
Subject: Re: facet optimizing
On Feb 7, 2007, at 4:42 PM, Yonik Seeley wrote:
> Solr relies on the filter cache for faceting, and if it's not big
> enough you're going to get a near 0% hit rate. Check the statistics
> page and make sure there aren't any evictions after
On 2/9/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:
If you exclude both the high df
counts from the tree, and the "bits" they contribute, then it becomes
mandatory to calculate the intersections for those high df terms. It
also will hopefully act as a good bootstrap to raise the min_df of the
queu
On 2/9/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
I freely admit that I'm totally lost on most of what you're suggesting ...
it seems like you're suggesting that organizing the terms in a facet field
into a tree structure would help us know which terms to compute the
counts for first for a given query -- but it's not clear to me why that
would be
: > query would be too expensive -- instead we have structured metadata that
: > drives the logic: only compute the constraint counts for this subset of
: > manufacturers when looking at the Desktops category, only look at the
: > Operating System facet when in these categories, etc... rules like
And to add some fuel to this fire, I'm seeing in the (first 100k of
UVa MARC records) data I'm processing that the facets are sparse with
documents. There are a lot of documents that simply don't have a
subject genre on them, for example... like almost 50%. Maybe the
data will get cleaner
A little more brainstorming on this...
pruning by df is going to be one of the most important features
here... so a variation (or optimization) would be to keep a list of
the highest terms by df, and then build the facet tree excluding those
top terms. That should lower the dfs in the tree nodes
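The df-pruning idea can be sketched roughly as follows. This is only an illustration, not how Solr implements faceting: the function name `top_n_facets`, the in-memory `term_docs` mapping, and the use of Python sets for doc sets are all assumptions. It relies on one fact from the discussion: a term's count for a query can never exceed its df, so once the top-N queue is full and the next term's df is no larger than the smallest count in the queue, the remaining intersections can be skipped.

```python
import heapq

def top_n_facets(term_docs, result_docs, n):
    """Return the top-n (term, count) pairs plus the number of intersections computed.

    term_docs   -- hypothetical mapping: term -> set of doc ids containing it
    result_docs -- set of doc ids matching the current query
    """
    # Visit terms from highest df to lowest so pruning can stop early.
    by_df = sorted(term_docs.items(), key=lambda item: len(item[1]), reverse=True)
    heap = []  # min-heap of (count, term); heap[0] holds the current minimum count
    intersections = 0
    for term, docs in by_df:
        df = len(docs)
        # count <= df, so no remaining term can beat the queue minimum.
        if len(heap) == n and df <= heap[0][0]:
            break
        count = len(docs & result_docs)
        intersections += 1
        if len(heap) < n:
            heapq.heappush(heap, (count, term))
        elif count > heap[0][0]:
            heapq.heapreplace(heap, (count, term))
    top = sorted(((t, c) for c, t in heap), key=lambda tc: (-tc[1], tc[0]))
    return top, intersections
```

The more terms share a high count early on, the sooner the minimum in the queue rises and the more low-df terms get skipped without an intersection.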
Yonik wrote:
Thinking all this stuff up from scratch seems like the hard way...
Does anyone know how other people have implemented this stuff?
It's not really what Yonik was asking for, but on the semantic front,
one thing that might help is OCLC's FAST project (Faceted Application of
Subject
On 2/7/07, Erik Hatcher <[EMAIL PROTECTED]> wrote:
Yonik - I like the way you think
Yeah!
It's turtles (err, trees) all the way down.
Heh...
I'm still thinking/brainstorming about it... it only helps if you can
effectively prune though.
Each node in the tree could also keep the max d
Yonik - I like the way you think
Yeah!
It's turtles (err, trees) all the way down.
Erik
/me Pulling the Algorithms book off my shelf so I can vaguely follow
along.
On Feb 7, 2007, at 8:22 PM, Yonik Seeley wrote:
On 2/7/07, Binkley, Peter <[EMAIL PROTECTED]> wrote:
In the library subject heading context, I wonder if a layered approach
would bring performance into the acceptable range. Since Library of
Congress Subject Headings break into standard parts, you could have
first-tier facets representing the ma
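A minimal sketch of the layered idea, assuming subject headings are plain strings whose parts are separated by "--" (as in "Body, Human--Social aspects"). The function name `tiered_facet_counts` and the `min_first_tier_count` threshold are hypothetical, and the counting is done over in-memory lists rather than a Solr index.

```python
from collections import Counter

def tiered_facet_counts(doc_headings, min_first_tier_count):
    """Count first-tier headings, then second-tier ones only where warranted.

    doc_headings -- list of lists: the subject headings of each result document
    """
    first_tier = Counter()
    for headings in doc_headings:
        # Count each first-tier part at most once per document.
        for top in {h.split("--", 1)[0] for h in headings}:
            first_tier[top] += 1

    # Only first-tier facets that are frequent enough get expanded.
    warranted = {t for t, c in first_tier.items() if c >= min_first_tier_count}

    second_tier = Counter()
    for headings in doc_headings:
        for heading in set(headings):
            if "--" in heading and heading.split("--", 1)[0] in warranted:
                second_tier[heading] += 1
    return first_tier, second_tier
```

The point of the split is that the first pass touches only the (much smaller) set of first-tier values, and the second pass is limited to headings under the tiers that passed the threshold.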
Hi:
When you start talking about really large data sets, with an extremely
large volume of unique field values for fields you want to facet on, then
"generic" solutions stop being very feasible, and you have to start looking
at solutions more tailored to your dataset. At CNET, when dealing with
: headings from a given result set, you'd first test all the first-tier
: facets like "Body, Human", then where warranted test the associated
: second-tier facets like "Body, Human--Social aspects.". If the
: first-tier facets represent a small enough subset of the set of subject
: headings as a w
rovides a rough upper limit on distinct
values...
Peter
-----Original Message-----
From: Chris Hostetter [mailto:[EMAIL PROTECTED]
Sent: Wednesday, February 07, 2007 2:02 PM
To: solr-user@lucene.apache.org
Subject: Re: facet optimizing
: Andrew, I haven't yet found a successful way t
: Is it just that the cache size needs to be bigger than the number of
: distinct values for a field?
basically yes, but the cache is going to be used for all filters -- not
just those for a single facet (so your cache might be big enough that
faceting on fieldA or fieldB is fine, but if you face
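As rough arithmetic for the point that the cache is shared by all filters: the sketch below adds up distinct value counts across every faceted field plus whatever other filters are in use. The function names and the `other_filters` allowance are assumptions for illustration; only the figure of about 39k subject values comes from this thread, and the other counts in the usage note are made up.

```python
def required_filter_cache_size(distinct_values_per_field, other_filters=0):
    """Estimate how many filterCache entries faceting on all given fields needs.

    distinct_values_per_field -- mapping: field name -> number of distinct values
    other_filters             -- hypothetical allowance for non-facet filters
    """
    return sum(distinct_values_per_field.values()) + other_filters

def cache_is_big_enough(distinct_values_per_field, cache_size, other_filters=0):
    """True if one entry per distinct value of every field fits in the cache."""
    return required_filter_cache_size(distinct_values_per_field, other_filters) <= cache_size
```

For example, with `{"subject": 39000, "format": 12, "language": 150}` a cache of 39,000 entries would cover the subject field alone but not all three fields together.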
Are there any simple automatic test we can run to see what fields
would support fast faceting?
Is it just that the cache size needs to be bigger than the number of
distinct values for a field?
If so, it would be nice to add an /admin page that lists each field,
the distinct value count and a gre
On Feb 7, 2007, at 4:42 PM, Yonik Seeley wrote:
Solr relies on the filter cache for faceting, and if it's not big
enough you're going to get a near 0% hit rate. Check the statistics
page and make sure there aren't any evictions after you do a query
with facets. If there are, make the cache lar
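To see why a cache that is slightly too small gives a near 0% hit rate rather than a slightly lower one, here is a small simulation. It assumes an LRU eviction policy and that every faceted query looks up one filter per distinct term in the same order; the function name `simulate_facet_queries` is made up, and this is a toy model, not Solr's cache code.

```python
from collections import OrderedDict

def simulate_facet_queries(num_terms, cache_size, num_queries):
    """Replay num_queries faceted queries against a toy LRU cache.

    Each query touches one cached filter per term, always in the same order.
    """
    cache = OrderedDict()
    hits = misses = evictions = 0
    for _ in range(num_queries):
        for term in range(num_terms):
            if term in cache:
                hits += 1
                cache.move_to_end(term)  # mark as most recently used
            else:
                misses += 1
                cache[term] = True
                if len(cache) > cache_size:
                    cache.popitem(last=False)  # evict least recently used
                    evictions += 1
    return {"hit_rate": hits / (hits + misses), "evictions": evictions}
```

With 1,000 terms and room for 900, every lookup evicts an entry that will be needed again before it is re-added, so the hit rate stays at zero however many queries run. With room for all 1,000, only the first query misses and nothing is evicted.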
On 2/7/07, Gunther, Andrew <[EMAIL PROTECTED]> wrote:
Any suggestions on how to optimize the loading of facets? My index is
roughly 35,000
35,000 documents? That's not that big.
and I am asking Solr to return six facet fields on
every query. On large result sets with facet params set to
: Andrew, I haven't yet found a successful way to implement the SOLR
: faceting for library catalog data. I developed my own system, so for
Just to clarify: the "out of the box" faceting support Solr has at the
moment is very deliberately referred to as "SimpleFacets" ... it's intended
to solve S
On 2/7/07, Gunther, Andrew <[EMAIL PROTECTED]> wrote:
Yes most all terms are multi-valued which I can't avoid.
Since the data is coming from a library catalogue I am translating a
subject field to make a subject facet. That facet alone is the biggest,
hovering near 39k. If I remove this facet.f
Gunther, Andrew wrote:
Yes most all terms are multi-valued which I can't avoid.
Since the data is coming from a library catalogue I am translating a
subject field to make a subject facet. That facet alone is the biggest,
hovering near 39k. If I remove this facet.field things return faster.
So a
Subject: Re: facet optimizing
How many unique values do you have for those 6 fields? And are
those fields multiValued or not? Single valued facets are much
faster (though not realistic in my domain). Lots of values per field
do not good facets make.
Erik
On Feb 7, 2007, at 11:10 AM, Gunther, Andrew wrote: