On Mon, Aug 3, 2009 at 8:26 PM, Mark Bennett <mbenn...@ideaeng.com> wrote:
> Yonik, can you confirm reasoning below for 1.4 for a text field?

The bit about warming? Looks right to me - a big base docset can
trigger short-circuit logic in the enum faceting code... using a
docset of size 1 currently avoids this.

-Yonik
http://www.lucidimagination.com

> ( Of course faceting is so much faster in 1.4 anyway, it's probably worth
> the upgrade.
> https://issues.apache.org/jira/browse/SOLR-475 )
>
> A warning for folks NOT using 1.4:
>
> At the bottom of this wiki page: (very bottom)
> http://wiki.apache.org/solr/SimpleFacetParameters
> It says:
> Warming
> facet.field queries using the term enumeration method can avoid the
> evaluation of some terms for greater efficiency. To force the evaluation of
> all terms for warming, the base query should match a single document.
>
> I think this is OK in the newer version, because as of 1.4 the default is
> "fc", not "enum". But prior to 1.4 there was no fc!
>
> Wiki info on the default (enum vs. fc)
> http://wiki.apache.org/solr/SimpleFacetParameters
>
> facet.method
> This parameter indicates what type of algorithm/method to use when
> faceting a field.
>
> enum
> Enumerates all terms in a field, calculating the set intersection of
> documents that match the term with documents that match the query. This was
> the default (and only) method for faceting multi-valued fields prior to Solr
> 1.4.
>
> fc (stands for field cache)
> The facet counts are calculated by iterating over documents that match
> the query and summing the terms that appear in each document. This was the
> default method for single valued fields prior to Solr 1.4.
>
> The default value is fc (except for BoolField) since it tends to use less
> memory and is faster when a field has many unique terms in the index.
>
>
> --
> Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com
> Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
>
>
> On Mon, Aug 3, 2009 at 2:49 PM, Yonik Seeley <yo...@lucidimagination.com> wrote:
>
>> Sounds like faceting?
>> q=state:CA&facet=true&facet.field=title&facet.limit=1000
>>
>> -Yonik
>> http://www.lucidimagination.com
>>
>>
>> On Mon, Aug 3, 2009 at 5:39 PM, Mark Bennett <mbenn...@ideaeng.com> wrote:
>> > You can get a nice list of terms for a field using the Luke handler:
>> > http://localhost:8983/solr/admin/luke?fl=title&numTerms=1000
>> >
>> > But what I'd really like is to get the terms for the docs that match a
>> > particular slice of the index.
>> >
>> > For example, let's say I have records for all 50 states, but I want to get
>> > the top 1,000 terms for documents in California.
>> >
>> > I'd like to add q or fq like this:
>> > http://localhost:8983/solr/admin/luke?fl=title&numTerms=1000&q=state:CA
>> > OR
>> > http://localhost:8983/solr/admin/luke?fl=title&numTerms=1000&fq=state:CA
>> >
>> > Although I don't get any errors, this syntax doesn't seem to filter the
>> > terms. Not a bug, nobody ever said it would.
>> >
>> > But has anybody written a utility to get term instances for a subset of the
>> > index, based on a query? And to be clear, I was hoping to get all of the
>> > terms in matching documents, not just terms that are also present in the
>> > query.
>> >
>> > Thanks,
>> > Mark
>> >
>> > --
>> > Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com
>> > Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
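
For reference, the faceting request suggested in this thread, and the single-document warming variant described on the wiki, can be built programmatically. The following is only a minimal sketch: the `/solr/select` handler path, the `rows=0` parameter, and the `id:1` warming query are assumptions for illustration and are not taken from the thread; the helper name `facet_terms_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed select handler on the same host/port used for the Luke URLs above.
SELECT_URL = "http://localhost:8983/solr/select"


def facet_terms_url(base, query, field, limit=1000, method=None):
    """Build a facet request listing the top terms of `field` among
    the documents that match `query`."""
    params = [
        ("q", query),            # restricts the base docset, e.g. state:CA
        ("rows", 0),             # assumption: only facet counts are wanted
        ("facet", "true"),
        ("facet.field", field),
        ("facet.limit", limit),
    ]
    if method is not None:
        # facet.method is "enum" or "fc", as described in the quoted wiki text
        params.append(("facet.method", method))
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    # Top 1,000 title terms for documents in California.
    print(facet_terms_url(SELECT_URL, "state:CA", "title"))

    # Warming with the enum method: the base query should match a single
    # document. "id:1" is a hypothetical unique-key query.
    print(facet_terms_url(SELECT_URL, "id:1", "title", method="enum"))
```

The helper only builds the URL and does not contact Solr, so it can be pasted into a warming script or a browser as needed.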