Re: Using Luke to get terms for docs matching a specific query filter?

Mark Bennett Mon, 03 Aug 2009 17:27:43 -0700

Yonik, can you confirm the reasoning below for 1.4 for a text field?

(Of course, faceting is so much faster in 1.4 anyway that it's probably
worth the upgrade:
     https://issues.apache.org/jira/browse/SOLR-475 )


A warning for folks NOT using 1.4:

At the very bottom of this wiki page:
    http://wiki.apache.org/solr/SimpleFacetParameters
it says:
    Warming
    facet.field queries using the term enumeration method can avoid the
    evaluation of some terms for greater efficiency. To force the evaluation
    of all terms for warming, the base query should match a single document.
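
As a rough sketch of what that warming advice could look like in practice,
the snippet below builds two request URLs: one warming request whose base
query matches a single document, and the California facet request from this
thread. The /select handler path, the "id" field, and the document id "doc1"
are assumptions for illustration only; "title" and "state:CA" come from the
original question.

```python
from urllib.parse import urlencode

def facet_url(base, query, field, limit=1000):
    """Build a Solr facet request URL for a single facet field."""
    params = [
        ("q", query),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.limit", str(limit)),
    ]
    return base + "?" + urlencode(params)

# Assumed handler path; adjust to your installation.
BASE = "http://localhost:8983/solr/select"

# Warming request: the base query matches a single document
# (hypothetical id), as the wiki suggests for the enum method.
warm_url = facet_url(BASE, "id:doc1", "title")

# The actual request: facet on title for documents in California.
ca_url = facet_url(BASE, "state:CA", "title")

print(warm_url)
print(ca_url)
```

Note that urlencode percent-encodes the colon in the query, which is
equivalent to typing it literally in the URL.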

I think the warming issue is OK in the newer version, because as of 1.4 the
default is "fc", not "enum".  But prior to 1.4 there was no fc for
multi-valued fields (such as a tokenized text field)!

Wiki info on the default (enum vs. fc)
    http://wiki.apache.org/solr/SimpleFacetParameters

facet.method
    This parameter indicates what type of algorithm/method to use when
    faceting a field.

enum
    Enumerates all terms in a field, calculating the set intersection of
    documents that match the term with documents that match the query. This
    was the default (and only) method for faceting multi-valued fields prior
    to Solr 1.4.

fc (stands for field cache)
    The facet counts are calculated by iterating over documents that match
    the query and summing the terms that appear in each document. This was
    the default method for single valued fields prior to Solr 1.4.

The default value is fc (except for BoolField) since it tends to use less
memory and is faster when a field has many unique terms in the index.
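
To make the difference between the two descriptions concrete, here is a toy
sketch in Python. It is not Solr's implementation, just the two counting
strategies as the wiki words them, run over a made-up in-memory "index"; the
documents, terms, and function names are all hypothetical.

```python
from collections import Counter

# Toy index: doc id -> terms in the "title" field (made-up data).
DOCS = {
    1: ["gold", "rush"],
    2: ["gold", "coast"],
    3: ["silver", "coast"],
}

def facet_enum(docs, matching):
    """enum style: walk every term, intersect its doc set with the matches."""
    postings = {}
    for doc_id, terms in docs.items():
        for term in set(terms):
            postings.setdefault(term, set()).add(doc_id)
    counts = {}
    for term, doc_ids in postings.items():
        n = len(doc_ids & matching)
        if n:
            counts[term] = n
    return counts

def facet_fc(docs, matching):
    """fc style: walk only the matching docs and tally their terms."""
    counts = Counter()
    for doc_id in matching:
        counts.update(set(docs[doc_id]))
    return dict(counts)

# Pretend docs 1 and 2 are the ones matching state:CA.
matching = {1, 2}
enum_counts = facet_enum(DOCS, matching)
fc_counts = facet_fc(DOCS, matching)
print(enum_counts == fc_counts, fc_counts)
```

Both functions return the same counts; the difference is what they iterate
over. The enum version touches every term in the field, while the fc version
touches only the documents that match the query.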


--
Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com
Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513


On Mon, Aug 3, 2009 at 2:49 PM, Yonik Seeley <yo...@lucidimagination.com> wrote:

> Sounds like faceting?
> q=state:CA&facet=true&facet.field=title&facet.limit=1000
>
> -Yonik
> http://www.lucidimagination.com
>
>
> On Mon, Aug 3, 2009 at 5:39 PM, Mark Bennett<mbenn...@ideaeng.com> wrote:
> > You can get a nice list of terms for a field using the Luke handler:
> >    http://localhost:8983/solr/admin/luke?fl=title&numTerms=1000
> >
> > But what I'd really like is to get the terms for the docs that match a
> > particular slice of the index.
> >
> > For example, let's say I have records for all 50 states, but I want to get
> > the top 1,000 terms for documents in California.
> >
> > I'd like to add q or fq like this:
> >    http://localhost:8983/solr/admin/luke?fl=title&numTerms=1000&q=state:CA
> >        OR
> >    http://localhost:8983/solr/admin/luke?fl=title&numTerms=1000&fq=state:CA
> >
> > Although I don't get any errors, this syntax doesn't seem to filter the
> > terms.  Not a bug, nobody ever said it would.
> >
> > But has anybody written a utility to get term instances for a subset of
> > the index, based on a query?  And to be clear, I was hoping to get all of
> > the terms in matching documents, not just terms that are also present in
> > the query.
> >
> > Thanks,
> > Mark
> >
> > --
> > Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com
> > Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
> >
>
