On Jul 30, 2009, at 1:00 PM, Shalin Shekhar Mangar wrote:

On Thu, Jul 30, 2009 at 9:53 PM, <dar...@ontrenet.com> wrote:

Hi,
I am exploring the faceted search results of Solr. My query looks like this:


http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=text&facet.limit=500&facet.prefix=wick

If I don't use the prefix, I get back counts for short tokens like 1, a, of, 2, 3, 4: one- and two-character words and numbers that occur in my documents. That's not really useful, since
all the documents contain some free-floating single-digit numbers.

Is there a way to restrict the word frequency results for a facet based on
term length, so that I can set it to > 3, or is there a better way?


Yes, you can specify facet.mincount=3 to return only those terms present in at least 3 documents. On a related note, a tokenized field (such as the text type in the example schema) will create a large number of unique terms. Faceting on such a field may not be very useful and/or efficient. Typically,
faceting is done on untokenized fields (such as the string type).
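As a rough illustration of the mincount suggestion, here is a minimal Python sketch that builds the same kind of facet query as the one in the question, adding facet.mincount. The helper name facet_query_url is hypothetical, and the base URL simply reuses the localhost example from the thread. facet.mincount is inclusive, so a value of 3 keeps terms whose document count is 3 or more.

```python
from urllib.parse import urlencode

# Base select handler taken from the example in this thread (local Solr, default port).
BASE_URL = "http://localhost:8983/solr/select"

def facet_query_url(field, prefix=None, limit=500, mincount=3):
    """Hypothetical helper: build a facet query URL.

    facet.mincount drops facet terms whose document count is below `mincount`.
    """
    params = [
        ("q", "*:*"),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.limit", str(limit)),
        ("facet.mincount", str(mincount)),
    ]
    if prefix is not None:
        params.append(("facet.prefix", prefix))
    # urlencode percent-encodes the values, e.g. "*:*" becomes "%2A%3A%2A".
    return BASE_URL + "?" + urlencode(params)

print(facet_query_url("text"))
print(facet_query_url("text", prefix="wick"))
```

Note that this filters on how many documents contain a term, not on how long the term is.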

I think "> 3" referred to term length, i.e. having faceting return only terms longer than 3 characters, not to the document count.
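If reindexing is not an option, one stopgap is to filter facet terms by length on the client. Here is a minimal Python sketch of that idea. It assumes Solr's default flat JSON layout for a facet field (term, count, term, count, ...). The function name and the sample values are hypothetical, not real results.

```python
def filter_facet_terms(flat_counts, min_length=4):
    """Keep (term, count) pairs whose term has at least `min_length` characters.

    `flat_counts` is assumed to be a flat list: [term1, count1, term2, count2, ...].
    """
    pairs = zip(flat_counts[0::2], flat_counts[1::2])
    return [(term, count) for term, count in pairs if len(term) >= min_length]

# Hypothetical facet values for the "text" field, for illustration only.
sample = ["1", 120, "a", 98, "of", 75, "wicket", 12, "wicked", 7]
print(filter_facet_terms(sample))  # [('wicket', 12), ('wicked', 7)]
```

The drawback is that facet.limit is applied by Solr before this filter runs, so short, frequent terms can still use up most of the top-N slots.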

You could copyField your text field to another field, give that field an analyzer that includes a LengthFilterFactory with a minimum length specified, and add other analysis tweaks to remove numbers and stop words.
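To show what such an analysis chain would do to the facet counts, here is a minimal in-memory Python simulation. This is not Solr code. In Solr, the real equivalent would be a copyField target whose fieldType analyzer includes solr.LengthFilterFactory (with its min and max attributes) and a stop filter. The regex tokenizer, the stop-word list and the sample documents below are all assumptions made for illustration. Each document counts at most once per term, which mirrors how facet counts are document counts.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "and", "of", "a"}  # hypothetical stop list

def analyze(text, min_len=4, max_len=50):
    """Rough stand-in for an analyzer: tokenize, lowercase, then drop
    numeric tokens, stop words, and tokens outside [min_len, max_len]."""
    tokens = re.findall(r"\w+", text.lower())
    return [
        t for t in tokens
        if not t.isdigit()
        and t not in STOP_WORDS
        and min_len <= len(t) <= max_len
    ]

def facet_counts(docs, **kwargs):
    """Count, for each surviving term, how many documents contain it."""
    counts = Counter()
    for doc in docs:
        counts.update(set(analyze(doc, **kwargs)))  # at most once per document
    return counts

# Hypothetical sample documents.
docs = ["Wicked game, 3 of 4", "The wicket fell at 1", "a wicked wicket"]
print(facet_counts(docs).most_common())
```

Because filtering happens at index time in the real setup, short tokens never reach the facet field, so they cannot crowd longer terms out of the facet.limit window.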

        Erik
