On Jul 30, 2009, at 1:00 PM, Shalin Shekhar Mangar wrote:
On Thu, Jul 30, 2009 at 9:53 PM, <dar...@ontrenet.com> wrote:
Hi,
I am exploring the faceted search results of Solr. My query is like this:
http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=text&facet.limit=500&facet.prefix=wick
If I don't use the prefix, I get back totals for terms like 1, a, of, 2, 3, 4: very short words and single-digit numbers from my documents. That's not really useful, since all the documents contain some free-floating single-digit numbers.
Is there a way to restrict the facet term results based on term length, so I can set it to > 3, or is there a better way?
Yes, you can specify facet.mincount=3 to return only those terms present in at least 3 documents. On a related note, a tokenized field (such as the text type in the example schema) will create a large number of unique terms. Faceting on such a field may not be very useful or efficient. Typically, faceting is done on untokenized fields (such as the string type).
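For reference, here is a minimal Python sketch that builds the query from this thread with facet.mincount added. The helper name facet_query_url is hypothetical, and the base URL is simply the one from the original question. Note that facet.mincount is inclusive: a value of 3 keeps terms with a count of 3 or more.

```python
from urllib.parse import urlencode

# Base URL copied from the query in this thread; adjust host/core as needed.
SOLR_SELECT = "http://localhost:8983/solr/select"


def facet_query_url(field, prefix=None, limit=500, mincount=1):
    """Build a Solr select URL that facets on `field` (hypothetical helper).

    facet.mincount is inclusive: mincount=3 keeps terms whose document
    count is 3 or more.
    """
    params = [
        ("q", "*:*"),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.limit", str(limit)),
        ("facet.mincount", str(mincount)),
    ]
    if prefix is not None:
        params.append(("facet.prefix", prefix))
    # urlencode percent-escapes "*:*" as %2A%3A%2A, which Solr decodes as usual.
    return SOLR_SELECT + "?" + urlencode(params)


print(facet_query_url("text", prefix="wick", mincount=3))
```

Keep in mind that mincount filters on how many documents contain a term, not on the term's length, so it will not by itself drop short terms that happen to be common.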
I think what was meant by > 3 was that faceting should return only terms longer than 3 characters, not terms with a count greater than 3.
You could copyField your text field to another field, give that field an analyzer that includes a LengthFilterFactory with a minimum length specified, and add other analysis tweaks to remove numbers and other stop words.
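The real configuration for this lives in schema.xml (a copyField into a new field whose fieldType analyzer includes solr.LengthFilterFactory, e.g. with min="4" for "longer than 3"). The Python sketch below does not touch Solr at all; it only emulates, under stated assumptions, which tokens such an analysis chain would keep, so you can sanity-check the effect before reindexing. The regex tokenizer, the numeric filter, and the small stop list are hypothetical stand-ins for whatever tokenizer and filters you actually configure; the length check assumes min and max are both inclusive, as with LengthFilterFactory.

```python
import re

# Hypothetical stop list for illustration; in Solr, StopFilterFactory
# reads its words from a configured file instead.
STOP_WORDS = {"the", "and", "for", "with", "that", "this"}


def analyze(text, min_len=4, max_len=255, stop_words=STOP_WORDS):
    """Rough emulation of tokenize -> lowercase -> stop -> numeric -> length."""
    # Approximate tokenizer: runs of word characters, lowercased.
    tokens = re.findall(r"\w+", text.lower())
    kept = []
    for tok in tokens:
        if tok in stop_words:
            continue  # stop-word removal
        if tok.isdigit():
            continue  # drop free-floating numbers
        if not (min_len <= len(tok) <= max_len):
            continue  # length filter; bounds assumed inclusive
        kept.append(tok)
    return kept


print(analyze("Wicked 1 a of 2 3 4 wicket keepers and the wick"))
# -> ['wicked', 'wicket', 'keepers', 'wick']
```

Because the filtering happens at index time in Solr, faceting on the new field returns only the surviving terms, and facet.prefix and facet.mincount can still be applied on top.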
Erik