Hi Erik,

Thanks for the tip. Hmm, that's a good point. Then again, now that I think about it more, maybe I will just do the word filtering up front and store the filtered terms separately.
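A rough sketch of what I mean by filtering up front, in Python. Everything here is an assumption for illustration: the helper name filter_terms, the simple alphanumeric tokenizer, and the cutoff of 4 characters (i.e. length > 3) are mine, not anything Solr provides. The idea is just to drop short terms, bare numbers, and stop words before the terms are stored in a separate field for faceting.

```python
import re

# Hypothetical tokenizer: runs of ASCII letters and digits.
# A real pipeline would likely need something more careful.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+")


def filter_terms(text, min_length=4, stop_words=frozenset()):
    """Return lowercased terms worth faceting on.

    A term is kept only if it is at least min_length characters long,
    is not purely numeric, and is not in stop_words.
    """
    kept = []
    for token in TOKEN_RE.findall(text):
        term = token.lower()
        if len(term) < min_length:
            continue  # drops 1, a, of, and similar short terms
        if term.isdigit():
            continue  # drops free-floating numbers of any length
        if term in stop_words:
            continue
        kept.append(term)
    return kept


if __name__ == "__main__":
    doc = "1 a of 2 3 4 wicket and wicker baskets 2009"
    print(filter_terms(doc))
```

The filtered terms would then go into their own field at index time, so faceting never sees the noise in the first place.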

Darren

On Thu, 2009-07-30 at 13:05 -0400, Erik Hatcher wrote:
> On Jul 30, 2009, at 1:00 PM, Shalin Shekhar Mangar wrote:
>
> > On Thu, Jul 30, 2009 at 9:53 PM, <dar...@ontrenet.com> wrote:
> >
> >> Hi,
> >> I am exploring the faceted search results of Solr. My query is like
> >> this.
> >>
> >> http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=text&facet.limit=500&facet.prefix=wick
> >>
> >> If I don't use the prefix, I get back totals for words like 1, a, of,
> >> 2, 3, 4: one-letter or one-digit terms that occur in my documents.
> >> It's not really useful, since all the documents have some
> >> free-floating single-digit numbers.
> >>
> >> Is there a way to restrict the word frequency results for a facet
> >> based on the length, so I can set it to > 3, or is there a better way?
> >>
> >
> > Yes, you can specify facet.mincount=3 to return only those terms
> > present in more than 3 documents. On a related note, a tokenized
> > field (such as the text type in the example schema) will create a
> > large number of unique terms. Faceting on such a field may not be
> > very useful and/or efficient. Typically faceting is done on
> > untokenized fields (such as the string type).
>
> I think what was meant by > 3 was if faceting only returned terms of
> length greater than 3, not count.
>
> You could copyField your text field to another field, set the analyzer
> to include a LengthFilterFactory with a minimum length specified, and
> also have other analysis tweaks to have numbers and other stop words
> removed.
>
> Erik
>