On Wed, Aug 25, 2010 at 10:07 AM, Eric Grobler <impalah...@googlemail.com> wrote:
> I use Solr 1.41
> There are 14000 cities in the index.
> The type is just a simple string: <fieldType name="string"
> class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
> The facet method is fc.
>
> You are right I do not need 5000 cities, I was just surprised to see this
> big difference, there are places where I do need to sort count and return
> about 500 items.
>
> If Solr was also slow in locating the highest count city it would be less
> surprising.
> In other words, if I set the limit to 1, then solr returns Berlin as the
> city with the highest count within 3ms which seems to indicate that the
> facet is internally sorted by count.
> However, the speed regresses linearly, 30ms for 10, 300ms for 1000 etc.

The priority queue collecting values will be larger of course, but in
this specific instance I bet most of the time is being taken up in
converting from term number to term value.

Here's a snippet of a comment from the implementation:

 * To further save memory, the terms (the actual string values) are not all stored in
 * memory, but a TermIndex is used to convert term numbers to term values only
 * for the terms needed after faceting has completed. Only every 128th term value
 * is stored, along with it's corresponding term number, and this is used as an
 * index to find the closest term and iterate until the desired number is hit (very
 * much like Lucene's own internal term index).

This is something that Lucene has improved in trunk, and that solr can
make improvements to also. Besides optimizations, we could also
implement options to store all values and eliminate the need to read
the index to do the ord->string conversions.

-Yonik
http://lucenerevolution.org Lucene/Solr Conference, Boston Oct 7-8

> Regards
> Eric
>
> On Wed, Aug 25, 2010 at 3:28 PM, Yonik Seeley <yo...@lucidimagination.com>
> wrote:
>>
>> On Wed, Aug 25, 2010 at 7:22 AM, Eric Grobler <impalah...@googlemail.com>
>> wrote:
>> > There is a huge difference doing facet sorting on lex vs count
>> > The strange thing is that count sorting is fast when setting a small
>> > limit.
>> > I realize I can do sorting in the client, but I am just curious why this
>> > is.
>>
>> There are a lot of optimizations to make things fast for the common
>> case - and setting a really high limit makes some of those
>> ineffective. Hopefully you don't really need to return the top 5000
>> cities?
>> What version of Solr is this? What faceting method is used? Is this a
>> multi-valued field? How many unique values are in the city field?
>> How many docs in the index?
>>
>> -Yonik
>> http://lucenerevolution.org Lucene/Solr Conference, Boston Oct 7-8
>>
>>
>> > FAST - 16ms
>> > facet.field=city
>> > f.city.facet.limit=5000
>> > f.city.facet.sort=lex
>> >
>> > FAST - 20 ms
>> > facet.field=city
>> > f.city.facet.limit=50
>> > f.city.facet.sort=count
>> >
>> > SLOW - over 1 second
>> > facet.field=city
>> > f.city.facet.limit=5000
>> > f.city.facet.sort=count
>> >
>> > Regards
>> > ericz
>> >
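
To make the TermIndex comment quoted in the reply more concrete, here is a
minimal Python sketch of the idea. It is not Solr's actual code. The class
name SampledTermIndex, the terms_scanned counter, and the in-memory list
standing in for the on-disk term dictionary are all assumptions made for
illustration. It keeps every 128th term in memory and resolves a term number
by jumping to the closest sampled entry and iterating forward. Each returned
facet value pays for one such lookup, which would be consistent with the
roughly linear slowdown as facet.limit grows.

```python
from typing import Iterator, List, Tuple


class SampledTermIndex:
    """Toy ord -> term lookup that keeps only every Nth term in memory.

    Hypothetical illustration; not the Solr/Lucene implementation.
    """

    def __init__(self, sorted_terms: List[str], interval: int = 128) -> None:
        # Stand-in for the on-disk term dictionary (assumption for the sketch).
        self._terms = sorted_terms
        self.interval = interval
        # In-memory sample: (term number, term value) for every Nth term.
        self.sample: List[Tuple[int, str]] = [
            (i, sorted_terms[i]) for i in range(0, len(sorted_terms), interval)
        ]
        # Counts simulated reads from the term dictionary.
        self.terms_scanned = 0

    def _scan_from(self, start: int) -> Iterator[str]:
        """Iterate terms in order from `start`, counting each simulated read."""
        for i in range(start, len(self._terms)):
            self.terms_scanned += 1
            yield self._terms[i]

    def lookup(self, ord_: int) -> str:
        """Convert a term number to its value via the sampled index."""
        if not 0 <= ord_ < len(self._terms):
            raise IndexError(ord_)
        # Closest sampled entry at or before the requested term number.
        start, _ = self.sample[ord_ // self.interval]
        for offset, term in enumerate(self._scan_from(start)):
            if start + offset == ord_:
                return term
        raise AssertionError("unreachable: ord_ was range-checked above")


if __name__ == "__main__":
    # 14000 synthetic "cities", matching the index size mentioned in the thread.
    index = SampledTermIndex([f"city{i:05d}" for i in range(14000)])
    for limit in (1, 10, 1000):
        index.terms_scanned = 0
        # Pretend these evenly spread term numbers are the top `limit` facets.
        for ord_ in range(0, 14000, 14000 // limit)[:limit]:
            index.lookup(ord_)
        print(limit, "lookups ->", index.terms_scanned, "terms scanned")
```

In this sketch a lookup scans between 1 and 128 terms, so the total work
grows with the number of values returned. Storing all term values in memory,
as the reply suggests as a possible option, would make each lookup a direct
array access at the cost of more memory.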