Sorting on the full date string does end up in the right order, but it's very expensive in terms of memory. Sorting by a couple of fields that each have fewer unique indexed values seems to reduce memory consumption greatly.
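
A rough way to see why the split helps is to count distinct terms per field, since (per Erick's note quoted below) the Lucene sort caches look at unique terms. The Python sketch below is only an illustration: the function names and the synthetic timestamps are made up for this example, and it counts distinct values only. It does not measure actual Lucene or Solr heap usage.

```python
from datetime import datetime, timedelta


def split_timestamp(ts):
    """Split '2008-08-12T12:18:26.510' into ('2008-08-12', '121826')."""
    date_part, time_part = ts.split("T")
    # Drop the milliseconds and the colons to get an HHMMSS string.
    hhmmss = time_part.split(".")[0].replace(":", "")
    return date_part, hhmmss


def count_unique_terms(timestamps):
    """Count distinct values for the full string and for each split field."""
    dates = set()
    times = set()
    for ts in timestamps:
        date_part, hhmmss = split_timestamp(ts)
        dates.add(date_part)
        times.add(hhmmss)
    return {"full": len(set(timestamps)), "date": len(dates), "time": len(times)}


def sample_timestamps(n, step_seconds=7919):
    """Hypothetical test data: n distinct timestamps at a fixed, odd spacing."""
    start = datetime(2008, 1, 1)
    result = []
    for i in range(n):
        dt = start + timedelta(seconds=i * step_seconds, milliseconds=(i * 37) % 1000)
        result.append(dt.isoformat(timespec="milliseconds"))
    return result


if __name__ == "__main__":
    counts = count_unique_terms(sample_timestamps(200_000))
    print(counts)
```

With millisecond precision nearly every full timestamp is its own term, while the time-of-day field can never exceed 86,400 distinct values and the date field grows by at most one value per calendar day.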
-----Original Message-----
From: Walter Underwood [mailto:wunderw...@netflix.com]
Sent: Tuesday, April 07, 2009 11:12 AM
To: solr-user@lucene.apache.org
Subject: Re: Coming up with a model of memory usage

Why tokenize the date? It sorts just fine as a string.

--wunder

On 4/7/09 8:50 AM, "Erick Erickson" <erickerick...@gmail.com> wrote:

> Your observations about date sorting are probably correct. The
> issue is that the sort caches in Lucene look at the unique terms.
> There are many more unique terms (nearly every one) in
> 2008-08-12T12:18:26.510
>
> then when the field is split. You can reduce memory consumption
> when sorting even more by splitting into more fields, but that's up
> to you to decide whether or not it's worth the effort....
>
> Best
> Erick
>
> On Tue, Apr 7, 2009 at 10:55 AM, Joe Pollard
> <joe.poll...@bazaarvoice.com>wrote:
>
>> It doesn't seem to matter whether fields are stored or not, but I've
>> found a rather striking difference in the memory requirements during
>> sorting. Sorting on a string field representing datetime like
>> '2008-08-12T12:18:26.510' is about twice as memory intense as sorting
>> first by '2008-08-12' and then by '121826'.
>>
>> Any other tips/guidance like this would be great!
>>
>> Thanks,
>> -Joe
>>
>> On Mon, 2009-04-06 at 15:43 -0500, Joe Pollard wrote:
>>> To combat our frequent OutOfMemory Exceptions, I'm attempting to come up
>>> with a model so that we can determine how much memory to give Solr based
>>> on how much data we have (as we expand to more data types eligible to be
>>> supported this becomes more important).
>>>
>>> Are there any published guidelines on how much memory a particular
>>> document takes up in memory, based on the data types, etc?
>>>
>>> I have several stored fields, numerous other non-stored fields, a
>>> largish copyTo field, and I am doing some sorting on indexed, non-stored
>>> fields.
>>>
>>> Any pointers would be appreciated!
>>>
>>> Thanks,
>>> -Joe
>>>
>>
>>