Why tokenize the date? It sorts just fine as a string.

--wunder

On 4/7/09 8:50 AM, "Erick Erickson" <erickerick...@gmail.com> wrote:

> Your observations about date sorting are probably correct. The
> issue is that the sort caches in Lucene look at the unique terms.
> There are many more unique terms (nearly every one) in
> 2008-08-12T12:18:26.510 than when the field is split. You can
> reduce memory consumption when sorting even more by splitting into
> more fields, but it's up to you to decide whether or not that's
> worth the effort....
>
> Best
> Erick
>
> On Tue, Apr 7, 2009 at 10:55 AM, Joe Pollard
> <joe.poll...@bazaarvoice.com> wrote:
>
>> It doesn't seem to matter whether fields are stored or not, but I've
>> found a rather striking difference in the memory requirements during
>> sorting. Sorting on a string field representing datetime like
>> '2008-08-12T12:18:26.510' is about twice as memory-intensive as
>> sorting first by '2008-08-12' and then by '121826'.
>>
>> Any other tips/guidance like this would be great!
>>
>> Thanks,
>> -Joe
>>
>> On Mon, 2009-04-06 at 15:43 -0500, Joe Pollard wrote:
>>> To combat our frequent OutOfMemory Exceptions, I'm attempting to come up
>>> with a model so that we can determine how much memory to give Solr based
>>> on how much data we have (as we expand to more data types eligible to be
>>> supported this becomes more important).
>>>
>>> Are there any published guidelines on how much memory a particular
>>> document takes up in memory, based on the data types, etc?
>>>
>>> I have several stored fields, numerous other non-stored fields, a
>>> largish copyTo field, and I am doing some sorting on indexed, non-stored
>>> fields.
>>>
>>> Any pointers would be appreciated!
>>>
>>> Thanks,
>>> -Joe
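
For anyone who wants to see the unique-terms effect Erick describes, here is a rough Python sketch. It only counts distinct string values for a full timestamp field versus a split date field plus a split time field; it does not measure real Lucene or Solr memory use. The helper names (split_timestamp, unique_term_counts, sample_timestamps) and the synthetic data are assumptions made up for illustration, not anything from Solr itself.

```python
from datetime import datetime, timedelta


def split_timestamp(ts):
    """Split 'YYYY-MM-DDTHH:MM:SS.mmm' into ('YYYY-MM-DD', 'HHMMSS').

    Milliseconds are dropped, matching the '2008-08-12' / '121826'
    split mentioned in the thread.
    """
    date_part, time_part = ts.split("T")
    return date_part, time_part[:8].replace(":", "")


def unique_term_counts(timestamps):
    """Count distinct values in the full field and in each split field."""
    full, dates, times = set(), set(), set()
    for ts in timestamps:
        full.add(ts)
        date_part, time_part = split_timestamp(ts)
        dates.add(date_part)
        times.add(time_part)
    return {"full": len(full), "date": len(dates), "time": len(times)}


def sample_timestamps(n, step_ms=40_003):
    """Hypothetical data: n timestamps spaced step_ms milliseconds apart."""
    start = datetime(2008, 8, 12, 12, 18, 26, 510000)
    return [
        (start + timedelta(milliseconds=i * step_ms)).isoformat(
            timespec="milliseconds"
        )
        for i in range(n)
    ]


if __name__ == "__main__":
    counts = unique_term_counts(sample_timestamps(200_000))
    print(counts)
```

With millisecond precision nearly every document gets its own term in the full field, while the time field can never hold more than 86,400 distinct values and the date field holds one per day. How much actual heap that saves depends on the Lucene version and on how its sort cache stores terms, so treat the counts as a rough guide only.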