Why tokenize the date? It sorts just fine as a string. --wunder

On 4/7/09 8:50 AM, "Erick Erickson" <erickerick...@gmail.com> wrote:

> Your observations about date sorting are probably correct. The
> issue is that the sort caches in Lucene look at the unique terms.
> There are many more unique terms (nearly every value is distinct) in
> a full timestamp like 2008-08-12T12:18:26.510 than when the field is
> split. You can reduce memory consumption
> when sorting even more by splitting into more fields, but that's up
> to you to decide whether or not it's worth the effort....
> 
> Best
> Erick
> 
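A rough way to see the unique-term effect Erick describes is to count distinct values for the full timestamp versus the split date and time parts. The sketch below is not from the thread: the function names and the synthetic, randomly generated timestamps are assumptions made for illustration. It only counts distinct strings and does not measure Lucene's actual sort-cache memory.

```python
from datetime import datetime, timedelta
import random


def synthetic_timestamps(n, seed=0):
    """Generate n made-up timestamps spread over roughly one year (hypothetical data)."""
    rng = random.Random(seed)
    start = datetime(2008, 1, 1)
    span_ms = 365 * 24 * 60 * 60 * 1000  # one year expressed in milliseconds
    out = []
    for _ in range(n):
        ts = start + timedelta(milliseconds=rng.randrange(span_ms))
        # Format as yyyy-MM-ddTHH:mm:ss.SSS, the shape used in the thread
        out.append(ts.strftime("%Y-%m-%dT%H:%M:%S.") + f"{ts.microsecond // 1000:03d}")
    return out


def unique_term_counts(timestamps):
    """Count distinct values for the full string and for the split date/time parts."""
    full = set(timestamps)
    dates = {t[:10] for t in timestamps}                      # '2008-08-12'
    times = {t[11:19].replace(":", "") for t in timestamps}   # '121826'
    return {"full": len(full), "date": len(dates), "time": len(times)}


if __name__ == "__main__":
    print(unique_term_counts(synthetic_timestamps(100_000)))
```

However many documents there are, the date part can have at most a few hundred distinct values per year and the time part at most 86,400, while the full millisecond timestamp is close to unique per document.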
> On Tue, Apr 7, 2009 at 10:55 AM, Joe Pollard
> <joe.poll...@bazaarvoice.com> wrote:
> 
>> It doesn't seem to matter whether fields are stored or not, but I've
>> found a rather striking difference in the memory requirements during
>> sorting.  Sorting on a string field representing datetime like
>> '2008-08-12T12:18:26.510' is about twice as memory-intensive as sorting
>> first by '2008-08-12' and then by '121826'.
>> 
>> Any other tips/guidance like this would be great!
>> 
>> Thanks,
>> -Joe
>> 
>> On Mon, 2009-04-06 at 15:43 -0500, Joe Pollard wrote:
>>> To combat our frequent OutOfMemory Exceptions, I'm attempting to come up
>>> with a model so that we can determine how much memory to give Solr based
>>> on how much data we have (as we expand to more data types eligible to be
>>> supported this becomes more important).
>>> 
>>> Are there any published guidelines on how much memory a particular
>>> document takes up in memory, based on the data types, etc?
>>> 
>>> I have several stored fields, numerous other non-stored fields, a
>>> largish copyField target, and I am doing some sorting on indexed, non-stored
>>> fields.
>>> 
>>> Any pointers would be appreciated!
>>> 
>>> Thanks,
>>> -Joe
>>> 
>> 
>> 
