Hi Yonik,

Thanks again for the detailed input.

Let me try to confirm my understanding:

1. What you are saying is that if sorting is requested on a field, that 
field's values from all indexed documents are held in memory in un-inverted 
form. Given this, if I have a String field of, say, 20 characters, then 
(assuming no multibyte characters, all ASCII) for 200M documents I need at 
least 20 x 200 MB, i.e. 4 GB of memory.

2. So if I want to sort on 2 such fields, I need to allocate at least 8 GB 
of memory.

3. Another case: if 2 search requests hit the server concurrently, each 
sorting on the same 20-character date field, that would also need 2x2GB of 
memory. So if I know that I need to support at least 4 concurrent search 
requests, I need to start the JVM with at least an 8 GB heap.
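
To make the arithmetic in points 1 and 2 concrete, here is a rough 
back-of-the-envelope sketch in Java. It only multiplies document count by 
bytes per value and number of fields, under my assumption of 1 byte per ASCII 
character. The class and method names are made up for illustration, and the 
real in-memory representation in the JVM may well need more than this lower 
bound.

```java
public class SortMemoryEstimate {

    /**
     * Raw payload bytes for un-inverted sort fields: one value per document
     * per field. This is a lower bound that ignores all JVM object overhead.
     */
    static long rawBytes(long numDocs, int bytesPerValue, int numFields) {
        return numDocs * bytesPerValue * numFields;
    }

    public static void main(String[] args) {
        long docs = 200_000_000L;   // index size mentioned in this thread
        int bytesPerValue = 20;     // assumption: 20 ASCII chars, 1 byte each

        long oneField = rawBytes(docs, bytesPerValue, 1);
        long twoFields = rawBytes(docs, bytesPerValue, 2);

        System.out.println("one field  (bytes): " + oneField);
        System.out.println("two fields (bytes): " + twoFields);
    }
}
```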

Please let me know if my understanding is correct.

Regards,
Sourav

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley
Sent: Monday, November 24, 2008 6:03 PM
To: solr-user@lucene.apache.org
Subject: Re: Sorting and JVM heap size ....

On Mon, Nov 24, 2008 at 8:48 PM, souravm <[EMAIL PROTECTED]> wrote:
> I have around 200M documents in the index. The field I'm sorting on is a 
> date string (containing date and time in dd-mmm-yyyy hh:mm:ss format) and 
> the field is part of the search criteria.
>
> Also please note that the number of documents returned by the search criteria 
> is much less than 200M. In fact, even with 0 hits I got a JVM out-of-memory 
> exception.

Right... that's just how the Lucene FieldCache used for sorting works right now.
The entire field is un-inverted and held in memory.

200M docs is a *lot*... you might try indexing your date fields as
integer types that would take only 4 bytes per doc - and that will
still take up 800M.  Given that 2 searchers can overlap, that still
adds up to more than your heap - you will need to up that.
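
As a minimal sketch of the integer idea (not Solr or Lucene API, just
plain Java you might run before indexing), the date string can be
converted to whole seconds since 1970 so it fits in a 4-byte int.  The
class name, method name, the input pattern, and the choice of UTC are
all assumptions for illustration; adjust them to your data.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class DateAsInt {

    // Assumed input pattern, e.g. "24-Nov-2008 18:03:00".
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("dd-MMM-yyyy HH:mm:ss", Locale.ENGLISH);

    /** Parses the date string and returns seconds since 1970-01-01 UTC as an int. */
    static int toEpochSeconds(String text) {
        long seconds = LocalDateTime.parse(text, FMT).toEpochSecond(ZoneOffset.UTC);
        // Throws ArithmeticException if the value does not fit in 32 bits
        // (for example, dates past early 2038).
        return Math.toIntExact(seconds);
    }

    public static void main(String[] args) {
        System.out.println(toEpochSeconds("24-Nov-2008 18:03:00"));

        // Rough footprint for 200M docs at 4 bytes each, and with two
        // overlapping searchers each holding its own copy.
        long perSearcher = 200_000_000L * Integer.BYTES;
        System.out.println("one searcher  (bytes): " + perSearcher);
        System.out.println("two searchers (bytes): " + 2 * perSearcher);
    }
}
```

A side effect of this encoding is that integer order matches
chronological order, which is not true of plain string order on a
dd-mmm-yyyy value.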

The other option is to split your index across multiple nodes and use
distributed search.  If you want to do any faceting in the future, or
sort on multiple fields, you will need to do this anyway.

-Yonik
