Thanks Yonik. That explains it.

Regards,
Sourav
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley
Sent: Monday, November 24, 2008 7:07 PM
To: solr-user@lucene.apache.org
Subject: Re: Sorting and JVM heap size ....

On Mon, Nov 24, 2008 at 9:19 PM, souravm <[EMAIL PROTECTED]> wrote:
> Hi Yonik,
>
> Thanks again for the detail input.
>
> Let me try to re-confirm my understanding -
>
> 1. What you say is - if sorting is asked for a field, the same field from all
> documents, which are indexed, would be put in a memory in an un-inverted
> form. So given this if I have a field of String type with say 20 characters,
> then (assuming no multibyte characters - all ascii) for 200M documents I need
> to have at least 20x200 MB, i.e. 4GB memory.

That's the general idea, yes. For Strings, it's actually just the
unique values in a String[], plus an int[200000000] of offsets into
that String[] for each document. See Lucene's FieldCache and StringIndex.

-Yonik

> 2. So, if I want to have sorting on 2 such fields I need to allocate at least
> 8 GB of memory.
>
> 3. Another case is - if there are 2 search requests concurrently hitting the
> server, each with sorting on the same 20 character date field, then also it
> would need 2x2GB memory. So if I know that I need to support at least 4
> concurrent search requests, I need to start the JVM at least with 8 GB heap
> size.
>
> Please let me know if my understanding is correct.
>
> Regards,
> Sourav
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley
> Sent: Monday, November 24, 2008 6:03 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Sorting and JVM heap size ....
>
> On Mon, Nov 24, 2008 at 8:48 PM, souravm <[EMAIL PROTECTED]> wrote:
>> I have around 200M documents in index. The field I'm sorting on is a date
>> string (containing date and time in dd-mmm-yyyy hh:mm:yy format) and the
>> field is part of the search criteria.
>>
>> Also please note that the number of documents returned by the search
>> criteria is much less than 200M. In fact even in case of 0 hit I found jvm
>> out of memory exception.
>
> Right... that's just how the Lucene FieldCache used for sorting works
> right now.
> The entire field is un-inverted and held in memory.
>
> 200M docs is a *lot*... you might try indexing your date fields as
> integer types that would take only 4 bytes per doc - and that will
> still take up 800M. Given that 2 searchers can overlap, that still
> adds up to more than your heap - you will need to up that.
>
> The other option is to split your index across multiple nodes and use
> distributed search. If you want to do any faceting in the future, or
> sort on multiple fields, you will need to do this anyway.
>
> -Yonik
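
For anyone following the arithmetic in this thread, here is a rough Python sketch of the heap estimate. It assumes the layout Yonik describes: one 4-byte int per document, plus the unique values for a String field. The function names, the 2 bytes per char for Java strings, the 40-byte per-String overhead, and the 10M unique-value count are illustrative assumptions, not figures from the thread or measurements of Lucene.

```python
def int_field_cache_bytes(num_docs):
    """Sort cache for an integer field: one 4-byte int per document."""
    return 4 * num_docs


def string_index_cache_bytes(num_docs, num_unique, avg_chars,
                             bytes_per_char=2, per_string_overhead=40):
    """Rough estimate for a StringIndex-style cache.

    One 4-byte int per document (the offset into the value array),
    plus the unique values themselves. bytes_per_char and
    per_string_overhead are assumed figures, not measured ones.
    """
    offsets = 4 * num_docs
    values = num_unique * (per_string_overhead + bytes_per_char * avg_chars)
    return offsets + values


NUM_DOCS = 200_000_000

# Date indexed as an integer type: the "800M" mentioned in the thread.
as_int = int_field_cache_bytes(NUM_DOCS)

# Date kept as a 20-character string, with a hypothetical 10M distinct values.
as_string = string_index_cache_bytes(NUM_DOCS, num_unique=10_000_000, avg_chars=20)

# Two searchers can overlap, so budget for two copies of the cache.
print("int field, one searcher:    %d bytes" % as_int)
print("int field, two searchers:   %d bytes" % (2 * as_int))
print("string field, one searcher: %d bytes" % as_string)
```

Under these assumptions the cost depends on the number of documents and the number of distinct values, not on how many documents a query matches, which is consistent with the out-of-memory error showing up even for a search with 0 hits.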