First of all, thanks for your answers. Those OOMEs are pretty nasty for our production environment. I didn't try the solution of sorting by function, as it is a solr 1.5 feature and we prefer to stay on the stable 1.4 version.
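(For reference, the function-based sort we skipped would, if I read the 1.5-dev docs correctly, just be a different sort parameter on the request, e.g.

  q=*:*&sort=field(last_update_l) desc

where the field name is only an example of one of our dynamic timestamp fields; we never actually tested this syntax ourselves.)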
I made a temporary patch that looks to be working fine. I patched the
lucene-core-2.9.1 source code, adding these lines to the abstract
static class Cache's get method:

  public Object get(IndexReader reader, Entry key) throws IOException {
    Map innerCache;
    Object value;
+   final Object readerKey = reader.getFieldCacheKey();
+   CacheEntry[] cacheEntries = wrapper.getCacheEntries();
+   // once the cache grows past a tuned threshold, drop everything
+   if (cacheEntries.length > A_TUNED_INT_VALUE) {
+     readerCache.clear();
+   }
    ...

I didn't notice any delay or concurrency problem.

On 22 June 2010 07:27, Lance Norskog <goks...@gmail.com> wrote:
> No, this is basic to how Lucene works. You will need larger EC2 instances.
>
> On Mon, Jun 21, 2010 at 2:08 AM, Matteo Fiandesio
> <matteo.fiande...@gmail.com> wrote:
>> Will compiling solr with lucene 2.9.3 instead of 2.9.1 solve this issue?
>> Regards,
>> Matteo
>>
>> On 19 June 2010 02:28, Lance Norskog <goks...@gmail.com> wrote:
>>> The Lucene implementation of sorting creates an array of four-byte
>>> ints for every document in the index, and another array of the unique
>>> values in the field.
>>> If the timestamps are 'date' or 'tdate' in the schema, they do not
>>> need the second array.
>>>
>>> You can also sort by a field with a function query. This does not
>>> build the arrays, but might be a little slower.
>>> Yes, the sort arrays (and also the facet values for a field) should
>>> be controlled by a fixed-size cache, but they are not.
>>>
>>> On Fri, Jun 18, 2010 at 7:52 AM, Matteo Fiandesio
>>> <matteo.fiande...@gmail.com> wrote:
>>>> Hello,
>>>> we are experiencing OOM exceptions in our single-core solr instance
>>>> (on a (huge) amazon EC2 machine).
>>>> We investigated a lot in the mailing list and through jmap/jhat dump
>>>> analysis, and the problem resides in the lucene FieldCache, which
>>>> fills the heap and blows up the server.
>>>>
>>>> Our index is quite small, but we have a lot of sort queries on
>>>> fields that are dynamic, of type long (representing timestamps), and
>>>> not present in all the documents.
>>>> Those queries apply sorting on 12-15 of those fields.
>>>>
>>>> We are using solr 1.4 in production, and the dump shows a lot of
>>>> Integer/Character and byte arrays filled up with 0s.
>>>> With solr's trunk code things do not change.
>>>>
>>>> In the mailing list we saw a lot of messages related to this issue:
>>>> we tried truncating the dates to day precision, using
>>>> missingSortLast = true, changing the field type from slong to long,
>>>> setting autowarming to different values, and disabling and enabling
>>>> caches with different values, but we did not manage to solve the
>>>> problem.
>>>>
>>>> We were thinking of implementing an LRUFieldCache field type to
>>>> manage the FieldCache as an LRU and prevent the OOMs but, before
>>>> starting a new development, we want to be sure that we are not doing
>>>> anything wrong in the solr configuration or in the index generation.
>>>>
>>>> Any help would be appreciated.
>>>> Regards,
>>>> Matteo
>>>>
>>>
>>>
>>>
>>> --
>>> Lance Norskog
>>> goks...@gmail.com
>>>
>>
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>
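PS: for completeness, here is a minimal sketch (plain java.util,
nothing Lucene-specific; class and field names are illustrative) of
the LRU-bounded cache idea mentioned at the bottom of the thread,
built on LinkedHashMap's removeEldestEntry hook:

  import java.util.LinkedHashMap;
  import java.util.Map;

  // A map that evicts its least-recently-used entry once it grows
  // past a fixed size. A real fix would have to live inside
  // FieldCacheImpl and handle per-reader keys and synchronization.
  public class LruFieldCacheMap<K, V> extends LinkedHashMap<K, V> {
      private final int maxEntries;

      public LruFieldCacheMap(int maxEntries) {
          super(16, 0.75f, true); // accessOrder = true -> LRU iteration order
          this.maxEntries = maxEntries;
      }

      @Override
      protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
          // called by put(); returning true drops the eldest entry,
          // so the cache never holds more than maxEntries values
          return size() > maxEntries;
      }
  }

Unlike the patch above, which clears the whole readerCache at once,
this would evict individual entries as new ones come in.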