Take a look at Solr's use of DocValues: https://cwiki.apache.org/confluence/display/solr/DocValues.
There are docValues options that use less memory than the FieldCache.

Joel Bernstein
Search Engineer at Heliosearch

On Thu, May 15, 2014 at 6:39 AM, Jeongseok Son <invictu...@gmail.com> wrote:

> Hello, I'm struggling with large data indexed and searched by Solr.
>
> The schema of the documents consists of date (YYYY-MM-DD), text (tokenized and
> indexed with Natural Language Toolkit), and several numerical fields.
>
> Each document is small, but the number of docs is very large,
> around 10 million per date. The server has 32GB of memory and
> I allocated around 30GB to the Solr JVM.
>
> My Solr server has to return documents sorted by one of the numerical
> fields when it is requested with a specific date and text (ex.
> q=date:YYYY-MM-DD+text:KEYWORD). The problem is that sorting in Lucene
> requires a lot of Field Cache and Solr can't handle Field Cache well. The
> Field Cache gets larger as more queries are executed and is never
> evicted. When the whole memory is filled with Field Cache, the Solr server
> stops or throws an Out of Memory exception.
>
> Solr cannot control the Lucene field cache at all, so I'm having a hard time
> solving this problem. I'm considering these three ways to solve it.
>
> 1) Add more memory.
> This can relieve the problem but I don't think it can completely solve it.
> The memory would still fill up with field cache as the server handles
> search requests.
> 2) Separate numerical data from text data.
> I find Solr/Lucene isn't suitable for sorting large numerical data.
> Therefore I'm thinking of storing numerical data in another DB (HBase,
> MongoDB ...), so that the Solr server will just do text search.
> 3) Switch to Elasticsearch.
> According to this page
> (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-fielddata.html),
> Elasticsearch can control the field cache. I think ES could solve my
> problem.
>
> I'm likely to try the 2nd or 3rd way. Are these appropriate solutions? If you
> have any better ideas please let me know. I've gone through too many
> troubles so it's time to make a decision. I want my choices reviewed by
> many other excellent Solr users and developers and also want to find better
> solutions.
> I really appreciate any help you can provide.
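
For reference, here is a minimal sketch of what the DocValues suggestion at the top of this reply might look like in schema.xml. The field name `rank_value` is hypothetical (the original message does not name its numerical fields), and the `tlong` type is assumed to be defined as in the Solr 4.x example schema. Treat this as an illustration under those assumptions, not a tested configuration for this index.

```xml
<!-- Assumed field type, as found in the Solr 4.x example schema. -->
<fieldType name="tlong" class="solr.TrieLongField"
           precisionStep="8" positionIncrementGap="0"/>

<!-- Hypothetical sort field. docValues="true" stores the values in a
     column-oriented structure built at index time, so sorting does not
     need to un-invert the field into the FieldCache on the JVM heap. -->
<field name="rank_value" type="tlong"
       indexed="true" stored="true" docValues="true"/>
```

With a field defined this way, the query from the original message would add a sort parameter, for example `q=date:YYYY-MM-DD+text:KEYWORD&sort=rank_value+desc`. Enabling docValues on an existing field requires reindexing the documents.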