Dear List,

We are using Solr 4.2 to build an index of 5M docs, each limited to 6K in size. Conceptually we are modelling a stack of documents. Here is an excerpt from our schema.xml:
  <dynamicField name="publicationBody_*" type="string" indexed="false" stored="true" multiValued="false" termVectors="false" />
  <copyField source="publicationBody_*" dest="publicationBodies"/>

We have publicationBody_1: ..., publicationBody_2: ..., up to a maximum of 30, each holding at most 10K of data.

We run this index as 8 shards in 8 Solr cores on a single host, an m2.4xlarge EC2 instance. We do not use ZooKeeper (because of operational issues on our live indexes) and manage the sharding ourselves. For this index we run with -Xmx30G and observe in jconsole that Solr uses approximately 25G of heap.

autoCommit kills Solr: it drives heap usage to the maximum and Solr dies. The cause appears to be that it commits to all cores in parallel. With autoCommit disabled, a loop that commits the cores one at a time:

  while true; do
    for i in $(seq 0 7); do
      curl -s "http://localhost:8085/solr/core${i}/update?commit=true&wt=json"
    done
  done

produces:

  {"responseHeader":{"status":0,"QTime":8297}}
  {"responseHeader":{"status":0,"QTime":8358}}
  {"responseHeader":{"status":0,"QTime":9552}}
  {"responseHeader":{"status":0,"QTime":8368}}
  {"responseHeader":{"status":0,"QTime":9296}}
  {"responseHeader":{"status":0,"QTime":8527}}
  {"responseHeader":{"status":0,"QTime":9458}}
  {"responseHeader":{"status":0,"QTime":8929}}

That is 8 seconds to process a commit when there are no changes to the index!?!
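Totalling the QTime values from one pass of the loop puts the numbers in perspective: the eight commits add up to roughly 70.8 seconds of commit time. A minimal shell sketch of that calculation, assuming the responses are saved one JSON object per line (the function name qtime_summary and the file name commits.log are hypothetical):

```shell
#!/usr/bin/env bash
# Sum and average the QTime values (milliseconds) from Solr commit responses,
# reading one JSON response per line on stdin.
# Assumption: each response looks like {"responseHeader":{"status":0,"QTime":8297}}
qtime_summary() {
  grep -o '"QTime":[0-9]*' |
    cut -d: -f2 |
    awk '{ total += $1; n++ }
         END { if (n > 0) printf "commits=%d total_ms=%d mean_ms=%.0f\n", n, total, total / n }'
}
```

For example, `qtime_summary < commits.log` on the eight responses above prints `commits=8 total_ms=70785 mean_ms=8848`, i.e. one full pass over the cores spends about 71 seconds committing an unchanged index.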
Transaction Logs
------------------------
 55M /mnt/solr-stack/solr.data.0/tlog
 45M /mnt/solr-stack/solr.data.1/tlog
 28M /mnt/solr-stack/solr.data.2/tlog
 17M /mnt/solr-stack/solr.data.3/tlog
118M /mnt/solr-stack/solr.data.4/tlog
123M /mnt/solr-stack/solr.data.5/tlog
 68M /mnt/solr-stack/solr.data.6/tlog
 63M /mnt/solr-stack/solr.data.7/tlog

Index
-------
2.8G /mnt/solr-stack/solr.data.0/index
2.7G /mnt/solr-stack/solr.data.1/index
3.2G /mnt/solr-stack/solr.data.2/index
2.7G /mnt/solr-stack/solr.data.3/index
3.1G /mnt/solr-stack/solr.data.4/index
2.7G /mnt/solr-stack/solr.data.5/index
2.9G /mnt/solr-stack/solr.data.6/index
3.0G /mnt/solr-stack/solr.data.7/index

Why does Solr need such a large heap for this index (it dies with 10G and 20G, and stays constant at 28G in jconsole)? Why does running commits in parallel, via autoCommit or an explicit commit command, exhaust the memory? Are we using dynamic fields incorrectly?

We have also tried running the same index on an SSD-backed hi1.4xlarge Amazon instance. There, autoCommit every 30 seconds works, and the transaction log files are rotated correctly.

--
Raghav
Senior backend developer - www.issuu.com