Dear List,

We are using solr-4.2 to build an index of 5M docs, each limited to 6K
in size. Conceptually we are modelling a stack of documents. Here is an
excerpt from our schema.xml:

       <dynamicField name="publicationBody_*" type="string"
                     indexed="false" stored="true" multiValued="false"
                     termVectors="false"/>
       <copyField    source="publicationBody_*"   dest="publicationBodies"/>

We have fields publicationBody_1: ..., publicationBody_2: ..., up to a
maximum of 30, with at most 10K of data in each.

We run this index sharded across 8 solr cores on a single host, an
m2.4xlarge EC2 instance. We do not use zookeeper (because of
operational issues on our live indexes) and manage the sharding
ourselves.

For this index we run with -Xmx30G and observe (in jconsole) that
solr runs with approximately 25G of heap.
Autocommit kills solr: heap memory usage goes to the maximum and solr
dies. The reason appears to be committing to all cores in parallel.
Disabling autoCommit and running a loop like
    while true; do for i in $(seq 0 7); do
      curl -s "http://localhost:8085/solr/core${i}/update?commit=true&wt=json"
    done; done

produces:

{"responseHeader":{"status":0,"QTime":8297}}
{"responseHeader":{"status":0,"QTime":8358}}
{"responseHeader":{"status":0,"QTime":9552}}
{"responseHeader":{"status":0,"QTime":8368}}
{"responseHeader":{"status":0,"QTime":9296}}
{"responseHeader":{"status":0,"QTime":8527}}
{"responseHeader":{"status":0,"QTime":9458}}
{"responseHeader":{"status":0,"QTime":8929}}

8 seconds to process a commit with no changes to the index!?!
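
To put a single number on one pass over all cores, the QTime values can
be summed. This is only a minimal sketch: the helper names qtime and
total_qtime are made up for illustration, and it assumes each response
arrives on its own line in the JSON shape shown above. For the eight
responses listed above it adds up to 70785 ms, i.e. roughly 71 seconds
per pass.

```shell
# Hypothetical helpers, not part of solr: pull QTime (ms) out of a
# one-line JSON response header and add the values up.
qtime() {
  sed -n 's/.*"QTime":\([0-9][0-9]*\).*/\1/p'
}

total_qtime() {
  total=0
  # Also handle a final line that lacks a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    t=$(printf '%s\n' "$line" | qtime)
    if [ -n "$t" ]; then
      total=$((total + t))
    fi
  done
  echo "$total"
}

# Usage (assumes the same host, port and core names as the loop above):
#   for i in $(seq 0 7); do
#     curl -s "http://localhost:8085/solr/core${i}/update?commit=true&wt=json"; echo
#   done | total_qtime
```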

Transaction Logs
------------------------
55M     /mnt/solr-stack/solr.data.0/tlog
45M     /mnt/solr-stack/solr.data.1/tlog
28M     /mnt/solr-stack/solr.data.2/tlog
17M     /mnt/solr-stack/solr.data.3/tlog
118M    /mnt/solr-stack/solr.data.4/tlog
123M    /mnt/solr-stack/solr.data.5/tlog
68M     /mnt/solr-stack/solr.data.6/tlog
63M     /mnt/solr-stack/solr.data.7/tlog

Index
-------
2.8G    /mnt/solr-stack/solr.data.0/index
2.7G    /mnt/solr-stack/solr.data.1/index
3.2G    /mnt/solr-stack/solr.data.2/index
2.7G    /mnt/solr-stack/solr.data.3/index
3.1G    /mnt/solr-stack/solr.data.4/index
2.7G    /mnt/solr-stack/solr.data.5/index
2.9G    /mnt/solr-stack/solr.data.6/index
3.0G    /mnt/solr-stack/solr.data.7/index
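
For comparison with the heap figures, the two listings above total
roughly 517M of transaction logs and 23.1G of index across the eight
cores. A small sketch of how such a total can be computed; sum_sizes is
a made-up helper, and it assumes the first column is a size with an M or
G suffix, as in the listings above, with G treated as 1024 M.

```shell
# Hypothetical helper: sum sizes such as "55M" or "2.8G" from the first
# column of stdin and print the total in M.
sum_sizes() {
  awk 'NF {
    n = $1 + 0                  # numeric part, e.g. 2.8 from "2.8G"
    if ($1 ~ /G$/) n *= 1024    # assumption: 1G = 1024M
    total += n
  } END { printf "%.0f\n", total }'
}

# Usage, assuming the listings came from something like du -sh:
#   du -sh /mnt/solr-stack/solr.data.*/tlog | sum_sizes
```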

Why does solr need such a large heap space for this index (it dies
with 10G and 20G and is constant at 28G in jconsole)?
Why does running commits in parallel via autoCommit or the command
exhaust the memory?
Are we using dynamic fields incorrectly?

We have also tried to run the same index on an SSD-backed
hi1.4xlarge Amazon instance. There autoCommit every 30 seconds works,
rotating the transaction log files correctly.

--
Raghav
Senior backend developer - www.issuu.com
