We have a distributed setup in which commits are glacially slow on only some of the shards (about 10s on a good shard, 263s on a slow one). Each shard of this index holds about 10GB of Lucene index data, and documents are assigned to shards by an MD5 hash, so the mix of document and data types should be the same on every shard. I've turned off our post-commit hooks to isolate the problem, so it's not a snapshot run amok or anything like that. I also copied the indexes to new machines, and the same indexes that were slow in production are slow on the test machines too. During a slow commit, the Jetty process sits at 100% CPU / 50% RAM on an 8GB quad-core machine. The slow commit happens every time after I add at least one document. (If I don't add any documents, the commit returns immediately.)
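For anyone who wants to reproduce the comparison, here is a minimal sketch of how per-shard commit times could be measured. It assumes the shards are Solr cores that accept an XML `<commit/>` message on their `/update` handler. The shard URLs and the helper names `update_url` and `time_commit` are hypothetical, not taken from our actual setup.

```python
import time
import urllib.request


def update_url(base_url):
    """Build the update handler URL for a shard (assumed to live at /update)."""
    return base_url.rstrip("/") + "/update"


def time_commit(base_url, opener=urllib.request.urlopen):
    """Post an explicit commit to one shard and return the elapsed seconds."""
    request = urllib.request.Request(
        update_url(base_url),
        data=b"<commit/>",
        headers={"Content-Type": "text/xml"},
    )
    start = time.monotonic()
    # The call blocks until the shard has answered the commit request.
    with opener(request) as response:
        response.read()
    return time.monotonic() - start


if __name__ == "__main__":
    # Hypothetical shard addresses; replace with the real ones.
    shards = [
        "http://shard1.example:8983/solr",
        "http://shard2.example:8983/solr",
    ]
    for shard in shards:
        print(shard, round(time_commit(shard), 1), "seconds")
```

Since a commit with no pending documents returns immediately, at least one document has to be added to each shard before timing it.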
What can I do to look into this problem?