Hi all,

I am trying to solve a serious performance problem with our Solr search index. We're running Solr 1.3. We've sharded our index into 4 shards, and the index data is stored on a network mount that is accessed over Fibre Channel. Each document's text is indexed, but not stored.

Each day, roughly 10K - 20K new documents are added. After a document is submitted, it is compared, sentence by sentence, against every document we have indexed in its category. It's a requirement that we keep our index as up-to-date as possible, so we reload our indexes once a minute in order to miss as few matches as possible. We are not expecting to find matches, so our document cache hit rates are abysmal. We also don't expect many repeated sentences across documents, so our query cache hit rates are practically zero as well.
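In case it matters, the distributed query side is just the stock Solr 1.3 shards mechanism, with every sentence query fanning out to all four shards. Sketching from memory (hostnames are placeholders, not our real machines), it amounts to something like this in solrconfig.xml:

  <requestHandler name="standard" class="solr.SearchHandler" default="true">
    <lst name="defaults">
      <!-- distributed search: fan each query out to all four shards -->
      <str name="shards">shard1:8983/solr,shard2:8983/solr,shard3:8983/solr,shard4:8983/solr</str>
    </lst>
  </requestHandler>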
After running fine for over 9 months, the system broke down this week. The queries per second are around 17 to 18, and our paper backlog is well north of 14,000. The number of papers in the index has hit 3.7 million, and each shard is 2.3GB in size (roughly 925K papers in each index).

In order to increase throughput, we tried to stand up additional read-only Solr instances pointed at the shared indexes, but got I/O errors from the secondary Solr instances when the reload time came. We tried switching the locking mechanism from single to simple, but the I/O errors continued. We're running on 64-bit Linux with a 64-bit JVM (Java 1.6.something), with 4GB of RAM assigned to each Solr instance.

Has anyone else seen a problem like this before? Can anyone suggest any solutions? Will Solr 1.4 help (and is Solr 1.4 ready for production use)? Any answers would be greatly appreciated.

Thanks,
Jon
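P.S. In case it helps anyone reproduce this, the locking change we tried is the lockType setting in solrconfig.xml. A minimal sketch of the relevant bit (the rest of that section is the stock 1.3 defaults, not necessarily ours):

  <indexDefaults>
    <!-- was "single" (in-JVM lock, one process owns the index);
         "simple" uses a lock file on disk, which we hoped would let
         the extra read-only instances share the index on the mount -->
    <lockType>simple</lockType>
  </indexDefaults>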