Hi, I have a 260M-document index (90 GB) with this structure:
  <field name="fragment" type="text_general" indexed="true" stored="true" multiValued="false" termVectors="false" termPositions="false" termOffsets="false" />
  <field name="parentId" type="long" indexed="false" stored="true" multiValued="false"/>
  <field name="fragmentContentType" type="string" indexed="false" stored="true" multiValued="false"/>
  <field name="creationDate" type="date" indexed="true" stored="true" multiValued="false"/>
  <field name="creationTimestamp" type="date" indexed="true" stored="true" multiValued="false"/>
  <field name="visibility" type="string" indexed="true" stored="true" multiValued="false"/>
  <field name="category" type="string" indexed="true" stored="true" multiValued="false"/>
  <field name="marked" type="string" indexed="true" stored="true" multiValued="false"/>

  <!-- catchall field, containing all other searchable text fields (implemented via copyField further on in this schema) -->
  <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>

  <copyField source="fragment" dest="text"/>
  <copyField source="parentId" dest="text"/>
  <copyField source="fragmentContentType" dest="text"/>
  <copyField source="creationDate" dest="text"/>
  <copyField source="visibility" dest="text"/>
  <copyField source="category" dest="text"/>
  <copyField source="marked" dest="text"/>

where the fragment field contains XML messages. There is a search function that returns the messages satisfying a search criterion.

TARGET: find the configuration that gives the best response time for a SolrCloud cluster of two Solr instances running on 2 VMs with 8 cores and 32 GB.

TEST RESULTS:

1. Configurations:
   1. The best configuration without replicas
      - CONF1: 16 shards of 17M documents (8 per VM)
   2. Configurations with replicas
      - CONF2: 8 shards of 35M documents with replication factor of 1
      - CONF3: 16 shards of 35M documents with replication factor of 1

2. Executed tests
   - sequential requests
   - 5 parallel requests
   - 10 parallel requests
   - 20 parallel requests

   in two scenarios: during an indexing phase and with no indexing in progress.

   The call is:

   http://localhost:8983/solr/sepa/select?q=+fragment%3A*AAA*+&fq=marked%3AT&fq=-fragmentContentType%3ABULK&start=0&rows=100&sort=creationTimestamp+desc%2Cid+asc

3. Test results

   All tests showed I/O throughput of 100 MB/s while data was being loaded into the disk cache, disk cache usage of 20 GB, and CPU utilization of 100% (all 8 cores).

   Each entry gives the average time and the maximum time.

   - No indexing
     - CONF1
       - sequential: avg 4.1, max 6.9
       - 5 parallel: avg 15.6, max 19.1
       - 10 parallel: avg 23.6, max 30.2
       - 20 parallel: avg 48, max 52.2
     - CONF2
       - sequential: avg 12.3, max 17.4
       - 5 parallel: avg 32.5, max 34.2
       - 10 parallel: avg 45.4, max 49
       - 20 parallel: avg 64.6, max 74
     - CONF3
       - sequential: avg 6.9, max 9.9
       - 5 parallel: avg 33.2, max 37.5
       - 10 parallel: avg 46, max 51
       - 20 parallel: avg 68, max 83
   - During indexing (is it possible to see the total indexing throughput in the Solr admin console? I can only find it per shard)
     - CONF1
       - sequential: avg 7.7, max 9.5
       - 5 parallel: avg 26.8, max 28.4
       - 10 parallel: avg 31.8, max 37.8
       - 20 parallel: avg 42, max 52.5
     - CONF2
       - sequential: avg 12.3, max 19
       - 5 parallel: avg 39, max 40.8
       - 10 parallel: avg 56.6, max 62.9
       - 20 parallel: avg 79, max 116
     - CONF3
       - sequential: avg 10, max 18.9
       - 5 parallel: avg 36.5, max 41.9
       - 10 parallel: avg 63.7, max 64.1
       - 20 parallel: avg 85, max 120

I have two questions:
- The response times of the configurations with replicas are worse than those of the configuration without replicas (about three times worse in the sequential test for CONF2). Is this an expected result?
- Why don't replicas help to reduce the response time while documents are being inserted and updated?
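
For anyone who wants to reproduce the measurements, the test procedure described above (1, 5, 10 and 20 parallel requests, reporting average and maximum time) could be driven by a small script along these lines. This is only a sketch, not the exact tool used for the numbers above: the function names (`solr_fetch`, `timed`, `run_load`, `summarize`) and the `rounds` parameter are made up for illustration, and the URL and query parameters are simply the ones from the call shown earlier.

```python
import time
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Endpoint and parameters copied from the call in the post.
SOLR_SELECT = "http://localhost:8983/solr/sepa/select"
PARAMS = [
    ("q", "fragment:*AAA*"),
    ("fq", "marked:T"),
    ("fq", "-fragmentContentType:BULK"),
    ("start", "0"),
    ("rows", "100"),
    ("sort", "creationTimestamp desc,id asc"),
]


def solr_fetch():
    """Issue one search request and read the whole response body."""
    url = SOLR_SELECT + "?" + urllib.parse.urlencode(PARAMS)
    with urllib.request.urlopen(url) as resp:
        resp.read()


def timed(fetch):
    """Return the wall-clock seconds taken by one call to fetch()."""
    t0 = time.perf_counter()
    fetch()
    return time.perf_counter() - t0


def run_load(fetch, parallelism, rounds=1):
    """Fire `parallelism` concurrent requests per round; return all timings."""
    timings = []
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        for _ in range(rounds):
            futures = [pool.submit(timed, fetch) for _ in range(parallelism)]
            # Wait for the whole batch before starting the next round.
            timings.extend(f.result() for f in futures)
    return timings


def summarize(timings):
    """Return (average, maximum) of a list of timings."""
    return sum(timings) / len(timings), max(timings)


if __name__ == "__main__":
    # parallelism=1 corresponds to the sequential test.
    for parallelism in (1, 5, 10, 20):
        avg, worst = summarize(run_load(solr_fetch, parallelism, rounds=3))
        print(f"{parallelism:>2} parallel: avg {avg:.1f}s  max {worst:.1f}s")
```

Passing the request function in as an argument keeps the timing logic independent of Solr, so the harness can be checked with a dummy callable before it is pointed at the real cluster.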