Hi Luca, I'm not sure I understood correctly. Is your question "Why are indexing times on a SolrCloud collection with 2 replicas higher than on a SolrCloud collection with 1 replica"? With 2 replicas, every document has to be indexed separately in 2 places, and Solr has to confirm that both indexing operations succeeded. Indexing times are lower on a SolrCloud collection with 2 shards (just one replica, the leader, per shard) because each document is indexed only once and the load is spread across 2 servers instead of one.
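To make the two layouts concrete, here is a minimal sketch that builds the Collections API CREATE requests for each of them and counts how many times a single document gets indexed. The collection name "demo" and the helper names are made up for illustration, the host and port are simply copied from the query URL in your mail, and nothing is actually sent to Solr.

```python
from urllib.parse import urlencode

# Assumption: same host/port as the query URL quoted in this thread.
SOLR_BASE = "http://localhost:8983/solr"


def create_collection_url(name, num_shards, replication_factor, base=SOLR_BASE):
    """Build a Collections API CREATE URL (the request is not sent)."""
    params = {
        "action": "CREATE",
        "name": name,  # hypothetical collection name
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    return f"{base}/admin/collections?{urlencode(params)}"


def index_writes_per_document(replication_factor):
    """A document belongs to one shard; every replica of that shard indexes it."""
    return replication_factor


layouts = {
    "1 shard, 2 replicas": (1, 2),
    "2 shards, 1 replica each": (2, 1),
}

for label, (shards, replicas) in layouts.items():
    url = create_collection_url("demo", shards, replicas)
    writes = index_writes_per_document(replicas)
    print(f"{label}: {writes} index write(s) per document")
    print(f"  {url}")
```

Depending on the Solr version and cluster setup, the CREATE call may need further parameters (for example collection.configName), so treat the URLs as a starting point rather than a finished command.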
2015-12-30 2:03 GMT+01:00 Luca Quarello <lucaquare...@gmail.com>:

> Hi,
>
> I have a 260M-document index (90GB) with this structure:
>
> <field name="fragment" type="text_general" indexed="true" stored="true"
>   multiValued="false" termVectors="false" termPositions="false"
>   termOffsets="false" />
> <field name="parentId" type="long" indexed="false" stored="true"
>   multiValued="false"/>
> <field name="fragmentContentType" type="string" indexed="false"
>   stored="true" multiValued="false"/>
> <field name="creationDate" type="date" indexed="true" stored="true"
>   multiValued="false"/>
> <field name="creationTimestamp" type="date" indexed="true" stored="true"
>   multiValued="false"/>
> <field name="visibility" type="string" indexed="true" stored="true"
>   multiValued="false"/>
> <field name="category" type="string" indexed="true" stored="true"
>   multiValued="false"/>
> <field name="marked" type="string" indexed="true" stored="true"
>   multiValued="false"/>
>
> <!-- catchall field, containing all other searchable text fields
>   (implemented via copyField further on in this schema -->
> <field name="text" type="text_general" indexed="true" stored="false"
>   multiValued="true"/>
>
> <copyField source="fragment" dest="text"/>
> <copyField source="parentId" dest="text"/>
> <copyField source="fragmentContentType" dest="text"/>
> <copyField source="creationDate" dest="text"/>
> <copyField source="visibility" dest="text"/>
> <copyField source="category" dest="text"/>
> <copyField source="marked" dest="text"/>
>
> where the fragment field contains XML messages.
>
> There is a search function that returns the messages satisfying a search
> criterion.
>
> TARGET:
>
> To find the configuration that gives the best response time for a
> two-instance Solr cloud on 2 VMs with 8 cores and 32 GB.
>
> TEST RESULTS:
>
> 1. Configurations:
>    1. the best configuration without replicas
>       - CONF1: 16 shards of 17M documents (8 per VM)
>    2. configurations with replicas
>       - CONF 2: 8 shards of 35M documents with replication factor of 1
>       - CONF 3: 16 shards of 35M documents with replication factor of 1
>
> 2. Executed tests
>    - sequential requests
>    - 5 parallel requests
>    - 10 parallel requests
>    - 20 parallel requests
>
>    in two scenarios: during an indexing phase and without indexing.
>
>    The calls are:
>    http://localhost:8983/solr/sepa/select?q=+fragment%3A*AAA*+&fq=marked%3AT&fq=-fragmentContentType%3ABULK&start=0&rows=100&sort=creationTimestamp+desc%2Cid+asc
>
> 3. Test results
>
>    All the tests showed I/O utilization of 100MB/s while loading data
>    into the disk cache, disk cache utilization of 20GB, and core
>    utilization of 100% (all 8 cores).
>
>    - No indexing
>      - CONF1 (average time and maximum time)
>        - sequential: 4,1 6,9
>        - 5 parallel: 15,6 19,1
>        - 10 parallel: 23,6 30,2
>        - 20 parallel: 48 52,2
>      - CONF2
>        - sequential: 12,3 17,4
>        - 5 parallel: 32,5 34,2
>        - 10 parallel: 45,4 49
>        - 20 parallel: 64,6 74
>      - CONF3
>        - sequential: 6,9 9,9
>        - 5 parallel: 33,2 37,5
>        - 10 parallel: 46 51
>        - 20 parallel: 68 83
>
>    - Indexing (is it possible to view the total throughput in the Solr
>      admin console? I can only find it per shard.)
>      - CONF1
>        - sequential: 7,7 9,5
>        - 5 parallel: 26,8 28,4
>        - 10 parallel: 31,8 37,8
>        - 20 parallel: 42 52,5
>      - CONF2
>        - sequential: 12,3 19
>        - 5 parallel: 39 40,8
>        - 10 parallel: 56,6 62,9
>        - 20 parallel: 79 116
>      - CONF3
>        - sequential: 10 18,9
>        - 5 parallel: 36,5 41,9
>        - 10 parallel: 63,7 64,1
>        - 20 parallel: 85 120
>
> I have two questions:
>
> - The response times of the configurations with replicas are worse than
>   those of the configuration without replicas (about three times worse
>   in the sequential test case). Is this an expected result?
> - Why don't replicas help to reduce the response time during index
>   inserts and updates?
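
In case it helps to compare configurations on equal terms, the test matrix described in the quoted mail (sequential, 5, 10 and 20 parallel requests, reporting average and maximum time) could be reproduced with something like the sketch below. It is only an illustration under assumptions: the function names and the `rounds` parameter are made up, the URL is the one from the mail, and I don't know how the original measurements were actually taken.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

# Query URL copied from the quoted mail.
QUERY_URL = (
    "http://localhost:8983/solr/sepa/select?"
    "q=+fragment%3A*AAA*+&fq=marked%3AT&fq=-fragmentContentType%3ABULK"
    "&start=0&rows=100&sort=creationTimestamp+desc%2Cid+asc"
)


def fetch(url):
    """Send one request and read the whole response body."""
    with urlopen(url) as response:
        response.read()


def timed(call, url):
    """Return the wall-clock seconds taken by one call."""
    start = time.perf_counter()
    call(url)
    return time.perf_counter() - start


def measure(url, parallel, rounds=3, call=fetch):
    """Return (average, maximum) seconds over rounds * parallel requests.

    `rounds` is a hypothetical knob: each round fires `parallel`
    requests at once and waits for all of them to finish.
    """
    timings = []
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        for _ in range(rounds):
            timings.extend(pool.map(lambda _: timed(call, url), range(parallel)))
    return sum(timings) / len(timings), max(timings)


def report(url=QUERY_URL, levels=(1, 5, 10, 20), call=fetch):
    """Print one line per concurrency level (1 means sequential)."""
    for parallel in levels:
        average, maximum = measure(url, parallel, call=call)
        print(f"{parallel:>2} parallel: avg {average:.1f}s  max {maximum:.1f}s")
```

Calling `report()` against a running instance prints one line per concurrency level; passing a different `call` lets you dry-run the harness without touching Solr.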