Hi Matteo,

I have two questions:
- "Why are response times on a SolrCloud collection with 1 replica higher than on a SolrCloud collection without replicas?"

Configuration1: SolrCloud with two 8-core VMs, each with 8 shards of 17M docs
Configuration2: SolrCloud with two 8-core VMs, each with 8 shards of 17M docs (8 masters and 8 replicas)

I recorded worse response times for the replica configuration (conf2) in both scenarios:
- Scenario1: I run queries without inserting records into the index
- Scenario2: I run queries while inserting records into the index

I expected similar response times in Scenario1 and better response times for Configuration2 in Scenario2. Is that correct?

Thanks,
Luca

On Fri, Jan 8, 2016 at 3:56 PM, Luca Quarello <lucaquare...@gmail.com> wrote:

> Hi Erick,
> I used Solr 5.3.1 and I sincerely expected response times with the replica
> configuration to be close to response times without replicas.
>
> Do you agree with me?
>
> I read here
> http://lucene.472066.n3.nabble.com/Solr-Cloud-Query-Scaling-td4110516.html
> that
> "Queries do not need to be routed to leaders; they can be handled by any
> replica in a shard. Leaders are only needed for handling update requests.
> "
>
> I haven't observed this behaviour. In my case CONF2 and CONF3 have all
> replicas on VM2, but core utilization during a request is 100% on both
> machines. Why?
>
> Best,
> Luca
>
> On Tue, Jan 5, 2016 at 5:08 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> What version of Solr? Prior to 5.2 the replicas were doing lots of
>> unnecessary work/being blocked, see:
>>
>> https://lucidworks.com/blog/2015/06/10/indexing-performance-solr-5-2-now-twice-fast/
>>
>> Best,
>> Erick
>>
>> On Tue, Jan 5, 2016 at 6:09 AM, Matteo Grolla <matteo.gro...@gmail.com>
>> wrote:
>> > Hi Luca,
>> > not sure if I understood well. Your question is
>> > "Why are index times on a solr cloud collection with 2 replicas higher
>> > than on solr cloud with 1 replica" right?
>> > Well, with 2 replicas all docs have to be separately indexed in 2 places
>> > and solr has to confirm that both indexing operations went well.
>> > Indexing times are lower on a solrcloud collection with 2 shards (just
>> > one replica, the leader, per shard) because docs are indexed just once
>> > and the load is spread over 2 servers instead of one
>> >
>> > 2015-12-30 2:03 GMT+01:00 Luca Quarello <lucaquare...@gmail.com>:
>> >
>> >> Hi,
>> >>
>> >> I have a 260M-document index (90GB) with this structure:
>> >>
>> >> <field name="fragment" type="text_general" indexed="true" stored="true"
>> >> multiValued="false" termVectors="false" termPositions="false"
>> >> termOffsets="false" />
>> >>
>> >> <field name="parentId" type="long" indexed="false" stored="true"
>> >> multiValued="false"/>
>> >>
>> >> <field name="fragmentContentType" type="string" indexed="false"
>> >> stored="true" multiValued="false"/>
>> >>
>> >> <field name="creationDate" type="date" indexed="true" stored="true"
>> >> multiValued="false"/>
>> >>
>> >> <field name="creationTimestamp" type="date" indexed="true" stored="true"
>> >> multiValued="false"/>
>> >>
>> >> <field name="visibility" type="string" indexed="true" stored="true"
>> >> multiValued="false"/>
>> >>
>> >> <field name="category" type="string" indexed="true" stored="true"
>> >> multiValued="false"/>
>> >>
>> >> <field name="marked" type="string" indexed="true" stored="true"
>> >> multiValued="false"/>
>> >>
>> >> <!-- catchall field, containing all other searchable text fields
>> >> (implemented via copyField further on in this schema -->
>> >>
>> >> <field name="text" type="text_general" indexed="true" stored="false"
>> >> multiValued="true"/>
>> >>
>> >> <copyField source="fragment" dest="text"/>
>> >>
>> >> <copyField source="parentId" dest="text"/>
>> >>
>> >> <copyField source="fragmentContentType" dest="text"/>
>> >>
>> >> <copyField source="creationDate" dest="text"/>
>> >>
>> >> <copyField source="visibility" dest="text"/>
>> >>
>> >> <copyField source="category" dest="text"/>
>> >>
>> >> <copyField source="marked" dest="text"/>
>> >>
>> >> where the fragment field contains XML messages.
>> >>
>> >> There is a search function that returns the messages satisfying a
>> >> search criterion.
>> >>
>> >> TARGET:
>> >>
>> >> To find the best configuration to optimize the response time of a
>> >> two-instance Solr cloud on 2 VMs, each with 8 cores and 32 GB
>> >>
>> >> TEST RESULTS:
>> >>
>> >> 1. Configurations:
>> >>    1. the best configuration without replicas
>> >>       - CONF1: 16 shards of 17M documents (8 per VM)
>> >>    2. configurations with replicas
>> >>       - CONF 2: 8 shards of 35M documents with replication factor of 1
>> >>       - CONF 3: 16 shards of 35M documents with replication factor of 1
>> >>
>> >> 2. Executed tests
>> >>    - sequential requests
>> >>    - 5 parallel requests
>> >>    - 10 parallel requests
>> >>    - 20 parallel requests
>> >>
>> >>    in two scenarios: during an indexing phase and without indexing
>> >>
>> >>    The calls are:
>> >>    http://localhost:8983/solr/sepa/select?q=+fragment%3A*AAA*+&fq=marked%3AT&fq=-fragmentContentType%3ABULK&start=0&rows=100&sort=creationTimestamp+desc%2Cid+asc
>> >>
>> >> 3.
>> >>    Test results
>> >>
>> >>    All the tests showed an I/O utilization of 100MB/s while loading
>> >>    data into the disk cache, a disk cache utilization of 20GB and a
>> >>    core utilization of 100% (all 8 cores)
>> >>
>> >>    - No indexing
>> >>      - CONF1 (average time and maximum time)
>> >>        - sequential: 4,1 6,9
>> >>        - 5 parallel: 15,6 19,1
>> >>        - 10 parallel: 23,6 30,2
>> >>        - 20 parallel: 48 52,2
>> >>      - CONF2
>> >>        - sequential: 12,3 17,4
>> >>        - 5 parallel: 32,5 34,2
>> >>        - 10 parallel: 45,4 49
>> >>        - 20 parallel: 64,6 74
>> >>      - CONF3
>> >>        - sequential: 6,9 9,9
>> >>        - 5 parallel: 33,2 37,5
>> >>        - 10 parallel: 46 51
>> >>        - 20 parallel: 68 83
>> >>
>> >>    - Indexing (in the Solr admin console, is it possible to view the
>> >>      total throughput? I can find it only for a single shard).
>> >>      - CONF1
>> >>        - sequential: 7,7 9,5
>> >>        - 5 parallel: 26,8 28,4
>> >>        - 10 parallel: 31,8 37,8
>> >>        - 20 parallel: 42 52,5
>> >>      - CONF2
>> >>        - sequential: 12,3 19
>> >>        - 5 parallel: 39 40,8
>> >>        - 10 parallel: 56,6 62,9
>> >>        - 20 parallel: 79 116
>> >>      - CONF3
>> >>        - sequential: 10 18,9
>> >>        - 5 parallel: 36,5 41,9
>> >>        - 10 parallel: 63,7 64,1
>> >>        - 20 parallel: 85 120
>> >>
>> >> I have two questions:
>> >>
>> >> - The response times of the configuration with replicas are worse (in
>> >>   the sequential-request test, about three times worse) than the
>> >>   response times of the configuration without replicas. Is this an
>> >>   expected result?
>> >> - Why don't replicas help to reduce the response time while documents
>> >>   are being inserted and updated in the index?
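
For anyone who wants to reproduce the sequential / 5 / 10 / 20 parallel measurements quoted above, here is a minimal timing sketch in Python. It is not the tool used for the numbers in this thread (that tool is not named). The query URL is copied from the original message; the helper names `fetch`, `timed`, `run_batch` and `summarize`, and the number of requests per run, are assumptions made for illustration only.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

# Query URL as given in the thread; host, port and collection ("sepa")
# are the original poster's and will differ on other installations.
QUERY_URL = (
    "http://localhost:8983/solr/sepa/select?"
    "q=+fragment%3A*AAA*+&fq=marked%3AT&fq=-fragmentContentType%3ABULK"
    "&start=0&rows=100&sort=creationTimestamp+desc%2Cid+asc"
)


def fetch(url=QUERY_URL):
    """Send one query and read the whole response body."""
    with urlopen(url) as resp:
        resp.read()


def timed(call):
    """Return the wall-clock duration of one call, in seconds."""
    start = time.perf_counter()
    call()
    return time.perf_counter() - start


def run_batch(call, parallel, total):
    """Run `total` calls with at most `parallel` in flight at once.

    `total` is an assumption: the thread does not say how many requests
    each test issued.
    """
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        return list(pool.map(lambda _: timed(call), range(total)))


def summarize(latencies):
    """Return (average, maximum), the two figures reported in the thread."""
    return sum(latencies) / len(latencies), max(latencies)
```

Against a running node, `summarize(run_batch(fetch, parallel=5, total=25))` would give one (average, maximum) pair comparable to a "5 parallel" row; `parallel=1` corresponds to the sequential case. Threads are sufficient here because the client only waits on network I/O.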