Re: SOLR replicas performance

Erick Erickson Tue, 05 Jan 2016 08:09:07 -0800

What version of Solr? Prior to 5.2 the replicas were doing lots of
unnecessary work/being blocked, see:
https://lucidworks.com/blog/2015/06/10/indexing-performance-solr-5-2-now-twice-fast/


Best,
Erick

On Tue, Jan 5, 2016 at 6:09 AM, Matteo Grolla <matteo.gro...@gmail.com> wrote:
> Hi Luca,
>       not sure I understood correctly. Your question is
> "Why are index times on a SolrCloud collection with 2 replicas higher than
> on SolrCloud with 1 replica", right?
> Well, with 2 replicas all docs have to be indexed separately in 2 places, and
> Solr has to confirm that both indexing operations went well.
> Indexing times are lower on a SolrCloud collection with 2 shards (just one
> replica, the leader, per shard) because docs are indexed just once and the
> load is spread over 2 servers instead of one.
>
> 2015-12-30 2:03 GMT+01:00 Luca Quarello <lucaquare...@gmail.com>:
>
>> Hi,
>>
>> I have a 260M-document index (90 GB) with this structure:
>>
>>
>> <field name="fragment" type="text_general" indexed="true" stored="true"
>> multiValued="false" termVectors="false" termPositions="false"
>> termOffsets="false" />
>>
>>   <field name="parentId" type="long" indexed="false" stored="true"
>> multiValued="false"/>
>>
>>   <field name="fragmentContentType" type="string" indexed="false"
>> stored="true" multiValued="false"/>
>>
>>   <field name="creationDate" type="date" indexed="true" stored="true"
>> multiValued="false"/>
>>
>>   <field name="creationTimestamp" type="date" indexed="true" stored="true"
>> multiValued="false"/>
>>
>>   <field name="visibility" type="string" indexed="true" stored="true"
>> multiValued="false"/>
>>
>>   <field name="category" type="string" indexed="true" stored="true"
>> multiValued="false"/>
>>
>>   <field name="marked" type="string" indexed="true" stored="true"
>> multiValued="false"/>
>>
>>    <!-- catchall field, containing all other searchable text fields
>>    (implemented via copyField further on in this schema) -->
>>
>>   <field name="text" type="text_general" indexed="true" stored="false"
>> multiValued="true"/>
>>
>>   <copyField source="fragment" dest="text"/>
>>
>>   <copyField source="parentId" dest="text"/>
>>
>>   <copyField source="fragmentContentType" dest="text"/>
>>
>>   <copyField source="creationDate" dest="text"/>
>>
>>   <copyField source="visibility" dest="text"/>
>>
>>   <copyField source="category" dest="text"/>
>>
>>   <copyField source="marked" dest="text"/>
>>
>>
>> where the fragment field contains XML messages.
>>
>> There is a search function that returns the messages satisfying a search
>> criterion.
>>
>>
>> TARGET:
>>
>> To find the configuration that gives the best response time for a SolrCloud
>> of two Solr instances running on 2 VMs with 8 cores and 32 GB.
>>
>>
>> TEST RESULTS:
>>
>>
>>    1. Configurations:
>>
>>       the best configuration without replicas
>>       - CONF1: 16 shards of 17M documents (8 per VM)
>>
>>       configurations with replicas
>>       - CONF2: 8 shards of 35M documents with replication factor of 1
>>       - CONF3: 16 shards of 35M documents with replication factor of 1
>>
>>    2. Executed tests
>>
>>       - sequential requests
>>       - 5 parallel requests
>>       - 10 parallel requests
>>       - 20 parallel requests
>>
>>       in two scenarios: during an indexing phase and with no indexing
>>
>>
>> The calls are:
>> http://localhost:8983/solr/sepa/select?q=+fragment%3A*AAA*+&fq=marked%3AT&fq=-fragmentContentType%3ABULK&start=0&rows=100&sort=creationTimestamp+desc%2Cid+asc
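
For readers who want to see what that request actually asks Solr for, here is a minimal sketch that decodes the query string with Python's standard library. The URL is the one quoted above, joined back into a single line; nothing else is assumed about the setup.

```python
from urllib.parse import parse_qs, urlsplit

# The test request from the quoted message, rejoined into one line.
url = (
    "http://localhost:8983/solr/sepa/select?"
    "q=+fragment%3A*AAA*+&fq=marked%3AT&fq=-fragmentContentType%3ABULK"
    "&start=0&rows=100&sort=creationTimestamp+desc%2Cid+asc"
)

# parse_qs percent-decodes each value and turns '+' into a space;
# repeated parameters (the two fq filters) are collected into one list.
params = parse_qs(urlsplit(url).query)

for name, values in params.items():
    print(name, values)
```

Decoded, the main query is a wildcard match on the `fragment` field, there are two filter queries (`marked` and a negated `fragmentContentType`), and the first 100 rows are sorted by `creationTimestamp` descending, then `id` ascending.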
>>
>>
>>    3. Test results
>>
>>       All the tests showed an I/O utilization of 100 MB/s while data was
>>       being loaded into the disk cache, a disk cache utilization of 20 GB,
>>       and a core utilization of 100% (all 8 cores).
>>
>>
>>
>>    No indexing (average time, maximum time):
>>
>>                  CONF1         CONF2         CONF3
>>                  avg    max    avg    max    avg    max
>>    sequential    4,1    6,9    12,3   17,4   6,9    9,9
>>    5 parallel    15,6   19,1   32,5   34,2   33,2   37,5
>>    10 parallel   23,6   30,2   45,4   49     46     51
>>    20 parallel   48     52,2   64,6   74     68     83
>>
>>
>>    Indexing (in the Solr admin console, is it possible to view the total
>>    throughput? I can only find it for a single shard.)
>>    Average time, maximum time:
>>
>>                  CONF1         CONF2         CONF3
>>                  avg    max    avg    max    avg    max
>>    sequential    7,7    9,5    12,3   19     10     18,9
>>    5 parallel    26,8   28,4   39     40,8   36,5   41,9
>>    10 parallel   31,8   37,8   56,6   62,9   63,7   64,1
>>    20 parallel   42     52,5   79     116    85     120
>>
>>
>>
>> I have two questions:
>>
>>    - The response times of the configurations with replicas are worse than
>>      those of the configuration without replicas (about three times worse
>>      in the sequential-request test). Is this an expected result?
>>    - Why don't replicas help to reduce the response time while documents
>>      are being inserted into and updated in the index?
>>
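
To put the first question in numbers, here is a small sketch that computes how much slower each configuration is than CONF1. The figures are the average times from the no-indexing runs in the quoted message, with decimal commas written as points; the function name `slowdown_vs_baseline` is only illustrative and not part of any Solr tooling.

```python
# Average response times from the no-indexing runs in the quoted message.
# Decimal commas from the original are written as decimal points here.
NO_INDEXING_AVG = {
    "CONF1": {"sequential": 4.1, "5 parallel": 15.6,
              "10 parallel": 23.6, "20 parallel": 48.0},
    "CONF2": {"sequential": 12.3, "5 parallel": 32.5,
              "10 parallel": 45.4, "20 parallel": 64.6},
    "CONF3": {"sequential": 6.9, "5 parallel": 33.2,
              "10 parallel": 46.0, "20 parallel": 68.0},
}


def slowdown_vs_baseline(results, baseline="CONF1"):
    """Return each configuration's average time divided by the baseline's,
    per load level, rounded to two decimals."""
    base = results[baseline]
    return {
        conf: {load: round(avg / base[load], 2) for load, avg in loads.items()}
        for conf, loads in results.items()
    }


ratios = slowdown_vs_baseline(NO_INDEXING_AVG)
for conf, loads in ratios.items():
    print(conf, loads)
```

The sequential ratio for CONF2 matches the "about three times" mentioned in the question, and the ratios shrink as the number of parallel requests grows.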
