Hi,

I have a 260M-document index (90 GB) with this structure:


  <field name="fragment" type="text_general" indexed="true" stored="true" multiValued="false" termVectors="false" termPositions="false" termOffsets="false" />
  <field name="parentId" type="long" indexed="false" stored="true" multiValued="false"/>
  <field name="fragmentContentType" type="string" indexed="false" stored="true" multiValued="false"/>
  <field name="creationDate" type="date" indexed="true" stored="true" multiValued="false"/>
  <field name="creationTimestamp" type="date" indexed="true" stored="true" multiValued="false"/>
  <field name="visibility" type="string" indexed="true" stored="true" multiValued="false"/>
  <field name="category" type="string" indexed="true" stored="true" multiValued="false"/>
  <field name="marked" type="string" indexed="true" stored="true" multiValued="false"/>

  <!-- catchall field, containing all other searchable text fields (implemented via copyField further on in this schema) -->
  <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>

  <copyField source="fragment" dest="text"/>
  <copyField source="parentId" dest="text"/>
  <copyField source="fragmentContentType" dest="text"/>
  <copyField source="creationDate" dest="text"/>
  <copyField source="visibility" dest="text"/>
  <copyField source="category" dest="text"/>
  <copyField source="marked" dest="text"/>


where the fragment field contains XML messages.

A search function returns the messages that satisfy a search criterion.
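To make the catch-all mechanism concrete, here is a minimal Python sketch that reads the copyField rules of the schema above and emulates their effect on one document: each source value is appended, as raw input, to the multi-valued text field, which Solr then analyzes with the text field's own analyzer. This is only an approximation for illustration, not Solr code; the helper names (load_copy_rules, apply_copy_fields) and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# The copyField rules from the schema above, wrapped in a root element so they parse.
COPY_RULES_XML = """
<schema>
  <copyField source="fragment" dest="text"/>
  <copyField source="parentId" dest="text"/>
  <copyField source="fragmentContentType" dest="text"/>
  <copyField source="creationDate" dest="text"/>
  <copyField source="visibility" dest="text"/>
  <copyField source="category" dest="text"/>
  <copyField source="marked" dest="text"/>
</schema>
"""


def load_copy_rules(xml_text):
    """Return (source, dest) pairs in schema order (hypothetical helper)."""
    root = ET.fromstring(xml_text.strip())
    return [(cf.get("source"), cf.get("dest")) for cf in root.iter("copyField")]


def apply_copy_fields(doc, rules):
    """Emulate copyField: append each present source's raw value to its multi-valued dest."""
    out = dict(doc)  # shallow copy, the input document is left untouched
    for source, dest in rules:
        if source in doc:
            out.setdefault(dest, []).append(str(doc[source]))
    return out


rules = load_copy_rules(COPY_RULES_XML)
# Hypothetical sample document; values are illustrative only.
doc = {"fragment": "<msg>AAA</msg>", "parentId": 42, "marked": "T",
       "creationTimestamp": "2017-01-01T00:00:00Z"}
print(apply_copy_fields(doc, rules)["text"])
```

creationTimestamp has no copyField rule, so its value does not reach text in this emulation.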


TARGET:

Find the configuration that minimizes response time for a SolrCloud cluster of two Solr instances running on 2 VMs, each with 8 cores and 32 GB.


TEST RESULTS:

   1. Configurations:
      - Best configuration without replicas:
         - CONF1: 16 shards of 17M documents (8 per VM)
      - Configurations with replicas:
         - CONF2: 8 shards of 35M documents with replication factor of 1
         - CONF3: 16 shards of 35M documents with replication factor of 1

   2. Executed tests:
      - sequential requests
      - 5 parallel requests
      - 10 parallel requests
      - 20 parallel requests

      Each test was run in two scenarios: during an indexing phase and with no indexing.
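A minimal Python sketch of how the sequential and parallel runs could be reproduced and summarized as average and maximum time. It assumes a caller-supplied send_request function (hypothetical) that performs one search; a thread pool caps the number of requests in flight, so concurrency=1 gives the sequential case.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import Callable, Tuple


def timed(send_request: Callable[[], object]) -> float:
    """Run one request and return its wall-clock latency in seconds."""
    start = time.perf_counter()
    send_request()
    return time.perf_counter() - start


def run_load_test(send_request: Callable[[], object],
                  concurrency: int,
                  total_requests: int) -> Tuple[float, float]:
    """Issue total_requests calls with at most `concurrency` in flight.

    Returns (average, maximum) latency in seconds.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(lambda _: timed(send_request), range(total_requests)))
    return mean(latencies), max(latencies)


if __name__ == "__main__":
    # Stand-in for a real Solr call; replace with an HTTP request in practice.
    def fake_request():
        time.sleep(0.01)

    for level in (1, 5, 10, 20):
        avg, worst = run_load_test(fake_request, concurrency=level, total_requests=level * 5)
        print(f"{level:>2} parallel: avg {avg:.3f}s  max {worst:.3f}s")
```

In practice send_request could wrap the select call used for these tests, e.g. `lambda: urllib.request.urlopen(url).read()`. Each latency is measured inside the worker, so time spent waiting for a free worker is not counted.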


Each call is:

http://localhost:8983/solr/sepa/select?q=+fragment%3A*AAA*+&fq=marked%3AT&fq=-fragmentContentType%3ABULK&start=0&rows=100&sort=creationTimestamp+desc%2Cid+asc
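Decoding the call with Python's standard urllib.parse shows the parameters Solr actually receives. An unescaped + in a query string decodes to a space, so the leading + of q arrives as whitespace; if the intent was Lucene's required-clause operator, it would have to be sent as %2B.

```python
from urllib.parse import urlsplit, parse_qs

# The select call used in the tests, split only for readability.
url = ("http://localhost:8983/solr/sepa/select?"
       "q=+fragment%3A*AAA*+&fq=marked%3AT&fq=-fragmentContentType%3ABULK"
       "&start=0&rows=100&sort=creationTimestamp+desc%2Cid+asc")

# parse_qs percent-decodes values and turns '+' into ' '; repeated keys (fq) become lists.
params = parse_qs(urlsplit(url).query)
for name, values in params.items():
    for value in values:
        print(f"{name} = {value!r}")
```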


   3. Test results

      All the tests showed disk I/O of 100 MB/s while data was being loaded into the disk cache, disk cache usage of 20 GB, and CPU utilization of 100% (all 8 cores).



   - No indexing (average / maximum time):

      Config  sequential   5 parallel    10 parallel   20 parallel
      CONF1   4.1 / 6.9    15.6 / 19.1   23.6 / 30.2   48 / 52.2
      CONF2   12.3 / 17.4  32.5 / 34.2   45.4 / 49     64.6 / 74
      CONF3   6.9 / 9.9    33.2 / 37.5   46 / 51       68 / 83


   - During indexing (average / maximum time). Is it possible to view the total indexing throughput in the Solr Admin console? I can only find it for a single shard.

      Config  sequential   5 parallel    10 parallel   20 parallel
      CONF1   7.7 / 9.5    26.8 / 28.4   31.8 / 37.8   42 / 52.5
      CONF2   12.3 / 19    39 / 40.8     56.6 / 62.9   79 / 116
      CONF3   10 / 18.9    36.5 / 41.9   63.7 / 64.1   85 / 120



I have two questions:

   - The response times of the configurations with replicas are worse than those of the configuration without replicas (about three times worse for sequential requests). Is this expected?
   - Why don't replicas help reduce response time while documents are being inserted and updated?
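The "about three times" figure can be checked against the reported averages. This Python sketch divides each configuration's average time by CONF1's at the same load level; the numbers are copied from the results above and nothing else is assumed. It gives 3.0 for CONF2 versus CONF1 in the sequential, no-indexing case, and about 1.6 for the same pair during indexing.

```python
# Average times reported above (same unit as the results), per configuration,
# in the order: sequential, 5, 10, 20 parallel requests.
NO_INDEXING = {
    "CONF1": [4.1, 15.6, 23.6, 48],
    "CONF2": [12.3, 32.5, 45.4, 64.6],
    "CONF3": [6.9, 33.2, 46, 68],
}
INDEXING = {
    "CONF1": [7.7, 26.8, 31.8, 42],
    "CONF2": [12.3, 39, 56.6, 79],
    "CONF3": [10, 36.5, 63.7, 85],
}
LEVELS = ["sequential", "5 parallel", "10 parallel", "20 parallel"]


def slowdown_vs_baseline(results, baseline="CONF1"):
    """Ratio of each configuration's average time to the baseline's, per load level."""
    base = results[baseline]
    return {conf: [round(t / b, 2) for t, b in zip(times, base)]
            for conf, times in results.items() if conf != baseline}


for scenario, results in (("no indexing", NO_INDEXING), ("indexing", INDEXING)):
    for conf, ratios in slowdown_vs_baseline(results).items():
        print(scenario, conf, dict(zip(LEVELS, ratios)))
```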
