Clustering speed becomes slow after splitting shards

Zheng Lin Edwin Yeo Mon, 31 Aug 2015 02:57:28 -0700

Hi,

I've tried to split my collection from 1 shard to 2 shards using the
command:
http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=collection1&shard=shard1
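For anyone who wants to reproduce the request from a script, it can be assembled programmatically. This is only a minimal sketch: the helper name `build_splitshard_url` is hypothetical, and the host, port, collection and shard names are the ones from the command above.

```python
from urllib.parse import urlencode


def build_splitshard_url(base, collection, shard):
    """Build a Collections API SPLITSHARD URL (hypothetical helper name)."""
    # Dict insertion order is preserved, so the parameters come out in the
    # same order as in the command quoted in this message.
    params = {"action": "SPLITSHARD", "collection": collection, "shard": shard}
    return f"{base}/admin/collections?{urlencode(params)}"


# Values taken from the command in this message.
url = build_splitshard_url("http://localhost:8983/solr", "collection1", "shard1")
print(url)
```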


The shard was split successfully with the whole index intact. Search and
highlighting give the same results before and after the split.

However, when I queried the clustering handler, I found that generating the
cluster labels became much slower after the split (it used to average about
2 seconds, but now takes about 5 seconds).
Also, the clustering labels produced before and after the split are
different. What could be the reason?
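A before/after comparison like this could be measured with something along the following lines. This is a rough sketch, not the exact method used for the figures above: the function names are hypothetical, and the URL layout (collection `collection1`, handler `/clustering`) is assumed from the configuration in this message.

```python
import time
from urllib.parse import urlencode
from urllib.request import urlopen


def clustering_url(base, collection, query):
    """Build a query URL for the /clustering handler (assumed URL layout)."""
    return f"{base}/{collection}/clustering?{urlencode({'q': query})}"


def http_fetch(url):
    """Fetch the full response body so the whole request is timed."""
    with urlopen(url) as response:
        return response.read()


def average_seconds(fetch, url, runs=5):
    """Call fetch(url) `runs` times and return the mean wall-clock time."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        fetch(url)
        total += time.perf_counter() - start
    return total / runs


# Example usage against a running Solr instance (not executed here):
#   url = clustering_url("http://localhost:8983/solr", "collection1", "solr")
#   print(average_seconds(http_fetch, url))
```

Passing the fetch function as an argument keeps the timing loop separate from the HTTP call, so the same loop can be pointed at the collection before and after the split.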

Below is my clustering handler for reference. I'm using Solr 5.2.1.

  <requestHandler name="/clustering"
                  startup="lazy"
                  enable="${solr.clustering.enabled:true}"
                  class="solr.SearchHandler">
    <lst name="defaults">
      <!-- Added by Edwin on 120515 -->
      <str name="echoParams">explicit</str>
      <!--<int name="rows">1000</int>-->
      <int name="rows">200</int>
      <str name="defType">edismax</str>
      <str name="wt">json</str>
      <str name="indent">true</str>
      <str name="df">text</str>
      <str name="fl">null</str>

      <bool name="clustering">true</bool>
      <bool name="clustering.results">true</bool>
      <str name="clustering.engine">default</str>

      <str name="carrot.id">id</str>
      <!-- Field name with the logical "title" of each document (optional) -->
      <!--<str name="carrot.title">subject content_cluster tag</str>-->
      <str name="carrot.title">title</str>
      <!-- Field name with the logical "URL" of each document (optional) -->
      <str name="carrot.url">url</str>
      <!-- Field name with the logical "content" of each document (optional) -->
      <!--<str name="carrot.snippet">title content</str>-->
      <str name="carrot.snippet">content</str>
      <!-- Apply highlighter to the title/content and use this for clustering. -->
      <bool name="carrot.produceSummary">true</bool>

      <int name="carrot.fragSize">100</int>
      <str name="carrot.summarySnippets">2</str>

      <!-- the maximum number of labels per cluster -->
      <int name="carrot.numDescriptions">20</int>
      <!-- produce sub clusters -->
      <bool name="carrot.outputSubClusters">false</bool>
      <str name="LingoClusteringAlgorithm.desiredClusterCountBase">30</str>

      <!--<str name="qf">
        text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
      </str>
      <str name="q.alt">*:*</str>
      <str name="rows">10</str>
      <str name="fl">*,score</str>-->
    </lst>
    <arr name="last-components">
      <str>clustering</str>
    </arr>
  </requestHandler>


Regards,
Edwin
