> Is this log from the frontend SOLR (aggregator) or from a shard?

From the aggregator.

> Can you merge, e.g. 3 shards together or is it much effort for your team?

Yes, we can merge. We'll try to do this and review how it works.
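For example (just a sketch, not yet tried on our data, and the paths are only placeholders), Lucene's IndexMergeTool from contrib/misc can merge several shard indexes into a new directory:

  java -cp lucene-core-3.4.0.jar:lucene-misc-3.4.0.jar \
       org.apache.lucene.misc.IndexMergeTool \
       /opt/search/merged/index \
       /opt/search/solr1/data/index \
       /opt/search/solr2/data/index \
       /opt/search/solr3/data/index

The first argument is the target index directory, the remaining arguments are the shard indexes to merge. The merged index could then be optimized and replicated to the search servers the same way we do now, and the shards list on the aggregator shortened accordingly.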
Thanks, Dmitry. Any other ideas?

On Wed, Nov 23, 2011 at 4:01 PM, Dmitry Kan <dmitry....@gmail.com> wrote:
> Hello,
>
> Is this log from the frontend SOLR (aggregator) or from a shard?
> Can you merge, e.g. 3 shards together or is it much effort for your team?
>
> In our setup we currently have 16 shards with ~30GB each, but we rarely
> search in all of them at once.
>
> Best,
> Dmitry
>
> On Wed, Nov 23, 2011 at 3:12 PM, Artem Lokotosh <arco...@gmail.com> wrote:
>
>> Hi!
>>
>> * Data:
>> - Solr 3.4;
>> - 30 shards, ~13GB and 27-29M docs per shard.
>>
>> * Machine parameters (Ubuntu 10.04 LTS):
>> user@Solr:~$ uname -a
>> Linux Solr 2.6.32-31-server #61-Ubuntu SMP Fri Apr 8 19:44:42 UTC 2011 x86_64 GNU/Linux
>>
>> user@Solr:~$ cat /proc/cpuinfo
>> processor : 0 - 3
>> vendor_id : GenuineIntel
>> cpu family : 6
>> model : 44
>> model name : Intel(R) Xeon(R) CPU X5690 @ 3.47GHz
>> stepping : 2
>> cpu MHz : 3458.000
>> cache size : 12288 KB
>> fpu : yes
>> fpu_exception : yes
>> cpuid level : 11
>> wp : yes
>> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology tsc_reliable nonstop_tsc aperfmperf pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 popcnt aes hypervisor lahf_lm ida arat
>> bogomips : 6916.00
>> clflush size : 64
>> cache_alignment : 64
>> address sizes : 40 bits physical, 48 bits virtual
>> power management:
>>
>> user@Solr:~$ cat /proc/meminfo
>> MemTotal: 16992680 kB
>> MemFree: 110424 kB
>> Buffers: 9976 kB
>> Cached: 11588380 kB
>> SwapCached: 41952 kB
>> Active: 9860764 kB
>> Inactive: 6198668 kB
>> Active(anon): 4062144 kB
>> Inactive(anon): 398972 kB
>> Active(file): 5798620 kB
>> Inactive(file): 5799696 kB
>> Unevictable: 0 kB
>> Mlocked: 0 kB
>> SwapTotal: 46873592 kB
>> SwapFree: 46810712 kB
>> Dirty: 36 kB
>> Writeback: 0 kB
>> AnonPages: 4424756 kB
>> Mapped: 940660 kB
>> Shmem: 40 kB
>> Slab: 362344 kB
>> SReclaimable: 350372 kB
>> SUnreclaim: 11972 kB
>> KernelStack: 2488 kB
>> PageTables: 68568 kB
>> NFS_Unstable: 0 kB
>> Bounce: 0 kB
>> WritebackTmp: 0 kB
>> CommitLimit: 55369932 kB
>> Committed_AS: 5740556 kB
>> VmallocTotal: 34359738367 kB
>> VmallocUsed: 350532 kB
>> VmallocChunk: 34359384964 kB
>> HardwareCorrupted: 0 kB
>> HugePages_Total: 0
>> HugePages_Free: 0
>> HugePages_Rsvd: 0
>> HugePages_Surp: 0
>> Hugepagesize: 2048 kB
>> DirectMap4k: 10240 kB
>> DirectMap2M: 17299456 kB
>>
>> - Apache Tomcat 6.0.32:
>> <!-- java arguments -->
>> -XX:+DisableExplicitGC
>> -XX:PermSize=512M
>> -XX:MaxPermSize=512M
>> -Xmx12G
>> -Xms3G
>> -XX:NewSize=128M
>> -XX:MaxNewSize=128M
>> -XX:+UseParNewGC
>> -XX:+UseConcMarkSweepGC
>> -XX:+CMSClassUnloadingEnabled
>> -XX:CMSInitiatingOccupancyFraction=50
>> -XX:GCTimeRatio=9
>> -XX:MinHeapFreeRatio=25
>> -XX:MaxHeapFreeRatio=25
>> -verbose:gc
>> -XX:+PrintGCTimeStamps
>> -Xloggc:/opt/search/tomcat/logs/gc.log
>>
>> Our search setup is:
>> - 5 servers with the configuration above;
>> - one Tomcat 6 instance on each server, with 6 Solr applications in each.
>>
>> - Full addresses are:
>> 1) http://192.168.1.85:8080/solr1, http://192.168.1.85:8080/solr2, ..., http://192.168.1.85:8080/solr6
>> 2) http://192.168.1.86:8080/solr7, http://192.168.1.86:8080/solr8, ..., http://192.168.1.86:8080/solr12
>> ...
>> 5) http://192.168.1.89:8080/solr25, http://192.168.1.89:8080/solr26, ..., http://192.168.1.89:8080/solr30
>>
>> - On another server there is an additional "common" (aggregator) application with the shards parameter:
>> <requestHandler name="search" class="solr.SearchHandler" default="true">
>>   <lst name="defaults">
>>     <str name="echoParams">explicit</str>
>>     <str name="shards">192.168.1.85:8080/solr1,192.168.1.85:8080/solr2,...,192.168.1.89:8080/solr30</str>
>>     <int name="rows">10</int>
>>   </lst>
>> </requestHandler>
>> - schema and solrconfig are identical for all shards; for the first shard see the attachment;
>> - these servers handle search only; indexing runs on a separate server (shards are optimized to 2 segments there and replicated with ssh/rsync scripts).
>>
>> So the major problem now is the very poor performance of distributed search.
>> Take a look at these logs, for example. These queries went to all 30 shards:
>> INFO: [] webapp=/solr path=/select/ params={fl=*,score&ident=true&start=0&q=(barium)&rows=2000} status=0 QTime=40712
>> INFO: [] webapp=/solr path=/select/ params={fl=*,score&ident=true&start=0&q=(pittances)&rows=2000} status=0 QTime=36097
>> INFO: [] webapp=/solr path=/select/ params={fl=*,score&ident=true&start=0&q=(reliability)&rows=2000} status=0 QTime=75756
>> INFO: [] webapp=/solr path=/select/ params={fl=*,score&ident=true&start=0&q=(blessing's)&rows=2000} status=0 QTime=30342
>> INFO: [] webapp=/solr path=/select/ params={fl=*,score&ident=true&start=0&q=(reiterated)&rows=2000} status=0 QTime=55690
>>
>> Sometimes QTime is more than 150000 ms. But when we run identical queries on one shard separately, QTime is between 200 and 1500 ms.
>> Is distributed Solr search really this slow, or is our architecture suboptimal? Or do we perhaps need some third-party application?
>> Thanks for any replies.
>>
>> --
>> Best regards,
>> Artem

--
Best regards,
Artem Lokotosh
mailto:arco...@gmail.com
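P.S. Regarding "we rarely search in all of them at once": as far as I understand, the shards value from the "defaults" section can also be overridden per request, so after merging we could point individual queries at only the relevant subset of shards. A rough example (the aggregator host name is a placeholder, and the query and shard list are only illustrative):

  http://common-host:8080/solr/select?q=(barium)&rows=10&shards=192.168.1.85:8080/solr1,192.168.1.85:8080/solr2,192.168.1.85:8080/solr3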