Hi all, and thanks for your replies.
Over the weekend I ran some interesting tests, and I would like to share
the results with you.
First of all, I measured the read speed of my HDD:
root@LSolr:~# hdparm -t /dev/sda9
/dev/sda9:
Timing buffered disk reads: 146 MB in 3.01 seconds = 48.54 MB/sec
Then I tested my network with iperf:
[ 4] 0.0-18.7 sec 2.00 GBytes 917 Mbits/sec
Then I tried posting my queries using the shards parameter with a single
shard, so the queries looked like:
http://localhost:8080/solr1/select/?q=(test)&qt=requestShards
where "requestShards" is defined as:
<requestHandler name="requestShards" class="solr.SearchHandler" default="false">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="shards">127.0.0.1:8080/solr1</str>
  </lst>
</requestHandler>
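For a quick check it may also be easier to pass shards per request instead of hardcoding it in a handler; Solr accepts shards as a plain query parameter. A small sketch (the host/port are just the ones from my setup):

```shell
# Build the same single-shard request with shards given as a URL parameter
base='http://localhost:8080/solr1/select/'
params='q=(test)&shards=127.0.0.1:8080/solr1&fl=*,score&rows=10'
url="${base}?${params}"
echo "$url"
# Note: quote the URL when passing it to curl, otherwise the shell treats
# each '&' as the background operator:
#   curl -s "$url"
```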
Maybe it's not correct, but here is what I got:
INFO: [] webapp=/solr1
path=/select/ params={fl=*,score&ident=true&start=0&q=(genuflections)&qt=requestShards&rows=2000}
status=0 QTime=6525
INFO: [] webapp=/solr1
path=/select/params={fl=*,score&ident=true&start=0&q=(tunefulness)&qt=requestShards&rows=2000}
status=0 QTime=20170
INFO: [] webapp=/solr1
path=/select/params={fl=*,score&ident=true&start=0&q=(societal)&qt=requestShards&rows=2000}
status=0 QTime=44958
INFO: [] webapp=/solr1
path=/select/params={fl=*,score&ident=true&start=0&q=(euchre's)&qt=requestShards&rows=2000}
status=0 QTime=32161
INFO: [] webapp=/solr1
path=/select/params={fl=*,score&ident=true&start=0&q=(monogram's)&qt=requestShards&rows=2000}
status=0 QTime=85252
When I posted similar queries directly to solr1, without "requestShards", I got:
INFO: [] webapp=/solr1
path=/select/params={fl=*,score&ident=true&start=0&q=(reopening)&rows=2000}
hits=712 status=0 QTime=10
INFO: [] webapp=/solr1
path=/select/params={fl=*,score&ident=true&start=0&q=(housemothers)&rows=2000}
hits=0 status=0 QTime=446
INFO: [] webapp=/solr1
path=/select/params={fl=*,score&ident=true&start=0&q=(harpooners)&rows=2000}
hits=76 status=0 QTime=399
INFO: [] webapp=/solr1 path=/select/
params={fl=*,score&ident=true&start=0&q=(coaxing)&rows=2000} hits=562 status=0
QTime=2820
INFO: [] webapp=/solr1 path=/select/
params={fl=*,score&ident=true&start=0&q=(superstar's)&rows=2000} hits=4748
status=0 QTime=672
INFO: [] webapp=/solr1 path=/select/
params={fl=*,score&ident=true&start=0&q=(sedateness's)&rows=2000} hits=136
status=0 QTime=923
INFO: [] webapp=/solr1 path=/select/
params={fl=*,score&ident=true&start=0&q=(petrolatum)&rows=2000} hits=8
status=0 QTime=6183
INFO: [] webapp=/solr1 path=/select/
params={fl=*,score&ident=true&start=0&q=(everlasting's)&rows=2000} hits=1522
status=0 QTime=2625
And finally I found this bug:
https://issues.apache.org/jira/browse/SOLR-1524
Why is there no activity on it? Is it no longer relevant?
Today I wrote a bash script:
#!/bin/bash
ds=$(date +%s.%N)
echo "START: $ds" > ./data/east_2000
# The URL must be quoted, otherwise the shell treats each '&' as the
# background operator and splits the command
curl -s -H 'Content-type:text/xml; charset=utf-8' \
  'http://127.0.0.1:8080/solr1/select/?fl=*,score&ident=true&start=0&q=(east)&rows=2000' \
  >> ./data/east_2000
de=$(date +%s.%N)
ddf=$(echo "$de - $ds" | bc)
echo "END: $de" >> ./data/east_2000
echo "DIFF: $ddf" >> ./data/east_2000
Before starting Tomcat I dropped the page cache (note the space after the
"3"; without it the shell parses "3>" as a file-descriptor redirection):
root@LSolr:~# sync && echo 3 > /proc/sys/vm/drop_caches
Then I started Tomcat and ran the script. The result is below:
START: 1322476131.783146691
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int
name="QTime">125</int><lst name="params"><str
name="fl">*,score</str><str name="ident">true</str><str
name="start">0</str><str name="q">(east)</str><str
name="rows">2000</str></lst></lst><result name="response"
numFound="21439" start="0" maxScore="4.387605">
...
</response>
END: 1322476180.262770244
DIFF: 48.479623553
The file size is:
root@LSolr:~# ls -l | grep east
-rw-r--r-- 1 root root 1063579 Nov 28 12:29 east_2000
I used nmon to monitor HDD activity, and it was near 100% while the
script ran. But when I ran the script again, the result was:
DIFF: .063678709
with almost no HDD activity in nmon.
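To separate Solr-side time from transfer time, curl can report its own timings via --write-out: time_starttransfer roughly corresponds to server-side work (close to Solr's QTime), and the remainder of time_total is spent streaming the response. A sketch (the Solr URL is just the one from my script above):

```shell
# Print curl's own timing breakdown for a URL
timing() {
  curl -s -o /dev/null \
       -w 'starttransfer=%{time_starttransfer}s total=%{time_total}s\n' "$1"
}
# Against my Solr instance it would be called like:
#   timing 'http://127.0.0.1:8080/solr1/select/?fl=*,score&q=(east)&rows=2000'
```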
One thing I can't understand: is this my hardware (i.e. the slow HDD),
or is it a Solr problem?
And why has there been no activity on the bug
https://issues.apache.org/jira/browse/SOLR-1524 since 27/Oct/09 07:19?
On 11/25/2011 10:02 AM, Dmitry Kan wrote:
Approximately 45,000,000 documents per shard; Tomcat; caching was tweaked
in solrconfig, and each shard was given 12GB of RAM max.
<!-- Filter Cache
Cache used by SolrIndexSearcher for filters (DocSets),
unordered sets of *all* documents that match a query. When a
new searcher is opened, its caches may be prepopulated or
"autowarmed" using data from caches in the old searcher.
autowarmCount is the number of items to prepopulate. For
LRUCache, the autowarmed items will be the most recently
accessed items.
Parameters:
class - the SolrCache implementation
(LRUCache or FastLRUCache)
size - the maximum number of entries in the cache
initialSize - the initial capacity (number of entries) of
the cache. (see java.util.HashMap)
autowarmCount - the number of entries to prepopulate from
an old cache.
-->
<filterCache class="solr.FastLRUCache" size="1200" initialSize="1200"
autowarmCount="128"/>
<!-- Query Result Cache
Caches results of searches - ordered lists of document ids
(DocList) based on a query, a sort, and the range of
documents requested.
-->
<queryResultCache class="solr.LRUCache" size="512" initialSize="512"
autowarmCount="32"/>
<!-- Document Cache
Caches Lucene Document objects (the stored fields for each
document). Since Lucene internal document ids are transient,
this cache will not be autowarmed.
-->
<documentCache class="solr.LRUCache" size="512" initialSize="512"
autowarmCount="0"/>
<!-- Field Value Cache
Cache used to hold field values that are quickly accessible
by document id. The fieldValueCache is created by default
even if not configured here.
-->
<!--
<fieldValueCache class="solr.FastLRUCache"
size="512"
autowarmCount="128"
showItems="32" />
-->
<!-- Custom Cache
Example of a generic cache. These caches may be accessed by
name through SolrIndexSearcher.getCache(),cacheLookup(), and
cacheInsert(). The purpose is to enable easy caching of
user/application level data. The regenerator argument should
be specified as an implementation of solr.CacheRegenerator
if autowarming is desired.
-->
<!--
<cache name="myUserCache"
class="solr.LRUCache"
size="4096"
initialSize="1024"
autowarmCount="1024"
regenerator="com.mycompany.MyRegenerator"
/>
-->
<!-- Lazy Field Loading
If true, stored fields that are not requested will be loaded
lazily. This can result in a significant speed improvement
if the usual case is to not load all stored fields,
especially if the skipped fields are large compressed text
fields.
-->
<enableLazyFieldLoading>true</enableLazyFieldLoading>
<!-- Use Filter For Sorted Query
A possible optimization that attempts to use a filter to
satisfy a search. If the requested sort does not include
score, then the filterCache will be checked for a filter
matching the query. If found, the filter will be used as the
source of document ids, and then the sort will be applied to
that.
For most situations, this will not be useful unless you
frequently get the same search repeatedly with different sort
options, and none of them ever use "score"
-->
<!--
<useFilterForSortedQuery>true</useFilterForSortedQuery>
-->
<!-- Result Window Size
An optimization for use with the queryResultCache. When a search
is requested, a superset of the requested number of document ids
are collected. For example, if a search for a particular query
requests matching documents 10 through 19, and queryWindowSize is 50,
then documents 0 through 49 will be collected and cached. Any further
requests in that range can be satisfied via the cache.
-->
<queryResultWindowSize>50</queryResultWindowSize>
<!-- Maximum number of documents to cache for any entry in the
queryResultCache.
-->
<queryResultMaxDocsCached>200</queryResultMaxDocsCached>
In your case I would first check whether network throughput is the
bottleneck. It would be nice if you could compare the timestamp at which
each shard completes a request with the arrival time of its response
(via some HTTP sniffer) at the frontend SOLR server. Then you will see
whether it is the frontend taking so much time or a network issue.
By the way, are your shards well balanced?
On Thu, Nov 24, 2011 at 7:06 PM, Artem Lokotosh<arco...@gmail.com> wrote:
> Can you merge, e.g. 3 shards together or is it much effort for your
> team?
Yes, we can merge. We'll try to do this and review how it works.
Merging does not help :( I've tried merging two shards into one, and
three shards into one, but the results are similar to those of the first
configuration with 30 shards. This solution also has one big minus: the
optimization process may take more time.
> In our setup we currently have 16 shards with ~30GB each, but we
> rarely search in all of them at once.
How many documents per shard do you have in your setup? Is there any
difference between Tomcat, Jetty, or another servlet container? Have you
configured your servlet container more specifically than the default
configuration?
On Wed, Nov 23, 2011 at 4:38 PM, Artem Lokotosh<arco...@gmail.com> wrote:
> Is this log from the frontend SOLR (aggregator) or from a shard?
From the aggregator.
> Can you merge, e.g. 3 shards together or is it much effort for your
> team?
Yes, we can merge. We'll try to do this and review how it works.
Thanks, Dmitry.
Any other ideas?
On Wed, Nov 23, 2011 at 4:01 PM, Dmitry Kan<dmitry....@gmail.com>
wrote:
Hello,
Is this log from the frontend SOLR (aggregator) or from a shard?
Can you merge, e.g. 3 shards together or is it much effort for your
team?
In our setup we currently have 16 shards with ~30GB each, but we rarely
search in all of them at once.
Best,
Dmitry
On Wed, Nov 23, 2011 at 3:12 PM, Artem Lokotosh<arco...@gmail.com>
wrote:
--
Best regards,
Artem Lokotosh mailto:arco...@gmail.com