On Tue, 2015-11-03 at 11:09 +0530, Modassar Ather wrote:
> It is around 90GB of index (around 8 million documents) on one shard and
> there are 12 such shards. As per my understanding the sharding is required
> for this case. Please help me understand if it is not required.
Except for an internal …
One rule of thumb for Solr is to shard after you reach 100 million documents.
With large documents, you might want to shard sooner.
We are running an unsharded index of 7 million documents (55GB) without
problems.
The EdgeNgramFilter generates a set of prefix terms for each term in the
document. …
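A sketch of what an edge-n-gram field type might look like in schema.xml (field type and analyzer names here are illustrative assumptions, not taken from the thread):

```xml
<!-- Illustrative schema.xml fragment: index-time edge n-grams so prefix
     matching works without wildcards. Names are assumptions. -->
<fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- "network" is indexed as "n", "ne", "net", ... up to maxGramSize -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <!-- query side deliberately has no n-gram filter: the user's literal
         prefix matches the stored grams directly -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The trade-off is a larger index in exchange for prefix queries that are ordinary term lookups instead of wildcard expansions.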
Thanks Walter for your response,
It is around 90GB of index (around 8 million documents) on one shard and
there are 12 such shards. As per my understanding, sharding is required in
this case. Please help me understand if it is not required.
We have requirements where we need full wildcard support …
To back up a bit, how many documents are in this 90GB index? You might not need
to shard at all.
Why are you sending a query with a trailing wildcard? Are you matching the
prefix of words, for query completion? If so, look at the suggester, which is
designed to solve exactly that. Or you can use …
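A sketch of the Solr 5.x suggester configuration Walter is referring to (solrconfig.xml; the source field and dictionary choices below are assumptions for illustration):

```xml
<!-- Illustrative solrconfig.xml fragment: a suggester for query
     completion instead of trailing-wildcard queries. -->
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">default</str>
    <!-- matches the typed prefix anywhere in the suggestion text -->
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">title</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.dictionary">default</str>
    <str name="suggest.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
```

The suggester builds its own in-memory structure, so completion requests never touch the main index the way a wildcard expansion does.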
On Mon, 2015-11-02 at 14:17 +0100, Toke Eskildsen wrote:
> http://rosalind:52300/solr/collection1/select?q=%22der+se*%22&wt=json&indent=true&facet=false&group=true&group.field=domain
>
> gets expanded to
>
> "parsedquery": "(+DisjunctionMaxQuery((content_text:\"kan svane\" |
> author:kan svane* …
On Mon, 2015-11-02 at 17:27 +0530, Modassar Ather wrote:
> The query q=network se* is quick enough in our system too. It takes
> around 3-4 seconds for around 8 million records.
>
> The problem is with the same query as phrase. q="network se*".
I misunderstood your query then. I tried replicating …
Well, it seems that q="network se*" is working, but not in the way you
expect. q="network se*" would not trigger a prefix query; the "*" character
would be treated like any other character. I suspect that your query is in
fact "network se" (assuming you're using a StandardTokenizer) and …
The problem is with the same query as phrase. q="network se*".
The last "." is the full stop ending the sentence; the query is
q=field:"network se*"
Best,
Modassar
On Mon, Nov 2, 2015 at 6:10 PM, jim ferenczi wrote:
> Oops, I did not read the thread carefully.
> *The problem is with the same query as phrase. q="network se*".*
Oops, I did not read the thread carefully.
*The problem is with the same query as phrase. q="network se*".*
I was not aware that you could do that with Solr ;). I would say this is
expected, because in that case, if the number of expansions for "se*" is
big, then you would have to check the positions …
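Jim's point is that a trailing wildcard inside a phrase forces position checks against every expansion of "se*". When that behaviour is actually wanted, Solr 4.8+ ships a query parser built for it; a sketch (the field name content_text is taken from the parsedquery shown earlier in the thread):

```
# ComplexPhraseQParserPlugin: wildcards are honoured inside phrases,
# at the cost of expanding "se*" and verifying term positions.
q={!complexphrase inOrder=true}content_text:"network se*"
```

It will still be expensive for very common prefixes, but at least the expansion-plus-position-check is explicit rather than an accident of quoting.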
*I am not able to get the above point. So when I start Solr with 28g RAM,
for all the activities related to Solr it should not go beyond 28g. And the
remaining heap will be used for activities other than Solr. Please help me
understand.*
Well, those 28GB of heap are the memory "reserved" for your …
I monitored swap activity for the query using vmstat. The *si* and *so*
columns showed 0 until the query completed, and top also showed 0 against
swap. This means there was no scarcity of physical memory; swap activity
does not seem to be the bottleneck.
Kindly note that I ran this on an 8-node cluster with 30 …
Okay. I guess your observation of 400% for a single core is with top and
looking at that core's entry? If so, the 400% can be explained by
excessive garbage collection. You could turn GC-logging on to check
that. With a bit of luck GC would be the cause of the slow down.
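For reference, Solr 5.x normally has GC logging on by default (the bin/solr start script writes logs/solr_gc.log). If it has been disabled, a sketch of re-enabling it in solr.in.sh, using standard HotSpot flags for Java 7/8 (paths and the exact variable are assumptions based on the 5.x start script):

```shell
# solr.in.sh fragment (assumed); GC_LOG_OPTS is read by bin/solr at startup.
# Long or frequent pauses in the resulting log would explain the 400% CPU.
GC_LOG_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
  -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime"
```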
Yes, it is with the top command.
On Mon, 2015-11-02 at 16:25 +0530, Modassar Ather wrote:
> The remaining size after you removed the heap usage should be reserved for
> the index (not only the other system activities).
> I am not able to get the above point. So when I start Solr with 28g RAM,
> for all the activities related to Solr it should not go beyond 28g. …
On Mon, 2015-11-02 at 14:34 +0530, Modassar Ather wrote:
> No! This is a single big machine with 12 shards on it.
> Around 370 gb on the single machine.
Okay. I guess your observation of 400% for a single core is with top and
looking at that core's entry? If so, the 400% can be explained by
excessive garbage collection. You could turn GC-logging on to check that. …
Thanks Jim for your response.
The remaining size after you removed the heap usage should be reserved for
the index (not only the other system activities).
I am not able to get the above point. So when I start Solr with 28g RAM,
for all the activities related to Solr it should not go beyond 28g. And …
*if it correlates with the bad performance you're seeing. One important
thing to notice is that a significant part of your index needs to be in RAM
(especially if you're using SSDs) in order to achieve good performance.*
Especially if you're not using SSDs, sorry ;)
2015-11-02 11:38 GMT+01:00 jim ferenczi:
12 shards with 28GB for the heap and 90GB for each index means that you
need at least 336GB for the heap (assuming you're using all of it, which may
easily be the case considering the way the GC is handling memory) and
~1TB for the index. Let's say that you don't need your entire index in RAM,
the …
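Spelled out with the numbers given in the thread (370GB machine, 12 shards, 28GB heap each, 90GB index each):

```
heap:   12 shards x 28GB  = 336GB of the 370GB machine
index:  12 shards x 90GB ~= 1.08TB on disk
left for the OS page cache: ~34GB (minus OS and ZooKeeper overhead)
=> only ~3% of the index can be cached in RAM at any moment,
   so wildcard expansions over many terms hit disk constantly.
```

This is the arithmetic behind jim's point: the slowdown need not be swap at all, just a page cache far too small for the index it is serving.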
Just to add one more point that one external Zookeeper instance is also
running on this particular machine.
Regards,
Modassar
On Mon, Nov 2, 2015 at 2:34 PM, Modassar Ather wrote:
> Hi Toke,
> Thanks for your response. My comments in-line.
>
> That is 12 machines, running a shard each?
> No! This is a single big machine with 12 shards on it. …
Hi Toke,
Thanks for your response. My comments in-line.
That is 12 machines, running a shard each?
No! This is a single big machine with 12 shards on it.
What is the total amount of physical memory on each machine?
Around 370 gb on the single machine.
Well, se* probably expands to a great deal of …
On Mon, 2015-11-02 at 12:00 +0530, Modassar Ather wrote:
> I have a setup of 12 shard cluster started with 28gb memory each on a
> single server. There are no replica. The size of index is around 90gb on
> each shard. The Solr version is 5.2.1.
That is 12 machines, running a shard each?
What is the total amount of physical memory on each machine?
Hi,
I have a setup of a 12-shard cluster started with 28GB memory each on a
single server. There are no replicas. The size of the index is around 90GB
on each shard. The Solr version is 5.2.1.
When I query "network se*", the memory utilization goes up to 24-26GB and
the query takes around 3+ minutes to execute. …