Re: Storage/Volume type for Kubernetes Solr POD?

2020-02-07 Thread Karl Stoney
we personally run Solr on Google Cloud Kubernetes Engine, and each node has a 512GB persistent SSD (network-attached) disk, which gives roughly this performance (read/write): sustained random IOPS limit 15,360.00 / 15,360.00; sustained throughput limit (MB/s) 245.76 / 245.76. And we get very good p…
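Those limits follow from GCP's per-GB scaling for zonal pd-ssd disks (30 IOPS per GB and 0.48 MB/s per GB at the time of this thread; treat the factors as an assumption, since Google revises them). A quick arithmetic check:

```python
# Sketch: how GCP zonal pd-ssd limits scale with disk size.
# Per-GB factors are assumptions based on Google's published numbers
# circa 2020; they may have changed since.
IOPS_PER_GB = 30              # applies to both reads and writes
THROUGHPUT_MBPS_PER_GB = 0.48

def pd_ssd_limits(size_gb):
    """Return (sustained IOPS limit, sustained throughput limit in MB/s)."""
    return size_gb * IOPS_PER_GB, size_gb * THROUGHPUT_MBPS_PER_GB

iops, mbps = pd_ssd_limits(512)
print(f"{iops} IOPS, {mbps:.2f} MB/s")  # matches the numbers in the thread
```

Note that these disks also share node-level egress caps, so the per-disk limit is not always the binding one.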

Re: Storage/Volume type for Kubernetes Solr POD?

2020-02-07 Thread Nicolas PARIS
Hi all, what about CephFS or a Lustre distributed filesystem for such a purpose? Karl Stoney writes: > we personally run solr on google cloud kubernetes engine and each node has a > 512Gb persistent ssd (network attached) storage which gives roughly this > performance (read/write): > > Sustained…

Re: Solr 7.7 heap space is getting full

2020-02-07 Thread Erick Erickson
Walter’s comment (which I’ve seen too, BTW) is something to pursue if (and only if) you have proof that Solr is spinning up thousands of threads. Do you have any proof of that? Having several hundred threads running is quite common, BTW. Attach jconsole or take a thread dump and it’ll be obvious. H…
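As a minimal sketch of the "take a thread dump" check: in a HotSpot thread dump (e.g. the output of `jstack <solr-pid>`), each thread entry begins with a quoted thread name, so counting those lines gives the thread count.

```python
# Sketch: count threads in a JVM (HotSpot) thread dump.
# Each thread entry starts with a line beginning with a quoted name,
# e.g. "qtp12345-42" #42 daemon prio=5 ...
def count_threads(dump_text: str) -> int:
    return sum(1 for line in dump_text.splitlines()
               if line.lstrip().startswith('"'))

sample = '''"main" #1 prio=5 os_prio=0
   java.lang.Thread.State: RUNNABLE

"qtp12345-42" #42 daemon prio=5 os_prio=0
   java.lang.Thread.State: WAITING
'''
print(count_threads(sample))  # 2
```

If the count sits in the hundreds that is normal for Solr; thousands would support Walter's theory.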

Re: Checking in on Solr Progress

2020-02-07 Thread Jan Høydahl
Could we expose some high-level recovery info as part of the metrics API? Then people could track the number of cores recovering, recovery time, recovery phase, number of failed recoveries, etc., and also build alerts on top of that. Jan Høydahl > 6. feb. 2020 kl. 19:42 skrev Erick Erickson : > > There…
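Until such metrics exist, the per-state counts Jan describes can be derived from a CLUSTERSTATUS response; a minimal sketch (the state names `active`, `recovering`, `recovery_failed`, `down` are the ones Solr reports for replicas):

```python
# Sketch: count replicas by state from a CLUSTERSTATUS JSON response.
# The nesting cluster -> collections -> shards -> replicas matches
# what the Collections API returns.
from collections import Counter

def replica_states(cluster_status: dict) -> Counter:
    counts = Counter()
    for coll in cluster_status["cluster"]["collections"].values():
        for shard in coll["shards"].values():
            for replica in shard["replicas"].values():
                counts[replica["state"]] += 1
    return counts

# Toy response, trimmed to the fields the function reads
status = {"cluster": {"collections": {"c1": {"shards": {"shard1": {
    "replicas": {"r1": {"state": "active"},
                 "r2": {"state": "recovering"}}}}}}}}
print(dict(replica_states(status)))
```

This gives counts only; recovery phase and elapsed time would indeed need new metrics on the Solr side, as Jan suggests.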

Re: Checking in on Solr Progress

2020-02-07 Thread Erick Erickson
I was wondering about using metrics myself. I confess I didn’t look to see what was already there either ;) Actually, using metrics might be easiest all told, but I also confess I have no clue what it takes to build a new metric in. Nor how to use the same (?) collection process for the 5 situa…

Re: Checking in on Solr Progress

2020-02-07 Thread Walter Underwood
I wrote some Python that checks CLUSTERSTATUS and reports replica status to Telegraf. Great for charts and alerts, but it only shows status, not progress. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Feb 7, 2020, at 7:58 AM, Erick Erickson wrote:…
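Walter's script itself isn't shown; as a hedged sketch of the approach (all names here are hypothetical, not his code), one way to get replica state into Telegraf is to emit InfluxDB line protocol, which Telegraf's socket or HTTP listener inputs parse natively:

```python
# Sketch: turn one replica's CLUSTERSTATUS state into an InfluxDB
# line-protocol point for Telegraf. Measurement and tag names are
# made up for illustration.
def to_line_protocol(collection: str, shard: str, replica: str, state: str) -> str:
    # format: measurement,tag=v,tag=v field=v,field=v
    healthy = 1 if state == "active" else 0
    return (f"solr_replica,collection={collection},shard={shard},"
            f"replica={replica} state=\"{state}\",healthy={healthy}i")

line = to_line_protocol("products", "shard1", "core_node3", "recovering")
print(line)
```

As Walter notes, this captures status only; "recovering" is a single state with no visible progress percentage.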

Solr Analyzer : Filter to drop tokens based on some logic which needs access to adjacent tokens

2020-02-07 Thread Pratik Patel
Hello everyone, let's say I have an analyzer which has the following token stream as an output: *token stream: [], a, ab, [], c, [], d, de, def*. Now let's say I want to add another filter which will drop certain tokens based on whether the adjacent token on the right side is [] or some string.…
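Lucene token filters are written in Java, but the one-token-lookahead logic being asked about can be sketched in Python (illustrative only; in a real Lucene `TokenFilter` the buffering would typically be done with `captureState()`/`restoreState()` over `incrementToken()` calls):

```python
# Sketch: drop a token when the adjacent token to its right is the
# gap marker "[]". This materializes the stream for simplicity; a
# Lucene filter would buffer just one token ahead instead.
def drop_before_gap(tokens, gap="[]"):
    tokens = list(tokens)
    out = []
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if nxt == gap:        # right-hand neighbor is the gap marker
            continue          # drop the current token
        out.append(tok)
    return out

stream = ["[]", "a", "ab", "[]", "c", "[]", "d", "de", "def"]
print(drop_before_gap(stream))  # "ab" and "c" are dropped
```

Whatever the exact predicate, the key point is the same: the filter must look at (or buffer) the following token before deciding whether to emit the current one.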

Stream InnerJoin to merge hierarchical data

2020-02-07 Thread sambasivarao giddaluri
Hi all, our dataset is 50M records; we are using a complex graph query and are now trying to do an innerJoin on the records, and we are facing the issue below. This is a critical issue. Parent { parentId:"1" parent.name:"foo" type:"parent" } Child { childId:"2" parentId:"1" child.name:"bar" type:"child"…

Re: Stream InnerJoin to merge hierarchical data

2020-02-07 Thread Joel Bernstein
This is working as designed, I believe. The issue is that innerJoin relies on the sort order of the streams in order to perform a streaming merge join. The first join works because the sorts line up on childId. innerJoin(search(collection_name, q="type:grandchild",…
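The sort requirement is easiest to see in a toy merge join: the algorithm advances one cursor per stream and can only match keys as they stream past in order, so both inputs must be sorted on the join key. An illustrative Python sketch (simplified to 1:1 matches; Solr's innerJoin handles duplicate keys, but imposes the same sorted-input requirement):

```python
# Sketch of a streaming merge inner join over two lists of dicts.
# Both inputs MUST already be sorted ascending on `key` -- this is
# exactly what innerJoin requires of its substream sorts.
def merge_inner_join(left, right, key):
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk == rk:
            out.append({**left[i], **right[j]})
            i += 1            # simple 1:1 advance for the sketch
            j += 1
        elif lk < rk:
            i += 1            # left key too small: advance left
        else:
            j += 1            # right key too small: advance right
    return out

# Hypothetical docs shaped like the parent/child records in the thread
parents = [{"parentId": "1", "parent.name": "foo"}]
children = [{"parentId": "1", "childId": "2", "child.name": "bar"}]
print(merge_inner_join(parents, children, "parentId"))
```

If the second join's key differs from the sort the first join produced, the cursors can never line up, which is why the sort on the outer join key matters.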