Re: Cost of enabling doc values

2018-06-13 Thread Erick Erickson
I pretty much agree with your business side. The rough size of the docValues fields is one of X for each doc. So say you have an int field. Size is near maxDoc * 4 bytes. This is not totally accurate, there is some int packing done for instance, but it'll do. If you really want an accurate count,
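The back-of-the-envelope rule above (roughly maxDoc times the width of one value) can be sketched as follows. This is only an illustration: the function name and the document count are made up, and because of the int packing mentioned above the real on-disk size will differ.

```python
def estimate_docvalues_bytes(max_doc: int, bytes_per_value: int) -> int:
    """Rough estimate: one fixed-width value stored per document."""
    return max_doc * bytes_per_value


# Hypothetical index: 100 million documents, one int field (4 bytes per value).
max_doc = 100_000_000
estimate = estimate_docvalues_bytes(max_doc, 4)
print(f"approx. {estimate / 1024**2:.0f} MiB for one int docValues field")
```

Multiplying the result by the number of fields to be sorted on gives a first-order figure to bring back to the business side.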

Re: Autoscaling and inactive shards

2018-06-13 Thread Shalin Shekhar Mangar
Yes, I believe Noble is working on this. See https://issues.apache.org/jira/browse/SOLR-11985 On Wed, Jun 13, 2018 at 1:35 PM Jan Høydahl wrote: > Ok, get the meaning of preferences. > > Would there be a way to write a generic rule that would suggest moving > shards to obtain balance, without sp

Cost of enabling doc values

2018-06-13 Thread root23
Hi all, Does anyone know how much the index size typically increases when we enable docValues on a field? Our business side wants to enable sorting on most of our fields. I am trying to push back, saying that it will increase the index size, since enabling docValues will create the uninverted inde

Re: How to avoid join queries

2018-06-13 Thread Erik Hatcher
> On Jun 13, 2018, at 4:24 PM, root23 wrote: ... > But i > know use of join is discouraged in solr and i do not want to use it. … Why do you say that? I, for one, find great power and joy using `{!join}`. Erik

Re: SolrCore Initialization Failures

2018-06-13 Thread shefalid
Thanks for your response. None of the processes are deleting any index files. Each data directory is pointed to by only one core. We are writing data at a high ingestion rate (100,000 records per second). A commit happens once every 30 seconds. Also, a periodic service runs to back up the data to our ba

How to avoid join queries

2018-06-13 Thread root23
Hi all, I have the following use case. Let's say my document looks like this: doc={ name:abc, status:def, store_id:store_1, parent:nike } Now let's say that at some point store_1 moved under a different parent, say adidas. Our business use c

Re: Exception when processing streaming expression

2018-06-13 Thread Joel Bernstein
Can you provide some example expressions that are causing these exceptions? Joel Bernstein http://joelsolr.blogspot.com/ On Wed, Jun 13, 2018 at 9:02 AM, Christian Spitzlay < christian.spitz...@biologis.com> wrote: > Hi, > > I am seeing a lot of (reproducible) exceptions in my solr log file > w

Re: 7.3.1 creates thousands of threads after start up

2018-06-13 Thread Shawn Heisey
On 6/13/2018 4:04 AM, Markus Jelsma wrote: You mentioned shard handler tweaks, thanks. I see we have an incorrect setting there for maximumPoolSize, way too high, but that doesn't account for the number of threads created. After reducing the number, for dubious reasons, twice the number of thr

RE: [EXT] Re: Extracting top level URL when indexing document

2018-06-13 Thread Hanjan, Harinder
Thank you Alex. I have managed to get this to work via URLClassifyProcessorFactory. If anyone is interested, it can easily be done with the following solrconfig.xml true SolrId hostname

Re: Suggestions for debugging performance issue

2018-06-13 Thread Chris Troullis
Hi Susheel, It's not drastically different, no. There are other collections with more fields and more documents that don't have this issue. And the collection is not sharded. Just 1 shard with 2 replicas. Both replicas are similar in response time. Thanks, Chris On Wed, Jun 13, 2018 at 2:37 PM, S

Re: Suggestions for debugging performance issue

2018-06-13 Thread Susheel Kumar
Is this collection in any way drastically different from the others in terms of schema, number of fields, total documents, etc.? Is it sharded, and if so, can you check which shard is taking more time with shards.info=true? Thanks, Susheel On Wed, Jun 13, 2018 at 2:29 PM, Chris Troullis wrote: > Thanks Erick, > > Seems to
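The per-shard timing check suggested above relies on Solr's `shards.info` request parameter. A minimal sketch of building such a request URL is shown here; the host, port, and collection name are placeholders and not taken from the thread.

```python
from urllib.parse import urlencode

# Hypothetical host and collection name; replace with your own.
base_url = "http://localhost:8983/solr/my_collection/select"

# rows=0 keeps the response small; shards.info=true asks Solr to report
# per-shard details (such as elapsed time) alongside the results.
params = {"q": "*:*", "rows": 0, "shards.info": "true"}
url = f"{base_url}?{urlencode(params)}"
print(url)
```

The response then carries a per-shard section that can be compared across shards to spot a slow one.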

Re: Suggestions for debugging performance issue

2018-06-13 Thread Chris Troullis
Thanks Erick, Seems to be a mixed bag in terms of tlog size across all of our indexes, but currently the index with the performance issues has 4 tlog files totaling ~200 MB. This still seems high to me since the collections are in sync, and we hard commit every minute, but it's less than the ~8GB i

Re: Suggestions for debugging performance issue

2018-06-13 Thread Erick Erickson
First, nice job of eliminating all the standard stuff! About tlogs: Sanity check: They aren't growing again, right? They should hit a relatively steady state. The tlogs are used as a queueing mechanism for CDCR to durably store updates until they can successfully be transmitted to the target. So I

Logging Every document to particular core

2018-06-13 Thread govind nitk
Hi, Is there any way to log all the data being indexed to a particular core only? Regards, govind

Re: Solr 7 + HDFS issue

2018-06-13 Thread Shawn Heisey
On 6/12/2018 10:14 PM, Joe Obernberger wrote: Thank you Shawn.  It looks like it is being applied.  This could be some sort of chain reaction where: Drive or server fails.  HDFS starts to replicate blocks which causes network congestion.  Solr7 can't talk, so initiates a replication process w

Re: Suggestions for debugging performance issue

2018-06-13 Thread Chris Troullis
Thanks Erick. A little more info: -We do have buffering disabled everywhere, as I had read multiple posts on the mailing list regarding the issue you described. -We soft commit (with opensearcher=true) pretty frequently (15 seconds) as we have some NRT requirements. We hard commit every 60 seconds

Exception when processing streaming expression

2018-06-13 Thread Christian Spitzlay
Hi, I am seeing a lot of (reproducible) exceptions in my solr log file when I execute streaming expressions: o.a.s.s.HttpSolrCall Unable to write response, client closed connection or we are shutting down org.eclipse.jetty.io.EofException at org.eclipse.jetty.io.ChannelEndPoint.flush(Ch

RE: 7.3.1 creates thousands of threads after start up

2018-06-13 Thread Markus Jelsma
Hello Shawn, You mentioned shard handler tweaks, thanks. I see we have an incorrect setting there for maximumPoolSize, way too high, but that doesn't account for the number of threads created. After reducing the number, for dubious reasons, twice the number of threads are created and the node d

Re: Autoscaling and inactive shards

2018-06-13 Thread Jan Høydahl
Ok, get the meaning of preferences. Would there be a way to write a generic rule that would suggest moving shards to obtain balance, without specifying absolute core counts? I.e. if you have three nodes A: 3 cores B: 5 cores C: 3 cores Then that rule would suggest two moves to end up with 4 cor