Re: Log4J Logging to Http

2020-06-17 Thread Radu Gheorghe
Hi Florian, I don’t know the answer to your specific question, but I would like to suggest a different approach. Excuse me in advance, I usually hate suggesting different approaches. The reason why I suggest a different approach is because logging via HTTP can be blocking a thread e.g. until a
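Radu's point is that a synchronous HTTP appender makes whichever thread logs the query wait on the network. Here is a minimal, generic sketch of the usual mitigation. It is not Radu's suggested approach, which is cut off above, and it is not a Log4j API. The caller hands the message to a bounded in-memory queue with a non-blocking `offer`, and one background thread performs the slow delivery. `NonBlockingLogSketch` and the `send` callback, which stands in for the HTTP request, are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Hypothetical sketch: callers never wait on the slow sink (e.g. an HTTP endpoint).
final class NonBlockingLogSketch implements AutoCloseable {
    private final BlockingQueue<String> queue;
    private final AtomicLong dropped = new AtomicLong();
    private final Thread worker;
    private volatile boolean running = true;

    NonBlockingLogSketch(int capacity, Consumer<String> send) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.worker = new Thread(() -> {
            // Keep draining until close() is called and the queue is empty.
            while (running || !queue.isEmpty()) {
                try {
                    String msg = queue.poll(100, TimeUnit.MILLISECONDS);
                    if (msg != null) {
                        send.accept(msg); // the slow network call runs on this thread only
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }, "log-shipper");
        worker.setDaemon(true);
        worker.start();
    }

    /** Never blocks the caller: returns false and counts a drop if the queue is full. */
    boolean log(String msg) {
        boolean accepted = queue.offer(msg);
        if (!accepted) {
            dropped.incrementAndGet();
        }
        return accepted;
    }

    long dropped() {
        return dropped.get();
    }

    /** Waits for pending messages to be delivered; call after all producers have finished. */
    @Override
    public void close() {
        running = false;
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The trade-off is explicit. When the endpoint falls behind and the queue fills, messages are dropped and counted instead of stalling query threads. Switching `offer` to `put` would restore lossless delivery, but it would also reintroduce the blocking Radu warns about.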

Re: Getting rid of Master/Slave nomenclature in Solr

2020-06-17 Thread Noble Paul
Looking at the code I see 692 occurrences of the word "slave". Mostly variable names and ref guide docs. The word "slave" is present in the responses as well. Any change in the request param/response payload is backward incompatible. I have no objection to changing the names in ref guide and ot

Re: Solr cloud backup/restore not working

2020-06-17 Thread Shawn Heisey
On 6/17/2020 8:55 PM, yaswanth kumar wrote: Caused by: javax.crypto.BadPaddingException: RSA private key operation failed Something appears to be wrong with the private key that Solr is attempting to use for a certificate. Best guess, incorporating everything I can see in the stacktrace, is

Re: Solr cloud backup/restore not working

2020-06-17 Thread Shawn Heisey
On 6/16/2020 8:44 AM, yaswanth kumar wrote: I don't see anything related in the solr.log file for the same error. Not sure if there is any other place where I can check for this. The underlying request that failed might be happening on one of the other nodes in the cloud. It might be necessary

Re: Getting rid of Master/Slave nomenclature in Solr

2020-06-17 Thread Ilan Ginzburg
Would master/follower work? Half the rename work while still getting rid of the slavery connotation... On Thu 18 Jun 2020 at 07:13, Walter Underwood wrote: > > On Jun 17, 2020, at 4:00 PM, Shawn Heisey wrote: > > > > It has been interesting watching this discussion play out on multiple > open

Re: Getting rid of Master/Slave nomenclature in Solr

2020-06-17 Thread Walter Underwood
> On Jun 17, 2020, at 4:00 PM, Shawn Heisey wrote: > > It has been interesting watching this discussion play out on multiple open > source mailing lists. On other projects, I have seen a VERY high level of > resistance to these changes, which I find disturbing and surprising. Yes, it is nice

Re: Getting rid of Master/Slave nomenclature in Solr

2020-06-17 Thread Walter Underwood
Master/slave is not going away in our company. That cluster has zero downtime in five years. I can’t say that about our Solr Cloud clusters. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jun 17, 2020, at 9:36 PM, Noble Paul wrote: > > I really do

Re: Getting rid of Master/Slave nomenclature in Solr

2020-06-17 Thread Noble Paul
I really do not see a reason why a master/slave terminology is a problem. We do not have slavery anywhere in the world. Should we also remove it from the dictionary? The old mode is going to go away anyway. Why waste time bikeshedding on this? On Thu, Jun 18, 2020, 12:04 PM Trey Grainger wrote:

Re: Solr cloud backup/restore not working

2020-06-17 Thread yaswanth kumar
Hi Vinodh, Here is what I see when I tried with requestid, Collection: test operation: restore failed:org.apache.solr.common.SolrException: ADDREPLICA failed to create replica at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler$ShardRequestTracker.processResponses(OverseerCo

Re: Log4J Logging to Http

2020-06-17 Thread Shawn Heisey
On 6/17/2020 1:33 AM, Krönert Florian wrote: 2020-06-17T07:06:55.121856339Z java.lang.NoClassDefFoundError: Failed to initialize Apache Solr: Could not find necessary SLF4j logging jars. If using Jetty, the SLF4j logging jars need to go in the jetty lib/ext directory. For other containers, the

Re: Getting rid of Master/Slave nomenclature in Solr

2020-06-17 Thread Trey Grainger
@Shawn, Ok, yeah, apologies, my semantics were wrong. I was thinking that a TLog replica is a follower role only and becomes an NRT replica if it gets elected leader. From a pure semantics standpoint, though, I guess technically the TLog replica doesn't "become" an NRT replica, but just "acts the

Re: Getting rid of Master/Slave nomenclature in Solr

2020-06-17 Thread Michael Gibney
I agree with Shawn that the top contenders so far (from my perspective) are "primary/secondary" and "publisher/subscriber", and agree with Walter that whatever term pair is used should ideally be usable *as a pair* (to identify a cluster type) in addition to individually (to identify the individual

Re: Getting rid of Master/Slave nomenclature in Solr

2020-06-17 Thread Scott Cote
Perhaps Apache could provide a nomenclature suggestion that the projects could adopt. This would stand well for the whole Apache community with regard to BLM. My two cents as a “user”. Good luck. On Wednesday, June 17, 2020, 6:00 PM, Shawn Heisey wrote: On

Re: Getting rid of Master/Slave nomenclature in Solr

2020-06-17 Thread Shawn Heisey
On 6/17/2020 2:36 PM, Trey Grainger wrote: 2) TLOG - which can only serve in the role of follower This is inaccurate. TLOG can become leader. If that happens, then it functions exactly like an NRT leader. I'm aware that saying the following is bikeshedding ... but I do think it would be a

Re: Getting rid of Master/Slave nomenclature in Solr

2020-06-17 Thread Walter Underwood
Master/slave is not just two roles, but a kind of cluster. I really don’t think “Standalone” captures the non-Cloud cluster. Nobody in Chegg would have any idea that “standalone” meant “no Zookeeper”. I’ve never thought that master/slave accurately described the traditional replication model, but

Re: Getting rid of Master/Slave nomenclature in Solr

2020-06-17 Thread Trey Grainger
Sorry: > > but I maintain that leader vs. follower behavior is inconsistent here. Sorry, that should have said "I maintain that leader vs. follower behavior is consistent here." Trey Grainger Founder, Searchkernel https://searchkernel.com On Wed, Jun 17, 2020 at 6:03 PM Trey Grainger wrote: >

Re: Getting rid of Master/Slave nomenclature in Solr

2020-06-17 Thread Trey Grainger
Hi Walter, >In Solr Cloud, the leader knows about each follower and updates them. Respectfully, I think you're mixing the "TYPE" of replica with the role of the "leader" and "follower" In SolrCloud, only if the TYPE of a follower is NRT or TLOG does the leader push updates to those followers. When
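Trey's distinction between a replica's type and its role can be made concrete with a small model of the three SolrCloud replica types as the Ref Guide describes them:

- The leader forwards updates to NRT and TLOG replicas.
- TLOG and PULL replicas copy index segments from the leader while following.
- PULL replicas are never elected leader.

The enum and its field names are illustrative, not Solr classes.

```java
import java.util.EnumSet;

// Illustrative model of SolrCloud replica types; not a Solr class.
enum ReplicaKind {
    //   leader forwards updates | copies index from leader (while following) | can become leader
    NRT(true, false, true),
    TLOG(true, true, true),
    PULL(false, true, false);

    final boolean receivesForwardedUpdates;
    final boolean copiesIndexFromLeader;
    final boolean canBecomeLeader;

    ReplicaKind(boolean receivesForwardedUpdates, boolean copiesIndexFromLeader, boolean canBecomeLeader) {
        this.receivesForwardedUpdates = receivesForwardedUpdates;
        this.copiesIndexFromLeader = copiesIndexFromLeader;
        this.canBecomeLeader = canBecomeLeader;
    }

    /** Replica types that a shard leader pushes update requests to. */
    static EnumSet<ReplicaKind> pushTargets() {
        EnumSet<ReplicaKind> targets = EnumSet.noneOf(ReplicaKind.class);
        for (ReplicaKind k : values()) {
            if (k.receivesForwardedUpdates) {
                targets.add(k);
            }
        }
        return targets;
    }
}
```

The model shows that "follower" (a role) and PULL/TLOG/NRT (a type) vary independently. That is the separation Trey argues the leader/follower terms can carry.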

Re: Getting rid of Master/Slave nomenclature in Solr

2020-06-17 Thread Sameer Maggon
+1 for simplifying and using the Leader/Follower terminology. Our company operates SolrCloud, standalone Solr, and Master/Slave configurations. Outside of the Solr developer community, it's painful and confusing to talk about Master/Slave and Leader/Replica. It would be easier if we had the fo

Re: Getting rid of Master/Slave nomenclature in Solr

2020-06-17 Thread Walter Underwood
But they are not the same. In Solr Cloud, the leader knows about each follower and updates them. In standalone, the master has no idea that slaves exist until a replication request arrives. In Solr Cloud, the leader is elected. In standalone, that role is fixed at config load time. Looking ahead
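Walter's point that a standalone master has no idea slaves exist is the defining property of a pull model. Here is a minimal sketch, assuming a single monotonically increasing index version and ignoring the file-level details of Solr's ReplicationHandler. The source only answers version queries and holds no list of followers. Each follower decides on its own poll whether to fetch. All class and method names are hypothetical.

```java
// Hypothetical sketch of pull-style replication; not Solr's ReplicationHandler.
final class PullReplicationSketch {
    /** The replication source: it answers version queries and keeps no list of followers. */
    static final class Source {
        private long indexVersion = 0;

        void commit() {
            indexVersion++; // each commit produces a newer index version
        }

        long version() {
            return indexVersion;
        }
    }

    /** A follower polls on its own schedule and copies the index only when it is behind. */
    static final class Follower {
        private long localVersion = 0;

        /** One poll cycle; returns true if a newer index was fetched. */
        boolean poll(Source source) {
            long remote = source.version();
            if (remote > localVersion) {
                localVersion = remote; // stands in for downloading the changed index files
                return true;
            }
            return false;
        }

        long version() {
            return localVersion;
        }
    }
}
```

Nothing in `Source` changes when a follower is added or removed. That contrasts with the SolrCloud leader Walter describes, which knows each follower and sends it updates.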

Re: Getting rid of Master/Slave nomenclature in Solr

2020-06-17 Thread gnandre
+1 for Leader-Follower. How about Publisher-Subscriber? On Wed, Jun 17, 2020 at 5:19 PM Rahul Goswami wrote: > +1 on avoiding SolrCloud terminology. In the interest of keeping it obvious > and simple, may I please suggest primary/secondary? > > On Wed, Jun 17, 2020 at 5:14 PM Atita Arora wrot

Re: Getting rid of Master/Slave nomenclature in Solr

2020-06-17 Thread Trey Grainger
I guess I don't see it as polysemous, but instead simplifying. In my proposal, the terms "leader" and "follower" would have the exact same meaning in both SolrCloud and standalone mode. The only difference would be that SolrCloud automatically manages the leaders and followers, whereas in standalo

Re: RankLib model output format to Solr LTR model format

2020-06-17 Thread gnandre
Thanks Doug, this is very helpful. On Wed, Jun 17, 2020 at 1:11 PM Doug Turnbull < dturnb...@opensourceconnections.com> wrote: > There are several scripts for doing this. > > I might encourage you to check out our Hello LTR library of notebooks, which > has a ranklib training driver, and helpers t

Re: Getting rid of Master/Slave nomenclature in Solr

2020-06-17 Thread Rahul Goswami
+1 on avoiding SolrCloud terminology. In the interest of keeping it obvious and simple, may I please suggest primary/secondary? On Wed, Jun 17, 2020 at 5:14 PM Atita Arora wrote: > I agree with avoiding SolrCloud terminology too. > > I may suggest going for "prime" and "clone" > (Short an

Re: Getting rid of Master/Slave nomenclature in Solr

2020-06-17 Thread Atita Arora
I agree with avoiding SolrCloud terminology too. I may suggest going for "prime" and "clone" (short and precise, like Master and Slave). Best, Atita On Wed, 17 Jun 2020, 22:50 Walter Underwood, wrote: > I strongly disagree with using the Solr Cloud leader/follower terminology > for non-C

Re: Autocommit in SolrCloud with many shards

2020-06-17 Thread Erick Erickson
Please raise a JIRA and attach your patch to that…. Best, Erick P.S. Buy me some beers sometime if we’re ever in the same place... > On Jun 17, 2020, at 5:00 PM, Bram Van Dam wrote: > > Thanks for pointing that out. I'm attaching a patch for the ref-guide > which summarizes what you said. Mayb

Re: Autocommit in SolrCloud with many shards

2020-06-17 Thread Bram Van Dam
Thanks for pointing that out. I'm attaching a patch for the ref-guide which summarizes what you said. Maybe other people will find this useful as well? Oh and Erick, thanks for your ever thoughtful replies. Given all the hours of your time I've soaked up over the years, you should probably start i

Re: Getting rid of Master/Slave nomenclature in Solr

2020-06-17 Thread Walter Underwood
I strongly disagree with using the Solr Cloud leader/follower terminology for non-Cloud clusters. People in my company are confused enough without using polysemous terminology. “This node is the leader, but it means something different than the leader in this other cluster.” I’m dreading that conv

Re: Getting rid of Master/Slave nomenclature in Solr

2020-06-17 Thread Trey Grainger
Proposal: "A Solr COLLECTION is composed of one or more SHARDS, which each have one or more REPLICAS. Each replica can have a ROLE of either: 1) A LEADER, which can process external updates for the shard 2) A FOLLOWER, which receives updates from another replica" (Note: I prefer "role" but if othe
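Trey's proposed hierarchy maps onto a small data model. This is a minimal sketch, assuming (as in SolrCloud) that each shard should have exactly one leader at a time. The class names are hypothetical. The model does not record whether the leader is elected (SolrCloud) or fixed in configuration (standalone), which is the distinction Walter raises.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the proposed terminology; these are not Solr classes.
enum ReplicaRole { LEADER, FOLLOWER }

final class ReplicaNode {
    final String name;
    final ReplicaRole role;

    ReplicaNode(String name, ReplicaRole role) {
        this.name = name;
        this.role = role;
    }
}

final class ShardNode {
    final String name;
    final List<ReplicaNode> replicas = new ArrayList<>();

    ShardNode(String name) {
        this.name = name;
    }

    /** Exactly one replica should process external updates for the shard. */
    boolean hasExactlyOneLeader() {
        return replicas.stream().filter(r -> r.role == ReplicaRole.LEADER).count() == 1;
    }
}

final class CollectionNode {
    final String name;
    final List<ShardNode> shards = new ArrayList<>();

    CollectionNode(String name) {
        this.name = name;
    }

    /** One or more shards, each with one or more replicas and a single leader. */
    boolean isWellFormed() {
        return !shards.isEmpty() && shards.stream().allMatch(ShardNode::hasExactlyOneLeader);
    }
}
```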

Getting rid of Master/Slave nomenclature in Solr

2020-06-17 Thread Anshum Gupta
Hi everyone, Moving a conversation that was happening on the PMC list to the public forum. Most of the following is just me recapping the conversation that has happened so far. Some members of the community have been discussing getting rid of the master/slave nomenclature from Solr. While this m

Re: RankLib model output format to Solr LTR model format

2020-06-17 Thread Doug Turnbull
There are several scripts for doing this. I might encourage you to check out our Hello LTR library of notebooks, which has a ranklib training driver, and helpers to log training data, train a model w/ Ranklib, and search with it. I am using this code for my LTR contributions AI Powered Search http

Re: Java - setting multi-valued fields

2020-06-17 Thread kumar gaurav
Hi, Example: String[] values = new String[] {"value 1", "value 2" }; inputDoc.setField (multiFieldName, values); Can you try changing the array to a list? List<String> values = new ArrayList<>(); values.add("value 1"); values.add("value 2"); inputDoc.setField (multiFieldName, values); regar
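Kumar's suggestion amounts to passing a `List<String>` instead of a `String[]` as the field value. Below is a minimal plain-Java sketch of that conversion, kept free of SolrJ so it stands alone. The helper name `toFieldValues` is hypothetical. The returned list is what would then be passed to `inputDoc.setField(multiFieldName, values)`, as in the snippet above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; no SolrJ types are used here.
final class MultiValuedFieldSketch {
    /** Copies the array into a mutable List, the collection form suggested for multi-valued fields. */
    static List<String> toFieldValues(String... values) {
        return new ArrayList<>(Arrays.asList(values));
    }
}
```

Copying into an `ArrayList`, rather than returning the fixed-size `Arrays.asList` view directly, keeps the list resizable in case later processors append values.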

Re: Solr 7.6 optimize index size increase

2020-06-17 Thread Erick Erickson
What Walter said. Although with Solr 7.6, unless you specify maxSegments explicitly, you won’t create segments over the default 5G maximum. And if you have in the past specified maxSegments so you have segments over 5G, optimize (again without specifying maxSegments) will do a “singleton merge”

Re: Facet Performance

2020-06-17 Thread Erick Erickson
queryResultCache doesn’t really help with faceting, even if it’s hit for the main query. That cache only stores a subset of the hits, and to facet properly you need the entire result set…. > On Jun 17, 2020, at 12:47 PM, James Bodkin > wrote: > > We've noticed that the filterCache uses a sig

Re: Facet Performance

2020-06-17 Thread James Bodkin
We've noticed that the filterCache uses a significant amount of memory, as we've assigned 8GB Heap per instance. In total, we have 32 shards with 2 replicas, hence (8*32*2) 512G Heap space alone, further memory is required to ensure the index is always memory mapped for performance reasons. Ide
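James's heap total is 8 GB × 32 shards × 2 replicas = 512 GB. For the filterCache share of that, here is a rough worst-case bound. It assumes every entry is stored as a bitset of `maxDoc` bits, or about `maxDoc / 8` bytes, and ignores object overhead; small result sets are stored more compactly. The bound is entries × maxDoc / 8. The `maxDoc` of 10,000,000 per core used below is hypothetical. The 8192 entries match the filterCache maximum size James reports elsewhere in this thread.

```java
// Back-of-the-envelope estimates; inputs are assumptions, not measurements.
final class FacetMemoryEstimate {
    /** Total heap across the cluster, assuming one JVM per replica (8 GB x 32 x 2). */
    static long totalHeapGb(long heapPerInstanceGb, int shards, int replicasPerShard) {
        return heapPerInstanceGb * shards * replicasPerShard;
    }

    /** Bytes for one bitset-backed filterCache entry: maxDoc bits rounded up to whole 64-bit words. */
    static long bitsetEntryBytes(long maxDoc) {
        return ((maxDoc + 63) / 64) * 8;
    }

    /** Worst case if every cached filter is a full bitset; real usage is usually lower. */
    static long worstCaseFilterCacheBytes(long maxDoc, int cacheEntries) {
        return bitsetEntryBytes(maxDoc) * cacheEntries;
    }
}
```

With those hypothetical numbers, a full cache comes to 10,240,000,000 bytes, about 9.5 GiB. That exceeds an 8 GB heap, so `maxSize` and per-core `maxDoc` are worth checking together.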

RankLib model output format to Solr LTR model format

2020-06-17 Thread gnandre
Hi, Before I start writing my own implementation for converting RankLib's model output format to Solr LTR model format for my own use cases, I just wanted to check if there is any work done on this front already. Any references are welcome.

Re: Solr 7.6 optimize index size increase

2020-06-17 Thread Walter Underwood
From that short description, you should not be running optimize at all. Just stop doing it. It doesn’t make that big a difference. It may take your indexes a few weeks to get back to a normal state after the forced merges. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood

Re: Master Slave Terminology

2020-06-17 Thread Walter Underwood
I’ve long thought that master/slave was not the right metaphor for a pull model anyway. We probably should not use “replica” since that already has a use in Solr Cloud. Where is the discussion? wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jun 16

Re: ChildDocTransformer and export handler

2020-06-17 Thread Munendra S N
Currently, Doc transformers are not supported while exporting the results. The document covers the field requirements for the export handler. I hope this helps. https://lucene.apache.org/solr/guide/8_5/exporting-result-sets.html#field-requirements Regards, Munendra S N On Wed, Jun 17, 2020 at

Re: Facet Performance

2020-06-17 Thread Michael Gibney
To expand a bit on what Erick said regarding performance: my sense is that the RefGuide assertion that "docValues=true" makes faceting "faster" could use some qualification/clarification. My take, fwiw: First, to reiterate/paraphrase what Erick said: the "faster" assertion is not comparing to "fac

Re: Master Slave Terminology

2020-06-17 Thread Doug Turnbull
+1 to name change. Also 'overseer' which doesn't go well with Master/Slave! On Wed, Jun 17, 2020 at 11:16 AM David Smiley wrote: > priv...@lucene.apache.org but it should have been public and expect it to > spill out to the dev list today. > > ~ David > > > On Wed, Jun 17, 2020 at 11:14 AM Mike

ChildDocTransformer and export handler

2020-06-17 Thread Ludger Steens
Dear Community, we are using the /export handler with Solr 7.7 to fetch a large number of documents from Solr. Recently we have extended our schema with Child Documents and now we are wondering if/how it is possible to export parent documents together with their corresponding Child Documents.

Re: Master Slave Terminology

2020-06-17 Thread David Smiley
priv...@lucene.apache.org but it should have been public and expect it to spill out to the dev list today. ~ David On Wed, Jun 17, 2020 at 11:14 AM Mike Drob wrote: > Hi Jan, > > Can you link to the discussion? I searched the dev list and didn’t see > anything, is it on slack or a jira or some

Re: Master Slave Terminology

2020-06-17 Thread Mike Drob
Hi Jan, Can you link to the discussion? I searched the dev list and didn’t see anything, is it on slack or a jira or somewhere else? Mike On Wed, Jun 17, 2020 at 1:51 AM Jan Høydahl wrote: > Hi Kaya, > > Thanks for bringing it up. The topic is already being discussed by > developers, so expect

Java - setting multi-valued fields

2020-06-17 Thread Eivind Hodneland
Hi, My customer has a Solr index with a large number of fields, many of which are multivalued (type="string", multiValued="true"). I am having problems with setting the values for these fields in my Java update processors. Example: String[] values = new String[] {"value 1", "value 2" }; inputDo

Re: Facet Performance

2020-06-17 Thread Erick Erickson
Uninvertible is a safety mechanism to make sure that you don’t _unknowingly_ use a docValues=false field for faceting/grouping/sorting/function queries. The primary point of docValues=true is twofold: 1> reduce Java heap requirements by using the OS memory to hold it 2> uninverting can be expen

Re: Facet Performance

2020-06-17 Thread James Bodkin
The large majority of the relevant fields have fewer than 20 unique values. We have two fields over that with 150 unique values and 5300 unique values respectively. At the moment, our filterCache is configured with a maximum size of 8192. From the DocValues documentation (https://lucene.apac

Re: Facet Performance

2020-06-17 Thread Anthony Groves
Ah, interesting! So if the number of possible values is low (like <= 10), it is faster to *not* use docvalues on that (indexed) faceted field? Does this hold true even when using faceting techniques like tag and exclusion? Thanks, Anthony On Wed, Jun 17, 2020 at 9:37 AM David Smiley wrote: > I

Re: Facet Performance

2020-06-17 Thread David Smiley
I strongly recommend setting indexed=true on a field you facet on for the purposes of efficient refinement (fq=field:value). But it strictly isn't required, as you have discovered. ~ David On Wed, Jun 17, 2020 at 9:02 AM Michael Gibney wrote: > facet.method=enum works by executing a query (ag

Re: Facet Performance

2020-06-17 Thread Michael Gibney
facet.method=enum works by executing a query (against indexed values) for each indexed value in a given field (which, for indexed=false, is "no values"). So that explains why facet.method=enum no longer works. I was going to suggest that you might not want to set indexed=false on the docValues face
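Michael's description of facet.method=enum can be sketched as a toy model. For each indexed term, intersect that term's document set with the main result set and count the overlap. With indexed=false there are no terms to enumerate, so the result is empty, which matches the missing facets James reports. This is an illustration only. Solr's implementation differs; for example, it can cache each per-term document set in the filterCache. The term and document data are hypothetical.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of enum-style faceting; not Solr's implementation.
final class EnumFacetSketch {
    /**
     * postings: indexed term -> documents containing it (empty if the field is not indexed).
     * baseResults: documents matching the main query and filters.
     */
    static Map<String, Integer> facetCounts(Map<String, BitSet> postings, BitSet baseResults) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> e : postings.entrySet()) {
            BitSet hits = (BitSet) e.getValue().clone(); // leave the postings untouched
            hits.and(baseResults);                        // one "query" per term, restricted to the results
            counts.put(e.getKey(), hits.cardinality());
        }
        return counts;
    }
}
```

The cost grows with the number of distinct terms, one intersection each. That is why enum is usually discussed for low-cardinality fields like the ones in this thread.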

RE: Solr cloud backup/restore not working

2020-06-17 Thread Kommu, Vinodh K.
Hi, What is the log level defined for solr nodes? Did you use requestid in the restore command? If so, check the status of the requestid to see if it points to any errors. Thanks & Regards, Vinodh -Original Message- From: yaswanth kumar Sent: Wednesday, June 17, 2020 4:33 PM To: solr-user@luc

Re: Migration for total noob?

2020-06-17 Thread Erick Erickson
Yeah, there’s a lot to get your head around with Solr, I wish it could be simpler… If at all possible, I recommend you just re-index the data from the system of record. That aside, you say you “copied the cores”. Is this still stand-alone and did that include the conf directory? What this seem

Re: Autocommit in SolrCloud with many shards

2020-06-17 Thread Erick Erickson
Each node has its own timer that starts when it receives an update. So in your situation, 60 seconds after any given replica gets its first update, all documents that have been received in the interval will be committed. But note several things: 1> commits will tend to cluster for a given shard.
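Erick's description of per-node timers can be simulated for a single replica. This is a minimal sketch, assuming sorted update arrival times in milliseconds and a hard autoCommit `maxTime`. The first update after a commit starts the timer, and everything received before it fires goes into that commit. Replicas fed the same stream at nearly the same moments compute nearly the same commit times, consistent with commits clustering per shard. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-replica simulation of a hard autoCommit maxTime timer.
final class AutoCommitSketch {
    /** Commit times for one replica, given sorted update arrival times and maxTime (both in ms). */
    static List<Long> commitTimes(long[] arrivalsMs, long maxTimeMs) {
        List<Long> commits = new ArrayList<>();
        long pendingCommitAt = -1; // -1 means no timer is running (nothing uncommitted)
        for (long t : arrivalsMs) {
            if (pendingCommitAt >= 0 && t >= pendingCommitAt) {
                commits.add(pendingCommitAt); // timer fired before this update arrived
                pendingCommitAt = -1;
            }
            if (pendingCommitAt < 0) {
                pendingCommitAt = t + maxTimeMs; // first uncommitted update starts the timer
            }
        }
        if (pendingCommitAt >= 0) {
            commits.add(pendingCommitAt); // flush the last open interval
        }
        return commits;
    }
}
```

For example, updates at 0 s, 10 s, 59 s, 61 s and 200 s with a 60 s `maxTime` yield commits at 60 s, 121 s and 260 s. The second timer starts at 61 s, not at 60 s.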

Re: Solr 7.6 optimize index size increase

2020-06-17 Thread Raveendra Yerraguntla
Thank you David, Walt, Eric. 1. The first time the bloated index was generated, there was no disk space issue; one copy of the index is 1/6 of disk capacity. We ran out of disk capacity after more than 2 bloated copies. 2. Solr was upgraded from 5.*. In 5.*, more than 5 segments is causing performance is

Re: Solr cloud backup/restore not working

2020-06-17 Thread yaswanth kumar
Can someone please guide me on where I can get a more detailed error for the above exception while doing restore?? All that I see in solr.log was pasted above Thanks, On Tue, Jun 16, 2020 at 10:44 AM yaswanth kumar wrote: > I don't see anything related in the solr.log file for the same error. Not

Autocommit in SolrCloud with many shards

2020-06-17 Thread Bram Van Dam
'morning :-) I'm wondering how autocommits work in Solr. Say I have a cluster with many nodes and many collections with many shards. If each collection's config has a hard autocommit configured every minute, does that mean that SolrCloud (presumably the leader?) will dish out commit requests to ea

Re: Facet Performance

2020-06-17 Thread James Bodkin
Thanks, I've implemented some queries that improve the first-hit execution for faceting. Since turning off indexed on those fields, we've noticed that facet.method=enum no longer returns the facets when used. Using facet.method=fc/fcs is significantly slower compared to facet.method=enum for us

Log4J Logging to Http

2020-06-17 Thread Krönert Florian
Hello everyone, We want to log our queries to an HTTP endpoint and tried configuring our log4j settings accordingly. We are using Solr inside Docker with the official Solr image (version solr:8.3.1). As soon as we add an HTTP appender, we receive errors on startup and Solr fails to start complet