Always-on trace id generation merged to Solr main

2023-08-24 Thread Alex Deparvu
Dropping a note to let everyone know that SOLR-15367 [0] was just merged to
Solr main (no 9.x backport).

This change brings always-on trace id generation based on OTEL libraries,
but without having the entire tracing mechanism enabled.
This is a much improved version of the existing rid mechanism so I am
interested in any type of feedback relating to what is missing or what
could be added to make this more useful.
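
For readers unfamiliar with the id format: OpenTelemetry trace ids follow the W3C Trace Context convention of 16 random bytes rendered as 32 lowercase hex characters, with the all-zero value treated as invalid. The sketch below only illustrates that format in Python. It is not the code merged in SOLR-15367, and the names `new_trace_id` and `is_valid_trace_id` are made up for illustration.

```python
import re
import secrets

# 32 lowercase hex characters = 128 bits, the OTEL / W3C trace id width.
TRACE_ID_PATTERN = re.compile(r"[0-9a-f]{32}")
INVALID_TRACE_ID = "0" * 32  # the all-zero id is reserved as "invalid"


def new_trace_id():
    """Return a random 128-bit id as 32 lowercase hex characters."""
    while True:
        candidate = secrets.token_hex(16)
        if candidate != INVALID_TRACE_ID:
            return candidate


def is_valid_trace_id(value):
    """Check the shape of an id; says nothing about where it came from."""
    return bool(TRACE_ID_PATTERN.fullmatch(value)) and value != INVALID_TRACE_ID
```

Generating an id this way needs no exporter or collector, which is roughly the appeal of having ids always on without the full tracing machinery.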

I have enabled this by default to get more test coverage and will keep an
eye on the build status, but if you see anything unexpected please send a
message to the list.

best,
alex

[0] https://issues.apache.org/jira/browse/SOLR-15367


Re: SolrStream error response format - backwards compatibility

2023-08-24 Thread Jason Gerlowski
Hey Alex,

I'm sure we're not totally consistent on this, but in most cases I
think it's fine to change error messages or format etc. without
worrying unduly about backcompat.  It's definitely something we've
done in the past.

Best,

Jason

On Fri, Aug 18, 2023 at 12:03 PM Alex Deparvu wrote:
>
> Hi,
>
> Trying to collect some community feedback on SOLR-16929 [0], which is
> trying to undo a change made in SOLR-15451.
> From what I gather, this change was made to allow auth-related messages
> (which come as HTML) to be presented 'as a string' (basically skipping
> decoding) to the user, but it accidentally covered the 400 errors (as shown
> below), introducing some bugs when the response is in javabin format.
> I am proposing to revert this 'present as a string' change for the 400s.
>
> The part that I'm concerned about is the response format:
>
> Prior to SOLR-15451 it was a simple "one liner":
> java.util.concurrent.ExecutionException: java.io.IOException: -->
> http://127.0.0.1:65079/solr:sort param field can't be found: blah
>
> Post change it turned into a bigger chunk:
> java.util.concurrent.ExecutionException: java.io.IOException: Query to
> '/streams_shard2_replica_n2/select?q=*:*&fl=a_s,a_i,a_f,blah&sort=blah+asc&distrib=false'
> failed due to: (400) {
> "responseHeader":{
> "zkConnected":true,
> "status":400,
> "QTime":1
>   },
>   "error":{
>
> "metadata":["error-class","org.apache.solr.common.SolrException","root-error-class","org.apache.solr.common.SolrException"],
> "msg":"sort param field can't be found: blah",
> "code":400
>   }
>   }
>
> Now the actual question is: does this present any backwards compatibility
> issues? If anyone had upgraded and changed their code to parse the error
> and present something "nice" to the user, they would now have to change
> back to the old-style code.
> I would say there are no issues, but it doesn't hurt to ask before messing
> with this code (again).
>
> best,
> alex
>
> [0] https://issues.apache.org/jira/browse/SOLR-16929
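
To make the compatibility concern concrete, here is a minimal Python sketch of client-side code that tolerates both message shapes quoted above. It assumes the new style always contains the literal text "failed due to: (NNN)" followed by a JSON object, and that the old style is "--> URL:message". Both assumptions are drawn only from the two samples in this thread, and `extract_error` is a hypothetical helper, not part of SolrJ.

```python
import json
import re

# New style: "... failed due to: (400) { ...json... }"
NEW_STYLE = re.compile(r"failed due to: \((\d+)\)\s*(\{.*\})", re.DOTALL)
# Old style: "--> http://host:port/path:message"
OLD_STYLE = re.compile(r"-->\s*https?://[^\s:]+(?::\d+)?[^\s:]*:(.*)", re.DOTALL)


def extract_error(message):
    """Return (code, msg); either element is None if it cannot be recovered."""
    new = NEW_STYLE.search(message)
    if new:
        status = int(new.group(1))
        try:
            error = json.loads(new.group(2)).get("error", {})
        except json.JSONDecodeError:
            # Body was not valid JSON (e.g. truncated); keep the status only.
            return status, None
        return error.get("code", status), error.get("msg")
    old = OLD_STYLE.search(message)
    if old:
        # The one-liner carries no status code, only the message text.
        return None, old.group(1).strip()
    return None, None
```

Code written this way would keep working across a revert, whereas code that parses only the JSON form would silently lose the message.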

-
To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
For additional commands, e-mail: dev-h...@solr.apache.org



Re: ZkCLI.java (and bin/zkcli.sh) commands

2023-08-24 Thread Jason Gerlowski
> what commands that exist in zkcli are already covered with existing bin/solr 
> zk subcommands, which maybe are no longer needed

To clarify - you're thinking of removing functionality from zkcli if
there's already an equivalent in "bin/solr zk"?  Or the opposite?

I'm definitely rusty on this bit of the code, but my understanding was
that, in broad strokes:
  - "bin/solr zk upconfig" == "zkcli.sh upconfig"
  - "bin/solr zk downconfig" == "zkcli.sh downconfig"
  - "bin/solr zk cp" == "zkcli.sh putfile"
  - "bin/solr zk rm" == "zkcli.sh clear"
  - "bin/solr zk mkroot" == "zkcli.sh bootstrap"

I think you'd need to look at each pair in detail though to see if
there aren't gaps.
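
As a rough aid for that gap check, the pairs above can be written down as a lookup table. This Python sketch encodes only the five pairings listed in this message, which are themselves described as a broad-strokes recollection, so treat the table as unverified. The helper names are hypothetical.

```python
# zkcli.sh command -> presumed "bin/solr zk" subcommand, per the list above.
# These pairings are unverified recollections, not a checked equivalence.
ZKCLI_TO_BIN_SOLR_ZK = {
    "upconfig": "upconfig",
    "downconfig": "downconfig",
    "putfile": "cp",
    "clear": "rm",
    "bootstrap": "mkroot",
}


def bin_solr_zk_equivalent(zkcli_command):
    """Return the presumed subcommand, or None if no pairing is listed."""
    return ZKCLI_TO_BIN_SOLR_ZK.get(zkcli_command)


def unmapped(commands):
    """Return the commands that have no listed pairing, sorted by name."""
    return sorted(c for c in commands if c not in ZKCLI_TO_BIN_SOLR_ZK)
```

Feeding `unmapped` the list of zkcli.sh commands that people report using gives a starting list of candidates to examine for porting.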

> which aspects of zkcli.sh people ACTUALLY use

I use zkcli.sh's file manipulation commands ("put", "putfile", "get",
"getfile", "clear", and "ls") fairly often.  I'm not sure if there's a
reason I've fallen into that pattern instead of using "bin/solr", or
whether there was a bug or feature gap that forced me into using
zkcli.sh.

One last note: don't forget that the solr operator uses zkcli.sh a
good bit on users' behalf.  Particularly the "get", "clusterprop", and
"putfile" commands.

Best,

Jason

On Mon, Aug 21, 2023 at 10:51 AM Eric Pugh wrote:
>
> Hi all, I wanted to tap the braintrust to see what commands that exist in 
> zkcli are already covered with existing bin/solr zk subcommands, which maybe 
> are no longer needed, and which need porting over in some fashion.
>
> Looking at ZkCLITest.java, I see tests for:
>
> testBootstrapWithChroot - This appears to bootstrap a solr home and make the
> chroot in Zk. We have bin/solr zk upconfig, however it doesn’t do the
> chroot step; you have to run bin/solr zk mkroot.   We could change upconfig to
> have a chroot property, or just make it implicit that if you create a path
> that needs a chroot, then the chroot is created too.
>
> testMakePath - Makes a path, and that appears to just duplicate bin/solr zk
> mkroot?  Unless there is something special about roots?   I wonder if
> “bin/solr zk mk” would be a better name?
>
> testPut - Just puts some random bytes.  bin/solr zk cp lets you copy files 
> around, and appears to be similar.   Do we need to keep this?
>
> testPutCompressed - Similar to the put, but compresses the file.  Do we need 
> this functionality?  (I thought somewhere we do it on the fly or 
> something??).   It could be added I guess to bin/solr zk
>
> testPutFile - Similar to bin/solr zk cp with files.
>
> testPutFileCompressed - A compressed version of a file <— Do we need this?
>
> testList/ testLs - Similar to bin/solr zk ls
>
> testUpdateAcls - appears to have been needed at a point in time.  Calls 
> SolrZKClient.updateACLs(), which is also maybe called by ZkController, but 
> maybe no longer needed?
>
>
> Interestingly, there appears to be some code to attempt to bootstrap a Zookeeper
> or maybe aspects of Solr if Solr isn’t actually running?
>
> Would love thoughts/feedback, any suggestions or ideas.   Especially, which 
> aspects of zkcli.sh people ACTUALLY use, because I’d love to migrate less 
> rather than more ;-).
>
> Eric




Re: Always-on trace id generation merged to Solr main

2023-08-24 Thread Ishan Chattopadhyaya
+1, great improvement. Thanks!

On Thu, 24 Aug 2023, 6:53 pm Alex Deparvu wrote:

> Dropping a note to let everyone know that SOLR-15367 [0] was just merged to
> Solr main (no 9.x backport).
>
> This change brings always-on trace id generation based on OTEL libraries,
> but without having the entire tracing mechanism enabled.
> This is a much improved version of the existing rid mechanism so I am
> interested in any type of feedback relating to what is missing or what
> could be added to make this more useful.
>
> I have enabled this by default to get more test coverage and will keep an
> eye on the build status, but if you see anything unexpected please send a
> message to the list.
>
> best,
> alex
>
> [0] https://issues.apache.org/jira/browse/SOLR-15367
>


Re: Throttling "expensive" admin operations

2023-08-24 Thread Pierre Salagnac
It seems there was some push back due to the complexity and the impact on
the overseer code. So this PR is now probably stale.

I opened a simpler version [1] that introduces a dedicated thread pool for
"expensive" operations. The end behavior is the same: we don't execute more
than 5 concurrent expensive operations per Solr node. But the change to the
code base is more narrowly scoped; mostly, there is no change in task tracking.
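
The idea of bounding concurrency with a dedicated pool can be sketched in a few lines. The Python below is an illustration of the technique only, not the code in the PR: the pool size of 5 comes from this thread, while `expensive_operation`, `submit_expensive`, and the peak counter are hypothetical names added to make the limit observable.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_EXPENSIVE = 5  # concurrency limit mentioned in this thread

# A pool reserved for "expensive" tasks; work beyond the limit waits in its queue.
expensive_pool = ThreadPoolExecutor(
    max_workers=MAX_EXPENSIVE, thread_name_prefix="expensive-op"
)

_lock = threading.Lock()
_running = 0
peak_running = 0  # highest number of tasks observed running at once


def expensive_operation(name):
    """Stand-in for a backup or restore; records how many run concurrently."""
    global _running, peak_running
    with _lock:
        _running += 1
        peak_running = max(peak_running, _running)
    try:
        time.sleep(0.01)  # simulate slow work
        return name
    finally:
        with _lock:
            _running -= 1


def submit_expensive(name):
    """Queue an expensive task on the dedicated pool and return its future."""
    return expensive_pool.submit(expensive_operation, name)


futures = [submit_expensive(f"backup-{i}") for i in range(20)]
results = [f.result() for f in futures]
expensive_pool.shutdown()
```

Because the limit lives in the pool itself, callers need no extra bookkeeping to respect it, which matches the point about leaving task tracking alone.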

Any feedback is appreciated.
Thanks

[1] https://github.com/apache/solr/pull/1864

On Mon, Aug 7, 2023 at 7:28 PM David Smiley wrote:

> In https://issues.apache.org/jira/browse/SOLR-16879, we have a proposal for
> certain core level admin operations to be deemed expensive and throttled
> (e.g. only 5 at a time).  It only works when invoked with the async style
> (with the "async" param).
>
> At present, only BACKUPCORE & RESTORECORE are flagged as expensive in the
> PR.  I'm thinking other operations should consider such designation too:
>
> MERGEINDEXES, SPLIT.
>
> That's probably it.  Opinions & reviews are welcome as usual!
>
> ~ David
>


Fwd: [JENKINS] Solr » Solr-Check-main - Build # 7785 - Still Failing!

2023-08-24 Thread David Smiley
Many Bats tests seem to be failing on the main branch; Solr can't start:
> Port 8983 is already being used by another process (pid: 31806)

https://ci-builds.apache.org/job/Solr/job/Solr-Check-main/

Eric, you've been looking at Bats lately; do you know about this?

~ David


---------- Forwarded message ---------
From: Apache Jenkins Server 
Date: Thu, Aug 24, 2023 at 7:35 PM
Subject: [JENKINS] Solr » Solr-Check-main - Build # 7785 - Still Failing!
To: 


Build: https://ci-builds.apache.org/job/Solr/job/Solr-Check-main/7785/

All tests passed

Build Log:
[...truncated 1105 lines...]
#   ERROR: Solr is not running on url http://localhost:8983 after 1 seconds
# --
#
# Please find the SOLR_HOME snapshot for failed test #1 at:
/home/jenkins/jenkins-slave/workspace/Solr/Solr-Check-main/solr/packaging/build/test-output/failure-snapshots/1-1
# Last output:
# WARNING: URLs provided to this tool needn't include Solr's context-root
(e.g. "/solr"). Such URLs are deprecated and support for them will be
removed in a future release. Correcting from [http://localhost:8983/solr]
to [http://localhost:8983].
#
# ERROR: Solr is not running on url http://localhost:8983 after 1 seconds
not ok 2 assert for cloud mode in 3487ms
# (from function `refute_output' in file
/home/jenkins/jenkins-slave/workspace/Solr/Solr-Check-main/.gradle/node/packaging/node_modules/bats-assert/src/assert.bash,
line 360,
#  in test file test/test_assert.bats, line 48)
#   `refute_output --partial "ERROR"' failed
#
# -- output should not contain substring --
# substring (1 lines):
#   ERROR
# output (2 lines):
#
#   ERROR: Solr is not running on url http://localhost:8983 after 1 seconds
# --
#
# Please find the SOLR_HOME snapshot for failed test #2 at:
/home/jenkins/jenkins-slave/workspace/Solr/Solr-Check-main/solr/packaging/build/test-output/failure-snapshots/2-2
# Last output:
#
# ERROR: Solr is not running on url http://localhost:8983 after 1 seconds
ok 3 auth rejects blockUnknown option with invalid boolean in 2151ms
ok 4 auth rejects updateIncludeFileOnly option with invalid boolean in
2120ms
not ok 5 setup_file failed
# (from function `setup_file' in test file test/test_bats.bats, line 26)
#   `solr start -c -V' failed
# Using Solr root directory:
/home/jenkins/jenkins-slave/workspace/Solr/Solr-Check-main/solr/packaging/build/solr-10.0.0-SNAPSHOT
# Using Java: /home/jenkins/tools/java/latest11/bin/java
# openjdk version "11.0.16.1" 2022-08-12
# OpenJDK Runtime Environment Temurin-11.0.16.1+1 (build 11.0.16.1+1)
# OpenJDK 64-Bit Server VM Temurin-11.0.16.1+1 (build 11.0.16.1+1, mixed
mode)
#
# Port 8983 is already being used by another process (pid: 31806)
# Please choose a different port using the -p option.
#
not ok 6 setup_file failed
# (from function `setup_file' in test file test/test_config.bats, line 22)
#   `solr start -c' failed
#
# Port 8983 is already being used by another process (pid: 31806)
# Please choose a different port using the -p option.
#
not ok 8 create for non cloud mode in 1516ms
# (from function `assert_output' in file
/home/jenkins/jenkins-slave/workspace/Solr/Solr-Check-main/.gradle/node/packaging/node_modules/bats-assert/src/assert.bash,
line 247,
#  in test file test/test_create.bats, line 34)
#   `assert_output --partial "Created new core 'COLL_NAME'"' failed
#
# -- output does not contain substring --
# substring (1 lines):
#   Created new core 'COLL_NAME'
# output (3 lines):
#   Neither -zkHost or -solrUrl parameters provided so assuming solrUrl is
http://localhost:8983.
#
#   ERROR: Server refused connection at:
http://localhost:8983/solr/admin/info/system
# --
#
# Please find the SOLR_HOME snapshot for failed test #1 at:
/home/jenkins/jenkins-slave/workspace/Solr/Solr-Check-main/solr/packaging/build/test-output/failure-snapshots/8-1
# Last output:
# Neither -zkHost or -solrUrl parameters provided so assuming solrUrl is
http://localhost:8983.
#
# ERROR: Server refused connection at:
http://localhost:8983/solr/admin/info/system
not ok 9 create for cloud mode in 1521ms
# (from function `assert_output' in file
/home/jenkins/jenkins-slave/workspace/Solr/Solr-Check-main/.gradle/node/packaging/node_modules/bats-assert/src/assert.bash,
line 247,
#  in test file test/test_create.bats, line 40)
#   `assert_output --partial "Created collection 'COLL_NAME'"' failed
#
# -- output does not contain substring --
# substring (1 lines):
#   Created collection 'COLL_NAME'
# output (3 lines):
#   Neither -zkHost or -solrUrl parameters provided so assuming solrUrl is
http://localhost:8983.
#
#   ERROR: Server refused connection at:
http://localhost:8983/solr/admin/info/system
# --
#
# Please find the SOLR_HOME snapshot for failed test #2 at:
/home/jenkins/jenkins-slave/workspace/Solr/Solr-Check-main/solr/packaging/build/test-output/failure-snapshots/9-2
# Last output:
# Neither -zkHost or -solrUrl parameters provided so assuming solrUrl is
http://localhost:8983.
#
# ERROR: Server refused connection at:
http://localhost:8983/solr/admin/i

Re: [JENKINS] Solr » Solr-Check-main - Build # 7785 - Still Failing!

2023-08-24 Thread Houston Putman
It happens on the first test sometimes, so I think it may be a rogue
process on the machine.
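
One way to test the rogue-process theory before a test run starts is to probe the port. The Python sketch below is a generic TCP check, not something taken from the Bats suite; `port_in_use` is a hypothetical helper, and it only reports whether something is listening, not which process owns the port.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return probe.connect_ex((host, port)) == 0
```

A CI wrapper could call `port_in_use(8983)` before the first test and fail fast with a clear message, rather than letting every `solr start` fail in turn.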

On Thu, Aug 24, 2023 at 11:27 PM David Smiley wrote:

> Many Bats tests seem to be failing on main branch; can't start Solr:
> > Port 8983 is already being used by another process (pid: 31806)
>
> https://ci-builds.apache.org/job/Solr/job/Solr-Check-main/
>
> Eric, you've been looking at Bats lately; do you know about this?
>
> ~ David