Re: Crave is doing well

2023-10-06 Thread Eric Pugh
Agreed on the branch merging.  It’s been great to have it running the full set 
of tests!


> On Oct 5, 2023, at 10:58 PM, David Smiley  wrote:
> 
> I believe the Crave issues with branch merging have been fixed.  If someone 
> sees otherwise, please let me know.
> 
> And boy, Crave is fast!  The whole GHA action takes 8m, but the Crave side is 6m, 
> of which 4m is tests running.  It's faster than "precommit", which is still 
> running in a standard GHA.  Isn't that crazy!  Yes, there's room for 
> improvement.
> 
> There are opportunities for Crave to come up with a GHA self-hosted runner to 
> substantially eat away at that 2m; for example, the GHA side does a needless 
> checkout of all the code that basically isn't used.
> 
> There are opportunities for our project to optimize the Gradle build 
> so that it can start running tests (or whatever task) as soon as possible, no 
> matter where it runs.  There's a whole section of the Gradle docs on build 
> optimization.  Maybe someone would like to explore that, like trying the 
> "configuration cache": 
> https://docs.gradle.org/current/userguide/configuration_cache.html
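
A minimal sketch of what trying that could look like (assuming a recent Gradle
version; the stable property name requires Gradle 8.1+, and not every plugin is
configuration-cache compatible, so treat this as untested against our build):

```
# Opt in for a single invocation
./gradlew check --configuration-cache

# Or opt in persistently via gradle.properties
echo "org.gradle.configuration-cache=true" >> gradle.properties
```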
> 
> I have access to build analytics in Crave that give some insights:  The first 
> 48 seconds are not very concurrent and don't download anything.  The next 36 
> seconds, it downloads 100MB of something (I don't know what).  Then CPUs go full 
> tilt with tests.  It's very apparent that Gradle testing has no "work 
> stealing" algorithm amongst the runners.
> 
> 
> 
> I'm a bit perplexed by the 100MB download, because the image for the 
> build machine has commands I added to pre-download stuff.  That looks like 
> the following:
> 
> # Pre-download what we can through Gradle
> ./gradlew --write-verification-metadata sha256 --dry-run
> rm gradle/verification-metadata.dryrun.xml
> ./gradlew -p solr/solr-ref-guide downloadAntora
> ./gradlew -p solr/packaging downloadBats
> # May need more memory
> sed -i 's/-Xmx1g/-Xmx2g/g' gradle.properties
> # Use lots of CPUs
> sed -i 's/org.gradle.workers.max=.*/org.gradle.workers.max=96/' 
> gradle.properties
> sed -i 's/tests.jvms=.*/tests.jvms=96/' gradle.properties
> 
> ./gradlew assemble || true
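
One way to pin down what that 100MB actually is (a guess at useful commands,
not verified on this build) would be to watch Gradle's dependency download
logging, or to generate a build scan:

```
# Plain console output logs "Download <url>" lines for anything fetched from a repository
./gradlew test --console=plain | grep -i "^download"

# A build scan also breaks down downloaded dependencies and network activity
./gradlew test --scan
```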
> 
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley

___
Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | 
http://www.opensourceconnections.com  | 
My Free/Busy   
Co-Author: Apache Solr Enterprise Search Server, 3rd Ed 





Re: Crave is doing well

2023-10-06 Thread Kevin Risden
For PRs Crave might be doing OK, but the branch_9x check builds have all been
failing for a while now?

https://lists.apache.org/list?bui...@solr.apache.org:lte=1y:%22rsync%20error:%20some%20files/attrs%20were%20not%20transferred%22

Kevin Risden


On Fri, Oct 6, 2023 at 3:38 PM Eric Pugh wrote:


Re: Crave is doing well

2023-10-06 Thread Kevin Risden
At least the last few are failing with:

https://ci-builds.apache.org/job/Solr/job/Solr-Check-9.x/5616/console

```

> Task :solr:solrj:compileJava
/tmp/src/solr/solr/solrj/build/generated/src/main/java/org/apache/solr/client/solrj/request/CoresApi.java:20:
error: cannot find symbol
import org.apache.solr.client.api.model.InstallCoreDataRequestBody;
   ^
  symbol:   class InstallCoreDataRequestBody
  location: package org.apache.solr.client.api.model
/tmp/src/solr/solr/solrj/build/generated/src/main/java/org/apache/solr/client/solrj/request/CoresApi.java:43:
error: cannot find symbol
private final InstallCoreDataRequestBody requestBody;
  ^
  symbol:   class InstallCoreDataRequestBody
  location: class InstallCoreData
/tmp/src/solr/solr/solrj/build/generated/src/main/java/org/apache/solr/client/solrj/request/CoresApi.java:57:
error: cannot find symbol
  this.requestBody = new InstallCoreDataRequestBody();
 ^
  symbol:   class InstallCoreDataRequestBody
  location: class InstallCoreData
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
3 errors

```

which I think is due to branch_9x not being cleaned before the tests are
run?  Not 100% sure.
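
If it helps, a quick way to confirm the missing class really isn't on branch_9x
yet (the path is my guess at where the api model classes live):

```
# Search branch_9x for the model class that the generated CoresApi.java imports
git fetch origin branch_9x
git grep -l InstallCoreDataRequestBody origin/branch_9x -- solr/api
```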

Kevin Risden


On Fri, Oct 6, 2023 at 3:43 PM Kevin Risden  wrote:



Community over Code Apache Solr Hackathon

2023-10-06 Thread Eric Pugh
Folks headed to Halifax….   Jason and I have talked about hacking together on 
Solr during the conference.   Well, good news!  I’ve got us a room, thanks to 
Brian Proffitt's help, to use on Sunday (the day after the Search Track).  When 
you check the conference schedule, you should see it show up.

Room 107 on Sunday from 10:25-18:30.

Jason G and I are planning on spiking out what it would take to fire up a Solr 
cloud node with the role “zookeeper” and see if we can build a quorum ;-).   
Other things that would be interesting are showing folks who aren’t deep in Solr 
code how to build Solr and how to write tests.  If we are super energetic, maybe 
I can talk folks like Jeff Zemerick from the OpenNLP community into working with 
us on loading models into Solr via ONNX ;-).  So basically, anything anyone wants 
to work on ;-).

So please come join us in Room 107 on Sunday!

Eric
___
Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | 
http://www.opensourceconnections.com  | 
My Free/Busy   
Co-Author: Apache Solr Enterprise Search Server, 3rd Ed 





Re: Issue with marking replicas down at startup

2023-10-06 Thread rajani m
Hi Vincent,

I have seen that behavior: the node gets re-provisioned, the replica on that node
comes back up live, and ZK starts routing traffic to it, but the response time
from that replica is really high for a short period.

I worked around it by adding some hundreds of warming queries, which keeps the
replica in recovery until all the queries are replayed and hence delays the
live state. But yeah, it's not a good solution, as it always puts the replica
into recovery for minutes, which may not be required if the issue is, as you
said, the replica being live before the core is loaded.
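
For anyone curious, the workaround is roughly the stock firstSearcher warming
hook in solrconfig.xml (the queries below are illustrative placeholders, not
what we actually use):

```
<!-- solrconfig.xml: replay queries before the first searcher starts serving traffic -->
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">*:*</str><str name="rows">10</str></lst>
    <lst><str name="q">popular query</str><str name="fq">inStock:true</str></lst>
  </arr>
</listener>
```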

Thank you,
Rajani

On Thu, Oct 5, 2023, 3:26 AM Vincent Primault
 wrote:

> Hello,
>
> I have been looking at a previous investigation we had about an unexpected
> behaviour where a node was taking traffic for a replica that was not ready
> to take it. It seems to happen when the node is marked as live and the
> replica is marked as active, while the corresponding core was not loaded
> yet on the node.
>
> I looked at the code and in theory it should not happen, since the
> following happens in ZkController#init: mark node as down, wait for
> replicas to be marked as down, and then register the node as live. However,
> after looking at the code of publishAndWaitForDownStates, I observed that
> we wait for down states for replicas associated with cores as returned by
> CoreContainer#getCoreDescriptors... which is empty at this point since
> ZkController#init is called before cores are discovered (which happens
> later in CoreContainer#load).
>
> It hence seems to me that we basically never wait for any replicas to be
> marked as down; we continue the startup sequence by marking the node as
> live, and so *might* take traffic for a short period of time for a
> replica that is not ready (e.g., if the node previously crashed and the
> replica stayed active).
>
> As I am new to investigating this kind of stuff in Solr Cloud, I want to
> share my findings and get feedback on whether they are correct
> (in which case I'd be happy to contribute a bug fix), or whether I am
> missing something else.
>
> Thank you,
>
> Vincent Primault.
>


Re: [VOTE] Release Solr 9.4.0 RC1

2023-10-06 Thread Houston Putman
+1 (binding)

SUCCESS! [0:30:08.938639]

I also built the docker image and used it to run the Solr Operator
integration tests (using the unreleased main branch, which will soon be
v0.8.0).
These tests cover the Prometheus exporter, replica placement, TLS, mTLS, backups,
and more.


> $ make e2e-tests SOLR_IMAGE=solr-rc:9.4.0-1
> ...
> Ran 23 of 23 Specs in 492.840 seconds
> SUCCESS! -- 23 Passed | 0 Failed | 0 Pending | 0 Skipped


- Houston

On Thu, Oct 5, 2023 at 3:53 PM Alex Deparvu  wrote:

> Please vote for release candidate 1 for Solr 9.4.0
>
> The artifacts can be downloaded from:
>
> https://dist.apache.org/repos/dist/dev/solr/solr-9.4.0-RC1-rev-ee474b7db483c2242ce1d75074258236ca22103b
>
> You can run the smoke tester directly with this command:
>
> python3 -u dev-tools/scripts/smokeTestRelease.py \
>   https://dist.apache.org/repos/dist/dev/solr/solr-9.4.0-RC1-rev-ee474b7db483c2242ce1d75074258236ca22103b
>
> You can build a release-candidate of the official docker images (full &
> slim) using the following command:
>
> SOLR_DOWNLOAD_SERVER=https://dist.apache.org/repos/dist/dev/solr/solr-9.4.0-RC1-rev-ee474b7db483c2242ce1d75074258236ca22103b/solr && \
>   docker build $SOLR_DOWNLOAD_SERVER/9.4.0/docker/Dockerfile.official-full \
>     --build-arg SOLR_DOWNLOAD_SERVER=$SOLR_DOWNLOAD_SERVER \
>     -t solr-rc:9.4.0-1 && \
>   docker build $SOLR_DOWNLOAD_SERVER/9.4.0/docker/Dockerfile.official-slim \
>     --build-arg SOLR_DOWNLOAD_SERVER=$SOLR_DOWNLOAD_SERVER \
>     -t solr-rc:9.4.0-1-slim
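
If it's useful to anyone, a quick manual sanity check of the resulting image could
look like this (illustrative only, not part of the official smoke test):

```
# Start the RC image, pre-create a core, and check the admin UI at http://localhost:8983
docker run --rm -p 8983:8983 solr-rc:9.4.0-1 solr-precreate techproducts
```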
>
> The vote will be open for at least 72 hours i.e. until 2023-10-08 20:00
> UTC.
>
> [ ] +1  approve
> [ ] +0  no opinion
> [ ] -1  disapprove (and reason why)
>
> Here is my +1 (which I am not completely sure about, but I think it is
> non-binding)
>
> best,
> alex
>


Re: Crave is doing well

2023-10-06 Thread Jason Gerlowski
> branch_9x not being cleaned before the tests are run?

That'd be my guess as well.  'CoresApi.java' is a SolrJ class that's
being generated from our v2 "OpenAPI Spec" (OAS).  I merged a PR a few
days ago that adds the install-core-data API to our OAS (which causes
us to generate code for it in CoresApi.java), but only in 'main'.
The code hasn't made it to branch_9x yet.

I can see both a git-clean and gradle-clean happening at various
points in the Jenkins logs, which looks sufficient at a glance.  But
evidently it isn't, somehow - I'm a bit stumped.

I'm traveling at Community Over Code this week and won't have tons of
time to dig in.  If no one else has any theories, I can revert the
commit until I have time to look into it.  Alternatively, we can roll
forward and backport the 'main' commit to branch_9x and fix the
compilation issue that way (though that'll leave the underlying build
weirdness unsolved).

Jason

On Fri, Oct 6, 2023 at 4:45 PM Kevin Risden  wrote:

Re: Community over Code Apache Solr Hackathon

2023-10-06 Thread Ishan Chattopadhyaya
Sounds great, looking forward to the zookeeper quorum experiments! The sooner
we get that to work, the closer we get to axing the standalone mode. All
the best.

On Sat, 7 Oct, 2023, 1:19 am Eric Pugh wrote:


Re: Issue with marking replicas down at startup

2023-10-06 Thread Mark Miller
Yes, you are correct. It doesn’t really work. Depending on the distributed
mode you are running in, it may still publish the cores as down: in one of
the modes it sends a down-node command to the Overseer, which should do it based
on what cores are in the cluster state. In that case it should still
publish them as down, but in both cases it doesn’t wait for the down state
anyway, so you can see replicas marked active before they actually are. Before
the two modes existed, it always published, but it never waited.