Re: Bugfix release Lucene/Solr 8.11.3

2024-01-29 Thread Noble Paul
What I have noticed is that CloudAuthStreamTest fails in PRS mode; without
PRS, it passes every time.

It needs to be fixed anyway, but I'm disabling the PRS mode for that test.

On Sat, Jan 27, 2024 at 4:28 AM Christine Poerschke (BLOOMBERG/
LONDON)  wrote:
>
> Thanks! Done.
>
> From: dev@solr.apache.org At: 01/25/24 14:42:10 UTC To: dev@solr.apache.org
> Subject: Re: Bugfix release Lucene/Solr 8.11.3
>
> Not too late; please continue with the backport, Christine.
>
> - Houston
>
> On Thu, Jan 25, 2024 at 3:58 AM Christine Poerschke (BLOOMBERG/ LONDON) <
> cpoersc...@bloomberg.net> wrote:
>
> > If it's not too late I'd like to nominate
> > https://issues.apache.org/jira/browse/SOLR-17120 via
> > https://github.com/apache/lucene-solr/pull/2683 for being in the 8.11.3
> > release too.
> >
> > - Christine
> >
> > From: dev@solr.apache.org At: 01/25/24 00:20:16 UTC To: dev@solr.apache.org
> > Subject: Re: Bugfix release Lucene/Solr 8.11.3
> >
> > > But the "CloudAuthStreamTest" test fails 100% of the time with the
> >
> > Yes. Thanks for bringing it to my attention. I'm looking into it and
> > trying to fix that.
> >
> > On Thu, Jan 25, 2024 at 9:02 AM Houston Putman  wrote:
> > >
> > > I'm unsure about the HDFS issues for now.
> > >
> > > But the "CloudAuthStreamTest" test fails 100% of the time with the
> > > "F8952559841D5C83" seed for me. And it's not a test issue, it's an actual
> > > bug.
> > >
> > > I've been focusing my efforts here, but it fails both with PRS enabled
> > > and disabled. So the patch went beyond just adjusting how PRS is handled.
> > >
> > > Honestly, at this point I'm not sure what the cause is, and I can't really
> > > put more time into it. I'll leave the rest of my findings on the JIRA.
> > >
> > > - Houston
> > >
> > > On Tue, Jan 23, 2024 at 8:53 PM Ishan Chattopadhyaya <
> > > ichattopadhy...@gmail.com> wrote:
> > >
> > > > I ran the solr/core tests a few times, and here are the results:
> > > >
> > > > 1)
> > > > [junit4] Tests with failures [seed: 7A919DB0B1698A5]:
> > > > [junit4] - org.apache.solr.search.TestRecoveryHdfs (suite)
> > > >
> > > > 2)
> > > > BUILD SUCCESSFUL
> > > > Total time: 4 minutes 31 seconds
> > > >
> > > > 3)
> > > > [junit4] Tests with failures [seed: E174107AF681900D]:
> > > > [junit4]   - org.apache.solr.filestore.TestDistribPackageStore.testPackageStoreManagement
> > > > [junit4]   - org.apache.solr.cloud.hdfs.HdfsRecoverLeaseTest (suite)
> > > >
> > > > 4)
> > > > [junit4] Tests with failures [seed: 6135443D8851EC4F]:
> > > > [junit4]   - org.apache.solr.store.hdfs.HdfsDirectoryTest (suite)
> > > > [junit4]   - org.apache.solr.cloud.hdfs.HdfsRecoverLeaseTest (suite)
> > > >
> > > > 5)
> > > > [junit4] Tests with failures [seed: 967F9EA7B1CB4A4F]:
> > > > [junit4]   - org.apache.solr.index.hdfs.CheckHdfsIndexTest (suite)
> > > > [junit4]   - org.apache.solr.cloud.hdfs.HdfsNNFailoverTest (suite)
> > > >
> > > > 6)
> > > > [junit4] Tests with failures [seed: 22B4F78C758D19E4]:
> > > > [junit4]   - org.apache.solr.cloud.api.collections.TestHdfsCloudBackupRestore (suite)
> > > >
> > > >
> > > > I think the frequency of these HDFS failures has increased since the
> > > > Hadoop upgrade 3.2.2 -> 3.2.4
> > > > (https://github.com/apache/lucene-solr/commit/3cf0a5501084c9e3d0e53657a20477007f33755a).
> > > > Any ideas, please, on how to deal with them?
> > > >
> > > > On Wed, 24 Jan 2024 at 07:12, Ishan Chattopadhyaya <
> > > > ichattopadhy...@gmail.com> wrote:
> > > >
> > > > > Looking at it, ASAP.
> > > > >
> > > > > On Wed, 24 Jan, 2024, 2:07 am Houston Putman, 
> > > > wrote:
> > > > >
> > > > >> Right now we are blocked on
> > > > >> https://issues.apache.org/jira/browse/SOLR-16580,
> > > > >> which introduced failures that pop up roughly 50% of the time or so.
> > > > >>
> > > > >> We can't really proceed until the issue is fixed, as I don't think it's
> > > > >> necessarily a test issue.
> > > > >>
> > > > >> - Houston
> > > > >>
> > > > >> On Sun, Jan 21, 2024 at 10:10 PM Ishan Chattopadhyaya <
> > > > >> ichattopadhy...@gmail.com> wrote:
> > > > >>
> > > > >> > +1
> > > > >> >
> > > > >> > On Sun, 21 Jan, 2024, 8:47 am Houston Putman wrote:
> > > > >> >
> > > > >> > > Now that 9.4 is out, we should plan this for next week. I'll try to
> > > > >> > > get stuff ready for a Monday RC, if no one objects.
> > > > >> > >
> > > > >> > > - Houston
> > > > >> > >
> > > > >> > > On Tue, Jan 16, 2024 at 12:58 PM Houston Putman <
> > > > >> > > houstonput...@gmail.com> wrote:
> > > > >> > >
> > > > >> > > > Since the 9.4.1 release candidate is out, I'm fine waiting for it
> > > > >> > > > to finish. But let's try to get 8.11.3 out very soon afterwards.
> > > > >> > > >
> > > > >> > > > Als

Re: [dev help wanted] /admin/segments handler: expose the term count

2024-01-29 Thread Christine Poerschke (BLOOMBERG/ LONDON)
Wonderful. Please feel free to open a pull request or draft pull request
directly, or, if you prefer, first leave an "I'm working on this" style comment
and/or ask any questions on the JIRA issue. Thanks!

From: us...@solr.apache.org At: 01/26/24 19:34:36 UTC To: us...@solr.apache.org
Cc:  dev@solr.apache.org
Subject: Re: [dev help wanted] /admin/segments handler: expose the term count

I would love to take this up.

On Fri, Jan 26, 2024 at 6:46 AM Christine Poerschke (BLOOMBERG/ LONDON) <
cpoersc...@bloomberg.net> wrote:

> Hi Everyone,
>
> Have you used or are you curious about the segments info handler and/or
> screen?
> https://solr.apache.org/guide/solr/latest/configuration-guide/index-segments-merging.html#segments-info-screen
>
> If so then would you be interested in contributing to the
> https://issues.apache.org/jira/browse/SOLR-17038 issue?
>
> Thanks,
> Christine
>
>




Re: Bugfix release Lucene/Solr 8.11.3

2024-01-29 Thread Houston Putman
When I run this:

ant test -Dtestcase=CloudAuthStreamTest -Dtests.seed=F8952559841D5C83 \
  -Dtests.multiplier=2 -Dtests.slow=true -Dtests.locale=sk \
  -Dtests.timezone=Atlantic/Bermuda -Dtests.asserts=true \
  -Dtests.file.encoding=UTF-8 -Duse.perreplica=false


It fails. Doesn't the "-Duse.perreplica=false" tell it to not use PRS?

- Houston

On Mon, Jan 29, 2024 at 5:25 AM Noble Paul  wrote:

> What I have noticed is that CloudAuthStreamTest fails in PRS mode. So,
> it passes all the time without PRS.
>
> It needs to be fixed anyway, but I'm disabling the PRS mode for that test
>

Re: Bugfix release Lucene/Solr 8.11.3

2024-01-29 Thread Noble Paul
I beasted this test a few hundred times and it did not fail. I shall
try this again (with the seed).
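One plausible reason hundreds of beasting runs stayed green while Houston's run fails: in the randomized test framework, each run without -Dtests.seed picks a fresh random seed, so a seed-specific failure can stay hidden. A minimal sketch of rerunning with the seed pinned, assuming a Lucene/Solr 8.x checkout where the `ant test` invocation from this thread works from the current directory; `beast` and `ant_runner` are hypothetical helpers, not part of the build, and the flags are copied verbatim without asserting what each one does.

```python
import subprocess
from typing import Callable, Optional

# Flags copied from the reproduce line in this thread (seed added separately).
BASE_FLAGS = [
    "-Dtestcase=CloudAuthStreamTest",
    "-Dtests.multiplier=2",
    "-Dtests.slow=true",
    "-Dtests.locale=sk",
    "-Dtests.timezone=Atlantic/Bermuda",
    "-Dtests.asserts=true",
    "-Dtests.file.encoding=UTF-8",
    "-Duse.perreplica=false",
]


def ant_runner(seed: Optional[str]) -> bool:
    """Run the test once via `ant test`; return True if it passed (exit code 0)."""
    cmd = ["ant", "test", *BASE_FLAGS]
    if seed is not None:
        # Pinning the seed replays the same randomized choices on every run;
        # omitting it lets each run pick its own random seed.
        cmd.append(f"-Dtests.seed={seed}")
    return subprocess.run(cmd).returncode == 0


def beast(iterations: int, seed: Optional[str] = None,
          run_once: Callable[[Optional[str]], bool] = ant_runner) -> int:
    """Repeat a test `iterations` times and return how many runs failed."""
    failures = 0
    for _ in range(iterations):
        if not run_once(seed):
            failures += 1
    return failures
```

For example, `beast(20, seed="F8952559841D5C83")` versus `beast(20)` would show whether the failure tracks the seed or appears randomly.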

On Tue, Jan 30, 2024 at 4:08 AM Houston Putman  wrote:
>
> When I run this:
>
> ant test -Dtestcase=CloudAuthStreamTest -Dtests.seed=F8952559841D5C83 \
>   -Dtests.multiplier=2 -Dtests.slow=true -Dtests.locale=sk \
>   -Dtests.timezone=Atlantic/Bermuda -Dtests.asserts=true \
>   -Dtests.file.encoding=UTF-8 -Duse.perreplica=false
>
>
> It fails. Doesn't the "-Duse.perreplica=false" tell it to not use PRS?
>
> - Houston

Re: Collections LIST semantics

2024-01-29 Thread Jason Gerlowski
Thanks for calling this out more explicitly; definitely worth discussing.

> If a client/caller/user lists collections and then loops them to take
> some action on them, it needs to be tolerant of the collection not working;
> may seem to not exist.

I'd go even a step further and say that users should always have
error-handling around their calls to Solr.

But even so I'm leery of changing the semantics here.  I think the
assumption of most folks is that each entry returned by a "list" exists
fully, unless the response gives more granular info to augment that.  I'd
worry that returning partially-created or partially-deleted collections
would be confusing and unintuitive to most users.  (e.g. Imagine iterating
over a "list", getting a not-found error running some operation on one of
the entries, but still seeing the collection when you call "list" again to
double-check.)

I understand the need for a more scalable API, or a way to detect orphaned
data in ZK.  But I'd personally rather not see us change the LIST semantics
to accomplish that.  If you need the ZK child nodes, is there maybe a
scalable way to invoke ZookeeperInfoHandler to get that information?

Best,

Jason

On Fri, Jan 26, 2024 at 2:46 PM David Smiley  wrote:

> https://issues.apache.org/jira/browse/SOLR-16909
> > Collections LIST command should fetch ZK data, not cached state
>
> I want to get further input from folks that changing the semantics is
> okay.  If the change is applied, LIST will be much faster but it will
> return collections that have not yet been fully constructed or
> deleted.  If a client/caller/user lists collections and then loops
> them to take some action on them, it needs to be tolerant of the
> collection not working; may seem to not exist.  I argue callers should
> *already* behave in this way or it may be brittle to circumstances
> that are hard to reason about.  On the other hand, maybe this would
> increase the frequency of errors to existing clients that didn't
> encounter this in testing?  Shrug.  I could imagine ways to solve this
> but it would add some complexity and it's not clear it's worthwhile.
>
> A related aside: the method ClusterStatus.getCollectionsMap is not
> scalable for clusters with 10K+ collections because it loops every
> collection to fetch the latest state from ZK, putting a massive load
> on ZK.  Our implementation of collection listing calls it, as do a
> number of places across Solr.  Some could be changed with relative
> ease; some are more thorny.  I'd love to rename this thing, putting
> "slow" in the name so that you think twice before calling it :-)
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
> For additional commands, e-mail: dev-h...@solr.apache.org
>
>
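David's scalability point is essentially an N+1 read pattern: listing names is one read, while materializing every collection's state is one read per collection on top of that. A minimal sketch, assuming collection names are the child znodes of `/collections` and each collection's state lives at `/collections/<name>/state.json` (the layout as I understand it); `FakeZk` is a hypothetical in-memory stand-in that only counts reads, not a real ZooKeeper client.

```python
from typing import Dict, List


class FakeZk:
    """Hypothetical in-memory ZooKeeper stand-in that counts every read."""

    def __init__(self, states: Dict[str, bytes]):
        self._states = states  # collection name -> state.json bytes
        self.reads = 0

    def get_children(self, path: str) -> List[str]:
        self.reads += 1
        assert path == "/collections"
        return sorted(self._states)

    def get_data(self, path: str) -> bytes:
        self.reads += 1
        # path looks like /collections/<name>/state.json
        name = path.split("/")[2]
        return self._states[name]


def list_names_only(zk: FakeZk) -> List[str]:
    """One read: child node names only (may include half-created or half-deleted collections)."""
    return zk.get_children("/collections")


def list_with_state(zk: FakeZk) -> Dict[str, bytes]:
    """1 + N reads: fetch every collection's state, roughly the pattern described above."""
    return {name: zk.get_data(f"/collections/{name}/state.json")
            for name in zk.get_children("/collections")}
```

With 10K collections the second function costs 10,001 reads versus 1, which is the trade-off behind the proposed LIST semantics.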


Re: Collections LIST semantics

2024-01-29 Thread Walter Underwood
If a program gets a list from a remote server, then expects that list to be
accurate when it makes calls based on it, well, my kindest thought is
“charmingly naive”. Really, that is just bad code that hasn’t broken yet.

That is true even if it gets a list from Zookeeper. Things change while you 
aren’t looking at them.

Solr could make that happen less often or more often, but it will happen.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)
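In client terms, the point above amounts to treating a LIST response as a point-in-time snapshot and handling "not found" per item rather than aborting the whole loop. A minimal sketch; `list_collections`, `act_on` and `CollectionNotFound` are hypothetical placeholders for whatever client call (for example the Collections API `LIST` action) and error type you actually use.

```python
from typing import Callable, Iterable, List, Tuple


class CollectionNotFound(Exception):
    """Hypothetical error raised when a listed collection is gone or not ready."""


def process_all(list_collections: Callable[[], Iterable[str]],
                act_on: Callable[[str], None]) -> Tuple[List[str], List[str]]:
    """Apply `act_on` to each listed collection, skipping ones that vanished.

    Returns (processed, skipped). The list is only a snapshot: a collection
    may be deleted, or still being created, by the time we reach it.
    """
    processed: List[str] = []
    skipped: List[str] = []
    for name in list_collections():
        try:
            act_on(name)
        except CollectionNotFound:
            skipped.append(name)  # tolerate it instead of failing the whole loop
            continue
        processed.append(name)
    return processed, skipped
```

Whether skipped names are logged, retried, or ignored is a caller policy choice; the sketch just reports them.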




Re: Collections LIST semantics

2024-01-29 Thread David Smiley
Yeah, I'm sympathetic to that viewpoint.  I was coming at this from
Walter's -- clients must be tolerant always.  This mindset is
important when working on scalable distributed systems.  But depending
on clients being so tolerant leads to being less friendly --
increasing the likelihood that they will have to deal with such
errors.  Solr might even appear buggy to such a client/user.  Shrug.

At work we've got this modification to add listAll to collection
listing (thus can toggle the semantics) but for scalability reasons,
we're finding we want this enabled everywhere, which raises the question of
whether it should simply work this way to begin with.  I'm also motivated
to contribute to Solr without adding complexity -- arguably listing
collections shouldn't need any parameters.  But we could contribute it
this way; okay?  And maybe make listAll's default be a system property
so you can run Solr in this way.
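The toggle described above could resolve its value with a simple precedence: an explicit request parameter wins, otherwise a node-wide default (a system property, as suggested), otherwise today's behavior. A minimal sketch; the parameter name `listAll` comes from the message, but the property name `solr.collections.listAll`, the `resolve_list_all` helper, and the assumption that `true` selects the faster ZK-children listing are all hypothetical. A plain mapping stands in for Java system properties.

```python
from typing import Mapping, Optional

# Hypothetical property name; nothing in the thread fixes it.
LIST_ALL_PROP = "solr.collections.listAll"


def _parse_bool(value: str) -> bool:
    v = value.strip().lower()
    if v in ("true", "yes", "on", "1"):
        return True
    if v in ("false", "no", "off", "0"):
        return False
    raise ValueError(f"not a boolean: {value!r}")


def resolve_list_all(request_params: Mapping[str, str],
                     system_props: Mapping[str, str],
                     builtin_default: bool = False) -> bool:
    """Request param > node-wide property > built-in default.

    Assumption: False keeps the current (cached-state) LIST semantics.
    """
    explicit: Optional[str] = request_params.get("listAll")
    if explicit is not None:
        return _parse_bool(explicit)
    prop = system_props.get(LIST_ALL_PROP)
    if prop is not None:
        return _parse_bool(prop)
    return builtin_default
```

This keeps existing behavior unless an operator opts in node-wide, while still letting an individual request override it.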




Re: New branch and feature freeze for Solr 9.5.0

2024-01-29 Thread Eric Pugh
Jason, I tackled SOLR-17068 (the one you reminded me of) and I’d love to get it 
into 9.5 since right now we have a terrible mishmash of bin/solr post and
bin/post in the ref guide and docs.

Could someone review https://github.com/apache/solr/pull/2227 and if it looks 
good could we sneak it into 9.5?   

Eric


> On Jan 26, 2024, at 6:29 PM, Eric Pugh wrote:
> 
> Backport to branch_9_5 is done.
> 
> 
>> On Jan 26, 2024, at 1:11 PM, Jason Gerlowski  wrote:
>> 
>> Go ahead and backport on your own!  I'm still waiting on Lucene 9.9.2, so
>> there shouldn't be any branch-contention on my end.
>> 
>> Relatedly, Lucene has their RC1 out there and things look good a day or two
>> into their VOTE, so with any luck we'll be able to get a Solr 9.5 RC
>> together early next week!
>> 
>> Best,
>> 
>> Jason
>> 
>> On Fri, Jan 26, 2024 at 8:49 AM Eric Pugh  wrote:
>> 
>>> I am about to merge SOLR-17112, and will backport it to branch_9x.  Jason,
>>> do you backport it over to the branch_9_5 or do I?
>>> 
>>> Eric
>>> 
>>> 
>>> On 2024/01/23 19:08:10 Jason Gerlowski wrote:
>>>>> It was hoped SOLR-17112 would make 9.4.1 but it didn't as no PR was
>>>>> proposed.
>>>>
>>>>> SOLR-17120 [is] nominated for inclusion in the 9.5.0 release
>>>>
>>>> Those both sound quick and reasonable; they've got a +1 from me to go into
>>>> 9.5 (assuming the contributor decides to continue with SOLR-17112).
>>>>
>>>>> Considering Lucene 9.9.2 is being planned, I think it would be better to
>>>>> upgrade Solr to the to-be-released version so users have to deal with
>>>>> fewer upgrade cycles.
>>>>
>>>> Yeah, that might be best; I hadn't realized we weren't already on the
>>>> latest Lucene 9.x.  I've created SOLR-17128 to track our Lucene upgrade
>>>> once 9.9.2 is available.
>>>>
>>>> Obviously this is a longer delay than some of the tickets above, and will
>>>> mean we won't be cutting a Solr RC this week.  We can pick a new date for
>>>> the initial Solr 9.5 RC once Lucene 9.9.2 is available.
>>>>
>>>> Best,
>>>>
>>>> Jason
>>>>
>>>> On Tue, Jan 23, 2024 at 1:37 PM Anshum Gupta wrote:
>>>>
>>>>> Considering Lucene 9.9.2 is being planned, I think it would be better to
>>>>> upgrade Solr to the to-be-released version so users have to deal with
>>>>> fewer upgrade cycles.
>>>>>
>>>>> To highlight, there are about 90 odd changes in the Lucene 9.9.x line.
>>>>>
>>>>> -Anshum
>>>>>
>>>>> On Tue, Jan 23, 2024 at 8:47 AM David Smiley wrote:
>>>>>
>>>>>> FYI It was hoped SOLR-17112
>>>>>> https://issues.apache.org/jira/browse/SOLR-17112 "bin/solr script
>>>>>> doesn't do ps properly on some systems" would make 9.4.1 but it didn't
>>>>>> as no PR was proposed.  There still isn't one but a contributor is
>>>>>> thinking about it.
>>>>>>
>>>>>> On Tue, Jan 23, 2024 at 11:30 AM Christine Poerschke (BLOOMBERG/
>>>>>> LONDON) wrote:
>>>>>>>
>>>>>>> Just to cross-reference things further (Jason is already aware) --
>>>>>>> https://issues.apache.org/jira/browse/SOLR-17120 and
>>>>>>> https://github.com/apache/solr/pull/2214 are nominated for inclusion in
>>>>>>> the 9.5 release, and as always additional reviews and inputs are welcome.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Christine
>>>>>>>
>>>>>>> From: dev@solr.apache.org At: 01/22/24 17:30:35 UTC To: dev@solr.apache.org
>>>>>>> Subject: New branch and feature freeze for Solr 9.5.0
>>>>>>>
>>>>>>> NOTICE:
>>>>>>>
>>>>>>> Branch branch_9_5 has been cut and versions updated to 9.6 on the stable
>>>>>>> branch.
>>>>>>>
>>>>>>> Please observe the normal rules:
>>>>>>>
>>>>>>> * No new features may be committed to the branch.
>>>>>>> * Documentation patches, build patches and serious bug fixes may be
>>>>>>>   committed to the branch. However, you should submit all patches you
>>>>>>>   want to commit to Jira first to give others the chance to review
>>>>>>>   and possibly vote against the patch. Keep in mind that it is our
>>>>>>>   main intention to keep the branch as stable as possible.
>>>>>>> * All patches that are intended for the branch should first be committed
>>>>>>>   to the unstable branch, merged into the stable branch, and then into
>>>>>>>   the current release branch.
>>>>>>> * Normal unstable and stable branch development may continue as usual.
>>>>>>>   However, if you plan to commit a big change to the unstable branch
>>>>>>>   while the branch feature freeze is in effect, think twice: can't the
>>>>>>>   addition wait a couple more days? Merges of bug fixes into the branch
>>>>>>>   may become more difficult.
>>>>>>> * Only Jira issues with Fix version 9.5 and priority "Blocker" will delay
>>>>>>>   a release candidate build.
>>>>>>>
>>>>>>> The feature-freeze for the 9.5 release will go till the end of this week -
>>>>>>> I'll aim to create our first RC on Thursday, January 25th.
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Jason

Re: Collections LIST semantics

2024-01-29 Thread Jason Gerlowski
I agree with every point about the delays inherent in a distributed system,
and how any "list" call should be treated by clients as point-in-time.  And
I agree that the impact _should_ be minimal since diligent clients should
have error handling in these cases anyways.

But it still feels off to me to have a "list" op output something that's
potentially incorrect even in the point-in-time it's produced.

Not a -1 or a veto, just my 2c.  If it's an outlier opinion, please ignore
it.

Best,

Jason

On Mon, Jan 29, 2024 at 2:23 PM David Smiley  wrote:

> Yeah, I'm sympathetic to that viewpoint.  I was coming at this from
> Walter's angle: clients must always be tolerant.  This mindset is
> important when working on scalable distributed systems.  But relying
> on clients being that tolerant is less friendly: it increases the
> likelihood that they will have to deal with such errors.  Solr might
> even appear buggy to such a client/user.  Shrug.
>
> At work we've got a modification that adds listAll to collection
> listing (so the semantics can be toggled), but for scalability reasons
> we're finding we want it enabled everywhere, which raises the question
> of whether it should simply work this way to begin with.  I'm also
> motivated to contribute to Solr without adding complexity; arguably,
> listing collections shouldn't need any parameters.  But we could
> contribute it this way; okay?  And maybe make listAll's default a
> system property so you can run Solr in this way.
>
> On Mon, Jan 29, 2024 at 1:42 PM Jason Gerlowski 
> wrote:
> >
> > Thanks for calling this out more explicitly; definitely worth discussing.
> >
> > > If a client/caller/user lists collections and then loops them to take
> > > some action on them, it needs to be tolerant of the collection not
> > > working; may seem to not exist.
> >
> > I'd go even a step further and say that users should always have
> > error-handling around their calls to Solr.
> >
> > But even so I'm leery of changing the semantics here.  I think the
> > assumption of most folks is that each entry returned by a "list" exists
> > fully, unless the response gives more granular info to augment that.  I'd
> > worry that returning partially-created or partially-deleted collections
> > would be confusing and unintuitive to most users.  (e.g. Imagine iterating
> > over a "list", getting a not-found error running some operation on one of
> > the entries, but still seeing the collection when you call "list" again to
> > double-check.)
> >
> > I understand the need for a more scalable API, or a way to detect orphaned
> > data in ZK.  But I'd personally rather not see us change the LIST semantics
> > to accomplish that.  If you need the ZK child nodes, is there maybe a
> > scalable way to invoke ZookeeperInfoHandler to get that information?
> >
> > Best,
> >
> > Jason
> >
> > On Fri, Jan 26, 2024 at 2:46 PM David Smiley  wrote:
> >
> > > https://issues.apache.org/jira/browse/SOLR-16909
> > > > Collections LIST command should fetch ZK data, not cached state
> > >
> > > I want further input from folks on whether changing the semantics is
> > > okay.  If the change is applied, LIST will be much faster but it will
> > > return collections that have not yet been fully constructed or
> > > deleted.  If a client/caller/user lists collections and then loops
> > > them to take some action on them, it needs to be tolerant of the
> > > collection not working; may seem to not exist.  I argue callers should
> > > *already* behave in this way or it may be brittle to circumstances
> > > that are hard to reason about.  On the other hand, maybe this would
> > > increase the frequency of errors to existing clients that didn't
> > > encounter this in testing?  Shrug.  I could imagine ways to solve this
> > > but it would add some complexity and it's not clear it's worthwhile.
> > >
> > > A related aside: the method ClusterState.getCollectionsMap is not
> > > scalable for clusters with 10K+ collections because it loops every
> > > collection to fetch the latest state from ZK, putting a massive load
> > > on ZK.  Our implementation of collection listing calls it, as do a
> > > number of places across Solr.  Some could be changed with relative
> > > ease; some are more thorny.  I'd love to rename this thing, putting
> > > "slow" in the name so that you think twice before calling it :-)
> > >
> > > ~ David Smiley
> > > Apache Lucene/Solr Search Developer
> > > http://www.linkedin.com/in/davidwsmiley
> > >
> > > -
> > > To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
> > > For additional commands, e-mail: dev-h...@solr.apache.org
> > >
> > >
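The trade-off debated in this thread (SOLR-16909) can be contrasted in a small model. This is a simplified sketch under stated assumptions, not Solr's implementation: `FakeZk` is a hypothetical in-memory stand-in for ZooKeeper (not the ZooKeeper client API) that counts reads; a collection whose state is `null` models one that is still being created or deleted; and `listCollections(..., listAll)` plus the `solr.collections.listAll` system property are hypothetical names echoing the listAll toggle mentioned in the thread, with `listAll = true` assumed to mean "return every child name".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CollectionListingSketch {

    /** Hypothetical in-memory stand-in for ZooKeeper that counts reads; not the ZK client API. */
    static class FakeZk {
        // collection name -> its state; null models a collection still being created or deleted
        final Map<String, String> states = new TreeMap<>();
        int reads = 0;

        /** One read: the child names under a collections parent node. */
        List<String> getChildren() {
            reads++;
            return new ArrayList<>(states.keySet());
        }

        /** One read per collection, like fetching the latest state for each one. */
        String getState(String collection) {
            reads++;
            return states.get(collection);
        }
    }

    /**
     * listAll = true: return every child name (1 read, may include incomplete collections).
     * listAll = false: fetch each collection's state and keep only complete ones (N + 1 reads).
     */
    static List<String> listCollections(FakeZk zk, boolean listAll) {
        List<String> names = zk.getChildren();
        if (listAll) {
            return names;
        }
        List<String> complete = new ArrayList<>();
        for (String name : names) {
            if (zk.getState(name) != null) {
                complete.add(name);
            }
        }
        return complete;
    }

    /** Default taken from a hypothetical system property, as floated in the thread. */
    static boolean defaultListAll() {
        return Boolean.parseBoolean(System.getProperty("solr.collections.listAll", "false"));
    }

    public static void main(String[] args) {
        FakeZk zk = new FakeZk();
        zk.states.put("books", "active");
        zk.states.put("films", null); // child node exists, state not written yet
        zk.states.put("music", "active");

        System.out.println(listCollections(zk, true) + " reads=" + zk.reads);
        zk.reads = 0;
        System.out.println(listCollections(zk, false) + " reads=" + zk.reads);
        System.out.println("default listAll=" + defaultListAll());
    }
}
```

With three collections, one of them incomplete, the fast path returns all three names after a single read, while the filtering path does four reads and drops the incomplete one. The per-collection cost of the second path is what grows painful at 10K+ collections; the first path trades that cost for exposing collections that are not fully constructed or deleted.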