Good point Tomas; I hadn't considered that use-case.  I suppose the
behavior I suggest could be controlled with a boolean parameter flag like
"asyncDeleteStatus" true/false.  WDYT?  I'm not married to it.
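
To make the two behaviors concrete, here is a toy in-memory sketch in Python. It is not Solr code: the AsyncIdRegistry class, its method names, and the async_delete_status argument are all hypothetical, and the status strings are only illustrative. With the flag off, a finished ID still blocks reuse, which is what a retried workflow relies on. With the flag on, the old status is replaced, so anyone still polling that ID sees the new request.

```python
class AsyncIdRegistry:
    """Toy stand-in for async status tracking (hypothetical, not Solr code)."""

    def __init__(self):
        self._status = {}  # async ID -> "running" | "completed" | "failed"

    def submit(self, async_id, async_delete_status=False):
        """Return True if the request is accepted, False if it is rejected."""
        current = self._status.get(async_id)
        if current == "running":
            return False  # an in-progress ID is rejected in both modes
        if current is not None and not async_delete_status:
            return False  # flag off: a finished ID still blocks reuse
        self._status[async_id] = "running"  # flag on: old status is replaced
        return True

    def finish(self, async_id, outcome="completed"):
        self._status[async_id] = outcome

    def status(self, async_id):
        return self._status.get(async_id, "notfound")


registry = AsyncIdRegistry()
split_id = "SPLIT-collectionName-shardName"
registry.submit(split_id)
registry.finish(split_id)

# Flag off: the retried request is rejected and the old outcome is readable.
retry_accepted = registry.submit(split_id)
old_outcome = registry.status(split_id)

# Flag on: the finished status is cleared, so a poller now sees the new request.
reuse_accepted = registry.submit(split_id, async_delete_status=True)
seen_by_poller = registry.status(split_id)
```

The last two lines are the downside of the flag: a client that was polling the first request can no longer tell that it finished.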

BTW these async status objects stored in ZK are in fact cleaned up when
they reach 10k in number.  See SizeLimitedDistributedMap.
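
As a rough illustration of that kind of size cap, here is a small in-memory Python model. It is only a sketch: the SizeLimitedMap class below is hypothetical, and trimming the oldest tenth of the entries once the cap is hit is an assumption made for the example. Check SizeLimitedDistributedMap itself for the real eviction policy.

```python
from itertools import count


class SizeLimitedMap:
    """Toy size-capped map (hypothetical model, not the Solr class)."""

    def __init__(self, max_size=10_000):
        self.max_size = max_size
        self._entries = {}     # key -> (insertion sequence number, value)
        self._clock = count()  # monotonically increasing sequence numbers

    def put(self, key, value):
        # Only a brand-new key can push the map over its cap.
        if key not in self._entries and len(self._entries) >= self.max_size:
            # Assumption: drop the oldest tenth (at least one entry).
            self._evict_oldest(max(1, self.max_size // 10))
        self._entries[key] = (next(self._clock), value)

    def _evict_oldest(self, n):
        oldest = sorted(self._entries, key=lambda k: self._entries[k][0])[:n]
        for key in oldest:
            del self._entries[key]

    def __contains__(self, key):
        return key in self._entries

    def __len__(self):
        return len(self._entries)


# Small cap so the effect is easy to see: the 11th insert evicts the oldest.
statuses = SizeLimitedMap(max_size=10)
for i in range(11):
    statuses.put(f"req-{i}", "completed")
```

Note that a cap like this bounds the number of stored statuses but gives no guarantee about how long any particular async ID survives.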

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Wed, May 10, 2023 at 2:36 PM Tomás Fernández Löbbe <tomasflo...@gmail.com>
wrote:

> I find it very useful to keep the used async IDs regardless of the status
> for some time. For example, if you have a workflow that involves multiple
> steps such as add/remove replicas, you can just retry/restart the workflow
> and be sure Solr will reject the request if the async ID already exists
> (and your code can then handle this accordingly, for example, by checking
> whether the status is success or failed and acting on it) as long as you
> use the async IDs consistently.
>
> That said, async IDs do need to eventually be removed and AFAIK Solr
> doesn't do this automatically. This is a problem because of ever-increasing
> objects in ZooKeeper. I think we should have some sort of task that cleans
> up async IDs after some configurable amount of time.
>
> On Wed, May 10, 2023 at 1:01 AM Andras Salamon <andras.sala...@melda.info>
> wrote:
>
> > Hi,
> >
> > How can we be sure that the previous request status info has been already
> > processed? What about the following timeline:
> >
> > - Client1 sends an async request
> > - Client1 reads status info, it's still running
> > - Client1 reads status info, it's still running
> > - Async request finishes
> > - Right after that Client2 sends a new async request with the same ID, we
> >   clear the async status because it's already finished
> > - Client1 reads status info, but this time it will read info about the new
> >   async request sent by Client2.
> >
> > Andras
> >
> > ---- On Wed, 10 May 2023 05:15:40 +0200 David Smiley <dsmi...@apache.org>
> > wrote ---
> >
> > I noticed that async admin requests to Solr must have a unique asyncId or
> > else a request is rejected.  Makes sense -- maybe the request is in
> > progress.  But what if it isn't -- what if the previous request for the
> > same ID either succeeded or failed?  Shouldn't we clear the previous
> > asyncId status and let the new request go through?
> >
> > I'm imagining leveraging this uniqueness constraint in order to be an
> > additional protection measure against requests that should be done
> > atomically, like a shard split.  Yes there are already locks but this
> > additional measure will allow a fail-fast -- no enqueue of a doomed message
> > to the Overseer that will ultimately never succeed anyway.  Thus the
> > sender of a shard split can use an async ID like
> > "SPLIT-collectionName-shardName".  Maybe there are other parts of SolrCloud
> > that could likewise leverage this constraint to their advantage.
> >
> > ~ David Smiley
> > Apache Lucene/Solr Search Developer
> > http://www.linkedin.com/in/davidwsmiley
>
