Just my weekly bump on this thread. The advice from Google appears to be to trust that tasks with the same id cannot be running concurrently. However, there are clear edge scenarios documented in this thread that are not accounted for. It would be a pity if people made architectural decisions based on the advice from Google, and discovered down the track that their data was corrupted as a result of the occasional concurrent execution of the same task id.

Are the edge cases handled, and tasks *never* run concurrently, or is it only the case that they don't run concurrently 'under normal conditions'? If there could ever be concurrent execution then it is a whole different architectural scenario. Can it happen or not?

By all means, if the answer is that the task queue is an experimental feature and 'anything's possible', that would be better than tumbleweed, and infinitely better than advising that concurrent execution cannot occur when in fact you're not sure that's true.

Thanks,
Colin

On May 22, 9:46 am, hawkett <[email protected]> wrote:
> Apologies for repeatedly bumping this thread, but the advice seems to
> be that the same task-id *cannot* execute concurrently (100%
> guaranteed), but no response asserting this has addressed the failure
> scenario I've raised, where it would appear that the same task *may*
> execute concurrently unless app engine has implemented something
> specifically to prevent it occurring. I know the task queue is very
> reliable, but not 100% so -
> http://groups.google.com/group/google-appengine/browse_thread/thread/....
>
> So - in the scenario where the HTTP client (i.e. the task queue) drops
> the HTTP connection in an initial task execution - how does app engine
> prevent the recovery mechanism from executing the task a second time
> while the first is still running?
>
> The possibility of the same task running concurrently has significant
> architectural implications for my app. Does app engine handle the
> scenario I've outlined and prevent concurrent execution of the same
> task-id?
>
> Thanks for the clarification,
>
> Colin
>
> On May 13, 5:35 pm, "Ikai L (Google)" <[email protected]> wrote:
> > The same task should not be executed multiple times concurrently. If
> > it fails, we will retry it in the future (could be back to back, but
> > this is not guaranteed).
> >
> > Are you seeing evidence of the contrary?
> >
> > On Wed, May 12, 2010 at 12:49 PM, hawkett <[email protected]> wrote:
> > > Bump - still not clear whether the same task can be executing
> > > multiple times concurrently? I noticed that failed tasks seem to
> > > back off for significantly longer recently - perhaps this has
> > > helped the situation? Appreciate any clarification - cheers,
> > >
> > > Colin
> > >
> > > On May 1, 1:08 am, hawkett <[email protected]> wrote:
> > > > My use case is as follows -
> > > >
> > > > 1. tasks which do not support idempotence inherently (such as
> > > > deletes, and some puts) carry a unique identifier, which is
> > > > written as a receipt in an attribute of an entity that is updated
> > > > in the transaction.
> > > > 2. When a task arrives carrying a receipt, I check that it does
> > > > not already exist - so receipted tasks incur an additional, key
> > > > only, db read.
> > > >
> > > > This is essentially my algorithm for ensuring idempotence (in
> > > > situations where it is not inherent) - ignore subsequent
> > > > executions.
> > > >
> > > > If the same task *cannot* be running in parallel, then the check
> > > > for the receipt can be done outside the transaction that writes
> > > > the receipt - which has a couple of advantages -
> > > >
> > > > a. It can be done up front in the task handler, so I don't have
> > > > to go all the way through to the transactional write before
> > > > discovering it already executed.
> > > > b. More importantly, I can reduce the work done inside the
> > > > transaction - every extra millisecond spent in the transaction
> > > > locks the entity group, and at scale, those milliseconds can add
> > > > up - especially on entity groups that are somewhat write
> > > > intensive.
> > > >
> > > > If the same task *can* be running in parallel, then I need to do
> > > > the receipt read inside the transaction that writes it. It would
> > > > be a pity to do that extra work in every transaction for a very
> > > > rare scenario.
> > > >
> > > > As stated earlier, it seems that it might be possible for GAE to
> > > > guarantee that it does not execute the same task in parallel - by
> > > > ensuring that, for error scenarios like those above (408, client
> > > > crash, perhaps others), the 2nd execution waits 30 seconds. That
> > > > has some obvious downsides, but given how rarely it occurs, and
> > > > given that an app shouldn't be relying on the speed with which a
> > > > task is executed, it seems like a reasonable trade-off to get a
> > > > reduction in transactional work for the vast majority of the time
> > > > - less contention, less CPU, less datastore activity.
> > > >
> > > > A simple example is a task which increments a counter - we don't
> > > > want to increment the counter twice.
> > > >
> > > > The problem is the same whether one or many entities are being
> > > > updated during handling of the task.
> > > >
> > > > Do you have many situations where you perform a read that does
> > > > not result in some sort of update - db update, another task
> > > > raised, email sent, external system notified etc.? There's a
> > > > subset of most of these that we want to avoid doing twice. It's
> > > > the multiple writes, rather than multiple reads, causing issues.
> > > >
> > > > Anyone from google able to end the speculation? :)
> > > >
> > > > On Apr 30, 2:31 am, Eli Jones <[email protected]> wrote:
> > > > > In my opinion, the case you are asking about is pretty much the
> > > > > reason they state that tasks must be idempotent.. even with
> > > > > named tasks.
> > > > >
> > > > > They cannot guarantee 100% that some transient error will not
> > > > > occur when a scheduled task is executed (even if you are naming
> > > > > tasks and are guaranteed 100% that your task will not be added
> > > > > to the queue more than once).
> > > > >
> > > > > So, it is possible to have more than one version of the "same"
> > > > > task executing at the same time. You just need to construct
> > > > > your tasks so they aren't doing too much at once (e.g. reading
> > > > > some data, then updating or inserting.. then reading other
> > > > > data... and updating some more), or you need to make sure to do
> > > > > all that inside a big transaction.. and, even then, you still
> > > > > need to ensure idempotence.
> > > > > I sort of prefer a poor man's version of idempotence for my
> > > > > chained tasks. Mainly, if the "same" task runs more than once..
> > > > > each version will have a potentially different result, but I am
> > > > > perfectly happy getting the result from the task that ran last.
> > > > > But, I can easily accept this since my tasks are not doing
> > > > > multiple updates at once.. and they are not reading from the
> > > > > same entities that they are updating.
> > > > >
> > > > > What is your exact use case?
> > > > >
> > > > > On Thu, Apr 29, 2010 at 7:28 PM, hawkett <[email protected]> wrote:
> > > > > > Thanks for the response - it's good to know that the multiple
> > > > > > executions cannot occur in parallel, although I'm not sure I
> > > > > > completely understand the reasons. Take the following example -
> > > > > >
> > > > > > 1. task queue executes a task for the first time (T1E1)
> > > > > > 2. application receives task, and begins processing
> > > > > > 3. the http connection is lost soon after, and the task queue
> > > > > > receives an HTTP response code
> > > > > > 4. task queue backs off (e.g. waits 2s)
> > > > > > 5. task queue executes the task a second time (T1E2)
> > > > > > 6. application receives task and begins processing
> > > > > >
> > > > > > Why is it that T1E1 cannot still be running at step 5/6? Are
> > > > > > there no conditions at step 3 where a response (of any status)
> > > > > > is received while the processing at step 2 is still underway?
> > > > > >
> > > > > > There is also another situation, where the HTTP client
> > > > > > crashes, which is also unclear -
> > > > > >
> > > > > > 1. task queue executes a task for the first time (T1E1)
> > > > > > 2. application receives task, and begins processing
> > > > > > 3. the task queue crashes (i.e. the HTTP client), so no
> > > > > > response can be received
> > > > > > 4. task queue recovers, or another node takes over - (how does
> > > > > > it determine the state of T1E1?)
> > > > > > 5. task queue executes the task a second time, since it cannot
> > > > > > know whether T1E1 completed successfully? (T1E2)
> > > > > > 6. application receives task and begins processing
> > > > > >
> > > > > > Is it possible in this scenario that it will re-execute the
> > > > > > task (T1E2) prior to the completion of the first (T1E1)?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Colin
> > > > > >
> > > > > > On Apr 29, 5:36 pm, djidjadji <[email protected]> wrote:
> > > > > > > The decision to rerun a task is made based on the HTTP
> > > > > > > response code. There is always a response code, even when
> > > > > > > the connection is lost.
> > > > > > >
> > > > > > > When the code is 200 the task is considered complete and
> > > > > > > will not be rerun. Any other code means the task needs a
> > > > > > > rerun. The time between the reruns is increased with each
> > > > > > > retry.
> > > > > > >
> > > > > > > This means a certain task is never retried in parallel.
> > > > > > >
> > > > > > > But it could be that a task created later will finish first
> > > > > > > because it did not need to retry.
> > > > > > >
> > > > > > > 2010/4/25 hawkett <[email protected]>:
> > > > > > > > Wondering if I haven't asked the question clearly enough.
> > > > > > > > Regarding the statement that we need to assume tasks may
> > > > > > > > be executed multiple times (i.e. ensure idempotence): is
> > > > > > > > that multiple times serially, or possibly multiple times
> > > > > > > > concurrently?
> > > > > > > >
> > > > > > > > I've gone ahead and coded my idempotence solution to
> > > > > > > > assume that they cannot be running concurrently, just
> > > > > > > > because it's a bit easier, and a bit less work inside a
> > > > > > > > transaction.
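[Editor's note: the retry rule djidjadji describes above - a 200 response marks the task complete, any other code triggers a rerun, with the delay between reruns growing each time - amounts to something like the following sketch. The handler signature, retry limit, and delays are illustrative assumptions; the real App Engine scheduler's internals are not documented here.]

```python
import time

def run_task_with_retries(execute, max_retries=5, base_delay=2.0):
    """Sketch of the retry rule described above: a 200 status marks the
    task complete; any other status schedules a serial rerun, with the
    wait between attempts doubling each time (exponential backoff).
    """
    delay = base_delay
    for attempt in range(max_retries + 1):
        status = execute()      # task handler returns an HTTP status code
        if status == 200:
            return attempt      # complete; the task is never rerun
        time.sleep(delay)       # back off before the next attempt
        delay *= 2              # increase the wait with each retry
    raise RuntimeError("task failed after %d retries" % max_retries)

# A handler that fails once (e.g. a dropped connection seen as 408),
# then succeeds on the rerun.
responses = iter([408, 200])
attempts = run_task_with_retries(lambda: next(responses), base_delay=0.01)
# attempts == 1: the task was rerun once, serially - never in parallel.
```

Note that this loop only guarantees serial reruns if the first execution has actually stopped by the time the non-200 status is observed, which is precisely the edge case the thread is questioning: a dropped connection can yield an error status while the application is still processing the first delivery.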
> > > > > > > > I'm guessing that the reason they may be run multiple
> > > > > > > > times is that GAE won't know what to do if it doesn't get
> > > > > > > > a response from a task it executes - it can't be sure that
> > > > > > > > the task was received by the application, or that the
> > > > > > > > application was given the opportunity to correctly react
> > > > > > > > to the task - in fact it has to assume that it didn't, and
> > > > > > > > therefore runs it again to be sure. I'm assuming that GAE
> > > > > > > > always knows for certain that a task has been fired, just
> > > > > > > > not whether it was fired successfully - and it will only
> > > > > > > > fire again if it hasn't correctly processed a response
> > > > > > > > from the previous execution. If this were true, then it
> > > > > > > > seems as long as GAE guarantees that it waits > 30s before
> > > > > > > > firing
> > ...
