Re: [Puppet-dev] Handling failures in prefetch

Gareth Rushgrove Thu, 23 Jul 2015 05:38:30 -0700

On 23 July 2015 at 03:44, Kylo Ginsberg <[email protected]> wrote:
> On Wed, Jul 22, 2015 at 1:37 PM, Branan Riley <[email protected]> wrote:
>>
>> On Wed, Jul 22, 2015 at 12:44 PM Kylo Ginsberg <[email protected]>
>> wrote:
>>>
>>> On Tue, Jul 21, 2015 at 12:40 PM, Branan Riley <[email protected]>
>>> wrote:
>>>>
>>>> <This is email 2/2 inspired by PUP-4813>
>>>>
>>>> With puppet 4, we made a change to how prefetch failures are handled
>>>> by the transaction[1]. This is actually a pretty good change, as it
>>>> prevents puppet from some nasty misbehaviors.
>>>>
>>>> I still don't think it's the "correct" behavior, though. Aborting the
>>>> entire catalog because one resource type is broken doesn't seem like
>>>> the best behavior, especially when it comes to reporting[2]
>>>>
>>>> I feel like the correct behavior is something like this:
>>>>
>>>> * If an exception escapes a prefetch, mark all resources of that type
>>>>   in the catalog as failed and do not try to apply them
>>>
>>>
>>> Background for a question:
>>>
>>> Wrt exceptions in prefetch, as of 4.0 there is a distinction between the
>>> expected/supported exceptions (there are two: Puppet::MissingCommand and
>>> LoadError) and everything else.
>>>
>>> I'm not sure this is *super* well documented, but it is mentioned as a
>>> breaking change in the 4.0 release notes:
>>>
>>>
>>> https://docs.puppetlabs.com/puppet/4.0/reference/release_notes.html#stuff-a-lot-of-people-will-notice
>>>
>>> and the rescue in question is here:
>>>
>>>
>>> https://github.com/puppetlabs/puppet/blob/810624fe6f88d520339173c0e811c5ede4009fac/lib/puppet/transaction.rb#L322
>>>
>>> So with that in mind, is the proposal here about the handling of the two
>>> supported exceptions or the handling of other exceptions?
>>>
>>>
>>> For the two supported exceptions, this seems like a reasonable behavior
>>> change and could also speed up catalog application in this failure case.
>>>
>>>
>>> For *other* exceptions, I'm not sure. I walked away from PUP-3656
>>> thinking that those two exceptions were now part of the contract we're
>>> specifying for providers. If so, a provider that throws any *other*
>>> exception has a bug which should be fixed, and we could be making matters
>>> worse by trying to soldier on. I.e. this seems like a situation where it's
>>> better to fail hard and the user should file a bug, upgrade/downgrade the
>>> module, etc.
>>
>>
>> I would say this should be for ALL exceptions. We absolutely should stop
>> puppet from soldiering on in ways that could cause problems, but I don't
>> think trying to apply other resource types in the catalog is one of those
>> ways. Prefetch failure should be a localized catastrophe, not a global one.
>>
>> The move away from rescuing everything was definitely good, but aborting
>> the entire transaction when only a single type is unprefetchable is a bit of
>> an over-reaction. We know that instances of that type (really just ones of
>> that type/provider pair) are bad, and we absolutely shouldn't apply them.
>> The rest of the catalog is still fine, though.
>>
>> The two exceptions that are supported allow a Puppet run to install any
>> packages/gems that a provider might need in order to prefetch. Skipping only
>> the resources that would have been prefetched instead of the entire catalog
>> serves that same purpose.
>
>
> I can see this argument. I'm still concerned that this would mean we aren't
> surfacing bugs that really should be exposed to the light of day. I suppose
> we could report the one case with a Warning and the other with an Error.
>
> But assuming we go ahead and do something like this (I see I'm the only one
> with the slightest reservation :)), I'm thinking we should audit how puppet
> handles exceptions in providers during catalog application but *outside*
> prefetch. It would be nice to be consistent throughout catalog application.
>
>>
>> Aborting the entire catalog also makes remediation with Puppet (when
>> puppet could be used for such) impossible, which seems super bad to me.
>
>
> Is it impossible? Couldn't you remediate with a catalog that doesn't include
> resources that will tickle prefetch from that provider? Not trying to
> quibble (i.e. yes, that could still be a PITA), but making sure I'm not
> missing something.
>
>>
>>
>>>
>>> Btw (and perhaps I should have asked this from the get-go before opining
>>> above): what was the exception in PUP-4813 that precipitated this thread?
>>
>>
>> The bad catalog from that ticket set the `target` field of mount to a
>> directory, which causes some sort of IO exception to be raised when
>> ParsedFile tries to access that directory as a regular file.
>
>
> Ooh fun. So we should fix that to be more robust. Separate ticket than this
> thread though.
>
>>
>>
>> This is what inspired my bullet point below - ParsedFile should be smart
>> enough to know which resources have a bad target and which are OK, and
>> appropriately fail only the ones it can't work with.
>>
>>>>
>>>>
>>>> * Make it possible to fail resources from *within* a prefetch. This
>>>>   lets a sufficiently smart prefetch handle the case when only some of
>>>>   its resources are hosed (parsedfile resources with multiple targets,
>>>>   for example, might be only partially unfetchable)
>>>
>>>
>>> How would this work? A new hook for providers to implement? If so, it'd
>>> be nice to flesh out a couple examples use cases so that we can think
>>> through the API with something concrete.
>>
>>
>> It's probably already "doable" by having prefetch mark the returned
>> provider instances as bad in some way, and the provider implementation for
>> flush could raise an exception. I'd prefer to see something at the
>> transaction layer, though. I'll think through what this might look like
>> more.
>
>
> Well, definitely some charm to not adding new methods that we have to
> sprinkle respond_to? tests around for. So the former approach might just
> fly.
>
> I'd also like to think through whether there are additional motivating use
> cases.
>


For some relevant colour we hit this problem hard with the AWS module.
We're prefetching a lot of complex data in some cases and we're going
it over the wire - so it going wrong is pretty likely.

We worked around that at the time by creating our own Exception
inherited from Exception which does get caught by Puppet:

https://github.com/puppetlabs/puppetlabs-aws/blob/master/lib/puppet_x/puppetlabs/aws.rb#L7-L23

And then raising that in cases of StandardError based exceptions being
raised in prefetch.

https://github.com/puppetlabs/puppetlabs-aws/blob/master/lib/puppet/provider/ec2_instance/v2.rb#L34-L36

Some more details in this issue too about the changes in 4.0.

https://jira.puppetlabs.com/browse/PUP-3656

Gareth

> Kylo
>
> --
> Kylo Ginsberg | [email protected] | irc: kylo | twitter: @kylog
>
> PuppetConf 2015 is coming to Portland, Oregon! Join us October 5-9.
> Register now to take advantage of the Early Bird discount —save $249!
>
> --
> You received this message because you are subscribed to the Google Groups
> "Puppet Developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/puppet-dev/CALsUZFHyPv0mGjeE5Xhn-sv0WYXJ2QbDzY802sBe2JJQR%3DLT5g%40mail.gmail.com.
>
> For more options, visit https://groups.google.com/d/optout.



-- 
Gareth Rushgrove
@garethr

devopsweekly.com
morethanseven.net
garethrushgrove.com

-- 
You received this message because you are subscribed to the Google Groups 
"Puppet Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/puppet-dev/CAFi_6y%2BxV9Gk5f_C7S4%2BQ6Oxyk5JmdNsRY%3D2XchPTRqa29ywBQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: [Puppet-dev] Handling failures in prefetch

Reply via email to