Re: [swift-evolution] [Proposal] Random Unification

Dave DeLong via swift-evolution Wed, 04 Oct 2017 09:26:09 -0700

Riffing off this a bit…

I’d like to see minimal Random support in the stdlib, and then all this 
specialization stuff in a “non-standard” library, i.e., a library that ships 
with Swift but is not imported by default.


As I’m developing apps, I don’t need the massive autocompletion overload and 
cognitive overhead that comes from trying to understand all these proposed 
protocols and use-cases unless I am actually going to be needing randomization. 
If I need randomization, I should be explicitly opting-in to it by doing 
“import Random”.

Dave


> On Oct 3, 2017, at 10:31 PM, Alejandro Alonso via swift-evolution 
> <[email protected]> wrote:
> 
> I really like the schedule here. After reading for a while, I do agree with 
> Brent that the stdlib should be very primitive in the functionality that it 
> provides. I also agree that the most important part right now is designing 
> the internal crypto that the numeric types use to return their respective 
> random numbers. On the question of how we should handle not having enough 
> entropy from the device random, from a user's perspective it makes sense 
> that calling .random should just give me a random number, but from a 
> developer's perspective I see Optional being the best choice here. While I 
> think blocking could, in most cases, give the user an easier API, we have to 
> do this right and be safe by providing a value that indicates that there is 
> room for error. As for the generator abstraction, I believe there should be 
> a bare-bones protocol that sets a layout for new generators, and we should 
> focus on its requirements. 
> 
> Whether or not RandomAccessCollection and MutableCollection should get 
> .random and .shuffle/.shuffled in this first proposal is completely up in the 
> air for me. It makes sense, to me, to include .random in this proposal and 
> open another one for .shuffle/.shuffled, but I can see arguments that we 
> should create something separate for these two, or include all of it in this 
> proposal.
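
As a rough illustration of what the collection additions discussed here might look like, the following is a minimal sketch and not the API from any proposal. The method names `sketchShuffle(using:)` and `sketchRandomElement(using:)` are hypothetical, and the sketch relies on the stdlib's `RandomNumberGenerator` protocol and `SystemRandomNumberGenerator` type, which postdate this thread.

```swift
extension MutableCollection where Self: RandomAccessCollection {
    // Hypothetical name, chosen to avoid clashing with the stdlib's own shuffle.
    // Forward Fisher-Yates: for each position, swap in a uniformly chosen
    // element from the not-yet-fixed remainder of the collection.
    mutating func sketchShuffle<G: RandomNumberGenerator>(using generator: inout G) {
        var remaining = count
        var current = startIndex
        while remaining > 1 {
            let offset = Int.random(in: 0..<remaining, using: &generator)
            let other = index(current, offsetBy: offset)
            swapAt(current, other)
            formIndex(after: &current)
            remaining -= 1
        }
    }
}

extension RandomAccessCollection {
    // Hypothetical counterpart of the `.random` discussed in the thread.
    // Returns nil for an empty collection.
    func sketchRandomElement<G: RandomNumberGenerator>(using generator: inout G) -> Element? {
        guard !isEmpty else { return nil }
        let offset = Int.random(in: 0..<count, using: &generator)
        return self[index(startIndex, offsetBy: offset)]
    }
}

var generator = SystemRandomNumberGenerator()
var numbers = Array(1...10)
numbers.sketchShuffle(using: &generator)
let picked = numbers.sketchRandomElement(using: &generator)
```

Taking the generator as a parameter keeps the collection code independent of where the randomness comes from, which is one way to leave the shuffle question separable from the core proposal.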
> 
> - Alejandro
> 
> On Sep 27, 2017, 7:29 PM -0500, Xiaodi Wu <[email protected]>, wrote:
>> 
>> On Wed, Sep 27, 2017 at 00:18 Félix Cloutier <[email protected]> wrote:
>>> On Sep 26, 2017, at 16:14, Xiaodi Wu <[email protected]> wrote:
>>> 
>> 
>>> On Tue, Sep 26, 2017 at 11:26 AM, Félix Cloutier <[email protected]> wrote:
>>> 
>>> It's possible to use a CSPRNG-grade algorithm and seed it once to get a 
>>> reproducible sequence, but when you use it as a CSPRNG, you typically feed 
>>> entropy back into it at nondeterministic points to ensure that even if you 
>>> started with a bad seed, you'll eventually get to an alright state. Unless 
>>> you keep track of when entropy was mixed in and what the values were, 
>>> you'll never get a reproducible CSPRNG.
>>> 
>>> We would give developers a false sense of security if we provided them with 
>>> CSPRNG-grade algorithms that we called CSPRNGs and that they could seed 
>>> themselves. Just because it says "crypto-secure" in the name doesn't mean 
>>> that it'll be crypto-secure if it's seeded with time(). Therefore, 
>>> "reproducible" vs "non-reproducible" looks like a good distinction to me.
>>> 
>>> I disagree here, in two respects:
>>> 
>>> First, whether or not a particular PRNG is cryptographically secure is an 
>>> intrinsic property of the algorithm; whether it's "reproducible" or not is 
>>> determined by the published API. In other words, the distinction between 
>>> CSPRNG vs. non-CSPRNG is important to document because it's semantics that 
>>> cannot be deduced by the user otherwise, and it is an important one for 
>>> writing secure code because it tells you whether an attacker can predict 
>>> future outputs based only on observing past outputs. "Reproducible" in the 
>>> sense of seedable or not is trivially noted by inspection of the published 
>>> API, and it is rather immaterial to writing secure code.
>> 
>> 
>> Cryptographically secure is not a property that I'm comfortable applying to 
>> an algorithm. You cannot say that you've made a cryptographically secure 
>> thing just because you've used all the right algorithms: you also have to 
>> use them right, and one of the most critical components of a 
>> cryptographically secure PRNG is its seed.
>> 
>> A cryptographically secure algorithm isn’t sufficient, but it is necessary. 
>> That’s why it’s important to mark them as such. If I'm a careful developer, 
>> then it is absolutely important to me to know that I’m using a PRNG with a 
>> cryptographically secure algorithm, and that the particular implementation 
>> of that algorithm is correct and secure.
>> 
>> It is a *feature* of a lot of modern CSPRNGs that you can't seed them:
>> 
>> * You cannot seed or add entropy to std::random_device
>> 
>> Although std::random_device may in practice be backed by a software CSPRNG, 
>> IIUC, the intention is that it can provide access to a hardware 
>> non-deterministic source when available.
>> 
>> * You cannot seed or add entropy to CryptGenRandom
>> * You can only add entropy to /dev/(u)random
>> * You can only add entropy to BSD's arc4random
>> 
>> Ah, I see. I think we mean different things when we say PRNG. A PRNG is an 
>> entirely deterministic algorithm; the output is non-random and the algorithm 
>> itself requires no entropy. If a PRNG is seeded with a random sequence of 
>> bits, its output can "appear" to be random. A CSPRNG is a PRNG that fulfills 
>> certain criteria such that its output can be appropriate for use in 
>> cryptographic applications in place of a truly random sequence *if* the 
>> input to the CSPRNG is itself random.
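
To make the point about determinism concrete, here is a minimal sketch that is not part of any proposal in this thread. It uses SplitMix64, a small non-cryptographic generator chosen only for brevity; the type name and its API are assumptions for illustration. Two instances given the same seed produce the same sequence, which is the sense of "deterministic" used above.

```swift
// Illustrative only: SplitMix64 is a small, fast, NON-cryptographic PRNG.
struct SplitMix64 {
    private var state: UInt64

    init(seed: UInt64) {
        state = seed
    }

    // Advances the state and returns the next 64-bit value.
    mutating func next() -> UInt64 {
        state &+= 0x9E37_79B9_7F4A_7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58_476D_1CE4_E5B9
        z = (z ^ (z >> 27)) &* 0x94D0_49BB_1331_11EB
        return z ^ (z >> 31)
    }
}

var a = SplitMix64(seed: 42)
var b = SplitMix64(seed: 42)
let fromA = (0..<4).map { _ in a.next() }
let fromB = (0..<4).map { _ in b.next() }

// Same seed, same algorithm: the two sequences are identical.
print(fromA == fromB) // prints "true"
```

The output only "appears" random if the seed itself is unpredictable; with a fixed seed such as 42, anyone who knows the algorithm can reproduce every value.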
>> 
>> The examples you give above *incorporate* a CSPRNG, environment entropy, and 
>> a set of rules about when to mix in additional entropy in order to produce 
>> output indistinguishable from a random sequence, but they are *not* 
>> themselves really *pseudorandom* generators because they are not 
>> deterministic. Not only do such sources of random numbers not require an 
>> interface to allow seeding, they do not even have to be publicly 
>> instantiable: Swift need only expose a single thread-safe instance (or an 
>> instance per thread) of a single type that provides access to 
>> CryptGenRandom/urandom/arc4random, since after all the output of multiple 
>> instances of that type should be statistically indistinguishable from the 
>> output of only one.
>> 
>> What I was trying to respond to, by contrast, is the design of a hierarchy 
>> of protocols CSPRNG : PRNG (or, in Alejandro's proposal, UnsafeRandomSource 
>> : RandomSource) and the appropriate APIs to expose on each. This is entirely 
>> inapplicable to your examples. It stands to reason that a non-instantiable 
>> source of random numbers does not require a protocol of its own (a 
>> hypothetical RNG : CSPRNG), since there is no reason to implement (if done 
>> correctly) more than a single publicly non-instantiable singleton type that 
>> could conform to it. For that matter, the concrete type itself probably 
>> doesn't need *any* public API at all. Instead, extensions to standard 
>> library types such as Int that implement conformance to the protocol that 
>> Alejandro names "Randomizable" could call internal APIs to provide all the 
>> necessary functionality, and third-party types that need to conform to 
>> "Randomizable" could then in turn use `Int.random()` or `Double.random()` to 
>> implement their own conformance. In fact, the concrete random number 
>> generator type doesn't need to be public at all. All public interaction 
>> could be through APIs such as `Int.random()`.
>> 
>> 
>> Just because we can expose a seed interface doesn't mean we should, and in 
>> this case I believe that it would go against the prime objective of 
>> providing secure random numbers.
>> 
>> 
>> If we're talking about a Swift interface to a non-deterministic source of 
>> random numbers like urandom or arc4random, then, as I write above, not only 
>> do I agree that it doesn't need to be seedable, it also does not need to be 
>> instantiable at all, does not need to conform to a protocol that 
>> specifically requires the semantics of a non-deterministic source, does not 
>> need to expose any public interface whatsoever, and doesn't itself even need 
>> to be public. (Does it even need to be a type, as opposed to simply a free 
>> function?)
>> 
>> In fact, having reasoned through all of this, we can split the design task 
>> into two. The most essential part, which definitely should be part of the 
>> stdlib, would be an internal interface to a cryptographically secure 
>> platform-specific entropy source, a public protocol named something like 
>> Randomizable (to be bikeshedded), and the appropriate implementations on 
>> Boolean, binary integer, and floating point types to conform them to 
>> Randomizable so that users can write `Bool.random()` or `Int.random()`. The 
>> second part, which can be a separate proposal or even a standalone core 
>> library or third-party library, would be the protocols and concrete types 
>> that implement pseudorandom number generators, allowing for reproducible 
>> pseudorandom sequences. In other words, instead of PRNGs and CSPRNGs being 
>> the primitives on which `Int.random()` is implemented, `Int.random()` should 
>> be the standard library primitive which allows PRNGs and CSPRNGs to be 
>> seeded.
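
A rough sketch of the first half of that split, under stated assumptions: `Randomizable` follows the name used in the thread, but its single requirement, the `systemEntropy()` helper, and the `Die` type are hypothetical. The stdlib's `SystemRandomNumberGenerator` (which did not exist when this thread was written) stands in for the internal platform-specific source so that the example runs as is. It never fails, so the Optional here only models the failure case discussed elsewhere in the thread.

```swift
// Hypothetical stand-in for the internal, platform-specific entropy source.
// Assumption: SystemRandomNumberGenerator is used so the sketch is runnable;
// a real implementation would return nil when the platform call fails.
func systemEntropy() -> UInt64? {
    var source = SystemRandomNumberGenerator()
    let bits: UInt64 = source.next()
    return bits
}

// Name taken from the thread; the requirement's shape is an assumption.
protocol Randomizable {
    static func random() -> Self?
}

extension UInt64: Randomizable {
    static func random() -> UInt64? {
        return systemEntropy()
    }
}

extension Int: Randomizable {
    static func random() -> Int? {
        guard let bits = systemEntropy() else { return nil }
        return Int(truncatingIfNeeded: bits)
    }
}

// A third-party type builds its conformance on the stdlib primitives,
// never touching the entropy source directly.
struct Die: Randomizable {
    let face: Int

    static func random() -> Die? {
        guard let bits = UInt64.random() else { return nil }
        // Modulo bias is ignored here to keep the sketch short.
        return Die(face: Int(bits % 6) + 1)
    }
}

let roll = Die.random()
```

Note that nothing about the entropy source is public in this arrangement: callers only ever see `Int.random()`, `UInt64.random()`, and conformances built on top of them.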
>>> If your attacker can observe your seeding once, chances are that they can 
>>> observe your reseeding too; then, they can use their own implementation of 
>>> the PRNG (whether CSPRNG or non-CSPRNG) and reproduce your pseudorandom 
>>> sequence whether or not Swift exposes any particular API.
>> 
>> On Linux, the random devices are initially seeded with machine-specific but 
>> rather invariant data that makes /dev/urandom spit out predictable numbers. 
>> It is considered "seeded" after a root process writes POOL_SIZE bytes to it. 
>> On most implementations, this initial seed is stored on disk: when the 
>> computer shuts down, it reads POOL_SIZE bytes from /dev/urandom and saves 
>> them in a file, and the contents of that file are loaded back into /dev/urandom 
>> when the computer starts. A scenario where someone can read that file is 
>> certainly not less likely than a scenario where /dev/urandom was deleted. 
>> That doesn't mean that they have kernel code execution or that they can pry 
>> into your process, but they have a good shot at guessing your seed and 
>> subsequent RNG results if no stirring happens.
>> 
>> Sorry, I don't understand what you're getting at here. Again, I'm talking 
>> about deterministic algorithms, not non-deterministic sources of random 
>> numbers.
>> 
>>> Secondly, I see no reason to justify the notion that, simply because a PRNG 
>>> is cryptographically secure, we ought to hide the seeding initializer 
>>> (because one has to exist internally anyway) from the public. Obviously, 
>>> one use case for a deterministic PRNG is to get reproducible sequences of 
>>> random-appearing values; this can be useful whether the underlying 
>>> algorithm is cryptographically secure or not. There are innumerably many 
>>> ways to use data generated from a CSPRNG in non-cryptographically secure 
>>> ways and omitting or including a public seeding initializer does not change 
>>> that; in other words, using a deterministic seed for a CSPRNG would be a 
>>> bad idea in certain applications, but it's a deliberate act, and someone 
>>> who would mistakenly do that is clearly incapable of *using* the output 
>>> from the PRNG in a secure way either; put a third way, you would be hard 
>>> pressed to find a situation where it's true that "if only Swift had not 
>>> made the seeding initializer public, this author would have written secure 
>>> code, but instead the only security hole that existed in the code was 
>>> caused by the availability of a public seeding initializer mistakenly 
>>> used." The point of having both explicitly instantiable PRNGs and a layer 
>>> of simpler APIs like "Int.random()" is so that the less experienced user 
>>> can get the "right thing" by default, and the experienced user can 
>>> customize the behavior; any user that instantiates his or her own 
>>> ChaCha20Random instance is already calling for the power user interface; it 
>>> is reasonable to expose the underlying primitive operations (such as 
>>> seeding) so long as there are legitimate uses for it.
>> 
>> Nothing prevents us from using the same algorithm for a CSPRNG that is 
>> safely pre-seeded and a PRNG that people seed themselves, mind you. However, 
>> especially when it comes to security, there is a strong responsibility to 
>> drive developers into a pit of success: the most obvious thing to do has to 
>> be the right one, and suggesting to cryptographically-unaware developers 
>> that they have everything they need to manage their own seed is not a step 
>> in that direction.
>> 
>> I'm not opposed to a ChaCha20Random type; I'm opposed to explicitly calling 
>> it cryptographically-secure, because it is not unless you know what to do 
>> with it. It is emphatically not far-fetched to imagine a developer who 
>> thinks that they can outdo the standard library by using their own 
>> ChaCha20Random instance after it's been seeded with time() if we let them 
>> know that it's "cryptographically secure". If you're a power user and you 
>> don't like the default, known-good CSPRNG, then you're hopefully good enough 
>> to know that ChaCha20 is considered a cryptographically-secure algorithm 
>> without help from labels in the language, and you know how to operate it.
>> 
>>> I'm fully aware of the myths surrounding /dev/urandom and /dev/random. 
>>> /dev/urandom might never run out, but it is also possible for it not to be 
>>> initialized at all, as in the case of some VM setups. In some older 
>>> versions of iOS, /dev/[u]random is reportedly sandboxed out. On systems 
>>> where it is available, it can also be deleted, since it is a file. The 
>>> point is, all of these scenarios cause an error during seeding of a CSPRNG. 
>>> The question is, how to proceed in the face of inability to access entropy. 
>>> We must do something, because in that case we cannot return a 
>>> cryptographically secure answer. Rare trapping on invocation of 
>>> Int.random() or permanently waiting for a never-to-be-initialized 
>>> /dev/urandom would be terrible to debug, but returning an optional or 
>>> throwing all the time would be verbose. How to design this API?
>> 
>> If the only concern is that the system might not be initialized enough, I'd 
>> say that whatever returns an instance of a global, framework-seeded CSPRNG 
>> should return an Optional, and the random methods that use the global CSPRNG 
>> can trap and scream that the system is not initialized enough. If this is a 
>> likely error for you, you can check if the CSPRNG exists or not before 
>> jumping.
>> 
>> Also note that there is only one system for which Swift is officially 
>> distributed (Ubuntu 14.04) on which the only way to get entropy from the OS 
>> is to open a random device and read from it.
>> 
>> Again, I'm not only talking about urandom. As far as I'm aware, every API to 
>> retrieve cryptographically secure sequences of random bits on every platform 
>> for which Swift is distributed can potentially return an error instead of 
>> random bits. The question is, what design for our API is the most sensible 
>> way to deal with this contingency? On rethinking, I do believe that 
>> consistently returning an Optional is the best way to go about it, allowing 
>> the user to either (a) supply a deterministic fallback; (b) raise an error 
>> of their own choosing; or (c) trap--all with a minimum of fuss. This seems 
>> very Swifty to me.
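
The three options (a), (b), and (c) can be sketched as follows. This is only an illustration under stated assumptions: `randomInt(from:)`, `EntropyError`, and the injected `source` closures are hypothetical names, and the closure parameter exists solely so that the failing path can be exercised without a real platform failure.

```swift
// Hypothetical error type for option (b).
enum EntropyError: Error {
    case unavailable
}

// Hypothetical stand-in for the proposed Optional-returning API.
// `source` models the platform call, which may fail and return nil.
func randomInt(from source: () -> UInt64?) -> Int? {
    guard let bits = source() else { return nil }
    return Int(truncatingIfNeeded: bits)
}

// A source that always fails, to exercise the nil path.
let failing: () -> UInt64? = { nil }

// A source that succeeds, backed by the stdlib's system generator
// (an assumption; that type postdates this thread).
let working: () -> UInt64? = {
    var generator = SystemRandomNumberGenerator()
    let bits: UInt64 = generator.next()
    return bits
}

// (a) Supply a deterministic fallback.
let withFallback = randomInt(from: failing) ?? 4

// (b) Raise an error of the caller's choosing.
func randomIntOrThrow(from source: () -> UInt64?) throws -> Int {
    guard let value = randomInt(from: source) else {
        throw EntropyError.unavailable
    }
    return value
}

// (c) Trap: force-unwrapping crashes if the source fails.
// It is only safe in this sketch because `working` always succeeds.
let trapping = randomInt(from: working)!
```

Each choice costs the caller one short expression (`??`, `guard`, or `!`), which is the "minimum of fuss" argument for returning an Optional.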
>>  
>> 
>>>> * What should the default CSPRNG be? There are good arguments for using a 
>>>> cryptographically secure device random. (In my proposed implementation, 
>>>> for device random, I use Security.framework on Apple platforms (because 
>>>> /dev/urandom is not guaranteed to be available due to the sandbox, IIUC). 
>>>> On Linux platforms, I would prefer to use getrandom() and avoid using file 
>>>> system APIs, but getrandom() is new and unsupported on some versions of 
>>>> Ubuntu that Swift supports. This is an issue in and of itself.) Now, a 
>>>> number of these facilities strictly limit or do not guarantee availability 
>>>> of more than a small number of random bytes at a time; they are 
>>>> recommended for seeding other PRNGs but *not* as a routine source of 
>>>> random numbers. Therefore, although device random should be available to 
>>>> users, it probably shouldn’t be the default for the Swift standard library 
>>>> as it could have negative consequences for the system as a whole. There 
>>>> follows the significant task of implementing a CSPRNG correctly and 
>>>> securely for the default PRNG.
>>> 
>>> Theo gave a talk a few years ago 
>>> <https://www.youtube.com/watch?v=aWmLWx8ut20> on randomness and how these 
>>> problems are approached in LibreSSL.
>>> 
>>> Certainly, we can learn a lot from those like Theo who've dealt with the 
>>> issue. I'm not in a position to watch the talk at the moment; can you 
>>> summarize what the tl;dr version of it is?
>> 
>> I saw it three years ago, so I don't remember all the details. The gist is 
>> that:
>> 
>> * OpenBSD's random is available from extremely early in the boot process
>>   with reasonable entropy
>> * LibreSSL includes OpenBSD's arc4random, and it's a "good" PRNG (which
>>   doesn't actually use ARC4)
>> * That implementation of arc4random is good because it is fool-proof and it
>>   has basically no failure mode
>> * Stirring is good, having multiple components take random numbers from the
>>   same source probably makes results harder to guess too
>> * Getrandom/getentropy is in all ways better than reading from random devices
>> 
>> Vigorously agree on all points. Thanks for the summary. 
>> 
> _______________________________________________
> swift-evolution mailing list
> [email protected]
> https://lists.swift.org/mailman/listinfo/swift-evolution

