Re: [swift-evolution] [Proposal] Random Unification

Félix Cloutier via swift-evolution Wed, 04 Oct 2017 00:40:20 -0700

I'm really not enthusiastic about `random() -> Self?` or `random() throws -> 
Self` when the only possible error is that some global object hasn't been 
initialized.


The idea of having `random` straight on integers and floats and collections was 
to provide a simple interface, but using a global CSPRNG for those operations 
comes at a significant usability cost. I think that something has to go:

- Drop the random methods on FixedWidthInteger and FloatingPoint
- ...or drop the CSPRNG as a default
- Drop the optional/throws, and trap on error

I know I wouldn't use the `Int.random()` method if I had to unwrap every single 
result, when getting one non-nil result guarantees that the program won't see 
any other nil result again until it restarts.

Félix

> On Oct 3, 2017, at 11:44 PM, Jonathan Hull <[email protected]> wrote:
> 
> I like the idea of splitting it into 2 separate “Random” proposals.
> 
> The first would have Xiaodi’s built-in CSPRNG which only has the interface:
> 
> On FixedWidthInteger:
>       static func random() throws -> Self
>       static func random(in range: ClosedRange<Self>) throws -> Self
> 
> On Double:
>       static func random() throws -> Double
>       static func random(in range: ClosedRange<Double>) throws -> Double
> 
> (Everything else we want, like shuffled(), could be built in later proposals 
> by calling those functions)
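A minimal sketch of Jon's throwing interface, restricted to `Int` for brevity. `EntropySource`, `EntropyError`, `SystemEntropy`, `UnavailableEntropy` and `randomInt(in:using:)` are hypothetical names, and `SystemRandomNumberGenerator` is a Swift 4.2 API that shipped after this thread. The range reduction reuses the rejection step of OpenBSD's `arc4random_uniform` (which applies it to 32-bit values), so ranges whose width does not divide 2^64 are still sampled without bias.

```swift
// Hypothetical error: the platform entropy source could not be read.
enum EntropyError: Error {
    case unavailable
}

// Hypothetical abstraction over the platform source of random bits.
protocol EntropySource {
    func nextUInt64() throws -> UInt64
}

// Stand-in backed by SystemRandomNumberGenerator (Swift 4.2+), which never fails.
struct SystemEntropy: EntropySource {
    func nextUInt64() throws -> UInt64 {
        var generator = SystemRandomNumberGenerator()
        return generator.next()
    }
}

// Stand-in for a source that cannot be read (for example, never initialized).
struct UnavailableEntropy: EntropySource {
    func nextUInt64() throws -> UInt64 {
        throw EntropyError.unavailable
    }
}

// Returns a uniformly distributed Int in `range`, or throws if entropy is unavailable.
func randomInt<S: EntropySource>(in range: ClosedRange<Int>, using source: S) throws -> Int {
    // Number of values in the range, computed with wrapping arithmetic;
    // it wraps to 0 only when the range spans every Int on a 64-bit platform.
    let width = UInt64(UInt(bitPattern: range.upperBound &- range.lowerBound)) &+ 1
    if width == 0 {
        return try Int(truncatingIfNeeded: source.nextUInt64())
    }
    // 2^64 mod width: accepting only draws >= threshold leaves a number of
    // candidates that is an exact multiple of width, so `draw % width` is unbiased.
    let threshold = (0 &- width) % width
    var draw: UInt64
    repeat {
        draw = try source.nextUInt64()
    } while draw < threshold
    return range.lowerBound &+ Int(truncatingIfNeeded: draw % width)
}
```

Every call site has to write `try`, which is exactly the usability cost Félix objects to at the top of this message.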
> 
> The other option would be to remove the ‘throws’ from the above functions 
> (perhaps fatalError-ing), and provide an additional function which can be 
> used to check that there is enough entropy (so as to avoid the crash or fall 
> back to a worse source when the CSPRNG is unavailable).
> 
> 
> 
> Then a second proposal would bring in the concept of RandomSources (whatever 
> we call them), which can return however many random bytes you ask for… and a 
> protocol for types which know how to initialize themselves from those bytes.  
> That might be spelled like `static func random(using: RandomSource) -> Self`.  
> As a convenience, the source would also be able to create FixedWidthIntegers 
> and Doubles (both with and without a range), and would also have the 
> coinFlip() and oneIn(UInt) -> Bool functions. Most types should be able to 
> build themselves off of that.  There would be a default source which is built 
> from the first protocol.
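Jon's byte-oriented source could look roughly like the sketch below. Every name except `SystemRandomNumberGenerator` (Swift 4.2) is hypothetical; `Coin` stands in for any type that builds itself from a source, and the byte order used to assemble a `UInt64` is an arbitrary choice.

```swift
// Hypothetical byte-oriented source, following Jon's description.
protocol RandomSource {
    mutating func fill(_ bytes: inout [UInt8])
}

extension RandomSource {
    // Packs eight bytes into a UInt64 (big-endian order, chosen arbitrarily).
    mutating func nextUInt64() -> UInt64 {
        var bytes = [UInt8](repeating: 0, count: 8)
        fill(&bytes)
        return bytes.reduce(UInt64(0)) { acc, byte in (acc << 8) | UInt64(byte) }
    }

    // Fair coin: tests the lowest bit of one draw.
    mutating func coinFlip() -> Bool {
        return (nextUInt64() & 1) == 1
    }

    // True with probability 1/n; draws below 2^64 mod n are rejected to avoid modulo bias.
    mutating func oneIn(_ n: UInt) -> Bool {
        precondition(n > 0, "n must be positive")
        let bound = UInt64(n)
        let threshold = (0 &- bound) % bound
        var draw = nextUInt64()
        while draw < threshold {
            draw = nextUInt64()
        }
        return draw % bound == 0
    }
}

// Hypothetical protocol for types that know how to build themselves from a source.
protocol RandomInitializable {
    static func random<S: RandomSource>(using source: inout S) -> Self
}

// Hypothetical example type built on the source's convenience API.
enum Coin: RandomInitializable {
    case heads, tails

    static func random<S: RandomSource>(using source: inout S) -> Coin {
        return source.coinFlip() ? .heads : .tails
    }
}

// Default source, backed here by SystemRandomNumberGenerator (Swift 4.2+).
struct SystemSource: RandomSource {
    private var generator = SystemRandomNumberGenerator()

    mutating func fill(_ bytes: inout [UInt8]) {
        for index in bytes.indices {
            bytes[index] = generator.next()
        }
    }
}
```

Only `fill(_:)` is a requirement; everything else is derived in the extension, which is what lets "most types build themselves off of that".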
> 
> I also really think we should have a concept of Repeatably-Random as a 
> subprotocol for the second proposal.  I see far too many shipping apps which 
> have bugs due to using arc4random when they really needed a repeatable source 
> (e.g. patterns and lines jump around when you resize things). If it was an 
> easy option, people would use it when appropriate. This would just mean a 
> sub-protocol which has an initializer which takes a seed, and the ability to 
> save/restore state (similar to CGContexts).
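A sketch of the repeatable sub-protocol, assuming a `UInt64` seed and state. `RepeatableRandomSource` is a hypothetical name; SplitMix64 is used only because it is short and well known, and it is *not* cryptographically secure. Refining Swift 4.2's `RandomNumberGenerator` (which postdates this thread) lets the seeded source drive stdlib APIs such as `shuffled(using:)`.

```swift
// Hypothetical sub-protocol: a source that can be seeded and whose state can be
// saved and restored. Refining RandomNumberGenerator (Swift 4.2+) lets it drive
// stdlib APIs such as shuffled(using:) and Int.random(in:using:).
protocol RepeatableRandomSource: RandomNumberGenerator {
    init(seed: UInt64)
    var state: UInt64 { get set }
}

// SplitMix64: short and deterministic, but NOT cryptographically secure.
struct SplitMix64: RepeatableRandomSource {
    var state: UInt64

    init(seed: UInt64) {
        state = seed
    }

    mutating func next() -> UInt64 {
        state &+= 0x9E37_79B9_7F4A_7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58_476D_1CE4_E5B9
        z = (z ^ (z >> 27)) &* 0x94D0_49BB_1331_11EB
        return z ^ (z >> 31)
    }
}
```

Two instances created with the same seed produce the same sequence, and reading `state` then assigning it back replays the same values, which is the CGContext-style save/restore described above.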
> 
> The second proposal would also include things like shuffled() and 
> shuffled(using:).
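`shuffled(using:)` is a thin layer over whatever source it is handed. Below is a sketch of the classic Fisher-Yates shuffle driven by any generator; `fisherYatesShuffled(using:)` is a hypothetical name, while `RandomNumberGenerator` and `Int.random(in:using:)` are Swift 4.2 APIs that postdate this thread.

```swift
extension Array {
    // Hypothetical helper: Fisher-Yates shuffle driven by any RandomNumberGenerator.
    func fisherYatesShuffled<G: RandomNumberGenerator>(using generator: inout G) -> [Element] {
        var result = self
        guard result.count > 1 else { return result }
        // Walk from the last index down to 1, swapping each slot with a uniformly
        // chosen slot at or before it; each of the n! orders is equally likely.
        for i in stride(from: result.count - 1, to: 0, by: -1) {
            let j = Int.random(in: 0...i, using: &generator)
            result.swapAt(i, j)
        }
        return result
    }
}
```

Passing a seeded generator makes the shuffle repeatable; passing the system generator makes it unpredictable, with no change to the shuffling code.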
> 
> Thanks,
> Jon
> 
> 
> 
>> On Oct 3, 2017, at 9:31 PM, Alejandro Alonso <[email protected] 
>> <mailto:[email protected]>> wrote:
>> 
>> I really like the schedule here. After reading for a while, I do agree with 
>> Brent that the stdlib should be very primitive in the functionality it 
>> provides. I also agree that the most important part right now is designing 
>> the internal crypto that the numeric types rely on to return their 
>> respective random numbers. On the question of how to handle insufficient 
>> entropy from the device random: from a user's perspective it makes sense 
>> that calling .random should just give me a random number, but from a 
>> developer's perspective I see Optional as the best choice here. While 
>> blocking could, in most cases, give the user an easier API, we have to do 
>> this right and be safe by returning a value that signals the call can 
>> fail. As for the generator abstraction, I believe there should be a 
>> bare-bones protocol that sets a layout for new generators, and we should 
>> focus on its requirements. 
>> 
>> Whether or not RandomAccessCollection and MutableCollection should get 
>> .random and .shuffle/.shuffled in this first proposal is completely up in 
>> the air for me. It makes sense to me to include .random in this proposal 
>> and open another one for .shuffle/.shuffled, but I can see arguments for 
>> creating something separate for these two, or for including all of it in 
>> this proposal.
>> 
>> - Alejandro
>> 
>> On Sep 27, 2017, 7:29 PM -0500, Xiaodi Wu <[email protected] 
>> <mailto:[email protected]>>, wrote:
>>> 
>>> On Wed, Sep 27, 2017 at 00:18 Félix Cloutier <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>>> On Sep 26, 2017, at 4:14 PM, Xiaodi Wu <[email protected] 
>>>> <mailto:[email protected]>> wrote:
>>>> 
>>> 
>>>> On Tue, Sep 26, 2017 at 11:26 AM, Félix Cloutier <[email protected] 
>>>> <mailto:[email protected]>> wrote:
>>>> 
>>>> It's possible to use a CSPRNG-grade algorithm and seed it once to get a 
>>>> reproducible sequence, but when you use it as a CSPRNG, you typically feed 
>>>> entropy back into it at nondeterministic points to ensure that even if you 
>>>> started with a bad seed, you'll eventually get to an alright state. Unless 
>>>> you keep track of when entropy was mixed in and what the values were, 
>>>> you'll never get a reproducible CSPRNG.
>>>> 
>>>> We would give developers a false sense of security if we provided them 
>>>> with CSPRNG-grade algorithms that we called CSPRNGs and that they could 
>>>> seed themselves. Just because it says "crypto-secure" in the name doesn't 
>>>> mean that it'll be crypto-secure if it's seeded with time(). Therefore, 
>>>> "reproducible" vs "non-reproducible" looks like a good distinction to me.
>>>> 
>>>> I disagree here, in two respects:
>>>> 
>>>> First, whether or not a particular PRNG is cryptographically secure is an 
>>>> intrinsic property of the algorithm; whether it's "reproducible" or not is 
>>>> determined by the published API. In other words, the distinction between 
>>>> CSPRNG vs. non-CSPRNG is important to document because it's a semantic property that 
>>>> cannot be deduced by the user otherwise, and it is an important one for 
>>>> writing secure code because it tells you whether an attacker can predict 
>>>> future outputs based only on observing past outputs. "Reproducible" in the 
>>>> sense of seedable or not is trivially noted by inspection of the published 
>>>> API, and it is rather immaterial to writing secure code.
>>> 
>>> 
>>> Cryptographically secure is not a property that I'm comfortable applying to 
>>> an algorithm. You cannot say that you've made a cryptographically secure 
>>> thing just because you've used all the right algorithms: you also have to 
>>> use them right, and one of the most critical components of a 
>>> cryptographically secure PRNG is its seed.
>>> 
>>> A cryptographically secure algorithm isn’t sufficient, but it is necessary. 
>>> That’s why it’s important to mark them as such. If I'm a careful developer, 
>>> then it is absolutely important to me to know that I’m using a PRNG with a 
>>> cryptographically secure algorithm, and that the particular implementation 
>>> of that algorithm is correct and secure.
>>> 
>>> It is a *feature* of a lot of modern CSPRNGs that you can't seed them:
>>> 
>>> You cannot seed or add entropy to std::random_device
>>> 
>>> Although std::random_device may in practice be backed by a software CSPRNG, 
>>> IIUC, the intention is that it can provide access to a hardware 
>>> non-deterministic source when available.
>>> 
>>> You cannot seed or add entropy to CryptGenRandom
>>> You can only add entropy to /dev/(u)random
>>> You can only add entropy to BSD's arc4random
>>> 
>>> Ah, I see. I think we mean different things when we say PRNG. A PRNG is an 
>>> entirely deterministic algorithm; the output is non-random and the 
>>> algorithm itself requires no entropy. If a PRNG is seeded with a random 
>>> sequence of bits, its output can "appear" to be random. A CSPRNG is a PRNG 
>>> that fulfills certain criteria such that its output can be appropriate for 
>>> use in cryptographic applications in place of a truly random sequence *if* 
>>> the input to the CSPRNG is itself random.
>>> 
>>> The examples you give above *incorporate* a CSPRNG, environment entropy, 
>>> and a set of rules about when to mix in additional entropy in order to 
>>> produce output indistinguishable from a random sequence, but they are *not* 
>>> themselves really *pseudorandom* generators because they are not 
>>> deterministic. Not only do such sources of random numbers not require an 
>>> interface to allow seeding, they do not even have to be publicly 
>>> instantiable: Swift need only expose a single thread-safe instance (or an 
>>> instance per thread) of a single type that provides access to 
>>> CryptGenRandom/urandom/arc4random, since after all the output of multiple 
>>> instances of that type should be statistically indistinguishable from the 
>>> output of only one.
>>> 
>>> What I was trying to respond to, by contrast, is the design of a hierarchy 
>>> of protocols CSPRNG : PRNG (or, in Alejandro's proposal, UnsafeRandomSource 
>>> : RandomSource) and the appropriate APIs to expose on each. This is 
>>> entirely inapplicable to your examples. It stands to reason that a 
>>> non-instantiable source of random numbers does not require a protocol of 
>>> its own (a hypothetical RNG : CSPRNG), since there is no reason to 
>>> implement (if done correctly) more than a single publicly non-instantiable 
>>> singleton type that could conform to it. For that matter, the concrete type 
>>> itself probably doesn't need *any* public API at all. Instead, extensions 
>>> to standard library types such as Int that implement conformance to the 
>>> protocol that Alejandro names "Randomizable" could call internal APIs to 
>>> provide all the necessary functionality, and third-party types that need to 
>>> conform to "Randomizable" could then in turn use `Int.random()` or 
>>> `Double.random()` to implement their own conformance. In fact, the concrete 
>>> random number generator type doesn't need to be public at all. All public 
>>> interaction could be through APIs such as `Int.random()`.
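A sketch of that layering, setting aside for now how entropy failures are reported. `platformRandomBits()`, `Randomizable` and `GridPoint` are hypothetical names; the internal entry point is backed by Swift 4.2's `SystemRandomNumberGenerator` as a stand-in for CryptGenRandom/urandom/arc4random. Nothing about the underlying generator is public: `Int` conforms through the internal function, and the third-party type conforms only through `Int.random()`.

```swift
// Hypothetical internal entry point to the platform CSPRNG; not visible outside
// this file. Backed here by SystemRandomNumberGenerator (Swift 4.2+).
private func platformRandomBits() -> UInt64 {
    var generator = SystemRandomNumberGenerator()
    return generator.next()
}

// Hypothetical public protocol (name to be bikeshedded, as in the thread).
protocol Randomizable {
    static func random() -> Self
}

// Standard library type: conforms by calling the internal entry point.
extension Int: Randomizable {
    static func random() -> Int {
        return Int(truncatingIfNeeded: platformRandomBits())
    }
}

// Hypothetical third-party type: conforms using only the public Int.random().
struct GridPoint: Randomizable {
    var x: Int
    var y: Int

    static func random() -> GridPoint {
        return GridPoint(x: Int.random(), y: Int.random())
    }
}
```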
>>> 
>>> 
>>> Just because we can expose a seed interface doesn't mean we should, and in 
>>> this case I believe that it would go against the prime objective of 
>>> providing secure random numbers.
>>> 
>>> 
>>> If we're talking about a Swift interface to a non-deterministic source of 
>>> random numbers like urandom or arc4random, then, as I write above, not only 
>>> do I agree that it doesn't need to be seedable, it also does not need to be 
>>> instantiable at all, does not need to conform to a protocol that 
>>> specifically requires the semantics of a non-deterministic source, does not 
>>> need to expose any public interface whatsoever, and doesn't itself even 
>>> need to be public. (Does it even need to be a type, as opposed to simply a 
>>> free function?)
>>> 
>>> In fact, having reasoned through all of this, we can split the design task 
>>> into two. The most essential part, which definitely should be part of the 
>>> stdlib, would be an internal interface to a cryptographically secure 
>>> platform-specific entropy source, a public protocol named something like 
>>> Randomizable (to be bikeshedded), and the appropriate implementations on 
>>> Boolean, binary integer, and floating point types to conform them to 
>>> Randomizable so that users can write `Bool.random()` or `Int.random()`. The 
>>> second part, which can be a separate proposal or even a standalone core 
>>> library or third-party library, would be the protocols and concrete types 
>>> that implement pseudorandom number generators, allowing for reproducible 
>>> pseudorandom sequences. In other words, instead of PRNGs and CSPRNGs being 
>>> the primitives on which `Int.random()` is implemented, `Int.random()` 
>>> should be the standard library primitive which allows PRNGs and CSPRNGs to 
>>> be seeded.
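The second part could then seed deterministic generators from the primitive. The sketch below uses xorshift64*, a small non-cryptographic PRNG, purely as an example; `Xorshift64Star` and `secureRandomUInt64()` are hypothetical names, the latter standing in for the stdlib primitive (backed here by Swift 4.2's `SystemRandomNumberGenerator`).

```swift
// Stand-in for the stdlib primitive, assumed to draw from the platform CSPRNG.
// Backed here by SystemRandomNumberGenerator (Swift 4.2+).
func secureRandomUInt64() -> UInt64 {
    var generator = SystemRandomNumberGenerator()
    return generator.next()
}

// xorshift64*: a small deterministic PRNG, NOT cryptographically secure.
struct Xorshift64Star: RandomNumberGenerator {
    private var state: UInt64

    // Reproducible: the caller supplies the seed. Zero is remapped because
    // an all-zero xorshift state would stay zero forever.
    init(seed: UInt64) {
        state = seed == 0 ? 0x9E37_79B9_7F4A_7C15 : seed
    }

    // Non-reproducible: seeded from the secure primitive.
    init() {
        self.init(seed: secureRandomUInt64())
    }

    mutating func next() -> UInt64 {
        state ^= state >> 12
        state ^= state << 25
        state ^= state >> 27
        return state &* 0x2545_F491_4F6C_DD1D
    }
}
```

`init(seed:)` gives a reproducible sequence, while `init()` gives a non-reproducible one without the caller writing any seeding code.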
>>>> If your attacker can observe your seeding once, chances are that they can 
>>>> observe your reseeding too; then, they can use their own implementation of 
>>>> the PRNG (whether CSPRNG or non-CSPRNG) and reproduce your pseudorandom 
>>>> sequence whether or not Swift exposes any particular API.
>>> 
>>> On Linux, the random devices are initially seeded with machine-specific but 
>>> rather invariant data that makes /dev/urandom spit out predictable numbers. 
>>> It is considered "seeded" after a root process writes POOL_SIZE bytes to 
>>> it. On most implementations, this initial seed is stored on disk: when the 
>>> computer shuts down, it reads POOL_SIZE bytes from /dev/urandom and saves 
>>> it in a file, and the contents of that file are loaded back into 
>>> /dev/urandom when the computer starts. A scenario where someone can read 
>>> that file is certainly not less likely than a scenario where /dev/urandom 
>>> was deleted. That doesn't mean that they have kernel code execution or that 
>>> they can pry into your process, but they have a good shot at guessing your 
>>> seed and subsequent RNG results if no stirring happens.
>>> 
>>> Sorry, I don't understand what you're getting at here. Again, I'm talking 
>>> about deterministic algorithms, not non-deterministic sources of random 
>>> numbers.
>>> 
>>>> Secondly, I see no reason to justify the notion that, simply because a 
>>>> PRNG is cryptographically secure, we ought to hide the seeding initializer 
>>>> (because one has to exist internally anyway) from the public. Obviously, 
>>>> one use case for a deterministic PRNG is to get reproducible sequences of 
>>>> random-appearing values; this can be useful whether the underlying 
>>>> algorithm is cryptographically secure or not. There are innumerably many 
>>>> ways to use data generated from a CSPRNG in non-cryptographically secure 
>>>> ways and omitting or including a public seeding initializer does not 
>>>> change that; in other words, using a deterministic seed for a CSPRNG would 
>>>> be a bad idea in certain applications, but it's a deliberate act, and 
>>>> someone who would mistakenly do that is clearly incapable of *using* the 
>>>> output from the PRNG in a secure way either; put a third way, you would be 
>>>> hard pressed to find a situation where it's true that "if only Swift had 
>>>> not made the seeding initializer public, this author would have written 
>>>> secure code, but instead the only security hole that existed in the code 
>>>> was caused by the availability of a public seeding initializer mistakenly 
>>>> used." The point of having both explicitly instantiable PRNGs and a layer 
>>>> of simpler APIs like "Int.random()" is so that the less experienced user 
>>>> can get the "right thing" by default, and the experienced user can 
>>>> customize the behavior; any user that instantiates his or her own 
>>>> ChaCha20Random instance is already calling for the power user interface; 
>>>> it is reasonable to expose the underlying primitive operations (such as 
>>>> seeding) so long as there are legitimate uses for it.
>>> 
>>> Nothing prevents us from using the same algorithm for a CSPRNG that is 
>>> safely pre-seeded and a PRNG that people seed themselves, mind you. 
>>> However, especially when it comes to security, there is a strong 
>>> responsibility to drive developers into a pit of success: the most obvious 
>>> thing to do has to be the right one, and suggesting to 
>>> cryptographically-unaware developers that they have everything they need to 
>>> manage their own seed is not a step in that direction.
>>> 
>>> I'm not opposed to a ChaCha20Random type; I'm opposed to explicitly calling 
>>> it cryptographically-secure, because it is not unless you know what to do 
>>> with it. It is emphatically not far-fetched to imagine a developer who 
>>> thinks that they can outdo the standard library by using their own 
>>> ChaCha20Random instance after it's been seeded with time() if we let them 
>>> know that it's "cryptographically secure". If you're a power user and you 
>>> don't like the default, known-good CSPRNG, then you're hopefully good 
>>> enough to know that ChaCha20 is considered a cryptographically-secure 
>>> algorithm without the language labeling it for you, and you know how to 
>>> operate it.
>>> 
>>>> I'm fully aware of the myths surrounding /dev/urandom and /dev/random. 
>>>> /dev/urandom might never run out, but it is also possible for it not to be 
>>>> initialized at all, as in the case of some VM setups. In some older 
>>>> versions of iOS, /dev/[u]random is reportedly sandboxed out. On systems 
>>>> where it is available, it can also be deleted, since it is a file. The 
>>>> point is, all of these scenarios cause an error during seeding of a 
>>>> CSPRNG. The question is, how to proceed in the face of inability to access 
>>>> entropy. We must do something, because we cannot then return a 
>>>> cryptographically secure answer. Rare trapping on invocation of 
>>>> Int.random() or permanently waiting for a never-to-be-initialized 
>>>> /dev/urandom would be terrible to debug, but returning an optional or 
>>>> throwing all the time would be verbose. How to design this API?
>>> 
>>> If the only concern is that the system might not be initialized enough, I'd 
>>> say that whatever returns an instance of a global, framework-seeded CSPRNG 
>>> should return an Optional, and the random methods that use the global 
>>> CSPRNG can trap and scream that the system is not initialized enough. If 
>>> this is a likely error for you, you can check if the CSPRNG exists or not 
>>> before jumping.
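A sketch of Félix's suggestion under stated assumptions: `GlobalCSPRNG`, its `shared` accessor and `UInt64.secureRandom()` are hypothetical names, and because Swift 4.2's `SystemRandomNumberGenerator` (used as the backing source) has no failure mode, `shared` never actually returns nil here.

```swift
// Hypothetical handle to the global, framework-seeded CSPRNG.
struct GlobalCSPRNG {
    private var generator = SystemRandomNumberGenerator()

    // Hypothetical readiness check: a real one would return nil while the OS
    // entropy pool is not seeded. This stand-in always succeeds.
    static var shared: GlobalCSPRNG? {
        return GlobalCSPRNG()
    }

    mutating func next() -> UInt64 {
        return generator.next()
    }
}

extension UInt64 {
    // Hypothetical convenience: traps if the system is not initialized enough.
    static func secureRandom() -> UInt64 {
        guard var rng = GlobalCSPRNG.shared else {
            fatalError("system random source is not initialized")
        }
        return rng.next()
    }
}
```

Code that considers the failure likely can test `GlobalCSPRNG.shared != nil` once before relying on the trapping convenience; everyone else gets a non-optional result.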
>>> 
>>> Also note that there is only one system for which Swift is officially 
>>> distributed (Ubuntu 14.04) on which the only way to get entropy from the OS 
>>> is to open a random device and read from it.
>>> 
>>> Again, I'm not only talking about urandom. As far as I'm aware, every API 
>>> to retrieve cryptographically secure sequences of random bits on every 
>>> platform for which Swift is distributed can potentially return an error 
>>> instead of random bits. The question is, what design for our API is the 
>>> most sensible way to deal with this contingency? On rethinking, I do 
>>> believe that consistently returning an Optional is the best way to go about 
>>> it, allowing the user to either (a) supply a deterministic fallback; (b) 
>>> raise an error of their own choosing; or (c) trap--all with a minimum of 
>>> fuss. This seems very Swifty to me.
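The three options Xiaodi lists map onto ordinary Optional handling at the call site. In this sketch `randomInt(in:entropyAvailable:)`, `roll`, `token`, `sample` and `TokenError` are hypothetical; the `entropyAvailable` flag only simulates the failure case, and the values come from Swift 4.2's `Int.random(in:)`.

```swift
// Hypothetical Optional-returning primitive; `entropyAvailable` simulates failure.
func randomInt(in range: ClosedRange<Int>, entropyAvailable: Bool = true) -> Int? {
    guard entropyAvailable else { return nil }
    return Int.random(in: range)
}

enum TokenError: Error {
    case noEntropy
}

// (a) Supply a deterministic fallback.
func roll(entropyAvailable: Bool = true) -> Int {
    return randomInt(in: 1...6, entropyAvailable: entropyAvailable) ?? 1
}

// (b) Raise an error of the caller's own choosing.
func token(entropyAvailable: Bool = true) throws -> Int {
    guard let value = randomInt(in: 0...Int.max, entropyAvailable: entropyAvailable) else {
        throw TokenError.noEntropy
    }
    return value
}

// (c) Trap if no value is available.
func sample() -> Int {
    return randomInt(in: 1...100)!
}
```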
>>>  
>>> 
>>>>> * What should the default CSPRNG be? There are good arguments for using a 
>>>>> cryptographically secure device random. (In my proposed implementation, 
>>>>> for device random, I use Security.framework on Apple platforms (because 
>>>>> /dev/urandom is not guaranteed to be available due to the sandbox, IIUC). 
>>>>> On Linux platforms, I would prefer to use getrandom() and avoid using 
>>>>> file system APIs, but getrandom() is new and unsupported on some versions 
>>>>> of Ubuntu that Swift supports. This is an issue in and of itself.) Now, a 
>>>>> number of these facilities strictly limit or do not guarantee 
>>>>> availability of more than a small number of random bytes at a time; they 
>>>>> are recommended for seeding other PRNGs but *not* as a routine source of 
>>>>> random numbers. Therefore, although device random should be available to 
>>>>> users, it probably shouldn’t be the default for the Swift standard 
>>>>> library as it could have negative consequences for the system as a whole. 
>>>>> There follows the significant task of implementing a CSPRNG correctly and 
>>>>> securely for the default PRNG.
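Because those facilities cap how much they return per call, seeding a userspace CSPRNG means gathering the seed in bounded chunks. A sketch, assuming a reader closure that returns nil on failure: `collectSeed(count:read:)` and `simulatedDeviceRead` are hypothetical, and the 256-byte cap is the documented per-call limit of `getentropy()`, applied here to every source as an assumption.

```swift
// Per-call limit of getentropy(); assumed here to apply to every device source.
let maxDeviceRequest = 256

// Collects `count` seed bytes by calling `read` in chunks of at most
// maxDeviceRequest bytes. Returns nil if any read fails or comes back short,
// leaving the caller to decide how to proceed.
func collectSeed(count: Int, read: (Int) -> [UInt8]?) -> [UInt8]? {
    var seed: [UInt8] = []
    seed.reserveCapacity(count)
    while seed.count < count {
        let request = min(maxDeviceRequest, count - seed.count)
        guard let chunk = read(request), chunk.count == request else {
            return nil
        }
        seed.append(contentsOf: chunk)
    }
    return seed
}

// Hypothetical stand-in device reader backed by SystemRandomNumberGenerator
// (Swift 4.2+); it refuses oversized requests the way getentropy() does.
func simulatedDeviceRead(_ byteCount: Int) -> [UInt8]? {
    guard byteCount <= maxDeviceRequest else { return nil }
    var generator = SystemRandomNumberGenerator()
    return (0..<byteCount).map { _ -> UInt8 in generator.next() }
}
```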
>>>> 
>>>> Theo gave a talk a few years ago 
>>>> <https://www.youtube.com/watch?v=aWmLWx8ut20> on randomness and how these 
>>>> problems are approached in LibreSSL.
>>>> 
>>>> Certainly, we can learn a lot from those like Theo who've dealt with the 
>>>> issue. I'm not in a position to watch the talk at the moment; can you 
>>>> summarize what the tl;dr version of it is?
>>> 
>>> I saw it three years ago, so I don't remember all the details. The gist is 
>>> that:
>>> 
>>> - OpenBSD's random is available from extremely early in the boot process 
>>>   with reasonable entropy
>>> - LibreSSL includes OpenBSD's arc4random, and it's a "good" PRNG (which 
>>>   doesn't actually use ARC4)
>>> - That implementation of arc4random is good because it is fool-proof and 
>>>   has basically no failure mode
>>> - Stirring is good; having multiple components take random numbers from 
>>>   the same source probably makes results harder to guess too
>>> - getrandom/getentropy is in all ways better than reading from random 
>>>   devices
>>> 
>>> Vigorously agree on all points. Thanks for the summary. 
>>> 
>

_______________________________________________
swift-evolution mailing list
[email protected]
https://lists.swift.org/mailman/listinfo/swift-evolution
