I'm really not enthusiastic about `random() -> Self?` or `random() throws -> Self` when the only possible error is that some global object hasn't been initialized.
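To make the call-site cost concrete, here is a minimal sketch of the two shapes, plus a trapping variant for comparison. Everything in it is hypothetical: `HypotheticalGenerator`, `RandomError` and the three method names are made up for illustration, and `next()` returns a fixed placeholder rather than real random data.

```swift
// Hypothetical error for the "global object hasn't been initialized" case.
enum RandomError: Error {
    case generatorUnavailable
}

// Hypothetical stand-in for the global generator. `isInitialized` models
// whether the underlying source was ready when the process started.
struct HypotheticalGenerator {
    let isInitialized: Bool

    // Placeholder value only; a real implementation would return random bits.
    private func next() -> Int { return 4 }

    // Shape 1: every caller has to unwrap.
    func randomOptional() -> Int? {
        return isInitialized ? next() : nil
    }

    // Shape 2: every caller has to `try`.
    func randomThrowing() throws -> Int {
        guard isInitialized else { throw RandomError.generatorUnavailable }
        return next()
    }

    // Shape 3: no ceremony at the call site, but traps if not initialized.
    func randomTrapping() -> Int {
        precondition(isInitialized, "global generator is not initialized")
        return next()
    }
}

let generator = HypotheticalGenerator(isInitialized: true)

// Optional: unwrap at each use.
if let value = generator.randomOptional() {
    print("optional:", value)
}

// Throwing: do/catch (or try?) at each use.
do {
    let value = try generator.randomThrowing()
    print("throwing:", value)
} catch {
    print("error:", error)
}

// Trapping: a plain call.
print("trapping:", generator.randomTrapping())
```

The point of the sketch is only that the first two shapes push handling code to every call site, even though the failure condition is decided once per process.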
The idea of having `random` straight on integers and floats and collections was to provide a simple interface, but using a global CSPRNG for those operations comes at a significant usability cost. I think that something has to go:

* Drop the random methods on FixedWidthInteger, FloatingPoint
* ...or drop the CSPRNG as a default
* Drop the optional/throws, and trap on error

I know I wouldn't use the `Int.random()` method if I had to unwrap every single result, when getting one non-nil result guarantees that the program won't see any other nil result again until it restarts.

Félix

> On Oct 3, 2017, at 23:44, Jonathan Hull <[email protected]> wrote:
>
> I like the idea of splitting it into 2 separate “Random” proposals.
>
> The first would have Xiaodi’s built-in CSPRNG which only has the interface:
>
> On FixedWidthInteger:
>     static func random() throws -> Self
>     static func random(in range: ClosedRange<Self>) throws -> Self
>
> On Double:
>     static func random() throws -> Double
>     static func random(in range: ClosedRange<Double>) throws -> Double
>
> (Everything else we want, like shuffled(), could be built in later proposals
> by calling those functions)
>
> The other option would be to remove the ‘throws’ from the above functions
> (perhaps fatalError-ing), and provide an additional function which can be
> used to check that there is enough entropy (so as to avoid the crash or fall
> back to a worse source when the CSPRNG is unavailable).
>
> Then a second proposal would bring in the concept of RandomSources (whatever
> we call them), which can return however many random bytes you ask for… and a
> protocol for types which know how to initialize themselves from those bytes.
> That might be spelled like 'static func random(using: RandomSource)->Self'.
> As a convenience, the source would also be able to create FixedWidthIntegers
> and Doubles (both with and without a range), and would also have the
> coinFlip() and oneIn(UInt)->Bool functions.
> Most types should be able to
> build themselves off of that. There would be a default source which is built
> from the first protocol.
>
> I also really think we should have a concept of Repeatably-Random as a
> subprotocol for the second proposal. I see far too many shipping apps which
> have bugs due to using arc4random when they really needed a repeatable source
> (e.g. patterns and lines jump around when you resize things). If it was an
> easy option, people would use it when appropriate. This would just mean a
> sub-protocol which has an initializer which takes a seed, and the ability to
> save/restore state (similar to CGContexts).
>
> The second proposal would also include things like shuffled() and
> shuffled(using:).
>
> Thanks,
> Jon
>
>> On Oct 3, 2017, at 9:31 PM, Alejandro Alonso <[email protected]
>> <mailto:[email protected]>> wrote:
>>
>> I really like the schedule here. After reading for a while, I do agree with
>> Brent that stdlib should be very primitive in the functionality that it
>> provides. I also agree that the most important part right now is designing
>> the internal crypto which the numeric types use to return their respective
>> random number. On the discussion of how we should handle not enough entropy
>> with the device random, from a user's perspective it makes sense that calling
>> .random should just give me a random number, but from a developer's
>> perspective I see Optional being the best choice here. While I think
>> blocking could, in most cases, provide the user an easier API, we have to do
>> this right and be safe here by providing a value that indicates that there
>> is room for error here. As for the generator abstraction, I believe there
>> should be a bare basic protocol that sets a layout for new generators and
>> should be focusing on its requirements.
>>
>> Whether or not RandomAccessCollection and MutableCollection should get
>> .random and .shuffle/.shuffled in this first proposal is completely up in
>> the air for me. It makes sense, to me, to include the .random in this
>> proposal and open another one for .shuffle/.shuffled, but I can see
>> arguments that say we should create something separate for these two, or
>> include all of it in this proposal.
>>
>> - Alejandro
>>
>> On Sep 27, 2017, 7:29 PM -0500, Xiaodi Wu <[email protected]
>> <mailto:[email protected]>>, wrote:
>>>
>>> On Wed, Sep 27, 2017 at 00:18 Félix Cloutier <[email protected]
>>> <mailto:[email protected]>> wrote:
>>>> On Sep 26, 2017, at 16:14, Xiaodi Wu <[email protected]
>>>> <mailto:[email protected]>> wrote:
>>>>
>>>> On Tue, Sep 26, 2017 at 11:26 AM, Félix Cloutier <[email protected]
>>>> <mailto:[email protected]>> wrote:
>>>>
>>>> It's possible to use a CSPRNG-grade algorithm and seed it once to get a
>>>> reproducible sequence, but when you use it as a CSPRNG, you typically feed
>>>> entropy back into it at nondeterministic points to ensure that even if you
>>>> started with a bad seed, you'll eventually get to an alright state. Unless
>>>> you keep track of when entropy was mixed in and what the values were,
>>>> you'll never get a reproducible CSPRNG.
>>>>
>>>> We would give developers a false sense of security if we provided them
>>>> with CSPRNG-grade algorithms that we called CSPRNGs and that they could
>>>> seed themselves. Just because it says "crypto-secure" in the name doesn't
>>>> mean that it'll be crypto-secure if it's seeded with time(). Therefore,
>>>> "reproducible" vs "non-reproducible" looks like a good distinction to me.
>>>>
>>>> I disagree here, in two respects:
>>>>
>>>> First, whether or not a particular PRNG is cryptographically secure is an
>>>> intrinsic property of the algorithm; whether it's "reproducible" or not is
>>>> determined by the published API.
>>>> In other words, the distinction between
>>>> CSPRNG vs. non-CSPRNG is important to document because it's semantics that
>>>> cannot be deduced by the user otherwise, and it is an important one for
>>>> writing secure code because it tells you whether an attacker can predict
>>>> future outputs based only on observing past outputs. "Reproducible" in the
>>>> sense of seedable or not is trivially noted by inspection of the published
>>>> API, and it is rather immaterial to writing secure code.
>>>
>>> Cryptographically secure is not a property that I'm comfortable applying to
>>> an algorithm. You cannot say that you've made a cryptographically secure
>>> thing just because you've used all the right algorithms: you also have to
>>> use them right, and one of the most critical components of a
>>> cryptographically secure PRNG is its seed.
>>>
>>> A cryptographically secure algorithm isn’t sufficient, but it is necessary.
>>> That’s why it’s important to mark them as such. If I'm a careful developer,
>>> then it is absolutely important to me to know that I’m using a PRNG with a
>>> cryptographically secure algorithm, and that the particular implementation
>>> of that algorithm is correct and secure.
>>>
>>> It is a *feature* of a lot of modern CSPRNGs that you can't seed them:
>>>
>>> * You cannot seed or add entropy to std::random_device
>>>
>>> Although std::random_device may in practice be backed by a software CSPRNG,
>>> IIUC, the intention is that it can provide access to a hardware
>>> non-deterministic source when available.
>>>
>>> * You cannot seed or add entropy to CryptGenRandom
>>> * You can only add entropy to /dev/(u)random
>>> * You can only add entropy to BSD's arc4random
>>>
>>> Ah, I see. I think we mean different things when we say PRNG. A PRNG is an
>>> entirely deterministic algorithm; the output is non-random and the
>>> algorithm itself requires no entropy. If a PRNG is seeded with a random
>>> sequence of bits, its output can "appear" to be random.
>>> A CSPRNG is a PRNG
>>> that fulfills certain criteria such that its output can be appropriate for
>>> use in cryptographic applications in place of a truly random sequence *if*
>>> the input to the CSPRNG is itself random.
>>>
>>> The examples you give above *incorporate* a CSPRNG, environment entropy,
>>> and a set of rules about when to mix in additional entropy in order to
>>> produce output indistinguishable from a random sequence, but they are *not*
>>> themselves really *pseudorandom* generators because they are not
>>> deterministic. Not only do such sources of random numbers not require an
>>> interface to allow seeding, they do not even have to be publicly
>>> instantiable: Swift need only expose a single thread-safe instance (or an
>>> instance per thread) of a single type that provides access to
>>> CryptGenRandom/urandom/arc4random, since after all the output of multiple
>>> instances of that type should be statistically indistinguishable from the
>>> output of only one.
>>>
>>> What I was trying to respond to, by contrast, is the design of a hierarchy
>>> of protocols CSPRNG : PRNG (or, in Alejandro's proposal, UnsafeRandomSource
>>> : RandomSource) and the appropriate APIs to expose on each. This is
>>> entirely inapplicable to your examples. It stands to reason that a
>>> non-instantiable source of random numbers does not require a protocol of
>>> its own (a hypothetical RNG : CSPRNG), since there is no reason to
>>> implement (if done correctly) more than a single publicly non-instantiable
>>> singleton type that could conform to it. For that matter, the concrete type
>>> itself probably doesn't need *any* public API at all.
>>> Instead, extensions
>>> to standard library types such as Int that implement conformance to the
>>> protocol that Alejandro names "Randomizable" could call internal APIs to
>>> provide all the necessary functionality, and third-party types that need to
>>> conform to "Randomizable" could then in turn use `Int.random()` or
>>> `Double.random()` to implement their own conformance. In fact, the concrete
>>> random number generator type doesn't need to be public at all. All public
>>> interaction could be through APIs such as `Int.random()`.
>>>
>>> Just because we can expose a seed interface doesn't mean we should, and in
>>> this case I believe that it would go against the prime objective of
>>> providing secure random numbers.
>>>
>>> If we're talking about a Swift interface to a non-deterministic source of
>>> random numbers like urandom or arc4random, then, as I write above, not only
>>> do I agree that it doesn't need to be seedable, it also does not need to be
>>> instantiable at all, does not need to conform to a protocol that
>>> specifically requires the semantics of a non-deterministic source, does not
>>> need to expose any public interface whatsoever, and doesn't itself even
>>> need to be public. (Does it even need to be a type, as opposed to simply a
>>> free function?)
>>>
>>> In fact, having reasoned through all of this, we can split the design task
>>> into two. The most essential part, which definitely should be part of the
>>> stdlib, would be an internal interface to a cryptographically secure
>>> platform-specific entropy source, a public protocol named something like
>>> Randomizable (to be bikeshedded), and the appropriate implementations on
>>> Boolean, binary integer, and floating point types to conform them to
>>> Randomizable so that users can write `Bool.random()` or `Int.random()`.
>>> The
>>> second part, which can be a separate proposal or even a standalone core
>>> library or third-party library, would be the protocols and concrete types
>>> that implement pseudorandom number generators, allowing for reproducible
>>> pseudorandom sequences. In other words, instead of PRNGs and CSPRNGs being
>>> the primitives on which `Int.random()` is implemented, `Int.random()`
>>> should be the standard library primitive which allows PRNGs and CSPRNGs to
>>> be seeded.
>>>
>>>> If your attacker can observe your seeding once, chances are that they can
>>>> observe your reseeding too; then, they can use their own implementation of
>>>> the PRNG (whether CSPRNG or non-CSPRNG) and reproduce your pseudorandom
>>>> sequence whether or not Swift exposes any particular API.
>>>
>>> On Linux, the random devices are initially seeded with machine-specific but
>>> rather invariant data that makes /dev/urandom spit out predictable numbers.
>>> It is considered "seeded" after a root process writes POOL_SIZE bytes to
>>> it. On most implementations, this initial seed is stored on disk: when the
>>> computer shuts down, it reads POOL_SIZE bytes from /dev/urandom and saves
>>> them in a file, and the contents of that file are loaded back into
>>> /dev/urandom when the computer starts. A scenario where someone can read
>>> that file is certainly not less likely than a scenario where /dev/urandom
>>> was deleted. That doesn't mean that they have kernel code execution or that
>>> they can pry into your process, but they have a good shot at guessing your
>>> seed and subsequent RNG results if no stirring happens.
>>>
>>> Sorry, I don't understand what you're getting at here. Again, I'm talking
>>> about deterministic algorithms, not non-deterministic sources of random
>>> numbers.
>>>
>>>> Secondly, I see no reason to justify the notion that, simply because a
>>>> PRNG is cryptographically secure, we ought to hide the seeding initializer
>>>> (because one has to exist internally anyway) from the public. Obviously,
>>>> one use case for a deterministic PRNG is to get reproducible sequences of
>>>> random-appearing values; this can be useful whether the underlying
>>>> algorithm is cryptographically secure or not. There are innumerably many
>>>> ways to use data generated from a CSPRNG in non-cryptographically secure
>>>> ways and omitting or including a public seeding initializer does not
>>>> change that; in other words, using a deterministic seed for a CSPRNG would
>>>> be a bad idea in certain applications, but it's a deliberate act, and
>>>> someone who would mistakenly do that is clearly incapable of *using* the
>>>> output from the PRNG in a secure way either; put a third way, you would be
>>>> hard pressed to find a situation where it's true that "if only Swift had
>>>> not made the seeding initializer public, this author would have written
>>>> secure code, but instead the only security hole that existed in the code
>>>> was caused by the availability of a public seeding initializer mistakenly
>>>> used." The point of having both explicitly instantiable PRNGs and a layer
>>>> of simpler APIs like "Int.random()" is so that the less experienced user
>>>> can get the "right thing" by default, and the experienced user can
>>>> customize the behavior; any user that instantiates his or her own
>>>> ChaCha20Random instance is already calling for the power user interface;
>>>> it is reasonable to expose the underlying primitive operations (such as
>>>> seeding) so long as there are legitimate uses for it.
>>>
>>> Nothing prevents us from using the same algorithm for a CSPRNG that is
>>> safely pre-seeded and a PRNG that people seed themselves, mind you.
>>> However, especially when it comes to security, there is a strong
>>> responsibility to drive developers into a pit of success: the most obvious
>>> thing to do has to be the right one, and suggesting to
>>> cryptographically-unaware developers that they have everything they need to
>>> manage their own seed is not a step in that direction.
>>>
>>> I'm not opposed to a ChaCha20Random type; I'm opposed to explicitly calling
>>> it cryptographically-secure, because it is not unless you know what to do
>>> with it. It is emphatically not far-fetched to imagine a developer who
>>> thinks that they can outdo the standard library by using their own
>>> ChaCha20Random instance after it's been seeded with time() if we let them
>>> know that it's "cryptographically secure". If you're a power user and you
>>> don't like the default, known-good CSPRNG, then you're hopefully good
>>> enough to know that ChaCha20 is considered a cryptographically-secure
>>> algorithm without help labels from the language, and you know how to
>>> operate it.
>>>
>>>> I'm fully aware of the myths surrounding /dev/urandom and /dev/random.
>>>> /dev/urandom might never run out, but it is also possible for it not to be
>>>> initialized at all, as in the case of some VM setups. In some older
>>>> versions of iOS, /dev/[u]random is reportedly sandboxed out. On systems
>>>> where it is available, it can also be deleted, since it is a file. The
>>>> point is, all of these scenarios cause an error during seeding of a
>>>> CSPRNG. The question is, how to proceed in the face of inability to access
>>>> entropy. We must do something, because we cannot therefore return a
>>>> cryptographically secure answer. Rare trapping on invocation of
>>>> Int.random() or permanently waiting for a never-to-be-initialized
>>>> /dev/urandom would be terrible to debug, but returning an optional or
>>>> throwing all the time would be verbose. How to design this API?
>>>
>>> If the only concern is that the system might not be initialized enough, I'd
>>> say that whatever returns an instance of a global, framework-seeded CSPRNG
>>> should return an Optional, and the random methods that use the global
>>> CSPRNG can trap and scream that the system is not initialized enough. If
>>> this is a likely error for you, you can check if the CSPRNG exists or not
>>> before jumping.
>>>
>>> Also note that there is only one system for which Swift is officially
>>> distributed (Ubuntu 14.04) on which the only way to get entropy from the OS
>>> is to open a random device and read from it.
>>>
>>> Again, I'm not only talking about urandom. As far as I'm aware, every API
>>> to retrieve cryptographically secure sequences of random bits on every
>>> platform for which Swift is distributed can potentially return an error
>>> instead of random bits. The question is, what design for our API is the
>>> most sensible way to deal with this contingency? On rethinking, I do
>>> believe that consistently returning an Optional is the best way to go about
>>> it, allowing the user to either (a) supply a deterministic fallback; (b)
>>> raise an error of their own choosing; or (c) trap--all with a minimum of
>>> fuss. This seems very Swifty to me.
>>>
>>>>> * What should the default CSPRNG be? There are good arguments for using a
>>>>> cryptographically secure device random. (In my proposed implementation,
>>>>> for device random, I use Security.framework on Apple platforms (because
>>>>> /dev/urandom is not guaranteed to be available due to the sandbox, IIUC).
>>>>> On Linux platforms, I would prefer to use getrandom() and avoid using
>>>>> file system APIs, but getrandom() is new and unsupported on some versions
>>>>> of Ubuntu that Swift supports. This is an issue in and of itself.)
>>>>> Now, a
>>>>> number of these facilities strictly limit or do not guarantee
>>>>> availability of more than a small number of random bytes at a time; they
>>>>> are recommended for seeding other PRNGs but *not* as a routine source of
>>>>> random numbers. Therefore, although device random should be available to
>>>>> users, it probably shouldn’t be the default for the Swift standard
>>>>> library as it could have negative consequences for the system as a whole.
>>>>> There follows the significant task of implementing a CSPRNG correctly and
>>>>> securely for the default PRNG.
>>>>
>>>> Theo gave a talk a few years ago
>>>> <https://www.youtube.com/watch?v=aWmLWx8ut20> on randomness and how these
>>>> problems are approached in LibreSSL.
>>>>
>>>> Certainly, we can learn a lot from those like Theo who've dealt with the
>>>> issue. I'm not in a position to watch the talk at the moment; can you
>>>> summarize what the tl;dr version of it is?
>>>
>>> I saw it three years ago, so I don't remember all the details. The gist is
>>> that:
>>>
>>> * OpenBSD's random is available from extremely early in the boot process
>>>   with reasonable entropy
>>> * LibreSSL includes OpenBSD's arc4random, and it's a "good" PRNG (which
>>>   doesn't actually use ARC4)
>>> * That implementation of arc4random is good because it is fool-proof and it
>>>   has basically no failure mode
>>> * Stirring is good, having multiple components take random numbers from the
>>>   same source probably makes results harder to guess too
>>> * Getrandom/getentropy is in all ways better than reading from random
>>>   devices
>>>
>>> Vigorously agree on all points. Thanks for the summary.
>>>
>
_______________________________________________ swift-evolution mailing list [email protected] https://lists.swift.org/mailman/listinfo/swift-evolution
