Riffing off this a bit… I’d like to see minimal Random support in the stdlib, and then all this specialization stuff in a “non-standard” library, i.e., a library that ships with Swift but is not imported by default.
As I’m developing apps, I don’t need the massive autocompletion overload and cognitive overhead that come from trying to understand all these proposed protocols and use cases unless I am actually going to need randomization. If I need randomization, I should explicitly opt in to it by writing “import Random”.

Dave

> On Oct 3, 2017, at 10:31 PM, Alejandro Alonso via swift-evolution
> <[email protected]> wrote:
>
> I really like the schedule here. After reading for a while, I do agree with
> Brent that stdlib should be very primitive in the functionality that it
> provides. I also agree that the most important part right now is designing
> the internal crypto which the numeric types use to return their respective
> random numbers. On the discussion of how we should handle not enough entropy
> with the device random, from a user's perspective it makes sense that calling
> .random should just give me a random number, but from a developer's
> perspective I see Optional being the best choice here. While I think blocking
> could, in most cases, provide the user an easier API, we have to do this
> right and be safe here by providing a value that indicates that there is room
> for error here. As for the generator abstraction, I believe there should be a
> bare, basic protocol that sets a layout for new generators and should be
> focusing on its requirements.
>
> Whether or not RandomAccessCollection and MutableCollection should get
> .random and .shuffle/.shuffled in this first proposal is completely up in the
> air for me. It makes sense, to me, to include the .random in this proposal
> and open another one for .shuffle/.shuffled, but I can see arguments that
> say we should create something separate for these two, or include all of it
> in this proposal.
>
> - Alejandro
>
> On Sep 27, 2017, 7:29 PM -0500, Xiaodi Wu <[email protected]>, wrote:
>>
>> On Wed, Sep 27, 2017 at 00:18 Félix Cloutier <[email protected]> wrote:
>>> On Sep 26, 2017, at 16:14, Xiaodi Wu <[email protected]> wrote:
>>>
>>
>>> On Tue, Sep 26, 2017 at 11:26 AM, Félix Cloutier <[email protected]> wrote:
>>>
>>> It's possible to use a CSPRNG-grade algorithm and seed it once to get a
>>> reproducible sequence, but when you use it as a CSPRNG, you typically feed
>>> entropy back into it at nondeterministic points to ensure that even if you
>>> started with a bad seed, you'll eventually get to an alright state. Unless
>>> you keep track of when entropy was mixed in and what the values were,
>>> you'll never get a reproducible CSPRNG.
>>>
>>> We would give developers a false sense of security if we provided them with
>>> CSPRNG-grade algorithms that we called CSPRNGs and that they could seed
>>> themselves. Just because it says "crypto-secure" in the name doesn't mean
>>> that it'll be crypto-secure if it's seeded with time(). Therefore,
>>> "reproducible" vs "non-reproducible" looks like a good distinction to me.
>>>
>>> I disagree here, in two respects:
>>>
>>> First, whether or not a particular PRNG is cryptographically secure is an
>>> intrinsic property of the algorithm; whether it's "reproducible" or not is
>>> determined by the published API. In other words, the distinction between
>>> CSPRNG vs. non-CSPRNG is important to document because it's semantics that
>>> cannot be deduced by the user otherwise, and it is an important one for
>>> writing secure code because it tells you whether an attacker can predict
>>> future outputs based only on observing past outputs. "Reproducible" in the
>>> sense of seedable or not is trivially noted by inspection of the published
>>> API, and it is rather immaterial to writing secure code.
>>
>>
>> Cryptographically secure is not a property that I'm comfortable applying to
>> an algorithm. You cannot say that you've made a cryptographically secure
>> thing just because you've used all the right algorithms: you also have to
>> use them right, and one of the most critical components of a
>> cryptographically secure PRNG is its seed.
>>
>> A cryptographically secure algorithm isn’t sufficient, but it is necessary.
>> That’s why it’s important to mark them as such. If I'm a careful developer,
>> then it is absolutely important to me to know that I’m using a PRNG with a
>> cryptographically secure algorithm, and that the particular implementation
>> of that algorithm is correct and secure.
>>
>> It is a *feature* of a lot of modern CSPRNGs that you can't seed them:
>>
>> * You cannot seed or add entropy to std::random_device
>>
>> Although std::random_device may in practice be backed by a software CSPRNG,
>> IIUC, the intention is that it can provide access to a hardware
>> non-deterministic source when available.
>>
>> * You cannot seed or add entropy to CryptGenRandom
>> * You can only add entropy to /dev/(u)random
>> * You can only add entropy to BSD's arc4random
>>
>> Ah, I see. I think we mean different things when we say PRNG. A PRNG is an
>> entirely deterministic algorithm; the output is non-random and the algorithm
>> itself requires no entropy. If a PRNG is seeded with a random sequence of
>> bits, its output can "appear" to be random. A CSPRNG is a PRNG that fulfills
>> certain criteria such that its output can be appropriate for use in
>> cryptographic applications in place of a truly random sequence *if* the
>> input to the CSPRNG is itself random.
>>
>> The examples you give above *incorporate* a CSPRNG, environment entropy, and
>> a set of rules about when to mix in additional entropy in order to produce
>> output indistinguishable from a random sequence, but they are *not*
>> themselves really *pseudorandom* generators because they are not
>> deterministic. Not only do such sources of random numbers not require an
>> interface to allow seeding, they do not even have to be publicly
>> instantiable: Swift need only expose a single thread-safe instance (or an
>> instance per thread) of a single type that provides access to
>> CryptGenRandom/urandom/arc4random, since after all the output of multiple
>> instances of that type should be statistically indistinguishable from the
>> output of only one.
>>
>> What I was trying to respond to, by contrast, is the design of a hierarchy
>> of protocols CSPRNG : PRNG (or, in Alejandro's proposal, UnsafeRandomSource
>> : RandomSource) and the appropriate APIs to expose on each. This is entirely
>> inapplicable to your examples. It stands to reason that a non-instantiable
>> source of random numbers does not require a protocol of its own (a
>> hypothetical RNG : CSPRNG), since there is no reason to implement (if done
>> correctly) more than a single publicly non-instantiable singleton type that
>> could conform to it. For that matter, the concrete type itself probably
>> doesn't need *any* public API at all. Instead, extensions to standard
>> library types such as Int that implement conformance to the protocol that
>> Alejandro names "Randomizable" could call internal APIs to provide all the
>> necessary functionality, and third-party types that need to conform to
>> "Randomizable" could then in turn use `Int.random()` or `Double.random()` to
>> implement their own conformance. In fact, the concrete random number
>> generator type doesn't need to be public at all. All public interaction
>> could be through APIs such as `Int.random()`.
>>
>>
>> Just because we can expose a seed interface doesn't mean we should, and in
>> this case I believe that it would go against the prime objective of
>> providing secure random numbers.
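For readers following along, here is a minimal sketch of the shape described in the quoted paragraph above: a hidden entropy source, a public "Randomizable" protocol, and a third-party type that conforms using only `Int.random()`. Everything here is an assumption for illustration. The protocol requirement `random()`, the helper `systemRandomBits()`, and the `Point` type are hypothetical names, and `SystemRandomNumberGenerator` merely stands in for whatever platform facility would sit behind the scenes.

```swift
// Hypothetical sketch only: none of these names are settled API.

/// Stand-in for the internal, non-public entropy source.
/// Assumption: SystemRandomNumberGenerator plays the role of the
/// platform facility (arc4random, getrandom, and so on).
private func systemRandomBits() -> UInt64 {
    var source = SystemRandomNumberGenerator()
    return source.next()
}

/// The only public surface: a protocol with a static factory.
public protocol Randomizable {
    static func random() -> Self
}

extension UInt64: Randomizable {
    public static func random() -> UInt64 {
        return systemRandomBits()
    }
}

extension Int: Randomizable {
    public static func random() -> Int {
        // Reinterpret the low bits; every bit pattern is equally likely.
        return Int(truncatingIfNeeded: systemRandomBits())
    }
}

extension Double: Randomizable {
    /// Uniform in [0, 1): keep the top 53 bits and scale by 2^-53.
    public static func random() -> Double {
        return Double(systemRandomBits() >> 11) * 0x1p-53
    }
}

/// A third-party type conforms using only the public entry points;
/// it never sees a generator type or a seed.
struct Point: Randomizable {
    var x: Int
    var y: Int

    static func random() -> Point {
        return Point(x: Int.random(), y: Int.random())
    }
}

let p = Point.random()
print(p.x, p.y)
```

Note that nothing in this sketch exposes a seed or an instantiable generator, which is the point being argued.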
>>
>>
>> If we're talking about a Swift interface to a non-deterministic source of
>> random numbers like urandom or arc4random, then, as I write above, not only
>> do I agree that it doesn't need to be seedable, it also does not need to be
>> instantiable at all, does not need to conform to a protocol that
>> specifically requires the semantics of a non-deterministic source, does not
>> need to expose any public interface whatsoever, and doesn't itself even need
>> to be public. (Does it even need to be a type, as opposed to simply a free
>> function?)
>>
>> In fact, having reasoned through all of this, we can split the design task
>> into two. The most essential part, which definitely should be part of the
>> stdlib, would be an internal interface to a cryptographically secure
>> platform-specific entropy source, a public protocol named something like
>> Randomizable (to be bikeshedded), and the appropriate implementations on
>> Boolean, binary integer, and floating point types to conform them to
>> Randomizable so that users can write `Bool.random()` or `Int.random()`. The
>> second part, which can be a separate proposal or even a standalone core
>> library or third-party library, would be the protocols and concrete types
>> that implement pseudorandom number generators, allowing for reproducible
>> pseudorandom sequences. In other words, instead of PRNGs and CSPRNGs being
>> the primitives on which `Int.random()` is implemented, `Int.random()` should
>> be the standard library primitive which allows PRNGs and CSPRNGs to be
>> seeded.
>>> If your attacker can observe your seeding once, chances are that they can
>>> observe your reseeding too; then, they can use their own implementation of
>>> the PRNG (whether CSPRNG or non-CSPRNG) and reproduce your pseudorandom
>>> sequence whether or not Swift exposes any particular API.
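The "second part" of that split, a seedable deterministic generator that lives outside the stdlib and is seeded from the system primitive, might look roughly like the sketch below. The assumptions are mine: SplitMix64 is used purely because it is short, and it is NOT cryptographically secure; the type name and initializers are hypothetical; `SystemRandomNumberGenerator` stands in for the stdlib primitive being discussed.

```swift
/// SplitMix64: a small deterministic generator.
/// Assumption: chosen only for brevity. It is NOT cryptographically secure.
struct SplitMix64 {
    private var state: UInt64

    /// Explicit seed: the output sequence is fully reproducible.
    init(seed: UInt64) {
        state = seed
    }

    /// Seed taken from the system source, standing in for the
    /// hypothetical stdlib primitive.
    init() {
        var system = SystemRandomNumberGenerator()
        self.init(seed: system.next())
    }

    /// Advance the state and return the next 64 bits.
    mutating func next() -> UInt64 {
        state &+= 0x9E37_79B9_7F4A_7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58_476D_1CE4_E5B9
        z = (z ^ (z >> 27)) &* 0x94D0_49BB_1331_11EB
        return z ^ (z >> 31)
    }
}

// Two generators with the same seed produce the same sequence.
var a = SplitMix64(seed: 42)
var b = SplitMix64(seed: 42)
let same = (0..<4).allSatisfy { _ in a.next() == b.next() }
print(same) // true
```

The direction of the dependency is what matters here: the deterministic generator consumes the system primitive for its seed, rather than the system primitive being built on top of a public generator type.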
>>
>> On Linux, the random devices are initially seeded with machine-specific but
>> rather invariant data that makes /dev/urandom spit out predictable numbers.
>> It is considered "seeded" after a root process writes POOL_SIZE bytes to it.
>> On most implementations, this initial seed is stored on disk: when the
>> computer shuts down, it reads POOL_SIZE bytes from /dev/urandom and saves
>> them in a file, and the contents of that file are loaded back into
>> /dev/urandom when the computer starts. A scenario where someone can read
>> that file is certainly not less likely than a scenario where /dev/urandom
>> was deleted. That doesn't mean that they have kernel code execution or that
>> they can pry into your process, but they have a good shot at guessing your
>> seed and subsequent RNG results if no stirring happens.
>>
>> Sorry, I don't understand what you're getting at here. Again, I'm talking
>> about deterministic algorithms, not non-deterministic sources of random
>> numbers.
>>
>>> Secondly, I see no reason to justify the notion that, simply because a PRNG
>>> is cryptographically secure, we ought to hide the seeding initializer
>>> (because one has to exist internally anyway) from the public. Obviously,
>>> one use case for a deterministic PRNG is to get reproducible sequences of
>>> random-appearing values; this can be useful whether the underlying
>>> algorithm is cryptographically secure or not. There are innumerably many
>>> ways to use data generated from a CSPRNG in non-cryptographically secure
>>> ways and omitting or including a public seeding initializer does not change
>>> that; in other words, using a deterministic seed for a CSPRNG would be a
>>> bad idea in certain applications, but it's a deliberate act, and someone
>>> who would mistakenly do that is clearly incapable of *using* the output
>>> from the PRNG in a secure way either; put a third way, you would be hard
>>> pressed to find a situation where it's true that "if only Swift had not
>>> made the seeding initializer public, this author would have written secure
>>> code, but instead the only security hole that existed in the code was
>>> caused by the availability of a public seeding initializer mistakenly
>>> used." The point of having both explicitly instantiable PRNGs and a layer
>>> of simpler APIs like "Int.random()" is so that the less experienced user
>>> can get the "right thing" by default, and the experienced user can
>>> customize the behavior; any user that instantiates his or her own
>>> ChaCha20Random instance is already calling for the power user interface; it
>>> is reasonable to expose the underlying primitive operations (such as
>>> seeding) so long as there are legitimate uses for it.
>>
>> Nothing prevents us from using the same algorithm for a CSPRNG that is
>> safely pre-seeded and a PRNG that people seed themselves, mind you. However,
>> especially when it comes to security, there is a strong responsibility to
>> drive developers into a pit of success: the most obvious thing to do has to
>> be the right one, and suggesting to cryptographically-unaware developers
>> that they have everything they need to manage their own seed is not a step
>> in that direction.
>>
>> I'm not opposed to a ChaCha20Random type; I'm opposed to explicitly calling
>> it cryptographically-secure, because it is not unless you know what to do
>> with it. It is emphatically not far-fetched to imagine a developer who
>> thinks that they can outdo the standard library by using their own
>> ChaCha20Random instance after it's been seeded with time() if we let them
>> know that it's "cryptographically secure". If you're a power user and you
>> don't like the default, known-good CSPRNG, then you're hopefully good enough
>> to know that ChaCha20 is considered a cryptographically-secure algorithm
>> without help labels from the language, and you know how to operate it.
>>
>>> I'm fully aware of the myths surrounding /dev/urandom and /dev/random.
>>> /dev/urandom might never run out, but it is also possible for it not to be
>>> initialized at all, as in the case of some VM setups. In some older
>>> versions of iOS, /dev/[u]random is reportedly sandboxed out. On systems
>>> where it is available, it can also be deleted, since it is a file. The
>>> point is, all of these scenarios cause an error during seeding of a CSPRNG.
>>> The question is, how to proceed in the face of inability to access entropy.
>>> We must do something, because we cannot therefore return a
>>> cryptographically secure answer. Rare trapping on invocation of
>>> Int.random() or permanently waiting for a never-to-be-initialized
>>> /dev/urandom would be terrible to debug, but returning an optional or
>>> throwing all the time would be verbose. How to design this API?
>>
>> If the only concern is that the system might not be initialized enough, I'd
>> say that whatever returns an instance of a global, framework-seeded CSPRNG
>> should return an Optional, and the random methods that use the global CSPRNG
>> can trap and scream that the system is not initialized enough. If this is a
>> likely error for you, you can check if the CSPRNG exists or not before
>> jumping.
>>
>> Also note that there is only one system for which Swift is officially
>> distributed (Ubuntu 14.04) on which the only way to get entropy from the OS
>> is to open a random device and read from it.
>>
>> Again, I'm not only talking about urandom. As far as I'm aware, every API to
>> retrieve cryptographically secure sequences of random bits on every platform
>> for which Swift is distributed can potentially return an error instead of
>> random bits. The question is, what design for our API is the most sensible
>> way to deal with this contingency? On rethinking, I do believe that
>> consistently returning an Optional is the best way to go about it, allowing
>> the user to either (a) supply a deterministic fallback; (b) raise an error
>> of their own choosing; or (c) trap--all with a minimum of fuss. This seems
>> very Swifty to me.
>>
>>
>>>> * What should the default CSPRNG be? There are good arguments for using a
>>>> cryptographically secure device random. (In my proposed implementation,
>>>> for device random, I use Security.framework on Apple platforms (because
>>>> /dev/urandom is not guaranteed to be available due to the sandbox, IIUC).
>>>> On Linux platforms, I would prefer to use getrandom() and avoid using file
>>>> system APIs, but getrandom() is new and unsupported on some versions of
>>>> Ubuntu that Swift supports. This is an issue in and of itself.) Now, a
>>>> number of these facilities strictly limit or do not guarantee availability
>>>> of more than a small number of random bytes at a time; they are
>>>> recommended for seeding other PRNGs but *not* as a routine source of
>>>> random numbers. Therefore, although device random should be available to
>>>> users, it probably shouldn’t be the default for the Swift standard library
>>>> as it could have negative consequences for the system as a whole. There
>>>> follows the significant task of implementing a CSPRNG correctly and
>>>> securely for the default PRNG.
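To make the three options in the Optional-returning design quoted above concrete, here is a rough sketch of how a caller could handle each one. It is built on assumptions: `randomUInt64(entropyAvailable:)` is a hypothetical function, the `entropyAvailable` flag only simulates a platform call that can fail, and `EntropyUnavailable` is an invented error type.

```swift
/// Hypothetical Optional-returning entry point.
/// Assumption: `entropyAvailable` simulates a platform call that may fail;
/// a real implementation would inspect the platform API's result instead.
func randomUInt64(entropyAvailable: Bool = true) -> UInt64? {
    guard entropyAvailable else { return nil }
    var system = SystemRandomNumberGenerator()
    return system.next()
}

/// Invented error type for option (b).
struct EntropyUnavailable: Error {}

// (a) Supply a deterministic fallback.
let withFallback = randomUInt64(entropyAvailable: false) ?? 4

// (b) Raise an error of the caller's own choosing.
func requireRandom(entropyAvailable: Bool) throws -> UInt64 {
    guard let value = randomUInt64(entropyAvailable: entropyAvailable) else {
        throw EntropyUnavailable()
    }
    return value
}

// (c) Trap: the force unwrap crashes if no value came back.
let orTrap = randomUInt64()!

print(withFallback, orTrap)
```

Each option costs the caller one short expression, which seems to be what "a minimum of fuss" is getting at.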
>>>
>>> Theo gave a talk a few years ago
>>> <https://www.youtube.com/watch?v=aWmLWx8ut20> on randomness and how these
>>> problems are approached in LibreSSL.
>>>
>>> Certainly, we can learn a lot from those like Theo who've dealt with the
>>> issue. I'm not in a position to watch the talk at the moment; can you
>>> summarize what the tl;dr version of it is?
>>
>> I saw it three years ago, so I don't remember all the details. The gist is
>> that:
>>
>> * OpenBSD's random is available from extremely early in the boot process
>>   with reasonable entropy
>> * LibreSSL includes OpenBSD's arc4random, and it's a "good" PRNG (which
>>   doesn't actually use ARC4)
>> * That implementation of arc4random is good because it is fool-proof and it
>>   has basically no failure mode
>> * Stirring is good, having multiple components take random numbers from the
>>   same source probably makes results harder to guess too
>> * Getrandom/getentropy is in all ways better than reading from random
>>   devices
>>
>> Vigorously agree on all points. Thanks for the summary.
>>
> _______________________________________________
> swift-evolution mailing list
> [email protected]
> https://lists.swift.org/mailman/listinfo/swift-evolution
