Understood and agreed. I suspected there would be circuit state to maintain. As you say, concurrent cells on the same circuit should be queued or thread-locked. I suspect thread-locking will be simple enough, which would make it the best approach.
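To make the "queued" option concrete, here is a minimal sketch in Python. It is not Tor's code: `CircuitState`, `worker`, `run`, and `NUM_WORKERS` are hypothetical names, and a SHA-256 running digest stands in for the order-dependent per-circuit crypto state. Every circuit is pinned to one worker, so cells on the same circuit are processed in arrival order while different circuits proceed concurrently.

```python
import hashlib
import queue
import threading

NUM_WORKERS = 4  # hypothetical; a real relay would size this to its cores


class CircuitState:
    """Stand-in for per-circuit crypto state (hypothetical, not Tor's)."""

    def __init__(self):
        self.digest = hashlib.sha256()
        self.cells = 0

    def process(self, payload: bytes) -> None:
        # Order-dependent update, like a running digest or stream cipher.
        self.digest.update(payload)
        self.cells += 1


def worker(inbox: queue.Queue, circuits: dict) -> None:
    # Each worker owns the circuits mapped to it, so no lock is needed.
    while True:
        item = inbox.get()
        if item is None:  # sentinel: shut down
            break
        circ_id, payload = item
        circuits.setdefault(circ_id, CircuitState()).process(payload)


def run(cells):
    """Dispatch (circ_id, payload) pairs to workers; return final states."""
    inboxes = [queue.Queue() for _ in range(NUM_WORKERS)]
    tables = [{} for _ in range(NUM_WORKERS)]
    threads = [
        threading.Thread(target=worker, args=(inboxes[i], tables[i]))
        for i in range(NUM_WORKERS)
    ]
    for t in threads:
        t.start()
    for circ_id, payload in cells:
        # Same circuit -> same worker, so per-circuit order is preserved.
        inboxes[circ_id % NUM_WORKERS].put((circ_id, payload))
    for q in inboxes:
        q.put(None)
    for t in threads:
        t.join()
    merged = {}
    for table in tables:
        merged.update(table)
    return merged
```

A plain per-circuit lock would also keep the state consistent, but by itself it does not guarantee that cells are handled in arrival order, which is why this sketch uses one FIFO queue per worker instead.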
And given it's only a problem for the biggest nodes, a design should be chosen that is efficient and focuses on achieving the goals of such users. I believe this is that efficient and focused design.

On Thu, 10 Jan 2019 at 00:54, Ian Goldberg <i...@cs.uwaterloo.ca> wrote:

> On Wed, Jan 09, 2019 at 08:17:15AM -0500, Ian Goldberg wrote:
> > On Wed, Jan 09, 2019 at 08:42:18PM +1100, Todd Hubers wrote:
> > > There are early plans to distribute crypto operations across
> > > multiple cores
> > > [https://trac.torproject.org/projects/tor/ticket/1749], but there
> > > might be a better way.
> > >
> > > (I registered, but I couldn't find a way to annotate the ticket,
> > > so I'm emailing for now)
> > >
> > > The ticket states the reason being to saturate the bandwidth
> > > available (by using all the cores as efficiently as possible).
> > >
> > > I don't understand why a relay needs to have a "main thread".
> > > Network traffic arrives as an async operation and can be sent
> > > back out asynchronously. So a final strategy shouldn't have a
> > > central thread. The main thread might still be needed for
> > > startup, runtime adjustment, and system upkeep, but not for the
> > > core network-crypto processing; that should never need to touch
> > > the main thread.
> > >
> > > The current proposal speaks about multi-threading crypto
> > > operations, let's call that "A) Speed - Speeding up processing of
> > > a single cell". Instead, I propose "B) Concurrency -
> > > Restructuring so multiple cells can be processed concurrently".
> > >
> > > A cell of data should arrive via IO-Completion thread on a random
> > > CPU core, have crypto transformation applied on the same one
> > > core, then be dispatched onward out via the network. This seems
> > > to be quite a simple approach where I would think crypto code can
> > > remain the same "single-threaded" implementation.
> > >
> > > Approach [A] will have diminishing returns as the number of cores
> > > increases. You can only break up a cell unit of work so much
> > > until you're encrypting one byte per cpu core. However, with
> > > approach [B], if you have millions of CPU cores (as an extreme)
> > > you can be processing millions of cells concurrently. Therefore,
> > > I believe approach [B] would be more scalable.
> > >
> > > What do you think?
> >
> > You'll have troubles if cells *on the same circuit* try to be
> > processed in parallel on different cores, at least with the current
> > circuit-level crypto. But, once circuits are established, handing
> > each circuit to a different thread/core (or more clever worker
> > structure) is something that I think at least broadly makes sense,
> > and indeed I have been proposing to have my students work on.
>
> (Of course, this only is even relevant for the very highest-bandwidth
> nodes; my own node, for example, running on 5-year-old hardware with
> no special configuration, was pushing 400 Mbps last month, with one
> core at 80%, one at 11%, one at 6%, and the rest trivially small.)
> --
> Ian Goldberg
> Professor and University Research Chair
> Cheriton School of Computer Science
> University of Waterloo
> _______________________________________________
> tor-dev mailing list
> tor-dev@lists.torproject.org
> https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
>

--
Todd Hubers