Hi Samir, this sounds like an interesting summer project.
Since you are interested in using PCSA, our work on privacy-preserving statistics, which actually develops a privacy-enhanced version of PCSA, might be helpful. We also propose it as a way to collect distributed statistics. In our HotPETs paper [1], we sketch the basic idea. In our journal paper [2], we provide additional details on the algorithm. If you have any questions, just let me know. Cheers, Florian. [1] https://petsymposium.org/2011/papers/hotpets11-final5Tschorsch.pdf [2] https://www.sciencedirect.com/science/article/pii/S1389128613001941 > On 30. Mar 2017, at 03:45, samir menon <menon.sa...@gmail.com> wrote: > > Hi there! > > I'm Samir, a Computer Science student at Stanford University, with a > focus in applied cryptography and computer security. This summer, I > want to work (through GSoC) on computing usage statistics without > keeping IP addresses in memory (see tickets #7532 and #15469) [1] [2]. > > Currently, we keep sets of IP's (or hashed IP's) in memory so that we > can compute the number of unique client connections. This has been > pointed out as a pretty serious concern, because the IP's themselves > are sensitive info that we don't want an attacker to acquire, but the > statistics are relatively valuable. > > As Nick first pointed out in #15469, we can use proven techniques to > compute these statistics without actually explicitly storing any IP's > (or IP hashes) in memory. The technique I want to use, "Probabilistic > Counting with Stochastic Averaging", or PCSA, is relatively > well-studied, and can provide good estimates (<5% error) of the number > of unique elements in a time series. > > The basic idea is to count the number of 0's before the least > significant 1 in every (Jenkins hashed) IP, and then recognize that > the more unique IP's we encounter, the more likely it is that we see a > hashed IP with a large number of 0's before the least significant 1. > (Shoutout to Jaskaran and [3] for helping me understand this). A more > detailed explanation and more resources for understanding PCSA are in > the proposal. > > Here is my draft proposal (also attached, but links don't work): > http://stanford.edu/~samir2/TorGSoCApplication.html > > I'd love to hear feedback on it - what's feasible, what's most useful, > and what I should focus on, etc. You can also chat with me about it on > IRC at `samir2`! > > Thanks, > ~Samir Menon > menon.sa...@gmail.com > Stanford University, B.S. Computer Science, 2019 > > [1] https://trac.torproject.org/projects/tor/ticket/7532 > [2] https://trac.torproject.org/projects/tor/ticket/15469 > [3] https://www.cs.princeton.edu/~rs/talks/AC11-Cardinality.pdf > <TorGSoCAnonymousLocalStats.pdf>_______________________________________________ > tor-dev mailing list > tor-dev@lists.torproject.org > https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev _______________________________________________ tor-dev mailing list tor-dev@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev