-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 09/07/15 05:39, Roger Dingledine wrote:
> On Wed, Jul 08, 2015 at 07:45:04PM -0700, David Fifield wrote:
>> I'm trying to use CollecTor data to find out how much bandwidth
>> is offered by different pluggable transports over time. I.e., I
>> want to be able to say something like, "On July 1, bridges with
>> obfs3 offered X MB/s, bridges with obfs4 offered Y MB/s," etc.
>
> Great!
>
>> I'm having trouble because sometimes, a router digest listed in
>> a bridge-network-status document is not found in the same
>> tarball.
> [snip]
>> Here's an example of where it goes wrong.
>> bridge-descriptors-2015-07/statuses/01/20150701-060138-4A0CCD2DDC7995083D73F5D667100C8A5831F16D
>>
> Yeah, I'm not surprised it goes wrong, since the descriptor from
> 0701-06:01 was likely published in the previous month.
>
>> However, I did find it in the previous month's tarball,
>
> Yep.

I think you picked the wrong example for something going wrong,
because that descriptor is actually included in the 2015-07 tarball.

But there are indeed cases where a status published in 2015-07
references a server descriptor that was published in 2015-06, and
that server descriptor is contained in the 2015-06 tarball.

Example from the same status:

bridge-descriptors-2015-07/statuses/01/20150701-060138-4A0CCD2DDC7995083D73F5D667100C8A5831F16D

contains a line:

r Unnamed ABQ4ZADwj8WkfgApkhVTFalGweU GqjwHG/sFpFzY4sx9SWuzVTcHag 2015-06-30 12:59:03 10.135.171.161 443 0

which references the following server descriptor, published
2015-06-30 12:59:03 according to that same line:

bridge-descriptors-2015-06/server-descriptors/1/a/1aa8f01c6fec169173638b31f525aecd54dc1da8

>> It seems rare that the bridge-server-descriptor is missing. In
>> the 2015-07 tarball, it happened for 5891/477496 relays (1.2%).
> [snip]
>> How do you handle cases like this? I had a browse through the
>> Onionoo source code, but did not quickly understand it.

Onionoo typically reads descriptors from CollecTor's recent/
directory, which contains descriptors published in the past 72
hours, rather than the tarballs in the archive/ directory, which
are organized by publication month.

>> Should I just always include the month preceding the earliest
>> month I want to process?

Yes, you should do that.

> How many of the 5891 cases does that resolve?

If you happen to find cases that are not explained by that, please
let me know.
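
A minimal Python sketch of the lookup described above, assuming the
tarballs have been unpacked side by side under their own names. It
decodes the base64 digest from the status's "r" line (the fourth
field; network statuses omit the trailing "=" padding), turns it into
the hex file name, and lists the status month's tarball followed by
the preceding month's. The server-descriptors/<c1>/<c2>/<digest>
layout is inferred from the single example path above, and the
function names are hypothetical.

```python
from __future__ import annotations

import base64


def digest_to_hex(b64_digest: str) -> str:
    """Convert the base64 descriptor digest from an "r" line to lowercase hex.

    Network-status documents drop the trailing "=" padding, so restore it
    before decoding.
    """
    padded = b64_digest + "=" * (-len(b64_digest) % 4)
    return base64.b64decode(padded).hex()


def previous_month(month: str) -> str:
    """Return the "YYYY-MM" string for the month before `month`."""
    year, mon = map(int, month.split("-"))
    if mon == 1:
        year, mon = year - 1, 12
    else:
        mon -= 1
    return f"{year:04d}-{mon:02d}"


def candidate_paths(r_line: str, status_month: str) -> list[str]:
    """List the tarball paths where the server descriptor referenced by an
    "r" line may be found: the status's own month first, then the month
    before it.

    ASSUMPTION (hypothetical helper): the server-descriptors/<c1>/<c2>/<digest>
    layout is inferred from one example path, not from CollecTor's docs.
    """
    fields = r_line.split()
    # r nickname identity digest pub-date pub-time address ORPort DirPort
    if len(fields) < 9 or fields[0] != "r":
        raise ValueError(f"not an r line: {r_line!r}")
    hex_digest = digest_to_hex(fields[3])
    suffix = f"server-descriptors/{hex_digest[0]}/{hex_digest[1]}/{hex_digest}"
    return [
        f"bridge-descriptors-{month}/{suffix}"
        for month in (status_month, previous_month(status_month))
    ]
```

For the "r" line above and the 2015-07 status, this yields the
2015-07 path first and the 2015-06 path second. The second is where
that descriptor, published 2015-06-30, actually lives. Checking the
status month first keeps the common case cheap; falling back to the
preceding month covers descriptors published shortly before a month
boundary.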

All the best,
Karsten
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: GPGTools - http://gpgtools.org

iQEcBAEBAgAGBQJVnjBPAAoJEJD5dJfVqbCrfjYH/1kYG9hl10sekKpfhV7y3nAq
wjm/hhyz7bqz9uPJmXs9d8+rkgJBIhUGC+LWqdmmgU8VNRb4NpCq7vBO6MIRJQQG
a7C3XNYRw10+Bs+jfBiE5D6z4i2rLXGDqaFkmKCEbrh6To5pqo2ziJkWUP6Y/8gH
EHjsEINFB4doV2EAccAAAjN6L1cLQPLBEVVAPtN7Pm78hcNuZ9D+n8TA+XWfmOvV
JG26kerEMkA2XPj3nbPvBLTYM5AMvMr/lDQpAuaSZYHb0E8DiLcVlUcaX4Y/IpY8
SqwLmheZdrFItxCH3Fd8c3hxiZ/Qs6iVZ6EPFRuqbBSOu7VLvyo7N4aXrk2bt6c=
=OKle
-----END PGP SIGNATURE-----
_______________________________________________
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev