[tcpdump-workers] IP Address Anonymization Feature in tcpdump

2024-06-10 Thread Alberto Perez Bogantes via tcpdump-workers
--- Begin Message ---
Hello tcpdump workers,

We've been working on adding a new feature to tcpdump that will allow IP
address anonymization via the Crypto-PAn (Cryptography-based
Prefix-preserving Anonymization) approach. The feature we’re adding to
tcpdump is motivated by the importance of preserving user privacy and
complying with data processing security regulations. Crypto-PAn is
prefix-preserving: any two addresses that share a k-bit prefix are mapped
to anonymized addresses that also share a k-bit prefix, so the network
structure is preserved even though the addresses themselves change.
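
To illustrate the prefix-preserving property, here is a minimal Python
sketch. It is not the cryptopANT API and not the code from the patch. Real
Crypto-PAn uses AES as its pseudorandom function; this toy version
substitutes HMAC-SHA256 from the standard library, and the names
anonymize_ipv4 and shared_prefix_bits are made up for the example.

```python
import hashlib
import hmac
import ipaddress


def shared_prefix_bits(a: str, b: str) -> int:
    """Number of leading bits two IPv4 addresses have in common."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()


def anonymize_ipv4(addr: str, key: bytes) -> str:
    """Toy prefix-preserving mapping in the style of Crypto-PAn.

    ASSUMPTION: HMAC-SHA256 stands in for the AES-based function that the
    real algorithm uses, so outputs will not match cryptopANT.
    """
    original = int(ipaddress.IPv4Address(addr))
    flips = 0
    for i in range(32):
        # The first i bits of the original address (0 when i == 0).
        prefix = original >> (32 - i)
        # One pseudorandom bit that depends only on the key and that prefix.
        msg = bytes([i]) + prefix.to_bytes(4, "big")
        bit = hmac.new(key, msg, hashlib.sha256).digest()[0] >> 7
        flips = (flips << 1) | bit
    # Bit i of the output is bit i of the input XOR the bit derived from
    # the preceding i input bits.
    return str(ipaddress.IPv4Address(original ^ flips))
```

Because the bit used at position i depends only on the i bits before it,
two inputs that agree on their first k bits and differ at the next one
produce outputs with exactly the same relationship.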

The goal of this email is to gauge the interest of the tcpdump community in
merging this feature once it’s complete, and to get in touch with potential
reviewers of our patch.

We are aware that there are external tools that seek a similar goal, as
discussed in PR #615 (https://github.com/the-tcpdump-group/tcpdump/pull/615).
However, the anonymization methods used by these tools often fail to
balance privacy against data utility. For example, Black Marker sets the
IP address to all zeros, resulting in a complete loss of utility.
Permutation can distort the original data distribution, resulting in
skewed results and lower analytical value. Similarly, traditional IP
randomization methods frequently treat each octet independently, ignoring
the hierarchical structure of IP addresses and compromising the integrity
of network analysis and management.
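
As a rough illustration of the loss of structure, the Python sketch below
compares two of these baselines on a pair of addresses that share a 23-bit
prefix. The function names and the per-octet mapping (a fixed affine
bijection on 0..255) are assumptions chosen for the example; they are not
taken from any of the tools discussed in the PR.

```python
import ipaddress


def shared_prefix_bits(a: str, b: str) -> int:
    """Number of leading bits two IPv4 addresses have in common."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()


def black_marker(addr: str) -> str:
    """Replace every address with all zeros."""
    return "0.0.0.0"


def per_octet(addr: str) -> str:
    """Map each octet independently.

    ASSUMPTION: (x * 167 + 13) mod 256 is only a stand-in for a random
    per-octet substitution; any bijection on 0..255 would do.
    """
    return ".".join(str((int(o) * 167 + 13) % 256) for o in addr.split("."))


a, b = "10.1.16.5", "10.1.17.9"
print(shared_prefix_bits(a, b))                        # 23
print(shared_prefix_bits(per_octet(a), per_octet(b)))  # 17
print(black_marker(a) == black_marker(b))              # True
```

With the per-octet mapping, whole octets that match still match afterwards,
but the 7 bits shared inside the third octet are lost, so the /23
relationship shrinks to /17. With Black Marker, all addresses collapse to
the same value.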

For this reason, we believe that the best approach is to use
prefix-preserving anonymization techniques, which are similar to
permutation techniques but preserve prefix relationships between
addresses. Because the mapping is derived from a cryptographic key, it
stays consistent for a given key, which helps balance privacy and utility
in data anonymization.

We believe that this functionality is well suited for tcpdump because much
of the logic used to print an IP address for a specific packet can be
reused to access that IP and anonymize it. The logic for dissecting packet
headers can be slightly adapted to implement this feature, including
anonymization of application headers. For example, much of the code written
to print an IP address offered by DHCP can be used to access that address
and anonymize it.

We have an early prototype of this patch. The feature we’re adding uses the
cryptopANT library. This library provides a comprehensive set of
anonymization functions designed for IPv4 and IPv6 addresses. With the
addition of a new flag, "--anon", users can enable IP address anonymization
in tcpdump by providing a key file that will be used by the Crypto-PAn
anonymization algorithm.

Here's a brief overview of how the implementation works:

1.  Activation Flag: Users can activate the anonymization feature by
using the "--anon" flag along with tcpdump commands.

2.  Key File: A key file containing the encryption key required for the
Crypto-PAn algorithm must be provided as an input parameter alongside the
"--anon" flag.

3.  Callback Invocation: When the "pcap_loop" function acquires a
packet, the designated callback method responsible for anonymizing IP
addresses is invoked. This method anonymizes the IP addresses in the packet
headers.

4.  Execution of Real Callback: Following anonymization, the "real
callback" is triggered. This callback performs the actions tcpdump already
implements, such as printing packet contents or writing them to a pcap
file.
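
Steps 3 and 4 can be pictured as a wrapper around the real callback. The
Python sketch below models that chain on a raw Ethernet/IPv4 frame. It is
an illustration under stated assumptions, not the prototype's C code: the
names make_anon_callback and ipv4_checksum are hypothetical, "anonymize"
is any function that maps 4 address bytes to 4 address bytes, and
recomputing the IPv4 header checksum is a choice made here rather than
something the steps above state.

```python
import struct

ETH_HLEN = 14            # Ethernet header length in bytes
ETHERTYPE_IPV4 = 0x0800


def ipv4_checksum(header: bytes) -> int:
    """Internet checksum over an IPv4 header (length is a multiple of 4)."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def make_anon_callback(real_callback, anonymize):
    """Return a callback that anonymizes a packet, then calls real_callback."""

    def callback(user, header, packet):
        buf = bytearray(packet)  # work on a copy, leave the input untouched
        is_ipv4 = (len(buf) >= ETH_HLEN + 20 and
                   struct.unpack_from("!H", buf, 12)[0] == ETHERTYPE_IPV4)
        if is_ipv4:
            ihl = (buf[ETH_HLEN] & 0x0F) * 4
            if ihl >= 20 and len(buf) >= ETH_HLEN + ihl:
                # Source address at offset 12, destination at offset 16.
                for off in (ETH_HLEN + 12, ETH_HLEN + 16):
                    buf[off:off + 4] = anonymize(bytes(buf[off:off + 4]))
                # ASSUMPTION: refresh the header checksum after the rewrite.
                buf[ETH_HLEN + 10:ETH_HLEN + 12] = b"\x00\x00"
                csum = ipv4_checksum(buf[ETH_HLEN:ETH_HLEN + ihl])
                struct.pack_into("!H", buf, ETH_HLEN + 10, csum)
        # Step 4: hand the (possibly modified) packet to the real callback.
        real_callback(user, header, bytes(buf))

    return callback
```

A real callback that prints or writes packets would be passed as
real_callback. This sketch does not touch TCP or UDP checksums, which also
cover the IP addresses, nor addresses carried in application payloads such
as DHCP.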

An example of how to use this flag is "./tcpdump --anon keyfile.txt -n",
where keyfile.txt is a file containing the key produced by cryptopANT
using "scramble_ips --newkey keyfile.txt".

Currently, we have implemented support for anonymizing IPv4 addresses. Our
roadmap includes extending support to accommodate additional anonymization
methods, and enabling users to specify anonymization parameters dynamically.

I am sharing my GitHub project (https://github.com/aperezb21/tcpdump),
which is forked from commit bb704ed32d770e84fdc340de8276c261bb6e9ee1,
containing the current prototype. We welcome any discussion or feedback,
either on or off-list.

Thank you,

Alberto.
--- End Message ---
___
tcpdump-workers mailing list -- tcpdump-workers@lists.tcpdump.org
To unsubscribe send an email to tcpdump-workers-le...@lists.tcpdump.org

[tcpdump-workers] Re: [Ext] Re: IP Address Anonymization Feature in tcpdump

2024-11-05 Thread Alberto Perez Bogantes via tcpdump-workers
--- Begin Message ---
Thank you for watching the video; I hope it helped clarify the proposal.

You are correct that a MAC address is a piece of personal information. One
possible approach is to randomize MAC addresses, which is easier than
pseudonymizing an IP address. For now, following the convention used in
cryptopANT (the library we use for address pseudonymization), we don't hide
MAC addresses, since cryptopANT is used in settings where layer 2 headers
are usually stripped out. Extending anonymization to the Ethernet header is
planned for future work.

Regarding the intrusive nature of the changes and their extent beyond
anonymization, we initially considered utilizing the existing print
statements and applying anonymization just before the IP address was
printed. However, we ran into a problem when the packet was dumped in
hexadecimal or written to a pcap file, as the anonymization did not take
effect because the print statements weren’t executed. One solution we came
up with is to have the print flow perform the anonymization directly when
a packet is flagged for printing. If the flags indicate that the packet is
being dumped or written to another pcap file, the "centralized version" of
the preprocessing is executed instead. We would like to know if there are
any other ways to tackle this issue.

As for the whitespace, this code is a proof of concept to assess whether
this idea could fit within tcpdump. The commits, whitespace, etc., can be
corrected to follow tcpdump's coding conventions.
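
Returning to the two-path idea for hex dumps and pcap output described
above, the selection logic might look roughly like the Python sketch below.
The OutputFlags fields and the anonymization_path function are
hypothetical names for illustration; they do not correspond to variables
in tcpdump or in the prototype.

```python
from dataclasses import dataclass


@dataclass
class OutputFlags:
    """Hypothetical summary of how a packet will be output."""
    hex_dump: bool = False    # packet bytes are dumped in hexadecimal
    write_file: bool = False  # packet is written to another pcap file


def anonymization_path(flags: OutputFlags) -> str:
    """Pick where anonymization runs for a packet.

    Raw bytes leave the program when dumping or writing, so the printers
    never see them; the centralized preprocessing has to run first.
    """
    if flags.hex_dump or flags.write_file:
        return "centralized"
    # Otherwise the address is anonymized right before it is printed.
    return "print-flow"
```

For example, anonymization_path(OutputFlags(write_file=True)) selects the
centralized path, while the default flags select the print flow.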

Regards,
Alberto.

On Wed, Oct 16, 2024 at 10:28 PM Denis Ovsienko wrote:

> On Wed, 16 Oct 2024 19:55:41 +0100
> Denis Ovsienko wrote:
>
> > and Ethernet
> > OUI is always 48 bit long
>
> 24 bits long, of course.  Half the MAC address is OUI, not the entire
> address.  Which may or may not make the mapping easier to implement,
> but that's not the point.
>
> --
> Denis Ovsienko
>
--- End Message ---