Hi all

I run a reasonably sized PowerDNS setup (high millions of domains across a few 
instances). So far the way I have been scaling it is working fine, but I would 
like to get some additional suggestions in case I missed something. Currently, 
when we need extra capacity it's a matter of adding a dnsdist server for the 
front end or a PowerDNS server with MariaDB for the backend.

Dnsdist answers a large number of queries from cache, which reduces the load 
nicely, but every now and then we get an attack that punches through the cache 
with random subdomains and causes a high load on the PowerDNS auth servers. 
When that occurs, our strategy has been to add the domain to a predefined 
suffix match group on dnsdist which applies stricter rate limiting; this works 
well enough. We also use other rules to limit QPS from prefixes of certain 
sizes, which helps sometimes, but the latest attacks seem to come entirely from 
spoofed IPs that do not fall in any prefix that is easy to limit.
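
To make the "stricter rate limiting per suffix" idea concrete, here is a 
minimal Python sketch. It is only an illustration of the logic, not how dnsdist 
implements it; the names TokenBucket, matching_suffix and StrictSuffixLimiter 
are hypothetical, and the token bucket is an assumed limiting scheme.

```python
import time


class TokenBucket:
    """Allow up to `rate` events per second, with bursts of up to `burst`."""

    def __init__(self, rate, burst, now=None):
        self.rate = float(rate)
        self.burst = float(burst)
        self.tokens = float(burst)
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last)
        # Refill according to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def matching_suffix(qname, suffixes):
    """Return the longest entry of `suffixes` that qname falls under, or None."""
    labels = qname.rstrip(".").lower().split(".")
    # Try the full name first, then drop one leading label at a time.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            return candidate
    return None


class StrictSuffixLimiter:
    """Rate-limit only queries under suffixes that were explicitly added."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.buckets = {}

    def add_suffix(self, suffix, now=None):
        key = suffix.rstrip(".").lower()
        self.buckets[key] = TokenBucket(self.rate, self.burst, now)

    def allow(self, qname, now=None):
        suffix = matching_suffix(qname, self.buckets)
        if suffix is None:
            return True  # not in the strict group, no extra limit here
        return self.buckets[suffix].allow(now)
```

The bucket is shared by every name under the suffix, so random subdomains of 
an attacked domain all draw from the same budget while other domains are 
unaffected.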

The setup we use is:

* 2 sets of MariaDB "master" VMs (2 clusters in 2 geographically separated 
locations) which are active/active and replicate from/to each other. All write 
queries are directed to these.

* 3 PowerDNS "delayed slave" auth VMs, geographically distributed, each of 
which has its own MariaDB install acting as a read-only slave to the master 
servers. These servers are configured with a replication delay for DR purposes 
and do not normally get any traffic.

* Multiple PowerDNS auth VMs, geographically distributed (in at least pairs), 
with the same setup as the delayed slave servers, except that they have no 
replication delay configured. These are the servers that normally receive 
traffic from dnsdist.

* Multiple dnsdist servers in geographically distributed areas. Queries are 
sent to the local auth servers if they are available, then to remote auth 
servers, and finally to the delayed DR servers. For stability, the IPs dnsdist 
listens on for queries are bound to a loopback adapter and advertised to the 
rest of the network with BGP.
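
The backend preference order in the last item (local, then remote, then 
delayed DR) can be sketched in Python as below. This only illustrates the 
selection logic; the Backend class, the tier numbers and the server names are 
hypothetical and are not taken from the real dnsdist configuration.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    tier: int          # assumed numbering: 0 = local, 1 = remote, 2 = delayed DR
    up: bool = True    # result of a health check


def pick_candidates(backends):
    """Return the healthy backends from the most preferred tier that has any."""
    healthy = [b for b in backends if b.up]
    if not healthy:
        return []
    best_tier = min(b.tier for b in healthy)
    return [b for b in healthy if b.tier == best_tier]


if __name__ == "__main__":
    pool = [
        Backend("auth-local-1", tier=0, up=False),
        Backend("auth-local-2", tier=0, up=False),
        Backend("auth-remote-1", tier=1),
        Backend("auth-dr-1", tier=2),
    ]
    # Both local servers are down, so the remote server is selected.
    print([b.name for b in pick_candidates(pool)])
```

Load balancing among the returned candidates (for example round robin) is left 
out of the sketch.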

All servers except 2 are on SSDs (waiting for a hardware refresh...), with a 
reasonable amount of RAM and CPU resources. During the attacks the biggest 
bottleneck seems to be the DB. I plan on doing some simulated benchmarks 
directly on the DB to see what numbers I get without the overhead of PowerDNS 
parsing the request, generating the query, waiting for the answer, etc.

I would be curious whether there is already a tool that could perform the test 
mentioned above, or whether I will have to end up writing one. If I do write 
one, my goal would be to run a test, change a setting (in MariaDB or PowerDNS), 
and repeat.
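
A rough Python sketch of what such a run/change/repeat harness might look like 
follows. It is only a starting point under stated assumptions: the in-memory 
sqlite3 database is a stand-in so the sketch runs anywhere, the simplified 
records table and SELECT are illustrative and are not the exact queries 
PowerDNS sends, and for real numbers run_query would have to be replaced by a 
function that queries MariaDB through whichever driver is in use.

```python
import random
import sqlite3
import string
import time


def random_name(suffix, rng, length=12):
    """Build a random subdomain, similar to what the attacks ask for."""
    alphabet = string.ascii_lowercase + string.digits
    label = "".join(rng.choice(alphabet) for _ in range(length))
    return f"{label}.{suffix}"


def benchmark(run_query, names):
    """Call run_query once per name and report throughput and latency."""
    if not names:
        raise ValueError("need at least one name to query")
    latencies = []
    start = time.perf_counter()
    for name in names:
        t0 = time.perf_counter()
        run_query(name)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    latencies.sort()

    def pct(p):
        return latencies[min(len(latencies) - 1, int(p * len(latencies)))]

    return {
        "queries": len(latencies),
        "qps": len(latencies) / elapsed if elapsed > 0 else float("inf"),
        "p50_ms": pct(0.50) * 1000,
        "p99_ms": pct(0.99) * 1000,
    }


def make_sqlite_stand_in():
    """Stand-in backend (assumption): in-memory sqlite3, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records (name TEXT, type TEXT, content TEXT, ttl INTEGER)"
    )
    conn.execute("CREATE INDEX records_name_idx ON records (name)")
    conn.execute(
        "INSERT INTO records VALUES ('www.example.com', 'A', '192.0.2.1', 300)"
    )
    conn.commit()

    def run_query(name):
        cur = conn.execute(
            "SELECT content, ttl, type FROM records WHERE name = ?", (name,)
        )
        return cur.fetchall()

    return run_query


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so runs are comparable
    run_query = make_sqlite_stand_in()
    names = [random_name("example.com", rng) for _ in range(10000)]
    print(benchmark(run_query, names))
```

Using the same seed for every run means each setting is tested against the 
same list of names, so differences between runs come from the changed setting 
rather than from the input.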

Also, if you know of any other relevant OS-related or MariaDB-related tuning, 
that would help. I would be happy to run additional benchmarks to see what the 
impact would be and publish the results later.
_______________________________________________
Pdns-users mailing list
Pdns-users@mailman.powerdns.com
https://mailman.powerdns.com/mailman/listinfo/pdns-users
