Hi there, and have a good day!
This is Andrey Sedletsky, writing on behalf of PJSC MGTS (Moscow City Telephone Network).

We are using your recursive DNS server (the open source PowerDNS Recursor) and have a few questions for you. One of our clients contacted us because the domain name "cm.taxi" could not be resolved. The query trace on the server shows that PowerDNS does not accept the response from the authoritative server because the AA (Authoritative Answer) flag is not set.

Sep 04 01:47:38 a975-icache02 pdns_recursor[2575]: Removing record 'cm.taxi|A|91.231.114.19' in the answer section without the AA bit set received from cm.taxi
Sep 04 01:47:38 a975-icache02 pdns_recursor[2575]: Removing record 'cm.taxi|A|91.231.114.18' in the answer section without the AA bit set received from cm.taxi

The full log is in the attachment, along with a dump file illustrating the problem. So, our first question: is this normal behavior for PowerDNS Recursor, and can it be changed (globally or for specific zones)?
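To make the AA check concrete, here is a minimal Python sketch that reads the flag from a raw DNS message, for example one extracted from the dump file. It only illustrates where the bit sits in the 12-byte header defined by RFC 1035; it is not how PowerDNS Recursor implements the check. The names `has_aa_flag` and `make_header` are hypothetical helpers introduced for this example.

```python
import struct

# The AA (Authoritative Answer) bit in the 16-bit flags field of the DNS header.
AA_MASK = 0x0400


def has_aa_flag(packet: bytes) -> bool:
    """Return True if the AA bit is set in a raw DNS message."""
    if len(packet) < 12:
        raise ValueError("a DNS message header is 12 bytes long")
    # The header starts with the 16-bit message ID followed by the 16-bit flags.
    _msg_id, flags = struct.unpack("!HH", packet[:4])
    return bool(flags & AA_MASK)


def make_header(flags: int) -> bytes:
    """Hypothetical helper: build a bare header with the given flags and zero counts."""
    return struct.pack("!HHHHHH", 0x1234, flags, 0, 0, 0, 0)


if __name__ == "__main__":
    print(has_aa_flag(make_header(0x8400)))  # QR and AA set: True
    print(has_aa_flag(make_header(0x8000)))  # QR only, as in our case: False
```

A response like the ones logged above would correspond to the second case: it is marked as a response (QR set) but does not carry the AA bit.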


Also, not long ago we had an issue after restarting the pdns-recursor process. After the restart (around 11:00), the number of SERVFAIL responses sent to clients began to increase. The load on the server at that moment was about 300 thousand requests per second. By the evening (about 22:00), SERVFAIL responses were approaching 30 percent of all requests, and the call center began to receive a large number of complaints from subscribers about being unable to resolve domain names. By this time, the load had grown to 400 thousand requests per second (the normal value for that time of day). Switching to a backup server with the same configuration (hardware and software) did not solve the problem; it was reproduced on the backup server too. Another restart did not help either. In the end, the problem was solved by reducing the max-threads parameter from 16 to 8.
This raises a number of questions.
What could be the reason for this behavior? (Until the problem occurred, the server had been working normally for several months at the same load and with the same configuration.) What tests should we perform to identify bottlenecks in the system and in pdns-recursor itself? Which metrics should we monitor to prevent such situations from occurring? The attachment also contains a screenshot illustrating the situation at that time.
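As an illustration of the kind of monitoring we have in mind, below is a minimal Python sketch that computes the SERVFAIL share between two statistics snapshots. It assumes the snapshots are text with one "name value" pair per line, as we understand `rec_control get-all` to print them, and that the cumulative counters `questions` and `servfail-answers` are present; both points should be checked against the installed version. The function names are hypothetical.

```python
from typing import Dict


def parse_stats(text: str) -> Dict[str, int]:
    """Parse 'name value' lines into a dict, skipping anything non-numeric."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        name, value = parts
        try:
            stats[name] = int(value)
        except ValueError:
            continue
    return stats


def servfail_ratio(before: Dict[str, int], after: Dict[str, int]) -> float:
    """Share of SERVFAIL answers among the questions received between two snapshots."""
    # The counters are assumed to be cumulative, so the interval is the difference.
    questions = after["questions"] - before["questions"]
    servfails = after["servfail-answers"] - before["servfail-answers"]
    if questions <= 0:
        return 0.0
    return servfails / questions


if __name__ == "__main__":
    # Made-up sample values, not taken from our servers.
    first = parse_stats("questions\t1000\nservfail-answers\t10\n")
    second = parse_stats("questions\t2000\nservfail-answers\t310\n")
    print(servfail_ratio(first, second))
```

An alert on this ratio crossing some threshold (well below the 30 percent we reached) would presumably have flagged the incident hours before the call center did.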

One last question.
Our company would like to have commercial support for your product. Is this possible and, if so, what do we need to do?
Below is the link to the attachments:
https://cloud.mail.ru/public/3y53/RzaP6z2a6

Additional information:

>rec_control version
4.3.6
> less /etc/oracle-release
Oracle Linux Server release 8.4
>2 CPUs (28 cores, 56 threads)
>128 GB RAM


PowerDNS Recursor was installed from the EPEL repository.
grep -i process recursor.conf
# dnssec        DNSSEC mode: off/process-no-validate (default)/process/log-fail/validate
# dnssec=process-no-validate



Best Regards,
Andrey
_______________________________________________
Pdns-users mailing list
Pdns-users@mailman.powerdns.com
https://mailman.powerdns.com/mailman/listinfo/pdns-users
