On 09/03/2022 07:08, Daniel Miller via Pdns-users wrote:
> Anyway, after all that - when I make a change to a domain record using pdnsutil or an external tool using the API - the changes are immediately applied to the zone but are not immediately visible through the recursor. To make that happen I need to either flush the cache or just restart the recursor.
>
> This is an issue when creating/updating ACME challenge records - I haven't been able to totally automate the process. I need to introduce lengthy delays, try manually applying the changes, restart the servers, whatever.

That doesn't really make sense as an explanation of whatever problem you see.

1. LetsEncrypt will be talking to your authoritative server, not your recursor.

2. Even if it were talking to the recursor, it would be querying _acme-challenge.somedomain TXT. Unless that query had been made recently, it won't be in the recursor's cache.

If you're hitting a caching problem here, it's not to do with the recursor, but either the packet cache or the query cache in pdns-authoritative. See: https://doc.powerdns.com/authoritative/performance.html#packet-cache

If LetsEncrypt had queried _acme-challenge.somedomain TXT a few seconds before you had changed the zone, and then again afterwards, it could see the old data. However, that shouldn't be happening: you should be inserting the TXT record *before* LetsEncrypt does the query. Therefore, although you can disable those caches, you shouldn't really need to do so.

The most likely problem I can think of is that your authoritative zones are replicated, and there's some delay in updates to the primary getting replicated to the secondaries.  Remember that LetsEncrypt could query *any* of your auth nameservers with equal probability.

One solution is to ensure that notifies are working properly, and then insert a short (say 5 second) delay in your ACME process to give replication time to complete.
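A variation on the fixed delay is to poll each authoritative server directly and only ask LetsEncrypt to validate once every one of them returns the new TXT value. Below is a minimal sketch of that polling loop, assuming you already have the list of your auth servers and a `lookup` function that queries a single server with recursion off (one could be built on dnspython's `dns.query.udp`, for example). The names `wait_for_txt` and `lookup` are hypothetical, not part of any PowerDNS or ACME tool.

```python
import time
from typing import Callable, Iterable, Set

# Hypothetical lookup signature: lookup(nameserver, qname) returns the set of
# TXT strings that nameserver serves for qname (empty set if none or on error).
TxtLookup = Callable[[str, str], Set[str]]


def wait_for_txt(
    nameservers: Iterable[str],
    qname: str,
    expected: str,
    lookup: TxtLookup,
    timeout: float = 60.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll every authoritative server until all of them serve `expected`.

    Returns True as soon as no server is still missing the value, or False
    if `timeout` seconds pass first. `sleep` and `clock` are injectable so
    the function can be tested without real waiting.
    """
    pending = set(nameservers)
    deadline = clock() + timeout
    while True:
        # Keep only the servers that still do not return the challenge token.
        pending = {ns for ns in pending if expected not in lookup(ns, qname)}
        if not pending:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

If it returns False, that points at replication (notifies or transfers) rather than caching, which is useful in itself when debugging.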

Another solution is to get LetsEncrypt to talk to a single instance, by delegating the challenge name to it with a single NS record wherever you need one, e.g.:

_acme-challenge.www.example.com.  NS  ns-primary.example.com.

If you wish, this approach also lets you have a completely separate authoritative server, dedicated to handling ACME challenges. That in turn can be something that accepts dynamic updates, without having to allow dynamic updates on your main infrastructure.

If you need to debug this further, I suggest you capture the data between LetsEncrypt and your authoritative servers, with query logging or at worst using tcpdump, to work out what's going on.



> is there a way to make changes in the auth server immediately visible in the recursor?

You mean, clients using your local recursor are querying local zones and seeing stale data? That's a completely different matter: that's just standard recursor caching, and it's how the DNS is designed.

You can avoid that by setting a low TTL on the records in your zone, and for negative caching using the "minimum" parameter in the SOA record.  In the extreme, you'd set those to zero, and then the recursor would send every query straight to the authoritative server - but something like 60 seconds puts less load on your systems.  You might as well get *some* benefit from the recursor cache.
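As a rough way to reason about those numbers: a record that was cached before you changed it can be served stale for up to its own TTL, and a name that didn't exist yet (and was queried before you added it) is negatively cached for the smaller of the SOA record's own TTL and its MINIMUM field, per RFC 2308. The sketch below just encodes that arithmetic; the function name is hypothetical, and a recursor may cap these values further (PowerDNS Recursor has settings such as `max-cache-ttl` and `max-negative-ttl`).

```python
from typing import Dict


def worst_case_staleness(record_ttl: int, soa_ttl: int, soa_minimum: int) -> Dict[str, int]:
    """Upper bound, in seconds, on how long a recursor may serve stale answers.

    - "changed_record": an existing record cached before the change can be
      served until its own TTL expires.
    - "new_record": a name or type that did not exist yet, if queried before
      the change, is negatively cached for min(SOA TTL, SOA MINIMUM) (RFC 2308).
    """
    return {
        "changed_record": record_ttl,
        "new_record": min(soa_ttl, soa_minimum),
    }


# With 60 second record TTLs and an SOA MINIMUM of 60, no answer stays stale
# for more than a minute.
print(worst_case_staleness(60, 3600, 60))  # {'changed_record': 60, 'new_record': 60}
```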

Or else, whenever you bump the auth zone, you can flush the corresponding recursor zone - but that's a step you'd have to do yourself.
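If you go the flushing route, the step can be scripted into whatever makes the zone change. This is a minimal sketch, assuming the recursor runs on the same host and the script is allowed to use `rec_control`; `rec_control wipe-cache name$` wipes that name and everything below it. The optional `pdnsutil increase-serial` step is only for update paths that don't already bump the SOA serial. The function names are hypothetical.

```python
import subprocess
from typing import Callable, List

Runner = Callable[..., object]


def recursor_flush_commands(zone: str, bump_serial: bool = False) -> List[List[str]]:
    """Build the commands to run after changing `zone` on the auth server."""
    name = zone.rstrip(".")
    cmds: List[List[str]] = []
    if bump_serial:
        # Only needed if the tool that changed the zone did not bump the serial.
        cmds.append(["pdnsutil", "increase-serial", name])
    # The trailing '$' makes rec_control wipe the zone and everything below it.
    cmds.append(["rec_control", "wipe-cache", name + "$"])
    return cmds


def after_zone_change(zone: str, bump_serial: bool = False, run: Runner = subprocess.run) -> None:
    """Run the commands in order; check=True stops on the first failure."""
    for cmd in recursor_flush_commands(zone, bump_serial):
        run(cmd, check=True)
```

For example, `after_zone_change("example.com")` would run `rec_control wipe-cache example.com$` once your ACME hook has written the TXT record.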

_______________________________________________
Pdns-users mailing list
Pdns-users@mailman.powerdns.com
https://mailman.powerdns.com/mailman/listinfo/pdns-users