Hi Tobi,

I managed an MSP for more than 15 years, and we moved a lot of email as well, 
so I feel your pain.

However, in all cases where we had real reachability issues (about a handful 
that I can recall over that time), we routed traffic to the other AS over a 
different network path. In BGP speak: we preferred a different next-hop for 
that particular AS or prefix. This is still my advice to fix your problem, and 
to be honest, it’s the only real fix. Once you’ve solved the resolving problem, 
next up is the mail delivery problem: if you have issues reaching the 
nameservers of that particular network, you’re going to have issues reaching 
the mailservers as well.

If you do want to solve it at what I still feel is the incorrect place, and you 
only have the IP addresses of the nameservers rather than a list of domain 
names, then things get complicated, as you need two DNS requests, and that’s 
not something dnsdist would easily fix.

What you might do, but again, this is very ugly and will bite you some day:

I am just freewheeling here, no idea if this will actually work, there could be 
design flaws in here, disclaimer, yadayadayada. Imagine your primary resolver 
has ip 10.10.10.10, your backup resolver has ip 10.200.200.200, and 
192.168.123.123 is the “problem ip” of the remote auth server.

Setup:
- IP 10.10.10.10 on eth0; pdns_recursor binds on port 53 of this IP and of 
localhost
- add 10.10.10.11 as an alias; dnsdist binds on port 53 of this IP. Make sure 
dnsdist uses IP 10.10.10.11 for all outbound connections
- add an iptables DNAT rule to rewrite all packets with source IP 10.10.10.10, 
destination IP 192.168.123.123, destination port 53 to destination IP 
10.10.10.11
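
To make the intent of that rewrite rule concrete, here is a minimal Python 
sketch that only models its match logic; it is not the iptables rule itself. 
The Packet type and the dnat function are hypothetical names made up for 
illustration, and the addresses are the example values from above. Since the 
resolver's queries are generated on the box itself, the real rule would 
presumably have to sit in the nat table's OUTPUT chain rather than PREROUTING, 
but I have not tested that.

```python
from dataclasses import dataclass, replace

RESOLVER_IP = "10.10.10.10"          # pdns_recursor's address (example value)
DNSDIST_IP = "10.10.10.11"           # alias that dnsdist listens on
PROBLEM_AUTH_IP = "192.168.123.123"  # the hard-to-reach auth server


@dataclass(frozen=True)
class Packet:
    """Hypothetical, stripped-down view of a packet: just what the rule matches on."""
    src: str
    dst: str
    dport: int


def dnat(pkt: Packet) -> Packet:
    """Model of the rewrite rule: only the destination address changes."""
    if pkt.src == RESOLVER_IP and pkt.dst == PROBLEM_AUTH_IP and pkt.dport == 53:
        return replace(pkt, dst=DNSDIST_IP)
    # Anything else (other source, other destination, other port) is untouched.
    return pkt
```

Only port 53 traffic from the resolver's own address to the problem IP is 
redirected; everything else, including whatever dnsdist sends from 
10.10.10.11, passes through unchanged.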


Flow:
- have the query arrive at the resolver IP
- resolver will do its job, and notice that the auth NS has IP 192.168.123.123
- resolver dispatches the query to 192.168.123.123, which gets rewritten to 
10.10.10.11, our local dnsdist
- dnsdist then dispatches the query to 10.200.200.200 (you’d probably need to 
fiddle with the flags at this point).
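
On the flags: a recursor normally queries authoritative servers with the RD 
(recursion desired) bit cleared, while the backup resolver will presumably 
only recurse when RD is set, so something in the chain would have to set it. 
As a rough illustration of what that means on the wire, this Python sketch 
flips the RD bit in a raw DNS message header. set_rd is a hypothetical helper, 
not a dnsdist function; how to do the equivalent inside dnsdist is something 
to check in its documentation.

```python
import struct

RD_BIT = 0x0100  # "recursion desired" bit in the 16-bit DNS flags field (RFC 1035)


def set_rd(message: bytes) -> bytes:
    """Return a copy of a raw DNS message with the RD bit set."""
    if len(message) < 12:
        raise ValueError("DNS header is 12 bytes; message too short")
    # The header starts with the 16-bit ID followed by the 16-bit flags field.
    msg_id, flags = struct.unpack("!HH", message[:4])
    return struct.pack("!HH", msg_id, flags | RD_BIT) + message[4:]


# Bare header of a query as a recursor would send it to an auth server: RD=0,
# one question announced (the question section itself is omitted here).
query = struct.pack("!HHHHHH", 0x1234, 0x0000, 1, 0, 0, 0)
fixed = set_rd(query)
```

Everything except the flags field is left byte-for-byte identical, so the 
query ID still matches when the answer comes back.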

But again: this is ugly, very ugly, and in my 20+ years of experience as a 
network engineer and MSP manager, I feel it’s the worst solution to your 
problem.

You might also get away with longer TTLs on the problematic NS records?

Frank Louwers
Certified PowerDNS Consultant @ Kiwazo.be

> On 21 May 2019, at 07:51, Tobi <jahli...@gmx.ch> wrote:
> 
> Brian
> 
>> In any case, it's the responsibility of the authoritative domain owner
>> to host their domain on at least two different ASes (RFC 2182), if
>> they care about people being able to resolve it.
> 
> Fully agree with that, but our customer is not interested in why he cannot
> send a mail to the other end of the world. It just needs to work :-) We
> had such problems where, after a 5 day investigation by our provider, they
> found out that such a BGP issue occurred somewhere in the world with
> their peering partner.
> 
>> An authoritative server with that sort of limit, such as could affect
>> a single end-user site, would be completely broken IMO.
> 
> who said it's concerning my homebrew dns server? That issue occurred on
> our resolvers at the company where I work. We're working in the email
> filtering business and we have quite a lot of dns queries per day.
> 
> Frank
> 
>> Note that the second reason you mention (src address rate limiting)
>> won’t be fixed by implementing this solution…
> 
> true, not fixed as in "not occur anymore" but fixed as in "more than one
> src address --> more queries in total before per SRC address limits kick in"
> 
> 
>> If you *do* want to solve it at the configuration layer: do you have a
>> list of domains that should use the other resolver?
> 
> that's our "problem": we only have the IP address(es) of the authoritative
> nameservers we want to reach via the 2nd resolver.
> 
> 
> Cheers
> 
> --
> 
> tobi
> 
> On 20.05.19 at 20:43, Brian Candler wrote:
>> On 20/05/2019 17:57, Tobi <jahli...@gmx.ch> wrote:
>>> - BGP routing issues (ex from Provider 1 you can reach target and from
>>> provider 2 not)
>> 
>> That happens, but very rarely in my experience.  In any case, it's the
>> responsibility of the authoritative domain owner to host their domain on
>> at least two different ASes (RFC 2182), if they care about people being
>> able to resolve it.
>> 
>>> - per SRC limits on the recipient side
>> 
>> An authoritative server with that sort of limit, such as could affect a
>> single end-user site, would be completely broken IMO.
>> 
>> If you can replicate this issue, then I think it would be worth drilling
>> down further with tests to prove or disprove these theories.  It sounds
>> more likely that the problem is local to you, either in your network, or
>> with your upstream provider - especially if this affects a wide range of
>> domains and not just a specific few.  However, routing issues in your
>> part of the world may be different to what I see here (in the UK).
>> 
> _______________________________________________
> Pdns-users mailing list
> Pdns-users@mailman.powerdns.com
> https://mailman.powerdns.com/mailman/listinfo/pdns-users
