Hi Maria,
“
What may work is this, not running on ASPA verification but as an auxiliary BGP
session check.
* During BGP session initiation, both parties MUST check whether either:
  * the Customer has no ASPA record, or
  * their SPAS includes the Provider’s AS.
  If the check fails, the BGP session MUST be terminated immediately.
* For any established BGP session, the check MUST be repeated any time the
appropriate SPAS changes, appears or disappears. The session SHOULD be
terminated immediately if the condition is not met anymore. If not terminated,
the operators SHOULD resolve the issue as soon as possible to prevent possible
ASPA Invalids being spread out.
“
I recall that I once commented on the ASPA draft, asking whether it was
necessary to check for conflicts between BGP role negotiation and ASPA records
(https://mailarchive.ietf.org/arch/msg/sidrops/9p0-W8zUjR730iDHVi_PsND30n4/),
which might be somewhat similar to the approach you mentioned above. However, I
would like to ask why an established BGP session should be terminated when the
check fails. The check is meant to prevent ASPA Invalids from spreading,
thereby avoiding network disruptions caused by routes being incorrectly
discarded. If the established BGP session were terminated when the check
fails, the reachability of the network may be directly affected.
Best,
Nan
From: Maria Matejka <[email protected]>
Sent: Sunday, July 13, 2025 11:26 PM
To: Sriram, Kotikalapudi (Fed) <[email protected]>
Cc: jia zhang <[email protected]>; [email protected]; [email protected]
Subject: [Sidrops] Re: Question: How best to deal with network operator error
in creation of ASPA?
Hello Sriram,
(writing as an implementor, doing also techsupport)
please note that the large providers are not those who would do any ASPA
deployment first. The end networks will, and ultimately my question is: how
do I, as a leaf network operator, find out that I have made an error?
The approach proposed by Maria (which you support) does not function as
intended when the erring remote AS is multi-homed. In such cases, the remote
AS’s alternate route propagates to all ASes in the Internet – whether they
perform ASPA verification or not – resulting in the remote AS remaining unaware
of the error in their ASPA.
To reiterate, the approach proposed by me, after discussing in the previous
thread, is ultimately this:
1. ingress check from Customer: prepend self, run Upstream algo
2. ingress check from Peer / RS: run Upstream algo
3. ingress check from Provider: prepend self, run Downstream algo
4. egress check to Customer: prepend self and the Customer, run Downstream
algo
5. egress check to Peer / RS: prepend self, run Upstream algo
6. egress check to Provider: prepend self and the Provider, run Upstream algo
The motivation behind the prepending is this:
* the route is inevitably doomed to get that exact specific AS Path later on
* in cases 3, 5, 6, we catch our own error (this is the major advantage)
* in case 1, we ensure that our customer ran their own check (6)
* in case 4, we catch our Customer’s error on our other side before they
even run their check on ingress (3)
This way, we check as much and as soon as we can. And the BGP Role still tells
us which variant we use.
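
A minimal Python sketch of the six cases above, written as a lookup table. It
assumes the AS path is a list with the most recently added AS first, and that
"peer" and "rs" are handled identically. The names CHECKS and path_to_verify
are hypothetical, and the Upstream and Downstream algorithms themselves are
not implemented here; the function only builds the path that would be handed
to them.

```python
from typing import Dict, List, Tuple

# (direction, neighbor role) -> (prepend self?, prepend neighbor?, algorithm)
# Hypothetical table mirroring cases 1-6 of the list above.
CHECKS: Dict[Tuple[str, str], Tuple[bool, bool, str]] = {
    ("ingress", "customer"): (True, False, "upstream"),    # case 1
    ("ingress", "peer"): (False, False, "upstream"),       # case 2
    ("ingress", "rs"): (False, False, "upstream"),         # case 2
    ("ingress", "provider"): (True, False, "downstream"),  # case 3
    ("egress", "customer"): (True, True, "downstream"),    # case 4
    ("egress", "peer"): (True, False, "upstream"),         # case 5
    ("egress", "rs"): (True, False, "upstream"),           # case 5
    ("egress", "provider"): (True, True, "upstream"),      # case 6
}


def path_to_verify(
    direction: str,
    role: str,
    as_path: List[int],
    local_as: int,
    neighbor_as: int,
) -> Tuple[List[int], str]:
    """Return the AS path to verify and the name of the algorithm to run.

    Assumption: as_path[0] is the most recently added AS.
    """
    prepend_self, prepend_neighbor, algo = CHECKS[(direction, role)]
    path = list(as_path)
    if prepend_self:
        path.insert(0, local_as)
    if prepend_neighbor:
        # The neighbor would add itself after us, so it ends up leftmost.
        path.insert(0, neighbor_as)
    return path, algo
```

For example, path_to_verify("egress", "provider", [64500], 64496, 64511)
yields the path [64511, 64496, 64500] to be checked with the Upstream
algorithm, which is the path the route would carry once the Provider has
received it and added its own AS.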
This indeed does not work for Complex relationships. That’s OK, it’s the same
case as with BGP Roles. Exactly the same case. They will figure it out. We just
have to design the algorithm in such a way that it fails at the source of the
error, or in other words, as Randy Bush said earlier in that aforementioned
thread on this topic last month, no garbage in, no garbage out.
https://mailarchive.ietf.org/arch/msg/sidrops/Vs9Yx5x8T8qk5PsvcmUIjyP7oOY/
As you seem to agree, the network operator at the local AS should not be left
unaware if a customer is effectively cut off (i.e., all their routes are
dropped). The local AS operator must have the ability to manage such situations
proactively.
Which means they should be able to see it before they send anything out.
Considering Maria’s and your inputs, I suggest the following approach:
* During ASPA verification, when the remote (sending) AS is a customer, the
following check is performed:
  * The remote AS has an ASPA record, and
  * The SPAS obtained from the ASPA does not include the local AS.
* If this check evaluates to True, an alert MUST be generated for the local
AS.
* The local AS operator MUST have an automated procedure to process this
alert and decide whether to terminate the BGP session with the remote AS.
* Regardless of whether the BGP session is terminated, the local AS MUST
notify the remote AS about the error in their ASPA.
* If the BGP session was terminated, it is re-initiated after the error in
the ASPA is fixed.
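
The alert condition in the first two bullets above could be sketched in
Python as follows. This is only an illustration: the ASPA data is assumed to
be available as a dict mapping a customer AS number to its set of provider
ASes (its SPAS), and the function name customer_aspa_alert is hypothetical.

```python
from typing import Dict, Set


def customer_aspa_alert(
    aspa: Dict[int, Set[int]], customer_as: int, local_as: int
) -> bool:
    """Return True when an alert should be raised for the local AS.

    Assumption: `aspa` maps a customer AS to its SPAS; an AS without an
    ASPA record is simply absent from the dict.
    """
    spas = aspa.get(customer_as)
    # Alert only if a record exists AND it omits the local AS.
    return spas is not None and local_as not in spas
```

With aspa = {64500: {64511}}, customer_aspa_alert(aspa, 64500, 64496) is
True, because AS 64500 has a record that does not list AS 64496, while a
customer with no record at all never triggers the alert.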
This needs:
* the implementation to implement an additional BGP instance check
alongside ASPA validation, and generate specific alerts
* the operator to actually catch these alerts and deploy a customer
notification tool which would be completely dormant for most of the time
* the provider of the erring customer to actually deploy ASPA at all.
This is what I call bending over backwards, but on the operator side.
Maria and I agreed earlier that the combination of the existing ASPA-based path
verification at ingress and the OTC procedure [RFC 9234] eliminates the need
for egress verification, especially when there is a supplementary procedure (as
described above) to remedy the omission error in the direct customer’s ASPA.
I was, at least, very clear that I consider this very much suboptimal.
What may work is this, not running on ASPA verification but as an auxiliary BGP
session check.
* During BGP session initiation, both parties MUST check whether either:
  * the Customer has no ASPA record, or
  * their SPAS includes the Provider’s AS.
  If the check fails, the BGP session MUST be terminated immediately.
* For any established BGP session, the check MUST be repeated any time the
appropriate SPAS changes, appears or disappears. The session SHOULD be
terminated immediately if the condition is not met anymore. If not terminated,
the operators SHOULD resolve the issue as soon as possible to prevent possible
ASPA Invalids being spread out.
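
A rough Python sketch of this auxiliary session check, under the assumption
that the ASPA data is a dict mapping a customer AS to its SPAS (a set of
provider ASes). The names session_check_passes and session_action, and the
returned strings, are hypothetical; they only mirror the MUST and SHOULD
levels stated above.

```python
from typing import Dict, Set


def session_check_passes(
    aspa: Dict[int, Set[int]], customer_as: int, provider_as: int
) -> bool:
    """True if the Customer has no ASPA record or its SPAS lists the Provider."""
    spas = aspa.get(customer_as)
    return spas is None or provider_as in spas


def session_action(
    aspa: Dict[int, Set[int]],
    customer_as: int,
    provider_as: int,
    established: bool,
) -> str:
    """Map the check result to the action described in the two bullets.

    Intended to run at session initiation (established=False) and again
    whenever the relevant SPAS changes, appears or disappears.
    """
    if session_check_passes(aspa, customer_as, provider_as):
        return "keep"
    # At initiation termination is a MUST; on an established session a SHOULD.
    return "should-terminate" if established else "must-terminate"
```

Both ends of the session would call session_action with the same customer and
provider AS numbers, so a Customer whose ASPA omits this Provider learns about
the error on its own side of the session.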
In the end, considering the scenarios described by RFC 4264 in conjunction with
ASPA-Role discrepancy, I stand very firmly on the side that the egress check is
not only a much better option but also much easier to implement, deploy and
ultimately debug in production.
I’m willing to update the draft myself if the current authors lack time or
energy to do that.
Have a nice day!
Maria
–
Maria Matejka (she/her) | BIRD Team Leader | CZ.NIC, z.s.p.o.