Hi Maria,

“

What may work is this, not running on ASPA verification but as an auxiliary BGP 
session check.

  *   During BGP session initiation, both parties MUST check whether either:
     *   the Customer has no ASPA record, or
     *   their SPAS includes the Provider’s AS.
      If the check fails, the BGP session MUST be terminated immediately.
  *   For any established BGP session, the check MUST be repeated any time the 
appropriate SPAS changes, appears or disappears. The session SHOULD be 
terminated immediately if the condition is not met anymore. If not terminated, 
the operators SHOULD resolve the issue as soon as possible to prevent possible 
ASPA Invalids being spread out.
”

I recall once commenting on the ASPA draft about whether conflicts between BGP 
role negotiation and ASPA records need to be checked 
(https://mailarchive.ietf.org/arch/msg/sidrops/9p0-W8zUjR730iDHVi_PsND30n4/), 
which seems somewhat similar to the approach you describe above. However, why 
should an established BGP session be terminated when the check fails? The 
purpose of the check is to prevent ASPA Invalids from spreading, and so to avoid 
network disruption caused by routes being wrongly discarded. If an established 
BGP session were terminated when the check fails, network reachability could be 
directly affected.


Best,
Nan


From: Maria Matejka <[email protected]>
Sent: Sunday, July 13, 2025 11:26 PM
To: Sriram, Kotikalapudi (Fed) <[email protected]>
Cc: jia zhang <[email protected]>; [email protected]; [email protected]
Subject: [Sidrops] Re: Question: How best to deal with network operator error 
in creation of ASPA?


Hello Sriram,

(writing as an implementor, doing also techsupport)

please note that the large providers are not the ones who will deploy ASPA 
first. The end networks will, and ultimately my question is: how do I, as a 
leaf network operator, find out that I have made an error?

The approach proposed by Maria (which you support) does not function as 
intended when the erring remote AS is multi-homed. In such cases, the remote 
AS’s alternate route propagates to all ASes in the Internet – whether they 
perform ASPA verification or not – resulting in the remote AS remaining unaware 
of the error in their ASPA.

To reiterate, the approach proposed by me, after discussing in the previous 
thread, is ultimately this:

  1.  ingress check from Customer: prepend self, run Upstream algo
  2.  ingress check from Peer / RS: run Upstream algo
  3.  ingress check from Provider: prepend self, run Downstream algo
  4.  egress check to Customer: prepend self and the Customer, run Downstream 
algo
  5.  egress check to Peer / RS: prepend self, run Upstream algo
  6.  egress check to Provider: prepend self and the Provider, run Upstream algo

The motivation behind the prepending is this:

  *   the route is inevitably doomed to get that exact specific AS Path later on
  *   in cases 3, 5, 6, we catch our own error (this is the major advantage)
  *   in case 1, we ensure that our customer ran their own check (6)
  *   in case 4, we catch our Customer’s error on our other side before they 
even run their check on ingress (3)

This way, we check as much and as soon as we can. And the BGP Role still tells 
us which variant we use.
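
A minimal sketch of the six cases above as a lookup table, assuming the AS path 
is a list of AS numbers with the most recently added AS first. The names 
(Direction, Role, CHECKS, path_to_verify) are hypothetical, and the Upstream and 
Downstream algorithms themselves are not implemented here, only the choice of 
which one to run and on which path:

```python
from enum import Enum


class Direction(Enum):
    INGRESS = "ingress"
    EGRESS = "egress"


class Role(Enum):
    # Role of the neighbor as seen from the local AS.
    CUSTOMER = "customer"
    PEER = "peer"  # also used here for a route server (RS)
    PROVIDER = "provider"


# (direction, neighbor role) -> (prepend self, prepend neighbor, algorithm name)
CHECKS = {
    (Direction.INGRESS, Role.CUSTOMER): (True, False, "upstream"),    # case 1
    (Direction.INGRESS, Role.PEER): (False, False, "upstream"),       # case 2
    (Direction.INGRESS, Role.PROVIDER): (True, False, "downstream"),  # case 3
    (Direction.EGRESS, Role.CUSTOMER): (True, True, "downstream"),    # case 4
    (Direction.EGRESS, Role.PEER): (True, False, "upstream"),         # case 5
    (Direction.EGRESS, Role.PROVIDER): (True, True, "upstream"),      # case 6
}


def path_to_verify(as_path, direction, role, local_as, neighbor_as):
    """Return (path, algorithm name) for one of the six cases."""
    prepend_self, prepend_neighbor, algorithm = CHECKS[(direction, role)]
    path = list(as_path)
    if prepend_self:
        path.insert(0, local_as)
    if prepend_neighbor:
        # Assumption: the neighbor ends up in front of the local AS,
        # which is the path the route will carry once the neighbor accepts it.
        path.insert(0, neighbor_as)
    return path, algorithm
```

For example, with local AS 65001 announcing a route with path [65003] to 
Provider 65002 (case 6), the sketch returns the path [65002, 65001, 65003] to 
be run through the Upstream algorithm.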

This indeed does not work for Complex relationships. That’s OK, it’s the same 
case as with BGP Roles. Exactly the same case. They will figure it out. We just 
have to design the algorithm in such a way that it fails at the source of the 
error, or in other words, as Randy Bush said earlier in that aforementioned 
thread on this topic last month, no garbage in, no garbage out.

https://mailarchive.ietf.org/arch/msg/sidrops/Vs9Yx5x8T8qk5PsvcmUIjyP7oOY/

As you seem to agree, the network operator at the local AS should not be left 
unaware if a customer is effectively cut off (i.e., all their routes are 
dropped). The local AS operator must have the ability to manage such situations 
proactively.

Which means they should be able to see it before they send anything out.

Considering Maria’s and your inputs, I suggest the following approach:

  *   During ASPA verification, when the remote (sending) AS is a customer, the 
following check is performed:
     *   The remote AS has an ASPA record, and
     *   The SPAS obtained from the ASPA does not include the local AS.
  *   If this check evaluates to True, an alert MUST be generated for the local 
AS.
  *   The local AS operator MUST have an automated procedure to process this 
alert and decide whether to terminate the BGP session with the remote AS.
  *   Regardless of whether the BGP session is terminated, the local AS MUST 
notify the remote AS about the error in their ASPA.
  *   If the BGP session was terminated, it is re-initiated after the error in 
the ASPA is fixed.
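
The alert condition in the first two bullets can be sketched as follows. This 
is only an illustration: it assumes the validated ASPA data is available as a 
dict mapping a customer AS number to its set of provider AS numbers (SPAS), and 
the function name is hypothetical:

```python
def customer_aspa_alert(aspa_records, customer_as, local_as):
    """Return True when an alert should be raised for the local AS.

    aspa_records is an assumed structure: {customer ASN: set of provider ASNs}.
    A customer without an ASPA record is simply absent from the dict.
    """
    spas = aspa_records.get(customer_as)
    # Alert only if the customer has an ASPA record that omits the local AS.
    return spas is not None and local_as not in spas
```

What happens after the alert (session teardown, notifying the customer) is left 
to the operator's own procedure and is not modelled here.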

This needs:

  *   the implementation to implement an additional BGP instance check 
alongside ASPA validation, and generate specific alerts
  *   the operator to actually catch these alerts and deploy a customer 
notification tool which would be completely dormant for most of the time
  *   the provider of the erring customer to actually deploy ASPA at all.

This is what I call bending over backwards, but on the operator side.

Maria and I agreed earlier that the combination of the existing ASPA-based path 
verification at ingress and the OTC procedure [RFC 9234] eliminates the need 
for egress verification, especially when there is a supplementary procedure (as 
described above) to remedy the omission error in the direct customer’s ASPA.

I was, at least, very clear that I consider this very much suboptimal.

What may work is this, not running on ASPA verification but as an auxiliary BGP 
session check.

  *   During BGP session initiation, both parties MUST check whether either:

     *   the Customer has no ASPA record, or
     *   their SPAS includes the Provider’s AS.

      If the check fails, the BGP session MUST be terminated immediately.

  *   For any established BGP session, the check MUST be repeated any time the 
appropriate SPAS changes, appears or disappears. The session SHOULD be 
terminated immediately if the condition is not met anymore. If not terminated, 
the operators SHOULD resolve the issue as soon as possible to prevent possible 
ASPA Invalids being spread out.
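
A rough sketch of this session check, assuming the ASPA data is a dict mapping 
a customer AS number to its SPAS and that the caller re-runs the function 
whenever that SPAS changes, appears or disappears. The function names and the 
returned strings are hypothetical:

```python
def session_check_passes(aspa_records, customer_as, provider_as):
    """True if the Customer has no ASPA record or its SPAS lists the Provider."""
    spas = aspa_records.get(customer_as)
    return spas is None or provider_as in spas


def session_action(aspa_records, customer_as, provider_as, established):
    """Map the check result to the action described in the two bullets above."""
    if session_check_passes(aspa_records, customer_as, provider_as):
        return "keep"
    # At session initiation a failed check MUST terminate the session;
    # for an already established session termination is only a SHOULD.
    return "should-terminate" if established else "must-terminate"
```

The same function serves both bullets; only the `established` flag 
distinguishes the MUST at session initiation from the SHOULD on a later SPAS 
change.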

In the end, considering the scenarios described by RFC 4264 in conjunction with 
ASPA-Role discrepancy, I stand very firmly on the side that the egress check is 
not only a much better option but also much easier to implement, deploy and 
ultimately debug in production.

I’m willing to update the draft myself if the current authors lack time or 
energy to do that.

Have a nice day!
Maria

–
Maria Matejka (she/her) | BIRD Team Leader | CZ.NIC, z.s.p.o.
_______________________________________________
GROW mailing list -- [email protected]
To unsubscribe send an email to [email protected]
