Hi Authors,

Dhruv has asked me to take over as Shepherd for this document, since Dhruv is 
listed as a contributor. Thanks to Dhruv for the previous review, and to the 
authors for addressing those comments in the latest (-13) version.

While thoroughly re-reading the -13 document to complete the shepherd report, I 
came across items that I had not noticed before. Apologies for not raising them 
during the WGLC poll, which I supported; they only became apparent to me on this 
further reading.

Most of this email consists of editorial suggestions; they do not change the 
technical content and are simply recommendations. However, two topics (A and B) 
in the Functional section below need clarification, in my view. I don't have 
text suggestions for them, as I am unclear about the intended behavior.

I have the Shepherd write-up ready to submit. I will hold it until I receive a 
reply on the two functional items below, but I will not hold it for the 
editorial items.

Thank you!
Andrew



-------------------
Functional
-------------------

Topic A: Section 3.5 (snippet below). What is the definition of "is failing" 
in this context? Could you give examples? Does it imply degradation while still 
operationally connected? Flapping? A PCE overload indication received via the 
NOTIFICATION object? It also seems to imply that the PCC is aware that the 
highest priority PCE, or the state-sync session between it and the local PCE, 
has failed; how? And how does the PCC actually signal a switch-over to the next 
highest priority PCE for the group, given that the group might span multiple 
LSP headends? The text mentions a local policy decision, but I don't follow the 
mechanics of how the PCC knows it needs to act (the definition of failing), 
especially if the issue is between PCE1 and PCE2, or what action (message) it 
sends. Also, according to the preceding text, the PCC has no need for priority 
awareness. I would appreciate clarity on this paragraph.


   If the highest priority PCE is failing or if the state-sync session between 
the local PCE and the highest priority PCE failed, the PCC MAY decide to 
instruct a switch-over to delegate the LSP to the next highest priority PCE or 
to take back control of the LSP. It is a local policy decision.



Topic B: Section 3.5 (snippet below). What is the reason to send the PCUpd on 
all state-sync sessions rather than only on the session from which the PCE 
received the delegation or sub-delegation? Is it to make the other PCEs aware of 
an in-flight make-before-break? What should a PCE on another state-sync session 
do with the PCUpd when it is not connected to the underlying PCC? Drop it? 
Remember it? My guess is: when the PCC receives the PCUpd, it acknowledges it 
with a PCRpt carrying the SRP-ID back to the PCE it received it from. That 
PCRpt with the SRP-ID is echoed to all PCEs, so there is some benefit in the 
other state-sync sessions knowing about the PCUpd. However, there appears to be 
a race condition risk here. Assume PCC1 delegated to PCE1 directly. The 
PCE1<->PCE2 session could be busy while the PCC1<->PCE1 and PCC1<->PCE2 sessions 
are quiet, so PCE2 could receive the PCRpt with the SRP-ID before it learns 
about the PCUpd. If this risk exists, what exactly is the value of PCE2 learning 
of the PCUpd?


When a PCE has the delegation for an LSP and needs to update this LSP, it MUST 
send a Path Compute Update (PCUpd) message to all state-sync sessions and to 
the PCC session on which it received the delegation.
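To make the suspected race concrete, here is a minimal, hypothetical sketch (not taken from the draft). It treats each session into PCE2 as an independent in-order channel with its own one-way delay; the delay values and the send times are arbitrary assumptions chosen only to illustrate the ordering question, and PCEP itself gives PCE2 no ordering guarantee across different sessions.

```python
# Hypothetical model of the Topic B race: each session into PCE2 is treated
# as an independent in-order channel with its own one-way delay (in ms).
PCUPD_SENT_MS = 0.0  # assumption: PCE1 sends PCUpd(SRP-ID) to PCC1 and to PCE2
PCRPT_SENT_MS = 1.0  # assumption: PCC1 answers with PCRpt(SRP-ID) to all its PCEs


def arrival_order_at_pce2(pce1_pce2_delay_ms, pcc1_pce2_delay_ms):
    """Return the message names in the order PCE2 receives them."""
    arrivals = [
        (PCUPD_SENT_MS + pce1_pce2_delay_ms, "PCUpd"),  # relayed over state-sync
        (PCRPT_SENT_MS + pcc1_pce2_delay_ms, "PCRpt"),  # sent directly by PCC1
    ]
    # Sort purely by arrival time; nothing orders the two sessions relative
    # to each other.
    return [msg for _, msg in sorted(arrivals)]


def pce2_knows_pcupd_first(order):
    """True if PCE2 could match the PCRpt's SRP-ID to an already-seen PCUpd."""
    return order.index("PCUpd") < order.index("PCRpt")


if __name__ == "__main__":
    quiet = arrival_order_at_pce2(pce1_pce2_delay_ms=0.5, pcc1_pce2_delay_ms=0.5)
    busy = arrival_order_at_pce2(pce1_pce2_delay_ms=50.0, pcc1_pce2_delay_ms=0.5)
    print("quiet state-sync session:", quiet, pce2_knows_pcupd_first(quiet))
    print("busy state-sync session: ", busy, pce2_knows_pcupd_first(busy))
```

With a quiet PCE1<->PCE2 session the PCUpd arrives first; with a busy one the PCRpt overtakes it, which is the case where the value of relaying the PCUpd is unclear to me.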





-------------------
Editorial
-------------------

- Section 1.2: What is "this" delay in this context? Is it the delay for the 
PCC to notify a PCE in general, regardless of a multi-PCE deployment? The delay 
for the PCC to notify PCE2 because it is busy notifying PCE1 and cannot do so 
concurrently? Or the delay for PCE1 and PCE2 to converge on the same, eventually 
consistent state? The text may benefit from clarifying which delay is of 
concern here. Considering the next paragraph, my understanding is that it is 
about state convergence, so perhaps use the proposed text instead:

    Original: "This delay may affect the reaction time of the other PCEs if 
they need to take action after being notified of the LSP parameter change."
    Proposed: "This convergence delay may hinder the reaction time of other 
PCEs that must take action after being notified of the LSP parameter change."


- Section 1.2: Minor nit to avoid confusion with the PCEP "Update" message.

    Original: "...PCC1 reporting the update of LSP1 to PCE2"
    Proposed: "...PCC1 reporting the state of LSP1 to PCE2"


- Section 1.3: It might be worth stating explicitly that LSP1 is on PCC1 and 
LSP2 is on PCC2.

    Original: "In the example in Figure 2, we consider that by configuration, 
both PCCs will first delegate their LSPs to PCE1. So, PCE1 is responsible for 
computing a path for both LSP1 and LSP2."
    Proposed: "In the example in Figure 2, PCC1 and PCC2 are configured to 
delegate their respective LSPs (LSP1 and LSP2) to PCE1. Therefore, PCE1 is 
responsible for computing a path for both LSPs."


- Section 1.3: Minor rewording:

    Original: "When the PCC2-PCE1 session is back online, PCC2 will keep using 
PCE2 as the active PCE (consider no preemption in this example)."
    Proposed: "Once the PCC2-PCE1 session is restored, PCC2 continues using 
PCE2 as the active PCE, assuming no preemption in this example."


- Section 1.3: I suspect the "unit" terminology may get flagged during IESG 
review; perhaps the text can be collapsed to:

    Original: "This situation is called a split-brain scenario, as there are 
multiple computation brains running at the same time, while a central 
computation unit is required in some deployments/use cases. Further, there are 
use cases where a particular LSP path computation is linked to another LSP path 
computation: the most common use case is path disjointness (see [RFC8800]) and 
Bidirectional LSPs (see [RFC9059]). The set of LSPs that are dependent on each 
other may start from different head-ends."

    Proposed: "This is called a split-brain scenario and occurs when multiple 
PCEs operate simultaneously in a deployment that requires a single centralized 
entity for computation. Such a lack of coordination is particularly problematic 
for interdependent LSPs, such as those requiring path disjointness [RFC8800] or 
bidirectional paths [RFC9059], where the computation of one path is strictly 
dependent upon the state of another, potentially originating from a different 
head-end."


- Section 2.1:

    Original: "can help in some scenarios where PCEP sessions are lost between 
PCCs and PCEs"
    Proposed: I think this sentence can be dropped. Section 1 already makes 
the argument for it and shows that the issue arises not only when sessions are 
lost but also when they are restored.

    Original: "PCE1 will be able to do state synchronization via PCRpt messages 
for its LSPs to PCE2 and PCE2 will do the same"
    Proposed: "PCE1 will synchronize its LSP state to PCE2 via PCRpt messages; 
PCE2 will similarly synchronize its state to PCE1."


- Section 2.2 says "as seen in Section 1" ... "may provide" ... "computation 
loops". However, the computation loops are in the appendix: Section 1 points to 
the examples in Appendix B, and B.6 is an example of a loop. It might be easier 
for a reader to have:

    Original: "As seen in Section 1,..."
    Proposed: "As seen in Section 1 and Appendix B (Scenarios), ..."


- Section 3.1.1: Capabilities in general. Take as an example the Path Setup 
Type capability exchange [RFC8408]. Do two PCEs need to support symmetric 
capabilities, such as path setup types, in order to perform state-sync? My 
understanding (and opinion) is no, and state-sync procedures should continue to 
follow the rules associated with each specific capability. One could read this 
between the lines (the PCE behaves like a PCC in Section 3.2), but it might be 
best to be explicit:

    Proposed: "State synchronization procedures are independent of support 
for other PCEP capabilities, for example, Path Setup Type [RFC8408]. While two 
PCEs MUST support the state-sync extensions to perform synchronization, they do 
not need to support a symmetric set of additional capabilities. In cases of 
asymmetry, state-sync SHOULD continue; however, the PCE MUST handle information 
and behavior according to the specific rules defined for those capabilities 
(e.g., ignoring paths with an unsupported Path Setup Type or reporting an error 
as defined in the relevant capability's specification)."


- Section 3.7: The use of "could" and "would" in the text could be tightened:

    Original: "It is possible that a PCE does not have a PCEP session with the 
headend to initiate an LSP as per [RFC8281]. A PCE could send the Path Compute 
Initiate (PCInitiate) message on the state-sync sessions to another PCE to 
request it to create a PCE-Initiated LSP on its behalf. If the PCE is able to 
initiate the LSP it would report it on the state-sync session via PCRpt 
message. If the PCE does not have a session to the headend, it MUST send a 
PCErr message with Error-type=24 (PCE instantiation error) and Error-value=TBD5 
(No PCEP session with the headend). PCE could try to initiate via another 
state-sync PCE if available."

    Proposed: "A PCE may not have a direct PCEP session to a PCC to initiate an 
LSP as per [RFC8281]. A PCE MAY send the Path Compute Initiate (PCInitiate) 
message on the state-sync sessions to another PCE to request it to create a 
PCE-initiated LSP on its behalf. If the PCE is able to initiate the LSP, it 
reports it on the state-sync session via a PCRpt message."



- Section 8.2: A few occurrences of the term differ from the rest of the 
document:
    Original: state sync
    Proposed: state-sync
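For reference while tightening Section 3.7, here is a minimal sketch of the behavior as described in the Original text above: a PCE receiving a PCInitiate on a state-sync session either initiates the LSP and reports it via PCRpt, or returns a PCErr with Error-type=24 (PCE instantiation error) and Error-value=TBD5, and the requesting PCE may try another state-sync PCE. The class, field, and function names are hypothetical, TBD5 is kept as a placeholder since it is not yet assigned, and the actual initiation toward the headend is not modelled.

```python
from dataclasses import dataclass, field

PCE_INSTANTIATION_ERROR = 24      # Error-type=24 (PCE instantiation error)
NO_SESSION_WITH_HEADEND = "TBD5"  # Error-value not yet assigned; placeholder


@dataclass
class Pce:
    """Hypothetical PCE with the set of headends it has PCEP sessions to."""
    name: str
    headends: set = field(default_factory=set)

    def handle_state_sync_initiate(self, headend, lsp_name):
        """Handle a PCInitiate received on a state-sync session."""
        if headend not in self.headends:
            # No PCEP session to the headend: reply with PCErr.
            return ("PCErr", PCE_INSTANTIATION_ERROR, NO_SESSION_WITH_HEADEND)
        # Otherwise initiate the LSP toward the headend (not modelled here)
        # and report it back on the state-sync session.
        return ("PCRpt", headend, lsp_name)


def initiate_via_peers(peers, headend, lsp_name):
    """Try each state-sync peer in turn until one reports the LSP."""
    for peer in peers:
        reply = peer.handle_state_sync_initiate(headend, lsp_name)
        if reply[0] == "PCRpt":
            return peer.name, reply
    return None, None  # no state-sync peer could reach the headend


if __name__ == "__main__":
    pce2 = Pce("PCE2", {"PCC2"})  # no session to PCC1
    pce3 = Pce("PCE3", {"PCC1"})
    print(initiate_via_peers([pce2, pce3], "PCC1", "LSP1"))
```

In this sketch PCE2 answers with PCErr and PCE3 reports the LSP, which is the fallback the last sentence of the Original text ("PCE could try to initiate via another state-sync PCE") allows.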




From: Dhruv Dhody <[email protected]>
Date: Thursday, October 30, 2025 at 11:48 AM
To: [email protected] <[email protected]>
Cc: [email protected] <[email protected]>
Subject: [Pce] Shepherd review of draft-ietf-pce-state-sync





Hi Authors,

I have done the Shepherd review of the I-D. Once the update is posted, I will 
send this to the AD.

# Shepherd review of draft-ietf-pce-state-sync

## Minor

- Section 3.3, "When a PCE receives a new PCRpt from a PCC without the 
LSP-DB-VERSION, it SHOULD NOT forward the PCRpt on any state-sync sessions and 
SHOULD log such an event on the first occurrence", when can the SHOULD NOT be 
ignored? Otherwise make it a MUST NOT.

- Related to the above, there are many SHOULDs in the draft; check them against 
the IESG statement 
https://datatracker.ietf.org/doc/statement-iesg-statement-on-clarifying-the-use-of-bcp-14-key-words/

- Section 3.5, "The computation priority is a number...": it is important to 
go into a little more detail, e.g., an unsigned integer in the range 0-7 (same 
as delegation-pref in the PCEP YANG model), with 7 reflecting the highest 
preference. Update the examples to keep the priority in this range.

- Section 3.5, "the highest IP address has more priority": we need to handle 
comparing an IPv4 address with an IPv6 address as well. Perhaps say that, when 
comparing, an IPv4 address MUST first be converted to its IPv4-mapped IPv6 form 
[RFC4291] before comparison.

- Section 3.5, "...the operator MAY decide to instruct a switch-over to 
delegate the LSP to the next highest priority PCE or to take back control of 
the LSP. It is a local policy decision", is it the operator or the PCC?

- I suggest explicitly clarifying that the full mesh of PCEP sessions between 
PCEs is acceptable from a scalability point of view, perhaps in Section 8.6. 
Something like: 'The “full mesh” requirement applies only among PCEs that 
participate in inter-PCE state synchronization for the same set of PCCs or 
associations. In operational deployments, this typically involves a small 
number of PCEs (e.g., two or three for redundancy), making a full mesh feasible 
for deterministic state consistency and loop prevention.'
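To illustrate the two Section 3.5 comments above (priority range 0-7 with 7 highest, ties broken by the highest IP address after mapping IPv4 into IPv6), here is a minimal sketch using Python's standard `ipaddress` module. The `(priority, address)` pair structure and the function names are hypothetical. Note that `ipaddress` refuses to order an `IPv4Address` against an `IPv6Address` (it raises `TypeError`), which mirrors the gap the comment points out; converting IPv4 to `::ffff:a.b.c.d` [RFC4291] first makes every address comparable.

```python
import ipaddress

MIN_PRIORITY, MAX_PRIORITY = 0, 7  # suggested range; 7 is the highest preference


def comparable_address(addr):
    """Return an IPv6Address; IPv4 becomes its IPv4-mapped form ::ffff:a.b.c.d."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 4:
        return ipaddress.IPv6Address("::ffff:" + str(ip))
    return ip


def select_pce(candidates):
    """Pick the PCE with the highest priority; ties go to the highest address.

    candidates: iterable of (priority, address) pairs (hypothetical structure).
    """
    candidates = list(candidates)
    for priority, _ in candidates:
        if not MIN_PRIORITY <= priority <= MAX_PRIORITY:
            raise ValueError(
                f"priority {priority} outside {MIN_PRIORITY}-{MAX_PRIORITY}")
    # Tuples compare priority first, then the (now uniformly IPv6) address.
    return max(candidates, key=lambda c: (c[0], comparable_address(c[1])))


if __name__ == "__main__":
    print(select_pce([(5, "192.0.2.1"), (7, "192.0.2.2")]))
    print(select_pce([(7, "192.0.2.9"), (7, "2001:db8::1")]))
```

One consequence worth checking with the authors: since IPv4-mapped addresses sit in `::ffff:0:0/96`, any numerically larger native IPv6 address wins a tie against an IPv4 one, as in the second call above.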


## Nits

- s/updates all PCEs/updates to all PCEs/

- Add reference on first mention of association groups

Thanks!
Dhruv