| draft-ietf-tsvwg-ecn-tunnel-04.txt | draft-ietf-tsvwg-ecn-tunnel-06.txt | |||
|---|---|---|---|---|
| Transport Area Working Group B. Briscoe | Transport Area Working Group B. Briscoe | |||
| Internet-Draft BT | Internet-Draft BT | |||
| Updates: 3168, 4301 October 24, 2009 | Updates: 3168, 4301 December 20, 2009 | |||
| (if approved) | (if approved) | |||
| Intended status: Standards Track | Intended status: Standards Track | |||
| Expires: April 27, 2010 | Expires: June 23, 2010 | |||
| Tunnelling of Explicit Congestion Notification | Tunnelling of Explicit Congestion Notification | |||
| draft-ietf-tsvwg-ecn-tunnel-04 | draft-ietf-tsvwg-ecn-tunnel-06 | |||
| | Abstract | |||
| | This document redefines how the explicit congestion notification | |||
| | (ECN) field of the IP header should be constructed on entry to and | |||
| | exit from any IP in IP tunnel. On encapsulation it updates RFC3168 | |||
| | to bring all IP in IP tunnels (v4 or v6) into line with RFC4301 IPsec | |||
| | ECN processing. On decapsulation it updates both RFC3168 and RFC4301 | |||
| | to add new behaviours for previously unused combinations of inner and | |||
| | outer header. The new rules ensure the ECN field is correctly | |||
| | propagated across a tunnel whether it is used to signal one or two | |||
| | severity levels of congestion, whereas before only one severity level | |||
| | was supported. Tunnel endpoints can be updated in any order without | |||
| | affecting pre-existing uses of the ECN field (backward compatible). | |||
| | Nonetheless, operators wanting to support two severity levels (e.g. | |||
| | for pre-congestion notification--PCN) can require compliance with | |||
| | this new specification. A thorough analysis of the reasoning for | |||
| | these changes and the implications is included. In the unlikely | |||
| | event that the new rules do not meet a specific need, RFC4774 gives | |||
| | guidance on designing alternate ECN semantics and this document | |||
| | extends that to include tunnelling issues. | |||
| Status of This Memo | Status of This Memo | |||
| This Internet-Draft is submitted to IETF in full conformance with the | This Internet-Draft is submitted to IETF in full conformance with the | |||
| provisions of BCP 78 and BCP 79. | provisions of BCP 78 and BCP 79. | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF), its areas, and its working groups. Note that | Task Force (IETF), its areas, and its working groups. Note that | |||
| other groups may also distribute working documents as Internet- | other groups may also distribute working documents as Internet- | |||
| Drafts. | Drafts. | |||
| skipping to change at page 1, line 34 | skipping to change at page 2, line 9 | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
| http://www.ietf.org/ietf/1id-abstracts.txt. | http://www.ietf.org/ietf/1id-abstracts.txt. | |||
| The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
| http://www.ietf.org/shadow.html. | http://www.ietf.org/shadow.html. | |||
| This Internet-Draft will expire on April 27, 2010. | This Internet-Draft will expire on June 23, 2010. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2009 IETF Trust and the persons identified as the | Copyright (c) 2009 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents in effect on the date of | Provisions Relating to IETF Documents | |||
| publication of this document (http://trustee.ietf.org/license-info). | (http://trustee.ietf.org/license-info) in effect on the date of | |||
| Please review these documents carefully, as they describe your rights | publication of this document. Please review these documents | |||
| and restrictions with respect to this document. | carefully, as they describe your rights and restrictions with respect | |||
| | to this document. Code Components extracted from this document must | |||
| Abstract | include Simplified BSD License text as described in Section 4.e of | |||
| | the Trust Legal Provisions and are provided without warranty as | |||
| This document redefines how the explicit congestion notification | described in the BSD License. | |||
| (ECN) field of the IP header should be constructed on entry to and | ||||
| exit from any IP in IP tunnel. On encapsulation it updates RFC3168 | ||||
| to bring all IP in IP tunnels (v4 or v6) into line with RFC4301 IPsec | ||||
| ECN processing. On decapsulation it updates both RFC3168 and RFC4301 | ||||
| to add new behaviours for previously unused combinations of inner and | ||||
| outer header. The new rules propagate the ECN field whether it is | ||||
| used to signal one or two severity levels of congestion, whereas | ||||
| before they propagated only one. Tunnel endpoints can be updated in | ||||
| any order without affecting pre-existing uses of the ECN field | ||||
| (backward compatible). Nonetheless, operators wanting to support two | ||||
| severity levels (e.g. for pre-congestion notification--PCN) can | ||||
| require compliance with this new specification. A thorough analysis | ||||
| of the reasoning for these changes and the implications is included. | ||||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 8 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 9 | |||
| 1.1. Scope . . . . . . . . . . . . . . . . . . . . . . . . . . 9 | 1.1. Scope . . . . . . . . . . . . . . . . . . . . . . . . . . 10 | |||
| 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 10 | 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 11 | |||
| 3. Summary of Pre-Existing RFCs . . . . . . . . . . . . . . . . . 11 | 3. Summary of Pre-Existing RFCs . . . . . . . . . . . . . . . . . 12 | |||
| 3.1. Encapsulation at Tunnel Ingress . . . . . . . . . . . . . 11 | 3.1. Encapsulation at Tunnel Ingress . . . . . . . . . . . . . 12 | |||
| 3.2. Decapsulation at Tunnel Egress . . . . . . . . . . . . . . 12 | 3.2. Decapsulation at Tunnel Egress . . . . . . . . . . . . . . 13 | |||
| 4. New ECN Tunnelling Rules . . . . . . . . . . . . . . . . . . . 13 | 4. New ECN Tunnelling Rules . . . . . . . . . . . . . . . . . . . 14 | |||
| 4.1. Default Tunnel Ingress Behaviour . . . . . . . . . . . . . 14 | 4.1. Default Tunnel Ingress Behaviour . . . . . . . . . . . . . 14 | |||
| 4.2. Default Tunnel Egress Behaviour . . . . . . . . . . . . . 14 | 4.2. Default Tunnel Egress Behaviour . . . . . . . . . . . . . 15 | |||
| 4.3. Encapsulation Modes . . . . . . . . . . . . . . . . . . . 16 | 4.3. Encapsulation Modes . . . . . . . . . . . . . . . . . . . 17 | |||
| 4.4. Single Mode of Decapsulation . . . . . . . . . . . . . . . 18 | 4.4. Single Mode of Decapsulation . . . . . . . . . . . . . . . 18 | |||
| 5. Updates to Earlier RFCs . . . . . . . . . . . . . . . . . . . 18 | 5. Updates to Earlier RFCs . . . . . . . . . . . . . . . . . . . 19 | |||
| 5.1. Changes to RFC4301 ECN processing . . . . . . . . . . . . 18 | 5.1. Changes to RFC4301 ECN processing . . . . . . . . . . . . 19 | |||
| 5.2. Changes to RFC3168 ECN processing . . . . . . . . . . . . 19 | 5.2. Changes to RFC3168 ECN processing . . . . . . . . . . . . 20 | |||
| 5.3. Motivation for Changes . . . . . . . . . . . . . . . . . . 20 | 5.3. Motivation for Changes . . . . . . . . . . . . . . . . . . 20 | |||
| 5.3.1. Motivation for Changing Encapsulation . . . . . . . . 20 | 5.3.1. Motivation for Changing Encapsulation . . . . . . . . 21 | |||
| 5.3.2. Motivation for Changing Decapsulation . . . . . . . . 21 | 5.3.2. Motivation for Changing Decapsulation . . . . . . . . 22 | |||
| 6. Backward Compatibility . . . . . . . . . . . . . . . . . . . . 23 | 6. Backward Compatibility . . . . . . . . . . . . . . . . . . . . 24 | |||
| 6.1. Non-Issues Updating Decapsulation . . . . . . . . . . . . 23 | 6.1. Non-Issues Updating Decapsulation . . . . . . . . . . . . 24 | |||
| 6.2. Non-Update of RFC4301 IPsec Encapsulation . . . . . . . . 24 | 6.2. Non-Update of RFC4301 IPsec Encapsulation . . . . . . . . 25 | |||
| 6.3. Update to RFC3168 Encapsulation . . . . . . . . . . . . . 24 | 6.3. Update to RFC3168 Encapsulation . . . . . . . . . . . . . 25 | |||
| 7. Design Principles for Future Non-Default Schemes . . . . . . . 25 | 7. Design Principles for Alternate ECN Tunnelling Semantics . . . 26 | |||
| 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 26 | 8. Security Considerations . . . . . . . . . . . . . . . . . . . 28 | |||
| 9. Security Considerations . . . . . . . . . . . . . . . . . . . 26 | 9. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 29 | |||
| 10. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 28 | 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 30 | |||
| 11. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 28 | 11. References . . . . . . . . . . . . . . . . . . . . . . . . . . 30 | |||
| 12. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 29 | 11.1. Normative References . . . . . . . . . . . . . . . . . . . 30 | |||
| 13. References . . . . . . . . . . . . . . . . . . . . . . . . . . 29 | 11.2. Informative References . . . . . . . . . . . . . . . . . . 31 | |||
| 13.1. Normative References . . . . . . . . . . . . . . . . . . . 29 | Appendix A. Early ECN Tunnelling RFCs . . . . . . . . . . . . . . 33 | |||
| 13.2. Informative References . . . . . . . . . . . . . . . . . . 29 | Appendix B. Design Constraints . . . . . . . . . . . . . . . . . 33 | |||
| Appendix A. Early ECN Tunnelling RFCs . . . . . . . . . . . . . . 31 | B.1. Security Constraints . . . . . . . . . . . . . . . . . . . 33 | |||
| Appendix B. Design Constraints . . . . . . . . . . . . . . . . . 32 | B.2. Control Constraints . . . . . . . . . . . . . . . . . . . 35 | |||
| B.1. Security Constraints . . . . . . . . . . . . . . . . . . . 32 | B.3. Management Constraints . . . . . . . . . . . . . . . . . . 36 | |||
| B.2. Control Constraints . . . . . . . . . . . . . . . . . . . 34 | Appendix C. Contribution to Congestion across a Tunnel . . . . . 37 | |||
| B.3. Management Constraints . . . . . . . . . . . . . . . . . . 35 | Appendix D. Why Losing ECT(1) on Decapsulation Impedes PCN . . . 38 | |||
| Appendix C. Contribution to Congestion across a Tunnel . . . . . 36 | Appendix E. Why Resetting ECN on Encapsulation Impedes PCN . . . 39 | |||
| Appendix D. Why Losing ECT(1) on Decapsulation Impedes PCN . . . 37 | ||||
| Appendix E. Why Resetting ECN on Encapsulation Impedes PCN . . . 38 | ||||
| Appendix F. Compromise on Decap with ECT(1) Inner and ECT(0) | Appendix F. Compromise on Decap with ECT(1) Inner and ECT(0) | |||
| Outer . . . . . . . . . . . . . . . . . . . . . . . . 39 | Outer . . . . . . . . . . . . . . . . . . . . . . . . 40 | |||
| Appendix G. Open Issues . . . . . . . . . . . . . . . . . . . . . 40 | Appendix G. Open Issues . . . . . . . . . . . . . . . . . . . . . 41 | |||
| Request to the RFC Editor (to be removed on publication): | Request to the RFC Editor (to be removed on publication): | |||
| In the RFC index, RFC3168 should be identified as an update to | In the RFC index, RFC3168 should be identified as an update to | |||
| RFC2003. RFC4301 should be identified as an update to RFC3168. | RFC2003. RFC4301 should be identified as an update to RFC3168. | |||
| Changes from previous drafts (to be removed by the RFC Editor) | Changes from previous drafts (to be removed by the RFC Editor) | |||
| Full text differences between IETF draft versions are available at | Full text differences between IETF draft versions are available at | |||
| <http://tools.ietf.org/wg/tsvwg/draft-ietf-tsvwg-ecn-tunnel/>, and | <http://tools.ietf.org/wg/tsvwg/draft-ietf-tsvwg-ecn-tunnel/>, and | |||
| between earlier individual draft versions at | between earlier individual draft versions at | |||
| <http://www.briscoe.net/pubs.html#ecn-tunnel> | <http://www.briscoe.net/pubs.html#ecn-tunnel> | |||
| From ietf-03 to ietf-04 (current): | From ietf-05 to ietf-06 (current): | |||
| | * Minor textual clarifications and corrections. | |||
| | From ietf-04 to ietf-05: | |||
| | * Functional changes: | |||
| | + Section 4.2: ECT(1) outer with Not-ECT inner: reverted to | |||
| | forwarding as Not-ECT (as in RFC3168 & RFC4301), rather than | |||
| | dropping. | |||
| | + Altered rationale in bullet 3 of Section 5.3.2 to justify | |||
| | this. | |||
| | + Distinguished alarms for dangerous and invalid combinations | |||
| | and allowed combinations that are valid in some tunnel | |||
| | configurations but dangerous in others to be alarmed at the | |||
| | discretion of the implementer and/or operator. | |||
| | + Altered advice on designing alternate ECN tunnelling | |||
| | semantics to reflect the above changes. | |||
| | * Textual changes: | |||
| | + Changed "Future non-default schemes" to "Alternate ECN | |||
| | Tunnelling Semantics" throughout. | |||
| | + Cut down Appendix D and Appendix E for brevity. | |||
| | + A number of clarifying edits & updated refs. | |||
| | From ietf-03 to ietf-04: | |||
| * Functional changes: none | * Functional changes: none | |||
| * Structural changes: | * Structural changes: | |||
| + Added "Open Issues" appendix | + Added "Open Issues" appendix | |||
| * Textual changes: | * Textual changes: | |||
| + Section title: "Changes from Earlier RFCs" -> "Updates to | + Section title: "Changes from Earlier RFCs" -> "Updates to | |||
| skipping to change at page 7, line 47 | skipping to change at page 8, line 38 | |||
| Roadmap), added new Introductory subsection on "Scope" and | Roadmap), added new Introductory subsection on "Scope" and | |||
| improved clarity; | improved clarity; | |||
| * Added Design Guidelines for New Encapsulations of Congestion | * Added Design Guidelines for New Encapsulations of Congestion | |||
| Notification; | Notification; | |||
| * Considerably clarified the Backward Compatibility section | * Considerably clarified the Backward Compatibility section | |||
| (Section 6); | (Section 6); | |||
| * Considerably extended the Security Considerations section | * Considerably extended the Security Considerations section | |||
| (Section 9); | (Section 8); | |||
| * Summarised the primary rationale much better in the | * Summarised the primary rationale much better in the | |||
| conclusions; | conclusions; | |||
| * Added numerous extra acknowledgements; | * Added numerous extra acknowledgements; | |||
| * Added Appendix E. "Why resetting CE on encapsulation harms | * Added Appendix E. "Why resetting CE on encapsulation harms | |||
| PCN", Appendix C. "Contribution to Congestion across a Tunnel" | PCN", Appendix C. "Contribution to Congestion across a Tunnel" | |||
| and Appendix D. "Ideal Decapsulation Rules"; | and Appendix D. "Ideal Decapsulation Rules"; | |||
| skipping to change at page 8, line 52 | skipping to change at page 9, line 41 | |||
| When ECN and its tunnelling was defined in RFC3168, only the minimum | When ECN and its tunnelling was defined in RFC3168, only the minimum | |||
| necessary changes to the ECN field were propagated through tunnel | necessary changes to the ECN field were propagated through tunnel | |||
| endpoints--just enough for the basic ECN mechanism to work. This was | endpoints--just enough for the basic ECN mechanism to work. This was | |||
| due to concerns that the ECN field might be toggled to communicate | due to concerns that the ECN field might be toggled to communicate | |||
| between a secure site and someone on the public Internet--a covert | between a secure site and someone on the public Internet--a covert | |||
| channel. This was because a mutable field like ECN cannot be | channel. This was because a mutable field like ECN cannot be | |||
| protected by IPsec's integrity mechanisms--it has to be able to | protected by IPsec's integrity mechanisms--it has to be able to | |||
| change as it traverses the Internet. | change as it traverses the Internet. | |||
| Nonetheless, the latest IPsec architecture [RFC4301] considers a | Nonetheless, the latest IPsec architecture [RFC4301] considered a | |||
| bandwidth limit of 2 bits per packet on a covert channel makes it a | bandwidth limit of 2 bits per packet on a covert channel made it a | |||
| manageable risk. Therefore, for simplicity, an RFC4301 ingress | manageable risk. Therefore, for simplicity, an RFC4301 ingress | |||
| copies the whole ECN field to encapsulate a packet. It also | copied the whole ECN field to encapsulate a packet. It also | |||
| dispenses with the two modes of RFC3168, one which partially copied | dispensed with the two modes of RFC3168, one which partially copied | |||
| the ECN field, and the other which blocked all propagation of ECN | the ECN field, and the other which blocked all propagation of ECN | |||
| changes. | changes. | |||
| Unfortunately, this entirely reasonable sequence of standards actions | Unfortunately, this entirely reasonable sequence of standards actions | |||
| resulted in a perverse outcome; non-IPsec tunnels (RFC3168) blocked | resulted in a perverse outcome; non-IPsec tunnels (RFC3168) blocked | |||
| the 2-bit covert channel, while IPsec tunnels (RFC4301) did not--at | the 2-bit covert channel, while IPsec tunnels (RFC4301) did not--at | |||
| least not at the ingress. At the egress, both IPsec and non-IPsec | least not at the ingress. At the egress, both IPsec and non-IPsec | |||
| tunnels still partially restricted propagation of the full ECN field. | tunnels still partially restricted propagation of the full ECN field. | |||
| The trigger for the changes in this document was the introduction of | The trigger for the changes in this document was the introduction of | |||
| pre-congestion notification (PCN [I-D.ietf-pcn-marking-behaviour]) to | pre-congestion notification (PCN [RFC5670]) to the IETF standards | |||
| the IETF standards track. PCN needs the ECN field to be copied at a | track. PCN needs the ECN field to be copied at a tunnel ingress and | |||
| tunnel ingress and it needs four states of congestion signalling to | it needs four states of congestion signalling to be propagated at the | |||
| be propagated at the egress, but pre-existing tunnels only propagate | egress, but pre-existing tunnels only propagate three in the ECN | |||
| three in the ECN field. | field. | |||
| This document draws on currently unused (CU) combinations of inner | This document draws on currently unused (CU) combinations of inner | |||
| and outer headers to add tunnelling of four-state congestion | and outer headers to add tunnelling of four-state congestion | |||
| signalling to RFC3168 and RFC4301. Operators of tunnels who | signalling to RFC3168 and RFC4301. Operators of tunnels who | |||
| specifically want to support four states can require that all their | specifically want to support four states can require that all their | |||
| tunnels comply with this specification. Nonetheless, all tunnel | tunnels comply with this specification. Nonetheless, all tunnel | |||
| endpoint implementations (RFC4301, RFC3168, RFC2481, RFC2401, | endpoint implementations (RFC4301, RFC3168, RFC2481, RFC2401, | |||
| RFC2003) can safely be updated to this new specification as part of | RFC2003) can safely be updated to this new specification as part of | |||
| general code maintenance. This will gradually add support for four | general code maintenance. This will gradually add support for four | |||
| congestion states to the Internet. Existing three state schemes will | congestion states to the Internet. Existing three state schemes will | |||
| skipping to change at page 11, line 28 | skipping to change at page 12, line 18 | |||
| Resetting ECN: On encapsulation, setting the ECN field of the new | Resetting ECN: On encapsulation, setting the ECN field of the new | |||
| outer header to be a copy of the ECN field in the incoming header | outer header to be a copy of the ECN field in the incoming header | |||
| except the outer ECN field is set to the ECT(0) codepoint if the | except the outer ECN field is set to the ECT(0) codepoint if the | |||
| incoming ECN field is CE ("11"). | incoming ECN field is CE ("11"). | |||
| 3. Summary of Pre-Existing RFCs | 3. Summary of Pre-Existing RFCs | |||
| This section is informative not normative, as it recaps pre-existing | This section is informative not normative, as it recaps pre-existing | |||
| RFCs. Earlier relevant RFCs that were either experimental or | RFCs. Earlier relevant RFCs that were either experimental or | |||
| incomplete with respect to ECN tunnelling (RFC2481, RFC2401 and | incomplete with respect to ECN tunnelling (RFC2481, RFC2401 and | |||
| RFC2003) are briefly outlined inAppendix A. The question of whether | RFC2003) are briefly outlined in Appendix A. The question of whether | |||
| tunnel implementations used in the Internet comply with any of these | tunnel implementations used in the Internet comply with any of these | |||
| RFCs is not discussed. | RFCs is not discussed. | |||
| 3.1. Encapsulation at Tunnel Ingress | 3.1. Encapsulation at Tunnel Ingress | |||
| At the encapsulator, the controversy has been over whether to | At the encapsulator, the controversy has been over whether to | |||
| propagate information about congestion experienced on the path so far | propagate information about congestion experienced on the path so far | |||
| into the outer header of the tunnel. | into the outer header of the tunnel. | |||
| Specifically, RFC3168 says that, if a tunnel fully supports ECN | Specifically, RFC3168 says that, if a tunnel fully supports ECN | |||
| skipping to change at page 13, line 45 | skipping to change at page 14, line 21 | |||
| Inappropriate changes were not specifically enumerated. RFC4301 did | Inappropriate changes were not specifically enumerated. RFC4301 did | |||
| not mention inappropriate ECN changes. | not mention inappropriate ECN changes. | |||
| 4. New ECN Tunnelling Rules | 4. New ECN Tunnelling Rules | |||
| The standards actions below in Section 4.1 (ingress encapsulation) | The standards actions below in Section 4.1 (ingress encapsulation) | |||
| and Section 4.2 (egress decapsulation) define new default ECN tunnel | and Section 4.2 (egress decapsulation) define new default ECN tunnel | |||
| processing rules for any IP packet (v4 or v6) with any Diffserv | processing rules for any IP packet (v4 or v6) with any Diffserv | |||
| codepoint. | codepoint. | |||
| If absolutely necessary, an alternate congestion encapsulation | If these defaults do not meet a particular requirement, an alternate | |||
| behaviour can be introduced as part of the definition of an alternate | ECN tunnelling scheme can be introduced as part of the definition of | |||
| congestion marking scheme used by a specific Diffserv PHB (see S.5 of | an alternate congestion marking scheme used by a specific Diffserv | |||
| [RFC3168] and [RFC4774]). When designing such new encapsulation | PHB (see S.5 of [RFC3168] and [RFC4774]). When designing such | |||
| schemes, the principles in Section 7 should be followed. However, | alternate ECN tunnelling schemes, the principles in Section 7 should | |||
| alternate ECN tunnelling schemes are NOT RECOMMENDED as the | be followed. However, alternate ECN tunnelling schemes are NOT | |||
| deployment burden of handling exceptional PHBs in implementations of | RECOMMENDED as the deployment burden of handling exceptional PHBs in | |||
| all affected tunnels should not be underestimated. There is no | implementations of all affected tunnels should not be underestimated. | |||
| requirement for a PHB definition to state anything about ECN | There is no requirement for a PHB definition to state anything about | |||
| tunnelling behaviour if the default behaviour in the present | ECN tunnelling behaviour if the default behaviour in the present | |||
| specification is sufficient. | specification is sufficient. | |||
| 4.1. Default Tunnel Ingress Behaviour | 4.1. Default Tunnel Ingress Behaviour | |||
| Two modes of encapsulation are defined here; `normal mode' and | Two modes of encapsulation are defined here; `normal mode' and | |||
| `compatibility mode', which is for backward compatibility with tunnel | `compatibility mode', which is for backward compatibility with tunnel | |||
| decapsulators that do not understand ECN. Section 4.3 explains why | decapsulators that do not understand ECN. Section 4.3 explains why | |||
| two modes are necessary and specifies the circumstances in which it | two modes are necessary and specifies the circumstances in which it | |||
| is sufficient to solely implement normal mode. Note that these are | is sufficient to solely implement normal mode. Note that these are | |||
| modes of the ingress tunnel endpoint only, not the whole tunnel. | modes of the ingress tunnel endpoint only, not the whole tunnel. | |||
| Whatever the mode, an encapsulator forwards the inner header without | Whatever the mode, an encapsulator forwards the inner header without | |||
| changing the ECN field. | changing the ECN field. | |||
| In normal mode an encapsulator compliant with this specification MUST | In normal mode an encapsulator compliant with this specification MUST | |||
| construct the outer encapsulating IP header by copying the 2-bit ECN | construct the outer encapsulating IP header by copying the 2-bit ECN | |||
| field of the incoming IP header. In compatibility mode it clears the | field of the incoming IP header. In compatibility mode it clears the | |||
| ECN field in the outer header to the Not-ECT codepoint. These rules | ECN field in the outer header to the Not-ECT codepoint (the IPv4 | |||
| are tabulated for convenience in Figure 3. | header checksum also changes whenever the ECN field is changed). | |||
| | These rules are tabulated for convenience in Figure 3. | |||
| +-----------------+-------------------------------+ | +-----------------+-------------------------------+ | |||
| | Incoming Header | Outgoing Outer Header | | | Incoming Header | Outgoing Outer Header | | |||
| | (also equal to +---------------+---------------+ | | (also equal to +---------------+---------------+ | |||
| | Outgoing Inner | Compatibility | Normal | | | Outgoing Inner | Compatibility | Normal | | |||
| | Header) | Mode | Mode | | | Header) | Mode | Mode | | |||
| +-----------------+---------------+---------------+ | +-----------------+---------------+---------------+ | |||
| | Not-ECT | Not-ECT | Not-ECT | | | Not-ECT | Not-ECT | Not-ECT | | |||
| | ECT(0) | Not-ECT | ECT(0) | | | ECT(0) | Not-ECT | ECT(0) | | |||
| | ECT(1) | Not-ECT | ECT(1) | | | ECT(1) | Not-ECT | ECT(1) | | |||
| | CE | Not-ECT | CE | | | CE | Not-ECT | CE | | |||
| skipping to change at page 15, line 11 | skipping to change at page 15, line 38 | |||
| intersection of the appropriate incoming inner header (row) and outer | intersection of the appropriate incoming inner header (row) and outer | |||
| header (column) in Figure 4 (the IPv4 header checksum also changes | header (column) in Figure 4 (the IPv4 header checksum also changes | |||
| whenever the ECN field is changed). There is no need for more than | whenever the ECN field is changed). There is no need for more than | |||
| one mode of decapsulation, as these rules cater for all known | one mode of decapsulation, as these rules cater for all known | |||
| requirements. | requirements. | |||
| +---------+------------------------------------------------+ | +---------+------------------------------------------------+ | |||
| |Incoming | Incoming Outer Header | | |Incoming | Incoming Outer Header | | |||
| | Inner +---------+------------+------------+------------+ | | Inner +---------+------------+------------+------------+ | |||
| | Header | Not-ECT | ECT(0) | ECT(1) | CE | | | Header | Not-ECT | ECT(0) | ECT(1) | CE | | |||
| +---------+---------+------------+------------+------------+ | +---------+---------+------------+------------+------------+ | |||
| | Not-ECT | Not-ECT |Not-ECT(!!!)| drop(!!!)| drop(!!!)| | | Not-ECT | Not-ECT |Not-ECT(!!!)|Not-ECT(!!!)| drop(!!!)| | |||
| | ECT(0) | ECT(0) | ECT(0) | ECT(1)(!!!)| CE | | | ECT(0) | ECT(0) | ECT(0) | ECT(1) | CE | | |||
| | ECT(1) | ECT(1) | ECT(1)(!!!)| ECT(1) | CE | | | ECT(1) | ECT(1) | ECT(1) (!) | ECT(1) | CE | | |||
| | CE | CE | CE | CE(!!!)| CE | | | CE | CE | CE | CE(!!!)| CE | | |||
| +---------+---------+------------+------------+------------+ | +---------+---------+------------+------------+------------+ | |||
| | Outgoing Header | | | Outgoing Header | | |||
| +------------------------------------------------+ | +------------------------------------------------+ | |||
| Unexpected combinations are indicated by '(!!!)' | Currently unused combinations are indicated by '(!!!)' or '(!)' | |||
| Figure 4: New IP in IP Decapsulation Behaviour | Figure 4: New IP in IP Decapsulation Behaviour | |||
| This table for decapsulation behaviour is derived from the following | This table for decapsulation behaviour is derived from the following | |||
| logic: | logic: | |||
| o If the inner ECN field is Not-ECT the decapsulator MUST NOT | o If the inner ECN field is Not-ECT the decapsulator MUST NOT | |||
| propagate any other ECN codepoint onwards. This is because the | propagate any other ECN codepoint onwards. This is because the | |||
| inner Not-ECT marking is set by transports that use drop as an | inner Not-ECT marking is set by transports that use drop as an | |||
| indication of congestion and would not understand or respond to | indication of congestion and would not understand or respond to | |||
| any other ECN codepoint [RFC4774]. In addition: | any other ECN codepoint [RFC4774]. In addition: | |||
| * If the inner ECN field is Not-ECT and the outer ECN field is | * If the inner ECN field is Not-ECT and the outer ECN field is CE | |||
| ECT(1) or CE the decapsulator MUST drop the packet. | the decapsulator MUST drop the packet. | |||
| * If the inner ECN field is Not-ECT and the outer ECN field is | * If the inner ECN field is Not-ECT and the outer ECN field is | |||
| ECT(0) or Not-ECT the decapsulator MUST forward the outgoing | Not-ECT, ECT(0) or ECT(1) the decapsulator MUST forward the | |||
| packet with the ECN field cleared to Not-ECT. | outgoing packet with the ECN field cleared to Not-ECT. | |||
| * This specification mandates that any future standards action | ||||
| SHOULD NOT use the ECT(0) codepoint as an indication of | ||||
| congestion, without giving strong reasons, given the above rule | ||||
| forwards an ECT(0) outer as Not-ECT. | ||||
| o In all other cases where the inner supports ECN, the outgoing ECN | o In all other cases where the inner supports ECN, the decapsulator | |||
| field is set to the more severe marking of the outer and inner ECN | MUST set the outgoing ECN field to the more severe marking of the | |||
| fields, where the ranking of severity from highest to lowest is | outer and inner ECN fields, where the ranking of severity from | |||
| CE, ECT(1), ECT(0), Not-ECT. This in no way precludes cases where | highest to lowest is CE, ECT(1), ECT(0), Not-ECT. This in no way | |||
| ECT(1) and ECT(0) have the same severity; | precludes cases where ECT(1) and ECT(0) have the same severity; | |||
| o Certain combinations of inner and outer ECN fields cannot result | o Certain combinations of inner and outer ECN fields cannot result | |||
| from any currently used transition in any current or previous ECN | from any transition in any current or previous ECN tunneling | |||
| tunneling specification. These cases are indicated in Figure 4 by | specification. These currently unused (CU) combinations are | |||
| '(!!!)'). In these cases, the decapsulator SHOULD log the event | indicated in Figure 4 by '(!!!)' or '(!)', where '(!!!)' means the | |||
| and MAY also raise an alarm. Alarms should be rate-limited so | combination is CU and always potentially dangerous, while '(!)' | |||
| that the illegal combinations will not amplify into a flood of | means it is CU and possibly dangerous. In these cases, | |||
| alarm messages. It MUST be possible to suppress alarms or | particularly the more dangerous ones, the decapsulator SHOULD log | |||
| logging, e.g. if it becomes apparent that a combination that | the event and MAY also raise an alarm. | |||
| previously was not used has started to be used for legitimate | ||||
| purposes such as a new standards action. An example is an ECT(0) | Just because the highlighted combinations are currently unused, | |||
| inner combined with an ECT(1) outer, which is proposed as a legal | does not mean that all the other combinations are always valid. | |||
| combination for PCN [I-D.ietf-pcn-3-in-1-encoding], so an operator | Some are only valid if they have arrived from a particular type of | |||
| that deploys support for PCN should turn off logging and alarms in | legacy ingress, and dangerous otherwise. Therefore an | |||
| this case. | implementation MAY allow an operator to configure logging and | |||
| alarms for such additional header combinations known to be | ||||
| dangerous or CU for the particular configuration of tunnel | ||||
| endpoints deployed at run-time. | ||||
| Alarms should be rate-limited so that the anomalous combinations | ||||
| will not amplify into a flood of alarm messages. It MUST be | ||||
| possible to suppress alarms or logging, e.g. if it becomes | ||||
| apparent that a combination that previously was not used has | ||||
| started to be used for legitimate purposes such as a new standards | ||||
| action. | ||||
| The above logic allows for ECT(0) and ECT(1) to both represent the | The above logic allows for ECT(0) and ECT(1) to both represent the | |||
| same severity of congestion marking (e.g. "not congestion marked"). | same severity of congestion marking (e.g. "not congestion marked"). | |||
| But it also allows future schemes to be defined where ECT(1) is a | But it also allows future schemes to be defined where ECT(1) is a | |||
| more severe marking than ECT(0), in particular enabling the simplest | more severe marking than ECT(0), in particular enabling the simplest | |||
| possible encoding for PCN [I-D.ietf-pcn-3-in-1-encoding]. This | possible encoding for PCN [I-D.ietf-pcn-3-in-1-encoding]. This | |||
| approach is discussed in Appendix D and in the discussion of the ECN | approach is discussed in Appendix D and in the discussion of the ECN | |||
| nonce [RFC3540] in Section 9, which in turn refers to Appendix F. | nonce [RFC3540] in Section 8, which in turn refers to Appendix F. | |||
| 4.3. Encapsulation Modes | 4.3. Encapsulation Modes | |||
| Section 4.1 introduces two encapsulation modes, normal mode and | Section 4.1 introduces two encapsulation modes, normal mode and | |||
| compatibility mode, defining their encapsulation behaviour (i.e. | compatibility mode, defining their encapsulation behaviour (i.e. | |||
| header copying or zeroing respectively). Note that these are modes | header copying or zeroing respectively). Note that these are modes | |||
| of the ingress tunnel endpoint only, not the tunnel as a whole. | of the ingress tunnel endpoint only, not the tunnel as a whole. | |||
| A tunnel ingress MUST at least implement `normal mode' and, if it | A tunnel ingress MUST at least implement `normal mode' and, if it | |||
| might be used with legacy tunnel egress nodes (RFC2003, RFC2401 or | might be used with legacy tunnel egress nodes (RFC2003, RFC2401 or | |||
| skipping to change at page 17, line 44 | skipping to change at page 18, line 28 | |||
| packets in compatibility mode in case the egress it discovers is a | packets in compatibility mode in case the egress it discovers is a | |||
| legacy egress. If, through the discovery protocol, the egress | legacy egress. If, through the discovery protocol, the egress | |||
| indicates that it is compliant with the present specification, with | indicates that it is compliant with the present specification, with | |||
| RFC4301 or with RFC3168 full functionality mode, the ingress can | RFC4301 or with RFC3168 full functionality mode, the ingress can | |||
| switch itself into normal mode. If the egress denies compliance with | switch itself into normal mode. If the egress denies compliance with | |||
| any of these or returns an error that implies it does not understand | any of these or returns an error that implies it does not understand | |||
| a request to work to any of these ECN specifications, the tunnel | a request to work to any of these ECN specifications, the tunnel | |||
| ingress MUST remain in compatibility mode. | ingress MUST remain in compatibility mode. | |||
| An ingress cannot claim compliance with this specification simply by | An ingress cannot claim compliance with this specification simply by | |||
| disabling ECN processing across the tunnel (i.e. only implementing | permanently disabling ECN processing across the tunnel (i.e. only | |||
| compatibility mode). It is true that such a tunnel ingress is at | implementing compatibility mode). It is true that such a tunnel | |||
| least safe with the ECN behaviour of any egress it may encounter, but | ingress is at least safe with the ECN behaviour of any egress it may | |||
| it does not meet the aim of introducing ECN support to tunnels. | encounter, but it does not meet the aim of introducing ECN support to | |||
| tunnels. | ||||
| Implementation note: if a compliant node is the ingress for multiple | Implementation note: if a compliant node is the ingress for multiple | |||
| tunnels, a mode setting will need to be stored for each tunnel | tunnels, a mode setting will need to be stored for each tunnel | |||
| ingress. However, if a node is the egress for multiple tunnels, none | ingress. However, if a node is the egress for multiple tunnels, none | |||
| of the tunnels will need to store a mode setting, because a compliant | of the tunnels will need to store a mode setting, because a compliant | |||
| egress can only be in one mode. | egress can only be in one mode. | |||
| 4.4. Single Mode of Decapsulation | 4.4. Single Mode of Decapsulation | |||
| A compliant decapsulator only has one mode of operation. However, if | A compliant decapsulator only has one mode of operation. However, if | |||
| skipping to change at page 18, line 31 | skipping to change at page 19, line 16 | |||
| negotiate to use limited functionality or full functionality mode | negotiate to use limited functionality or full functionality mode | |||
| [RFC3168]. In all these cases, a decapsulating tunnel egress | [RFC3168]. In all these cases, a decapsulating tunnel egress | |||
| compliant with this specification MUST agree to any of these | compliant with this specification MUST agree to any of these | |||
| requests, since it will behave identically in all these cases. | requests, since it will behave identically in all these cases. | |||
| If no ECN-related mode is requested, a compliant tunnel egress MUST | If no ECN-related mode is requested, a compliant tunnel egress MUST | |||
| continue without raising any error or warning as its egress behaviour | continue without raising any error or warning as its egress behaviour | |||
| is compatible with all the legacy ingress behaviours that do not | is compatible with all the legacy ingress behaviours that do not | |||
| negotiate capabilities. | negotiate capabilities. | |||
| For 'forward compatibility', a compliant tunnel egress SHOULD raise a | A compliant tunnel egress SHOULD raise a warning alarm about any | |||
| warning alarm about any requests to enter modes it does not | requests to enter modes it does not recognise but, for 'forward | |||
| recognise, but it SHOULD continue operating. | compatibility' with standards actions possibly defined after it was | |||
| implemented, it SHOULD continue operating. | ||||
| 5. Updates to Earlier RFCs | 5. Updates to Earlier RFCs | |||
| 5.1. Changes to RFC4301 ECN processing | 5.1. Changes to RFC4301 ECN processing | |||
| Ingress: An RFC4301 IPsec encapsulator is not changed at all by the | Ingress: An RFC4301 IPsec encapsulator is not changed at all by the | |||
| present specification. | present specification. | |||
| Egress: The new decapsulation behaviour in Figure 4 updates RFC4301. | Egress: The new decapsulation behaviour in Figure 4 updates RFC4301. | |||
| However, it solely updates combinations of inner and outer that | However, it solely updates combinations of inner and outer that | |||
| have never been used on the Internet, even though they were | would never result from any protocol defined in the RFC series so | |||
| defined in RFC4301 for completeness. Therefore, the present | far, even though they were catered for in RFC4301 for | |||
| specification adds new behaviours to RFC4301 decapsulation without | completeness. Therefore, the present specification adds new | |||
| altering existing behaviours. The following specific updates have | behaviours to RFC4301 decapsulation without altering existing | |||
| been made: | behaviours. The following specific updates have been made: | |||
| * The outer, not the inner, is propagated when the outer is | * The outer, not the inner, is propagated when the outer is | |||
| ECT(1) and the inner is ECT(0); | ECT(1) and the inner is ECT(0); | |||
| * A packet with Not-ECT in the inner and an outer of ECT(1) or CE | * A packet with Not-ECT in the inner and an outer of CE is | |||
| is dropped rather than forwarded as Not-ECT; | dropped rather than forwarded as Not-ECT; | |||
| * Certain combinations of inner and outer ECN field have been | * Certain combinations of inner and outer ECN field have been | |||
| identified as currently unused. These can trigger logging | identified as currently unused. These can trigger logging | |||
| and/or raise alarms. | and/or raise alarms. | |||
| Modes: RFC4301 does not need modes and is not updated by the modes | Modes: RFC4301 does not need modes and is not updated by the modes | |||
| in the present specification. The normal mode of encapsulation is | in the present specification. The normal mode of encapsulation is | |||
| unchanged from RFC4301 encapsulation and an RFC4301 IPsec ingress | unchanged from RFC4301 encapsulation and an RFC4301 IPsec ingress | |||
| will never need compatibility mode as explained in Section 4.3 | will never need compatibility mode as explained in Section 4.3 | |||
| (except in one corner-case described below). | (except in one corner-case described below). | |||
| One corner case can exist where an RFC4301 ingress does not use | One corner case can exist where an RFC4301 ingress does not use | |||
| IKEv2, but uses manual keying instead. Then an RFC4301 ingress | IKEv2, but uses manual keying instead. Then an RFC4301 ingress | |||
| could conceivably be configured to tunnel to an egress with | could conceivably be configured to tunnel to an egress with | |||
| limited functionality ECN handling. Strictly, for this corner- | limited functionality ECN handling. Strictly, for this corner- | |||
| case, the requirement to use compatibility mode in this | case, the requirement to use compatibility mode in this | |||
| specification updates RFC4301. However, this is such a remote | specification updates RFC4301. However, this is such a remote | |||
| possibility that in general RFC4301 IPsec implementations are NOT | possibility that RFC4301 IPsec implementations are NOT REQUIRED to | |||
| REQUIRED to implement compatibility mode. | implement compatibility mode. | |||
| 5.2. Changes to RFC3168 ECN processing | 5.2. Changes to RFC3168 ECN processing | |||
| Ingress: On encapsulation, the new rule in Figure 3 that a normal | Ingress: On encapsulation, the new rule in Figure 3 that a normal | |||
| mode tunnel ingress copies any ECN field into the outer header | mode tunnel ingress copies any ECN field into the outer header | |||
| updates the ingress behaviour of RFC3168. Nonetheless, the new | updates the ingress behaviour of RFC3168. Nonetheless, the new | |||
| compatibility mode is identical to the limited functionality mode | compatibility mode is identical to the limited functionality mode | |||
| of RFC3168. | of RFC3168. | |||
| Egress: The new decapsulation behaviour in Figure 4 updates RFC3168. | Egress: The new decapsulation behaviour in Figure 4 updates RFC3168. | |||
| However, the present specification solely updates combinations of | However, the present specification solely updates combinations of | |||
| inner and outer that have never been used on the Internet, even | inner and outer that would never result from any protocol defined | |||
| though they were defined in RFC3168 for completeness. Therefore, | in the RFC series so far, even though they were catered for in | |||
| the present specification adds new behaviours to RFC3168 | RFC3168 for completeness. Therefore, the present specification | |||
| decapsulation without altering existing behaviours. The following | adds new behaviours to RFC3168 decapsulation without altering | |||
| specific updates have been made: | existing behaviours. The following specific updates have been | |||
| made: | ||||
| * The outer, not the inner, is propagated when the outer is | * The outer, not the inner, is propagated when the outer is | |||
| ECT(1) and the inner is ECT(0); | ECT(1) and the inner is ECT(0); | |||
| * A packet with Not-ECT in the inner and an outer of ECT(1) is | ||||
| dropped rather than forwarded as Not-ECT; | ||||
| * Certain combinations of inner and outer ECN field have been | * Certain combinations of inner and outer ECN field have been | |||
| identified as currently unused. These can trigger logging | identified as currently unused. These can trigger logging | |||
| and/or raise alarms. | and/or raise alarms. | |||
| Modes: RFC3168 defines a (required) limited functionality mode and | Modes: RFC3168 defines a (required) limited functionality mode and | |||
| an (optional) full functionality mode for a tunnel. In RFC3168, | an (optional) full functionality mode for a tunnel. In RFC3168, | |||
| modes applied to both ends of the tunnel, while in the present | modes applied to both ends of the tunnel, while in the present | |||
| specification, modes are only used at the ingress--a single egress | specification, modes are only used at the ingress--a single egress | |||
| behaviour covers all cases. The normal mode of encapsulation | behaviour covers all cases. The normal mode of encapsulation | |||
| updates the encapsulation behaviour of the full functionality mode | updates the encapsulation behaviour of the full functionality mode | |||
| skipping to change at page 20, line 38 | skipping to change at page 21, line 21 | |||
| both RFC4301 IPsec [RFC4301] and IP in MPLS or MPLS in MPLS | both RFC4301 IPsec [RFC4301] and IP in MPLS or MPLS in MPLS | |||
| encapsulation [RFC5129] construct the ECN field. | encapsulation [RFC5129] construct the ECN field. | |||
| Compatibility mode has also been defined so a non-RFC4301 ingress can | Compatibility mode has also been defined so a non-RFC4301 ingress can | |||
| still switch to using drop across a tunnel for backwards | still switch to using drop across a tunnel for backwards | |||
| compatibility with legacy decapsulators that do not propagate ECN | compatibility with legacy decapsulators that do not propagate ECN | |||
| correctly. | correctly. | |||
| The trigger that motivated this update to RFC3168 encapsulation was a | The trigger that motivated this update to RFC3168 encapsulation was a | |||
| standards track proposal for pre-congestion notification (PCN | standards track proposal for pre-congestion notification (PCN | |||
| [I-D.ietf-pcn-marking-behaviour]). PCN excess rate marking only | [RFC5670]). PCN excess rate marking only works correctly if the ECN | |||
| works correctly if the ECN field is copied on encapsulation (as in | field is copied on encapsulation (as in RFC4301 and RFC5129); it does | |||
| RFC4301 and RFC5129); it does not work if ECN is reset (as in | not work if ECN is reset (as in RFC3168). This is because PCN excess | |||
| RFC3168). This is because PCN excess rate marking depends on the | rate marking depends on the outer header revealing any congestion | |||
| outer header revealing any congestion experienced so far on the whole | experienced so far on the whole path, not just since the last tunnel | |||
| path, not just since the last tunnel ingress (see Appendix E for a | ingress (see Appendix E for a full explanation). | |||
| full explanation). | ||||
| PCN allows a network operator to add flow admission and termination | PCN allows a network operator to add flow admission and termination | |||
| for inelastic traffic at the edges of a Diffserv domain, but without | for inelastic traffic at the edges of a Diffserv domain, but without | |||
| any per-flow mechanisms in the interior and without the generous | any per-flow mechanisms in the interior and without the generous | |||
| provisioning typical of Diffserv, aiming to significantly reduce | provisioning typical of Diffserv, aiming to significantly reduce | |||
| costs. The PCN architecture [RFC5559] states that RFC3168 IP in IP | costs. The PCN architecture [RFC5559] states that RFC3168 IP in IP | |||
| tunnelling of the ECN field cannot be used for any tunnel ingress in | tunnelling of the ECN field cannot be used for any tunnel ingress in | |||
| a PCN domain. Prior to the present specification, this left a stark | a PCN domain. Prior to the present specification, this left a stark | |||
| choice between not being able to use PCN for inelastic traffic | choice between not being able to use PCN for inelastic traffic | |||
| control or not being able to use the many tunnels already deployed | control or not being able to use the many tunnels already deployed | |||
| skipping to change at page 21, line 45 | skipping to change at page 22, line 28 | |||
| preferable. | preferable. | |||
| o From the traffic security perspective (enforcing congestion | o From the traffic security perspective (enforcing congestion | |||
| control, mitigating denial of service etc.) copying is preferable. | control, mitigating denial of service etc.) copying is preferable. | |||
| o From the information security perspective resetting is preferable, | o From the information security perspective resetting is preferable, | |||
| but the IETF Security Area now considers copying acceptable given | but the IETF Security Area now considers copying acceptable given | |||
| the bandwidth of a 2-bit covert channel can be managed. | the bandwidth of a 2-bit covert channel can be managed. | |||
| Therefore there are two points against resetting CE on ingress while | Therefore there are two points against resetting CE on ingress while | |||
| copying CE causes no harm (other than opening a 2-bit covert channel | copying CE causes no significant harm. | |||
| that is deemed manageable). | ||||
| 5.3.2. Motivation for Changing Decapsulation | 5.3.2. Motivation for Changing Decapsulation | |||
| The specification for decapsulation in Section 4 fixes three problems | The specification for decapsulation in Section 4 fixes three problems | |||
| with the pre-existing behaviours of both RFC3168 and RFC4301: | with the pre-existing behaviours of both RFC3168 and RFC4301: | |||
| 1. The pre-existing rules prevented the introduction of alternate | 1. The pre-existing rules prevented the introduction of alternate | |||
| ECN semantics to signal more than one severity level of | ECN semantics to signal more than one severity level of | |||
| congestion [RFC4774], [RFC5559]. The four states of the 2-bit | congestion [RFC4774], [RFC5559]. The four states of the 2-bit | |||
| ECN field provide room for signalling two severity levels in | ECN field provide room for signalling two severity levels in | |||
| skipping to change at page 23, line 8 | skipping to change at page 23, line 39 | |||
| the box was deployed, often on the grounds that anything | the box was deployed, often on the grounds that anything | |||
| unexpected might be an attack. This tends to bar future use of | unexpected might be an attack. This tends to bar future use of | |||
| CU values. The new decapsulation rules specify optional logging | CU values. The new decapsulation rules specify optional logging | |||
| and/or alarms for specific combinations of inner and outer header | and/or alarms for specific combinations of inner and outer header | |||
| that are currently unused. The aim is to give implementers a | that are currently unused. The aim is to give implementers a | |||
| recourse other than drop if they are concerned about the security | recourse other than drop if they are concerned about the security | |||
| of CU values. It recognises legitimate security concerns about | of CU values. It recognises legitimate security concerns about | |||
| CU values but still eases their future use. If the alarms are | CU values but still eases their future use. If the alarms are | |||
| interpreted as an attack (e.g. by a management system) the | interpreted as an attack (e.g. by a management system) the | |||
| offending packets can be dropped. But alarms can be turned off | offending packets can be dropped. But alarms can be turned off | |||
| if these combinations come into use (e.g. a through a future | if these combinations come into regular use (e.g. through a | |||
| standards action). | future standards action). | |||
| 3. While reviewing currently unused combinations of inner and outer, | 3. While reviewing currently unused combinations of inner and outer, | |||
| the opportunity was taken to define a single consistent behaviour | the opportunity was taken to define a single consistent behaviour | |||
| for the cases with a Not-ECT inner header but a different outer. | for the three cases with a Not-ECT inner header but a different | |||
| RFC3168 and RFC4301 had diverged in this respect. These | outer. RFC3168 and RFC4301 had diverged in this respect. None | |||
| combinations should not result from known Internet protocols. | of these combinations should result from Internet protocols in | |||
| So, for safety, it was decided to drop a packet if the outer | the RFC series, but future standards actions might put any or all | |||
| carries codepoints CE or ECT(1) that respectively signal | of them to good use. Therefore it was decided that a | |||
| congestion or could potentially signal congestion in a scheme | decapsulator must forward a Not-ECT inner unchanged, even if the | |||
| progressing through the IETF [I-D.ietf-pcn-3-in-1-encoding]. | arriving outer was ECT(0) or ECT(1). But for safety it should | |||
| Given an inner of Not-ECT implies the transport only understands | drop a combination of Not-ECT inner and CE outer. Then, if some | |||
| drop as a signal of congestion, this was the safest course of | unfortunate misconfiguration resulted in a congested router | |||
| action. | marking CE on a packet that was originally Not-ECT, drop would be | |||
| the only appropriate signal for the egress to propagate--the only | ||||
| signal a non-ECN-capable transport (Not-ECT) would understand. | ||||
| A decapsulator can forward a Not-ECT inner unchanged if its outer | ||||
| is ECT(1), even though ECT(1) is being proposed as an | ||||
| intermediate level of congestion in a scheme progressing through | ||||
| the IETF [I-D.ietf-pcn-3-in-1-encoding]. The rationale is to | ||||
| ensure this CU combination will be usable if needed in the | ||||
| future. If any misconfiguration led to ECT(1) congestion signals | ||||
| with a Not-ECT inner, it would not be disastrous for the tunnel | ||||
| egress to suppress them, because the congestion should then | ||||
| escalate to CE marking, which the egress would drop, thus at | ||||
| least preventing congestion collapse. | ||||
| Problems 2 & 3 alone would not warrant a change to decapsulation, but | Problems 2 & 3 alone would not warrant a change to decapsulation, but | |||
| it was decided they are worth fixing and making consistent at the | it was decided they are worth fixing and making consistent at the | |||
| same time as decapsulation code is changed to fix problem 1 (two | same time as decapsulation code is changed to fix problem 1 (two | |||
| congestion severity-levels). | congestion severity-levels). | |||
| 6. Backward Compatibility | 6. Backward Compatibility | |||
| A tunnel endpoint compliant with the present specification is | A tunnel endpoint compliant with the present specification is | |||
| backward compatible when paired with any tunnel endpoint compliant | backward compatible when paired with any tunnel endpoint compliant | |||
| skipping to change at page 25, line 13 | skipping to change at page 26, line 8 | |||
| ECN (limited functionality mode) if it is paired with a legacy egress | ECN (limited functionality mode) if it is paired with a legacy egress | |||
| (RFC 2481, RFC2401 or RFC2003), which would not propagate ECN | (RFC 2481, RFC2401 or RFC2003), which would not propagate ECN | |||
| correctly. The present specification carries forward those rules | correctly. The present specification carries forward those rules | |||
| (Section 4.3). It uses compatibility mode whenever RFC3168 would | (Section 4.3). It uses compatibility mode whenever RFC3168 would | |||
| have used limited functionality mode, and their per-packet behaviours | have used limited functionality mode, and their per-packet behaviours | |||
| are identical. Therefore, all other things being equal, an ingress | are identical. Therefore, all other things being equal, an ingress | |||
| using the new rules will interwork with any legacy tunnel egress in | using the new rules will interwork with any legacy tunnel egress in | |||
| exactly the same way as an RFC3168 ingress (still black-box backward | exactly the same way as an RFC3168 ingress (still black-box backward | |||
| compatible). | compatible). | |||
| 7. Design Principles for Future Non-Default Schemes | 7. Design Principles for Alternate ECN Tunnelling Semantics | |||
| This section is informative not normative. | This section is informative not normative. | |||
| S.5 of RFC3168 permits the Diffserv codepoint (DSCP) [RFC2474] to | S.5 of RFC3168 permits the Diffserv codepoint (DSCP) [RFC2474] to | |||
| 'switch in' alternative behaviours for marking the ECN field, just as | 'switch in' alternative behaviours for marking the ECN field, just as | |||
| it switches in different per-hop behaviours (PHBs) for scheduling. | it switches in different per-hop behaviours (PHBs) for scheduling. | |||
| [RFC4774] gives best current practice for designing such alternative | [RFC4774] gives best current practice for designing such alternative | |||
| ECN semantics and very briefly mentions that tunnelling should be | ECN semantics and very briefly mentions in section 5.4 that | |||
| considered. Here we give additional guidance on designing alternate | tunnelling should be considered. The guidance below extends RFC4774, | |||
| ECN semantics that would also require alternate tunnelling semantics. | giving additional guidance on designing any alternate ECN semantics | |||
| that would also require alternate tunnelling semantics. | ||||
| In one word the guidance is "Don't". If a scheme requires tunnels to | The overriding guidance is: "Avoid designing alternate ECN tunnelling | |||
| semantics, if at all possible." If a scheme requires tunnels to | ||||
| implement special processing of the ECN field for certain DSCPs, it | implement special processing of the ECN field for certain DSCPs, it | |||
| is highly unlikely that every implementer of every tunnel will want | will be hard to guarantee that every implementer of every tunnel will | |||
| to add the required exception and that operators will want to deploy | have added the required exception or that operators will have | |||
| the required configuration options. Therefore it is highly likely | ubiquitously deployed the required updates. It is unlikely a single | |||
| that some tunnels within a network will not implement the required | authority is even aware of all the tunnels in a network, which may | |||
| special case. Therefore, designers of new protocols should avoid | include tunnels set up by applications between endpoints, or | |||
| non-default tunnelling schemes if at all possible. | dynamically created in the network. Therefore it is highly likely | |||
| that some tunnels within a network or on hosts connected to it will | ||||
| not implement the required special case. | ||||
| That said, if a non-default scheme for tunnelling the ECN field is | That said, if a non-default scheme for tunnelling the ECN field is | |||
| really required, the following guidelines may prove useful in its | really required, the following guidelines may prove useful in its | |||
| design: | design: | |||
| On encapsulation in any new scheme: | On encapsulation in any alternate scheme: | |||
| 1. The ECN field of the outer header should be cleared to Not-ECT | 1. The ECN field of the outer header should be cleared to Not-ECT | |||
| ("00") unless it is guaranteed that the corresponding tunnel | ("00") unless it is guaranteed that the corresponding tunnel | |||
| egress will correctly propagate congestion markings introduced | egress will correctly propagate congestion markings introduced | |||
| across the tunnel in the outer header. | across the tunnel in the outer header. | |||
| 2. If it has established that ECN will be correctly propagated, | 2. If it has established that ECN will be correctly propagated, | |||
| an encapsulator should also copy incoming congestion | an encapsulator should also copy incoming congestion | |||
| notification into the outer header. The general principle | notification into the outer header. The general principle | |||
| here is that the outer header should reflect congestion | here is that the outer header should reflect congestion | |||
| skipping to change at page 26, line 25 | skipping to change at page 27, line 25 | |||
| Then the code module doing encapsulation can keep to the | Then the code module doing encapsulation can keep to the | |||
| copying rule and the load regulator module can reset | copying rule and the load regulator module can reset | |||
| congestion, without any code in either module being | congestion, without any code in either module being | |||
| conditional on whether the other is there. | conditional on whether the other is there. | |||
| On decapsulation in any new scheme: | On decapsulation in any new scheme: | |||
| 1. If the arriving inner header is Not-ECT it implies the | 1. If the arriving inner header is Not-ECT it implies the | |||
| transport will not understand other ECN codepoints. If the | transport will not understand other ECN codepoints. If the | |||
| outer header carries an explicit congestion marking, the | outer header carries an explicit congestion marking, the | |||
| packet should be dropped--the only indication of congestion | alternate scheme will probably need to drop the packet--the | |||
| the transport will understand. If the outer carries any other | only indication of congestion the transport will understand. | |||
| ECN codepoint the packet can be forwarded, but only as Not- | If the outer carries any other ECN codepoint that does not | |||
| ECT. | indicate congestion, the alternate scheme can forward the | |||
| packet, but probably only as Not-ECT. | ||||
| 2. If the arriving inner header is other than Not-ECT, the ECN | 2. If the arriving inner header is other than Not-ECT, the ECN | |||
| field that the tunnel egress forwards should reflect the more | field that the alternate decapsulation scheme forwards should | |||
| severe congestion marking of the arriving inner and outer | reflect the more severe congestion marking of the arriving | |||
| headers. | inner and outer headers. | |||
| 3. If a combination of inner and outer headers is encountered | 3. Any alternate scheme MUST define a behaviour for all | |||
| that is not currently used in known standards, this event | combinations of inner and outer headers, even those that would | |||
| should be logged and an alarm raised. This is a preferable | not be expected to result from standards known at the time and | |||
| approach to dropping currently unused combinations in case | even those that would not be expected from the tunnel ingress | |||
| they represent an attack. The new scheme should try to define | paired with the egress at run-time. Consideration should be | |||
| a way to forward such packets, but only if a safe outgoing | given to logging such unexpected combinations and raising an | |||
| codepoint can be defined. | alarm, particularly if there is a danger that the invalid | |||
| combination implies congestion signals are not being | ||||
| propagated correctly. The presence of currently unused | ||||
| combinations may represent an attack, but the new scheme | ||||
| should try to define a way to forward such packets, at least | ||||
| if a safe outgoing codepoint can be defined. Raising an alarm | ||||
| to warn of the possibility of an attack is a preferable | ||||
| approach to dropping that ensures these combinations can be | ||||
| usable in future standards actions. | ||||
| 8. IANA Considerations | IANA Considerations (to be removed on publication): | |||
| This memo includes no request to IANA. | This memo includes no request to IANA. | |||
| 9. Security Considerations | 8. Security Considerations | |||
| Appendix B.1 discusses the security constraints imposed on ECN tunnel | Appendix B.1 discusses the security constraints imposed on ECN tunnel | |||
| processing. The new rules for ECN tunnel processing (Section 4) | processing. The new rules for ECN tunnel processing (Section 4) | |||
| trade off information security (covert channels) against | trade off information security (covert channels) against | |||
| congestion monitoring & control. In fact, ensuring congestion | congestion monitoring & control. In fact, ensuring congestion | |||
| markings are not lost is itself another aspect of security, because | markings are not lost is itself another aspect of security, because | |||
| if we allowed congestion notification to be lost, any attempt to | if we allowed congestion notification to be lost, any attempt to | |||
| enforce a response to congestion would be much harder. | enforce a response to congestion would be much harder. | |||
| Specialist security issues: | Specialist security issues: | |||
| skipping to change at page 28, line 8 | skipping to change at page 29, line 19 | |||
| 'I' will set all ECN fields in outer headers to Not-ECT, 'M' could | 'I' will set all ECN fields in outer headers to Not-ECT, 'M' could | |||
| still toggle CE or ECT(1) on and off to communicate covertly with | still toggle CE or ECT(1) on and off to communicate covertly with | |||
| 'B', because we have specified that 'E' only has one mode | 'B', because we have specified that 'E' only has one mode | |||
| regardless of what mode it says it has negotiated. We could have | regardless of what mode it says it has negotiated. We could have | |||
| specified that 'E' should have a limited functionality mode and | specified that 'E' should have a limited functionality mode and | |||
| check for such behaviour. But we decided not to add the extra | check for such behaviour. But we decided not to add the extra | |||
| complexity of two modes on a compliant tunnel egress merely to | complexity of two modes on a compliant tunnel egress merely to | |||
| cater for an historic security concern that is now considered | cater for an historic security concern that is now considered | |||
| manageable. | manageable. | |||
| 10. Conclusions | 9. Conclusions | |||
| This document uses previously unused combinations of inner and outer | This document uses previously unused combinations of inner and outer | |||
| header to augment the rules for calculating the ECN field when | header to augment the rules for calculating the ECN field when | |||
| decapsulating IP packets at the egress of IPsec (RFC4301) and non- | decapsulating IP packets at the egress of IPsec (RFC4301) and non- | |||
| IPsec (RFC3168) tunnels. In this way it allows tunnels to propagate | IPsec (RFC3168) tunnels. In this way it allows tunnels to propagate | |||
| an extra level of congestion severity. | an extra level of congestion severity. | |||
| This document also updates the ingress tunnelling encapsulation of | This document also updates the ingress tunnelling encapsulation of | |||
| RFC3168 ECN to bring all IP in IP tunnels into line with the new | RFC3168 ECN to bring all IP in IP tunnels into line with the new | |||
| behaviour in the IPsec architecture of RFC4301, which copies rather | behaviour in the IPsec architecture of RFC4301, which copies rather | |||
| skipping to change at page 28, line 33 | skipping to change at page 29, line 44 | |||
| standards track. Operators wanting to support PCN or other alternate | standards track. Operators wanting to support PCN or other alternate | |||
| ECN schemes that use an extra severity level can require that their | ECN schemes that use an extra severity level can require that their | |||
| tunnels comply with the present specification. Nonetheless, as part | tunnels comply with the present specification. Nonetheless, as part | |||
| of general code maintenance, any tunnel can safely be updated to | of general code maintenance, any tunnel can safely be updated to | |||
| comply with this specification, because it is backward compatible | comply with this specification, because it is backward compatible | |||
| with all previous tunnelling behaviours which will continue to work | with all previous tunnelling behaviours which will continue to work | |||
| as before--just using one severity level. | as before--just using one severity level. | |||
| The new rules propagate changes to the ECN field across tunnel end- | The new rules propagate changes to the ECN field across tunnel end- | |||
| points that previously blocked them to restrict the bandwidth of a | points that previously blocked them to restrict the bandwidth of a | |||
| potential covert channel. But limiting the channel's bandwidth to 2 | potential covert channel. Limiting the channel's bandwidth to 2 bits | |||
| bits per packet is now considered sufficient. | per packet is now considered sufficient. | |||
| At the same time as removing these legacy constraints, the | At the same time as removing these legacy constraints, the | |||
| opportunity has been taken to draw together diverging tunnel | opportunity has been taken to draw together diverging tunnel | |||
| specifications into a single consistent behaviour. Then any tunnel | specifications into a single consistent behaviour. Then any tunnel | |||
| can be deployed unilaterally, and it will support the full range of | can be deployed unilaterally, and it will support the full range of | |||
| congestion control and management schemes without any modes or | congestion control and management schemes without any modes or | |||
| configuration. Further, any host or router can expect the ECN field | configuration. Further, any host or router can expect the ECN field | |||
| to behave in the same way, whatever type of tunnel might intervene in | to behave in the same way, whatever type of tunnel might intervene in | |||
| the path. This new certainty could enable new uses of the ECN field | the path. This new certainty could enable new uses of the ECN field | |||
| that would otherwise be confounded by ambiguity. | that would otherwise be confounded by ambiguity. | |||
| 11. Acknowledgements | 10. Acknowledgements | |||
| Thanks to Anil Agarwal for pointing out a case where it's safe for a | Thanks to Anil Agarwal for pointing out a case where it's safe for a | |||
| tunnel decapsulator to forward a combination of headers it does not | tunnel decapsulator to forward a combination of headers it does not | |||
| understand. Thanks to David Black for explaining a better way to | understand. Thanks to David Black for explaining a better way to | |||
| think about function placement. Also thanks to Arnaud Jacquet for | think about function placement. Also thanks to Arnaud Jacquet for | |||
| the idea for Appendix C. Thanks to Michael Menth, Bruce Davie, Toby | the idea for Appendix C. Thanks to Michael Menth, Bruce Davie, Toby | |||
| Moncaster, Gorry Fairhurst, Sally Floyd, Alfred Hoenes, Gabriele | Moncaster, Gorry Fairhurst, Sally Floyd, Alfred Hoenes, Gabriele | |||
| Corliano, Ingemar Johansson, David Black and Phil Eardley for their | Corliano, Ingemar Johansson, David Black and Phil Eardley for their | |||
| thoughts and careful review comments. | thoughts and careful review comments. | |||
| Bob Briscoe is partly funded by Trilogy, a research project (ICT- | Bob Briscoe is partly funded by Trilogy, a research project (ICT- | |||
| 216372) supported by the European Community under its Seventh | 216372) supported by the European Community under its Seventh | |||
| Framework Programme. The views expressed here are those of the | Framework Programme. The views expressed here are those of the | |||
| author only. | author only. | |||
| 12. Comments Solicited | Comments Solicited (to be removed by the RFC Editor): | |||
| Comments and questions are encouraged and very welcome. They can be | Comments and questions are encouraged and very welcome. They can be | |||
| addressed to the IETF Transport Area working group mailing list | addressed to the IETF Transport Area working group mailing list | |||
| <tsvwg@ietf.org>, and/or to the authors. | <tsvwg@ietf.org>, and/or to the authors. | |||
| 13. References | 11. References | |||
| 13.1. Normative References | 11.1. Normative References | |||
| [RFC2003] Perkins, C., "IP Encapsulation | [RFC2003] Perkins, C., "IP Encapsulation | |||
| within IP", RFC 2003, October 1996. | within IP", RFC 2003, October 1996. | |||
| [RFC2119] Bradner, S., "Key words for use in | [RFC2119] Bradner, S., "Key words for use in | |||
| RFCs to Indicate Requirement | RFCs to Indicate Requirement | |||
| Levels", BCP 14, RFC 2119, | Levels", BCP 14, RFC 2119, | |||
| March 1997. | March 1997. | |||
| [RFC3168] Ramakrishnan, K., Floyd, S., and D. | [RFC3168] Ramakrishnan, K., Floyd, S., and D. | |||
| Black, "The Addition of Explicit | Black, "The Addition of Explicit | |||
| Congestion Notification (ECN) to | Congestion Notification (ECN) to | |||
| IP", RFC 3168, September 2001. | IP", RFC 3168, September 2001. | |||
| [RFC4301] Kent, S. and K. Seo, "Security | [RFC4301] Kent, S. and K. Seo, "Security | |||
| Architecture for the Internet | Architecture for the Internet | |||
| Protocol", RFC 4301, December 2005. | Protocol", RFC 4301, December 2005. | |||
| 13.2. Informative References | 11.2. Informative References | |||
| [I-D.ietf-pcn-3-in-1-encoding] Briscoe, B. and T. Moncaster, "PCN | [I-D.ietf-pcn-3-in-1-encoding] Briscoe, B. and T. Moncaster, "PCN | |||
| 3-State Encoding Extension in a | 3-State Encoding Extension in a | |||
| single DSCP", | single DSCP", | |||
| draft-ietf-pcn-3-in-1-encoding-00 | draft-ietf-pcn-3-in-1-encoding-00 | |||
| (work in progress), July 2009. | (work in progress), July 2009. | |||
| [I-D.ietf-pcn-3-state-encoding] Moncaster, T., Briscoe, B., and M. | [I-D.ietf-pcn-3-state-encoding] Moncaster, T., Briscoe, B., and M. | |||
| Menth, "A PCN encoding using 2 | Menth, "A PCN encoding using 2 | |||
| DSCPs to provide 3 or more states", | DSCPs to provide 3 or more states", | |||
| draft-ietf-pcn-3-state-encoding-00 | ||||
| (work in progress), April 2009. | (work in progress), April 2009. | |||
| [I-D.ietf-pcn-baseline-encoding] Moncaster, T., Briscoe, B., and M. | ||||
| Menth, "Baseline Encoding and | ||||
| Transport of Pre-Congestion | ||||
| Information", | ||||
| draft-ietf-pcn-baseline-encoding-07 | ||||
| (work in progress), September 2009. | ||||
| [I-D.ietf-pcn-marking-behaviour] Eardley, P., "Metering and marking | ||||
| behaviour of PCN-nodes", | ||||
| draft-ietf-pcn-marking-behaviour-05 | ||||
| (work in progress), August 2009. | ||||
| [I-D.ietf-pcn-psdm-encoding] Menth, M., Babiarz, J., Moncaster, | [I-D.ietf-pcn-psdm-encoding] Menth, M., Babiarz, J., Moncaster, | |||
| T., and B. Briscoe, "PCN Encoding | T., and B. Briscoe, "PCN Encoding | |||
| for Packet-Specific Dual Marking | for Packet-Specific Dual Marking | |||
| (PSDM)", | (PSDM)", | |||
| draft-ietf-pcn-psdm-encoding-00 | draft-ietf-pcn-psdm-encoding-00 | |||
| (work in progress), June 2009. | (work in progress), June 2009. | |||
| [I-D.ietf-pcn-sm-edge-behaviour] Charny, A., Karagiannis, G., Menth, | [I-D.ietf-pcn-sm-edge-behaviour] Charny, A., Karagiannis, G., Menth, | |||
| M., and T. Taylor, "PCN Boundary | M., and T. Taylor, "PCN Boundary | |||
| Node Behaviour for the Single | Node Behaviour for the Single | |||
| Marking (SM) Mode of Operation", | Marking (SM) Mode of Operation", | |||
| draft-ietf-pcn-sm-edge-behaviour-00 | draft-ietf-pcn-sm-edge-behaviour-01 | |||
| (work in progress), July 2009. | (work in progress), October 2009. | |||
| [I-D.satoh-pcn-st-marking] Satoh, D., Ueno, H., Maeda, Y., and | [I-D.satoh-pcn-st-marking] Satoh, D., Ueno, H., Maeda, Y., and | |||
| O. Phanachet, "Single PCN Threshold | O. Phanachet, "Single PCN Threshold | |||
| Marking by using PCN baseline | Marking by using PCN baseline | |||
| encoding for both admission and | encoding for both admission and | |||
| termination controls", | termination controls", | |||
| draft-satoh-pcn-st-marking-02 (work | draft-satoh-pcn-st-marking-02 (work | |||
| in progress), September 2009. | in progress), September 2009. | |||
| [RFC2401] Kent, S. and R. Atkinson, "Security | [RFC2401] Kent, S. and R. Atkinson, "Security | |||
| Architecture for the Internet | Architecture for the Internet | |||
| Protocol", RFC 2401, November 1998. | Protocol", RFC 2401, November 1998. | |||
| [RFC2474] Nichols, K., Blake, S., Baker, F., | [RFC2474] Nichols, K., Blake, S., Baker, F., | |||
| and D. Black, "Definition of the | and D. Black, "Definition of the | |||
| skipping to change at page 31, line 35 | skipping to change at page 32, line 34 | |||
| November 2006. | November 2006. | |||
| [RFC5129] Davie, B., Briscoe, B., and J. Tay, | [RFC5129] Davie, B., Briscoe, B., and J. Tay, | |||
| "Explicit Congestion Marking in | "Explicit Congestion Marking in | |||
| MPLS", RFC 5129, January 2008. | MPLS", RFC 5129, January 2008. | |||
| [RFC5559] Eardley, P., "Pre-Congestion | [RFC5559] Eardley, P., "Pre-Congestion | |||
| Notification (PCN) Architecture", | Notification (PCN) Architecture", | |||
| RFC 5559, June 2009. | RFC 5559, June 2009. | |||
| [RFC5670] Eardley, P., "Metering and Marking | ||||
| Behaviour of PCN-Nodes", RFC 5670, | ||||
| November 2009. | ||||
| [RFC5696] Moncaster, T., Briscoe, B., and M. | ||||
| Menth, "Baseline Encoding and | ||||
| Transport of Pre-Congestion | ||||
| Information", RFC 5696, | ||||
| November 2009. | ||||
| [VCP] Xia, Y., Subramanian, L., Stoica, | [VCP] Xia, Y., Subramanian, L., Stoica, | |||
| I., and S. Kalyanaraman, "One more | I., and S. Kalyanaraman, "One more | |||
| bit is enough", Proc. SIGCOMM'05, | bit is enough", Proc. SIGCOMM'05, | |||
| ACM CCR 35(4):37--48, 2005, <http:// | ACM CCR 35(4):37--48, 2005, <http:// | |||
| doi.acm.org/10.1145/ | doi.acm.org/10.1145/ | |||
| 1080091.1080098>. | 1080091.1080098>. | |||
| Appendix A. Early ECN Tunnelling RFCs | Appendix A. Early ECN Tunnelling RFCs | |||
| IP in IP tunnelling was originally defined in [RFC2003]. On | IP in IP tunnelling was originally defined in [RFC2003]. On | |||
| encapsulation, the incoming header was copied to the outer and on | encapsulation, the incoming header was copied to the outer and on | |||
| decapsulation the outer was simply discarded. Initially, IPsec | decapsulation the outer was simply discarded. Initially, IPsec | |||
| tunnelling [RFC2401] followed the same behaviour. | tunnelling [RFC2401] followed the same behaviour. | |||
| When ECN was introduced experimentally in [RFC2481], legacy (RFC2003 | When ECN was introduced experimentally in [RFC2481], legacy (RFC2003 | |||
| or RFC2401) tunnels would have discarded any congestion markings | or RFC2401) tunnels would have discarded any congestion markings | |||
| added to the outer header, so RFC2481 introduced rules for | added to the outer header, so RFC2481 introduced rules for | |||
| calculating the outgoing header from a combination of the inner and | calculating the outgoing header from a combination of the inner and | |||
| outer on decapsulation. RFC2481 also introduced a second mode for | outer on decapsulation. RFC2481 also introduced a second mode for | |||
| IPsec tunnels, which turned off ECN processing in the outer header | IPsec tunnels, which turned off ECN processing (Not-ECT) in the outer | |||
| (Not-ECT) on encapsulation because an RFC2401 decapsulator would | header on encapsulation because an RFC2401 decapsulator would discard | |||
| discard the outer on decapsulation. For RFC2401 IPsec this had the | the outer on decapsulation. For RFC2401 IPsec this had the side- | |||
| side-effect of completely blocking the covert channel. | effect of completely blocking the covert channel. | |||
| In RFC2481 the ECN field was defined as two separate bits. But when | In RFC2481 the ECN field was defined as two separate bits. But when | |||
| ECN moved from the experimental to the standards track [RFC3168], the | ECN moved from the experimental to the standards track [RFC3168], the | |||
| ECN field was redefined as four codepoints. This required a | ECN field was redefined as four codepoints. This required a | |||
| different calculation of the ECN field from that used in RFC2481 on | different calculation of the ECN field from that used in RFC2481 on | |||
| decapsulation. RFC3168 also had two modes: a 'full functionality | decapsulation. RFC3168 also had two modes: a 'full functionality | |||
| mode' that restricted the covert channel as much as possible but | mode' that restricted the covert channel as much as possible but | |||
| still allowed ECN to be used with IPsec, and another that completely | still allowed ECN to be used with IPsec, and another that completely | |||
| turned off ECN processing across the tunnel. This 'limited | turned off ECN processing across the tunnel. This 'limited | |||
| functionality mode' both offered a way for operators to completely | functionality mode' both offered a way for operators to completely | |||
| skipping to change at page 33, line 13 | skipping to change at page 34, line 23 | |||
| spans an unprotected internetwork where there may be 'men in the | spans an unprotected internetwork where there may be 'men in the | |||
| middle', M. | middle', M. | |||
| physically unprotected physically | physically unprotected physically | |||
| <-protected domain-><--domain--><-protected domain-> | <-protected domain-><--domain--><-protected domain-> | |||
| +------------------+ +------------------+ | +------------------+ +------------------+ | |||
| | | M | | | | | M | | | |||
| | A-------->I=========>==========>E-------->B | | | A-------->I=========>==========>E-------->B | | |||
| | | | | | | | | | | |||
| +------------------+ +------------------+ | +------------------+ +------------------+ | |||
| <----IPsec secured----> | <----IPsec secured----> | |||
| tunnel | tunnel | |||
| Figure 5: IPsec Tunnel Scenario | Figure 5: IPsec Tunnel Scenario | |||
| IPsec encryption is typically used to prevent 'M' seeing messages | IPsec encryption is typically used to prevent 'M' seeing messages | |||
| from 'A' to 'B'. IPsec authentication is used to prevent 'M' | from 'A' to 'B'. IPsec authentication is used to prevent 'M' | |||
| masquerading as the sender of messages from 'A' to 'B' or altering | masquerading as the sender of messages from 'A' to 'B' or altering | |||
| their contents. But 'I' can also use IPsec tunnel mode to allow 'A' | their contents. In addition 'I' can use IPsec tunnel mode to allow | |||
| to communicate with 'B', but impose encryption to prevent 'A' leaking | 'A' to communicate with 'B', but impose encryption to prevent 'A' | |||
| information to 'M'. Or 'E' can insist that 'I' uses tunnel mode | leaking information to 'M'. Or 'E' can insist that 'I' uses tunnel | |||
| authentication to prevent 'M' communicating information to 'B'. | mode authentication to prevent 'M' communicating information to 'B'. | |||
| Mutable IP header fields such as the ECN field (as well as the TTL/ | Mutable IP header fields such as the ECN field (as well as the TTL/ | |||
| Hop Limit and DS fields) cannot be included in the cryptographic | Hop Limit and DS fields) cannot be included in the cryptographic | |||
| calculations of IPsec. Therefore, if 'I' copies these mutable fields | calculations of IPsec. Therefore, if 'I' copies these mutable fields | |||
| into the outer header that is exposed across the tunnel it will have | into the outer header that is exposed across the tunnel it will have | |||
| allowed a covert channel from 'A' to 'M' that bypasses its encryption | allowed a covert channel from 'A' to 'M' that bypasses its encryption | |||
| of the inner header. And if 'E' copies these fields from the outer | of the inner header. And if 'E' copies these fields from the outer | |||
| header to the inner, even if it validates authentication from 'I', it | header to the inner, even if it validates authentication from 'I', it | |||
| will have allowed a covert channel from 'M' to 'B'. | will have allowed a covert channel from 'M' to 'B'. | |||
| ECN at the IP layer is designed to carry information about congestion | ECN at the IP layer is designed to carry information about congestion | |||
| from a congested resource towards downstream nodes. Typically a | from a congested resource towards downstream nodes. Typically a | |||
| downstream transport might feed the information back somehow to the | downstream transport might feed the information back somehow to the | |||
| point upstream of the congestion that can regulate the load on the | point upstream of the congestion that can regulate the load on the | |||
| congested resource, but other actions are possible (see [RFC3168] | congested resource, but other actions are possible (see [RFC3168] | |||
| S.6). In terms of the above unicast scenario, ECN effectively | S.6). In terms of the above unicast scenario, ECN effectively | |||
| intends to create an information channel (for congestion signalling) | intends to create an information channel (for congestion signalling) | |||
| from 'M' to 'B' (for 'B' to feed back to 'A'). Therefore the goals | from 'M' to 'B' (for 'B' to feed back to 'A'). Therefore the goals | |||
| of IPsec and ECN are mutually incompatible. | of IPsec and ECN are mutually incompatible, requiring some | |||
| compromise. | ||||
| With respect to the DS or ECN fields, S.5.1.2 of RFC4301 says, | With respect to the DS or ECN fields, S.5.1.2 of RFC4301 says, | |||
| "controls are provided to manage the bandwidth of this [covert] | "controls are provided to manage the bandwidth of this [covert] | |||
| channel". Using the ECN processing rules of RFC4301, the channel | channel". Using the ECN processing rules of RFC4301, the channel | |||
| bandwidth is two bits per datagram from 'A' to 'M' and one bit per | bandwidth is two bits per datagram from 'A' to 'M' and one bit per | |||
| datagram from 'M' to 'A' (because 'E' limits the combinations of the | datagram from 'M' to 'A' (because 'E' limits the combinations of the | |||
| 2-bit ECN field that it will copy). In both cases the covert channel | 2-bit ECN field that it will copy). In both cases the covert channel | |||
| bandwidth is further reduced by noise from any real congestion | bandwidth is further reduced by noise from any real congestion | |||
| marking. RFC4301 implies that these covert channels are sufficiently | marking. RFC4301 implies that these covert channels are sufficiently | |||
| limited to be considered a manageable threat. However, with respect | limited to be considered a manageable threat. However, with respect | |||
| skipping to change at page 35, line 20 | skipping to change at page 36, line 30 | |||
| 'B', otherwise congestion notification from resources like 'M' cannot | 'B', otherwise congestion notification from resources like 'M' cannot | |||
| be fed back to the Load Regulator ('A'). But it does not seem | be fed back to the Load Regulator ('A'). But it does not seem | |||
| necessary for 'I' to copy CE markings from the inner to the outer | necessary for 'I' to copy CE markings from the inner to the outer | |||
| header. For instance, if resource 'R' is congested, it can send | header. For instance, if resource 'R' is congested, it can send | |||
| congestion information to 'B' using the congestion field in the inner | congestion information to 'B' using the congestion field in the inner | |||
| header without 'I' copying the congestion field into the outer header | header without 'I' copying the congestion field into the outer header | |||
| and 'E' copying it back to the inner header. 'E' can still write any | and 'E' copying it back to the inner header. 'E' can still write any | |||
| additional congestion marking introduced across the tunnel into the | additional congestion marking introduced across the tunnel into the | |||
| congestion field of the inner header. | congestion field of the inner header. | |||
| It might be useful for the tunnel egress to be able to tell whether | ||||
| congestion occurred across a tunnel or upstream of it. If outer | ||||
| header congestion marking was reset by the tunnel ingress ('I'), at | ||||
| the end of a tunnel ('E') the outer headers would indicate congestion | ||||
| experienced across the tunnel ('I' to 'E'), while the inner header | ||||
| would indicate congestion upstream of 'I'. But similar information | ||||
| can be gleaned even if the tunnel ingress copies the inner to the | ||||
| outer headers. At the end of the tunnel ('E'), any packet with an | ||||
| _extra_ mark in the outer header relative to the inner header | ||||
| indicates congestion across the tunnel ('I' to 'E'), while the inner | ||||
| header would still indicate congestion upstream of 'I'. Appendix C | ||||
| gives a simple and precise method for a tunnel egress to infer the | ||||
| congestion level introduced across a tunnel. | ||||
| All this shows that 'E' can preserve the control loop irrespective of | All this shows that 'E' can preserve the control loop irrespective of | |||
| whether 'I' copies congestion notification into the outer header or | whether 'I' copies congestion notification into the outer header or | |||
| resets it. | resets it. | |||
| That is the situation for existing control arrangements but, because | That is the situation for existing control arrangements but, because | |||
| copying reveals more information, it would open up possibilities for | copying reveals more information, it would open up possibilities for | |||
| better control system designs. For instance, Appendix E describes | better control system designs. For instance, Appendix E describes | |||
| how resetting CE marking on encapsulation breaks a proposed | how resetting CE marking on encapsulation breaks a proposed | |||
| congestion marking scheme on the standards track. It ends up | congestion marking scheme on the standards track. It ends up | |||
| removing excessive amounts of traffic unnecessarily. Whereas copying | removing excessive amounts of traffic unnecessarily. Whereas copying | |||
| skipping to change at page 36, line 18 | skipping to change at page 37, line 16 | |||
| In this document we define the baseline of congestion marking (or the | In this document we define the baseline of congestion marking (or the | |||
| Congestion Baseline) as the source of the layer that created (or most | Congestion Baseline) as the source of the layer that created (or most | |||
| recently reset) the congestion notification field. When monitoring | recently reset) the congestion notification field. When monitoring | |||
| congestion it would be desirable if the Congestion Baseline did not | congestion it would be desirable if the Congestion Baseline did not | |||
| depend on whether packets were tunnelled or not. Given some tunnels | depend on whether packets were tunnelled or not. Given some tunnels | |||
| cross domain borders (e.g. suppose M in Figure 6 is monitoring a | cross domain borders (e.g. suppose M in Figure 6 is monitoring a | |||
| border), it would therefore be desirable for 'I' to copy congestion | border), it would therefore be desirable for 'I' to copy congestion | |||
| accumulated so far into the outer headers, so that it is exposed | accumulated so far into the outer headers, so that it is exposed | |||
| across the tunnel. | across the tunnel. | |||
| For management purposes it might be useful for the tunnel egress to | ||||
| be able to monitor whether congestion occurred across a tunnel or | ||||
| upstream of it. Superficially it appears that copying congestion | ||||
| markings at the ingress would make this difficult, whereas it was | ||||
| straightforward when an RFC3168 ingress reset them. However, | ||||
| Appendix C gives a simple and precise method for a tunnel egress to | ||||
| infer the congestion level introduced across a tunnel. It works | ||||
| irrespective of whether the ingress copies or resets congestion | ||||
| markings. | ||||
| Appendix C. Contribution to Congestion across a Tunnel | Appendix C. Contribution to Congestion across a Tunnel | |||
| This specification mandates that a tunnel ingress determines the ECN | This specification mandates that a tunnel ingress determines the ECN | |||
| field of each new outer tunnel header by copying the arriving header. | field of each new outer tunnel header by copying the arriving header. | |||
| Concern has been expressed that this will make it difficult for the | Concern has been expressed that this will make it difficult for the | |||
| tunnel egress to monitor congestion introduced only along a tunnel, | tunnel egress to monitor congestion introduced only along a tunnel, | |||
| which is easy if the outer ECN field is reset at a tunnel ingress | which is easy if the outer ECN field is reset at a tunnel ingress | |||
| (RFC3168 full functionality mode). However, in fact copying CE marks | (RFC3168 full functionality mode). However, in fact copying CE marks | |||
| at ingress will still make it easy for the egress to measure | at ingress will still make it easy for the egress to measure | |||
| congestion introduced across a tunnel, as illustrated below. | congestion introduced across a tunnel, as illustrated below. | |||
| Consider 100 packets measured at the egress. Say it measures that 30 | Consider 100 packets measured at the egress. Say it measures that 30 | |||
| are CE marked in the inner and outer headers and 12 have additional | are CE marked in the inner and outer headers and 12 have additional | |||
| CE marks in the outer but not the inner. This means packets arriving | CE marks in the outer but not the inner. This means packets arriving | |||
| at the ingress had already experienced 30% congestion. However, it | at the ingress had already experienced 30% congestion. However, it | |||
| does not mean there was 12% congestion across the tunnel. The | does not mean there was 12% congestion across the tunnel. The | |||
| correct calculation of congestion across the tunnel is p_t = 12/ | correct calculation of congestion across the tunnel is p_t = 12/ | |||
| (100-30) = 12/70 = 17%. This is easy for the egress to measure. It | (100-30) = 12/70 = 17%. This is easy for the egress to measure. It | |||
| is simply the packets with additional CE marking in the outer header | is simply the proportion of packets not marked in the inner header | |||
| (12) as a proportion of packets not marked in the inner header (70). | (70) that have a CE marking in the outer header (12). This technique | |||
| works whether the ingress copies or resets CE markings, so it can be | ||||
| used by an egress that is not sure which RFC the ingress complies | ||||
| with. | ||||
| Figure 7 illustrates this in a combinatorial probability diagram. | Figure 7 illustrates this in a combinatorial probability diagram. | |||
| The square represents 100 packets. The 30% division along the bottom | The square represents 100 packets. The 30% division along the bottom | |||
| represents marking before the ingress, and the p_t division up the | represents marking before the ingress, and the p_t division up the | |||
| side represents marking introduced across the tunnel. | side represents marking introduced across the tunnel. | |||
| ^ outer header marking | ^ outer header marking | |||
| | | | | |||
| 100% +-----+---------+ The large square | 100% +-----+---------+ The large square | |||
| | | | represents 100 packets | | | | represents 100 packets | |||
| skipping to change at page 37, line 22 | skipping to change at page 38, line 24 | |||
| | | 12 | = 17% | | | 12 | = 17% | |||
| 0 +-----+---------+---> | 0 +-----+---------+---> | |||
| 0 30% 100% inner header marking | 0 30% 100% inner header marking | |||
| Figure 7: Tunnel Marking of Packets Already Marked at Ingress | Figure 7: Tunnel Marking of Packets Already Marked at Ingress | |||
| Appendix D. Why Losing ECT(1) on Decapsulation Impedes PCN | Appendix D. Why Losing ECT(1) on Decapsulation Impedes PCN | |||
| Congestion notification with two severity levels is currently on the | Congestion notification with two severity levels is currently on the | |||
| IETF's standards track agenda in the Congestion and Pre-Congestion | IETF's standards track agenda in the Congestion and Pre-Congestion | |||
| Notification (PCN) working group. The PCN working group requires | Notification (PCN) working group. PCN needs all four possible states | |||
| four congestion states (not PCN-enabled, not marked and two | of congestion signalling in the 2-bit ECN field to be propagated at | |||
| increasingly severe levels of congestion marking--see [RFC5559]). | the egress, but pre-existing tunnels only propagate three. The four | |||
| The aim is for the less severe level of marking to stop admitting new | PCN states are: not PCN-enabled, not marked and two increasingly | |||
| traffic and the more severe level to terminate sufficient existing | severe levels of congestion marking. The less severe marking means | |||
| flows to bring a network back to its operating point after a link | 'stop admitting new traffic' and the more severe marking means | |||
| failure. | 'terminate some existing flows', which may be needed after reroutes | |||
| (see [RFC5559] for more details). (Note on terminology: wherever | ||||
| this document counts four congestion states, the PCN working group | ||||
| would count this as three PCN states plus a not-PCN-enabled state.) | ||||
| (Note on terminology: wherever this document counts four congestion | Figure 2 (Section 3.2) shows that pre-existing decapsulation | |||
| states, the PCN working group would count this as three PCN states | behaviour would have discarded any ECT(1) markings in outer headers | |||
| plus a not-PCN-enabled state.) | if the inner was ECT(0). This prevented the PCN working group from | |||
| using ECT(1) -- if a PCN node used ECT(1) to indicate one of the | ||||
| severity levels of congestion, any later tunnel egress would revert | ||||
| the marking to ECT(0) as if nothing had happened. Effectively the | ||||
| decapsulation rules of RFC4301 and RFC3168 waste one ECT codepoint; | ||||
| they treat the ECT(0) and ECT(1) codepoints as a single codepoint. | ||||
| Although the ECN field gives sufficient codepoints for four states, | A number of work-rounds to this problem were proposed in the PCN w-g; | |||
| pre-existing ECN tunnelling RFCs prevented the PCN working group from | to add the fourth state another way or avoid needing it. Without | |||
| using four ECN states in case any tunnel decapsulations occur within | wishing to disparage the ingenuity of these work-rounds, none were | |||
| a PCN region. If a node in a tunnel changes the ECN field to ECT(0) | chosen for the standards track because they were either somewhat | |||
| or ECT(1), this change would be discarded by a tunnel egress | wasteful, imprecise or complicated: | |||
| compliant with RFC4301 or RFC3168. This can be seen in Figure 2 | ||||
| (Section 3.2), where ECT values in the outer header are ignored | ||||
| unless the inner header is the same. Effectively the decapsulation | ||||
| rules of RFC4301 and RFC3168 waste one ECT codepoint; they treat the | ||||
| ECT(0) and ECT(1) codepoints as a single codepoint. | ||||
| As a consequence, the PCN w-g initially took the approach of a | o One uses a pair of Diffserv codepoint(s) in place of each PCN DSCP | |||
| standards track baseline encoding for three states | to encode the extra state [I-D.ietf-pcn-3-state-encoding], using | |||
| [I-D.ietf-pcn-baseline-encoding] and a number of experimental | up the rapidly exhausting DSCP space while leaving an ECN | |||
| alternatives to add or avoid the fourth state. Without wishing to | codepoint unused. | |||
| disparage the ingenuity of these work-rounds, none were chosen for | ||||
| the standards track because they were either somewhat wasteful, | ||||
| imprecise or complicated. One uses a pair of Diffserv codepoint(s) | ||||
| in place of each PCN DSCP to encode the extra state | ||||
| [I-D.ietf-pcn-3-state-encoding], using up the rapidly exhausting DSCP | o Another survives tunnelling without an extra DSCP | |||
| space while leaving an ECN codepoint unused. Another PCN encoding | [I-D.ietf-pcn-psdm-encoding], but it requires the PCN edge | |||
| has been proposed that would survive tunnelling without an extra DSCP | gateways to share the initial state of a packet out of band. | |||
| [I-D.ietf-pcn-psdm-encoding], but it requires the PCN edge gateways | ||||
| to share state out of band so the egress edge can know which marking | o Another proposes a more involved marking algorithm in forwarding | |||
| a packet started with at the ingress edge. Yet another work-round to | elements to encode the three congestion notification states using | |||
| the ECN tunnelling problem proposes a more involved marking algorithm | only two ECN codepoints [I-D.satoh-pcn-st-marking]. | |||
| in forwarding elements to encode the three congestion notification | ||||
| states using only two ECN codepoints [I-D.satoh-pcn-st-marking]. One | o Another takes a different approach; it compromises the precision | |||
| work-round takes a different approach; it compromises the precision | of the admission control mechanism in some network scenarios, but | |||
| of the admission control mechanism in some network scenarios, but | manages to work with just three encoding states and a single | |||
| manages to work with just three encoding states and a single marking | marking algorithm [I-D.ietf-pcn-sm-edge-behaviour]. | |||
| algorithm [I-D.ietf-pcn-sm-edge-behaviour]. | ||||
| Rather than require the IETF to bless any of these experimental | Rather than require the IETF to bless any of these experimental | |||
| encoding work-rounds, the present specification fixes the root cause | encoding work-rounds, the present specification fixes the root cause | |||
| of the problem so that operators deploying PCN can simply require | of the problem so that operators deploying PCN can simply require | |||
| that tunnel end-points within a PCN region should comply with this | that tunnel end-points within a PCN region should comply with this | |||
| new ECN tunnelling specification. Universal compliance is feasible | new ECN tunnelling specification. On the public Internet it would | |||
| for PCN, because it is intended to be deployed in a controlled | not be possible to know whether all tunnels complied with this new | |||
| Diffserv region. Assuming tunnels within a PCN region will be | specification, but universal compliance is feasible for PCN, because | |||
| required to comply with the present specification, the PCN w-g is | it is intended to be deployed in a controlled Diffserv region. | |||
| progressing a trivially simple four-state ECN encoding | ||||
| [I-D.ietf-pcn-3-in-1-encoding]. | Given the present specification, the PCN w-g could progress a | |||
| trivially simple four-state ECN encoding | ||||
| [I-D.ietf-pcn-3-in-1-encoding]. This would replace the interim | ||||
| standards track baseline encoding of just three states [RFC5696] | ||||
| which makes a fourth state available for any of the experimental | ||||
| alternatives. | ||||
| Appendix E. Why Resetting ECN on Encapsulation Impedes PCN | Appendix E. Why Resetting ECN on Encapsulation Impedes PCN | |||
| The PCN architecture says "...if encapsulation is done within the | The PCN architecture says "...if encapsulation is done within the | |||
| PCN-domain: Any PCN-marking is copied into the outer header. Note: A | PCN-domain: Any PCN-marking is copied into the outer header. Note: A | |||
| tunnel will not provide this behaviour if it complies with [RFC3168] | tunnel will not provide this behaviour if it complies with [RFC3168] | |||
| tunnelling in either mode, but it will if it complies with [RFC4301] | tunnelling in either mode, but it will if it complies with [RFC4301] | |||
| IPsec tunnelling." | IPsec tunnelling." | |||
| The specific issue here concerns PCN excess rate marking | The specific issue here concerns PCN excess rate marking [RFC5670]. | |||
| [I-D.ietf-pcn-marking-behaviour]. The purpose of excess rate marking | The purpose of excess rate marking is to provide a bulk mechanism for | |||
| is to provide a bulk mechanism for interior nodes within a PCN domain | interior nodes within a PCN domain to mark traffic that is exceeding | |||
| to mark traffic that is exceeding a configured threshold bit-rate, | a configured threshold bit-rate, perhaps after an unexpected event | |||
| perhaps after an unexpected event such as a reroute, a link or node | such as a reroute, a link or node failure, or a more widespread | |||
| failure, or a more widespread disaster. PCN is intended for | disaster. Reroutes are a common cause of QoS degradation in IP | |||
| inelastic flows, so just removing marked packets would degrade every | networks. After reroutes it is common for multiple links in a | |||
| flow to the point of uselessness. Instead, the edge nodes around a | network to become stressed at once. Therefore, PCN excess rate | |||
| PCN domain terminate an equivalent amount of traffic, but at flow | marking has been carefully designed to ensure traffic marked at one | |||
| granularity. As well as protecting the surviving inelastic flows, | queue will not be counted again for marking at subsequent queues (see | |||
| this also protects the share of capacity set aside for elastic | the `Excess traffic meter function' of [RFC5670]). | |||
| traffic. But users are very sensitive to their flows being | ||||
| terminated while in progress, therefore no more flows should be | ||||
| terminated than absolutely necessary. | ||||
| Re-routes are a common cause of QoS degradation in IP networks. | ||||
| After re-routes it is common for multiple links in a network to | ||||
| become stressed at once. Therefore, PCN excess rate marking has been | ||||
| carefully designed to ensure traffic marked at one queue will not be | ||||
| counted again for marking at subsequent queues (see the `Excess | ||||
| traffic meter function' of [I-D.ietf-pcn-marking-behaviour]). | ||||
| However, if an RFC3168 tunnel ingress intervenes, it resets the ECN | However, if an RFC3168 tunnel ingress intervenes, it resets the ECN | |||
| field in all the outer headers. This will cause excess traffic to be | field in all the outer headers. This will cause excess traffic to be | |||
| counted more than once, leading to many flows being removed that did | counted more than once, leading to many flows being removed that did | |||
| not need to be removed at all. This is why an RFC3168 tunnel | not need to be removed at all. This is why an RFC3168 tunnel | |||
| ingress cannot be used in a PCN domain. | ingress cannot be used in a PCN domain. | |||
| The original reason an RFC3168 encapsulator reset the ECN field was | ||||
| to block a covert channel (Appendix B.1), with the overriding aim of | ||||
| consistent behaviour between IPsec and non-IPsec tunnels. But later | ||||
| RFC4301 IPsec encapsulation placed simplicity above the need to block | ||||
| the covert channel, simply copying the ECN field. | ||||
| The ECN reset in RFC3168 is no longer deemed necessary, it is | The ECN reset in RFC3168 is no longer deemed necessary, it is | |||
| inconsistent with RFC4301, it is not as simple as RFC4301 and it is | inconsistent with RFC4301, it is not as simple as RFC4301 and it is | |||
| impeding deployment of new protocols like PCN. The present | impeding deployment of new protocols like PCN. The present | |||
| specification corrects this perverse situation. | specification corrects this perverse situation. | |||
| Appendix F. Compromise on Decap with ECT(1) Inner and ECT(0) Outer | Appendix F. Compromise on Decap with ECT(1) Inner and ECT(0) Outer | |||
| A packet with an ECT(1) inner and an ECT(0) outer should never arise | A packet with an ECT(1) inner and an ECT(0) outer should never arise | |||
| from any known IETF protocol. Without giving a reason, RFC3168 and | from any known IETF protocol. Without giving a reason, RFC3168 and | |||
| RFC4301 both say the outer should be ignored when decapsulating such | RFC4301 both say the outer should be ignored when decapsulating such | |||
| skipping to change at page 40, line 24 | skipping to change at page 41, line 14 | |||
| so that the data source could use the ECN nonce [RFC3540] to detect | so that the data source could use the ECN nonce [RFC3540] to detect | |||
| if congestion signals were being erased. However, in this case, the | if congestion signals were being erased. However, in this case, the | |||
| decapsulator does not need a nonce to detect any anomalies introduced | decapsulator does not need a nonce to detect any anomalies introduced | |||
| within the tunnel, because it has the inner as a record of the header | within the tunnel, because it has the inner as a record of the header | |||
| at the ingress. Therefore, it was decided that the best compromise | at the ingress. Therefore, it was decided that the best compromise | |||
| would be to give precedence to solving the safety issue over | would be to give precedence to solving the safety issue over | |||
| revealing the anomaly, because the anomaly could at least be detected | revealing the anomaly, because the anomaly could at least be detected | |||
| and dealt with internally. | and dealt with internally. | |||
| Superficially, the opposite case where the inner and outer carry | Superficially, the opposite case where the inner and outer carry | |||
| different ECT values, but with an ECT(1) outer and ECT(0) inner seems | different ECT values, but with an ECT(1) outer and ECT(0) inner, | |||
| to require a similar compromise. However, because that case is | seems to require a similar compromise. However, because that case is | |||
| reversed, no compromise is necessary; it is best to forward the outer | reversed, no compromise is necessary; it is best to forward the outer | |||
| whether the transport expects the ECT(1) to mean a higher severity | whether the transport expects the ECT(1) to mean a higher severity | |||
| than ECT(0) or the same severity. Forwarding the outer either | than ECT(0) or the same severity. Forwarding the outer either | |||
| preserves a higher value (if it is higher) or it reveals an anomaly | preserves a higher value (if it is higher) or it reveals an anomaly | |||
| to the transport (if the two ECT codepoints mean the same severity). | to the transport (if the two ECT codepoints mean the same severity). | |||
| Appendix G. Open Issues | Appendix G. Open Issues | |||
| The new decapsulation behaviour defined in Section 4.2 adds support | The new decapsulation behaviour defined in Section 4.2 adds support | |||
| for propagation of 2 severity levels of congestion. However | for propagation of 2 severity levels of congestion. However | |||
| End of changes. 74 change blocks. | ||||
| 317 lines changed or deleted | 373 lines changed or added | |||
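
The Appendix C calculation of congestion introduced across a tunnel can be sketched in a few lines of Python. This is an illustration only and is not part of either draft: the function name `tunnel_congestion` and its parameters are hypothetical, and it assumes the egress has already counted packets by their inner and outer ECN markings.

```python
def tunnel_congestion(total, ce_inner, ce_outer_only):
    """Estimate p_t, the fraction of packets CE-marked across the tunnel.

    total:         packets observed at the egress
    ce_inner:      packets whose inner header is CE (marked before the ingress)
    ce_outer_only: packets whose outer header is CE but whose inner is not
    (All three names are illustrative assumptions, not terms from the draft.)
    """
    # Only packets that arrived at the ingress unmarked can reveal
    # marking that was added inside the tunnel.
    unmarked_at_ingress = total - ce_inner
    if unmarked_at_ingress <= 0:
        raise ValueError("no packets arrived unmarked; p_t cannot be estimated")
    return ce_outer_only / unmarked_at_ingress


# The worked example from Appendix C: 100 packets, 30 CE in the inner
# header, 12 more CE in the outer header only.
p_t = tunnel_congestion(total=100, ce_inner=30, ce_outer_only=12)
print(f"{p_t:.0%}")  # 12/70, about 17%
```

Because the denominator counts only packets whose inner header is unmarked, the same function applies whether the ingress copied or reset the outer ECN field, which matches the claim in the -06 text.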