| draft-briscoe-tsvwg-re-ecn-tcp-06.txt | | draft-briscoe-tsvwg-re-ecn-tcp-07.txt | |
| | | | |
| Transport Area Working Group B. Briscoe | | Transport Area Working Group B. Briscoe | |
| Internet-Draft BT & UCL | | Internet-Draft BT & UCL | |
| Intended status: Standards Track A. Jacquet | | Intended status: Standards Track A. Jacquet | |
|
| Expires: January 15, 2009 T. Moncaster | | Expires: September 4, 2009 T. Moncaster | |
| A. Smith | | A. Smith | |
| BT | | BT | |
|
| July 14, 2008 | | March 3, 2009 | |
| | | | |
| Re-ECN: Adding Accountability for Causing Congestion to TCP/IP | | Re-ECN: Adding Accountability for Causing Congestion to TCP/IP | |
|
| draft-briscoe-tsvwg-re-ecn-tcp-06 | | draft-briscoe-tsvwg-re-ecn-tcp-07 | |
| | | | |
|
| Status of this Memo | | Status of This Memo | |
| | | | |
| By submitting this Internet-Draft, each author represents that any | | By submitting this Internet-Draft, each author represents that any | |
| applicable patent or other IPR claims of which he or she is aware | | applicable patent or other IPR claims of which he or she is aware | |
| have been or will be disclosed, and any of which he or she becomes | | have been or will be disclosed, and any of which he or she becomes | |
| aware will be disclosed, in accordance with Section 6 of BCP 79. | | aware will be disclosed, in accordance with Section 6 of BCP 79. | |
| | | | |
| Internet-Drafts are working documents of the Internet Engineering | | Internet-Drafts are working documents of the Internet Engineering | |
| Task Force (IETF), its areas, and its working groups. Note that | | Task Force (IETF), its areas, and its working groups. Note that | |
| other groups may also distribute working documents as Internet- | | other groups may also distribute working documents as Internet- | |
| Drafts. | | Drafts. | |
| | | | |
| skipping to change at page 1, line 37 | | skipping to change at page 1, line 37 | |
| and may be updated, replaced, or obsoleted by other documents at any | | and may be updated, replaced, or obsoleted by other documents at any | |
| time. It is inappropriate to use Internet-Drafts as reference | | time. It is inappropriate to use Internet-Drafts as reference | |
| material or to cite them other than as "work in progress." | | material or to cite them other than as "work in progress." | |
| | | | |
| The list of current Internet-Drafts can be accessed at | | The list of current Internet-Drafts can be accessed at | |
| http://www.ietf.org/ietf/1id-abstracts.txt. | | http://www.ietf.org/ietf/1id-abstracts.txt. | |
| | | | |
| The list of Internet-Draft Shadow Directories can be accessed at | | The list of Internet-Draft Shadow Directories can be accessed at | |
| http://www.ietf.org/shadow.html. | | http://www.ietf.org/shadow.html. | |
| | | | |
|
| This Internet-Draft will expire on January 15, 2009. | | This Internet-Draft will expire on September 4, 2009. | |
| | | | |
| Copyright Notice | | Copyright Notice | |
| | | | |
|
| Copyright (C) The IETF Trust (2008). | | Copyright (c) 2009 IETF Trust and the persons identified as the | |
| | | document authors. All rights reserved. | |
| | | | |
| | | This document is subject to BCP 78 and the IETF Trust's Legal | |
| | | Provisions Relating to IETF Documents in effect on the date of | |
| | | publication of this document (http://trustee.ietf.org/license-info). | |
| | | Please review these documents carefully, as they describe your rights | |
| | | and restrictions with respect to this document. | |
| | | | |
| Abstract | | Abstract | |
| | | | |
| This document introduces a new protocol for explicit congestion | | This document introduces a new protocol for explicit congestion | |
| notification (ECN), termed re-ECN, which can be deployed | | notification (ECN), termed re-ECN, which can be deployed | |
|
| incrementally around unmodified routers. It enables the upstream | | incrementally around unmodified routers. The protocol works by | |
| party at any trust boundary in the internetwork to be held | | arranging an extended ECN field in each packet so that, as it crosses | |
| responsible for the congestion they cause, or allow to be caused. | | any interface in an internetwork, it will carry a truthful prediction | |
| | | of congestion on the remainder of its path. The purpose of this | |
| So, networks can introduce straightforward accountability for | | document is to specify the re-ECN protocol at the IP layer and to | |
| congestion and policing mechanisms for incoming traffic from end- | | give guidelines on any consequent changes required to transport | |
| customers or from neighbouring network domains. The protocol works | | protocols. It includes the changes required to TCP both as an | |
| by arranging an extended ECN field in each packet so that, as it | | example and as a specification. It briefly gives examples of | |
| crosses any interface in an internetwork, it will carry a truthful | | | |
| prediction of congestion on the remainder of its path. The purpose | | | |
| of this document is to specify the re-ECN protocol at the IP layer | | | |
| and to give guidelines on any consequent changes required to | | | |
| transport protocols. It includes the changes required to TCP both as | | | |
| an example and as a specification. It also gives examples of | | | |
| mechanisms that can use the protocol to ensure data sources respond | | mechanisms that can use the protocol to ensure data sources respond | |
|
| correctly to congestion. And it describes example mechanisms that | | correctly to congestion, and these are described more fully in a | |
| ensure the dominant selfish strategy of both network domains and end- | | companion document [re-ecn-motive]. | |
| points will be to set the extended ECN field honestly. | | | |
| | | | |
| Authors' Statement: Status (to be removed by the RFC Editor) | | Authors' Statement: Status (to be removed by the RFC Editor) | |
| | | | |
| Although the re-ECN protocol is intended to make a simple but far- | | Although the re-ECN protocol is intended to make a simple but far- | |
| reaching change to the Internet architecture, the most immediate | | reaching change to the Internet architecture, the most immediate | |
| priority for the authors is to delay any move of the ECN nonce to | | priority for the authors is to delay any move of the ECN nonce to | |
| Proposed Standard status. The argument for this position is | | Proposed Standard status. The argument for this position is | |
|
| developed in Appendix I. | | developed in Appendix E. | |
| | | | |
| Changes from previous drafts (to be removed by the RFC Editor) | | Changes from previous drafts (to be removed by the RFC Editor) | |
| | | | |
| Full diffs created using the rfcdiff tool are available at | | Full diffs created using the rfcdiff tool are available at | |
| <http://www.cs.ucl.ac.uk/staff/B.Briscoe/pubs.html#retcp> | | <http://www.cs.ucl.ac.uk/staff/B.Briscoe/pubs.html#retcp> | |
| | | | |
|
| From -05 to -06 (current version): | | From -06 to -07 (current version): | |
| | | | |
| Clarifications made to Section 1 and Section 3. | | | |
| | | | |
| Minor editorial changes throughout. | | | |
| | | | |
| From -04 to -05: | | | |
| | | | |
| Completed justification for packet marking with FNE during slow- | | | |
| start (Appendix D). | | | |
| | | | |
| Minor editorial changes throughout. | | | |
| | | | |
| From -03 to -04: | | | |
| | | | |
| Clarified reasons for holding back ECN nonce (Section 3.3 & | | | |
| Appendix I). | | | |
| | | | |
| Clarified Figure 2. | | | |
| | | | |
| Added Section 4.1.1.1 on equivalence of drops and ECN marks. | | | |
| | | | |
| Improved precision of Section 5.6 on IP in IP tunnels. | | | |
| | | | |
| Explained that RTT fairness is possible to enforce, but unlikely to | | | |
| be required (Section 6.1.3 & Appendix F). | | | |
| | | | |
| Explained that bulk per-user policing should be adequate but per- | | | |
| flow policing is also possible if desired, though it is not likely | | | |
| to be necessary (Section 6.1.5 & Appendix G). | | | |
| | | | |
| Reinforced need for passive policing at inter-domain borders to | | | |
| enable all-optical networking (Section 6.1.6). | | | |
| | | | |
| Minor editorial changes throughout. | | | |
| | | | |
|
| From -02 to -03: | | Major changes made following splitting this protocol document from | |
| | | the related motivations document [re-ecn-motive]. | |
| | | | |
|
| Started guidelines for re-ECN support in DCCP and SCTP. | | Significant re-ordering of remaining text. | |
| | | | |
|
| Added annex on limitations of nonce mechanism. | | New terminology introduced for clarity. | |
| | | | |
| Minor editorial changes throughout. | | Minor editorial changes throughout. | |
| | | | |
|
| From -01 to -02: | | | |
| | | | |
| Explanation on informal terminology in Section 3.5 clarified. | | | |
| | | | |
| IPv6 wire protocol encoding added (Section 5.2). | | | |
| | | | |
| Text on (non-)issues with tunnels, encryption and link layer | | | |
| congestion notification added (Section 5.6 & Section 5.7). | | | |
| | | | |
| Section added giving evolvability arguments against encouraging | | | |
| bottleneck policing (Section 6.1.2). And text on re-ECN's | | | |
| evolvability by design added to Section 6.1.3 | | | |
| | | | |
| Text on inter-domain policing (Section 6.1.6) and inter-domain | | | |
| fail-safes (Section 6.1.7) added. | | | |
| | | | |
| From -00 to -01: | | | |
| | | | |
| Encoding of re-ECN wire protocol changed for reasons given in | | | |
| Appendix B and consequently draft substantially re-written. | | | |
| | | | |
| Substantial text added in sections on applications, incremental | | | |
| deployment, architectural rationale and security considerations. | | | |
| | | | |
| Table of Contents | | Table of Contents | |
| | | | |
|
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 6 | | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 5 | |
| 2. Requirements notation . . . . . . . . . . . . . . . . . . . . 8 | | 2. Requirements notation . . . . . . . . . . . . . . . . . . . . 6 | |
| 3. Protocol Overview . . . . . . . . . . . . . . . . . . . . . . 8 | | 3. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 6 | |
| 3.1. Background and Applicability . . . . . . . . . . . . . . . 8 | | 4. Protocol Overview . . . . . . . . . . . . . . . . . . . . . . 7 | |
| 3.2. Simplified Re-ECN Protocol . . . . . . . . . . . . . . . . 10 | | 4.1. Simplified Re-ECN Protocol . . . . . . . . . . . . . . . . 7 | |
| 3.3. Re-ECN Abstracted Network Layer Wire Protocol (IPv4 or | | 4.1.1. Congestion Control and Policing the Protocol . . . . . 7 | |
| v6) . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 | | 4.1.2. Background and Applicability . . . . . . . . . . . . . 8 | |
| 3.4. Re-ECN Protocol Operation . . . . . . . . . . . . . . . . 12 | | 4.2. Re-ECN Abstracted Network Layer Wire Protocol (IPv4 or | |
| 3.5. Informal Terminology . . . . . . . . . . . . . . . . . . . 14 | | v6) . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 | |
| 4. Transport Layers . . . . . . . . . . . . . . . . . . . . . . . 15 | | 4.3. Re-ECN Protocol Operation . . . . . . . . . . . . . . . . 10 | |
| 4.1. TCP . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 | | 4.4. Positive and Negative Flows . . . . . . . . . . . . . . . 12 | |
| 4.1.1. RECN mode: Full Re-ECN capable transport . . . . . . . 17 | | 5. Network Layer . . . . . . . . . . . . . . . . . . . . . . . . 13 | |
| 4.1.2. RECN-Co mode: Re-ECT Sender with a RFC3168 | | 5.1. Re-ECN IPv4 Wire Protocol . . . . . . . . . . . . . . . . 13 | |
| compliant ECN Receiver . . . . . . . . . . . . . . . . 20 | | 5.2. Re-ECN IPv6 Wire Protocol . . . . . . . . . . . . . . . . 15 | |
| 4.1.3. Capability Negotiation . . . . . . . . . . . . . . . . 21 | | 5.3. Router Forwarding Behaviour . . . . . . . . . . . . . . . 16 | |
| 4.1.4. Extended ECN (EECN) Field Settings during Flow | | 5.4. Justification for Setting the First SYN to FNE . . . . . . 17 | |
| Start or after Idle Periods . . . . . . . . . . . . . 23 | | 5.5. Control and Management . . . . . . . . . . . . . . . . . . 18 | |
| 4.1.5. Pure ACKS, Retransmissions, Window Probes and | | 5.5.1. Negative Balance Warning . . . . . . . . . . . . . . . 18 | |
| Partial ACKs . . . . . . . . . . . . . . . . . . . . . 27 | | 5.5.2. Rate Response Control . . . . . . . . . . . . . . . . 19 | |
| 4.2. Other Transports . . . . . . . . . . . . . . . . . . . . . 27 | | 5.6. IP in IP Tunnels . . . . . . . . . . . . . . . . . . . . . 19 | |
| 4.2.1. General Guidelines for Adding Re-ECN to Other | | 5.7. Non-Issues . . . . . . . . . . . . . . . . . . . . . . . . 20 | |
| Transports . . . . . . . . . . . . . . . . . . . . . . 27 | | 6. Transport Layers . . . . . . . . . . . . . . . . . . . . . . . 21 | |
| 4.2.2. Guidelines for adding Re-ECN to RSVP or NSIS . . . . . 28 | | 6.1. TCP . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 | |
| 4.2.3. Guidelines for adding Re-ECN to DCCP . . . . . . . . . 28 | | 6.1.1. RECN mode: Full Re-ECN capable transport . . . . . . . 22 | |
| 4.2.4. Guidelines for adding Re-ECN to SCTP . . . . . . . . . 29 | | 6.1.2. RECN-Co mode: Re-ECT Sender with a RFC3168 | |
| 5. Network Layer . . . . . . . . . . . . . . . . . . . . . . . . 29 | | compliant ECN Receiver . . . . . . . . . . . . . . . . 24 | |
| 5.1. Re-ECN IPv4 Wire Protocol . . . . . . . . . . . . . . . . 29 | | 6.1.3. Capability Negotiation . . . . . . . . . . . . . . . . 26 | |
| 5.2. Re-ECN IPv6 Wire Protocol . . . . . . . . . . . . . . . . 30 | | 6.1.4. Extended ECN (EECN) Field Settings during Flow | |
| 5.3. Router Forwarding Behaviour . . . . . . . . . . . . . . . 31 | | Start or after Idle Periods . . . . . . . . . . . . . 27 | |
| 5.4. Justification for Setting the First SYN to FNE . . . . . . 33 | | 6.1.5. Pure ACKS, Retransmissions, Window Probes and | |
| 5.5. Control and Management . . . . . . . . . . . . . . . . . . 34 | | Partial ACKs . . . . . . . . . . . . . . . . . . . . . 31 | |
| 5.5.1. Negative Balance Warning . . . . . . . . . . . . . . . 34 | | 6.2. Other Transports . . . . . . . . . . . . . . . . . . . . . 32 | |
| 5.5.2. Rate Response Control . . . . . . . . . . . . . . . . 35 | | 6.2.1. General Guidelines for Adding Re-ECN to Other | |
| 5.6. IP in IP Tunnels . . . . . . . . . . . . . . . . . . . . . 35 | | Transports . . . . . . . . . . . . . . . . . . . . . . 32 | |
| 5.7. Non-Issues . . . . . . . . . . . . . . . . . . . . . . . . 36 | | 6.2.2. Guidelines for adding Re-ECN to RSVP or NSIS . . . . . 32 | |
| 6. Applications . . . . . . . . . . . . . . . . . . . . . . . . . 37 | | 6.2.3. Guidelines for adding Re-ECN to DCCP . . . . . . . . . 33 | |
| 6.1. Policing Congestion Response . . . . . . . . . . . . . . . 37 | | 6.2.4. Guidelines for adding Re-ECN to SCTP . . . . . . . . . 33 | |
| 6.1.1. The Policing Problem . . . . . . . . . . . . . . . . . 37 | | 7. Incremental Deployment . . . . . . . . . . . . . . . . . . . . 33 | |
| 6.1.2. The Case Against Bottleneck Policing . . . . . . . . . 38 | | 8. Related Work . . . . . . . . . . . . . . . . . . . . . . . . . 34 | |
| 6.1.3. Re-ECN Incentive Framework . . . . . . . . . . . . . . 39 | | 8.1. Congestion Notification Integrity . . . . . . . . . . . . 34 | |
| 6.1.4. Egress Dropper . . . . . . . . . . . . . . . . . . . . 46 | | 9. Security Considerations . . . . . . . . . . . . . . . . . . . 35 | |
| 6.1.5. Policing . . . . . . . . . . . . . . . . . . . . . . . 47 | | 10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 37 | |
| 6.1.6. Inter-domain Policing . . . . . . . . . . . . . . . . 49 | | 11. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 37 | |
| 6.1.7. Inter-domain Fail-safes . . . . . . . . . . . . . . . 52 | | 12. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 37 | |
| 6.1.8. Simulations . . . . . . . . . . . . . . . . . . . . . 53 | | 13. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 38 | |
| 6.2. Other Applications . . . . . . . . . . . . . . . . . . . . 53 | | 14. References . . . . . . . . . . . . . . . . . . . . . . . . . . 38 | |
| 6.2.1. DDoS Mitigation . . . . . . . . . . . . . . . . . . . 53 | | 14.1. Normative References . . . . . . . . . . . . . . . . . . . 38 | |
| 6.2.2. End-to-end QoS . . . . . . . . . . . . . . . . . . . . 54 | | 14.2. Informative References . . . . . . . . . . . . . . . . . . 39 | |
| 6.2.3. Traffic Engineering . . . . . . . . . . . . . . . . . 55 | | Appendix A. Precise Re-ECN Protocol Operation . . . . . . . . . . 41 | |
| 6.2.4. Inter-Provider Service Monitoring . . . . . . . . . . 55 | | | |
| 6.3. Limitations . . . . . . . . . . . . . . . . . . . . . . . 55 | | | |
| 7. Incremental Deployment . . . . . . . . . . . . . . . . . . . . 56 | | | |
| 7.1. Incremental Deployment Features . . . . . . . . . . . . . 56 | | | |
| 7.2. Incremental Deployment Incentives . . . . . . . . . . . . 57 | | | |
| 8. Architectural Rationale . . . . . . . . . . . . . . . . . . . 62 | | | |
| 9. Related Work . . . . . . . . . . . . . . . . . . . . . . . . . 65 | | | |
| 9.1. Policing Rate Response to Congestion . . . . . . . . . . . 65 | | | |
| 9.2. Congestion Notification Integrity . . . . . . . . . . . . 66 | | | |
| 9.3. Identifying Upstream and Downstream Congestion . . . . . . 67 | | | |
| 10. Security Considerations . . . . . . . . . . . . . . . . . . . 67 | | | |
| 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 68 | | | |
| 12. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 69 | | | |
| 13. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 69 | | | |
| 14. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 69 | | | |
| 15. References . . . . . . . . . . . . . . . . . . . . . . . . . . 70 | | | |
| 15.1. Normative References . . . . . . . . . . . . . . . . . . . 70 | | | |
| 15.2. Informative References . . . . . . . . . . . . . . . . . . 70 | | | |
| Appendix A. Precise Re-ECN Protocol Operation . . . . . . . . . . 74 | | | |
| Appendix B. Justification for Two Codepoints Signifying Zero | | Appendix B. Justification for Two Codepoints Signifying Zero | |
|
| Worth Packets . . . . . . . . . . . . . . . . . . . . 75 | | Worth Packets . . . . . . . . . . . . . . . . . . . . 43 | |
| Appendix C. ECN Compatibility . . . . . . . . . . . . . . . . . . 76 | | Appendix C. ECN Compatibility . . . . . . . . . . . . . . . . . . 44 | |
| Appendix D. Packet Marking with FNE During Flow Start . . . . . . 78 | | Appendix D. Packet Marking with FNE During Flow Start . . . . . . 45 | |
| Appendix E. Example Egress Dropper Algorithm . . . . . . . . . . 80 | | Appendix E. Argument for holding back the ECN nonce . . . . . . . 47 | |
| Appendix F. Re-TTL . . . . . . . . . . . . . . . . . . . . . . . 80 | | Appendix F. Alternative Terminology Used in Other Documents . . . 49 | |
| Appendix G. Policer Designs to ensure Congestion | | | |
| Responsiveness . . . . . . . . . . . . . . . . . . . 80 | | | |
| G.1. Per-user Policing . . . . . . . . . . . . . . . . . . . . 80 | | | |
| G.2. Per-flow Rate Policing . . . . . . . . . . . . . . . . . . 82 | | | |
| Appendix H. Downstream Congestion Metering Algorithms . . . . . . 84 | | | |
| H.1. Bulk Downstream Congestion Metering Algorithm . . . . . . 84 | | | |
| H.2. Inflation Factor for Persistently Negative Flows . . . . . 85 | | | |
| Appendix I. Argument for holding back the ECN nonce . . . . . . . 86 | | | |
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 88 | | | |
| Intellectual Property and Copyright Statements . . . . . . . . . . 90 | | | |
| | | | |
| 1. Introduction | | 1. Introduction | |
| | | | |
|
| This document aims: | | This document aims to provide a complete specification of the | |
| | | addition of the re-ECN protocol to IP and guidelines on how to add it | |
| o To provide a complete specification of the addition of the re-ECN | | to transport layer protocols, including a complete specification of | |
| protocol to IP and guidelines on how to add it to transport layer | | re-ECN in TCP as an example. The motivation behind this proposal is | |
| protocols, including a complete specification of re-ECN in TCP as | | given in [re-ecn-motive], but we include a brief summary here. | |
| an example; | | | |
| | | | |
| o To show how a number of hard problems become much easier to solve | | | |
| once re-ECN is available in IP. | | | |
| | | | |
|
| In ECN [RFC3168] congested queues probabilistically mark packets as | | Re-ECN is intended to allow senders to inform the network of the | |
| they approach a congested state. The receiver informs the sender | | level of congestion they expect their flows to see. This information | |
| that they have seen one or more marks. In re-ECN the sender must | | is currently only visible at the transport layer. ECN [RFC3168] | |
| predict the level of congestion on the path by re-inserting feedback | | reveals the upstream congestion state of any path by monitoring the | |
| according to the marking scheme described later in this draft. This | | rate of CE marks. The receiver then informs the sender when they | |
| results in packets that carry a prediction of downstream congestion. | | have seen a marked packet. Re-ECN builds on ECN by providing new | |
| | | codepoints that allow the sender to declare the level of congestion | |
| | | they expect on the forward path. It is closely related to ECN and | |
| | | indeed we define a compatibility mode to allow a re-ECN sender to | |
| | | communicate with an ECN receiver [xref]. | |
| | | | |
| If a sender understates expected congestion compared to actual | | If a sender understates expected congestion compared to actual | |
| congestion then the network could discard packets or enact some other | | congestion then the network could discard packets or enact some other | |
| sanction. A policer can also be introduced at the ingress of | | sanction. A policer can also be introduced at the ingress of | |
|
| networks that can limit the congestion caused (or base penalties on | | networks that can limit the level of congestion being caused. | |
| it). | | | |
| | | | |
| It is important to add a few key points. | | | |
| | | | |
| o It can be seen that it takes one round trip before any feedback is | | | |
| received. For this reason a sender must make a conservative | | | |
| prediction by transmitting IP packets with a special Feedback Not | | | |
| Established (FNE) marking. | | | |
| | | | |
| o It should be noted that the prediction is carried in-band in | | | |
| normal data packets and for many transports feedback can be | | | |
| carried in the normal acknowledgements or control packets. | | | |
| | | | |
| o The re-ECN protocol is independent of the transport. In TCP, | | | |
| acknowledgments are used to convey the feedback from receiver to | | | |
| sender. This memo concentrates on TCP as an example transport | | | |
| protocol, however the re-ECN protocol is compatible with any | | | |
| transport where feedback can be sent from receiver to sender. | | | |
| | | | |
| A general statement of the problem solved by re-ECN is to provide | | A general statement of the problem solved by re-ECN is to provide | |
| sufficient information in each IP datagram to be able to hold senders | | sufficient information in each IP datagram to be able to hold senders | |
| and whole networks accountable for the congestion they cause | | and whole networks accountable for the congestion they cause | |
| downstream, before they cause it. But the every-day problems that | | downstream, before they cause it. But the every-day problems that | |
| re-ECN can solve are much more recognisable than this rather generic | | re-ECN can solve are much more recognisable than this rather generic | |
| statement: mitigating distributed denial of service (DDoS); | | statement: mitigating distributed denial of service (DDoS); | |
| simplifying differentiation of quality of service (QoS); policing | | simplifying differentiation of quality of service (QoS); policing | |
| compliance to congestion control; and so on. | | compliance to congestion control; and so on. | |
| | | | |
|
| Uniquely, re-ECN manages to enable solutions to these problems | | It is important to add a few key points. | |
| without unduly stifling innovative new ways to use the Internet. | | | |
| This was a hard balance to strike, given it could be argued that DDoS | | | |
| is an innovative way to use the Internet. The most valuable insight | | | |
| was to allow each network to choose the level of constraint it wishes | | | |
| to impose. Also re-ECN has been carefully designed so that networks | | | |
| that choose to use it conservatively can protect themselves against | | | |
| the congestion caused in their network by users on other networks | | | |
| with more liberal policies. | | | |
| | | | |
| For instance, some network owners want to block applications like | | | |
| voice and video unless their network is compensated for the extra | | | |
| share of bottleneck bandwidth taken. These real-time applications | | | |
| tend to be unresponsive when congestion arises. Whereas elastic TCP- | | | |
| based applications back away quickly, ending up taking a much smaller | | | |
| share of congested capacity for themselves. Other network owners | | | |
| want to invest in large amounts of capacity and make their gains from | | | |
| simplicity of operation and economies of scale. | | | |
| | | | |
| re-ECN allows the more conservative networks to police out flows that | | | |
| have not asked to be unresponsive to congestion---not because they | | | |
| are voice or video---just because they don't respond to congestion. | | | |
| But it also allows other networks to choose not to police. | | | |
| Crucially, when flows from liberal networks cross into a conservative | | | |
| network, re-ECN enables the conservative network to apply penalties | | | |
| to its neighbouring networks for the congestion they allow to be | | | |
| caused. And these penalties can be applied to bulk data, without | | | |
| regard to flows. | | | |
| | | | |
|
| Then, if unresponsive applications become so dominant that some of | | o In any standard network it always takes one round trip before any | |
| the more liberal networks experience congestion collapse [RFC3714], | | feedback is received. For this reason a sender must make a | |
| they can change their minds and use re-ECN to apply tighter controls | | conservative prediction by transmitting IP packets with a special | |
| in order to bring congestion back under control. | | Cautious marking. | |
| | | | |
|
| re-ECN works by arranging that each packet arrives at each network | | o It should be noted that the prediction is carried in-band in | |
| element carrying a view of expected congestion on its own downstream | | normal data packets and for many transports feedback can be | |
| path, albeit averaged over multiple packets. Most usefully, | | carried in the normal acknowledgements or control packets. | |
| congestion on the remainder of the path becomes visible in the IP | | | |
| header at the first ingress. Many of the applications of re-ECN | | | |
| involve a policer at this ingress using the view of downstream | | | |
| congestion arriving in packets to police or control the packet rate. | | | |
| | | | |
|
| Importantly, the scheme is recursive: a whole network harbouring | | o The re-ECN protocol is independent of the transport. In TCP, | |
| users causing congestion in downstream networks can be held | | acknowledgments are used to convey the feedback from receiver to | |
| responsible or policed by its downstream neighbour. | | sender. This memo concentrates on TCP as an example transport | |
| | | protocol, however the re-ECN protocol is compatible with any | |
| | | transport where feedback can be sent from receiver to sender. | |
| | | | |
| This document is structured as follows. First an overview of the re- | | This document is structured as follows. First an overview of the re- | |
|
| ECN protocol is given (Section 3), outlining its attributes and | | ECN protocol is given (Section 4), outlining its attributes and | |
| explaining conceptually how it works as a whole. The two main parts | | explaining conceptually how it works as a whole. The two main parts | |
| of the document follow. That is, the protocol specification divided | | of the document follow. That is, the protocol specification divided | |
|
| into transport (Section 4) and network (Section 5) layers which | | into network (Section 5) and transport (Section 6) layers. | |
| contain most of the standards compliance terminology, then the | | | |
| applications re-ECN can be put to, such as policing DDoS, QoS and | | | |
| congestion control (Section 6). Although these applications do not | | | |
| require standardisation themselves, they are described in a fair | | | |
| degree of detail in order to explain how re-ECN can be used. Given | | | |
| re-ECN proposes to use the last undefined bit in the IPv4 header, we | | | |
| felt it necessary to outline the potential that re-ECN could release | | | |
| in return for being given that bit. | | | |
| | | | |
| Deployment issues discussed throughout the document are brought | | Deployment issues discussed throughout the document are brought | |
|
| together in Section 7, which is followed by a brief section | | together in Section 7. Related work is discussed in Section 8. | |
| explaining the somewhat subtle rationale for the design from an | | | |
| architectural perspective (Section 8). We end by describing related | | | |
| work (Section 9), listing security considerations (Section 10) and | | | |
| finally drawing conclusions (Section 12). | | | |
| | | | |
| 2. Requirements notation | | 2. Requirements notation | |
| | | | |
| The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |
| "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | | "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | |
| document are to be interpreted as described in [RFC2119]. | | document are to be interpreted as described in [RFC2119]. | |
| | | | |
|
| This document first specifies a protocol, then describes a framework | | 3. Terminology | |
| that creates the right incentives to ensure compliance to the | | | |
| protocol. This could cause confusion because the second part of the | | | |
| document considers many cases where malicious nodes may not comply | | | |
| with the protocol. When such contingencies are described, if any of | | | |
| the above keywords are not capitalised, that is deliberate. So, for | | | |
| instance, the following two apparently contradictory sentences would | | | |
| be perfectly consistent: i) x MUST do this; ii) x may not do this. | | | |
| | | | |
|
| 3. Protocol Overview | | The following terminology is used throughout this memo. Some of this | |
| | | terminology is new and, to avoid confusion, Appendix F sets out all | |
| | | the alternative terminology that has been used in other re-ECN | |
| | | related documents. | |
| | | | |
|
| 3.1. Background and Applicability | | o Neutral packet - a packet that is able to be congestion marked by | |
| | | an ECN or re-ECN queue. | |
| | | | |
| | | o Negative packet - a Neutral packet that has been congestion marked | |
| | | by an ECN or re-ECN queue. | |
| | | | |
| | | o Positive packet - a packet that has been marked by the sender to | |
| | | indicate the expected level of congestion along its path. In | |
| | | general Positive packets should only be sent in response to | |
| | | feedback received from the receiver.* | |
| | | | |
| | | o Cancelled packet - a Positive packet that has been congestion | |
| | | marked by an ECN or re-ECN queue. | |
| | | | |
| | | o Cautious packet - a packet that has been marked by the sender to | |
| | | indicate the expected level of congestion along its path. In | |
| | | general Cautious packets should be used when there is insufficient | |
| | | feedback to be confident about the congestion state of the | |
| | | network.* | |
| | | | |
| | | o * The difference between Positive and Cautious packets is | |
| | | explained in detail later in the document along with guidelines on | |
| | | the use of Cautious packets. | |
| | | | |
| | | All the above terms have related IP codepoints as defined in | |
| | | Section 5. | |
| | | | |
| | | 4. Protocol Overview | |
| | | | |
| | | 4.1. Simplified Re-ECN Protocol | |
| | | | |
| | | We describe here the simplified re-ECN protocol. To simplify the | |
| | | description we assume packets and segments are synonymous. | |
| | | | |
| | | Packets are sent from a sender to a receiver. In Figure 1 the queues | |
| | | (Q1 and Q2) are ECN enabled as per RFC 3168 [RFC3168]. If congestion | |
| | | occurs then packets are marked with the congestion experienced (CE) | |
| | | flag exactly as in the ECN protocol [RFC3168]; the routers do not | |
| | | need to be modified and do not need to know the re-ECN protocol. The | |
| | | receiver constantly informs the sender of the current count of | |
| | | Positive packets it has seen. The sender uses this information to | |
| | | determine how many Positive packets it must send into the network. | |
| | | The sender's aim is to balance the number of bytes that have been | |
| | | congestion marked with the number of Positive bytes it has sent. | |
| | | | |
| | | +--------- Feedback----------+ | |
| | | | | | |
| | | v | | |
| | | +---+ +----+ +----+ +---+ | |
| | | | | | | | | | | | |
| | | | S |--->| Q1 |--->| Q2 |--->| R | | |
| | | | | | | | | | | | |
| | | +---+ +----+ +----+ +---+ | |
| | | | |
| | | Figure 1: Simple Re-ECN | |
| | | | |
| | | 4.1.1. Congestion Control and Policing the Protocol | |
| | | | |
| | | The arrangement of the protocol ensures that packets carry a | |
| | | declaration of the amount of congestion that will be experienced on | |
| | | the path. The re-ECN protocol is orthogonal to any congestion | |
| | | control algorithms, but can be used to ensure that congestion control | |
| | | is being applied by the sender. | |
| | | | |
| | | In general we assume that there will be a policer at the network | |
| | | ingress which can rate limit traffic based on the amount of | |
| | | congestion declared. | |
| | | | |
| | | At the network egress there is a dropper which can impose sanctions on | |
| | | flows that incorrectly declare congestion. | |
| | | | |
| | | Policers and droppers are explained in more detail in | |
| | | [re-ecn-motive]. | |
| | | | |
| | | 4.1.2. Background and Applicability | |
| | | | |
| The re-ECN protocol makes no changes and has no effect on the TCP | | The re-ECN protocol makes no changes and has no effect on the TCP | |
| congestion control algorithm or on other rate responses to | | congestion control algorithm or on other rate responses to | |
| congestion. Re-ECN is not a new congestion control protocol, rather | | congestion. Re-ECN is not a new congestion control protocol, rather | |
| it is orthogonal to congestion control itself. Re-ECN is concerned | | it is orthogonal to congestion control itself. Re-ECN is concerned | |
| with revealing information about congestion so that users and | | with revealing information about congestion so that users and | |
| networks can be held accountable for the congestion they cause, or | | networks can be held accountable for the congestion they cause, or | |
| allow to be caused. | | allow to be caused. | |
| | | | |
| Re-ECN builds on ECN so we briefly recap the essentials of the ECN | | Re-ECN builds on ECN so we briefly recap the essentials of the ECN | |
| protocol [RFC3168]. Two bits in the IP protocol (v4 or v6) are | | protocol [RFC3168]. Two bits in the IP protocol (v4 or v6) are | |
| assigned to the ECN field. The sender clears the field to "00" (Not- | | assigned to the ECN field. The sender clears the field to "00" (Not- | |
| ECT) if either end-point transport is not ECN-capable. Otherwise it | | ECT) if either end-point transport is not ECN-capable. Otherwise it | |
| indicates an ECN-capable transport (ECT) using either of the two | | indicates an ECN-capable transport (ECT) using either of the two | |
| code-points "10" or "01" (ECT(0) and ECT(1) resp.). | | code-points "10" or "01" (ECT(0) and ECT(1) resp.). | |
| | | | |
|
| ECN-capable queues probabilistically set "11" if congestion is | | ECN-capable queues probabilistically set this field to "11" if | |
| experienced (CE), the marking probability increasing with the length | | congestion is experienced (CE). In general this marking probability | |
| of the queue at its egress link (typically using the RED | | will increase with the length of the queue at its egress link | |
| algorithm [RFC2309]). However, they still drop rather than mark Not- | | (typically using the RED algorithm [RFC2309]). However, they still | |
| ECT packets. With multiple ECN-capable queues on a path, a flow of | | drop rather than mark Not-ECT packets. With multiple ECN-capable | |
| packets accumulates the fraction of CE marking that each queue adds. | | queues on a path, a flow of packets accumulates the fraction of CE | |
| The combined effect of the packet marking of all the queues along the | | marking that each queue adds. The combined effect of the packet | |
| path signals congestion of the whole path to the receiver. So, for | | marking of all the queues along the path signals congestion of the | |
| example, if one queue early in a path is marking 1% of packets and | | whole path to the receiver. So, for example, if one queue early in a | |
| another later in a path is marking 2%, flows that pass through both | | path is marking 1% of packets and another later in a path is marking | |
| queues will experience approximately 3% marking (see Appendix A for a | | 2%, flows that pass through both queues will experience approximately | |
| precise treatment). | | 3% marking (see Appendix A for a precise treatment). | |
| | | | |
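As a rough illustration of how marking accumulates along a path, the sketch below computes the whole-path marking fraction for the 1% and 2% example above. It assumes each queue marks packets independently of the others; that assumption and the function name `combined_marking` are ours for this sketch, and Appendix A remains the reference for the precise treatment.

```python
def combined_marking(fractions):
    """Whole-path CE marking fraction, assuming independent marking per queue."""
    unmarked = 1.0
    for p in fractions:
        # A packet stays unmarked only if every queue leaves it unmarked.
        unmarked *= (1.0 - p)
    return 1.0 - unmarked


if __name__ == "__main__":
    # Queues marking 1% and 2% give slightly under the 3% simple sum.
    print(f"{combined_marking([0.01, 0.02]):.4f}")
```

The result is slightly below the simple sum, which is why the text above describes 3% as approximate.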
| The choice of two ECT code-points in the ECN field [RFC3168] | | The choice of two ECT code-points in the ECN field [RFC3168] | |
| permitted future flexibility, optionally allowing the sender to | | permitted future flexibility, optionally allowing the sender to | |
| encode the experimental ECN nonce [RFC3540] in the packet stream. | | encode the experimental ECN nonce [RFC3540] in the packet stream. | |
| The nonce is designed to allow a sender to check the integrity of | | The nonce is designed to allow a sender to check the integrity of | |
|
| congestion feedback. But Section 9.2 explains that it still gives no | | congestion feedback. But Section 8.1 explains that it still gives no | |
| control over how fast the sender transmits as a result of the | | control over how fast the sender transmits as a result of the | |
| feedback. On the other hand, re-ECN is designed both to ensure that | | feedback. On the other hand, re-ECN is designed both to ensure that | |
| congestion is declared honestly and that the sender's rate responds | | congestion is declared honestly and that the sender's rate responds | |
| appropriately. | | appropriately. | |
| | | | |
| Re-ECN is based on a feedback arrangement called `re- | | Re-ECN is based on a feedback arrangement called `re- | |
| feedback' [Re-fb]. The word is short for either receiver-aligned, | | feedback' [Re-fb]. The word is short for either receiver-aligned, | |
| re-inserted or re-echoed feedback. But it actually works even when | | re-inserted or re-echoed feedback. But it actually works even when | |
| no feedback is available. In fact it has been carefully designed to | | no feedback is available. In fact it has been carefully designed to | |
| work for single datagram flows. It also encourages aggregation of | | work for single datagram flows. It also encourages aggregation of | |
| single packet flows by congestion control proxies. Then, even if the | | single packet flows by congestion control proxies. Then, even if the | |
| traffic mix of the Internet were to become dominated by short | | traffic mix of the Internet were to become dominated by short | |
| messages, it would still be possible to control congestion | | messages, it would still be possible to control congestion | |
| effectively and efficiently. | | effectively and efficiently. | |
| | | | |
| Changing the Internet's feedback architecture seems to imply | | Changing the Internet's feedback architecture seems to imply | |
| considerable upheaval. But re-ECN can be deployed incrementally at | | considerable upheaval. But re-ECN can be deployed incrementally at | |
| the transport layer around unmodified queues using existing fields in | | the transport layer around unmodified queues using existing fields in | |
| IP (v4 or v6). However it does also require the last undefined bit | | IP (v4 or v6). However it does also require the last undefined bit | |
| in the IPv4 header, which it uses in combination with the 2-bit ECN | | in the IPv4 header, which it uses in combination with the 2-bit ECN | |
|
| field to create four new codepoints. Nonetheless, we RECOMMENDED | | field to create four new codepoints. Nonetheless, we RECOMMEND | |
| adding optional preferential drop to IP queues based on the re-ECN | | adding optional preferential drop to IP queues based on the re-ECN | |
| fields in order to improve resilience against DoS attacks. | | fields in order to improve resilience against DoS attacks. | |
| Similarly, re-ECN works best if both the sender and receiver | | Similarly, re-ECN works best if both the sender and receiver | |
| transports are re-ECN-capable, but it can work with just sender | | transports are re-ECN-capable, but it can work with just sender | |
|
| support. Section 7.1 summarises the incremental deployment strategy. | | support (Section 6.1.2). | |
| | | | |
| Before re-ECN can be considered worthy of using up the last bit in | | | |
| the IP header, we must be sure that all our claims are robust. We | | | |
| have gradually been reducing the list of outstanding issues, but the | | | |
| few that still remain are listed in Section 6.3. We expect new | | | |
| attacks may still be found, but we offer the re-ECN protocol on the | | | |
| basis that it is built on fairly solid theoretical foundations and, | | | |
| so far, it has proved possible to keep it relatively robust. | | | |
| | | | |
| 3.2. Simplified Re-ECN Protocol | | | |
| | | | |
| We describe here the simplified re-ECN protocol. In this first | | | |
| description we assume packets and segments are synonymous. | | | |
| | | | |
| Packets are sent from a sender to a receiver. In Figure 1 the queues | | | |
| (Q1 and Q2) are ECN enabled as per RFC 3168 [ref]. If congestion | | | |
| occurs then packets are marked with the congestion experienced (CE) | | | |
| flag exactly as in the ECN protocol [RFC3168]; the routers do not | | | |
| need to be modified and do not need to know the re-ECN protocol. On | | | |
| reception of marked packets the receiver notifies the sender of the | | | |
| current count of marked packets. Note that this is the number of | | | |
| packets marked rather than the setting of the ECE flag in ECN. The | | | |
| sender uses this information to re-echo mark packets in exact | | | |
| correspondence to the number of CE marked bytes observed at the | | | |
| receiver. | | | |
| | | | |
| +--------- Feedback----------+ | | | |
| | | | | | |
| v | | | | |
| +---+ +----+ +----+ +---+ | | | |
| | | RE | | | | | | | | | |
| | S |--->| Q1 |--->| Q2 |--->| R | | | | |
| | | | | | | | | | | | |
| +---+ +----+ +----+ +---+ | | | |
| | | | |
| Figure 1: Simple Re-ECN | | | |
| | | | |
|
| 3.3. Re-ECN Abstracted Network Layer Wire Protocol (IPv4 or v6) | | 4.2. Re-ECN Abstracted Network Layer Wire Protocol (IPv4 or v6) | |
| | | | |
| The re-ECN wire protocol uses the two bit ECN field broadly as in | | The re-ECN wire protocol uses the two bit ECN field broadly as in | |
| RFC3168 [RFC3168] as described above, but with five differences of | | RFC3168 [RFC3168] as described above, but with five differences of | |
|
| detail (brought together in a list in Section 7.1). This | | detail (brought together in a list in Section 7). This specification | |
| specification defines a new re-ECN extension (RE) flag. We will | | defines a new re-ECN extension (RE) flag. We will defer the | |
| defer the definition of the actual position of the RE flag in the | | definition of the actual position of the RE flag in the IPv4 & v6 | |
| IPv4 & v6 headers until Section 5. When we don't need to choose | | headers until Section 5. When we don't need to choose between IPv4 | |
| between IPv4 and v6 wire protocols it will suffice to call it the RE | | and v6 wire protocols it will suffice to call it the RE flag. | |
| flag. | | | |
| | | | |
| Unlike the ECN field, the RE flag is intended to be set by the sender | | Unlike the ECN field, the RE flag is intended to be set by the sender | |
|
| and remain unchanged along the path, although it can be read by | | and SHOULD remain unchanged along the path, although it can be read | |
| network elements that understand the re-ECN protocol. It is feasible | | by network elements that understand the re-ECN protocol. It is | |
| that a network element MAY change the setting of the RE flag, perhaps | | feasible that a network element MAY change the setting of the RE | |
| acting as a proxy for an end-point, but such a protocol would have to | | flag, perhaps acting as a proxy for an end-point, but such a protocol | |
| be defined in another specification (e.g. [Re-PCN]). | | would have to be defined in another specification (e.g. [Re-PCN]). | |
| | | | |
| Although the RE flag is a separate, single bit field, it can be read | | Although the RE flag is a separate, single bit field, it can be read | |
| as an extension to the two-bit ECN field; the three concatenated bits | | as an extension to the two-bit ECN field; the three concatenated bits | |
| in what we will call the extended ECN field (EECN) giving eight | | in what we will call the extended ECN field (EECN) giving eight | |
| codepoints. We will use the RFC3168 names of the ECN codepoints to | | codepoints. We will use the RFC3168 names of the ECN codepoints to | |
| describe settings of the ECN field when the RE flag setting is "don't | | describe settings of the ECN field when the RE flag setting is "don't | |
| care", but we also define the following six extended ECN codepoint | | care", but we also define the following six extended ECN codepoint | |
| names for when we need to be more specific. | | names for when we need to be more specific. | |
| | | | |
| One of re-ECN's codepoints is an alternative use of the codepoint set | | One of re-ECN's codepoints is an alternative use of the codepoint set | |
| aside in RFC3168 for the ECN nonce (ECT(1)). Transports using re-ECN | | aside in RFC3168 for the ECN nonce (ECT(1)). Transports using re-ECN | |
| do not need to use the ECN nonce as long as the sender is also | | do not need to use the ECN nonce as long as the sender is also | |
| checking for transport protocol compliance | | checking for transport protocol compliance | |
| [I-D.moncaster-tcpm-rcv-cheat]. The case for doing this is given in | | [I-D.moncaster-tcpm-rcv-cheat]. The case for doing this is given in | |
|
| Appendix I. Two re-ECN codepoints are given compatible uses to those | | Appendix E. Two re-ECN codepoints are given compatible uses to those | |
| defined in RFC3168 (Not-ECT and CE). The other codepoint used by | | defined in RFC3168 (Not-ECT and CE). The other codepoint used by | |
| RFC3168 (ECT(0)) isn't used for re-ECN. Altogether this leaves one | | RFC3168 (ECT(0)) isn't used for re-ECN. Altogether this leaves one | |
| codepoint of the eight unused by ECN or re-ECN and available for | | codepoint of the eight unused by ECN or re-ECN and available for | |
| future use. | | future use. | |
| | | | |
|
| +-------+------------+------+--------------+------------------------+ | | +--------+-------------+-------+-----------+------------------------+ | |
| | ECN | RFC3168 | RE | Extended ECN | Re-ECN meaning | | | | ECN | RFC3168 | RE | EECN | re-ECN meaning | | |
| | field | codepoint | flag | codepoint | | | | | field | codepoint | flag | codepoint | | | |
|
| +-------+------------+------+--------------+------------------------+ | | +--------+-------------+-------+-----------+------------------------+ | |
| | 00 | Not-ECT | 0 | Not-ECT | Not re-ECN-capable | | | | 00 | Not-ECT | 0 | Not-ECT | Not re-ECN-capable | | |
|
| | | | | | transport | | | | | | | | transport (Legacy) | | |
| | 00 | --- | 1 | FNE | Feedback not | | | | 00 | --- | 1 | FNE | Feedback not | | |
|
| | | | | | established | | | | | | | | established (Cautious) | | |
| | 01 | ECT(1) | 0 | Re-Echo | Re-echoed congestion | | | | 01 | ECT(1) | 0 | Re-Echo | Re-echoed congestion | | |
|
| | | | | | and RECT | | | | | | | | and RECT (Positive) | | |
| | 01 | --- | 1 | RECT | Re-ECN capable | | | | 01 | --- | 1 | RECT | Re-ECN capable | | |
|
| | | | | | transport | | | | | | | | transport (Neutral) | | |
| | 10 | ECT(0) | 0 | ECT(0) | RFC3168 ECN use only | | | | 10 | ECT(0) | 0 | ECT(0) | RFC3168 ECN use only | | |
| | | | | | | | | | | | | | | | |
| | 10 | --- | 1 | --CU-- | Currently unused | | | | 10 | --- | 1 | --CU-- | Currently unused | | |
| | | | | | | | | | | | | | | | |
|
| | 11 | CE | 0 | CE(0) | Re-Echo canceled by | | | | 11 | CE | 0 | CE(0) | Re-Echo cancelled by | | |
| | | | | | congestion experienced | | | | | | | | CE (Cancelled) | | |
| | 11 | --- | 1 | CE(-1) | Congestion experienced | | | | 11 | --- | 1 | CE(-1) | Congestion Experienced | | |
| +-------+------------+------+--------------+------------------------+ | | | | | | | (Negative) | | |
| | | +--------+-------------+-------+-----------+------------------------+ | |
| | | | |
| Table 1: Extended ECN Codepoints | | Table 1: Extended ECN Codepoints | |
| | | | |
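The extended ECN codepoints of Table 1 can be read as a simple lookup keyed on the two-bit ECN field and the RE flag. The following is a minimal sketch only; the names `EECN` and `classify` are hypothetical, and the terms in the second element of each pair are those given in parentheses in the right-hand (newer) column of the table.

```python
# (ECN field, RE flag) -> (extended ECN codepoint, re-ECN term), per Table 1.
EECN = {
    (0b00, 0): ("Not-ECT", "Legacy"),
    (0b00, 1): ("FNE", "Cautious"),
    (0b01, 0): ("Re-Echo", "Positive"),
    (0b01, 1): ("RECT", "Neutral"),
    (0b10, 0): ("ECT(0)", "RFC3168 ECN use only"),
    (0b10, 1): ("--CU--", "Currently unused"),
    (0b11, 0): ("CE(0)", "Cancelled"),
    (0b11, 1): ("CE(-1)", "Negative"),
}


def classify(ecn_field, re_flag):
    """Return the (codepoint, term) pair for a packet's ECN field and RE flag."""
    return EECN[(ecn_field & 0b11, re_flag & 1)]


if __name__ == "__main__":
    print(classify(0b01, 1))  # a Neutral packet
```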
|
| 3.4. Re-ECN Protocol Operation | | 4.3. Re-ECN Protocol Operation | |
| | | | |
| In this section we will give an overview of the operation of the re- | | In this section we will give an overview of the operation of the re- | |
| ECN protocol for TCP/IP, leaving a detailed specification to the | | ECN protocol for TCP/IP, leaving a detailed specification to the | |
| following sections. Other transports will be discussed later. | | following sections. Other transports will be discussed later. | |
| | | | |
| In summary, the protocol adds a third `re-echo' stage to the existing | | In summary, the protocol adds a third `re-echo' stage to the existing | |
| TCP/IP ECN protocol. Whenever the network adds CE congestion | | TCP/IP ECN protocol. Whenever the network adds CE congestion | |
| signalling to the IP header on the forward data path, the receiver | | signalling to the IP header on the forward data path, the receiver | |
| feeds it back to the ingress using TCP, then the sender re-echoes it | | feeds it back to the ingress using TCP, then the sender re-echoes it | |
| into the forward data path using the RE flag in the next packet. | | into the forward data path using the RE flag in the next packet. | |
| | | | |
| Prior to receiving any feedback a sender will not know which setting | | Prior to receiving any feedback a sender will not know which setting | |
|
| of the RE flag to use, so it sets the feedback not established (FNE) | | of the RE flag to use, so it sends Cautious packets by setting the | |
| codepoint. The network reads the FNE codepoint conservatively as | | FNE codepoint. The network reads the FNE codepoint conservatively as | |
| equivalent to re-echoed congestion. | | equivalent to re-echoed congestion. | |
| | | | |
|
| Specifically, once feedback from a flow is established, a re-ECN | | Specifically, once feedback from an ECN or re-ECN capable flow is | |
| sender always initialises the ECN field to ECT(1). And it usually | | established, a re-ECN sender always initialises the ECN field to | |
| sets the RE flag to "1". Whenever a queue marks a packet to CE, the | | ECT(1). And it usually sets the RE flag to "1" indicating a Neutral | |
| receiver feeds back this event to the sender. On receiving this | | packet. Whenever a queue marks a packet to CE, the receiver feeds | |
| feedback, the re-ECN sender will clear the RE flag to "0" in the next | | back this event to the sender. On receiving this feedback, the re- | |
| packet it sends. | | ECN sender will clear the RE flag to "0" in the next packet it sends | |
| | | (indicating a Positive packet). | |
| | | | |
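The sender behaviour outlined above might be sketched as follows. This is a deliberate simplification under our own assumptions: it counts packets rather than octets, and the function name and the `pending_reecho` counter are hypothetical rather than part of the specification.

```python
FNE = (0b00, 1)      # Cautious: feedback not established
RE_ECHO = (0b01, 0)  # Positive: ECT(1) with the RE flag blanked
RECT = (0b01, 1)     # Neutral: ECT(1) with the RE flag set


def next_codepoint(feedback_established, pending_reecho):
    """Choose (ECN field, RE flag) for the next packet a sender transmits.

    pending_reecho is the number of CE marks fed back by the receiver
    that the sender has not yet re-echoed (hypothetical bookkeeping).
    """
    if not feedback_established:
        return FNE
    if pending_reecho > 0:
        return RE_ECHO
    return RECT


if __name__ == "__main__":
    print(next_codepoint(False, 0), next_codepoint(True, 1), next_codepoint(True, 0))
```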
| We chose to set and clear the RE flag this way round to ease | | We chose to set and clear the RE flag this way round to ease | |
|
| incremental deployment (see Section 7.1). To avoid confusion we will | | incremental deployment (see Section 7). To avoid confusion we will | |
| use the term `blanking' (rather than marking) when the RE flag is | | use the term `blanking' (rather than marking) when the RE flag is | |
| cleared to "0". So, over a stream of packets, we will talk of the | | cleared to "0". So, over a stream of packets, we will talk of the | |
| `RE blanking fraction' as the fraction of octets in packets with the | | `RE blanking fraction' as the fraction of octets in packets with the | |
| RE flag cleared to "0". | | RE flag cleared to "0". | |
| | | | |
| +---+ +----+ +----+ +---+ | | +---+ +----+ +----+ +---+ | |
| | S |--| Q1 |----------------| Q2 |--| R | | | | S |--| Q1 |----------------| Q2 |--| R | | |
| +---+ +----+ +----+ +---+ | | +---+ +----+ +----+ +---+ | |
| . . . . | | . . . . | |
| ^ . . . . | | ^ . . . . | |
| | | | |
| skipping to change at page 14, line 5 | | skipping to change at page 12, line 5 | |
| horizontal line at 3% in the figure. The CE marked fraction is shown | | horizontal line at 3% in the figure. The CE marked fraction is shown | |
| by the stepped line which rises to meet the RE blanking fraction line | | by the stepped line which rises to meet the RE blanking fraction line | |
| with steps at each queue where packets are marked. Two queues are | | with steps at each queue where packets are marked. Two queues are | |
| shown (Q1 and Q2) that are currently congested. Each time packets | | shown (Q1 and Q2) that are currently congested. Each time packets | |
| pass through, a fraction is marked (1% at Q1 and 2% at Q2). The | | pass through, a fraction is marked (1% at Q1 and 2% at Q2). The | |
| approximate downstream congestion can be measured at the observation | | approximate downstream congestion can be measured at the observation | |
| points shown along the path by subtracting the CE marking fraction | | points shown along the path by subtracting the CE marking fraction | |
| from the RE blanking fraction, as shown in the table below | | from the RE blanking fraction, as shown in the table below | |
| (Appendix A derives these approximations from a precise analysis). | | (Appendix A derives these approximations from a precise analysis). | |
| | | | |
|
| +-------------------+------------------------------+ | | NB due to the unary nature of ECN marking and the equivalent unary | |
| | Observation point | Approx downstream congestion | | | nature of re-ECN blanking, the precise fraction of marked bytes must | |
| +-------------------+------------------------------+ | | be calculated by maintaining a moving average of the number of | |
| | L | 3% - 0% = 3% | | | packets that have been marked as a proportion of the total number of | |
| | M | 3% - 1% = 2% | | | packets. | |
| | N | 3% - 3% = 0% | | | | |
| +-------------------+------------------------------+ | | | |
| | | | |
| Table 2: Downstream Congestion Measured at Example Observation Points | | | |
| | | | |
|
| All along the path, whole-path congestion remains unchanged so it can | | Along the path the fraction of packets that had their RE field | |
| be used as a reference against which to compare upstream congestion. | | cleared remains unchanged so it can be used as a reference against | |
| The difference predicts downstream congestion for the rest of the | | which to compare upstream congestion. The difference predicts | |
| path. Therefore, measuring the fractions of each codepoint at any | | downstream congestion for the rest of the path. Therefore, measuring | |
| point in the Internet will reveal upstream, downstream and whole path | | the fractions of each codepoint at any point in the Internet will | |
| congestion. | | reveal upstream, downstream and whole path congestion. | |
| | | | |
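To make the subtraction concrete, the sketch below reproduces the approximate figures of the running example (3% RE blanking, with 1% marked at Q1 and 2% at Q2). The placement of the three measurement points (before Q1, between the queues, after Q2) is our reading of the example, and the function name is hypothetical. Values are in whole percent to keep the arithmetic exact.

```python
RE_BLANKING_PCT = 3  # constant along the path in this example


def downstream_congestion(re_blanking_pct, ce_marking_pct):
    """Approximate downstream congestion: RE blanking minus CE marking so far."""
    return re_blanking_pct - ce_marking_pct


if __name__ == "__main__":
    # Cumulative CE marking (approximate) seen at each measurement point.
    points = [("before Q1", 0), ("between Q1 and Q2", 1), ("after Q2", 3)]
    for name, ce_pct in points:
        print(name, downstream_congestion(RE_BLANKING_PCT, ce_pct), "%")
```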
| Note that we have introduced discussion of marking and blanking | | Note that we have introduced discussion of marking and blanking | |
|
| fractions solely for illustration. To be absolutely clear, for TCP | | fractions solely for illustration. We are not saying any protocol | |
| these fractions are averages that would result from the behaviour of | | handler will work with these average fractions directly. In fact the | |
| the protocol handler mechanically blanking outgoing packets in direct | | protocol actually requires the number of marked and blanked bytes to | |
| response to incoming feedback---we are not saying any protocol | | balance by the time the packet reaches the receiver. | |
| handler has to work with these average fractions directly. | | | |
| | | | |
| 3.5. Informal Terminology | | | |
| | | | |
|
| In the rest of this memo we will loosely talk of positive or negative | | 4.4. Positive and Negative Flows | |
| flows, meaning flows where the moving average of the downstream | | | |
| congestion metric is persistently positive or negative. A negative | | | |
| flow is one where more CE marked packets than re-ECN blanked packets | | | |
| arrive. Likewise in positive flows more re-ECN blanked packets | | | |
| arrive than CE marked packets. The notion of a negative metric | | | |
| arises because it is derived by subtracting one metric from another. | | | |
| Of course actual downstream congestion cannot be negative, only the | | | |
| metric can (whether due to time lags or deliberate malice). | | | |
| | | | |
|
| Just as we will loosely talk of positive and negative flows, we will | | In Section 3 we introduced the terms Positive, Neutral, Negative, | |
| also talk of positive or negative packets, meaning packets that | | Cautious and Cancelled. This terminology is based on the requirement | |
| contribute positively or negatively to the downstream congestion | | to balance the proportion of bytes marked as CE with the proportion | |
| metric. | | of bytes that are re-echo marked. In the rest of this memo we will | |
| | | loosely talk of positive or negative flows, meaning flows where the | |
| | | moving average of the downstream congestion metric is persistently | |
| | | positive or negative. A negative flow is one where more CE marked | |
| | | packets than re-ECN blanked packets arrive. Likewise in positive | |
| | | flows more re-ECN blanked packets arrive than CE marked packets. The | |
| | | notion of a negative metric arises because it is derived by | |
| | | subtracting one metric from another. Of course actual downstream | |
| | | congestion cannot be negative, only the metric can (whether due to | |
| | | time lags or deliberate malice). | |
| | | | |
| Therefore we will talk of packets having `worth' of +1, 0 or -1, | | Therefore we will talk of packets having `worth' of +1, 0 or -1, | |
| which, when multiplied by their size, indicates their contribution to | | which, when multiplied by their size, indicates their contribution to | |
|
| the downstream congestion metric. | | the downstream congestion metric. The worth of each type of packet | |
| | | is given below in Table 2. The idea is that most flows start with | |
| The idea is that most packets start with zero worth. Every time the | | zero worth. Every time the network decrements the worth of a packet, | |
| network decrements the worth of a packet, the sender increments the | | the sender increments the worth of a later packet. Then, over time, | |
| worth of a later packet. Then, over time, as many positive octets | | as many positive octets should arrive at the receiver as negative. | |
| should arrive at the receiver as negative. Note we have said octets | | Note we have said octets not packets, so if packets are of different | |
| not packets, so if packets are of different sizes, the worth should | | sizes, the worth should be incremented on enough octets to balance | |
| be incremented on enough octets to balance the octets in negative | | the octets in negative packets arriving at the receiver. It is this | |
| packets arriving at the receiver. It is this balance that will allow | | balance that will allow the network to hold the sender accountable | |
| the network to hold the sender accountable for the congestion it | | for the congestion it causes. | |
| causes. | | | |
| | | | |
| If a packet carrying re-echoed congestion happens to also be | | If a packet carrying re-echoed congestion happens to also be | |
| congestion marked, the +1 worth added by the sender will be cancelled | | congestion marked, the +1 worth added by the sender will be cancelled | |
| out by the -1 network congestion marking. Although the two worth | | out by the -1 network congestion marking. Although the two worth | |
| values correctly cancel out, neither the congestion marking nor the | | values correctly cancel out, neither the congestion marking nor the | |
| re-echoed congestion are lost, because the RE bit and the ECN field | | re-echoed congestion are lost, because the RE bit and the ECN field | |
| are orthogonal. So, whenever this happens, the receiver will | | are orthogonal. So, whenever this happens, the receiver will | |
| correctly detect and re-echo the new congestion event as well. | | correctly detect and re-echo the new congestion event as well. | |
| | | | |
| The table below specifies unambiguously the worth of each extended | | The table below specifies unambiguously the worth of each extended | |
| ECN codepoint. Note the order is different from the previous table | | ECN codepoint. Note the order is different from the previous table | |
|
| to better show how the worth increments and decrements. The FNE | | to better show how the worth increments and decrements. | |
| codepoint is used in the flow bootstrap process (explained later) and | | | |
| has the same positive (+1) worth as a packet with the Re-Echo | | | |
| codepoint. | | | |
| | | | |
|
| +--------+------+----------------+-------+--------------------------+ | | +---------+-------+---------------+-------+-------------------------+ | |
| | ECN | RE | Extended ECN | Worth | Re-ECN meaning | | | | ECN | RE | Extended ECN | Worth | Re-ECN Term | | |
| | field | bit | codepoint | | | | | | field | bit | codepoint | | | | |
|
| +--------+------+----------------+-------+--------------------------+ | | +---------+-------+---------------+-------+-------------------------+ | |
| | 00 | 0 | Not-RECT | ... | Not re-ECN-capable | | | | 00 | 0 | Not-RECT | ... | --- | | |
| | | | | | transport | | | | 00 | 1 | FNE | +1 | Cautious | | |
| | 00 | 1 | FNE | +1 | Feedback not established | | | | 01 | 0 | Re-Echo | +1 | Positive | | |
| | 01 | 0 | Re-Echo | +1 | Re-echoed congestion and | | | | 10 | 0 | Legacy | ... | RFC3168 ECN use only | | |
| | | | | | RECT | | | | | | | | | | |
| 10 | 0 | --- | ... | RFC3168 ECN use only | | | | 11 | 0 | CE(0) | 0 | Cancelled | |
| | 11 | 0 | CE(0) | 0 | Re-Echo canceled by | | | | 01 | 1 | RECT | 0 | Neutral | | |
| | | | | | congestion experienced | | | | |
| | 01 | 1 | RECT | 0 | Re-ECN capable transport | | | | |
| | 10 | 1 | --CU-- | ... | Currently unused | | | | 10 | 1 | --CU-- | ... | Currently unused | | |
| | | | | | | | | | | | | | | | |
|
| | 11 | 1 | CE(-1) | -1 | Congestion experienced | | | | 11 | 1 | CE(-1) | -1 | Negative | | |
| +--------+------+----------------+-------+--------------------------+ | | +---------+-------+---------------+-------+-------------------------+ | |
| | | | |
|
| Table 3: 'Worth' of Extended ECN Codepoints | | Table 2: 'Worth' of Extended ECN Codepoints | |
| | | | |
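The worth table above can be sketched as a lookup keyed by the ECN field and the RE bit. This is an illustrative sketch only: the names `EECN_WORTH` and `balance` are hypothetical, the codepoint names follow the -07 column, and `None` stands in for the "..." entries where the table defines no worth.

```python
# Worth of each extended ECN codepoint, keyed by (ECN field, RE bit).
# None marks the "..." rows, for which the table defines no worth.
EECN_WORTH = {
    (0b00, 0): ("Not-RECT", None),
    (0b00, 1): ("FNE", +1),
    (0b01, 0): ("Re-Echo", +1),
    (0b10, 0): ("Legacy", None),
    (0b11, 0): ("CE(0)", 0),
    (0b01, 1): ("RECT", 0),
    (0b10, 1): ("--CU--", None),
    (0b11, 1): ("CE(-1)", -1),
}


def balance(packets):
    """Sum the worth of a sequence of (ecn, re_bit) pairs.

    Codepoints without a defined worth are skipped (an assumption of
    this sketch, not a rule stated in the table).
    """
    total = 0
    for ecn, re_bit in packets:
        _name, worth = EECN_WORTH[(ecn, re_bit)]
        if worth is not None:
            total += worth
    return total
```

For example, a flow that sends one FNE, one RECT, one packet later marked CE(-1) and one Re-Echo nets a balance of +1.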
|
| 4. Transport Layers | | 5. Network Layer | |
| | | | |
|
| 4.1. TCP | | 5.1. Re-ECN IPv4 Wire Protocol | |
| | | | |
| | | The wire protocol of the ECN field in the IP header remains largely | |
| | | unchanged from [RFC3168]. However, an extension to the ECN field we | |
| | | call the RE (Re-ECN extension) flag (Section 4.2) is defined in this | |
| | | document. It doubles the extended ECN codepoint space, giving 8 | |
| | | potential codepoints. The semantics of the extra codepoints are | |
| | | backward compatible with the semantics of the 4 original codepoints | |
| | | [RFC3168] (Section 7 collects together and summarises all the changes | |
| | | defined in this document). | |
| | | | |
| | | For IPv4, this document proposes that the new RE control flag will be | |
| | | positioned where the `reserved' control flag was at bit 48 of the | |
| | | IPv4 header (counting from 0). Alternatively, some would call this | |
| | | bit 0 (counting from 0) of byte 7 (counting from 1) of the IPv4 | |
| | | header (Figure 3). | |
| | | | |
| | | 0 1 2 | |
| | | +---+---+---+ | |
| | | | R | D | M | | |
| | | | E | F | F | | |
| | | +---+---+---+ | |
| | | | |
| | | Figure 3: New Definition of the Re-ECN Extension (RE) Control Flag at | |
| | | the Start of Byte 7 of the IPv4 Header | |
| | | | |
| | | The semantics of the RE flag are described in outline in Section 4 | |
| | | and specified fully in Section 6. The RE flag is always considered | |
| | | in conjunction with the 2-bit ECN field, as if they were concatenated | |
| | | together to form a 3-bit extended ECN field. If the ECN field is set | |
| | | to either the ECT(1) or CE codepoint, when the RE flag is blanked | |
| | | (cleared to "0") it represents a re-echo of congestion experienced by | |
| | | an earlier packet. If the ECN field is set to the Not-ECT codepoint, | |
| | | when the RE flag is set to "1" it represents the feedback not | |
| | | established (FNE) codepoint, which signals that the packet was sent | |
| | | without the benefit of congestion feedback. | |
| | | | |
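As a rough illustration of the bit positions described above, the following sketch reads the 3-bit extended ECN field from a raw IPv4 header. It assumes the ECN field occupies the two low-order bits of the second header byte, as in RFC3168, and that the RE flag is bit 48 (the most significant bit of the byte at zero-based index 6). The function name is hypothetical.

```python
def extended_ecn_ipv4(header: bytes):
    """Return (ecn, re_bit) from a raw IPv4 header.

    ecn    -- the 2-bit ECN field (low-order bits of byte index 1)
    re_bit -- the proposed RE flag at bit 48 (top bit of byte index 6)
    """
    if len(header) < 20:
        raise ValueError("IPv4 header must be at least 20 bytes")
    ecn = header[1] & 0x03
    re_bit = (header[6] >> 7) & 0x01
    return ecn, re_bit
```

With this layout, a header carrying ECN field 00 and the RE flag set decodes to `(0b00, 1)`, which is the FNE codepoint.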
| | | It is believed that the FNE codepoint can simultaneously serve other | |
| | | purposes, particularly where the start of a flow needs distinguishing | |
| | | from packets later in the flow. For instance it would have been | |
| | | useful to identify new flows for tag switching and might enable | |
| | | similar developments in the future if it were adopted. It is similar | |
| | | to the state set-up bit idea designed to protect against memory | |
| | | exhaustion attacks. This idea was proposed informally by David Clark | |
| | | and documented by Handley and Greenhalgh [Steps_DoS]. The FNE | |
| | | codepoint can be thought of as a `soft-state set-up flag', because it | |
| | | is idempotent (i.e. one occurrence of the flag is sufficient but | |
| | | further occurrences achieve the same effect if previous ones were | |
| | | lost). | |
| | | | |
| | | There will probably be other claims pending on the use of bit 48. | |
| | | We know of at least two [ARI05], [RFC3514], but neither has been | |
| | | pursued in the IETF so far, although the present proposal would | |
| | | meet the needs of the latter. | |
| | | | |
| | | The security flag proposal (commonly known as the evil bit) was | |
| | | published on 1 April 2003 as Informational RFC 3514, but it was not | |
| | | adopted due to confusion over whether evil-doers might set it | |
| | | inappropriately. The present proposal is backward compatible with | |
| | | RFC3514 because if re-ECN compliant senders were benign they would | |
| | | correctly clear the evil bit to honestly declare that they had just | |
| | | received congestion feedback. Whereas evil-doers would hide | |
| | | congestion feedback by setting the evil bit continuously, or at least | |
| | | more often than they should. So, evil senders can be identified, | |
| | | because they declare that they are good less often than they should. | |
| | | | |
| | | 5.2. Re-ECN IPv6 Wire Protocol | |
| | | | |
| | | For IPv6, this document proposes that the new RE control flag will be | |
| | | positioned as the first bit of the option field of a new Congestion | |
| | | hop by hop option header (Figure 4). | |
| | | | |
| | | 0 1 2 3 | |
| | | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |
| | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |
| | | | Next Header | Hdr ext Len | Option Type | Opt Length =4 | | |
| | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |
| | | |R| Reserved for future use | | |
| | | |E| | | |
| | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |
| | | | |
| | | Figure 4: Definition of a New IPv6 Congestion Hop by Hop Option | |
| | | Header containing the re-ECN Extension (RE) Control Flag | |
| | | | |
| | | 0 1 2 3 4 5 6 7 8 | |
| | | +-+-+-+-+-+-+-+-+- | |
| | | |AIU|C|Option ID| | |
| | | +-+-+-+-+-+-+-+-+- | |
| | | | |
| | | Figure 5: Congestion Hop by Hop Option Type Encoding | |
| | | | |
| | | The Hop-by-Hop Options header enables packets to carry information to | |
| | | be examined and processed by routers or nodes along the packet's | |
| | | delivery path, including the source and destination nodes. For re- | |
| | | ECN, the two bits of the Action If Unrecognized (AIU) flag of the | |
| | | Congestion extension header MUST be set to "00" meaning if | |
| | | unrecognized `skip over option and continue processing the header'. | |
| | | Then, any routers or a receiver not upgraded with the optional re-ECN | |
| | | features described in this memo will simply ignore this header. But | |
| | | routers with these optional re-ECN features or a re-ECN policing | |
| | | function, will process this Congestion extension header. | |
| | | | |
| | | The `C' flag MUST be set to "1" to specify that the Option Data | |
| | | (currently only the RE control flag) can change en-route to the | |
| | | packet's final destination. This ensures that, when an | |
| | | Authentication header (AH [RFC4302]) is present in the packet, for | |
| | | any option whose data may change en-route, its entire Option Data | |
| | | field will be treated as zero-valued octets when computing or | |
| | | verifying the packet's authenticating value. | |
| | | | |
| | | Although the RE control flag should not be changed along the path, we | |
| | | expect that the rest of this option field that is currently `Reserved | |
| | | for future use' could be used for a multi-bit congestion notification | |
| | | field which we would expect to change en route. As the RE flag does | |
| | | not need end-to-end authentication, we set the C flag to '1'. | |
| | | | |
| | | {ToDo: A Congestion Hop by Hop Option ID will need to be registered | |
| | | with IANA.} | |
| | | | |
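The Option Type encoding of Figure 5 and the option data of Figure 4 can be sketched as follows. This is only an illustration: no Option ID has been registered, so the `option_id` argument is a hypothetical placeholder, and both function names are assumptions of this sketch.

```python
def congestion_option_type(option_id: int) -> int:
    """Build the Option Type octet: AIU (2 bits), C (1 bit), Option ID (5 bits)."""
    if not 0 <= option_id < 32:
        raise ValueError("Option ID must fit in 5 bits")
    aiu = 0b00  # skip over the option if unrecognized
    c = 1       # option data may change en route
    return (aiu << 6) | (c << 5) | option_id


def congestion_option_data(re_bit: int) -> bytes:
    """Build the 4-octet option data: RE in the first bit, the rest reserved (zero)."""
    if re_bit not in (0, 1):
        raise ValueError("RE flag is a single bit")
    return bytes([re_bit << 7, 0, 0, 0])
```

A hypothetical Option ID of 1 would give an Option Type octet of 0x21.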
| | | 5.3. Router Forwarding Behaviour | |
| | | | |
| | | Re-ECN works well without modifying the forwarding behaviour of any | |
| | | routers. However, below, two OPTIONAL changes to forwarding | |
| | | behaviour are defined which respectively enhance performance and | |
| | | improve a router's discrimination against flooding attacks. They are | |
| | | both OPTIONAL additions that we propose MAY apply by default to all | |
| | | Diffserv per-hop scheduling behaviours (PHBs) [RFC2475] and ECN | |
| | | marking behaviours [RFC3168]. Specifications for PHBs MAY define | |
| | | different forwarding behaviours from this default, but this is not | |
| | | required. [Re-PCN] is one example. | |
| | | | |
| | | FNE indicates ECT: | |
| | | | |
| | | The FNE codepoint tells a router to assume that the packet was | |
| | | sent by an ECN-capable transport (see Section 5.4). Therefore an | |
| | | FNE packet MAY be marked rather than dropped. Note that the FNE | |
| | | codepoint has been intentionally chosen so that, to RFC3168 | |
| | | compliant routers (which do not inspect the RE flag) an FNE packet | |
| | | appears to be Not-ECT so it will be dropped by legacy AQM | |
| | | algorithms. | |
| | | | |
| | | A network operator MUST NOT configure a queue to ECN mark rather | |
| | | than drop FNE packets unless it can guarantee that FNE packets | |
| | | will be rate limited, either locally or upstream. The ingress | |
| | | policers discussed in [re-ecn-motive] would count as rate limiters | |
| | | for this purpose. | |
| | | | |
| | | Preferential Drop: If a re-ECN capable router queue experiences very | |
| | | high load so that it has to drop arriving packets (e.g. a DoS | |
| | | attack), it MAY preferentially drop packets within the same | |
| | | Diffserv PHB using the preference order for extended ECN | |
| | | codepoints given in Table 3. Preferential dropping can be | |
| | | difficult to implement on some hardware, but if feasible it would | |
| | | discriminate against attack traffic if done as part of the overall | |
| | | policing framework of [re-ecn-motive]. If nowhere else, routers | |
| | | at the egress of a network SHOULD implement preferential drop | |
| | | (stronger than the MAY above). For simplicity, preferences 4 & 5 | |
| | | MAY be merged into one preference level. | |
| | | | |
| | | +-------+-----+------------+-------+------------+-------------------+ | |
| | | | ECN | RE | Extended | Worth | Drop Pref | Re-ECN meaning | | |
| | | | field | bit | ECN | | (1 = drop | | | |
| | | | | | codepoint | | 1st) | | | |
| | | +-------+-----+------------+-------+------------+-------------------+ | |
| | | | 01 | 0 | Re-Echo | +1 | 5/4 | Re-echoed | | |
| | | | | | | | | congestion and | | |
| | | | | | | | | RECT | | |
| | | | 00 | 1 | FNE | +1 | 4 | Feedback not | | |
| | | | | | | | | established | | |
| | | | 11 | 0 | CE(0) | 0 | 3 | Re-Echo canceled | | |
| | | | | | | | | by congestion | | |
| | | | | | | | | experienced | | |
| | | | 01 | 1 | RECT | 0 | 3 | Re-ECN capable | | |
| | | | | | | | | transport | | |
| | | | 11 | 1 | CE(-1) | -1 | 3 | Congestion | | |
| | | | | | | | | experienced | | |
| | | | 10 | 1 | --CU-- | n/a | 2 | Currently Unused | | |
| | | | 10 | 0 | --- | n/a | 2 | RFC3168 ECN use | | |
| | | | | | | | | only | | |
| | | | 00 | 0 | Not-RECT | n/a | 1 | Not | | |
| | | | | | | | | Re-ECN-capable | | |
| | | | | | | | | transport | | |
| | | +-------+-----+------------+-------+------------+-------------------+ | |
| | | | |
| | | Table 3: Drop Preference of EECN Codepoints (Sorted by `Worth') | |
| | | | |
| | | The above drop preferences are arranged to preserve packets with | |
| | | more positive worth (Section 4.4), given senders of positive | |
| | | packets must have honestly declared downstream congestion. A full | |
| | | treatment of this is provided in the companion document describing | |
| | | the motivation and architecture for re-ECN [re-ecn-motive], | |
| | | particularly when the application of re-ECN to protect against | |
| | | DDoS attacks is described. | |
| | | | |
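The preferential drop rule above can be sketched as a lookup plus a selection step. This is a minimal sketch under stated assumptions: the queue holds only packets of a single Diffserv PHB, each represented as an (ECN field, RE bit) pair; Re-Echo is given preference 5 (levels 4 and 5 are not merged); and ties are broken by taking the earliest packet in the list, which the text does not specify. The names `DROP_PREF` and `pick_victim` are hypothetical.

```python
# Drop preference per extended ECN codepoint (1 = drop first), after Table 3.
DROP_PREF = {
    (0b01, 0): 5,  # Re-Echo
    (0b00, 1): 4,  # FNE
    (0b11, 0): 3,  # CE(0)
    (0b01, 1): 3,  # RECT
    (0b11, 1): 3,  # CE(-1)
    (0b10, 1): 2,  # --CU--
    (0b10, 0): 2,  # RFC3168 ECN use only
    (0b00, 0): 1,  # Not-RECT
}


def pick_victim(queue):
    """Return the index of the packet to drop first.

    The packet with the lowest preference number is chosen; among equals,
    min() returns the earliest index (a tie-break assumed by this sketch).
    """
    if not queue:
        raise ValueError("queue is empty")
    return min(range(len(queue)), key=lambda i: DROP_PREF[queue[i]])
```

Under overload, a Not-RECT packet is therefore discarded before any RECT packet, and Re-Echo packets are preserved longest.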
| | | 5.4. Justification for Setting the First SYN to FNE | |
| | | | |
| | | The initial SYN MUST be set to FNE by Re-ECT client A (Section 6.1.4) | |
| | | and Section 5.3 says a queue MAY optionally treat an FNE packet as | |
| | | ECN capable, so an initial SYN may be marked CE(-1) rather than | |
| | | dropped. This seems dangerous, because the sender has not yet | |
| | | established whether the receiver is an RFC3168 one that does not | |
| | | understand congestion marking. It also seems to allow malicious | |
| | | senders to take advantage of ECN marking to avoid so much drop when | |
| | | launching SYN flooding attacks. Below we explain the features of the | |
| | | protocol design that remove both these dangers. | |
| | | | |
| | | ECN-capable initial SYN with a Not-ECT server: If the TCP server B | |
| | | is re-ECN capable, provision is made for it to feed back a possible | |
| | | congestion marked SYN in the SYN ACK (Section 6.1.4). But if the | |
| | | TCP client A finds out from the SYN ACK that the server was not | |
| | | ECN-capable, the TCP client MUST conservatively consider the first | |
| | | SYN as congestion marked before setting itself into Not-ECT mode. | |
| | | Section 6.1.4 mandates that such a TCP client MUST also set its | |
| | | initial window to 1 segment. In this way we remove the need to | |
| | | cautiously avoid setting the first SYN to Not-RECT. This will | |
| | | give worse performance while deployment is patchy, but better | |
| | | performance once deployment is widespread. | |
| | | | |
| | | SYN flooding attacks can't exploit ECN-capability: Malicious hosts | |
| | | may think they can use the advantage that ECN-marking gives over | |
| | | drop in launching classic SYN-flood attacks. But Section 5.3 | |
| | | mandates that a router MUST only be configured to treat packets | |
| | | with the FNE codepoint as ECN-capable if FNE packets are rate | |
| | | limited somewhere. Introduction of the FNE codepoint was a | |
| | | deliberate move to enable transport-neutral handling of flow-start | |
| | | and flow state set-up in the IP layer where it belongs. It then | |
| | | becomes possible to protect against flooding attacks of all forms | |
| | | (not just SYN flooding) without transport-specific inspection for | |
| | | things like the SYN flag in TCP headers. Then, for instance, SYN | |
| | | flooding attacks using IPSec ESP encryption can also be rate | |
| | | limited at the IP layer. | |
| | | | |
| | | It might seem pedantic going to all this trouble to enable ECN on the | |
| | | initial packet of a flow, but it is motivated by a much wider concern | |
| | | to ensure safe congestion control will still be possible even if the | |
| | | application mix evolves to the point where the majority of flows | |
| | | consist of a single window or even a single packet. It also allows | |
| | | denial of service attacks to be more easily isolated and prevented. | |
| | | | |
| | | 5.5. Control and Management | |
| | | | |
| | | 5.5.1. Negative Balance Warning | |
| | | | |
| | | A new ICMP message type is being considered so that a dropper can | |
| | | warn the apparent sender of a flow that it has started to sanction | |
| | | the flow. The message would have similar semantics to the `Time | |
| | | exceeded' ICMP message type. To ensure the sender has to invest some | |
| | | work before the network will generate such a message, a dropper | |
| | | SHOULD only send such a message for flows that have demonstrated that | |
| | | they have started correctly by establishing a positive record, but | |
| | | have later gone negative. The threshold is up to the implementation. | |
| | | The purpose of the message is to distinguish these drops from those | |
| | | with other causes, such as congestion or transmission losses. The dropper | |
| | | would send the message to the sender of the flow, not the receiver. | |
| | | | |
| | | If we did define this message type, it would be REQUIRED for all re- | |
| | | ECT senders to parse and understand it. Note that a sender MUST only | |
| | | use this message to explain why losses are occurring. A sender MUST | |
| | | NOT take this message to mean that losses have occurred that it was | |
| | | not aware of. Otherwise, spoof messages could be sent by malicious | |
| | | sources to slow down a sender (c.f. ICMP source quench). | |
| | | | |
| | | However, the need for this message type is not yet confirmed, as we | |
| | | are considering how to prevent it being used by malicious senders to | |
| | | scan for droppers and to test their threshold settings. {ToDo: | |
| | | Complete this section.} | |
| | | | |
| | | 5.5.2. Rate Response Control | |
| | | | |
| | | As discussed in [re-ecn-motive] the sender's access operator will be | |
| | | expected to use bulk per-user policing, but they might choose to | |
| | | introduce a per-flow policer. In cases where operators do introduce | |
| | | per-flow policing, there may be a need for a sender to send a request | |
| | | to the ingress policer asking for permission to apply a non-default | |
| | | response to congestion (where TCP-friendly is assumed to be the | |
| | | default). This would require the sender to know what message | |
| | | format(s) to use and to be able to discover how to address the | |
| | | policer. The required control protocol(s) are outside the scope of | |
| | | this document, but will require definition elsewhere. | |
| | | | |
| | | The policer is likely to be local to the sender and inline, probably | |
| | | at the ingress interface to the internetwork. So, discovery should | |
| | | not be hard. A variety of control protocols already exist for some | |
| | | widely used rate-responses to congestion. For instance DCCP | |
| | | congestion control identifiers (CCIDs [RFC4340]) fulfil this role and | |
| | | so does QoS signalling (e.g. an RSVP request for controlled load | |
| | | service is equivalent to a request for no rate response to | |
| | | congestion, but with admission control). | |
| | | | |
| | | 5.6. IP in IP Tunnels | |
| | | | |
| | | For re-ECN to work correctly through IP in IP tunnels, it needs | |
| | | slightly different tunnel handling to regular ECN [RFC3168]. | |
| | | Currently there is some inconsistency between how the handling of IP | |
| | | in IP tunnels is defined in [RFC3168] and how it is defined in | |
| | | [RFC4301], but re-ECN would work fine with the IPsec behaviour. This | |
| | | inconsistency is addressed in a new Internet Draft [ECN-tunnel] that | |
| | | proposes to update RFC3168 tunnel behaviour to bring it into line | |
| | | with IPsec. Ideally, for re-ECN to work through a tunnel, the tunnel | |
| | | entry should copy both the RE flag and the ECN field from the inner | |
| | | to the outer IP header. Then at the tunnel exit, any congestion | |
| | | marking of the outer ECN field should overwrite the inner ECN field | |
| | | (unless the inner field is Not-ECT in which case an alarm should be | |
| | | raised). The RE flag shouldn't change along a path, so the outer RE | |
| | | flag should be the same as the inner. If it isn't, a management alarm | |
| | | should be raised. This behaviour is the same as the full- | |
| | | functionality variant of [RFC3168] at tunnel exit, but different at | |
| | | tunnel entry. | |
| | | | |
| | | If tunnels are left as they are specified in [RFC3168], whether the | |
| | | limited or full-functionality variants are used, a problem arises | |
| | | with re-ECN if a tunnel crosses an inter-domain boundary, because the | |
| | | difference between positive and negative markings will not be | |
| | | correctly accounted for. In a limited functionality ECN tunnel, the | |
| | | flow will appear to be RFC3168 compliant traffic, and therefore may | |
| | | be wrongly rate limited. In a full-functionality ECN tunnel, the | |
| | | result will depend on whether the tunnel entry copies the inner RE flag | |
| | | to the outer header or the RE flag in the outer header is always | |
| | | cleared. If the former, the flow will tend to be too positive when | |
| | | accounted for at borders. If the latter, it will be too negative. | |
| | | If the rules set out in [ECN-tunnel] are followed then this will not | |
| | | be an issue. | |
| | | | |
| | | 5.7. Non-Issues | |
| | | | |
| | | The following issues might seem to cause unfavourable interactions | |
| | | with re-ECN, but we will explain why they don't: | |
| | | | |
| | | o Various link layers support explicit congestion notification, such | |
| | | as Frame Relay and ATM. Explicit congestion notification is | |
| | | proposed to be added to other link layers, such as Ethernet | |
| | | (802.3ar Ethernet congestion management) and MPLS [RFC5129]; | |
| | | | |
| | | o Encryption and IPSec. | |
| | | | |
| | | In the case of congestion notification at the link layer, each | |
| | | particular link layer scheme either manages congestion on the link | |
| | | with its own link-level feedback (the usual arrangement in the cases | |
| | | of ATM and Frame Relay), or congestion notification from the link | |
| | | layer is merged into congestion notification at the IP level when the | |
| | | frame headers are decapsulated at the end of the link (the | |
| | | recommended arrangement in the Ethernet and MPLS cases). Given the | |
| | | RE flag is not intended to change along the path, this means that | |
| | | downstream congestion will still be measurable at any point where IP | |
| | | is processed on the path by subtracting positive from negative | |
| | | markings. | |
| | | | |
| | | In the case of encryption, as long as the tunnel issues described in | |
| | | Section 5.6 are dealt with, payload encryption itself will not be a | |
| | | problem. The design goal of re-ECN is to include downstream | |
| | | congestion in the IP header so that it is not necessary to bury into | |
| | | inner headers. Obfuscation of flow identifiers is not a problem for | |
| | | re-ECN policing elements. Re-ECN doesn't ever require flow | |
| | | identifiers to be valid, it only requires them to be unique. So if | |
| | | an IPSec encapsulating security payload (ESP [RFC4305]) or an | |
| | | authentication header (AH [RFC4302]) is used, the security parameters | |
| | | index (SPI) will be a sufficient flow identifier, as it is intended | |
| | | to be unique to a flow without revealing actual port numbers. | |
| | | | |
| | | In general, even if endpoints use some locally agreed scheme to hide | |
| | | port numbers, re-ECN policing elements can just consider the pair of | |
| | | source and destination IP addresses as the flow identifier. Re-ECN | |
| | | encourages endpoints to at least tell the network layer that a | |
| | | sequence of packets are all part of the same flow, if indeed they | |
| | | are. The alternative would be for the sender to make each packet | |
| | | appear to be a new flow, which would require them all to be marked | |
| | | FNE in order to avoid being treated with the bulk of malicious flows | |
| | | at the egress dropper. Given the FNE marking is worth +1 and | |
| | | networks are likely to rate limit FNE packets, endpoints are given an | |
| | | incentive not to set FNE on each packet. But if the sender really | |
| | | does want to hide the flow relationship between packets it can choose | |
| | | to pay the cost of multiple FNE packets, which in the long run will | |
| | | compensate for the extra memory required on network policing elements | |
| | | to process each flow. | |
| | | | |
| | | 6. Transport Layers | |
| | | | |
| | | 6.1. TCP | |
| | | | |
| Re-ECN capability at the sender is essential. At the receiver it is | | Re-ECN capability at the sender is essential. At the receiver it is | |
| optional, as long as the receiver has a basic RFC3168-compliant ECN- | | optional, as long as the receiver has a basic RFC3168-compliant ECN- | |
| capable transport (ECT) [RFC3168]. Given re-ECN is not the first | | capable transport (ECT) [RFC3168]. Given re-ECN is not the first | |
| attempt to define the semantics of the ECN field, we give a table | | attempt to define the semantics of the ECN field, we give a table | |
| below summarising what happens for various combinations of | | below summarising what happens for various combinations of | |
| capabilities of the sender S and receiver R, as indicated in the | | capabilities of the sender S and receiver R, as indicated in the | |
| first four columns below. The last column gives the mode a half- | | first four columns below. The last column gives the mode a half- | |
| connection should be in after the first two of the three TCP | | connection should be in after the first two of the three TCP | |
| handshakes. | | handshakes. | |
| | | | |
| skipping to change at page 17, line 5 | | skipping to change at page 22, line 40 | |
| at least one of the transports does not understand even basic ECN | | at least one of the transports does not understand even basic ECN | |
| marking. | | marking. | |
| | | | |
| Note that we use the term Re-ECT for a host transport that is re-ECN- | | Note that we use the term Re-ECT for a host transport that is re-ECN- | |
| capable but RECN for the modes of the half connections between hosts | | capable but RECN for the modes of the half connections between hosts | |
| when they are both Re-ECT. If a host transport is Re-ECT, this fact | | when they are both Re-ECT. If a host transport is Re-ECT, this fact | |
| alone does NOT imply either of its half connections will necessarily | | alone does NOT imply either of its half connections will necessarily | |
| be in RECN mode, at least not until it has confirmed that the other | | be in RECN mode, at least not until it has confirmed that the other | |
| host is Re-ECT. | | host is Re-ECT. | |
| | | | |
|
| 4.1.1. RECN mode: Full Re-ECN capable transport | | 6.1.1. RECN mode: Full Re-ECN capable transport | |
| | | | |
| In full RECN mode, for each half connection, both the sender and the | | In full RECN mode, for each half connection, both the sender and the | |
| receiver each maintain an unsigned integer counter we will call ECC | | receiver each maintain an unsigned integer counter we will call ECC | |
| (echo congestion counter). The receiver maintains a count of how | | (echo congestion counter). The receiver maintains a count of how | |
| many times a CE marked packet has arrived during the half-connection. | | many times a CE marked packet has arrived during the half-connection. | |
| Once a RECN connection is established, the three TCP option flags | | Once a RECN connection is established, the three TCP option flags | |
| (ECE, CWR & NS) used for ECN-related functions in other versions of | | (ECE, CWR & NS) used for ECN-related functions in other versions of | |
| ECN are used as a 3-bit field for the receiver to repeatedly tell the | | ECN are used as a 3-bit field for the receiver to repeatedly tell the | |
| sender the current value of ECC, modulo 8, whenever it sends a TCP | | sender the current value of ECC, modulo 8, whenever it sends a TCP | |
| ACK. We will call this the echo congestion increment (ECI) field. | | ACK. We will call this the echo congestion increment (ECI) field. | |
| This overloaded use of these 3 option flags as one 3-bit ECI field is | | This overloaded use of these 3 option flags as one 3-bit ECI field is | |
|
| shown in Figure 4. The actual definition of the TCP header, | | shown in Figure 7. The actual definition of the TCP header, | |
| including the addition of support for the ECN nonce, is shown for | | including the addition of support for the ECN nonce, is shown for | |
|
| comparison in Figure 3. This specification does not redefine the | | comparison in Figure 6. This specification does not redefine the | |
| names of these three TCP option flags, it merely overloads them with | | names of these three TCP option flags, it merely overloads them with | |
| another definition once a flow is established. | | another definition once a flow is established. | |
| | | | |
| 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | | 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | |
| +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | |
| | | | N | C | E | U | A | P | R | S | F | | | | | | N | C | E | U | A | P | R | S | F | | |
| | Header Length | Reserved | S | W | C | R | C | S | S | Y | I | | | | Header Length | Reserved | S | W | C | R | C | S | S | Y | I | | |
| | | | | R | E | G | K | H | T | N | N | | | | | | | R | E | G | K | H | T | N | N | | |
| +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | |
| | | | |
|
| Figure 3: The (post-ECN Nonce) definition of bytes 13 and 14 of the | | Figure 6: The (post-ECN Nonce) definition of bytes 13 and 14 of the | |
| TCP Header | | TCP Header | |
| | | | |
| 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | | 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | |
| +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | |
| | | | | U | A | P | R | S | F | | | | | | | U | A | P | R | S | F | | |
| | Header Length | Reserved | ECI | R | C | S | S | Y | I | | | | Header Length | Reserved | ECI | R | C | S | S | Y | I | | |
| | | | | G | K | H | T | N | N | | | | | | | G | K | H | T | N | N | | |
| +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | |
| | | | |
|
| Figure 4: Definition of the ECI field within bytes 13 and 14 of the | | Figure 7: Definition of the ECI field within bytes 13 and 14 of the | |
| TCP Header, overloading the current definitions above for established | | TCP Header, overloading the current definitions above for established | |
| RECN flows. | | RECN flows. | |
| | | | |
| Receiver Action in RECN Mode | | Receiver Action in RECN Mode | |
| | | | |
| Every time a CE marked packet arrives at a receiver in RECN mode, | | Every time a CE marked packet arrives at a receiver in RECN mode, | |
| the receiver transport increments its local value of ECC and MUST | | the receiver transport increments its local value of ECC and MUST | |
| echo its value, modulo 8, to the sender in the ECI field of the | | echo its value, modulo 8, to the sender in the ECI field of the | |
| next ACK. It MUST repeat the same value of ECI in every | | next ACK. It MUST repeat the same value of ECI in every | |
| subsequent ACK until the next CE event, when it increments ECI | | subsequent ACK until the next CE event, when it increments ECI | |
| | | | |
| skipping to change at page 18, line 30 | | skipping to change at page 24, line 22 | |
| below for the sender's safety strategy). Whenever the ECI field | | below for the sender's safety strategy). Whenever the ECI field | |
| increments by D (and/or d drops are detected), the sender MUST | | increments by D (and/or d drops are detected), the sender MUST | |
| clear the RE flag to "0" in the IP header of the next D' data | | clear the RE flag to "0" in the IP header of the next D' data | |
| packets it sends (where D' = D + d), effectively re-echoing each | | packets it sends (where D' = D + d), effectively re-echoing each | |
| single increment of ECI. Otherwise the data sender MUST send all | | single increment of ECI. Otherwise the data sender MUST send all | |
| data packets with RE set to "1". | | data packets with RE set to "1". | |
| | | | |
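The sender rule above can be sketched as a small state machine. This is a simplified illustration, not the full procedure: it omits the sender's safety strategy referred to in the text, takes the ECI increment D as the plain modulo-8 difference from the last ECI value seen, and assumes the starting ECI value is supplied by the caller. The class and method names are hypothetical.

```python
class ReEchoSender:
    """Tracks how many upcoming data packets must carry RE = 0."""

    def __init__(self, initial_eci: int = 0):
        self.last_eci = initial_eci % 8  # last ECI value seen in an ACK
        self.pending = 0                 # packets still to send with RE cleared

    def on_ack(self, eci: int, drops: int = 0) -> None:
        """Account for an ECI increment of D and d detected drops (D' = D + d)."""
        increment = (eci - self.last_eci) % 8  # the ECI field wraps modulo 8
        self.last_eci = eci
        self.pending += increment + drops

    def next_re_flag(self) -> int:
        """Return the RE flag for the next data packet: 0 re-echoes, 1 otherwise."""
        if self.pending > 0:
            self.pending -= 1
            return 0
        return 1
```

For instance, if ECI advances from 0 to 2 with no drops, the next two data packets are sent with RE cleared to "0" and subsequent packets with RE set to "1".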
| As a general rule, once a flow is established, as well as setting | | As a general rule, once a flow is established, as well as setting | |
| or clearing the RE flag as above, a data sender in RECN mode MUST | | or clearing the RE flag as above, a data sender in RECN mode MUST | |
| always set the ECN field to ECT(1). However, the settings of the | | always set the ECN field to ECT(1). However, the settings of the | |
|
| extended ECN field during flow start are defined in Section 4.1.4. | | extended ECN field during flow start are defined in Section 6.1.4. | |
| | | | |
| As we have already emphasised, the re-ECN protocol makes no | | As we have already emphasised, the re-ECN protocol makes no | |
| changes to, and has no effect on, the TCP congestion control algorithm. | | changes to, and has no effect on, the TCP congestion control algorithm. | |
| So, the first increment of ECI (or detection of a drop) in an RTT | | So, the first increment of ECI (or detection of a drop) in an RTT | |
| triggers the standard TCP congestion response, no more than one | | triggers the standard TCP congestion response, no more than one | |
| congestion response per round trip, as usual. However, the sender | | congestion response per round trip, as usual. However, the sender | |
| re-echoes every increment of ECI irrespective of RTTs. | | re-echoes every increment of ECI irrespective of RTTs. | |
| | | | |
| A TCP sender also acts as the receiver for the other half- | | A TCP sender also acts as the receiver for the other half- | |
| connection. The host will maintain two ECC values S.ECC and R.ECC | | connection. The host will maintain two ECC values S.ECC and R.ECC | |
| as sender and receiver respectively. Every TCP header sent by a | | as sender and receiver respectively. Every TCP header sent by a | |
| host in RECN mode will also repeat the prevailing value of R.ECC | | host in RECN mode will also repeat the prevailing value of R.ECC | |
| in its ECI field. If a sender in RECN mode has to retransmit a | | in its ECI field. If a sender in RECN mode has to retransmit a | |
| packet due to a suspected loss, the re-transmitted packet MUST | | packet due to a suspected loss, the re-transmitted packet MUST | |
| carry the latest prevailing value of R.ECC when it is re- | | carry the latest prevailing value of R.ECC when it is re- | |
| transmitted, which will not necessarily be the one it carried | | transmitted, which will not necessarily be the one it carried | |
| originally. | | originally. | |
| | | | |
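The sender rule above (clear RE on the next D' = D + d data packets, otherwise send with RE set) can be sketched as a small counter. This is an illustrative sketch only: the class name `ReEchoSender`, its method names, the `pending` counter and the assumption that the ECI value starts at 0 are hypothetical and are not defined by the draft.

```python
class ReEchoSender:
    """Hypothetical bookkeeping for the RE flag of a sender in RECN mode."""

    def __init__(self):
        self.last_eci = 0  # last ECI value seen in an ACK (assumed to start at 0)
        self.pending = 0   # data packets still to be sent with RE cleared to 0

    def on_ack(self, eci, drops=0):
        # D: apparent increment of the 3-bit ECI field, taken modulo 8
        marks = (eci - self.last_eci) % 8
        self.last_eci = eci
        # D' = D + d: every mark and every detected drop is re-echoed once
        self.pending += marks + drops

    def next_re_flag(self):
        """Return the RE flag value for the next data packet."""
        if self.pending > 0:
            self.pending -= 1
            return 0  # RE cleared: re-echo one congestion event
        return 1      # default: RE set


sender = ReEchoSender()
sender.on_ack(eci=2, drops=1)                      # D = 2, d = 1, so D' = 3
flags = [sender.next_re_flag() for _ in range(5)]  # three packets with RE = 0, then RE = 1
```

The sketch deliberately keeps re-echoing separate from congestion control: the counter grows on every ECI increment, whereas the TCP congestion response would still be limited to one per round trip.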
|
| 4.1.1.1. Drops and Marks | | 6.1.2. RECN-Co mode: Re-ECT Sender with a RFC3168 compliant ECN | |
| | | | |
| Re-ECN is based on the ECN protocol [RFC3168] . In turn the | | | |
| congestion markings ECN uses are typically based on the RED | | | |
| algorithm [RFC2309]. This algorithm marks packets as CE with a | | | |
| probability that increases as the size of the router queue increases. | | | |
| However, if the queue becomes too full then it will revert to | | | |
| dropping packets. Because of this it is important that a re-ECN | | | |
| sender treats each packet drop it detects as if it were actually a CE | | | |
| mark. This ensures that it can continue to correctly echo congestion | | | |
| even through a highly congested path. | | | |
| | | | |
| In order to ensure that drops are correctly echoed the sender needs | | | |
| to add the number of drops detected per RTT to the difference in ECI | | | |
| value waiting to be echoed. Drop detection is defined as set out in | | | |
| [RFC2581] -- if the connection is in slow start then a single | | | |
| duplicate acknowledgement will be treated as an indication of a drop. | | | |
| When the system is in the congestion avoidance stage then 3 duplicate | | | |
| acknowledgements will be treated as a sign of a drop. In all cases, | | | |
| if a re-transmission time-out occurs then that will be treated as a | | | |
| drop. | | | |
| | | | |
| 4.1.1.2. Safety against Long Pure ACK Loss Sequences | | | |
| | | | |
| The ECI method was chosen for echoing congestion marking because a | | | |
| re-ECN sender needs to know about every CE mark arriving at the | | | |
| receiver, not just whether at least one arrives within a round trip | | | |
| time (which is all the ECE/CWR mechanism supported). And, as pure | | | |
| ACKs are not protected by TCP reliable delivery, we repeat the same | | | |
| ECI value in every ACK until it changes. Even if many ACKs in a row | | | |
| are lost, as soon as one gets through, the ECI field it repeats from | | | |
| previous ACKs that didn't get through will update the sender on how | | | |
| many CE marks arrived since the last ACK got through. | | | |
| | | | |
| The sender will only lose a record of the arrival of a CE mark if all | | | |
| the ACKs are lost (and all of them were pure ACKs) for a stream of | | | |
| data long enough to contain 8 or more CE marks. So, if the marking | | | |
| fraction was p, at least 8/p pure ACKs would have to be lost. For | | | |
| example, if p was 5%, a sequence of 160 pure ACKs would all have to | | | |
| be lost. To protect against such extremely unlikely events, if a re- | | | |
| ECN sender detects a sequence of pure ACKs has been lost it SHOULD | | | |
| assume the ECI field wrapped as many times as possible within the | | | |
| sequence. | | | |
| | | | |
| Specifically, if a re-ECN sender receives an ACK with an | | | |
| acknowledgement number that acknowledges L segments since the | | | |
| previous ACK but with a sequence number unchanged from the previously | | | |
| received ACK, it SHOULD conservatively assume that the ECI field | | | |
| incremented by D' = L - ((L-D) mod 8), where D is the apparent | | | |
| increase in the ECI field. For example if the ACK arriving after 9 | | | |
| pure ACK losses apparently increased ECI by 2, the assumed increment | | | |
| of ECI would still be 2. But if ECI apparently increased by 2 after | | | |
| 11 pure ACK losses, ECI should be assumed to have increased by 10. | | | |
| | | | |
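The conservative rule D' = L - ((L-D) mod 8) and its two worked examples can be checked with a short sketch. The function name `assumed_eci_increment` and its parameter names are hypothetical, and the sketch assumes D does not exceed L (no more CE marks than newly acknowledged segments).

```python
def assumed_eci_increment(acked_segments, apparent_increment):
    """Return D' = L - ((L - D) mod 8).

    acked_segments     -- L, segments newly acknowledged by this ACK
    apparent_increment -- D, apparent increase of the 3-bit ECI field
    Assumes D <= L; the result is the largest value <= L that is
    congruent to D modulo 8, i.e. ECI is assumed to have wrapped as
    many times as possible.
    """
    L, D = acked_segments, apparent_increment
    return L - ((L - D) % 8)


# The two cases from the text
after_9 = assumed_eci_increment(9, 2)    # no wrap fits: stays at 2
after_11 = assumed_eci_increment(11, 2)  # one wrap fits: 2 + 8 = 10
```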
| A re-ECN sender MAY implement a heuristic algorithm to predict beyond | | | |
| reasonable doubt that the ECI field probably did not wrap within a | | | |
| sequence of lost pure ACKs. But such an algorithm is OPTIONAL. Such | | | |
| an algorithm MUST NOT be used unless it is proven to work even in the | | | |
| presence of correlation between high ACK loss rate on the back | | | |
| channel and high CE marking rate on the forward channel. | | | |
| | | | |
| Whatever assumption a re-ECN sender makes about potentially lost CE | | | |
| marks, both its congestion control and its re-echoing behaviour | | | |
| SHOULD be consistent with the assumption it makes. | | | |
| | | | |
| 4.1.2. RECN-Co mode: Re-ECT Sender with a RFC3168 compliant ECN | | | |
| Receiver | | Receiver | |
| | | | |
| If the half-connection is in RECN-Co mode, ECN feedback proceeds no | | If the half-connection is in RECN-Co mode, ECN feedback proceeds no | |
| differently to that of RFC3168 compliant ECN. In other words, the | | differently to that of RFC3168 compliant ECN. In other words, the | |
| receiver sets the ECE flag repeatedly in the TCP header and the | | receiver sets the ECE flag repeatedly in the TCP header and the | |
| sender responds by setting the CWR flag. Although RECN-Co mode is | | sender responds by setting the CWR flag. Although RECN-Co mode is | |
| used when the receiver has not implemented the re-ECN protocol, the | | used when the receiver has not implemented the re-ECN protocol, the | |
| sender can infer enough from its RFC3168 compliant ECN feedback to | | sender can infer enough from its RFC3168 compliant ECN feedback to | |
| set or clear the RE flag reasonably well. Specifically, every time | | set or clear the RE flag reasonably well. Specifically, every time | |
| the receiver toggles the ECE field from "0" to "1" (or a loss is | | the receiver toggles the ECE field from "0" to "1" (or a loss is | |
| | | | |
| skipping to change at page 20, line 45 | | skipping to change at page 25, line 19 | |
| packets with RE set to "1". Once a flow is established, a re-ECN | | packets with RE set to "1". Once a flow is established, a re-ECN | |
| data sender in RECN-Co mode MUST always set the ECN field to ECT(1). | | data sender in RECN-Co mode MUST always set the ECN field to ECT(1). | |
| | | | |
| If a CE marked packet arrives at the receiver within a round trip | | If a CE marked packet arrives at the receiver within a round trip | |
| time of a previous mark, the receiver will still be echoing ECE for | | time of a previous mark, the receiver will still be echoing ECE for | |
| the last CE mark. Therefore, such a mark will be missed by the | | the last CE mark. Therefore, such a mark will be missed by the | |
| sender. Of course, this isn't of concern for congestion control, but | | sender. Of course, this isn't of concern for congestion control, but | |
| it does mean that very occasionally the RE blanking fraction will be | | it does mean that very occasionally the RE blanking fraction will be | |
| understated. Therefore flows in RECN-Co mode may occasionally be | | understated. Therefore flows in RECN-Co mode may occasionally be | |
| mistaken for very lightly cheating flows and consequently might | | mistaken for very lightly cheating flows and consequently might | |
|
| suffer a small number of packet drops through an egress dropper | | suffer a small number of packet drops through an egress dropper. We | |
| (Section 6.1.4). We expect re-ECN would be deployed for some time | | expect re-ECN would be deployed for some time before policers and | |
| before policers and droppers start to enforce it. So, given there is | | droppers start to enforce it. So, given there is not much ECN | |
| not much ECN deployment yet anyway, this minor problem may affect | | deployment yet anyway, this minor problem may affect only a very | |
| only a very small proportion of flows, reducing to nothing over the | | small proportion of flows, reducing to nothing over the years as | |
| years as RFC3168 compliant ECN hosts upgrade. The use of RECN-Co | | RFC3168 compliant ECN hosts upgrade. The use of RECN-Co mode would | |
| mode would need to be reviewed in the light of experience at the time | | need to be reviewed in the light of experience at the time of re-ECN | |
| of re-ECN deployment. | | deployment. | |
| | | | |
| RECN-Co mode is OPTIONAL. Re-ECN implementers who want to keep their | | RECN-Co mode is OPTIONAL. Re-ECN implementers who want to keep their | |
| code simple MAY choose not to implement this mode. If they do not, | | code simple MAY choose not to implement this mode. If they do not, | |
| a re-ECN sender SHOULD fall back to RFC3168 compliant ECT mode in the | | a re-ECN sender SHOULD fall back to RFC3168 compliant ECT mode in the | |
| presence of an ECN-capable receiver. It MAY choose to fall back to | | presence of an ECN-capable receiver. It MAY choose to fall back to | |
| the ECT-Nonce mode, but if re-ECN implementers don't want to be | | the ECT-Nonce mode, but if re-ECN implementers don't want to be | |
| bothered with RECN-Co mode, they probably won't want to add an ECT- | | bothered with RECN-Co mode, they probably won't want to add an ECT- | |
| Nonce mode either. | | Nonce mode either. | |
| | | | |
|
| 4.1.2.1. Re-ECN support for the ECN Nonce | | 6.1.2.1. Re-ECN support for the ECN Nonce | |
| | | | |
| A TCP half-connection in RECN-Co mode MUST NOT support the ECN | | A TCP half-connection in RECN-Co mode MUST NOT support the ECN | |
| Nonce [RFC3540]. This means that the sending code of a re-ECN | | Nonce [RFC3540]. This means that the sending code of a re-ECN | |
| implementation will never need to include ECN Nonce support. Re-ECN | | implementation will never need to include ECN Nonce support. Re-ECN | |
| is intended to provide wider protection than the ECN nonce against | | is intended to provide wider protection than the ECN nonce against | |
| congestion control misbehaviour, and re-ECN only requires support | | congestion control misbehaviour, and re-ECN only requires support | |
| from the sender, therefore it is preferable to specifically rule out | | from the sender, therefore it is preferable to specifically rule out | |
| the need for dual sender implementations. As a consequence, a re-ECN | | the need for dual sender implementations. As a consequence, a re-ECN | |
| capable sender will never set ECT(0), so it will be easier for | | capable sender will never set ECT(0), so it will be easier for | |
| network elements to discriminate re-ECN traffic flows from other ECN | | network elements to discriminate re-ECN traffic flows from other ECN | |
| | | | |
| skipping to change at page 21, line 41 | | skipping to change at page 26, line 15 | |
| | | | |
| RFC3540 allows an ECN nonce sender to choose whether to sanction a | | RFC3540 allows an ECN nonce sender to choose whether to sanction a | |
| receiver that does not ever set the nonce sum. Given re-ECN is | | receiver that does not ever set the nonce sum. Given re-ECN is | |
| intended to provide wider protection than the ECN nonce against | | intended to provide wider protection than the ECN nonce against | |
| congestion control misbehaviour, implementers of re-ECN receivers MAY | | congestion control misbehaviour, implementers of re-ECN receivers MAY | |
| choose not to implement backwards compatibility with the ECN nonce | | choose not to implement backwards compatibility with the ECN nonce | |
| capability. This may be because they deem that the risk of sanctions | | capability. This may be because they deem that the risk of sanctions | |
| is low, perhaps because significant deployment of the ECN nonce seems | | is low, perhaps because significant deployment of the ECN nonce seems | |
| unlikely at implementation time. | | unlikely at implementation time. | |
| | | | |
|
| 4.1.3. Capability Negotiation | | 6.1.3. Capability Negotiation | |
| | | | |
| During the TCP hand-shake at the start of a connection, an originator | | During the TCP hand-shake at the start of a connection, an originator | |
| of the connection (host A) with a re-ECN-capable transport MUST | | of the connection (host A) with a re-ECN-capable transport MUST | |
| indicate it is Re-ECT by setting the TCP flags NS=1, CWR=1 and ECE=1 | | indicate it is Re-ECT by setting the TCP flags NS=1, CWR=1 and ECE=1 | |
| in the initial SYN. | | in the initial SYN. | |
| | | | |
| A responding Re-ECT host (host B) MUST return a SYN ACK with flags | | A responding Re-ECT host (host B) MUST return a SYN ACK with flags | |
| CWR=1 and ECE=0. The responding host MUST NOT set this combination | | CWR=1 and ECE=0. The responding host MUST NOT set this combination | |
| of flags unless the preceding SYN has already indicated Re-ECT | | of flags unless the preceding SYN has already indicated Re-ECT | |
| support as above. Normally a Re-ECT server (B) will reply to a Re- | | support as above. Normally a Re-ECT server (B) will reply to a Re- | |
| | | | |
| skipping to change at page 23, line 19 | | skipping to change at page 27, line 42 | |
| preceding SYN (because there is a broken RFC3168 compliant | | preceding SYN (because there is a broken RFC3168 compliant | |
| implementation that behaves this way), RFC3168 specifies that the | | implementation that behaves this way), RFC3168 specifies that the | |
| whole connection MUST revert to Not-ECT. | | whole connection MUST revert to Not-ECT. | |
| | | | |
| Also note that, whenever the SYN flag of a TCP segment is set | | Also note that, whenever the SYN flag of a TCP segment is set | |
| (including when the ACK flag is also set), the NS, CWR and ECE flags | | (including when the ACK flag is also set), the NS, CWR and ECE flags | |
| (i.e. the ECI field of the SYN ACK) MUST NOT be interpreted as the | | (i.e. the ECI field of the SYN ACK) MUST NOT be interpreted as the | |
| 3-bit ECI value, which is only set as a copy of the local ECC value | | 3-bit ECI value, which is only set as a copy of the local ECC value | |
| in non-SYN packets. | | in non-SYN packets. | |
| | | | |
|
| 4.1.4. Extended ECN (EECN) Field Settings during Flow Start or after | | 6.1.4. Extended ECN (EECN) Field Settings during Flow Start or after | |
| Idle Periods | | Idle Periods | |
| | | | |
| If the originator (A) of a TCP connection supports re-ECN it MUST set | | If the originator (A) of a TCP connection supports re-ECN it MUST set | |
| the extended ECN (EECN) field in the IP header of the initial SYN | | the extended ECN (EECN) field in the IP header of the initial SYN | |
| packet to the feedback not established (FNE) codepoint. | | packet to the feedback not established (FNE) codepoint. | |
| | | | |
| FNE is a new extended ECN codepoint defined by this specification | | FNE is a new extended ECN codepoint defined by this specification | |
|
| (Section 3.3). The feedback not established (FNE) codepoint is used | | (Section 4.2). The feedback not established (FNE) codepoint is used | |
| when the transport does not have the benefit of ECN feedback so it | | when the transport does not have the benefit of ECN feedback so it | |
| cannot decide whether to set or clear the RE flag. | | cannot decide whether to set or clear the RE flag. | |
| | | | |
| If after receiving a SYN the server B has set its sending half- | | If after receiving a SYN the server B has set its sending half- | |
| connection into RECN mode or RECN-Co mode, it MUST set the extended | | connection into RECN mode or RECN-Co mode, it MUST set the extended | |
| ECN field in the IP header of its SYN ACK to the feedback not | | ECN field in the IP header of its SYN ACK to the feedback not | |
| established (FNE) codepoint. Note the careful wording here, which | | established (FNE) codepoint. Note the careful wording here, which | |
| means that Re-ECT server B MUST set FNE on a SYN ACK whether it is | | means that Re-ECT server B MUST set FNE on a SYN ACK whether it is | |
| responding to a SYN from a Re-ECT client or from a client that is | | responding to a SYN from a Re-ECT client or from a client that is | |
| merely ECN-capable. This is because FNE indicates the transport is | | merely ECN-capable. This is because FNE indicates the transport is | |
| | | | |
| skipping to change at page 27, line 5 | | skipping to change at page 31, line 9 | |
| trip time. We use the lower bound of the retransmission timeout | | trip time. We use the lower bound of the retransmission timeout | |
| (RTO) [RFC2988], which is commonly used as the idle period before TCP | | (RTO) [RFC2988], which is commonly used as the idle period before TCP | |
| must reduce to the restart window [RFC2581]. Note our specification | | must reduce to the restart window [RFC2581]. Note our specification | |
| of re-ECN's idle period is NOT intended to change the idle period for | | of re-ECN's idle period is NOT intended to change the idle period for | |
| TCP's restart, nor indeed for any other purposes. | | TCP's restart, nor indeed for any other purposes. | |
| | | | |
| {ToDo: Describe how the sender falls back to RFC3168 modes if packets | | {ToDo: Describe how the sender falls back to RFC3168 modes if packets | |
| don't appear to be getting through (to work round firewalls | | don't appear to be getting through (to work round firewalls | |
| discarding packets they consider unusual).} | | discarding packets they consider unusual).} | |
| | | | |
|
| 4.1.5. Pure ACKS, Retransmissions, Window Probes and Partial ACKs | | 6.1.5. Pure ACKS, Retransmissions, Window Probes and Partial ACKs | |
| | | | |
| A re-ECN sender MUST clear the RE flag to "0" and set the ECN field | | A re-ECN sender MUST clear the RE flag to "0" and set the ECN field | |
| to Not-ECT in pure ACKs, retransmissions and window probes, as | | to Not-ECT in pure ACKs, retransmissions and window probes, as | |
| specified in [RFC3168]. Our eventual goal is for all packets to be | | specified in [RFC3168]. Our eventual goal is for all packets to be | |
| sent with re-ECN enabled, and we believe the semantics of the ECI | | sent with re-ECN enabled, and we believe the semantics of the ECI | |
| field go a long way towards being able to achieve this. However, we | | field go a long way towards being able to achieve this. However, we | |
| have not completed a full security analysis for these cases; | | have not completed a full security analysis for these cases; | |
| therefore we currently merely re-state current practice. | | therefore we currently merely re-state current practice. | |
| | | | |
| We must also reconcile the facts that congestion marking is applied | | We must also reconcile the facts that congestion marking is applied | |
| | | | |
| skipping to change at page 27, line 47 | | skipping to change at page 32, line 5 | |
| through the variable R. | | through the variable R. | |
| | | | |
| This does not ensure precisely the same number of octets have RE | | This does not ensure precisely the same number of octets have RE | |
| blanked as were CE marked. But we believe positive errors will | | blanked as were CE marked. But we believe positive errors will | |
| cancel negative over a long enough period. {ToDo: However, more | | cancel negative over a long enough period. {ToDo: However, more | |
| research is needed to prove whether this is so. If it is not, it may | | research is needed to prove whether this is so. If it is not, it may | |
| be necessary to increment and decrement R in octets rather than | | be necessary to increment and decrement R in octets rather than | |
| packets, by incrementing R as the product of D and the size in octets | | packets, by incrementing R as the product of D and the size in octets | |
| of packets being sent (typically the MSS).} | | of packets being sent (typically the MSS).} | |
| | | | |
|
| 4.2. Other Transports | | 6.2. Other Transports | |
| | | | |
|
| 4.2.1. General Guidelines for Adding Re-ECN to Other Transports | | 6.2.1. General Guidelines for Adding Re-ECN to Other Transports | |
| | | | |
| As a general rule, Re-ECT sender transports that have established the | | As a general rule, Re-ECT sender transports that have established the | |
| receiver transport is at least ECN-capable (not necessarily re-ECN | | receiver transport is at least ECN-capable (not necessarily re-ECN | |
| capable) MUST blank the RE codepoint for at least as many octets as | | capable) MUST blank the RE codepoint for at least as many octets as | |
| arrive at the receiver with the CE codepoint set. Re-ECN-capable sender | | arrive at the receiver with the CE codepoint set. Re-ECN-capable sender | |
| transports should always initialise the ECN field to the ECT(1) | | transports should always initialise the ECN field to the ECT(1) | |
| codepoint once a flow is established. | | codepoint once a flow is established. | |
| | | | |
| If the sender transport does not have sufficient feedback to even | | If the sender transport does not have sufficient feedback to even | |
| estimate the path's CE rate, it SHOULD set FNE continuously. If the | | estimate the path's CE rate, it SHOULD set FNE continuously. If the | |
| | | | |
| skipping to change at page 28, line 32 | | skipping to change at page 32, line 39 | |
| following: | | following: | |
| | | | |
| o UDP fire and forget (e.g. DNS) | | o UDP fire and forget (e.g. DNS) | |
| | | | |
| o UDP streaming with no feedback | | o UDP streaming with no feedback | |
| | | | |
| o UDP streaming with feedback | | o UDP streaming with feedback | |
| | | | |
| } | | } | |
| | | | |
|
| 4.2.2. Guidelines for adding Re-ECN to RSVP or NSIS | | 6.2.2. Guidelines for adding Re-ECN to RSVP or NSIS | |
| | | | |
| A separate I-D has been submitted [Re-PCN] describing how re-ECN can | | A separate I-D has been submitted [Re-PCN] describing how re-ECN can | |
| be used in an edge-to-edge rather than end-to-end scenario. It can | | be used in an edge-to-edge rather than end-to-end scenario. It can | |
| then be used by downstream networks to police whether upstream | | then be used by downstream networks to police whether upstream | |
| networks are blocking new flow reservations when downstream | | networks are blocking new flow reservations when downstream | |
| congestion is too high, even though the congestion is in other | | congestion is too high, even though the congestion is in other | |
| operators' downstream networks. This relates to current IETF work on | | operators' downstream networks. This relates to current IETF work on | |
| Admission Control over Diffserv using Pre-Congestion Notification | | Admission Control over Diffserv using Pre-Congestion Notification | |
| (PCN) [PCN-arch]. | | (PCN) [PCN-arch]. | |
| | | | |
|
| 4.2.3. Guidelines for adding Re-ECN to DCCP | | 6.2.3. Guidelines for adding Re-ECN to DCCP | |
| | | | |
| Besides adjusting the initial features negotiation sequence, operating | | Besides adjusting the initial features negotiation sequence, operating | |
| re-ECN in DCCP [RFC4340] could be achieved by defining a new option | | re-ECN in DCCP [RFC4340] could be achieved by defining a new option | |
| to be added to acknowledgments, that would include a multibit field | | to be added to acknowledgments, that would include a multibit field | |
| where the destination could copy its ECC. | | where the destination could copy its ECC. | |
| | | | |
|
| 4.2.4. Guidelines for adding Re-ECN to SCTP | | 6.2.4. Guidelines for adding Re-ECN to SCTP | |
| | | | |
| Appendix A in [RFC4960] gives the specifications for SCTP to support | | Appendix A in [RFC4960] gives the specifications for SCTP to support | |
| ECN. Similar steps should be taken to support re-ECN. Besides | | ECN. Similar steps should be taken to support re-ECN. Besides | |
| adjusting the initial features negotiation sequence, operating re-ECN | | adjusting the initial features negotiation sequence, operating re-ECN | |
| in SCTP could be achieved by defining a new control chunk, that would | | in SCTP could be achieved by defining a new control chunk, that would | |
| include a multibit field where the destination could copy its ECC. | | include a multibit field where the destination could copy its ECC. | |
| | | | |
|
| 5. Network Layer | | | |
| | | | |
| 5.1. Re-ECN IPv4 Wire Protocol | | | |
| | | | |
| The wire protocol of the ECN field in the IP header remains largely | | | |
| unchanged from [RFC3168]. However, an extension to the ECN field we | | | |
| call the RE (Re-ECN extension) flag (Section 3.3) is defined in this | | | |
| document. It doubles the extended ECN codepoint space, giving 8 | | | |
| potential codepoints. The semantics of the extra codepoints are | | | |
| backward compatible with the semantics of the 4 original codepoints | | | |
| [RFC3168] (Section 7.1 collects together and summarises all the | | | |
| changes defined in this document). | | | |
| | | | |
| For IPv4, this document proposes that the new RE control flag will be | | | |
| positioned where the `reserved' control flag was at bit 48 of the | | | |
| IPv4 header (counting from 0). Alternatively, some would call this | | | |
| bit 0 (counting from 0) of byte 7 (counting from 1) of the IPv4 | | | |
| header (Figure 5). | | | |
| | | | |
| 0 1 2 | | | |
| +---+---+---+ | | | |
| | R | D | M | | | | |
| | E | F | F | | | | |
| +---+---+---+ | | | |
| | | | |
| Figure 5: New Definition of the Re-ECN Extension (RE) Control Flag at | | | |
| the Start of Byte 7 of the IPv4 Header | | | |
| | | | |
| The semantics of the RE flag are described in outline in Section 3 | | | |
| and specified fully in Section 4. The RE flag is always considered | | | |
| in conjunction with the 2-bit ECN field, as if they were concatenated | | | |
| together to form a 3-bit extended ECN field. If the ECN field is set | | | |
| to either the ECT(1) or CE codepoint, when the RE flag is blanked | | | |
| (cleared to "0") it represents a re-echo of congestion experienced by | | | |
| an earlier packet. If the ECN field is set to the Not-ECT codepoint, | | | |
| when the RE flag is set to "1" it represents the feedback not | | | |
| established (FNE) codepoint, which signals that the packet was sent | | | |
| without the benefit of congestion feedback. | | | |
| | | | |
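As a rough illustration of the bit positions described above, the following sketch reads the 2-bit ECN field and the proposed RE flag from a raw IPv4 header and recognises the two combinations this paragraph describes. The function names and the labels "re-echo" and "other" are hypothetical; the ECN codepoint values are those of [RFC3168], and the remaining extended ECN codepoints are defined elsewhere in the draft, so they are not classified here.

```python
# ECN field codepoints as defined in RFC 3168
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def extended_ecn(ipv4_header):
    """Return (ecn, re_flag) from the first bytes of an IPv4 header."""
    ecn = ipv4_header[1] & 0b11          # two low-order bits of the second octet
    # Bit 48 counting from 0 is the top bit of octet index 6,
    # i.e. bit 0 of byte 7 when bytes are counted from 1.
    re_flag = (ipv4_header[6] >> 7) & 1
    return ecn, re_flag


def describe(ecn, re_flag):
    """Label only the combinations described in the paragraph above."""
    if ecn == NOT_ECT and re_flag == 1:
        return "FNE"       # feedback not established
    if ecn in (ECT_1, CE) and re_flag == 0:
        return "re-echo"   # RE blanked on ECT(1) or CE
    return "other"         # defined elsewhere in the draft


header = bytearray(20)     # minimal all-zero IPv4 header for illustration
header[6] = 0x80           # set the proposed RE flag, ECN field left as Not-ECT
label = describe(*extended_ecn(header))
```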
| It is believed that the FNE codepoint can simultaneously serve other | | | |
| purposes, particularly where the start of a flow needs distinguishing | | | |
| from packets later in the flow. For instance it would have been | | | |
| useful to identify new flows for tag switching and might enable | | | |
| similar developments in the future if it were adopted. It is similar | | | |
| to the state set-up bit idea designed to protect against memory | | | |
| exhaustion attacks. This idea was proposed informally by David Clark | | | |
| and documented by Handley and Greenhalgh [Steps_DoS]. The FNE | | | |
| codepoint can be thought of as a `soft-state set-up flag', because it | | | |
| is idempotent (i.e. one occurrence of the flag is sufficient but | | | |
| further occurrences achieve the same effect if previous ones were | | | |
| lost). | | | |
| | | | |
| There will probably be other claims pending on the use of | | | |
| bit 48. We know of at least two [ARI05], [RFC3514], but neither has | | | |
| been pursued in the IETF so far, although the present proposal would | | | |
| meet the needs of the former. | | | |
| | | | |
| The security flag proposal (commonly known as the evil bit) was | | | |
| published on 1 April 2003 as Informational RFC 3514, but it was not | | | |
| adopted due to confusion over whether evil-doers might set it | | | |
| inappropriately. The present proposal is backward compatible with | | | |
| RFC3514 because if re-ECN compliant senders were benign they would | | | |
| correctly clear the evil bit to honestly declare that they had just | | | |
| received congestion feedback. Whereas evil-doers would hide | | | |
| congestion feedback by setting the evil bit continuously, or at least | | | |
| more often than they should. So, evil senders can be identified, | | | |
| because they declare that they are good less often than they should. | | | |
| | | | |
| 5.2. Re-ECN IPv6 Wire Protocol | | | |
| | | | |
| For IPv6, this document proposes that the new RE control flag will be | | | |
| positioned as the first bit of the option field of a new Congestion | | | |
| hop by hop option header (Figure 6). | | | |
| | | | |
| 0 1 2 3 | | | |
| 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | | | |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | |
| | Next Header | Hdr ext Len | Option Type | Opt Length =4 | | | | |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | |
| |R| Reserved for future use | | | | |
| |E| | | | | |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | |
| | | | |
| Figure 6: Definition of a New IPv6 Congestion Hop by Hop Option | | | |
| Header containing the re-ECN Extension (RE) Control Flag | | | |
| | | | |
| 0 1 2 3 4 5 6 7 | | | |
| +-+-+-+-+-+-+-+-+ | | | |
| |AIU|C|Option ID| | | | |
| +-+-+-+-+-+-+-+-+ | | | |
| | | | |
| Figure 7: Congestion Hop by Hop Option Type Encoding | | | |
| | | | |
| The Hop-by-Hop Options header enables packets to carry information to | | | |
| be examined and processed by routers or nodes along the packet's | | | |
| delivery path, including the source and destination nodes. For re- | | | |
| ECN, the two bits of the Action If Unrecognized (AIU) flag of the | | | |
| Congestion extension header MUST be set to "00" meaning if | | | |
| unrecognized `skip over option and continue processing the header'. | | | |
| Then, any routers or a receiver not upgraded with the optional re-ECN | | | |
| features described in this memo will simply ignore this header. But | | | |
| routers with these optional re-ECN features or a re-ECN policing | | | |
| function, will process this Congestion extension header. | | | |
| | | | |
| The `C' flag MUST be set to "1" to specify that the Option Data | | | |
| (currently only the RE control flag) can change en-route to the | | | |
| packet's final destination. This ensures that, when an | | | |
| Authentication header (AH [RFC4302]) is present in the packet, for | | | |
| any option whose data may change en-route, its entire Option Data | | | |
| field will be treated as zero-valued octets when computing or | | | |
| verifying the packet's authenticating value. | | | |
| | | | |
| Although the RE control flag should not be changed along the path, we | | | |
| expect that the rest of this option field that is currently `Reserved | | | |
| for future use' could be used for a multi-bit congestion notification | | | |
| field which we would expect to change en route. As the RE flag does | | | |
| not need end-to-end authentication, we set the C flag to '1'. | | | |
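As an illustration of the Option Type layout in Figure 7, the following minimal Python sketch packs and unpacks the octet with AIU set to "00" and the C flag set to "1", as required above. The Option ID value is a hypothetical placeholder, since no value has been registered with IANA; the function names are also assumptions made for this example.

```python
# Sketch of the Figure 7 layout: AIU (2 bits) | C (1 bit) | Option ID (5 bits).
AIU_SKIP = 0b00      # skip over option and continue processing the header
C_MAY_CHANGE = 1     # Option Data may change en route

# Hypothetical placeholder only: no Option ID is registered for this option.
HYPOTHETICAL_OPTION_ID = 0b11110


def encode_option_type(option_id: int) -> int:
    """Return the 8-bit Option Type for the Congestion option."""
    if not 0 <= option_id < 32:
        raise ValueError("Option ID must fit in 5 bits")
    return (AIU_SKIP << 6) | (C_MAY_CHANGE << 5) | option_id


def decode_option_type(octet: int) -> tuple:
    """Split an Option Type octet into (AIU, C, Option ID)."""
    return (octet >> 6) & 0b11, (octet >> 5) & 0b1, octet & 0b11111


option_type = encode_option_type(HYPOTHETICAL_OPTION_ID)
```

A node that does not recognise the option reads AIU as 00 and skips it, while the C bit tells AH processing to treat the Option Data as zero-valued octets.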
| | | | |
| {ToDo: A Congestion Hop by Hop Option ID will need to be registered | | | |
| with IANA.} | | | |
| | | | |
| 5.3. Router Forwarding Behaviour | | | |
| | | | |
| Re-ECN works well without modifying the forwarding behaviour of any | | | |
| routers. However, below, two OPTIONAL changes to forwarding | | | |
| behaviour are defined which respectively enhance performance and | | | |
| improve a router's discrimination against flooding attacks. They are | | | |
| both OPTIONAL additions that we propose MAY apply by default to all | | | |
| Diffserv per-hop scheduling behaviours (PHBs) [RFC2475] and ECN | | | |
| marking behaviours [RFC3168]. Specifications for PHBs MAY define | | | |
| different forwarding behaviours from this default, but this is not | | | |
| required. [Re-PCN] is one example. | | | |
| | | | |
| FNE indicates ECT: | | | |
| | | | |
| The FNE codepoint tells a router to assume that the packet was | | | |
| sent by an ECN-capable transport (see Section 5.4). Therefore an | | | |
| FNE packet MAY be marked rather than dropped. Note that the FNE | | | |
| codepoint has been intentionally chosen so that, to RFC3168 | | | |
| compliant routers (which do not inspect the RE flag) an FNE packet | | | |
| appears to be Not-ECT so it will be dropped by legacy AQM | | | |
| algorithms. | | | |
| | | | |
| A network operator MUST NOT configure a queue to ECN mark rather | | | |
| than drop FNE packets unless it can guarantee that FNE packets | | | |
| will be rate limited, either locally or upstream. The ingress | | | |
| policers discussed in Section 6.1.5 would count as rate limiters | | | |
| for this purpose. | | | |
| | | | |
| Preferential Drop: If a re-ECN capable router queue experiences very | | | |
| high load so that it has to drop arriving packets (e.g. a DoS | | | |
| attack), it MAY preferentially drop packets within the same | | | |
| Diffserv PHB using the preference order for extended ECN | | | |
| codepoints given in Table 7. Preferential dropping can be | | | |
| difficult to implement on some hardware, but if feasible it would | | | |
| discriminate against attack traffic if done as part of the overall | | | |
| policing framework of Section 6.1.3. If nowhere else, routers at | | | |
| the egress of a network SHOULD implement preferential drop | | | |
| (stronger than the MAY above). For simplicity, preferences 4 & 5 | | | |
| MAY be merged into one preference level. | | | |
| | | | |
| +-------+-----+------------+-------+------------+-------------------+ | | | |
| | ECN | RE | Extended | Worth | Drop Pref | Re-ECN meaning | | | | |
| | field | bit | ECN | | (1 = drop | | | | | |
| | | | codepoint | | 1st) | | | | | |
| +-------+-----+------------+-------+------------+-------------------+ | | | |
| | 01 | 0 | Re-Echo | +1 | 5/4 | Re-echoed | | | | |
| | | | | | | congestion and | | | | |
| | | | | | | RECT | | | | |
| | 00 | 1 | FNE | +1 | 4 | Feedback not | | | | |
| | | | | | | established | | | | |
| | 11 | 0 | CE(0) | 0 | 3 | Re-Echo canceled | | | | |
| | | | | | | by congestion | | | | |
| | | | | | | experienced | | | | |
| | 01 | 1 | RECT | 0 | 3 | Re-ECN capable | | | | |
| | | | | | | transport | | | | |
| | 11 | 1 | CE(-1) | -1 | 3 | Congestion | | | | |
| | | | | | | experienced | | | | |
| | 10 | 1 | --CU-- | n/a | 2 | Currently Unused | | | | |
| | 10 | 0 | --- | n/a | 2 | RFC3168 ECN use | | | | |
| | | | | | | only | | | | |
| | 00 | 0 | Not-RECT | n/a | 1 | Not | | | | |
| | | | | | | Re-ECN-capable | | | | |
| | | | | | | transport | | | | |
| +-------+-----+------------+-------+------------+-------------------+ | | | |
| | | | |
| Table 7: Drop Preference of EECN Codepoints (Sorted by `Worth') | | | |
| | | | |
| The above drop preferences are arranged to preserve packets with | | | |
| more positive worth (Section 3.5), given senders of positive | | | |
| packets must have honestly declared downstream congestion. This | | | |
| is explained fully in Section 6 on applications, particularly when | | | |
| the application of re-ECN to protect against DDoS attacks is | | | |
| described. | | | |
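The drop preferences in Table 7 can be sketched as a lookup keyed on the (ECN field, RE bit) pair. This is only an illustration, not a definitive queue implementation: the names `EECN` and `pick_victim` are assumptions, Re-Echo is recorded with preference 5 (the table allows preferences 4 and 5 to be merged), and the tie-break of dropping the most recent arrival is an assumption that the text does not specify. All packets passed in are assumed to belong to the same Diffserv PHB.

```python
from typing import List, NamedTuple, Optional, Tuple


class Codepoint(NamedTuple):
    name: str
    worth: Optional[int]   # None where Table 7 says n/a
    drop_pref: int         # 1 = drop first


# Key is (ECN field, RE bit), transcribed from Table 7.
EECN = {
    (0b01, 0): Codepoint("Re-Echo", +1, 5),   # "5/4" in the table
    (0b00, 1): Codepoint("FNE", +1, 4),
    (0b11, 0): Codepoint("CE(0)", 0, 3),
    (0b01, 1): Codepoint("RECT", 0, 3),
    (0b11, 1): Codepoint("CE(-1)", -1, 3),
    (0b10, 1): Codepoint("--CU--", None, 2),
    (0b10, 0): Codepoint("---", None, 2),     # RFC3168 ECN use only
    (0b00, 0): Codepoint("Not-RECT", None, 1),
}


def pick_victim(queue: List[Tuple[int, int]]) -> int:
    """Return the index of the packet to drop from an overloaded queue.

    The packet with the lowest drop preference number goes first; among
    equals, the latest arrival is chosen (an assumption of this sketch).
    """
    return min(range(len(queue)),
               key=lambda i: (EECN[queue[i]].drop_pref, -i))


queue = [(0b01, 0), (0b00, 0), (0b11, 1), (0b00, 0)]
victim = pick_victim(queue)
```

With this queue the two Not-RECT packets are the first candidates for drop, and the Re-Echo packet is the last.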
| | | | |
| 5.4. Justification for Setting the First SYN to FNE | | | |
| | | | |
| The initial SYN MUST be set to FNE by Re-ECT client A (Section 4.1.4), | | | |
| and Section 5.3 says a queue MAY optionally treat an FNE packet as | | | |
| ECN-capable, so an initial SYN may be marked CE(-1) rather than | | | |
| dropped. This seems dangerous, because the sender has not yet | | | |
| established whether the receiver is an RFC3168 one that does not | | | |
| understand congestion marking. It also seems to allow malicious | | | |
| senders to take advantage of ECN marking to avoid so much drop when | | | |
| launching SYN flooding attacks. Below we explain the features of the | | | |
| protocol design that remove both these dangers. | | | |
| | | | |
| ECN-capable initial SYN with a Not-ECT server: If the TCP server B | | | |
| is re-ECN capable, provision is made for it to feed back a possible | | | |
| congestion-marked SYN in the SYN ACK (Section 4.1.4). But if the | | | |
| TCP client A finds out from the SYN ACK that the server was not | | | |
| ECN-capable, the TCP client MUST conservatively consider the first | | | |
| SYN as congestion marked before setting itself into Not-ECT mode. | | | |
| Section 4.1.4 mandates that such a TCP client MUST also set its | | | |
| initial window to 1 segment. In this way we remove the need to | | | |
| cautiously avoid setting the first SYN to Not-RECT. This will | | | |
| give worse performance while deployment is patchy, but better | | | |
| performance once deployment is widespread. | | | |
| | | | |
| SYN flooding attacks can't exploit ECN-capability: Malicious hosts | | | |
| may think they can use the advantage that ECN-marking gives over | | | |
| drop in launching classic SYN-flood attacks. But Section 5.3 | | | |
| mandates that a router MUST only be configured to treat packets | | | |
| with the FNE codepoint as ECN-capable if FNE packets are rate | | | |
| limited somewhere. Introduction of the FNE codepoint was a | | | |
| deliberate move to enable transport-neutral handling of flow-start | | | |
| and flow state set-up in the IP layer where it belongs. It then | | | |
| becomes possible to protect against flooding attacks of all forms | | | |
| (not just SYN flooding) without transport-specific inspection for | | | |
| things like the SYN flag in TCP headers. Then, for instance, SYN | | | |
| flooding attacks using IPSec ESP encryption can also be rate | | | |
| limited at the IP layer. | | | |
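The client-side fallback described above for a server that is not ECN-capable can be sketched as follows. This is a minimal sketch covering only what this section states; the type and function names are assumptions, and the behaviour when the server is ECN-capable is defined in Section 4.1.4 and is deliberately left unmodelled here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClientState:
    mode: str
    initial_window_segments: Optional[int]   # None = not modelled here
    first_syn_counted_as_marked: bool


def on_syn_ack(server_ecn_capable: bool, syn_mark_echoed: bool) -> ClientState:
    """Decide the client's state once the SYN ACK has been received."""
    if not server_ecn_capable:
        # Conservatively assume the FNE SYN was congestion marked, fall
        # back to Not-ECT mode and restrict the initial window to 1 segment.
        return ClientState("Not-ECT", 1, True)
    # The server can feed back whether the SYN was marked; the resulting
    # window is governed by Section 4.1.4 and is not modelled in this sketch.
    return ClientState("ECN-capable", None, syn_mark_echoed)


fallback = on_syn_ack(server_ecn_capable=False, syn_mark_echoed=False)
```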
| | | | |
| It might seem pedantic going to all this trouble to enable ECN on the | | | |
| initial packet of a flow, but it is motivated by a much wider concern | | | |
| to ensure safe congestion control will still be possible even if the | | | |
| application mix evolves to the point where the majority of flows | | | |
| consist of a single window or even a single packet. It also allows | | | |
| denial of service attacks to be more easily isolated and prevented. | | | |
| | | | |
| 5.5. Control and Management | | | |
| | | | |
| 5.5.1. Negative Balance Warning | | | |
| | | | |
| A new ICMP message type is being considered so that a dropper can | | | |
| warn the apparent sender of a flow that it has started to sanction | | | |
| the flow. The message would have similar semantics to the `Time | | | |
| exceeded' ICMP message type. To ensure the sender has to invest some | | | |
| work before the network will generate such a message, a dropper | | | |
| SHOULD only send such a message for flows that have demonstrated that | | | |
| they have started correctly by establishing a positive record, but | | | |
| have later gone negative. The threshold is up to the implementation. | | | |
| The purpose of the message is to deconfuse the cause of drops from | | | |
| other causes, such as congestion or transmission losses. The dropper | | | |
| would send the message to the sender of the flow, not the receiver. | | | |
| | | | |
| If we did define this message type, it would be REQUIRED for all re- | | | |
| ECT senders to parse and understand it. Note that a sender MUST only | | | |
| use this message to explain why losses are occurring. A sender MUST | | | |
| NOT take this message to mean that losses have occurred that it was | | | |
| not aware of. Otherwise, spoof messages could be sent by malicious | | | |
| sources to slow down a sender (cf. ICMP source quench). | | | |
| | | | |
| However, the need for this message type is not yet confirmed, as we | | | |
| are considering how to prevent it being used by malicious senders to | | | |
| scan for droppers and to test their threshold settings. {ToDo: | | | |
| Complete this section.} | | | |
| | | | |
| 5.5.2. Rate Response Control | | | |
| | | | |
| As discussed in Section 6.1.5 the sender's access operator will be | | | |
| expected to use bulk per-user policing, but they might choose to | | | |
| introduce a per-flow policer. In cases where operators do introduce | | | |
| per-flow policing, there may be a need for a sender to send a request | | | |
| to the ingress policer asking for permission to apply a non-default | | | |
| response to congestion (where TCP-friendly is assumed to be the | | | |
| default). This would require the sender to know what message | | | |
| format(s) to use and to be able to discover how to address the | | | |
| policer. The required control protocol(s) are outside the scope of | | | |
| this document, but will require definition elsewhere. | | | |
| | | | |
| The policer is likely to be local to the sender and inline, probably | | | |
| at the ingress interface to the internetwork. So, discovery should | | | |
| not be hard. A variety of control protocols already exist for some | | | |
| widely used rate-responses to congestion. For instance, DCCP | | | |
| congestion control identifiers (CCIDs [RFC4340]) fulfil this role and | | | |
| so does QoS signalling (e.g. an RSVP request for controlled load | | | |
| service is equivalent to a request for no rate response to | | | |
| congestion, but with admission control). | | | |
| | | | |
| 5.6. IP in IP Tunnels | | | |
| | | | |
| For re-ECN to work correctly through IP in IP tunnels, it needs | | | |
| slightly different tunnel handling to regular ECN [RFC3168]. | | | |
| Currently there is some inconsistency between how the handling of IP | | | |
| in IP tunnels is defined in [RFC3168] and how it is defined in | | | |
| [RFC4301], but re-ECN would work fine with the IPsec behaviour. This | | | |
| inconsistency is addressed in a new Internet Draft [ECN-tunnel] that | | | |
| proposes to update RFC3168 tunnel behaviour to bring it into line | | | |
| with IPsec. Ideally, for re-ECN to work through a tunnel, the tunnel | | | |
| entry should copy both the RE flag and the ECN field from the inner | | | |
| to the outer IP header. Then at the tunnel exit, any congestion | | | |
| marking of the outer ECN field should overwrite the inner ECN field | | | |
| (unless the inner field is Not-ECT in which case an alarm should be | | | |
| raised). The RE flag shouldn't change along a path, so the outer RE | | | |
| flag should be the same as the inner. If it isn't, a management alarm | | | |
| should be raised. This behaviour is the same as the full- | | | |
| functionality variant of [RFC3168] at tunnel exit, but different at | | | |
| tunnel entry. | | | |
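The ideal tunnel behaviour described above can be sketched in Python as follows. This is an illustrative sketch only: the `Header` type and function names are assumptions, alarms are returned as strings rather than raised, and the Not-ECT check follows the text literally by testing only the inner ECN field (it does not special-case FNE, which also carries ECN field 00).

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

NOT_ECT, CE = 0b00, 0b11


@dataclass(frozen=True)
class Header:
    ecn: int   # 2-bit ECN field
    re: int    # RE flag


def tunnel_entry(inner: Header) -> Header:
    """Copy both the ECN field and the RE flag to the outer header."""
    return Header(ecn=inner.ecn, re=inner.re)


def tunnel_exit(outer: Header, inner: Header) -> Tuple[Header, List[str]]:
    """Return the forwarded inner header and any management alarms."""
    alarms = []
    forwarded = inner
    if outer.re != inner.re:
        alarms.append("RE flag differs between outer and inner header")
    if outer.ecn == CE:
        if inner.ecn == NOT_ECT:
            # The inner header is left unchanged in this sketch.
            alarms.append("outer header marked CE but inner is Not-ECT")
        else:
            forwarded = replace(inner, ecn=CE)
    return forwarded, alarms


inner = Header(ecn=0b01, re=1)                 # RECT
outer = replace(tunnel_entry(inner), ecn=CE)   # marked inside the tunnel
forwarded, alarms = tunnel_exit(outer, inner)
```

In this example the RECT packet leaves the tunnel as CE(-1), so the negative marking applied inside the tunnel remains visible downstream.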
| | | | |
| If tunnels are left as they are specified in [RFC3168], whether the | | | |
| limited or full-functionality variants are used, a problem arises | | | |
| with re-ECN if a tunnel crosses an inter-domain boundary, because the | | | |
| difference between positive and negative markings will not be | | | |
| correctly accounted for. In a limited functionality ECN tunnel, the | | | |
| flow will appear to be RFC3168 compliant traffic, and therefore may | | | |
| be wrongly rate limited. In a full-functionality ECN tunnel, the | | | |
| result will depend on whether the tunnel entry copies the inner RE | | | |
| flag | | | |
| to the outer header or the RE flag in the outer header is always | | | |
| cleared. If the former, the flow will tend to be too positive when | | | |
| accounted for at borders. If the latter, it will be too negative. | | | |
| If the rules set out in [ECN-tunnel] are followed then this will not | | | |
| be an issue. | | | |
| | | | |
| 5.7. Non-Issues | | | |
| | | | |
| The following issues might seem to cause unfavourable interactions | | | |
| with re-ECN, but we will explain why they don't: | | | |
| | | | |
| o Various link layers support explicit congestion notification, such | | | |
| as Frame Relay and ATM. Explicit congestion notification is | | | |
| proposed to be added to other link layers, such as Ethernet | | | |
| (802.3ar Ethernet congestion management) and MPLS [RFC5129]; | | | |
| | | | |
| o Encryption and IPSec. | | | |
| | | | |
| In the case of congestion notification at the link layer, each | | | |
| particular link layer scheme either manages congestion on the link | | | |
| with its own link-level feedback (the usual arrangement in the cases | | | |
| of ATM and Frame Relay), or congestion notification from the link | | | |
| layer is merged into congestion notification at the IP level when the | | | |
| frame headers are decapsulated at the end of the link (the | | | |
| recommended arrangement in the Ethernet and MPLS cases). Given the | | | |
| RE flag is not intended to change along the path, this means that | | | |
| downstream congestion will still be measurable at any point where IP | | | |
| is processed on the path by subtracting negative from positive | | | |
| markings. | | | |
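A bulk measurement of this kind, taken at any point where the IP header is visible, might look like the following Python sketch. It assumes each observed packet is summarised as a (size in bytes, worth) pair, with worth taken from Table 7, and that the result is expressed as a fraction of the observed volume; the function name and this input format are assumptions made for illustration.

```python
from typing import Iterable, Tuple


def downstream_congestion(packets: Iterable[Tuple[int, int]]) -> float:
    """Return (positive volume - negative volume) / total volume.

    Each packet is (size_bytes, worth) with worth in {+1, 0, -1}.
    Packets whose worth is n/a in Table 7 should be left out by the caller.
    """
    total = positive = negative = 0
    for size, worth in packets:
        total += size
        if worth > 0:
            positive += size
        elif worth < 0:
            negative += size
    return (positive - negative) / total if total else 0.0


# 100 packets of 1000 bytes: 3 positive, 1 negative, 96 neutral.
sample = [(1000, +1)] * 3 + [(1000, -1)] * 1 + [(1000, 0)] * 96
fraction = downstream_congestion(sample)
```

Here the sender declared 3% path congestion and 1% has been marked upstream of the measurement point, leaving 2% expected downstream. No flow identifiers are needed, so the count can be kept in bulk.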
| | | | |
| In the case of encryption, as long as the tunnel issues described in | | | |
| Section 5.6 are dealt with, payload encryption itself will not be a | | | |
| problem. The design goal of re-ECN is to include downstream | | | |
| congestion in the IP header so that it is not necessary to burrow into | | | |
| inner headers. Obfuscation of flow identifiers is not a problem for | | | |
| re-ECN policing elements. Re-ECN doesn't ever require flow | | | |
| identifiers to be valid, it only requires them to be unique. So if | | | |
| an IPSec encapsulating security payload (ESP [RFC4305]) or an | | | |
| authentication header (AH [RFC4302]) is used, the security parameters | | | |
| index (SPI) will be a sufficient flow identifier, as it is intended | | | |
| to be unique to a flow without revealing actual port numbers. | | | |
| | | | |
| In general, even if endpoints use some locally agreed scheme to hide | | | |
| port numbers, re-ECN policing elements can just consider the pair of | | | |
| source and destination IP addresses as the flow identifier. Re-ECN | | | |
| encourages endpoints to at least tell the network layer that a | | | |
| sequence of packets are all part of the same flow, if indeed they | | | |
| are. The alternative would be for the sender to make each packet | | | |
| appear to be a new flow, which would require them all to be marked | | | |
| FNE in order to avoid being treated with the bulk of malicious flows | | | |
| at the egress dropper. Given the FNE marking is worth +1 and | | | |
| networks are likely to rate limit FNE packets, endpoints are given an | | | |
| incentive not to set FNE on each packet. But if the sender really | | | |
| does want to hide the flow relationship between packets it can choose | | | |
| to pay the cost of multiple FNE packets, which in the long run will | | | |
| compensate for the extra memory required on network policing elements | | | |
| to process each flow. | | | |
| | | | |
| 6. Applications | | | |
| | | | |
| 6.1. Policing Congestion Response | | | |
| | | | |
| 6.1.1. The Policing Problem | | | |
| | | | |
| The current Internet architecture trusts hosts to respond voluntarily | | | |
| to congestion. Limited evidence shows that the large majority of | | | |
| end-points on the Internet comply with a TCP-friendly response to | | | |
| congestion. But telephony (and increasingly video) services over the | | | |
| best effort Internet are attracting the interest of major commercial | | | |
| operations. Most of these applications do not respond to congestion | | | |
| at all. Those that can switch to lower-rate codecs still have a | | | |
| lower bound below which they must become unresponsive to congestion. | | | |
| | | | |
| Of course, the Internet is intended to support many different | | | |
| application behaviours. But the problem is that this freedom can be | | | |
| exercised irresponsibly. The greater problem is that we will never | | | |
| be able to agree on where the boundary is between responsible and | | | |
| irresponsible. Therefore re-ECN is designed to allow different | | | |
| networks to set their own view of the limit to irresponsibility, and | | | |
| to allow networks that choose a more conservative limit to push back | | | |
| against congestion caused in more liberal networks. | | | |
| | | | |
| As an example of the impossibility of setting a standard for | | | |
| fairness, mandating TCP-friendliness would set the bar too high for | | | |
| unresponsive streaming media, but still some would say the bar was | | | |
| too low. Even though all known peer-to-peer filesharing applications | | | |
| are TCP-compatible, they can cause a disproportionate amount of | | | |
| congestion, simply by using multiple flows and by transferring data | | | |
| continuously relative to other short-lived sessions. On the other | | | |
| hand, if we swung the other way and set the bar low enough to allow | | | |
| streaming media to be unresponsive, we would also allow denial of | | | |
| service attacks, which are typically unresponsive to congestion and | | | |
| consist of multiple continuous flows. | | | |
| | | | |
| Applications that need (or choose) to be unresponsive to congestion | | | |
| can effectively take (some would say steal) whatever share of | | | |
| bottleneck resources they want from responsive flows. Whether or not | | | |
| such free-riding is common, inability to prevent it increases the | | | |
| risk of poor returns for investors in network infrastructure, leading | | | |
| to under-investment. An increasing proportion of unresponsive or | | | |
| free-riding demand coupled with persistent under-supply is a broken | | | |
| economic cycle. Therefore, if the current, largely co-operative | | | |
| consensus continues to erode, congestion collapse could become more | | | |
| common in more areas of the Internet [RFC3714]. | | | |
| | | | |
| While we have designed re-ECN so that networks can choose to deploy | | | |
| stringent policing, this does not imply we advocate that every | | | |
| network should introduce tight controls on those that cause | | | |
| congestion. Re-ECN has been specifically designed to allow different | | | |
| networks to choose how conservative or liberal they wish to be with | | | |
| respect to policing congestion. But those that choose to be | | | |
| conservative can protect themselves from the excesses that liberal | | | |
| networks allow their users. | | | |
| | | | |
| 6.1.2. The Case Against Bottleneck Policing | | | |
| | | | |
| The state of the art in rate policing is the bottleneck policer, | | | |
| which is intended to be deployed at any forwarding resource that may | | | |
| become congested. Its aim is to detect flows that cause | | | |
| significantly more local congestion than others. Although operators | | | |
| might solve their immediate problems by deploying bottleneck | | | |
| policers, we are concerned that widespread deployment would make it | | | |
| extremely hard to evolve new application behaviours. We believe the | | | |
| IETF should offer re-ECN as the preferred protocol on which to base | | | |
| solutions to the policing problems of operators, because it would not | | | |
| harm evolvability and, frankly, it would be far more effective (see | | | |
| later for why). | | | |
| | | | |
| Schemes like [XCHOKe] & [pBox] are nice approaches for rate | | | |
| policing traffic without the benefit of whole path information (such | | | |
| as could be provided by re-ECN). But they must be deployed at | | | |
| bottlenecks in order to work. Unfortunately, a large proportion of | | | |
| traffic traverses at least two bottlenecks (in two access networks), | | | |
| particularly with the current traffic mix where peer-to-peer file- | | | |
| sharing is prevalent. If ECN were deployed, we believe it would be | | | |
| likely that these bottleneck policers would be adapted to combine ECN | | | |
| congestion marking from the upstream path with local congestion | | | |
| knowledge. But then the only useful placement for such policers | | | |
| would be close to the egress of the internetwork. | | | |
| | | | |
| But then, if these bottleneck policers were widely deployed (which | | | |
| would require them to be more effective than they are now), the | | | |
| Internet would find itself with one universal rate adaptation policy | | | |
| (probably TCP-friendliness) embedded throughout the network. Given | | | |
| TCP's congestion control algorithm is already known to be hitting its | | | |
| scalability limits and new algorithms are being developed for high- | | | |
| speed congestion control, embedding TCP policing into the Internet | | | |
| would make evolution to new algorithms extremely painful. If a | | | |
| source wanted to use a different algorithm, it would have to first | | | |
| discover then negotiate with all the policers on its path, | | | |
| particularly those in the far access network. The IETF has already | | | |
| traveled that path with the Intserv architecture and found it | | | |
| constrains scalability [RFC2208]. | | | |
| | | | |
| Anyway, if bottleneck policers were ever widely deployed, they would | | | |
| be likely to be bypassed by determined attackers. They inherently | | | |
| have to police fairness per flow or per source-destination pair. | | | |
| Therefore they can easily be circumvented either by opening multiple | | | |
| flows (by varying the end-point port number); or by spoofing the | | | |
| source address but arranging with the receiver to hide the true | | | |
| return address at a higher layer. | | | |
| | | | |
| 6.1.3. Re-ECN Incentive Framework | | | |
| | | | |
| The aim is to create an incentive environment that ensures optimal | | | |
| sharing of capacity despite everyone acting selfishly (including | | | |
| lying and cheating). Of course, the mechanisms put in place for this | | | |
| can lie dormant wherever co-operation is the norm. | | | |
| | | | |
| Throughout this document we focus on path congestion. But some forms | | | |
| of fairness, particularly TCP's, also depend on round trip time. If | | | |
| TCP-fairness is required, we also propose to measure downstream path | | | |
| delay using re-feedback. We give a simple outline of how this could | | | |
| work in Appendix F. However, we do not expect this to be necessary, | | | |
| as researchers tend to agree that only congestion control dynamics | | | |
| need to depend on RTT, not the rate that the algorithm would converge | | | |
| on after a period of stability. | | | |
| | | | |
| Figure 8 sketches the incentive framework that we will describe piece | | | |
| by piece throughout this section. We will do a first pass in | | | |
| overview, then return to each piece in detail. We re-use the earlier | | | |
| example of how downstream congestion is derived by subtracting | | | |
| upstream congestion from path congestion (Figure 2) but depict | | | |
| multiple trust boundaries to turn it into an internetwork. For | | | |
| clarity, only downstream congestion is shown (the difference between | | | |
| the two earlier plots). The graph displays downstream path | | | |
| congestion seen in a typical flow as it traverses an example path | | | |
| from sender S to receiver R, across networks N1, N2 & N3. Everyone | | | |
| is shown using re-ECN correctly, but we intend to show why everyone | | | |
| would /choose/ to use it correctly, and honestly. | | | |
| | | | |
| Three main types of self-interest can be identified: | | | |
| | | | |
| o Users want to transmit data across the network as fast as | | | |
| possible, paying as little as possible for the privilege. In this | | | |
| respect, there is no distinction between senders and receivers, | | | |
| but we must be wary of potential malice by one on the other; | | | |
| | | | |
| o Network operators want to maximise revenues from the resources | | | |
| they invest in. They compete amongst themselves for the custom of | | | |
| users. | | | |
| | | | |
| o Attackers (whether users or networks) want to use any opportunity | | | |
| to subvert the new re-ECN system for their own gain or to damage | | | |
| the service of their victims, whether targeted or random. | | | |
| | | | |
| policer dropper | | | |
| | | | | | |
| | | | | | |
| S <-----N1----> <---N2---> <---N3--> R domain | | | |
| | | | |
| | | | | |
| 3% |---------+ | | | |
| | | | | | |
| 2% | +-----------------------+ | | | |
| | downstream congestion | | | | |
| 1% | | | | | |
| | | | | | |
| 0% +---------------------------------+====== | | | |
| 0 i | | | |
| | | | |
| Figure 8: Incentive Framework, showing creation of opposing pressures | | | |
| to under-declare and over-declare downstream congestion, using a | | | |
| policer and a dropper | | | |
| | | | |
| Source congestion control: We want to ensure that the sender will | | | |
| throttle its rate as downstream congestion increases. Whatever | | | |
| the agreed congestion response (whether TCP-compatible or some | | | |
| enhanced QoS), to some extent it will always be against the | | | |
| sender's interest to comply. | | | |
| | | | |
| Ingress policing: But it is in all the network operators' interests | | | |
| to encourage fair congestion response, so that their investments | | | |
| are employed to satisfy the most valuable demand. The re-ECN | | | |
| protocol ensures packets carry the necessary information about | | | |
| their own expected downstream congestion so that N1 can deploy a | | | |
| policer at its ingress to check that S is complying with whatever | | | |
| congestion control it should be using (Section 6.1.5). If N1 is | | | |
| extremely conservative it could police each flow, but it is likely | | | |
| to just police the bulk amount of congestion each customer causes | | | |
| without regard to flows, or if it is extremely liberal it need not | | | |
| police congestion control at all. Whatever, it is always | | | |
| preferable to police traffic at the very first ingress into an | | | |
| internetwork, before non-compliant traffic can cause any damage. | | | |
| | | | |
| Edge egress dropper: If the policer ensures the source has less | | | |
| right to a high rate the higher it declares downstream congestion, | | | |
| the source has a clear incentive to understate downstream | | | |
| congestion. But, if flows of packets are understated when they | | | |
| enter the internetwork, they will have become negative by the time | | | |
| they leave. So, we introduce a dropper at the last network | | | |
| egress, which drops packets in flows that persistently declare | | | |
| negative downstream congestion (see Section 6.1.4 for details). | | | |
| | | | |
| Inter-domain traffic policing: But next we must ask, if congestion | | | |
| arises downstream (say in N3), what is the ingress network's | | | |
| (N1's) incentive to police its customers' response? If N1 turns a | | | |
| blind eye, its own customers benefit while other networks suffer. | | | |
| This is why all inter-domain QoS architectures (e.g. Intserv, | | | |
| Diffserv) police traffic each time it crosses a trust boundary. | | | |
| We have already shown that re-ECN gives a trustworthy measure of | | | |
| the expected downstream congestion that a flow will cause by | | | |
| subtracting negative volume from positive at any intermediate | | | |
| point on a path. N3 (say) can use this measure to police all the | | | |
| responses to congestion of all the sources beyond its upstream | | | |
| neighbour (N2), but in bulk with one very simple passive | | | |
| mechanism, rather than per flow, as we will now explain. | | | |
| | | | |
| Emulating policing with inter-domain congestion penalties: Between | | | |
| high-speed networks, we would rather avoid per-flow policing, and | | | |
| we would rather avoid holding back traffic while it is policed. | | | |
| Instead, once re-ECN has arranged headers to carry downstream | | | |
| congestion honestly, N2 can contract to pay N3 penalties in | | | |
| proportion to a single bulk count of the congestion metrics | | | |
| crossing their mutual trust boundary (Section 6.1.6). In this | | | |
| way, N3 puts pressure on N2 to suppress downstream congestion, for | | | |
| every flow passing through the border interface, even though they | | | |
| will all start and end in different places, and even though they | | | |
| may all be allowed different responses to congestion. The figure | | | |
| depicts this downward pressure on N2 by the solid downward arrow | | | |
| at the egress of N2. Then N2 has an incentive either to police | | | |
| the congestion response of its own ingress traffic (from N1) or to | | | |
| emulate policing by applying penalties to N1 in turn on the basis | | | |
| of congestion counted at their mutual boundary. In this recursive | | | |
| way, the incentives for each flow to respond correctly to | | | |
| congestion trace back with each flow precisely to each source, | | | |
| despite the mechanism not recognising flows (see Section 6.2.2). | | | |
| | | | |
| Inter-domain congestion charging diversity: Any two networks are | | | |
| free to agree any of a range of penalty regimes between themselves | | | |
| but they would only provide the right incentives if they were | | | |
| within the following reasonable constraints. N2 should expect to | | | |
| have to pay penalties to N3 where penalties monotonically increase | | | |
| with the volume of congestion and negative penalties are not | | | |
| allowed. For instance, they may agree an SLA with tiered | | | |
| congestion thresholds, where higher penalties apply the higher the | | | |
| threshold that is broken. But the most obvious (and useful) form | | | |
| of penalty is where N3 levies a charge on N2 proportional to the | | | |
| volume of downstream congestion N2 dumps into N3. In the | | | |
| explanation that follows, we assume this specific variant of | | | |
| volume charging between networks - charging proportionate to the | | | |
| volume of congestion. | | | |
| | | | |
| We must make clear that we are not advocating that everyone should | | | |
| use this form of contract. We are well aware that the IETF tries | | | |
| to avoid standardising technology that depends on a particular | | | |
| business model. And we strongly share this desire to encourage | | | |
| diversity. But our aim is merely to show that border policing can | | | |
| at least work with this one model, then we can assume that | | | |