| draft-briscoe-tsvwg-re-ecn-tcp-02.txt | draft-briscoe-tsvwg-re-ecn-tcp-03.txt | |||
|---|---|---|---|---|
| Transport Area Working Group B. Briscoe | Transport Area Working Group B. Briscoe | |||
| Internet-Draft BT & UCL | Internet-Draft BT & UCL | |||
| Expires: December 28, 2006 A. Jacquet | Intended status: Informational A. Jacquet | |||
| A. Salvatori | Expires: April 26, 2007 A. Salvatori | |||
| M. Koyabe | M. Koyabe | |||
| BT | BT | |||
| June 26, 2006 | October 23, 2006 | |||
| Re-ECN: Adding Accountability for Causing Congestion to TCP/IP | Re-ECN: Adding Accountability for Causing Congestion to TCP/IP | |||
| draft-briscoe-tsvwg-re-ecn-tcp-02 | draft-briscoe-tsvwg-re-ecn-tcp-03 | |||
| Status of this Memo | Status of this Memo | |||
| By submitting this Internet-Draft, each author represents that any | By submitting this Internet-Draft, each author represents that any | |||
| applicable patent or other IPR claims of which he or she is aware | applicable patent or other IPR claims of which he or she is aware | |||
| have been or will be disclosed, and any of which he or she becomes | have been or will be disclosed, and any of which he or she becomes | |||
| aware will be disclosed, in accordance with Section 6 of BCP 79. | aware will be disclosed, in accordance with Section 6 of BCP 79. | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF), its areas, and its working groups. Note that | Task Force (IETF), its areas, and its working groups. Note that | |||
| skipping to change at page 1, line 37 | skipping to change at page 1, line 37 | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
| http://www.ietf.org/ietf/1id-abstracts.txt. | http://www.ietf.org/ietf/1id-abstracts.txt. | |||
| The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
| http://www.ietf.org/shadow.html. | http://www.ietf.org/shadow.html. | |||
| This Internet-Draft will expire on December 28, 2006. | This Internet-Draft will expire on April 26, 2007. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (C) The Internet Society (2006). | Copyright (C) The Internet Society (2006). | |||
| Abstract | Abstract | |||
| This document introduces a new protocol for explicit congestion | This document introduces a new protocol for explicit congestion | |||
| notification (ECN), termed re-ECN, which can be deployed | notification (ECN), termed re-ECN, which can be deployed | |||
| incrementally around unmodified routers. The protocol arranges an | incrementally around unmodified routers. The protocol arranges an | |||
| skipping to change at page 2, line 27 | skipping to change at page 2, line 27 | |||
| honestly. | honestly. | |||
| Authors' Statement: Status (to be removed by the RFC Editor) | Authors' Statement: Status (to be removed by the RFC Editor) | |||
| This document is posted as an Internet-Draft with the intent (at | This document is posted as an Internet-Draft with the intent (at | |||
| least that of the authors) to eventually progress to standards track. | least that of the authors) to eventually progress to standards track. | |||
| Although the re-ECN protocol is intended to make a simple but far- | Although the re-ECN protocol is intended to make a simple but far- | |||
| reaching change to the Internet architecture, the most immediate | reaching change to the Internet architecture, the most immediate | |||
| priority for the authors is to delay any move of the ECN nonce to | priority for the authors is to delay any move of the ECN nonce to | |||
| Proposed Standard status. | Proposed Standard status. The argument for this position is | |||
| developed in Appendix I. | ||||
| The ECN nonce is an experimental RFC that allows /senders/ to check | ||||
| the integrity of congestion feedback from /networks/. Therefore the | ||||
| nonce only helps in scenarios where the sender is trusted to control | ||||
| network congestion. On the other hand, the re-ECN protocol aims to | ||||
| allow networks themselves to be able to police cheating senders and | ||||
| receivers and to police neighbouring networks. Re-ECN is therefore | ||||
| proposed in preference to the ECN nonce on the basis that it | ||||
| addresses the generic problem of accountability for congestion of a | ||||
| network's resources at the IP layer. | ||||
| Delaying the ECN nonce is justified by two factors: | ||||
| o The ECN nonce would permanently consume a two-bit codepoint in | ||||
| the IP header for a purpose specific to a limited trust model. | ||||
| Although the nonce is a neat idea, its applicability seems too | ||||
| limited to warrant space in the IP header; | ||||
| o Although we have re-designed the re-ECN codepoints so that they do | ||||
| not prevent the ECN nonce progressing, the same is not true the | ||||
| other way round. If the ECN nonce started to see some deployment | ||||
| (perhaps because it was blessed with Proposed Standard status), | ||||
| incremental deployment of re-ECN would effectively be impossible, | ||||
| because re-ECN marking fractions at inter-domain borders would be | ||||
| polluted by unknown levels of nonce traffic. | ||||
| The authors are aware that re-ECN must prove it has the potential it | ||||
| claims if it is to displace the nonce. Therefore, every effort has | ||||
| been made to complete a comprehensive specification of re-ECN so that | ||||
| its potential can be assessed. We therefore seek the opinion of the | ||||
| Internet community on whether the re-ECN protocol is sufficiently | ||||
| useful to warrant standards action. | ||||
| Changes from previous drafts (to be removed by the RFC Editor) | Changes from previous drafts (to be removed by the RFC Editor) | |||
| From -00 to -01: | From -00 to -01: | |||
| Encoding of re-ECN wire protocol changed for reasons given in | Encoding of re-ECN wire protocol changed for reasons given in | |||
| Appendix B and consequently draft substantially re-written. | Appendix B and consequently draft substantially re-written. | |||
| Substantial text added in sections on applications, incremental | Substantial text added in sections on applications, incremental | |||
| deployment, architectural rationale and security considerations. | deployment, architectural rationale and security considerations. | |||
| skipping to change at page 3, line 39 | skipping to change at page 3, line 12 | |||
| Text on (non-)issues with tunnels, encryption and link layer | Text on (non-)issues with tunnels, encryption and link layer | |||
| congestion notification added (Section 5.6 & Section 5.7). | congestion notification added (Section 5.6 & Section 5.7). | |||
| Section added giving evolvability arguments against encouraging | Section added giving evolvability arguments against encouraging | |||
| bottleneck policing (Section 6.1.2). And text on re-ECN's | bottleneck policing (Section 6.1.2). And text on re-ECN's | |||
| evolvability by design added to Section 6.1.3. | evolvability by design added to Section 6.1.3. | |||
| Text on inter-domain policing (Section 6.1.6) and inter-domain | Text on inter-domain policing (Section 6.1.6) and inter-domain | |||
| fail-safes (Section 6.1.7) added. | fail-safes (Section 6.1.7) added. | |||
| From -02 to -03: | ||||
| Started guidelines for re-ECN support in DCCP and SCTP. | ||||
| Added annex on limitations of nonce mechanism. | ||||
| Minor editorial changes throughout. | Minor editorial changes throughout. | |||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 6 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 6 | |||
| 2. Requirements notation . . . . . . . . . . . . . . . . . . . . 7 | 2. Requirements notation . . . . . . . . . . . . . . . . . . . . 7 | |||
| 3. Protocol Overview . . . . . . . . . . . . . . . . . . . . . . 8 | 3. Protocol Overview . . . . . . . . . . . . . . . . . . . . . . 8 | |||
| 3.1. Background and Applicability . . . . . . . . . . . . . . . 8 | 3.1. Background and Applicability . . . . . . . . . . . . . . . 8 | |||
| 3.2. Re-ECN Abstracted Network Layer Wire Protocol (IPv4 or | 3.2. Re-ECN Abstracted Network Layer Wire Protocol (IPv4 or | |||
| v6) . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 | v6) . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 | |||
| 3.3. Re-ECN Protocol Operation . . . . . . . . . . . . . . . . 10 | 3.3. Re-ECN Protocol Operation . . . . . . . . . . . . . . . . 10 | |||
| 3.4. Informal Terminology . . . . . . . . . . . . . . . . . . . 12 | 3.4. Informal Terminology . . . . . . . . . . . . . . . . . . . 12 | |||
| 4. Transport Layers . . . . . . . . . . . . . . . . . . . . . . . 14 | 4. Transport Layers . . . . . . . . . . . . . . . . . . . . . . . 15 | |||
| 4.1. TCP . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 | 4.1. TCP . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 | |||
| 4.1.1. RECN mode: Full re-ECN capable transport . . . . . . . 16 | 4.1.1. RECN mode: Full re-ECN capable transport . . . . . . . 16 | |||
| 4.1.2. RECN-Co mode: Re-ECT Sender with a Vanilla or | 4.1.2. RECN-Co mode: Re-ECT Sender with a Vanilla or | |||
| Nonce ECT Receiver . . . . . . . . . . . . . . . . . . 18 | Nonce ECT Receiver . . . . . . . . . . . . . . . . . . 18 | |||
| 4.1.3. Capability Negotiation . . . . . . . . . . . . . . . . 20 | 4.1.3. Capability Negotiation . . . . . . . . . . . . . . . . 20 | |||
| 4.1.4. Extended ECN (EECN) Field Settings during Flow | 4.1.4. Extended ECN (EECN) Field Settings during Flow | |||
| Start or after Idle Periods . . . . . . . . . . . . . 21 | Start or after Idle Periods . . . . . . . . . . . . . 21 | |||
| 4.1.5. Pure ACKS, Retransmissions, Window Probes and | 4.1.5. Pure ACKS, Retransmissions, Window Probes and | |||
| Partial ACKs . . . . . . . . . . . . . . . . . . . . . 25 | Partial ACKs . . . . . . . . . . . . . . . . . . . . . 25 | |||
| 4.2. Other Transports . . . . . . . . . . . . . . . . . . . . . 26 | 4.2. Other Transports . . . . . . . . . . . . . . . . . . . . . 26 | |||
| 4.2.1. Guidelines for Adding Re-ECN to Other Transports . . . 26 | 4.2.1. General Guidelines for Adding Re-ECN to Other | |||
| 5. Network Layer . . . . . . . . . . . . . . . . . . . . . . . . 26 | Transports . . . . . . . . . . . . . . . . . . . . . . 26 | |||
| 5.1. Re-ECN IPv4 Wire Protocol . . . . . . . . . . . . . . . . 26 | 4.2.2. Guidelines for adding Re-ECN to RSVP or NSIS . . . . . 26 | |||
| 4.2.3. Guidelines for adding Re-ECN to DCCP . . . . . . . . . 27 | ||||
| 4.2.4. Guidelines for adding Re-ECN to SCTP . . . . . . . . . 27 | ||||
| 5. Network Layer . . . . . . . . . . . . . . . . . . . . . . . . 27 | ||||
| 5.1. Re-ECN IPv4 Wire Protocol . . . . . . . . . . . . . . . . 27 | ||||
| 5.2. Re-ECN IPv6 Wire Protocol . . . . . . . . . . . . . . . . 28 | 5.2. Re-ECN IPv6 Wire Protocol . . . . . . . . . . . . . . . . 28 | |||
| 5.3. Router Forwarding Behaviour . . . . . . . . . . . . . . . 29 | 5.3. Router Forwarding Behaviour . . . . . . . . . . . . . . . 30 | |||
| 5.4. Justification for Setting the First SYN to FNE . . . . . . 30 | 5.4. Justification for Setting the First SYN to FNE . . . . . . 31 | |||
| 5.5. Control and Management . . . . . . . . . . . . . . . . . . 31 | 5.5. Control and Management . . . . . . . . . . . . . . . . . . 32 | |||
| 5.5.1. Negative Balance Warning . . . . . . . . . . . . . . . 31 | 5.5.1. Negative Balance Warning . . . . . . . . . . . . . . . 32 | |||
| 5.5.2. Rate Response Control . . . . . . . . . . . . . . . . 32 | 5.5.2. Rate Response Control . . . . . . . . . . . . . . . . 33 | |||
| 5.6. IP in IP Tunnels . . . . . . . . . . . . . . . . . . . . . 32 | 5.6. IP in IP Tunnels . . . . . . . . . . . . . . . . . . . . . 33 | |||
| 5.7. Non-Issues . . . . . . . . . . . . . . . . . . . . . . . . 33 | 5.7. Non-Issues . . . . . . . . . . . . . . . . . . . . . . . . 34 | |||
| 6. Applications . . . . . . . . . . . . . . . . . . . . . . . . . 34 | 6. Applications . . . . . . . . . . . . . . . . . . . . . . . . . 35 | |||
| 6.1. Policing Congestion Response . . . . . . . . . . . . . . . 34 | 6.1. Policing Congestion Response . . . . . . . . . . . . . . . 35 | |||
| 6.1.1. The Policing Problem . . . . . . . . . . . . . . . . . 34 | 6.1.1. The Policing Problem . . . . . . . . . . . . . . . . . 35 | |||
| 6.1.2. The Case Against Bottleneck Policing . . . . . . . . . 35 | 6.1.2. The Case Against Bottleneck Policing . . . . . . . . . 36 | |||
| 6.1.3. Re-ECN Incentive Framework . . . . . . . . . . . . . . 36 | 6.1.3. Re-ECN Incentive Framework . . . . . . . . . . . . . . 37 | |||
| 6.1.4. Egress Dropper . . . . . . . . . . . . . . . . . . . . 43 | 6.1.4. Egress Dropper . . . . . . . . . . . . . . . . . . . . 44 | |||
| 6.1.5. Rate Policing . . . . . . . . . . . . . . . . . . . . 44 | 6.1.5. Rate Policing . . . . . . . . . . . . . . . . . . . . 45 | |||
| 6.1.6. Inter-domain Policing . . . . . . . . . . . . . . . . 46 | 6.1.6. Inter-domain Policing . . . . . . . . . . . . . . . . 47 | |||
| 6.1.7. Inter-domain Fail-safes . . . . . . . . . . . . . . . 50 | 6.1.7. Inter-domain Fail-safes . . . . . . . . . . . . . . . 51 | |||
| 6.1.8. Simulations . . . . . . . . . . . . . . . . . . . . . 51 | 6.1.8. Simulations . . . . . . . . . . . . . . . . . . . . . 51 | |||
| 6.2. Other Applications . . . . . . . . . . . . . . . . . . . . 51 | 6.2. Other Applications . . . . . . . . . . . . . . . . . . . . 51 | |||
| 6.2.1. DDoS Mitigation . . . . . . . . . . . . . . . . . . . 51 | 6.2.1. DDoS Mitigation . . . . . . . . . . . . . . . . . . . 52 | |||
| 6.2.2. End-to-end QoS . . . . . . . . . . . . . . . . . . . . 52 | 6.2.2. End-to-end QoS . . . . . . . . . . . . . . . . . . . . 53 | |||
| 6.2.3. Traffic Engineering . . . . . . . . . . . . . . . . . 52 | 6.2.3. Traffic Engineering . . . . . . . . . . . . . . . . . 53 | |||
| 6.2.4. Inter-Provider Service Monitoring . . . . . . . . . . 53 | 6.2.4. Inter-Provider Service Monitoring . . . . . . . . . . 53 | |||
| 6.3. Limitations . . . . . . . . . . . . . . . . . . . . . . . 53 | 6.3. Limitations . . . . . . . . . . . . . . . . . . . . . . . 53 | |||
| 7. Incremental Deployment . . . . . . . . . . . . . . . . . . . . 53 | 7. Incremental Deployment . . . . . . . . . . . . . . . . . . . . 54 | |||
| 7.1. Incremental Deployment Features . . . . . . . . . . . . . 53 | 7.1. Incremental Deployment Features . . . . . . . . . . . . . 54 | |||
| 7.2. Incremental Deployment Incentives . . . . . . . . . . . . 55 | 7.2. Incremental Deployment Incentives . . . . . . . . . . . . 55 | |||
| 8. Architectural Rationale . . . . . . . . . . . . . . . . . . . 60 | 8. Architectural Rationale . . . . . . . . . . . . . . . . . . . 60 | |||
| 9. Related Work . . . . . . . . . . . . . . . . . . . . . . . . . 62 | 9. Related Work . . . . . . . . . . . . . . . . . . . . . . . . . 63 | |||
| 9.1. Policing Rate Response to Congestion . . . . . . . . . . . 62 | 9.1. Policing Rate Response to Congestion . . . . . . . . . . . 63 | |||
| 9.2. Congestion Notification Integrity . . . . . . . . . . . . 63 | 9.2. Congestion Notification Integrity . . . . . . . . . . . . 63 | |||
| 9.3. Identifying Upstream and Downstream Congestion . . . . . . 64 | 9.3. Identifying Upstream and Downstream Congestion . . . . . . 64 | |||
| 10. Security Considerations . . . . . . . . . . . . . . . . . . . 64 | 10. Security Considerations . . . . . . . . . . . . . . . . . . . 65 | |||
| 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 66 | 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 66 | |||
| 12. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 66 | 12. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 67 | |||
| 13. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 66 | 13. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 67 | |||
| 14. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 66 | 14. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 67 | |||
| 15. References . . . . . . . . . . . . . . . . . . . . . . . . . . 67 | 15. References . . . . . . . . . . . . . . . . . . . . . . . . . . 67 | |||
| 15.1. Normative References . . . . . . . . . . . . . . . . . . . 67 | 15.1. Normative References . . . . . . . . . . . . . . . . . . . 67 | |||
| 15.2. Informative References . . . . . . . . . . . . . . . . . . 67 | 15.2. Informative References . . . . . . . . . . . . . . . . . . 68 | |||
| Appendix A. Precise Re-ECN Protocol Operation . . . . . . . . . . 70 | Appendix A. Precise Re-ECN Protocol Operation . . . . . . . . . . 71 | |||
| Appendix B. Justification for Two Codepoints Signifying Zero | Appendix B. Justification for Two Codepoints Signifying Zero | |||
| Worth Packets . . . . . . . . . . . . . . . . . . . . 71 | Worth Packets . . . . . . . . . . . . . . . . . . . . 72 | |||
| Appendix C. ECN Compatibility . . . . . . . . . . . . . . . . . . 73 | Appendix C. ECN Compatibility . . . . . . . . . . . . . . . . . . 74 | |||
| Appendix D. Packet Marking During Flow Start . . . . . . . . . . 74 | Appendix D. Packet Marking During Flow Start . . . . . . . . . . 75 | |||
| Appendix E. Example Egress Dropper Algorithm . . . . . . . . . . 74 | Appendix E. Example Egress Dropper Algorithm . . . . . . . . . . 75 | |||
| Appendix F. Re-TTL . . . . . . . . . . . . . . . . . . . . . . . 74 | Appendix F. Re-TTL . . . . . . . . . . . . . . . . . . . . . . . 75 | |||
| Appendix G. Policer Designs to ensure Congestion | Appendix G. Policer Designs to ensure Congestion | |||
| Responsiveness . . . . . . . . . . . . . . . . . . . 75 | Responsiveness . . . . . . . . . . . . . . . . . . . 76 | |||
| G.1. Per-user Policing . . . . . . . . . . . . . . . . . . . . 75 | G.1. Per-user Policing . . . . . . . . . . . . . . . . . . . . 76 | |||
| G.2. Per-flow Rate Policing . . . . . . . . . . . . . . . . . . 76 | G.2. Per-flow Rate Policing . . . . . . . . . . . . . . . . . . 77 | |||
| Appendix H. Downstream Congestion Metering Algorithms . . . . . . 79 | Appendix H. Downstream Congestion Metering Algorithms . . . . . . 80 | |||
| H.1. Bulk Downstream Congestion Metering Algorithm . . . . . . 79 | H.1. Bulk Downstream Congestion Metering Algorithm . . . . . . 80 | |||
| H.2. Inflation Factor for Persistently Negative Flows . . . . . 79 | H.2. Inflation Factor for Persistently Negative Flows . . . . . 80 | |||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 81 | Appendix I. Argument for holding back the ECN nonce . . . . . . . 81 | |||
| Intellectual Property and Copyright Statements . . . . . . . . . . 82 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 83 | |||
| Intellectual Property and Copyright Statements . . . . . . . . . . 85 | ||||
| 1. Introduction | 1. Introduction | |||
| This document aims: | This document aims: | |||
| o To provide a complete specification of the addition of the re-ECN | o To provide a complete specification of the addition of the re-ECN | |||
| protocol to IP and guidelines on how to add it to transport layer | protocol to IP and guidelines on how to add it to transport layer | |||
| protocols, including a complete specification of re-ECN in TCP as | protocols, including a complete specification of re-ECN in TCP as | |||
| an example; | an example; | |||
| skipping to change at page 7, line 38 | skipping to change at page 7, line 38 | |||
| (Section 5) layers, then the applications it can be put to, such as | (Section 5) layers, then the applications it can be put to, such as | |||
| policing DDoS, QoS and congestion control (Section 6). Although | policing DDoS, QoS and congestion control (Section 6). Although | |||
| these applications do not require standardisation themselves, they | these applications do not require standardisation themselves, they | |||
| are described in a fair degree of detail in order to explain how re- | are described in a fair degree of detail in order to explain how re- | |||
| ECN can be used. Given that re-ECN proposes to use the last undefined | ECN can be used. Given that re-ECN proposes to use the last undefined | |||
| bit in the IPv4 header, we felt it necessary to outline the potential | bit in the IPv4 header, we felt it necessary to outline the potential | |||
| that re-ECN could release in return for being given that bit. | that re-ECN could release in return for being given that bit. | |||
| Deployment issues discussed throughout the document are brought | Deployment issues discussed throughout the document are brought | |||
| together in Section 7, which is followed by a brief section | together in Section 7, which is followed by a brief section | |||
| explaining the somewhat subtle rationale for the design, from an | explaining the somewhat subtle rationale for the design from an | |||
| architectural perspective (Section 8). We end by describing related | architectural perspective (Section 8). We end by describing related | |||
| work (Section 9), listing security considerations (Section 10) and | work (Section 9), listing security considerations (Section 10) and | |||
| finally drawing conclusions (Section 12). | finally drawing conclusions (Section 12). | |||
| 2. Requirements notation | 2. Requirements notation | |||
| The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
| "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | |||
| document are to be interpreted as described in [RFC2119]. | document are to be interpreted as described in [RFC2119]. | |||
| skipping to change at page 8, line 45 | skipping to change at page 8, line 45 | |||
| The choice of two ECT code-points in the ECN field [RFC3168] | The choice of two ECT code-points in the ECN field [RFC3168] | |||
| permitted future flexibility, optionally allowing the sender to | permitted future flexibility, optionally allowing the sender to | |||
| encode the experimental ECN nonce [RFC3540] in the packet stream. | encode the experimental ECN nonce [RFC3540] in the packet stream. | |||
| The nonce is designed to allow a sender to check the integrity of | The nonce is designed to allow a sender to check the integrity of | |||
| congestion feedback. But Section 9.2 explains that it still gives no | congestion feedback. But Section 9.2 explains that it still gives no | |||
| control over how fast the sender transmits as a result of the | control over how fast the sender transmits as a result of the | |||
| feedback. On the other hand, re-ECN is designed both to ensure that | feedback. On the other hand, re-ECN is designed both to ensure that | |||
| congestion is declared honestly and that the sender's rate responds | congestion is declared honestly and that the sender's rate responds | |||
| appropriately. | appropriately. | |||
| Re-ECN is based on a feedback arrangement called | Re-ECN is based on a feedback arrangement called `re- | |||
| `re-feedback' [Re-fb]. The word is short for either receiver- | feedback' [Re-fb]. The word is short for either receiver-aligned, | |||
| aligned, re-inserted or re-echoed feedback. But it actually works | re-inserted or re-echoed feedback. But it actually works even when | |||
| even when no feedback is available. In fact it has been carefully | no feedback is available. In fact it has been carefully designed to | |||
| designed to work for single datagram flows. Indeed, it even | work for single datagram flows. Indeed, it even encourages | |||
| encourages aggregation of single packet flows by congestion control | aggregation of single packet flows by congestion control proxies. | |||
| proxies. Then, even if the traffic mix of the Internet were to | ||||
| become dominated by short messages, it would still be possible to | Then, even if the traffic mix of the Internet were to become | |||
| control congestion effectively and efficiently. | dominated by short messages, it would still be possible to control | |||
| congestion effectively and efficiently. | ||||
| Changing the Internet's feedback architecture seems to imply | Changing the Internet's feedback architecture seems to imply | |||
| considerable upheaval. But re-ECN can be deployed incrementally at | considerable upheaval. But re-ECN can be deployed incrementally at | |||
| the transport layer around unmodified routers using existing fields | the transport layer around unmodified routers using existing fields | |||
| in IP (v4 or v6). However it does also require the last undefined | in IP (v4 or v6). However it does also require the last undefined | |||
| bit in the IPv4 header, which it uses in combination with the 2-bit | bit in the IPv4 header, which it uses in combination with the 2-bit | |||
| ECN field to create four new codepoints. Nonetheless, changes to IP | ECN field to create four new codepoints. Nonetheless, changes to IP | |||
| routers are RECOMMENDED in order to improve resilience against DoS | routers are RECOMMENDED in order to improve resilience against DoS | |||
| attacks. Similarly, re-ECN works best if both the sender and | attacks. Similarly, re-ECN works best if both the sender and | |||
| receiver transports are re-ECN-capable, but it can work with just | receiver transports are re-ECN-capable, but it can work with just | |||
| skipping to change at page 10, line 13 | skipping to change at page 10, line 13 | |||
| be defined in another specification (e.g. [Re-PCN]). | be defined in another specification (e.g. [Re-PCN]). | |||
| Although the RE flag is a separate, single bit field, it can be read | Although the RE flag is a separate, single bit field, it can be read | |||
| as an extension to the two-bit ECN field; the three concatenated bits | as an extension to the two-bit ECN field; the three concatenated bits | |||
| in what we will call the extended ECN field (EECN) making eight | in what we will call the extended ECN field (EECN) making eight | |||
| codepoints. We will use the RFC3168 names of the ECN codepoints to | codepoints. We will use the RFC3168 names of the ECN codepoints to | |||
| describe settings of the ECN field when the RE flag setting is "don't | describe settings of the ECN field when the RE flag setting is "don't | |||
| care", but we also define the following six extended ECN codepoint | care", but we also define the following six extended ECN codepoint | |||
| names for when we need to be more specific. | names for when we need to be more specific. | |||
| +-------+-----------+------+--------------+-------------------------+ | +-------+------------+------+--------------+------------------------+ | |||
| | ECN | RFC3168 | RE | Extended ECN | Re-ECN meaning | | | ECN | RFC3168 | RE | Extended ECN | Re-ECN meaning | | |||
| | field | codepoint | flag | codepoint | | | | field | codepoint | flag | codepoint | | | |||
| +-------+-----------+------+--------------+-------------------------+ | +-------+------------+------+--------------+------------------------+ | |||
| | 00 | Not-ECT | 0 | Not-RECT | Not re-ECN-capable | | | 00 | Not-ECT | 0 | Not-RECT | Not re-ECN-capable | | |||
| | | | | | transport | | | | | | | transport | | |||
| | 00 | Not-ECT | 1 | FNE | Feedback not | | | 00 | Not-ECT | 1 | FNE | Feedback not | | |||
| | | | | | established | | | | | | | established | | |||
| | 01 | ECT(1) | 0 | Re-Echo | Re-echoed congestion | | | 01 | ECT(1) | 0 | Re-Echo | Re-echoed congestion | | |||
| | | | | | and RECT | | | | | | | and RECT | | |||
| | 01 | ECT(1) | 1 | RECT | Re-ECN capable | | | 01 | ECT(1) | 1 | RECT | Re-ECN capable | | |||
| | | | | | transport | | | | | | | transport | | |||
| | 10 | ECT(0) | 0 | --- | Legacy ECN use only | | | 10 | ECT(0) | 0 | --- | Legacy ECN use only | | |||
| | | | | | | | ||||
| | 10 | ECT(0) | 1 | --CU-- | Currently unused | | | 10 | ECT(0) | 1 | --CU-- | Currently unused | | |||
| | | | | | | | | | | | | | | |||
| | 11 | CE | 0 | CE(0) | Re-Echo canceled by | | | 11 | CE | 0 | CE(0) | Re-Echo canceled by | | |||
| | | | | | congestion experienced | | | | | | | congestion experienced | | |||
| | 11 | CE | 1 | CE(-1) | Congestion experienced | | | 11 | CE | 1 | CE(-1) | Congestion experienced | | |||
| +-------+-----------+------+--------------+-------------------------+ | +-------+------------+------+--------------+------------------------+ | |||
| Table 1: Extended ECN Codepoints | Table 1: Extended ECN Codepoints | |||
| 3.3. Re-ECN Protocol Operation | 3.3. Re-ECN Protocol Operation | |||
| In this section we will give an overview of the operation of the re- | In this section we will give an overview of the operation of the re- | |||
| ECN protocol for TCP/IP, leaving a detailed specification to the | ECN protocol for TCP/IP, leaving a detailed specification to the | |||
| following sections. Other transports will be discussed later. | following sections. Other transports will be discussed later. | |||
| In summary, the protocol adds a third `re-echo' stage to the existing | In summary, the protocol adds a third `re-echo' stage to the existing | |||
| skipping to change at page 12, line 44 | skipping to change at page 12, line 44 | |||
| of a negative metric arises because it is derived by subtracting one | of a negative metric arises because it is derived by subtracting one | |||
| metric from another. Of course actual downstream congestion cannot | metric from another. Of course actual downstream congestion cannot | |||
| be negative, only the metric can (whether due to time lags or | be negative, only the metric can (whether due to time lags or | |||
| deliberate malice). | deliberate malice). | |||
| Just as we will loosely talk of positive and negative flows, we will | Just as we will loosely talk of positive and negative flows, we will | |||
| also talk of positive or negative packets, meaning packets that | also talk of positive or negative packets, meaning packets that | |||
| contribute positively or negatively to the downstream congestion | contribute positively or negatively to the downstream congestion | |||
| metric. | metric. | |||
| Therefore packets we will talk of packets having `worth' of +1, 0 or | Therefore we will talk of packets having `worth' of +1, 0 or -1, | |||
| -1, which, when multiplied by their size, indicates their | which, when multiplied by their size, indicates their contribution to | |||
| contribution to the downstream congestion metric. | the downstream congestion metric. | |||
| Figure 2 shows the main state transitions of the system once a flow | Figure 2 shows the main state transitions of the system once a flow | |||
| is established, showing the worth of packets in each state. When the | is established, showing the worth of packets in each state. When the | |||
| network congestion marks a packet it decrements its worth (moving | network congestion marks a packet it decrements its worth (moving | |||
| from the left of the main square to the right). When the sender | from the left of the main square to the right). When the sender | |||
| blanks the RE flag in order to re-echo congestion it increments the | blanks the RE flag in order to re-echo congestion it increments the | |||
| worth of a packet (moving from the bottom of the main square to the | worth of a packet (moving from the bottom of the main square to the | |||
| top). | top). | |||
| Sender state Sent Worth Received Worth | Sender state Sent Worth Received Worth | |||
| skipping to change at page 13, line 33 | skipping to change at page 13, line 33 | |||
| Figure 2: Re-ECN System State Diagram (bootstrap not shown) | Figure 2: Re-ECN System State Diagram (bootstrap not shown) | |||
| The idea is that every time the network decrements the worth of a | The idea is that every time the network decrements the worth of a | |||
| packet, the sender increments the worth of a later packet. Then, | packet, the sender increments the worth of a later packet. Then, | |||
| over time, as many positive octets should arrive at the receiver as | over time, as many positive octets should arrive at the receiver as | |||
| negative. Note we have said octets not packets, so if packets are of | negative. Note we have said octets not packets, so if packets are of | |||
| different sizes, the worth should be incremented on enough octets to | different sizes, the worth should be incremented on enough octets to | |||
| balance the octets in negative packets arriving at the receiver. It | balance the octets in negative packets arriving at the receiver. It | |||
| is this balance that will allow the network to hold the sender | is this balance that will allow the network to hold the sender | |||
| accountable for the congestion it causes, as we shall see. the | accountable for the congestion it causes, as we shall see. The | |||
| informal outline below uses TCP as an example transport, but the idea | informal outline below uses TCP as an example transport, but the idea | |||
| would be broadly similar for any transport that adapts its rate to | would be broadly similar for any transport that adapts its rate to | |||
| congestion. | congestion. | |||
| We will start with the sender in `flow established' state, Normally | We will start with the sender in `flow established' state. Normally, | |||
| as acknowledgements of earlier packets arrive that don't feed back any | as acknowledgements of earlier packets arrive that don't feed back any | |||
| congestion, the congestion window can be opened, so the sender goes | congestion, the congestion window can be opened, so the sender goes | |||
| round the smaller sub-loop, sending RECT packets (worth 0) and | round the smaller sub-loop, sending RECT packets (worth 0) and | |||
| returning to the flow established state to send another one. If a | returning to the flow established state to send another one. If a | |||
| router congestion marks one of the packets, it decrements the | router congestion marks one of the packets, it decrements the | |||
| packet's worth. The sender will have been continuing to traverse | packet's worth. The sender will have been continuing to traverse | |||
| round the smaller feedback loop every time acknowledgements arrive. | round the smaller feedback loop every time acknowledgements arrive. | |||
| But when congestion feedback returns from this packet that was marked | But when congestion feedback returns from this packet that was marked | |||
| with -1 worth (the largest loop in the figure) the sender jumps to | with -1 worth (the largest loop in the figure) the sender jumps to | |||
| the congestion echoed state in order to re-echo the congestion, | the congestion echoed state in order to re-echo the congestion, | |||
| skipping to change at page 14, line 16 | skipping to change at page 14, line 16 | |||
| the same end to end feedback loop. | the same end to end feedback loop. | |||
| If a packet carrying re-echoed congestion happens to also be | If a packet carrying re-echoed congestion happens to also be | |||
| congestion marked, the +1 worth added by the sender will be cancelled | congestion marked, the +1 worth added by the sender will be cancelled | |||
| out by the -1 network congestion marking. Although the two worth | out by the -1 network congestion marking. Although the two worth | |||
| values correctly cancel out, neither the congestion marking nor the | values correctly cancel out, neither the congestion marking nor the | |||
| re-echoed congestion are lost, because the RE bit and the ECN field | re-echoed congestion are lost, because the RE bit and the ECN field | |||
| are orthogonal. So, whenever this happens, the receiver will | are orthogonal. So, whenever this happens, the receiver will | |||
| correctly detect and re-echo the new congestion event as well (the | correctly detect and re-echo the new congestion event as well (the | |||
| top sub-loop). When we need to distinguish, we will sometimes call a | top sub-loop). When we need to distinguish, we will sometimes call a | |||
| packet marked RECT neutral (0 worth), while we will call the CE(0) | packet marked RECT 'neutral' (0 worth), while we will call the CE(0) | |||
| marking canceled (also 0 worth). If a re-echoed packet isn't unlucky | marking 'canceled' (also 0 worth). If a re-echoed packet isn't | |||
| enough to be further congestion marked, the sender will return to the | unlucky enough to be further congestion marked, the sender will | |||
| flow established state and continue to send RECT packets (worth 0). | return to the flow established state and continue to send RECT | |||
| packets (worth 0). | ||||
| The table below specifies unambiguously the worth of each extended | The table below specifies unambiguously the worth of each extended | |||
| ECN codepoint. Note the order is different from the previous table | ECN codepoint. Note the order is different from the previous table | |||
| to better show how the worth increments and decrements. The FNE | to better show how the worth increments and decrements. The FNE | |||
| codepoint is an exception. It is used in the flow bootstrap process | codepoint is an exception. It is used in the flow bootstrap process | |||
| (explained later) and has the same positive (+1) worth as a packet | (explained later) and has the same positive (+1) worth as a packet | |||
| with the Re-Echo codepoint. | with the Re-Echo codepoint. | |||
| +-------+-----+----------------+-------+----------------------------+ | +--------+------+----------------+-------+--------------------------+ | |||
| | ECN | RE | Extended ECN | Worth | Re-ECN meaning | | | ECN | RE | Extended ECN | Worth | Re-ECN meaning | | |||
| | field | bit | codepoint | | | | | field | bit | codepoint | | | | |||
| +-------+-----+----------------+-------+----------------------------+ | +--------+------+----------------+-------+--------------------------+ | |||
| | 00 | 0 | Not-RECT | ... | Not re-ECN-capable | | | 00 | 0 | Not-RECT | ... | Not re-ECN-capable | | |||
| | | | | | transport | | | | | | | transport | | |||
| | 01 | 0 | Re-Echo | +1 | Re-echoed congestion and | | | 01 | 0 | Re-Echo | +1 | Re-echoed congestion and | | |||
| | | | | | RECT | | | | | | | RECT | | |||
| | 10 | 0 | --- | ... | Legacy ECN use only | | | 10 | 0 | --- | ... | Legacy ECN use only | | |||
| | 11 | 0 | CE(0) | 0 | Re-Echo canceled by | | | 11 | 0 | CE(0) | 0 | Re-Echo canceled by | | |||
| | | | | | congestion experienced | | | | | | | congestion experienced | | |||
| | 00 | 1 | FNE | +1 | Feedback not established | | | 00 | 1 | FNE | +1 | Feedback not established | | |||
| | 01 | 1 | RECT | 0 | Re-ECN capable transport | | | 01 | 1 | RECT | 0 | Re-ECN capable transport | | |||
| | 10 | 1 | --CU-- | ... | Currently unused | | | 10 | 1 | --CU-- | ... | Currently unused | | |||
| | | | | | | | | | | | | | | |||
| | 11 | 1 | CE(-1) | -1 | Congestion experienced | | | 11 | 1 | CE(-1) | -1 | Congestion experienced | | |||
| +-------+-----+----------------+-------+----------------------------+ | +--------+------+----------------+-------+--------------------------+ | |||
| Table 3: 'Worth' of Extended ECN Codepoints | Table 3: 'Worth' of Extended ECN Codepoints | |||
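As an informal illustration of Table 3, the worth values can be held in a small lookup keyed on the ECN field and the RE bit. This is only a sketch: the names EECN_WORTH, contribution and downstream_congestion are hypothetical and do not appear in the draft, and treating codepoints with no defined worth ("...") as contributing zero is an assumption made purely so the example can run.

```python
# Sketch of Table 3. All names here are illustrative, not from the draft.
# Key: (ECN field as a 2-bit string, RE bit).
# Value: (extended ECN codepoint, worth), with None where the table gives "...".
EECN_WORTH = {
    ("00", 0): ("Not-RECT", None),
    ("01", 0): ("Re-Echo", +1),
    ("10", 0): ("---", None),
    ("11", 0): ("CE(0)", 0),
    ("00", 1): ("FNE", +1),
    ("01", 1): ("RECT", 0),
    ("10", 1): ("--CU--", None),
    ("11", 1): ("CE(-1)", -1),
}


def contribution(ecn_field, re_bit, size_octets):
    """Worth multiplied by packet size, in octets.

    ASSUMPTION: codepoints without a defined worth contribute 0.
    """
    _, worth = EECN_WORTH[(ecn_field, re_bit)]
    return 0 if worth is None else worth * size_octets


def downstream_congestion(packets):
    """Sum the contributions of (ecn_field, re_bit, size_octets) tuples."""
    return sum(contribution(e, r, s) for e, r, s in packets)
```

For example, one RECT packet, one CE(-1) packet and one Re-Echo packet of equal size sum to zero, which is the balance between positive and negative octets that the text describes.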
| 4. Transport Layers | 4. Transport Layers | |||
| 4.1. TCP | 4.1. TCP | |||
| Re-ECN capability at the sender is essential. At the receiver it is | Re-ECN capability at the sender is essential. At the receiver it is | |||
| optional, as long as the receiver has a basic (`vanilla flavour') | optional, as long as the receiver has a basic (`vanilla flavour') | |||
| RFC3168-compliant ECN-capable transport (ECT) [RFC3168]. Given re- | RFC3168-compliant ECN-capable transport (ECT) [RFC3168]. Given re- | |||
| ECN is not the first attempt to define the semantics of the ECN | ECN is not the first attempt to define the semantics of the ECN | |||
| field, we give a table below summarising what happens for various | field, we give a table below summarising what happens for various | |||
| combinations of capabilities of the sender S and receiver R, as | combinations of capabilities of the sender S and receiver R, as | |||
| indicated in the first four columns below. The last column gives the | indicated in the first four columns below. The last column gives the | |||
| mode a half-connection should be in after the first two of the three | mode a half-connection should be in after the first two of the three | |||
| TCP handshakes. | TCP handshakes. | |||
| +--------+---------------+-----------+---------+--------------------+ | +--------+--------------+------------+---------+--------------------+ | |||
| | Re-ECT | ECT-Nonce | ECT | Not-ECT | S-R | | | Re-ECT | ECT-Nonce | ECT | Not-ECT | S-R | | |||
| | | (RFC3540) | (RFC3168) | | Half-connection | | | | (RFC3540) | (RFC3168) | | Half-connection | | |||
| | | | | | Mode | | | | | | | Mode | | |||
| +--------+---------------+-----------+---------+--------------------+ | +--------+--------------+------------+---------+--------------------+ | |||
| | SR | | | | RECN | | | SR | | | | RECN | | |||
| | S | R | | | RECN-Co | | | S | R | | | RECN-Co | | |||
| | S | | R | | RECN-Co | | | S | | R | | RECN-Co | | |||
| | S | | | R | Not-ECT | | | S | | | R | Not-ECT | | |||
| +--------+---------------+-----------+---------+--------------------+ | +--------+--------------+------------+---------+--------------------+ | |||
| Table 4: Modes of TCP Half-connection for Combinations of ECN | Table 4: Modes of TCP Half-connection for Combinations of ECN | |||
| Capabilities of Sender S and Receiver R | Capabilities of Sender S and Receiver R | |||
| We will describe what happens in each mode, then describe how they | We will describe what happens in each mode, then describe how they | |||
| are negotiated. The abbreviations for the modes in the above table | are negotiated. The abbreviations for the modes in the above table | |||
| mean: | mean: | |||
| RECN: Full re-ECN capable transport | RECN: Full re-ECN capable transport | |||
| RECN-Co: Re-ECN sender in compatibility mode with a vanilla [RFC3168] | RECN-Co: Re-ECN sender in compatibility mode with a | |||
| ECN receiver or an [RFC3540] ECN nonce-capable receiver. | vanilla [RFC3168] ECN receiver or an [RFC3540] ECN nonce-capable | |||
| Implementation of this mode is OPTIONAL. | receiver. Implementation of this mode is OPTIONAL. | |||
| Not-ECT: Not ECN-capable transport, as defined in [RFC3168] for when | Not-ECT: Not ECN-capable transport, as defined in [RFC3168] for when | |||
| at least one of the transports does not understand even basic ECN | at least one of the transports does not understand even basic ECN | |||
| marking. | marking. | |||
| Note that we use the term Re-ECT for a host transport that is re-ECN- | Note that we use the term Re-ECT for a host transport that is re-ECN- | |||
| capable but RECN for the modes of the half connections between hosts | capable but RECN for the modes of the half connections between hosts | |||
| when they are both Re-ECT. If a host transport is Re-ECT, this fact | when they are both Re-ECT. If a host transport is Re-ECT, this fact | |||
| alone does NOT imply either of its half connections will necessarily | alone does NOT imply either of its half connections will necessarily | |||
| be in RECN mode, at least not until it has confirmed that the other | be in RECN mode, at least not until it has confirmed that the other | |||
| skipping to change at page 23, line 5 | skipping to change at page 23, line 5 | |||
| RECN mode: Given the constraints on TCP's initial window [RFC3390] | RECN mode: Given the constraints on TCP's initial window [RFC3390] | |||
| and its exponential window increase during slow start | and its exponential window increase during slow start | |||
| phase [RFC2581], it turns out that the sender SHOULD set FNE on | phase [RFC2581], it turns out that the sender SHOULD set FNE on | |||
| the first and third data packets in its flow, assuming equal sized | the first and third data packets in its flow, assuming equal sized | |||
| data packets once a flow is established. Appendix D presents the | data packets once a flow is established. Appendix D presents the | |||
| calculation that led to this conclusion. Below, after running | calculation that led to this conclusion. Below, after running | |||
| through the start of an example TCP session, we give the intuition | through the start of an example TCP session, we give the intuition | |||
| learned from that calculation. | learned from that calculation. | |||
| RECN-Co mode: A re-ECT sender that switches into re-ECN compatibility | RECN-Co mode: A re-ECT sender that switches into re-ECN | |||
| mode or into Not-ECT mode (because it has detected the | compatibility mode or into Not-ECT mode (because it has detected | |||
| corresponding host is not re-ECN capable) MUST limit its initial | the corresponding host is not re-ECN capable) MUST limit its | |||
| window to 1 segment. The reasoning behind this constraint is | initial window to 1 segment. The reasoning behind this constraint | |||
| given in Section 5.4. Having set this initial window, a re-ECN | is given in Section 5.4. Having set this initial window, a re-ECN | |||
| sender in RECN-Co mode SHOULD set FNE on the first and third data | sender in RECN-Co mode SHOULD set FNE on the first and third data | |||
| packets in a flow, as for RECN mode. | packets in a flow, as for RECN mode. | |||
| +----+------+----------------+-------+-------+---------------+------+ | +----+------+----------------+-------+-------+---------------+------+ | |||
| | | Data | TCP A(Re-ECT) | IP A | IP B | TCP B(Re-ECT) | Data | | | | Data | TCP A(Re-ECT) | IP A | IP B | TCP B(Re-ECT) | Data | | |||
| +----+------+----------------+-------+-------+---------------+------+ | +----+------+----------------+-------+-------+---------------+------+ | |||
| | | Byte | SEQ ACK CTL | EECN | EECN | SEQ ACK CTL | Byte | | | | Byte | SEQ ACK CTL | EECN | EECN | SEQ ACK CTL | Byte | | |||
| | -- | ---- | ------------- | ----- | ----- | ------------- | ---- | | | -- | ---- | ------------- | ----- | ----- | ------------- | ---- | | |||
| | 1 | | 0100 SYN | FNE | --> | R.ECC=0 | | | | 1 | | 0100 SYN | FNE | --> | R.ECC=0 | | | |||
| | | | CWR,ECE,NS | | | | | | | | | CWR,ECE,NS | | | | | | |||
| skipping to change at page 26, line 7 | skipping to change at page 26, line 7 | |||
| This does not ensure precisely the same number of octets have RE | This does not ensure precisely the same number of octets have RE | |||
| blanked as were CE marked. But we believe positive errors will | blanked as were CE marked. But we believe positive errors will | |||
| cancel negative over a long enough period. {ToDo: However, more | cancel negative over a long enough period. {ToDo: However, more | |||
| research is needed to prove whether this is so. If it is not, it may | research is needed to prove whether this is so. If it is not, it may | |||
| be necessary to increment and decrement R in octets rather than | be necessary to increment and decrement R in octets rather than | |||
| packets, by incrementing R as the product of D and the size in octets | packets, by incrementing R as the product of D and the size in octets | |||
| of packets being sent (typically the MSS).} | of packets being sent (typically the MSS).} | |||
| 4.2. Other Transports | 4.2. Other Transports | |||
| 4.2.1. Guidelines for Adding Re-ECN to Other Transports | 4.2.1. General Guidelines for Adding Re-ECN to Other Transports | |||
| Re-ECT sender transports that have established the receiver transport | Re-ECT sender transports that have established the receiver transport | |||
| is at least ECN-capable (not necessarily re-ECN capable) MUST blank | is at least ECN-capable (not necessarily re-ECN capable) MUST blank | |||
| the RE codepoint in packets carrying at least as many octets as | the RE codepoint in packets carrying at least as many octets as | |||
| arrive at the receiver with the CE codepoint set. Re-ECN-capable sender | arrive at the receiver with the CE codepoint set. Re-ECN-capable sender | |||
| transports should always initialise the ECN field to the ECT(1) | transports should always initialise the ECN field to the ECT(1) | |||
| codepoint once a flow is established. | codepoint once a flow is established. | |||
| If the sender transport does not have sufficient feedback to even | If the sender transport does not have sufficient feedback to even | |||
| estimate the path's CE rate, it SHOULD set FNE continuously. If the | estimate the path's CE rate, it SHOULD set FNE continuously. If the | |||
| sender transport has some, perhaps stale, feedback to estimate that | sender transport has some, perhaps stale, feedback to estimate that | |||
| the path's CE rate is almost certainly less than E%, the transport | the path's CE rate is almost certainly less than E%, the transport | |||
| MAY blank RE in packets for E% of sent octets, and set the RECT | MAY blank RE in packets for E% of sent octets, and set the RECT | |||
| codepoint for the remainder. | codepoint for the remainder. | |||
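The octet-balancing requirement in the guidelines above can be sketched with a single counter at the sender. This is a minimal illustration, not the draft's mechanism: the class name ReEchoBalancer and its methods are hypothetical, and it assumes the transport's feedback reports CE-marked octets directly, whereas the TCP section of the draft counts packets.

```python
class ReEchoBalancer:
    """Hypothetical helper: choose Re-Echo or RECT for each outgoing packet.

    ASSUMPTION: feedback tells the sender how many octets arrived at the
    receiver with the CE codepoint set.
    """

    def __init__(self):
        # CE-marked octets reported by the receiver and not yet re-echoed.
        self.owed_octets = 0

    def on_feedback(self, ce_marked_octets):
        """Record newly reported CE-marked octets."""
        self.owed_octets += ce_marked_octets

    def codepoint_for_next_packet(self, size_octets):
        """Blank RE (Re-Echo) while octets are still owed, else send RECT."""
        if self.owed_octets > 0:
            # The counter may go negative; the surplus offsets later marks.
            self.owed_octets -= size_octets
            return "Re-Echo"
        return "RECT"
```

Because a whole packet is re-echoed whenever any octets are owed, the octets sent with RE blanked are never fewer than the CE-marked octets reported so far, once the counter has returned to zero or below.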
| The following sections give guidelines on how re-ECN support could be | ||||
| added to RSVP or NSIS, to DCCP, and to SCTP - although separate | ||||
| Internet drafts will be necessary to document the exact mechanics of | ||||
| | re-ECN in each of these protocols. | |||
| {ToDo: Give a brief outline of what would be expected for each of the | {ToDo: Give a brief outline of what would be expected for each of the | |||
| following: | following: | |||
| o UDP fire and forget (e.g. DNS) | o UDP fire and forget (e.g. DNS) | |||
| o UDP streaming with no feedback | o UDP streaming with no feedback | |||
| o UDP streaming with feedback | o UDP streaming with feedback | |||
| o DCCP [RFC4340] } | } | |||
| o RSVP and/or NSIS: A separate I-D has been submitted [Re-PCN] | 4.2.2. Guidelines for adding Re-ECN to RSVP or NSIS | |||
| describing how re-ECN can be used in an edge-to-edge rather than | ||||
| end-to-end scenario. It can then be used by downstream networks | A separate I-D has been submitted [Re-PCN] describing how re-ECN can | |||
| to police whether upstream networks are blocking new flow | be used in an edge-to-edge rather than end-to-end scenario. It can | |||
| reservations when downstream congestion is too high, even though | then be used by downstream networks to police whether upstream | |||
| the congestion is in other operators' downstream networks. This | networks are blocking new flow reservations when downstream | |||
| relates to current work in progress on Admission Control over | congestion is too high, even though the congestion is in other | |||
| Diffserv using Pre-Congestion Notification, being reported to the | operators' downstream networks. This relates to current work in | |||
| IETF TSVWG [CL-deploy]. | progress on Admission Control over Diffserv using Pre-Congestion | |||
| Notification, being reported to the IETF TSVWG [CL-deploy]. | ||||
| 4.2.3. Guidelines for adding Re-ECN to DCCP | ||||
| Beside adjusting the initial features negotiation sequence, operating | ||||
| re-ECN in DCCP could be achieved by defining a new option to be added | ||||
| to acknowledgments, that would include a multibit field where the | ||||
| destination could copy its ECC. | ||||
| 4.2.4. Guidelines for adding Re-ECN to SCTP | ||||
| Annex 1 in RFC4340 gives the specifications for SCTP to support ECN. | ||||
| Similar steps should be taken to support re-ECN. Beside adjusting | ||||
| the initial features negotiation sequence, operating re-ECN in SCTP | ||||
| could be achieved by defining a new control chunk, that would include | ||||
| | a multibit field where the destination could copy its ECC. | |||
| 5. Network Layer | 5. Network Layer | |||
| 5.1. Re-ECN IPv4 Wire Protocol | 5.1. Re-ECN IPv4 Wire Protocol | |||
| The wire protocol of the ECN field in the IP header remains largely | The wire protocol of the ECN field in the IP header remains largely | |||
| unchanged from [RFC3168]. However, an extension to the ECN field we | unchanged from [RFC3168]. However, an extension to the ECN field we | |||
| call the RE (re-ECN extension) flag (Section 3.2) is defined in this | call the RE (re-ECN extension) flag (Section 3.2) is defined in this | |||
| document. It doubles the extended ECN codepoint space, giving 8 | document. It doubles the extended ECN codepoint space, giving 8 | |||
| potential codepoints. The semantics of the extra codepoints are | potential codepoints. The semantics of the extra codepoints are | |||
| skipping to change at page 29, line 26 | skipping to change at page 30, line 9 | |||
| field which we would expect to change en route. As the RE flag does | field which we would expect to change en route. As the RE flag does | |||
| not need end-to-end authentication, we set the C flag to '1'. | not need end-to-end authentication, we set the C flag to '1'. | |||
| {ToDo: A Congestion Hop by Hop Option ID will need to be registered | {ToDo: A Congestion Hop by Hop Option ID will need to be registered | |||
| with IANA.} | with IANA.} | |||
| 5.3. Router Forwarding Behaviour | 5.3. Router Forwarding Behaviour | |||
| Re-ECN works well without modifying the forwarding behaviour of any | Re-ECN works well without modifying the forwarding behaviour of any | |||
| routers. However, below, two OPTIONAL changes to forwarding | routers. However, below, two OPTIONAL changes to forwarding | |||
| behaviour are defined, which respectively enhance performance and | behaviour are defined which respectively enhance performance and | |||
| improve a router's discrimination against flooding attacks. They are | improve a router's discrimination against flooding attacks. They are | |||
| both OPTIONAL additions that we propose MAY apply by default to all | both OPTIONAL additions that we propose MAY apply by default to all | |||
| Diffserv per-hop scheduling behaviours (PHBs) [RFC2475] and ECN | Diffserv per-hop scheduling behaviours (PHBs) [RFC2475] and ECN | |||
| marking behaviours [RFC3168]. Specifications for PHBs MAY define | marking behaviours [RFC3168]. Specifications for PHBs MAY define | |||
| different forwarding behaviours from this default, but this is NOT | different forwarding behaviours from this default, but this is NOT | |||
| REQUIRED. [Re-PCN] is one example. | REQUIRED. [Re-PCN] is one example. | |||
| FNE indicates ECT: | FNE indicates ECT: | |||
| The FNE codepoint tells a router to assume that the packet was | The FNE codepoint tells a router to assume that the packet was | |||
| skipping to change at page 30, line 12 | skipping to change at page 31, line 5 | |||
| it MAY preferentially drop packets within the same Diffserv PHB | it MAY preferentially drop packets within the same Diffserv PHB | |||
| using the preference order for extended ECN codepoints given in | using the preference order for extended ECN codepoints given in | |||
| Table 7. Preferential dropping can be difficult to implement on | Table 7. Preferential dropping can be difficult to implement on | |||
| some hardware, but if feasible it would discriminate against | some hardware, but if feasible it would discriminate against | |||
| attack traffic if done as part of the overall policing framework | attack traffic if done as part of the overall policing framework | |||
| of Section 6.1.3. If nowhere else, routers at the egress of a | of Section 6.1.3. If nowhere else, routers at the egress of a | |||
| network SHOULD implement preferential drop (stronger than the MAY | network SHOULD implement preferential drop (stronger than the MAY | |||
| above). For simplicity, preferences 4 & 5 MAY be merged into one | above). For simplicity, preferences 4 & 5 MAY be merged into one | |||
| preference level. | preference level. | |||
| +-------+-----+-----------+-------+------------+--------------------+ | +-------+-----+------------+-------+------------+-------------------+ | |||
| | ECN | RE | Extended | Worth | Drop Pref | Re-ECN meaning | | | ECN | RE | Extended | Worth | Drop Pref | Re-ECN meaning | | |||
| | field | bit | ECN | | (1 = drop | | | | field | bit | ECN | | (1 = drop | | | |||
| | | | codepoint | | 1st) | | | | | | codepoint | | 1st) | | | |||
| +-------+-----+-----------+-------+------------+--------------------+ | +-------+-----+------------+-------+------------+-------------------+ | |||
| | 01 | 0 | Re-Echo | +1 | 5/4 | Re-echoed | | | 01 | 0 | Re-Echo | +1 | 5/4 | Re-echoed | | |||
| | | | | | | congestion and | | | | | | | | congestion and | | |||
| | | | | | | RECT | | | | | | | | RECT | | |||
| | 00 | 1 | FNE | +1 | 4 | Feedback not | | | 00 | 1 | FNE | +1 | 4 | Feedback not | | |||
| | | | | | | established | | | | | | | | established | | |||
| | 11 | 0 | CE(0) | 0 | 3 | Re-Echo canceled | | | 11 | 0 | CE(0) | 0 | 3 | Re-Echo canceled | | |||
| | | | | | | by congestion | | | | | | | | by congestion | | |||
| | | | | | | experienced | | | | | | | | experienced | | |||
| | 01 | 1 | RECT | 0 | 3 | Re-ECN capable | | | 01 | 1 | RECT | 0 | 3 | Re-ECN capable | | |||
| | | | | | | transport | | | | | | | | transport | | |||
| | 11 | 1 | CE(-1) | -1 | 3 | Congestion | | | 11 | 1 | CE(-1) | -1 | 3 | Congestion | | |||
| | | | | | | experienced | | | | | | | | experienced | | |||
| | 10 | 1 | --CU-- | n/a | 2 | Currently Unused | | | 10 | 1 | --CU-- | n/a | 2 | Currently Unused | | |||
| | 10 | 0 | --- | n/a | 2 | Legacy ECN use | | | 10 | 0 | --- | n/a | 2 | Legacy ECN use | | |||
| | | | | | | only | | | | | | | | only | | |||
| | 00 | 0 | Not-RECT | n/a | 1 | Not re-ECN-capable | | | 00 | 0 | Not-RECT | n/a | 1 | Not | | |||
| | | | | | | re-ECN-capable | | ||||
| | | | | | | transport | | | | | | | | transport | | |||
| +-------+-----+-----------+-------+------------+--------------------+ | +-------+-----+------------+-------+------------+-------------------+ | |||
| Table 7: Drop Preference of EECN Codepoints (Sorted by `Worth') | Table 7: Drop Preference of EECN Codepoints (Sorted by `Worth') | |||
| The above drop preferences are arranged to preserve packets with | The above drop preferences are arranged to preserve packets with | |||
| more positive worth (Section 3.4), given senders of positive | more positive worth (Section 3.4), given senders of positive | |||
| packets must have honestly declared downstream congestion. This | packets must have honestly declared downstream congestion. This | |||
| is explained fully in Section 6 on applications, particularly when | is explained fully in Section 6 on applications, particularly when | |||
| the application of re-ECN to protect against DDoS attacks is | the application of re-ECN to protect against DDoS attacks is | |||
| described. | described. | |||
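The preference order of Table 7 can be sketched as a sort key over extended ECN codepoints. The names DROP_PREF and drop_order are hypothetical. Reading the "5/4" entry for Re-Echo as preference 5, falling to 4 when preferences 4 and 5 are merged, is an interpretation of the table rather than something the draft states outright.

```python
# Sketch of Table 7 (1 = drop first). Names are illustrative, not from the draft.
DROP_PREF = {
    "Re-Echo": 5,   # the table gives 5/4; read here as 5 unless merged
    "FNE": 4,
    "CE(0)": 3,
    "RECT": 3,
    "CE(-1)": 3,
    "--CU--": 2,
    "---": 2,       # legacy ECN use only
    "Not-RECT": 1,
}


def drop_order(codepoints, merge_4_and_5=False):
    """Return the codepoints sorted so the first entries are dropped first.

    merge_4_and_5 models the option of merging preferences 4 and 5 into
    one level. The sort is stable, so arrival order is kept within a level.
    """
    def pref(codepoint):
        p = DROP_PREF[codepoint]
        return 4 if (merge_4_and_5 and p == 5) else p

    return sorted(codepoints, key=pref)
```

A router following this sketch would discard Not-RECT packets before any others in the same PHB, and Re-Echo packets last.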
| skipping to change at page 31, line 9 | skipping to change at page 32, line 5 | |||
| Congested routers may mark an FNE packet to CE(-1) (Section 5.3), and | Congested routers may mark an FNE packet to CE(-1) (Section 5.3), and | |||
| the initial SYN MUST be set to FNE by Re-ECT client A | the initial SYN MUST be set to FNE by Re-ECT client A | |||
| (Section 4.1.4). So an initial SYN may be marked CE(-1) rather than | (Section 4.1.4). So an initial SYN may be marked CE(-1) rather than | |||
| dropped. This seems dangerous, because the sender has not yet | dropped. This seems dangerous, because the sender has not yet | |||
| established whether the receiver is a legacy one that does not | established whether the receiver is a legacy one that does not | |||
| understand congestion marking. It also seems to allow malicious | understand congestion marking. It also seems to allow malicious | |||
| senders to take advantage of ECN marking to avoid so much drop when | senders to take advantage of ECN marking to avoid so much drop when | |||
| launching SYN flooding attacks. Below we explain the features of the | launching SYN flooding attacks. Below we explain the features of the | |||
| protocol design that remove both these dangers. | protocol design that remove both these dangers. | |||
| ECN-capable initial SYN with a Not-ECT server: If the TCP server B is | ECN-capable initial SYN with a Not-ECT server: If the TCP server B | |||
| re-ECN capable, provision is made for it to feed back a possible | is re-ECN capable, provision is made for it to feed back a possible | |||
| congestion marked SYN in the SYN ACK (Section 4.1.4). But if the | congestion marked SYN in the SYN ACK (Section 4.1.4). But if the | |||
| TCP client A finds out from the SYN ACK that the server was not | TCP client A finds out from the SYN ACK that the server was not | |||
| ECN-capable, the TCP client MUST consider the first SYN as | ECN-capable, the TCP client MUST consider the first SYN as | |||
| congestion marked before setting itself into Not-ECT mode. | congestion marked before setting itself into Not-ECT mode. | |||
| Section 4.1.4 mandates that such a TCP client MUST also set its | Section 4.1.4 mandates that such a TCP client MUST also set its | |||
| initial window to 1 segment. In this way we remove the need to | initial window to 1 segment. In this way we remove the need to | |||
| cautiously avoid setting the first SYN to Not-RECT. This will | cautiously avoid setting the first SYN to Not-RECT. This will | |||
| give worse performance while deployment is patchy, but better | give worse performance while deployment is patchy, but better | |||
| performance once deployment is widespread. | performance once deployment is widespread. | |||
| skipping to change at page 38, line 24 | skipping to change at page 39, line 19 | |||
| their own expected downstream congestion so that N1 can deploy a | their own expected downstream congestion so that N1 can deploy a | |||
| policer at its ingress to check that S1 is complying with whatever | policer at its ingress to check that S1 is complying with whatever | |||
| congestion control it should be using (Section 6.1.5). If N1 is | congestion control it should be using (Section 6.1.5). If N1 is | |||
| extremely conservative it may police each flow, but it can choose | extremely conservative it may police each flow, but it can choose | |||
| to just police the bulk amount of congestion each customer causes | to just police the bulk amount of congestion each customer causes | |||
| without regard to flows, or if it is extremely liberal it need not | without regard to flows, or if it is extremely liberal it need not | |||
| police congestion control at all. Whatever, it is always | police congestion control at all. Whatever, it is always | |||
| preferable to police traffic at the very first ingress into an | preferable to police traffic at the very first ingress into an | |||
| internetwork, before non-compliant traffic can cause any damage. | internetwork, before non-compliant traffic can cause any damage. | |||
| Edge egress dropper: If the policer ensures the source has less right | Edge egress dropper: If the policer ensures the source has less | |||
| to a high rate the higher it declares downstream congestion, the | right to a high rate the higher it declares downstream congestion, | |||
| source has a clear incentive to understate downstream congestion. | the source has a clear incentive to understate downstream | |||
| But, if flows of packets are understated when they enter the | congestion. But, if flows of packets are understated when they | |||
| internetwork, they will have become negative by the time they | enter the internetwork, they will have become negative by the time | |||
| leave. So, we introduce a dropper at the last network egress, | they leave. So, we introduce a dropper at the last network | |||
| which drops packets in flows that persistently declare negative | egress, which drops packets in flows that persistently declare | |||
| downstream congestion (see Section 6.1.4 for details). | negative downstream congestion (see Section 6.1.4 for details). | |||
| ..competitive routing | ..competitive routing | |||
| .' : '. | .' : '. | |||
| .' p e n a l:t i e s '. | .' p e n a l:t i e s '. | |||
| : | : \ : | : | : \ : | |||
| A : | : | : | A : | : | : | |||
| |S <-----N1----> <---N2---> <---N4--> R domain | |S <-----N1----> <---N2---> <---N4--> R domain | |||
| | : | : | : | | : | : | : | |||
| | V | : | : | | V | : | : | |||
| 3% |--------+ | : | : | 3% |--------+ | : | : | |||
| skipping to change at page 40, line 14 | skipping to change at page 40, line 39 | |||
| may all be allowed different responses to congestion. The figure | may all be allowed different responses to congestion. The figure | |||
| depicts this downward pressure on N2 by the solid downward arrow | depicts this downward pressure on N2 by the solid downward arrow | |||
| at the egress of N2. Then N2 has an incentive either to police | at the egress of N2. Then N2 has an incentive either to police | |||
| the congestion response of its own ingress traffic (from N1) or to | the congestion response of its own ingress traffic (from N1) or to | |||
| emulate policing by applying penalties to N1 in turn on the basis | emulate policing by applying penalties to N1 in turn on the basis | |||
| of congestion counted at their mutual boundary. In this recursive | of congestion counted at their mutual boundary. In this recursive | |||
| way, the incentives for each flow to respond correctly to | way, the incentives for each flow to respond correctly to | |||
| congestion trace back with each flow precisely to each source, | congestion trace back with each flow precisely to each source, | |||
| despite the mechanism not recognising flows (see Section 6.2.2). | despite the mechanism not recognising flows (see Section 6.2.2). | |||
| Inter-domain congestion charging diversity: Any two networks are free | Inter-domain congestion charging diversity: Any two networks are | |||
| to agree any of a range of penalty regimes between themselves | free to agree any of a range of penalty regimes between themselves | |||
| within the following reasonable constraints. N2 should expect to | within the following reasonable constraints. N2 should expect to | |||
| have to pay penalties to N4 where penalties monotonically increase | have to pay penalties to N4 where penalties monotonically increase | |||
| with the volume of congestion and negative penalties are not | with the volume of congestion and negative penalties are not | |||
| allowed. For instance, they may agree an SLA with tiered | allowed. For instance, they may agree an SLA with tiered | |||
| congestion thresholds, where higher penalties apply the higher the | congestion thresholds, where higher penalties apply the higher the | |||
| threshold that is broken. But the most obvious (and useful) form | threshold that is broken. But the most obvious (and useful) form | |||
| of penalty is where N4 levies a charge on N2 proportional to the | of penalty is where N4 levies a charge on N2 proportional to the | |||
| volume of downstream congestion N2 dumps into N4. In the | volume of downstream congestion N2 dumps into N4. In the | |||
| explanation that follows, we assume this specific variant of | explanation that follows, we assume this specific variant of | |||
| volume charging between networks - charging proportionate to the | volume charging between networks - charging proportionate to the | |||
| skipping to change at page 43, line 45 | skipping to change at page 44, line 19 | |||
| fraction of negative octets introduced by congestion marking, leaving | fraction of negative octets introduced by congestion marking, leaving | |||
| a balance of zero. If it is less (a negative flow), it implies that | a balance of zero. If it is less (a negative flow), it implies that | |||
| the source is understating path congestion (which will reduce the | the source is understating path congestion (which will reduce the | |||
| penalties that N2 owes N4). | penalties that N2 owes N4). | |||
| If flows are positive, N4 need take no action---this simply means its | If flows are positive, N4 need take no action---this simply means its | |||
| upstream neighbour is paying more penalties than it needs to, and the | upstream neighbour is paying more penalties than it needs to, and the | |||
| source is going slower than it needs to. But, to protect itself | source is going slower than it needs to. But, to protect itself | |||
| against persistently negative flows, N4 will need to install a | against persistently negative flows, N4 will need to install a | |||
| dropper at its egress. Appendix E gives a suggested algorithm for | dropper at its egress. Appendix E gives a suggested algorithm for | |||
| this dropper. There is not intention that the dropper algorithm | this dropper. There is no intention that the dropper algorithm needs | |||
| needs to be standardised, it is merely provided to show that an | to be standardised, it is merely provided to show that an efficient, | |||
| efficient, robust algorithm is possible. But whatever algorithm is | robust algorithm is possible. But whatever algorithm is used must | |||
| used must meet the criteria below: | meet the criteria below: | |||
| o It SHOULD introduce minimal false positives for honest flows; | o It SHOULD introduce minimal false positives for honest flows; | |||
| o It SHOULD quickly detect and sanction dishonest flows (minimal | o It SHOULD quickly detect and sanction dishonest flows (minimal | |||
| false negatives); | false negatives); | |||
| o It MUST be invulnerable to state exhaustion attacks from malicious | o It MUST be invulnerable to state exhaustion attacks from malicious | |||
| sources. For instance, if the dropper uses flow-state, it should | sources. For instance, if the dropper uses flow-state, it should | |||
| not be possible for a source to send numerous packets, each with a | not be possible for a source to send numerous packets, each with a | |||
| different flow ID, to force the dropper to exhaust its memory | different flow ID, to force the dropper to exhaust its memory | |||
| capacity; | capacity; | |||
| o It MUST introduce sufficient loss in goodput so that malicious | o It MUST introduce sufficient loss in goodput so that malicious | |||
| skipping to change at page 44, line 35 | skipping to change at page 45, line 9 | |||
| setting the FNE codepoint at the start of a flow, even though there | setting the FNE codepoint at the start of a flow, even though there | |||
| is a cost to the sender of setting FNE (positive `worth'). Indeed, | is a cost to the sender of setting FNE (positive `worth'). Indeed, | |||
| with the FNE codepoint, the rate at which a sender can generate new | with the FNE codepoint, the rate at which a sender can generate new | |||
| flows can be limited (Appendix G). In this respect, the FNE | flows can be limited (Appendix G). In this respect, the FNE | |||
| codepoint works like Handley's state set-up bit [Steps_DoS]. | codepoint works like Handley's state set-up bit [Steps_DoS]. | |||
| Appendix E also gives an example dropper implementation that | Appendix E also gives an example dropper implementation that | |||
| aggregates flow state. Dropper algorithms will often maintain a | aggregates flow state. Dropper algorithms will often maintain a | |||
| moving average across flows of the fraction of RE blanked packets. | moving average across flows of the fraction of RE blanked packets. | |||
| When maintaining an average across flows, a dropper SHOULD only allow | When maintaining an average across flows, a dropper SHOULD only allow | |||
| flows into the average if they start with FNE, but it SHOULD not | flows into the average if they start with FNE, but it SHOULD NOT | |||
| include packets with the FNE codepoint set in the average. A sender | include packets with the FNE codepoint set in the average. A sender | |||
| sets the FNE codepoint when it does not have the benefit of feedback | sets the FNE codepoint when it does not have the benefit of feedback | |||
| from the receiver. So, counting packets with FNE cleared would be | from the receiver. So, counting packets with FNE cleared would be | |||
| likely to make the average unnecessarily positive, providing headroom | likely to make the average unnecessarily positive, providing headroom | |||
| (or should we say footroom?) for dishonest (negative) traffic. | (or should we say footroom?) for dishonest (negative) traffic. | |||
| If the dropper detects a persistently negative flow, it SHOULD drop | If the dropper detects a persistently negative flow, it SHOULD drop | |||
| sufficient negative and neutral packets to force the flow to not be | sufficient negative and neutral packets to force the flow to not be | |||
| negative. Drops SHOULD be focused on just sufficient packets in | negative. Drops SHOULD be focused on just sufficient packets in | |||
| misbehaving flows to remove the negative bias while doing minimal | misbehaving flows to remove the negative bias while doing minimal | |||
| skipping to change at page 54, line 5 | skipping to change at page 54, line 25 | |||
| that the feedback loop is not broken but useful data can be | that the feedback loop is not broken but useful data can be | |||
| removed. | removed. | |||
| 7. Incremental Deployment | 7. Incremental Deployment | |||
| 7.1. Incremental Deployment Features | 7.1. Incremental Deployment Features | |||
| The design of the re-ECN protocol started from the fact that the | The design of the re-ECN protocol started from the fact that the | |||
| current ECN marking behaviour of routers was sufficient and that re- | current ECN marking behaviour of routers was sufficient and that re- | |||
| feedback could be introduced around these routers by changing the | feedback could be introduced around these routers by changing the | |||
| sender behaviour but not the routers. Otherwise, if had required | sender behaviour but not the routers. Otherwise, if we had required | |||
| routers to be changed, the chance of encountering a path that had | routers to be changed, the chance of encountering a path that had | |||
| every router upgraded would be vanishingly small during early | every router upgraded would be vanishingly small during early | |||
| deployment, giving no incentive to start deployment. Also, as there | deployment, giving no incentive to start deployment. Also, as there | |||
| is no new forwarding behaviour, routers and hosts do not have to | is no new forwarding behaviour, routers and hosts do not have to | |||
| signal or negotiate anything. | signal or negotiate anything. | |||
| However, networks that choose to protect themselves using re-ECN do | However, networks that choose to protect themselves using re-ECN do | |||
| have to add new security functions at their trust boundaries with | have to add new security functions at their trust boundaries with | |||
| others. They distinguish legacy traffic by its ECN field. Traffic | others. They distinguish legacy traffic by its ECN field. Traffic | |||
| from Not-ECT transports is distinguishable by its Not-RECT marking. | from Not-ECT transports is distinguishable by its Not-RECT marking. | |||
| skipping to change at page 55, line 25 | skipping to change at page 55, line 47 | |||
| None of these changes REQUIRE any modifications to routers. Also | None of these changes REQUIRE any modifications to routers. Also | |||
| none of these changes affect anything about end to end congestion | none of these changes affect anything about end to end congestion | |||
| control; they are all to do with allowing networks to police that end | control; they are all to do with allowing networks to police that end | |||
| to end congestion control is well-behaved. | to end congestion control is well-behaved. | |||
| 7.2. Incremental Deployment Incentives | 7.2. Incremental Deployment Incentives | |||
| It would only be worth standardising the re-ECN protocol if there | It would only be worth standardising the re-ECN protocol if there | |||
| existed a coherent story for how it might be incrementally deployed. | existed a coherent story for how it might be incrementally deployed. | |||
| In order for it to have a chance of deployment, everyone who needs to | In order for it to have a chance of deployment, everyone who needs to | |||
| act, must have a strong incentive to act, and the incentives must | act must have a strong incentive to act, and the incentives must | |||
| arise in the order that deployment would have to happen. Re-ECN | arise in the order that deployment would have to happen. Re-ECN | |||
| works around unmodified ECN routers, but we can't just discuss why | works around unmodified ECN routers, but we can't just discuss why | |||
| and how re-ECN deployment might build on ECN deployment, because | and how re-ECN deployment might build on ECN deployment, because | |||
| there is precious little to build on in the first place. Instead, we | there is precious little to build on in the first place. Instead, we | |||
| aim to show that re-ECN deployment could carry ECN with it. We focus | aim to show that re-ECN deployment could carry ECN with it. We focus | |||
| on commercial deployment incentives, although some of the arguments | on commercial deployment incentives, although some of the arguments | |||
| apply equally to academic or government sectors. | apply equally to academic or government sectors. | |||
| ECN deployment: | ECN deployment: | |||
| skipping to change at page 58, line 40 | skipping to change at page 59, line 13 | |||
| world to the religion of policing. Networks that chose not to | world to the religion of policing. Networks that chose not to | |||
| deploy egress droppers would leave themselves open to being | deploy egress droppers would leave themselves open to being | |||
| congested by senders in other networks. But that would be their | congested by senders in other networks. But that would be their | |||
| choice. | choice. | |||
| The important aspect of the egress dropper though is that it most | The important aspect of the egress dropper though is that it most | |||
| protects the network that deploys it. If a network does not | protects the network that deploys it. If a network does not | |||
| deploy an egress dropper, sources sending into it from other | deploy an egress dropper, sources sending into it from other | |||
| networks will be able to understate the congestion they are | networks will be able to understate the congestion they are | |||
| causing. Whereas, if a network deploys an egress dropper, it can | causing. Whereas, if a network deploys an egress dropper, it can | |||
| know how much congestion other networks are dumping into it. And | know how much congestion other networks are dumping into it, and | |||
| apply penalties or charges accordingly. So, whether or not a | apply penalties or charges accordingly. So, whether or not a | |||
| network polices its own sources at ingress, it is in its interests | network polices its own sources at ingress, it is in its interests | |||
| to deploy an egress dropper. | to deploy an egress dropper. | |||
| Host support: | Host support: | |||
| In the above deployment scenario, host operating system support | In the above deployment scenario, host operating system support | |||
| for re-ECN came about through the cellular operators demanding it | for re-ECN came about through the cellular operators demanding it | |||
| in device standards (i.e. 3GPP). Of course, increasingly, mobile | in device standards (i.e. 3GPP). Of course, increasingly, mobile | |||
| devices are being built to support multiple wireless technologies. | devices are being built to support multiple wireless technologies. | |||
| skipping to change at page 60, line 7 | skipping to change at page 60, line 25 | |||
| the motivator, but it seems optimistic to expect such a level of | the motivator, but it seems optimistic to expect such a level of | |||
| joined-up thinking from today's communications industry. We | joined-up thinking from today's communications industry. We | |||
| believe a single application alone must be a sufficient motivator. | believe a single application alone must be a sufficient motivator. | |||
| In short, everyone gains from adding accountability to TCP/IP, | In short, everyone gains from adding accountability to TCP/IP, | |||
| except the selfish or malicious. So, deployment incentives tend | except the selfish or malicious. So, deployment incentives tend | |||
| to be strong. | to be strong. | |||
| 8. Architectural Rationale | 8. Architectural Rationale | |||
| In the Internet's technical community the danger of not responding to | In the Internet's technical community, the danger of not responding | |||
| congestion is well-understood, with its attendant risk of congestion | to congestion is well-understood, as well as its attendant risk of | |||
| collapse [RFC3714]. However, many of the Internet's commercial | congestion collapse [RFC3714]. However, one side of the Internet's | |||
| community consider that the very essence of IP is to provide open | commercial community considers that the very essence of IP is to | |||
| access to the internetwork for all applications. Congestion is seen | provide open access to the internetwork for all applications. They | |||
| as a symptom of over-conservative investment. And the goal of | see congestion as a symptom of over-conservative investment, and rely | |||
| application design is to find novel ways to continue working despite | on revising application designs to find novel ways to keep | |||
| congestion. They argue that the Internet was never intended to be | applications working despite congestion. They argue that the | |||
| solely for TCP-friendly applications. Another side of the Internet's | Internet was never intended to be solely for TCP-friendly | |||
| commercial community believe that it is no use providing a network | applications. Meanwhile, another side of the Internet's commercial | |||
| for novel applications if it has insufficient capacity. And it will | community believes that it is worthwhile providing a network for | |||
| always have insufficient capacity unless a greater share of | novel applications only if it has sufficient capacity, which can | |||
| application revenues can be /assured/ for the infrastructure | happen only if a greater share of application revenues can be | |||
| provider. Otherwise the major investments required will carry too | /assured/ for the infrastructure provider. Otherwise the major | |||
| much risk and won't happen. | investments required would carry too much risk and wouldn't happen. | |||
| The lesson articulated in [Tussle] is that we shouldn't embed our | The lesson articulated in [Tussle] is that we shouldn't embed our | |||
| view on these arguments into the Internet at design time. Instead we | view on these arguments into the Internet at design time. Instead we | |||
| should design the Internet so that the outcome of these arguments can | should design the Internet so that the outcome of these arguments can | |||
| get decided at run-time. Re-ECN is designed in that spirit. Once | get decided at run-time. Re-ECN is designed in that spirit. Once | |||
| the protocol is available, different network operators can choose how | the protocol is available, different network operators can choose how | |||
| liberal they want to be in holding people accountable for the | liberal they want to be in holding people accountable for the | |||
| congestion they cause. Some might boldly invest in capacity and not | congestion they cause. Some might boldly invest in capacity and not | |||
| police its use at all, hoping that novel applications will result. | police its use at all, hoping that novel applications will result. | |||
| Others might use re-ECN for fine-grained flow policing, expecting to | Others might use re-ECN for fine-grained flow policing, expecting to | |||
| skipping to change at page 62, line 39 | skipping to change at page 63, line 13 | |||
| the network layer to modify the next guess. | the network layer to modify the next guess. | |||
| 9. Related Work | 9. Related Work | |||
| {Due to lack of time, this section is incomplete. The reader is | {Due to lack of time, this section is incomplete. The reader is | |||
| referred to the Related Work section of [Re-fb] for a brief selection | referred to the Related Work section of [Re-fb] for a brief selection | |||
| of related ideas.} | of related ideas.} | |||
| 9.1. Policing Rate Response to Congestion | 9.1. Policing Rate Response to Congestion | |||
| ATM network elements send congestion back-pressure messages [ITU- | ATM network elements send congestion back-pressure | |||
| T.I.371] along each connection, duplicating any end to end feedback | messages [ITU-T.I.371] along each connection, duplicating any end to | |||
| because they don't trust it. On the other hand, re-ECN ensures | end feedback because they don't trust it. On the other hand, re-ECN | |||
| information in forwarded packets can be used for congestion | ensures information in forwarded packets can be used for congestion | |||
| management without requiring a connection-oriented architecture and | management without requiring a connection-oriented architecture and | |||
| re-using the overhead of fields that are already set aside for end to | re-using the overhead of fields that are already set aside for end to | |||
| end congestion control (and routing loop detection in the case of re- | end congestion control (and routing loop detection in the case of re- | |||
| TTL in Appendix F). | TTL in Appendix F). | |||
| We borrowed ideas from policers in the literature [pBox],[XCHOKe], | We borrowed ideas from policers in the literature [pBox],[XCHOKe], | |||
| AFD etc. for our rate equation policer. However, without the benefit | AFD etc. for our rate equation policer. However, without the benefit | |||
| of re-ECN they don't police the correct rate for the condition of | of re-ECN they don't police the correct rate for the condition of | |||
| their path. They detect unusually high /absolute/ rates, but only | their path. They detect unusually high /absolute/ rates, but only | |||
| while the policer itself is congested, because they work by detecting | while the policer itself is congested, because they work by detecting | |||
| skipping to change at page 63, line 25 | skipping to change at page 63, line 47 | |||
| accidental side-effect. They actually punish traffic that fills | accidental side-effect. They actually punish traffic that fills | |||
| troughs as much as traffic that causes peaks in utilisation. In | troughs as much as traffic that causes peaks in utilisation. In | |||
| practice network operators need to be able to allocate service by | practice network operators need to be able to allocate service by | |||
| cost during congestion, and by value at other times. | cost during congestion, and by value at other times. | |||
| 9.2. Congestion Notification Integrity | 9.2. Congestion Notification Integrity | |||
| The choice of two ECT code-points in the ECN field [RFC3168] | The choice of two ECT code-points in the ECN field [RFC3168] | |||
| permitted future flexibility, optionally allowing the sender to | permitted future flexibility, optionally allowing the sender to | |||
| encode the experimental ECN nonce [RFC3540] in the packet stream. | encode the experimental ECN nonce [RFC3540] in the packet stream. | |||
| This mechanism has since been included in the specifications of DCCP | ||||
| [RFC4340]. | ||||
| The ECN nonce is an elegant scheme that allows the sender to detect | The ECN nonce is an elegant scheme that allows the sender to detect | |||
| if someone in the feedback loop tries to claim no congestion was | if someone in the feedback loop - the receiver especially - tries to | |||
| experienced when it fact it was (whether drop or ECN marking). The | claim no congestion was experienced when in fact congestion led to | |||
| sender chooses between the two ECT codepoints in a pseudo-random | packet drops or ECN marks. For each packet it sends, the sender | |||
| sequence. Then, whenever the network marks a packet with CE, to deny | chooses between the two ECT codepoints in a pseudo-random sequence. | |||
| the congestion happened, the cheater would have to guess which ECT | Then, whenever the network marks a packet with CE, if the receiver | |||
| codepoint was overwritten, with only a 50:50 chance of being correct | wants to deny congestion happened, she has to guess which ECT | |||
| each time. | codepoint was overwritten. She has only a 50:50 chance of being | |||
| correct each time she denies a congestion mark or a drop, which | ||||
| ultimately will give her away. | ||||
| The assumption behind the ECN nonce is that a sender will want to | The purpose of a network-layer nonce has to be the protection of the | |||
| detect whether a receiver is suppressing congestion feedback. This | network in the first place, while a transport-layer nonce had better | |||
| is only true if the sender's interests are aligned with the | be used to protect the sender from cheating receivers. Now, the | |||
| network's, or with the community of users as a whole. This may be | assumption behind the ECN nonce is that a sender will want to detect | |||
| true for certain large senders, who are under close scrutiny and have | whether a receiver is suppressing congestion feedback. This is only | |||
| a reputation to maintain. But we have to deal with a more hostile | true if the sender's interests are aligned with the network's, or | |||
| world, where traffic may be dominated by peer-to-peer transfers, | with the community of users as a whole. This may be true for certain | |||
| rather than downloads from a few popular sites. Often the `natural' | large senders, who are under close scrutiny and have a reputation to | |||
| self-interest of a sender is not aligned with the interests of other | maintain. But we have to deal with a more hostile world, where | |||
| traffic may be dominated by peer-to-peer transfers, rather than | ||||
| downloads from a few popular sites. Often the `natural' self- | ||||
| interest of a sender is not aligned with the interests of other | ||||
| users. It often wishes to transfer data quickly to the receiver as | users. It often wishes to transfer data quickly to the receiver as | |||
| much as the receiver wants the data quickly. | much as the receiver wants the data quickly. | |||
| In contrast, the re-ECN protocol enables policing of an agreed rate- | In contrast, the re-ECN protocol enables policing of an agreed rate- | |||
| response to congestion (e.g. TCP-friendliness) at the sender's | response to congestion (e.g. TCP-friendliness) at the sender's | |||
| interface with the internetwork. It also ensures downstream networks | interface with the internetwork. It also ensures downstream networks | |||
| can police their upstream neighbours, to encourage them to police | can police their upstream neighbours, to encourage them to police | |||
| their users in turn. But most importantly, it requires the sender to | their users in turn. But most importantly, it requires the sender to | |||
| declare path congestion to the network and it can remove traffic at | declare path congestion to the network and it can remove traffic at | |||
| the egress if this declaration is dishonest. So it can police | the egress if this declaration is dishonest. So it can police | |||
| skipping to change at page 67, line 22 | skipping to change at page 68, line 5 | |||
| [RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, | [RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, | |||
| S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G., | S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G., | |||
| Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, | Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, | |||
| S., Wroclawski, J., and L. Zhang, "Recommendations on | S., Wroclawski, J., and L. Zhang, "Recommendations on | |||
| Queue Management and Congestion Avoidance in the | Queue Management and Congestion Avoidance in the | |||
| Internet", RFC 2309, April 1998. | Internet", RFC 2309, April 1998. | |||
| [RFC2581] Allman, M., Paxson, V., and W. Stevens, "TCP Congestion | [RFC2581] Allman, M., Paxson, V., and W. Stevens, "TCP Congestion | |||
| Control", RFC 2581, April 1999. | Control", RFC 2581, April 1999. | |||
| [RFC2960] Stewart, R., Xie, Q., Morneault, K., Sharp, C., | ||||
| Schwarzbauer, H., Taylor, T., Rytina, I., Kalla, M., | ||||
| Zhang, L., and V. Paxson, "Stream Control Transmission | ||||
| Protocol", RFC 2960, October 2000. | ||||
| [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition | [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition | |||
| of Explicit Congestion Notification (ECN) to IP", | of Explicit Congestion Notification (ECN) to IP", | |||
| RFC 3168, September 2001. | RFC 3168, September 2001. | |||
| [RFC3390] Allman, M., Floyd, S., and C. Partridge, "Increasing TCP's | [RFC3390] Allman, M., Floyd, S., and C. Partridge, "Increasing TCP's | |||
| Initial Window", RFC 3390, October 2002. | Initial Window", RFC 3390, October 2002. | |||
| [RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit | [RFC4340] Kohler, E., Handley, M., and S. Floyd, "Datagram | |||
| Congestion Notification (ECN) Signaling with Nonces", | Congestion Control Protocol (DCCP)", RFC 4340, March 2006. | |||
| RFC 3540, June 2003. | ||||
| [RFC4341] Floyd, S. and E. Kohler, "Profile for Datagram Congestion | ||||
| Control Protocol (DCCP) Congestion Control ID 2: TCP-like | ||||
| Congestion Control", RFC 4341, March 2006. | ||||
| [RFC4342] Floyd, S., Kohler, E., and J. Padhye, "Profile for | ||||
| Datagram Congestion Control Protocol (DCCP) Congestion | ||||
| Control ID 3: TCP-Friendly Rate Control (TFRC)", RFC 4342, | ||||
| March 2006. | ||||
| 15.2. Informative References | 15.2. Informative References | |||
| [ARI05] Adams, J., Roberts, L., and A. IJsselmuiden, "Changing the | [ARI05] Adams, J., Roberts, L., and A. IJsselmuiden, "Changing the | |||
| Internet to Support Real-Time Content Supply from a Large | Internet to Support Real-Time Content Supply from a Large | |||
| Fraction of Broadband Residential Users", BT Technology | Fraction of Broadband Residential Users", BT Technology | |||
| Journal (BTTJ) 23(2), April 2005. | Journal (BTTJ) 23(2), April 2005. | |||
| [Bauer06] Bauer, S., Faratin, P., and R. Beverly, "Assessing the | [Bauer06] Bauer, S., Faratin, P., and R. Beverly, "Assessing the | |||
| assumptions underlying mechanism design for the Internet", | assumptions underlying mechanism design for the Internet", | |||
| skipping to change at page 69, line 28 | skipping to change at page 70, line 25 | |||
| [RFC2988] Paxson, V. and M. Allman, "Computing TCP's Retransmission | [RFC2988] Paxson, V. and M. Allman, "Computing TCP's Retransmission | |||
| Timer", RFC 2988, November 2000. | Timer", RFC 2988, November 2000. | |||
| [RFC3124] Balakrishnan, H. and S. Seshan, "The Congestion Manager", | [RFC3124] Balakrishnan, H. and S. Seshan, "The Congestion Manager", | |||
| RFC 3124, June 2001. | RFC 3124, June 2001. | |||
| [RFC3514] Bellovin, S., "The Security Flag in the IPv4 Header", | [RFC3514] Bellovin, S., "The Security Flag in the IPv4 Header", | |||
| RFC 3514, April 2003. | RFC 3514, April 2003. | |||
| [RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit | ||||
| Congestion Notification (ECN) Signaling with Nonces", | ||||
| RFC 3540, June 2003. | ||||
| [RFC3714] Floyd, S. and J. Kempf, "IAB Concerns Regarding Congestion | [RFC3714] Floyd, S. and J. Kempf, "IAB Concerns Regarding Congestion | |||
| Control for Voice Traffic in the Internet", RFC 3714, | Control for Voice Traffic in the Internet", RFC 3714, | |||
| March 2004. | March 2004. | |||
| [RFC4340] Kohler, E., Handley, M., and S. Floyd, "Datagram | ||||
| Congestion Control Protocol (DCCP)", RFC 4340, March 2006. | ||||
| [Re-PCN] Briscoe, B., "Emulating Border Flow Policing using Re-ECN | [Re-PCN] Briscoe, B., "Emulating Border Flow Policing using Re-ECN | |||
| on Bulk Data", draft-briscoe-tsvwg-re-ecn-border-cheat-01 | on Bulk Data", draft-briscoe-tsvwg-re-ecn-border-cheat-01 | |||
| (work in progress), March 2006. | (work in progress), March 2006. | |||
| [Re-fb] Briscoe, B., Jacquet, A., Di Cairano-Gilfedder, C., | [Re-fb] Briscoe, B., Jacquet, A., Di Cairano-Gilfedder, C., | |||
| Salvatori, A., Soppera, A., and M. Koyabe, "Policing | Salvatori, A., Soppera, A., and M. Koyabe, "Policing | |||
| Congestion Response in an Internetwork Using Re-Feedback", | Congestion Response in an Internetwork Using Re-Feedback", | |||
| ACM SIGCOMM CCR 35(4)277--288, August 2005, <http:// | ACM SIGCOMM CCR 35(4)277--288, August 2005, <http:// | |||
| www.acm.org/sigs/sigcomm/sigcomm2005/ | www.acm.org/sigs/sigcomm/sigcomm2005/ | |||
| techprog.html#session8>. | techprog.html#session8>. | |||
| skipping to change at page 70, line 31 | skipping to change at page 71, line 28 | |||
| Protocols (ICNP-02), November 2002, | Protocols (ICNP-02), November 2002, | |||
| <http://www.cc.gatech.edu/~akumar/xchoke.pdf>. | <http://www.cc.gatech.edu/~akumar/xchoke.pdf>. | |||
| [pBox] Floyd, S. and K. Fall, "Promoting the Use of End-to-End | [pBox] Floyd, S. and K. Fall, "Promoting the Use of End-to-End | |||
| Congestion Control in the Internet", IEEE/ACM Transactions | Congestion Control in the Internet", IEEE/ACM Transactions | |||
| on Networking 7(4) 458--472, August 1999, | on Networking 7(4) 458--472, August 1999, | |||
| <http://www.aciri.org/floyd/end2end-paper.html>. | <http://www.aciri.org/floyd/end2end-paper.html>. | |||
| Appendix A. Precise Re-ECN Protocol Operation | Appendix A. Precise Re-ECN Protocol Operation | |||
| {ToDo: fix this} | ||||
| The protocol operation described in Section 3.3 was an approximation. | The protocol operation described in Section 3.3 was an approximation. | |||
| In fact, standard ECN router marking combines 1% and 2% marking into | In fact, standard ECN router marking combines 1% and 2% marking into | |||
| slightly less than 3% whole-path marking, because routers | slightly less than 3% whole-path marking, because routers | |||
| deliberately mark CE whether or not it has already been marked by | deliberately mark CE whether or not it has already been marked by | |||
| another router upstream. So the combined marking fraction would | another router upstream. So the combined marking fraction would | |||
| actually be 100% - (100% - 1%)(100% - 2%) = 2.98%. | actually be 100% - (100% - 1%)(100% - 2%) = 2.98%. | |||
| To generalise this we will need some notation. | To generalise this we will need some notation. | |||
| o j represents the index of each resource (typically queues) along a | o j represents the index of each resource (typically queues) along a | |||
| skipping to change at page 74, line 12 | skipping to change at page 75, line 12 | |||
| defines this combination as a non-ECN-setup SYN ACK, which remains | defines this combination as a non-ECN-setup SYN ACK, which remains | |||
| true for vanilla and Nonce ECTs. But for re-ECN we define it as a | true for vanilla and Nonce ECTs. But for re-ECN we define it as a | |||
| Re-ECN-setup SYN ACK. We didn't use a SYN ACK with both CWR and | Re-ECN-setup SYN ACK. We didn't use a SYN ACK with both CWR and | |||
| ECE cleared to 0 because that would be the likely response from | ECE cleared to 0 because that would be the likely response from | |||
| most Not-ECT receivers. And we didn't use a SYN ACK with both CWR | most Not-ECT receivers. And we didn't use a SYN ACK with both CWR | |||
| and ECE set to 1 either, as at least one broken receiver | and ECE set to 1 either, as at least one broken receiver | |||
| implementation echoes whatever flags were in the SYN into its SYN | implementation echoes whatever flags were in the SYN into its SYN | |||
| ACK. Therefore we define a Re-ECN-setup SYN ACK as one with CWR=1 | ACK. Therefore we define a Re-ECN-setup SYN ACK as one with CWR=1 | |||
| & ECE=0. | & ECE=0. | |||
| Choice of two alternative SYN ACKs: the NS flag may take either value | Choice of two alternative SYN ACKs: the NS flag may take either | |||
| in a Re-ECN-setup SYN ACK. Section 5.4 REQUIRES that a Re-ECT | value in a Re-ECN-setup SYN ACK. Section 5.4 REQUIRES that a Re- | |||
| server MUST set the NS flag to 1 in a Re-ECN-setup SYN ACK to echo | ECT server MUST set the NS flag to 1 in a Re-ECN-setup SYN ACK to | |||
| congestion experienced (CE) on the initial SYN. Otherwise a Re- | echo congestion experienced (CE) on the initial SYN. Otherwise a | |||
| ECN-setup SYN ACK MUST be returned with NS=0. The only current | Re-ECN-setup SYN ACK MUST be returned with NS=0. The only current | |||
| known use of the NS flag in a SYN ACK is to indicate support for | known use of the NS flag in a SYN ACK is to indicate support for | |||
| the ECN nonce, which will be negotiated by setting CWR=0 & ECE=1. | the ECN nonce, which will be negotiated by setting CWR=0 & ECE=1. | |||
| Given the ECN nonce MUST NOT be used for a RECN mode connection, a | Given the ECN nonce MUST NOT be used for a RECN mode connection, a | |||
| Re-ECN-setup SYN ACK can use either setting of the NS flag without | Re-ECN-setup SYN ACK can use either setting of the NS flag without | |||
| any risk of confusion, because the CWR & ECE flags will be | any risk of confusion, because the CWR & ECE flags will be | |||
| reversed relative to those used by an ECN nonce SYN ACK. | reversed relative to those used by an ECN nonce SYN ACK. | |||
| Appendix D. Packet Marking During Flow Start | Appendix D. Packet Marking During Flow Start | |||
| {ToDo: Write up proof that sender should mark FNE on first and third | {ToDo: Write up proof that sender should mark FNE on first and third | |||
| skipping to change at page 81, line 5 | skipping to change at page 81, line 37 | |||
| account from the subset I. Then the weighted mean of all these | account from the subset I. Then the weighted mean of all these | |||
| samples should be taken a_S = sum_{forall I} V_{fI} / sum_{forall I} | samples should be taken a_S = sum_{forall I} V_{fI} / sum_{forall I} | |||
| V_{bI}. | V_{bI}. | |||
| If V_b is the result of the bulk accounting algorithm over the | If V_b is the result of the bulk accounting algorithm over the | |||
| accounting period (Appendix H.1) it can be inflated by this factor | accounting period (Appendix H.1) it can be inflated by this factor | |||
| a_S to get a good unbiased estimate of the volume of downstream | a_S to get a good unbiased estimate of the volume of downstream | |||
| congestion over the accounting period a_S.V_b, without being polluted | congestion over the accounting period a_S.V_b, without being polluted | |||
| by the effect of persistently negative flows. | by the effect of persistently negative flows. | |||
| Appendix I. Argument for holding back the ECN nonce | ||||
| The ECN nonce is a mechanism that allows a /sending/ transport to | ||||
| detect if drop or ECN marking at a congested router has been | ||||
| suppressed by a node somewhere in the feedback loop---another router | ||||
| or the receiver. | ||||
| Space for the ECN nonce was set aside in [RFC3168] (currently | ||||
| proposed standard) while the full nonce mechanism is specified in RFC | ||||
| 3540 (currently experimental). The specification for [RFC4340] | ||||
| (currently proposed standard) requires that "Each DCCP sender SHOULD | ||||
| set ECN Nonces on its packets...". It also mandates as a requirement | ||||
| for all CCID profiles that "Any newly defined acknowledgement | ||||
| mechanism MUST include a way to transmit ECN Nonce Echoes back to the | ||||
| sender.", therefore: | ||||
| o The CCID profile for TCP-like Congestion Control [RFC4341] | ||||
| (currently proposed standard) says "The sender will use the ECN | ||||
| Nonce for data packets, and the receiver will echo those nonces in | ||||
| its Ack Vectors." | ||||
| o The CCID profile for TCP-Friendly Rate Control (TFRC) [RFC4342] | ||||
| recommends that "The sender [use] Loss Intervals options' ECN | ||||
| Nonce Echoes (and possibly any Ack Vectors' ECN Nonce Echoes) to | ||||
| probabilistically verify that the receiver is correctly reporting | ||||
| all dropped or marked packets." | ||||
| The ECN nonce is used for three types of functions: | ||||
| o if the sender wants to ensure the integrity of the information | ||||
| about packet drops, | ||||
| o if the sending transport chooses to act in the interests of a | ||||
| congested router, | ||||
| o if the sending transport wants to allocate its own resources in | ||||
| proportion to the rates that each network path can sustain, based | ||||
| on congestion control. | ||||
| However, when the nonce is used to protect the integrity of | ||||
| information about packet drops, rather than ECN marks, a transport | ||||
| layer nonce will always be sufficient (because a drop loses the | ||||
| transport header as well as the ECN field in the network header), | ||||
| which would avoid using scarce IP header codepoint space. Similarly, | ||||
| a transport layer nonce would protect against a receiver sending | ||||
| early acknowledgements. | ||||
| The other two functions need the ECN nonce to be in the network | ||||
| layer, but both require rather optimistic trust assumptions in order | ||||
| to be useful. If the sending transport chooses to act in the | ||||
| interests of a congested router, it can reduce its rate if it detects | ||||
| that some malicious party in the feedback loop may be suppressing ECN | ||||
| feedback. But it would only be useful to a router when /all/ senders | ||||
| using the router are trusted to act in the router's interest. | ||||
| In the end, the only essential use of a network layer nonce is when | ||||
| sending transports (e.g. large servers) want to allocate their /own/ | ||||
| resources in proportion to the rates that each network path can | ||||
| sustain, based on congestion control. In that case, the nonce allows | ||||
| senders to be assured that they aren't being duped into giving more | ||||
| of their own resources to a particular flow. And if congestion | ||||
| suppression is detected, the sending transport can rate limit the | ||||
| offending connection to protect its own resources. Certainly, this | ||||
| is a useful function, but the IETF should carefully decide whether | ||||
| such a single, very specific case warrants IP header space. | ||||
| In contrast, re-ECN allows all routers to fully protect themselves | ||||
| from such attacks, without having to trust anyone - senders, | ||||
| receivers, neighbouring networks. Re-ECN is therefore proposed in | ||||
| preference to the ECN nonce on the basis that it addresses the | ||||
| generic problem of accountability for congestion of a network's | ||||
| resources at the IP layer. | ||||
| Delaying the ECN nonce is justified because the applicability of the | ||||
| ECN nonce seems too limited for it to consume a two-bit codepoint in | ||||
| the IP header. | ||||
| Moreover, while we have re-designed the re-ECN codepoints so that | ||||
| they do not prevent the ECN nonce progressing, the same is not true | ||||
| the other way round. If the ECN nonce started to see some deployment | ||||
| (perhaps because it was blessed with proposed standard status), | ||||
| incremental deployment of re-ECN would effectively be impossible, | ||||
| because re-ECN marking fractions at inter-domain borders would be | ||||
| polluted by unknown levels of nonce traffic. | ||||
| The authors are aware that re-ECN must prove it has the potential it | ||||
| claims if it is to displace the nonce. Therefore, every effort has | ||||
| been made to complete a comprehensive specification of re-ECN so that | ||||
| its potential can be assessed. We therefore seek the opinion of the | ||||
| Internet community on whether the re-ECN protocol is sufficiently | ||||
| useful to warrant standards action. | ||||
| Authors' Addresses | Authors' Addresses | |||
| Bob Briscoe | Bob Briscoe | |||
| BT & UCL | BT & UCL | |||
| B54/77, Adastral Park | B54/77, Adastral Park | |||
| Martlesham Heath | Martlesham Heath | |||
| Ipswich IP5 3RE | Ipswich IP5 3RE | |||
| UK | UK | |||
| Phone: +44 1473 645196 | Phone: +44 1473 645196 | |||
| skipping to change at page 82, line 5 | skipping to change at page 85, line 5 | |||
| BT | BT | |||
| B54/69, Adastral Park | B54/69, Adastral Park | |||
| Martlesham Heath | Martlesham Heath | |||
| Ipswich IP5 3RE | Ipswich IP5 3RE | |||
| UK | UK | |||
| Phone: +44 1473 646923 | Phone: +44 1473 646923 | |||
| Email: martin.koyabe@bt.com | Email: martin.koyabe@bt.com | |||
| URI: | URI: | |||
| Intellectual Property Statement | Full Copyright Statement | |||
| Copyright (C) The Internet Society (2006). | ||||
| This document is subject to the rights, licenses and restrictions | ||||
| contained in BCP 78, and except as set forth therein, the authors | ||||
| retain all their rights. | ||||
| This document and the information contained herein are provided on an | ||||
| "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS | ||||
| OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET | ||||
| ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, | ||||
| INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE | ||||
| INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED | ||||
| WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. | ||||
| Intellectual Property | ||||
| The IETF takes no position regarding the validity or scope of any | The IETF takes no position regarding the validity or scope of any | |||
| Intellectual Property Rights or other rights that might be claimed to | Intellectual Property Rights or other rights that might be claimed to | |||
| pertain to the implementation or use of the technology described in | pertain to the implementation or use of the technology described in | |||
| this document or the extent to which any license under such rights | this document or the extent to which any license under such rights | |||
| might or might not be available; nor does it represent that it has | might or might not be available; nor does it represent that it has | |||
| made any independent effort to identify any such rights. Information | made any independent effort to identify any such rights. Information | |||
| on the procedures with respect to rights in RFC documents can be | on the procedures with respect to rights in RFC documents can be | |||
| found in BCP 78 and BCP 79. | found in BCP 78 and BCP 79. | |||
| skipping to change at page 82, line 29 | skipping to change at page 85, line 45 | |||
| such proprietary rights by implementers or users of this | such proprietary rights by implementers or users of this | |||
| specification can be obtained from the IETF on-line IPR repository at | specification can be obtained from the IETF on-line IPR repository at | |||
| http://www.ietf.org/ipr. | http://www.ietf.org/ipr. | |||
| The IETF invites any interested party to bring to its attention any | The IETF invites any interested party to bring to its attention any | |||
| copyrights, patents or patent applications, or other proprietary | copyrights, patents or patent applications, or other proprietary | |||
| rights that may cover technology that may be required to implement | rights that may cover technology that may be required to implement | |||
| this standard. Please address the information to the IETF at | this standard. Please address the information to the IETF at | |||
| ietf-ipr@ietf.org. | ietf-ipr@ietf.org. | |||
| Disclaimer of Validity | ||||
| This document and the information contained herein are provided on an | ||||
| "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS | ||||
| OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET | ||||
| ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, | ||||
| INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE | ||||
| INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED | ||||
| WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. | ||||
| Copyright Statement | ||||
| Copyright (C) The Internet Society (2006). This document is subject | ||||
| to the rights, licenses and restrictions contained in BCP 78, and | ||||
| except as set forth therein, the authors retain all their rights. | ||||
| Acknowledgment | Acknowledgment | |||
| Funding for the RFC Editor function is currently provided by the | Funding for the RFC Editor function is provided by the IETF | |||
| Internet Society. | Administrative Support Activity (IASA). | |||
| End of changes. 68 change blocks. | ||||
| 219 lines changed or deleted | 340 lines changed or added | |||
This html diff was produced by rfcdiff 1.33. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ | ||||
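
The combined-marking arithmetic quoted in Appendix A (1% and 2% marking giving slightly less than 3% whole-path marking) can be checked with a short sketch. This is illustrative only: it assumes each resource marks independently of the others, as the appendix text implies, and the function name `combined_marking` is hypothetical, not taken from either draft.

```python
def combined_marking(fractions):
    """Whole-path marking fraction for resources that mark independently.

    Hypothetical helper: computes 1 - prod(1 - p_j) over the per-resource
    marking fractions p_j, each given as a value between 0 and 1.
    """
    unmarked = 1.0
    for p in fractions:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"marking fraction out of range: {p}")
        # A packet stays unmarked only if every resource leaves it unmarked.
        unmarked *= 1.0 - p
    return 1.0 - unmarked


# The example from Appendix A: 1% and 2% marking along one path.
print(f"{combined_marking([0.01, 0.02]):.4%}")  # 2.9800%
```

The result sits just below the simple sum of 3% because a packet already marked upstream cannot be counted twice.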