| draft-briscoe-tsvwg-byte-pkt-mark-01.txt | draft-briscoe-tsvwg-byte-pkt-mark-02.txt | |||
|---|---|---|---|---|
| Transport Area Working Group B. Briscoe | Transport Area Working Group B. Briscoe | |||
| Internet-Draft BT & UCL | Internet-Draft BT & UCL | |||
| Intended status: Informational November 19, 2007 | Intended status: Informational February 24, 2008 | |||
| Expires: May 22, 2008 | Expires: August 27, 2008 | |||
| Byte and Packet Congestion Notification | Byte and Packet Congestion Notification | |||
| draft-briscoe-tsvwg-byte-pkt-mark-01 | draft-briscoe-tsvwg-byte-pkt-mark-02 | |||
| Status of this Memo | Status of this Memo | |||
| By submitting this Internet-Draft, each author represents that any | By submitting this Internet-Draft, each author represents that any | |||
| applicable patent or other IPR claims of which he or she is aware | applicable patent or other IPR claims of which he or she is aware | |||
| have been or will be disclosed, and any of which he or she becomes | have been or will be disclosed, and any of which he or she becomes | |||
| aware will be disclosed, in accordance with Section 6 of BCP 79. | aware will be disclosed, in accordance with Section 6 of BCP 79. | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF), its areas, and its working groups. Note that | Task Force (IETF), its areas, and its working groups. Note that | |||
| skipping to change at page 1, line 34 | skipping to change at page 1, line 34 | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
| http://www.ietf.org/ietf/1id-abstracts.txt. | http://www.ietf.org/ietf/1id-abstracts.txt. | |||
| The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
| http://www.ietf.org/shadow.html. | http://www.ietf.org/shadow.html. | |||
| This Internet-Draft will expire on May 22, 2008. | This Internet-Draft will expire on August 27, 2008. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (C) The IETF Trust (2007). | Copyright (C) The IETF Trust (2008). | |||
| Abstract | Abstract | |||
| This memo concerns dropping or marking packets using active queue | This memo concerns dropping or marking packets using active queue | |||
| management (AQM) such as random early detection (RED) or pre- | management (AQM) such as random early detection (RED) or pre- | |||
| congestion notification (PCN). It answers the question of whether to | congestion notification (PCN). The primary conclusion is that packet | |||
| take packet size into account when network equipment writes | size should be taken into account when transports decode congestion | |||
| congestion notification, or when transports read it. The primary | indications, not when network equipment writes them. Reducing drop | |||
| conclusion is that the variant of RED that gives lower drop | of small packets has some tempting advantages: i) it drops less | |||
| probability to smaller packets (byte-mode packet drop) should not be | control packets, which tend to be small and ii) it makes TCP's bit- | |||
| used because it creates a perverse incentive for transports to use | rate less dependent on packet size. However, there are ways of | |||
| tiny segments, consequently also opening up a DoS vulnerability. | addressing these issues at the transport layer, rather than reverse | |||
| TCP's lack of attention to packet size and its sensitivity to loss of | engineering network forwarding to fix specific transport problems. | |||
| SYNs and ACKs should be fixed in TCP, not by reverse engineering | Network layer algorithms like the byte-mode packet drop variant of | |||
| network forwarding to fix transport protocols. Nonetheless raw drop- | RED should not be used to drop fewer small packets, because that | |||
| tail is just as vulnerable to gaming by small packets, so AQM itself | creates a perverse incentive for transports to use tiny segments, | |||
| should not be turned off. | consequently also opening up a DoS vulnerability. | |||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 5 | |||
| 2. Requirements notation . . . . . . . . . . . . . . . . . . . . 6 | 2. Motivating Arguments . . . . . . . . . . . . . . . . . . . . . 9 | |||
| 3. Working Definition of Congestion Notification . . . . . . . . 7 | 2.1. Scaling Congestion Control with Packet Size . . . . . . . 9 | |||
| 4. Congestion Measurement . . . . . . . . . . . . . . . . . . . . 7 | 2.2. Avoiding Perverse Incentives to (ab)use Smaller Packets . 10 | |||
| 5. Idealised Wire Protocol Coding . . . . . . . . . . . . . . . . 8 | 2.3. Small != Control . . . . . . . . . . . . . . . . . . . . . 11 | |||
| 6. The State of the Art . . . . . . . . . . . . . . . . . . . . . 10 | 3. Working Definition of Congestion Notification . . . . . . . . 12 | |||
| 6.1. Congestion Measurement: Status . . . . . . . . . . . . . . 10 | 4. Congestion Measurement . . . . . . . . . . . . . . . . . . . . 12 | |||
| 6.2. Congestion Coding: Status . . . . . . . . . . . . . . . . 11 | 4.1. Congestion Measurement by Queue Length . . . . . . . . . . 12 | |||
| 6.2.1. Network Bias when Encoding . . . . . . . . . . . . . . 11 | 4.1.1. Fixed Size Packet Buffers . . . . . . . . . . . . . . 12 | |||
| 6.2.2. Transport Bias when Decoding . . . . . . . . . . . . . 13 | 4.2. Congestion Measurement without a Queue . . . . . . . . . . 14 | |||
| 6.2.3. Congestion Coding: Summary of Status . . . . . . . . . 14 | 5. Idealised Wire Protocol Coding . . . . . . . . . . . . . . . . 14 | |||
| 7. Outstanding Issues and Next Steps . . . . . . . . . . . . . . 15 | 6. The State of the Art . . . . . . . . . . . . . . . . . . . . . 16 | |||
| 7.1. Bit-congestible World . . . . . . . . . . . . . . . . . . 15 | 6.1. Congestion Measurement: Status . . . . . . . . . . . . . . 16 | |||
| 7.2. Bit- & Packet-congestible World . . . . . . . . . . . . . 16 | 6.2. Congestion Coding: Status . . . . . . . . . . . . . . . . 17 | |||
| 8. Security Considerations . . . . . . . . . . . . . . . . . . . 17 | 6.2.1. Network Bias when Encoding . . . . . . . . . . . . . . 17 | |||
| 9. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 17 | 6.2.2. Transport Bias when Decoding . . . . . . . . . . . . . 19 | |||
| 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 19 | 6.2.3. Making Transports Robust against Control Packet | |||
| 11. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 19 | Losses . . . . . . . . . . . . . . . . . . . . . . . . 20 | |||
| 6.2.4. Congestion Coding: Summary of Status . . . . . . . . . 21 | ||||
| 7. Outstanding Issues and Next Steps . . . . . . . . . . . . . . 23 | ||||
| 7.1. Bit-congestible World . . . . . . . . . . . . . . . . . . 23 | ||||
| 7.2. Bit- & Packet-congestible World . . . . . . . . . . . . . 24 | ||||
| 8. Security Considerations . . . . . . . . . . . . . . . . . . . 25 | ||||
| 9. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 26 | ||||
| 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 27 | ||||
| 11. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 27 | ||||
| Editorial Comments . . . . . . . . . . . . . . . . . . . . . . . . | Editorial Comments . . . . . . . . . . . . . . . . . . . . . . . . | |||
| Appendix A. Example Scenarios . . . . . . . . . . . . . . . . . . 19 | Appendix A. Example Scenarios . . . . . . . . . . . . . . . . . . 28 | |||
| A.1. Notation . . . . . . . . . . . . . . . . . . . . . . . . . 19 | A.1. Notation . . . . . . . . . . . . . . . . . . . . . . . . . 28 | |||
| A.2. Bit-congestible resource, equal bit rates (Ai) . . . . . . 20 | A.2. Bit-congestible resource, equal bit rates (Ai) . . . . . . 28 | |||
| A.3. Bit-congestible resource, equal packet rates (Bi) . . . . 21 | A.3. Bit-congestible resource, equal packet rates (Bi) . . . . 29 | |||
| A.4. Pkt-congestible resource, equal bit rates (Aii) . . . . . 22 | A.4. Pkt-congestible resource, equal bit rates (Aii) . . . . . 30 | |||
| A.5. Pkt-congestible resource, equal packet rates (Bii) . . . . 22 | A.5. Pkt-congestible resource, equal packet rates (Bii) . . . . 31 | |||
| Appendix B. Congestion Notification Definition: Further | Appendix B. Congestion Notification Definition: Further | |||
| Justification . . . . . . . . . . . . . . . . . . . . 23 | Justification . . . . . . . . . . . . . . . . . . . . 31 | |||
| Appendix C. Byte-mode Drop Complicates Policing Congestion | Appendix C. Byte-mode Drop Complicates Policing Congestion | |||
| Response . . . . . . . . . . . . . . . . . . . . . . 23 | Response . . . . . . . . . . . . . . . . . . . . . . 32 | |||
| 12. References . . . . . . . . . . . . . . . . . . . . . . . . . . 25 | 12. References . . . . . . . . . . . . . . . . . . . . . . . . . . 33 | |||
| 12.1. Normative References . . . . . . . . . . . . . . . . . . . 25 | 12.1. Normative References . . . . . . . . . . . . . . . . . . . 33 | |||
| 12.2. Informative References . . . . . . . . . . . . . . . . . . 26 | 12.2. Informative References . . . . . . . . . . . . . . . . . . 33 | |||
| Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 28 | Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 36 | |||
| Intellectual Property and Copyright Statements . . . . . . . . . . 29 | Intellectual Property and Copyright Statements . . . . . . . . . . 37 | |||
| Changes from Previous Versions | ||||
| To be removed by the RFC Editor on publication. | ||||
| Full incremental diffs between each version are available at | ||||
| <http://www.cs.ucl.ac.uk/staff/B.Briscoe/pubs.html#byte-pkt-mark> | ||||
| (courtesy of the rfcdiff tool): | ||||
| From -01 to -02 (this version): | ||||
| Abstract reorganised to align with clearer separation of issues | ||||
| in the memo. | ||||
| Introduction reorganised with motivating arguments removed to | ||||
| new Section 2. | ||||
| Clarified avoiding lock-out of large packets is not the main or | ||||
| only motivation for RED. | ||||
| Mentioned choice of drop or marking explicitly throughout, | ||||
| rather than trying to coin a word to mean either. | ||||
| Generalised the discussion throughout to any packet forwarding | ||||
| function on any network equipment, not just routers. | ||||
| Clarified the last point about why this is a good time to sort | ||||
| out this issue: because it will be hard / impossible to design | ||||
| new transports unless we decide whether the network or the | ||||
| transport is allowing for packet size. | ||||
| Added statement explaining the horizon of the memo is long | ||||
| term, but with short term expediency in mind. | ||||
| Added material on scaling congestion control with packet size | ||||
| (Section 2.1). | ||||
| Separated out issue of normalising TCP's bit rate from issue of | ||||
| preference to control packets (Section 2.3). | ||||
| Divided up Congestion Measurement section for clarity, | ||||
| including new material on fixed size packet buffers and buffer | ||||
| carving (Section 4.1.1 & Section 6.2.1) and on congestion | ||||
| measurement in wireless link technologies without queues | ||||
| (Section 4.2). | ||||
| Added section on 'Making Transports Robust against Control | ||||
| Packet Losses' (Section 6.2.3) with existing & new material | ||||
| included. | ||||
| Added tabulated results of vendor survey on byte-mode drop | ||||
| variant of RED (Table 2). | ||||
| From -00 to -01: | ||||
| Clarified applicability to drop as well as ECN. | ||||
| Highlighted DoS vulnerability. | ||||
| Emphasised that drop-tail suffers from similar problems to | ||||
| byte-mode drop, so only byte-mode drop should be turned off, | ||||
| not RED itself. | ||||
| Clarified the original apparent motivations for recommending | ||||
| byte-mode drop included protecting SYNs and pure ACKs more than | ||||
| equalising the bit rates of TCPs with different segment sizes. | ||||
| Removed some conjectured motivations. | ||||
| Added support for updates to TCP in progress (ackcc & ecn-syn- | ||||
| ack). | ||||
| Updated survey results with newly arrived data. | ||||
| Pulled all recommendations together into the conclusions. | ||||
| Moved some detailed points into two additional appendices and a | ||||
| note. | ||||
| Considerable clarifications throughout. | ||||
| Updated references. | ||||
| Requirements notation | ||||
| The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | ||||
| "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | ||||
| document are to be interpreted as described in [RFC2119]. | ||||
| 1. Introduction | 1. Introduction | |||
| When notifying congestion, the problem of how (and whether) to take | When notifying congestion, the problem of how (and whether) to take | |||
| packet sizes into account has exercised the minds of researchers and | packet sizes into account has exercised the minds of researchers and | |||
| practitioners for as long as active queue management (AQM) has been | practitioners for as long as active queue management (AQM) has been | |||
| discussed. Indeed, AQM was originally introduced largely to remove | discussed. Indeed, one reason AQM was originally introduced was to | |||
| the advantage that small packets get from drop-tail queues. This | reduce the lock-out effects that small packets can have on large | |||
| memo aims to state the principles we should be using and to come to | packets in drop-tail queues. This memo aims to state the principles | |||
| conclusions on what these principles will mean for future protocol | we should be using and to come to conclusions on what these | |||
| design, taking into account the deployments we have already. | principles will mean for future protocol design, taking into account | |||
| the deployments we have already. | ||||
| Note that the byte vs. packet dilemma concerns congestion | Note that the byte vs. packet dilemma concerns congestion | |||
| notification irrespective of whether it is signalled implicitly by | notification irrespective of whether it is signalled implicitly by | |||
| drop or using explicit congestion notification (ECN [RFC3168]). | drop or using explicit congestion notification (ECN [RFC3168] or PCN | |||
| Throughout this document, unless clear from the context, the term | [I-D.ietf-pcn-architecture]). Throughout this document, unless clear | |||
| congestion marking, or just marking, will be used to mean either drop | from the context, the term marking will be used to mean notifying | |||
| or explicit congestion notification. | congestion explicitly, while congestion notification will be used to | |||
| mean notifying congestion either implicitly by drop or explicitly by | ||||
| marking. | ||||
| If the load on a resource depends on the rate at which packets | If the load on a resource depends on the rate at which packets | |||
| arrive, it is called packet-congestible. If the load depends on the | arrive, it is called packet-congestible. If the load depends on the | |||
| rate at which bits arrive it is called bit-congestible. | rate at which bits arrive it is called bit-congestible. | |||
| Examples of packet-congestible resources are route look-up engines | Examples of packet-congestible resources are route look-up engines | |||
| and firewalls, because load depends on how many packet headers they | and firewalls, because load depends on how many packet headers they | |||
| have to process. Examples of bit-congestible resources are | have to process. Examples of bit-congestible resources are | |||
| transmission links, and buffer memory, because the load depends on | transmission links, and buffer memory, because the load depends on | |||
| how many bits they have to transmit or store. Note that information | how many bits they have to transmit or store. Note that information | |||
| skipping to change at page 4, line 27 | skipping to change at page 7, line 27 | |||
| The controversy is mainly around the other two stages: whether to | The controversy is mainly around the other two stages: whether to | |||
| allow for packet size when the network codes or when the transport | allow for packet size when the network codes or when the transport | |||
| decodes congestion notification. In RED, the variant that reduces | decodes congestion notification. In RED, the variant that reduces | |||
| drop probability for packets based on their size in bytes is called | drop probability for packets based on their size in bytes is called | |||
| byte-mode drop, while the variant that doesn't is called packet mode | byte-mode drop, while the variant that doesn't is called packet mode | |||
| drop. Whether queues are measured in bytes or packets is an | drop. Whether queues are measured in bytes or packets is an | |||
| orthogonal choice, termed byte-mode queue measurement or packet-mode | orthogonal choice, termed byte-mode queue measurement or packet-mode | |||
| queue measurement. | queue measurement. | |||
| Currently, the paper trail of advice referenced from the RFC series | Currently, the RFC series is silent on this matter other than a paper | |||
| conditionally recommends byte-mode (packet-size dependent) drop, | trail of advice referenced from [RFC2309], which conditionally | |||
| although all the implementers who responded to our survey have | recommends byte-mode (packet-size dependent) drop [pktByteEmail]. | |||
| ignored this advice. The primary purpose of this memo is to build a | However, none of the implementers who responded to our survey have | |||
| definitive consensus against allowing for packet size in AQM | followed this advice. The primary purpose of this memo is to build a | |||
| algorithms and record this advice within the RFC series. | definitive consensus against deliberate preferential treatment for | |||
| small packets in AQM algorithms and to record this advice within the | ||||
| Increasingly, it is being recognised that a protocol design must take | RFC series. | |||
| care not to cause unintended consequences by giving the parties in | ||||
| the protocol exchange perverse incentives [Evol_cc][RFC3426]. For | ||||
| instance, imagine a scenario where the same bit rate of packets will | ||||
| contribute the same to congestion of a link irrespective of whether | ||||
| it is sent as fewer larger packets or more smaller packets. A | ||||
| protocol design that caused larger packets to be more likely to be | ||||
| dropped than smaller ones would be dangerous in this case. | ||||
| Transports would tend to act in their own interests by breaking their | ||||
| data stream down into tiny segments, reducing their drop rate without | ||||
| reducing their bit rate. Further, encouraging a high volume of tiny | ||||
| packets might in turn unnecessarily overload a completely unrelated | ||||
| part of the system, perhaps more limited by header-processing than | ||||
| bandwidth. | ||||
| Imagine two flows arrive at a bit-congestible transmission link each | ||||
| with the same bit rate, say 1Mbps, but one consists of 1500B and the | ||||
| other 60B packets, which are 25x smaller. If the advice referred to | ||||
| from RFC2309 is followed, gentle RED [gentle_RED] would be used, | ||||
| configured to adjust the drop probability of packets in proportion to | ||||
| each packet's size (byte mode packet drop). So in this case, if RED | ||||
| drops 25% of the larger packets, it will aim to drop 1% of the | ||||
| smaller packets (but in practice it may drop more as congestion | ||||
| increases [RFC4828](S.B.4)[Note_Variation]). Even though both flows | ||||
| arrive with the same bit rate, the bit rate the RED queue aims to | ||||
| pass to the line will be 750kbps for the flow of larger packets but | ||||
| 990kbps for the smaller packets (but because of rate variation it will | ||||
| be less than this target). It can be seen that this behaviour reopens | ||||
| the same denial of service vulnerability that drop-tail queues offer | ||||
| to floods of small packets, though not necessarily as strongly (see | ||||
| Section 8). | ||||
| The above advice (that referred to by RFC2309) says the question of | ||||
| whether a packet's own size should affect its drop probability | ||||
| "depends on the dominant end-to-end congestion control mechanisms". | ||||
| But we argue the network layer should not be optimised for whatever | ||||
| transport is predominant. For instance, TCP congestion control | ||||
| ensures that flows competing for the same resource each maintain the | ||||
| same number of segments in flight, irrespective of segment size. | ||||
| Even though reducing the drop probability of small packets helps | ||||
| correct this feature of TCP, we argue it should be corrected in TCP | ||||
| itself, not in the network. Favouring small packets also reduces the | ||||
| chance of dropping SYNs and pure ACKs, which has a disproportionate | ||||
| effect on TCP performance. But again, rather than fix these problems | ||||
| in the network, we argue that TCP should be altered. Effectively, | ||||
| favouring small packets is reverse engineering of the network layer | ||||
| around TCP, contrary to the excellent advice in [RFC3426], which asks | ||||
| designers to question "Why are you proposing a solution at this layer | ||||
| of the protocol stack, rather than at another layer?" | ||||
| Now is a good time to discuss whether fairness between different | Now is a good time to discuss whether fairness between different | |||
| sized packets would best be implemented in the network layer, or at | sized packets would best be implemented in the network layer, or at | |||
| the transport, for a number of reasons: | the transport, for a number of reasons: | |||
| 1. The packet vs. byte issue requires speedy resolution because the | 1. The packet vs. byte issue requires speedy resolution because the | |||
| IETF pre-congestion notification (PCN) working group is in the | IETF pre-congestion notification (PCN) working group is in the | |||
| process of being chartered to produce a standards track | process of being chartered to produce a standards track | |||
| specification of its congestion marking (AQM) algorithm | specification of its congestion notification (AQM) algorithm | |||
| [PCNcharter]; | [PCNcharter]; | |||
| 2. [RFC2309] says RED may either take account of packet size or not | 2. [RFC2309] says RED may either take account of packet size or not | |||
| when dropping, but gives no recommendation between the two, | when dropping, but gives no recommendation between the two, | |||
| referring instead to advice on the performance implications in an | referring instead to advice on the performance implications in an | |||
| email [pktByteEmail], which recommends byte-mode drop. Further, | email [pktByteEmail], which recommends byte-mode drop. Further, | |||
| just before RFC2309 was issued, an addendum was added to the | just before RFC2309 was issued, an addendum was added to the | |||
| archived email that revisited the issue of packet vs. byte-mode | archived email that revisited the issue of packet vs. byte-mode | |||
| drop in its last paragraph, making the recommendation less clear-cut; | drop in its last paragraph, making the recommendation less clear-cut; | |||
| 3. Without this memo, the only advice in the RFC series on packet | 3. Without this memo, the only advice in the RFC series on packet | |||
| size bias in AQM algorithms would be a reference to an archived | size bias in AQM algorithms would be a reference to an archived | |||
| email in [RFC2309] (including an addendum at the end of the email | email in [RFC2309] (including an addendum at the end of the email | |||
| to correct the original). | to correct the original). | |||
| 4. The IRTF Internet Congestion Control Research Group (ICCRG) | 4. The IRTF Internet Congestion Control Research Group (ICCRG) | |||
| recently took on the challenge of building consensus on what | recently took on the challenge of building consensus on what | |||
| common congestion control support should be required from | common congestion control support should be required from network | |||
| forwarding engines on routers in the future | forwarding functions in future | |||
| [I-D.irtf-iccrg-welzl-congestion-control-open-research]. The | [I-D.irtf-iccrg-welzl-congestion-control-open-research]. The | |||
| wider Internet community needs to discuss whether the complexity | wider Internet community needs to discuss whether the complexity | |||
| of adjusting for packet size should be on routers or in | of adjusting for packet size should be in the network or in | |||
| transports; | transports; | |||
| 5. Given there are many good reasons why larger path max | 5. Given there are many good reasons why larger path max | |||
| transmission units (PMTUs) would help solve a number of scaling | transmission units (PMTUs) would help solve a number of scaling | |||
| issues, we don't want to create any bias against large packets | issues, we don't want to create any bias against large packets | |||
| that is greater than their true cost; | that is greater than their true cost; | |||
| 6. And finally, given it has recently been pointed out that TCP | 6. The IETF has started to consider the question of fairness between | |||
| doesn't achieve any meaningful fairness anyway [Rate_fair_Dis], | flows that use different packet sizes (e.g. in the small-packet | |||
| because it doesn't consider fairness over all the flows a user | variant of TCP-friendly rate control, TFRC-SP [RFC4828]). Given | |||
| transmits nor over time, modifying the network rather than | transports with different packet sizes, if we don't decide | |||
| modifying TCP still won't achieve fairness. It seems more likely | whether the network or the transport should allow for packet | |||
| we have to face up to evolving beyond TCP anyway. | size, it will be hard if not impossible to design any transport | |||
| protocol so that its bit-rate relative to other transports meets | ||||
| design guidelines [RFC5033] (Note however that, if the concern | ||||
| were fairness between users, rather than between flows | ||||
| [Rate_fair_Dis], relative rates between flows would have to come | ||||
| under run-time control rather than being embedded in protocol | ||||
| designs). | ||||
| This memo starts from first principles, defining congestion | This memo is initially concerned with how we should correctly scale | |||
| notification in Section 3 then determining the correct way to measure | congestion control functions with packet size for the long term. But | |||
| congestion (Section 4) and to design an idealised congestion | it also recognises that expediency may be necessary to deal with | |||
| notification protocol (Section 5). It then surveys the advice given | existing widely deployed protocols that don't live up to the long | |||
| previously in the RFC series, the research literature and the | term goal. It turns out that the 'correct' variant of RED to deploy | |||
| deployed legacy (Section 6) before listing outstanding issues | seems to be the one everyone has deployed, and no-one who responded | |||
| to our survey has implemented the other variant. However, at the | ||||
| transport layer, TCP congestion control is a widely deployed protocol | ||||
| that we argue doesn't scale correctly with packet size. To date this | ||||
| hasn't been a significant problem because most TCPs have been used | ||||
| with similar packet sizes. But, as we design new congestion | ||||
| controls, we should build in scaling with packet size rather than | ||||
| assuming we should follow TCP's example. | ||||
| Motivating arguments for our advice are given next in Section 2. | ||||
| Then the body of the memo starts from first principles, defining | ||||
| congestion notification in Section 3 then determining the correct way | ||||
| to measure congestion (Section 4) and to design an idealised | ||||
| congestion notification protocol (Section 5). It then surveys the | ||||
| advice given previously in the RFC series, the research literature | ||||
| and the deployed legacy (Section 6) before listing outstanding issues | ||||
| (Section 7) that will need resolution both to achieve the ideal | (Section 7) that will need resolution both to achieve the ideal | |||
| protocol and to handle legacy. After discussing security | protocol and to handle legacy. After discussing security | |||
| considerations (Section 8) strong recommendations for the way forward | considerations (Section 8) strong recommendations for the way forward | |||
| are given in the conclusions (Section 9). | are given in the conclusions (Section 9). | |||
| 2. Requirements notation | 2. Motivating Arguments | |||
| The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | 2.1. Scaling Congestion Control with Packet Size | |||
| "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | ||||
| document are to be interpreted as described in [RFC2119]. | There are two ways of interpreting a dropped or marked packet. It | |||
| can either be considered as a single loss event or as loss/marking of | ||||
| the bytes in the packet. Here we try to design a test to see which | ||||
| approach scales with packet size. | ||||
| Imagine a bit-congestible link shared by many flows, so that each | ||||
| busy period tends to cause packets to be lost from different flows. | ||||
| The test compares two identical scenarios with the same applications, | ||||
| the same numbers of sources and the same load. But every source | ||||
| breaks the load into large packets in one case and small packets in | ||||
| the other. Of course, because the load is the same, there will be | ||||
| proportionately more packets in the small packet case. | ||||
| The test of whether a congestion control scales with packet size is | ||||
| that it should respond the same to the same congestion excursion, | ||||
| irrespective of the size of the packets that the bytes causing | ||||
| congestion happen to be broken down into. | ||||
| A bit-congestible queue suffering a congestion excursion has to drop | ||||
| or mark the same excess bytes whether they are in a few large packets | ||||
| or many small packets. So for the same congestion excursion, the | ||||
| same number of bytes has to be shed to get the load back to its | ||||
| operating point. But, of course, for smaller packets more packets | ||||
| will have to be discarded to shed the same bytes. | ||||
| If all the transports interpret each drop/mark as a single loss event | ||||
| irrespective of the size of the packet dropped, they will respond | ||||
| more to the same congestion excursion when packets are small, failing our test. On the | ||||
| other hand, if they respond proportionately less when smaller packets | ||||
| are dropped/marked, overall they will be able to respond the same to | ||||
| the same congestion excursion. | ||||
| Therefore, for a congestion control to scale with packet size it | ||||
| should respond to dropped or marked bytes (as TFRC-SP [RFC4828] | ||||
| does), not just to dropped or marked packets irrespective of packet | ||||
| size (as TCP does). | ||||
| The above advice (the email [pktByteEmail] referred to by RFC2309) | ||||
| says the question of whether a packet's own size should affect its | ||||
| drop probability "depends on the dominant end-to-end congestion | ||||
| control mechanisms". But we argue the network layer should not be | ||||
| optimised for whatever transport is predominant. | ||||
| TCP congestion control ensures that flows competing for the same | ||||
| resource each maintain the same number of segments in flight, | ||||
| irrespective of segment size. So under similar conditions, flows | ||||
| with different segment sizes will get different bit rates. But even | ||||
| though reducing the drop probability of small packets helps ensure | ||||
| TCPs with different packet sizes will achieve similar bit rates, we | ||||
| argue this should be achieved in TCP itself, not in the network. | ||||
| Effectively, favouring small packets is reverse engineering of the | ||||
| network layer around TCP, contrary to the excellent advice in | ||||
| [RFC3426], which asks designers to question "Why are you proposing a | ||||
| solution at this layer of the protocol stack, rather than at another | ||||
| layer?" | ||||
| 2.2. Avoiding Perverse Incentives to (ab)use Smaller Packets | ||||
| Increasingly, it is being recognised that a protocol design must take | ||||
| care not to cause unintended consequences by giving the parties in | ||||
| the protocol exchange perverse incentives [Evol_cc][RFC3426]. Again, | ||||
| imagine a scenario where the same bit rate of packets will contribute | ||||
| the same to congestion of a link irrespective of whether it is sent | ||||
| as fewer larger packets or more smaller packets. A protocol design | ||||
| that caused larger packets to be more likely to be dropped than | ||||
| smaller ones would be dangerous in this case: | ||||
| Malicious transports: A queue that gives an advantage to small | ||||
| packets can be used to amplify the force of a flooding attack. By | ||||
| sending a flood of small packets, the attacker can get the queue | ||||
| to discard more traffic in large packets, allowing more attack | ||||
| traffic to get through to cause further damage. Such a queue | ||||
| allows attack traffic to have a disproportionately large effect on | ||||
| regular traffic without the attacker having to do much work. The | ||||
| byte-mode drop variant of RED amplifies small packet attacks. | ||||
| Drop-tail queues amplify small packet attacks even more than RED | ||||
| byte-mode drop (see the Security Considerations in | ||||
| Section 8). Wherever possible, neither should be used. | ||||
| Normal transports: Even if a transport is not malicious, if it finds | ||||
| small packets go faster, it will tend to act in its own interest | ||||
| and use them. Queues that give advantage to small packets create | ||||
| an evolutionary pressure for transports to send at the same bit- | ||||
| rate but break their data stream down into tiny segments to reduce | ||||
| their drop rate. Encouraging a high volume of tiny packets might | ||||
| in turn unnecessarily overload a completely unrelated part of the | ||||
| system, perhaps more limited by header-processing than bandwidth. | ||||
| Imagine two flows arrive at a bit-congestible transmission link each | ||||
| with the same bit rate, say 1Mbps, but one consists of 1500B and the | ||||
| other 60B packets, which are 25x smaller. Consider a scenario where | ||||
| gentle RED [gentle_RED] is used, along with the variant of RED we | ||||
| advise against, i.e. where the RED algorithm is configured to adjust | ||||
| the drop probability of packets in proportion to each packet's size | ||||
| (byte mode packet drop). In this case, if RED drops 25% of the | ||||
| larger packets, it will aim to drop 1% of the smaller packets (but in | ||||
| practice it may drop more as congestion increases | ||||
| [RFC4828](S.B.4)[Note_Variation]). Even though both flows arrive | ||||
| with the same bit rate, the bit rate the RED queue aims to pass to | ||||
| the line will be 750kbps for the flow of larger packets but 990kbps | ||||
| for the flow of smaller packets (but because of rate variation it | ||||
| will be less than this target). It can be seen that this behaviour | ||||
| reopens the same denial of service vulnerability that drop tail | ||||
| queues offer to floods of small packets, though not necessarily as | ||||
| strongly (see Section 8). | ||||
| 2.3. Small != Control | ||||
| It is tempting to drop small packets with lower probability to | ||||
| improve performance, because many control packets are small (TCP SYNs | ||||
| & ACKs, DNS queries & responses, SIP messages, HTTP GETs, etc) and | ||||
| dropping fewer control packets considerably improves performance. | ||||
| However, we must not give control packets preference purely by virtue | ||||
| of their smallness, otherwise it is too easy for any data source to | ||||
| get the same preferential treatment simply by sending data in smaller | ||||
| packets. Again we are creating perverse incentives to favour small | ||||
| packets rather than to favour control packets, which is what we | ||||
| intend. | ||||
| Just because many control packets are small does not mean all small | ||||
| packets are control packets. | ||||
| So again, rather than fix these problems in the network layer, we | ||||
| argue that the transport should be made more robust against losses of | ||||
| control packets (see 'Making Transports Robust against Control Packet | ||||
| Losses' in Section 6.2.3). | ||||
| 3. Working Definition of Congestion Notification | 3. Working Definition of Congestion Notification | |||
| Rather than aim to achieve what many have tried and failed, this memo | Rather than aim to achieve what many have tried and failed, this memo | |||
| will not try to define congestion. It will give a working definition | will not try to define congestion. It will give a working definition | |||
| of what congestion notification should be taken to mean for this | of what congestion notification should be taken to mean for this | |||
| document. Congestion notification is a changing signal that aims to | document. Congestion notification is a changing signal that aims to | |||
| communicate the ratio E/L, where E is the instantaneous excess load | communicate the ratio E/L, where E is the instantaneous excess load | |||
| offered to a resource that it cannot (or would not) serve and L is | offered to a resource that it cannot (or would not) serve and L is | |||
| the instantaneous offered load. | the instantaneous offered load. | |||
| skipping to change at page 7, line 31 | skipping to change at page 12, line 31 | |||
| congestion notification is a real number bounded by the range [0,1]. | congestion notification is a real number bounded by the range [0,1]. | |||
| This ties in with the most well-understood form of congestion | This ties in with the most well-understood form of congestion | |||
| notification: drop rate. It also means that congestion has a natural | notification: drop rate. It also means that congestion has a natural | |||
| interpretation as a probability; the probability of offered traffic | interpretation as a probability; the probability of offered traffic | |||
| not being served (or being marked as at risk of not being served). | not being served (or being marked as at risk of not being served). | |||
| Appendix B describes a further incidental benefit that arises from | Appendix B describes a further incidental benefit that arises from | |||
| using load as the denominator of congestion notification. | using load as the denominator of congestion notification. | |||
| 4. Congestion Measurement | 4. Congestion Measurement | |||
| 4.1. Congestion Measurement by Queue Length | ||||
| Queue length is usually the most correct and simplest way to measure | Queue length is usually the most correct and simplest way to measure | |||
| congestion of a resource. To avoid the pathological effects of drop | congestion of a resource. To avoid the pathological effects of drop | |||
| tail, an AQM function can then be used to transform queue length into | tail, an AQM function can then be used to transform queue length into | |||
| the probability of dropping or marking a packet (e.g. RED's | the probability of dropping or marking a packet (e.g. RED's | |||
| piecewise linear function between thresholds). If the resource is | piecewise linear function between thresholds). If the resource is | |||
| bit-congestible, the length of the queue SHOULD be measured in bytes. | bit-congestible, the length of the queue SHOULD be measured in bytes. | |||
| If the resource is packet-congestible, the length of the queue SHOULD | If the resource is packet-congestible, the length of the queue SHOULD | |||
| be measured in packets. No other choice makes sense, because the | be measured in packets. No other choice makes sense, because the | |||
| number of packets waiting in the queue isn't relevant if the resource | number of packets waiting in the queue isn't relevant if the resource | |||
| gets congested by bytes and vice versa. We discuss the implications | gets congested by bytes and vice versa. We discuss the implications | |||
| on RED's byte mode and packet mode for measuring queue length in | on RED's byte mode and packet mode for measuring queue length in | |||
| Section 6. | Section 6. | |||
| There is a complication for some queuing hardware that consists of | 4.1.1. Fixed Size Packet Buffers | |||
| fixed sized buffers. Each packet fills as many buffers as are | ||||
| necessary leaving remaining space empty in the last buffer. Also, | ||||
| with some hardware, any fixed sized buffers not completely filled by | ||||
| the end of a packet are padded when transmitted to the wire. | ||||
| Taking the extreme for the size of these buffers, a forwarding system | Some, mostly older, queuing hardware sets aside fixed sized buffers | |||
| with both queuing and transmission in MTU-sized units should clearly | in which to store each packet in the queue. Also, with some | |||
| be treated as packet-congestible, because the queue length in packets | hardware, any fixed sized buffers not completely filled by a packet | |||
| would be a good model of congestion of the lower layer link. | are padded when transmitted to the wire. If we imagine a theoretical | |||
| forwarding system with both queuing and transmission in fixed, MTU- | ||||
| sized units, it should clearly be treated as packet-congestible, | ||||
| because the queue length in packets would be a good model of | ||||
| congestion of the lower layer link. | ||||
| A hybrid forwarding system with transmission delay largely dependent | If we now imagine a hybrid forwarding system with transmission delay | |||
| on the byte-size of packets but buffers of one MTU per packet would | largely dependent on the byte-size of packets but buffers of one MTU | |||
| strictly require a more complex algorithm to determine the | per packet, it should strictly require a more complex algorithm to | |||
| probability of congestion. It would have to be treated as two | determine the probability of congestion. It should be treated as two | |||
| resources in sequence, where the sum of the byte-sizes of the packets | resources in sequence, where the sum of the byte-sizes of the packets | |||
| within each packet buffer modelled congestion of the line while the | within each packet buffer models congestion of the line while the | |||
| length of the queue in packets modelled congestion of the buffer. | length of the queue in packets models congestion of the queue. Then | |||
| Then the probability of congesting the forwarding buffer would have | the probability of congesting the forwarding buffer would be a | |||
| to be a conditional probability--conditional on the previously | conditional probability--conditional on the previously calculated | |||
| calculated probability of congesting the line. The sub-MTU-sized | probability of congesting the line. | |||
| fixed buffers described above would require a slightly more complex | ||||
| model to fully determine how best to measure the queue. It would | ||||
| then be necessary to approximate this back to some practical | ||||
| algorithm. | ||||
| Not all congested resources lead to queues. For instance, wireless | However, in systems that use fixed size buffers, it is unusual for | |||
| spectrum is bit-congestible (for a given coding scheme), because | all the buffers used by an interface to be the same size. Typically | |||
| interference increases with the rate at which bits are transmitted. | pools of different sized buffers are provided (Cisco uses the term | |||
| But wireless link protocols do not always maintain a queue that | 'buffer carving' for the process of dividing up memory into these | |||
| depends on spectrum interference. Similarly, power limited resources | pools [IOSArch]). Usually, if the pool of small buffers is | |||
| are also usually bit-congestible if energy is primarily required for | exhausted, arriving small packets can borrow space in the pool of | |||
| transmission rather than header processing, but it is rare for a link | large buffers, but not vice versa. However, it is easier to work out | |||
| protocol to build a queue as it approaches maximum power. | what should be done if we temporarily set aside the possibility of | |||
| such borrowing. Then, with fixed pools of buffers for different | ||||
| sized packets and no borrowing, the size of each pool and the current | ||||
| queue length in each pool would both be measured in packets. So an | ||||
| AQM algorithm would have to maintain the queue length for each pool, | ||||
| and judge whether to drop/mark a packet of a particular size by | ||||
| looking at the pool for packets of that size and using the length (in | ||||
| packets) of its queue. | ||||
| [ECNFixedWireless] proposes a practical and theoretically sound way | We now return to the issue we temporarily set aside: small packets | |||
| to combine congestion notification for different bit-congestible | borrowing space in larger buffers. In this case, the only difference | |||
| resources along an end to end path, whether wireless or wired, and | is that the pools for smaller packets have a maximum queue size | |||
| whether with or without queues. | that includes all the pools for larger packets. And every time a | |||
| packet takes a larger buffer, the current queue size has to be | ||||
| incremented for all queues in the pools of buffers less than or equal | ||||
| to the buffer size used. | ||||
| We will return to borrowing of fixed sized buffers when we discuss | ||||
| biasing the drop/marking probability of a specific packet because of | ||||
| its size in Section 6.2.1. But here we can give a simple summary of | ||||
| the present discussion on how to measure the length of queues of | ||||
| fixed buffers: no matter how complicated the scheme is, ultimately | ||||
| any fixed buffer system will need to measure its queue length in | ||||
| packets not bytes. | ||||
| 4.2. Congestion Measurement without a Queue | ||||
| AQM algorithms are nearly always described assuming there is a queue | ||||
| for a congested resource and the algorithm can use the queue length | ||||
| to determine the probability that it will drop or mark each packet. | ||||
| But not all congested resources lead to queues. For instance, | ||||
| wireless spectrum is bit-congestible (for a given coding scheme), | ||||
| because interference increases with the rate at which bits are | ||||
| transmitted. But wireless link protocols do not always maintain a | ||||
| queue that depends on spectrum interference. Similarly, power | ||||
| limited resources are also usually bit-congestible if energy is | ||||
| primarily required for transmission rather than header processing, | ||||
| but it is rare for a link protocol to build a queue as it approaches | ||||
| maximum power. | ||||
| However, AQM algorithms don't require a queue to work. For instance | ||||
| spectrum congestion can be modelled by signal quality using target | ||||
| bit-energy-to-noise-density ratio. And, to model radio power | ||||
| exhaustion, transmission power levels can be measured and compared to | ||||
| the maximum power available. [ECNFixedWireless] proposes a practical | ||||
| and theoretically sound way to combine congestion notification for | ||||
| different bit-congestible resources at different layers along an end | ||||
| to end path, whether wireless or wired, and whether with or without | ||||
| queues. | ||||
| 5. Idealised Wire Protocol Coding | 5. Idealised Wire Protocol Coding | |||
| We will start by inventing an idealised congestion notification | We will start by inventing an idealised congestion notification | |||
| protocol before discussing how to make it practical. The idealised | protocol before discussing how to make it practical. The idealised | |||
| protocol is shown to be correct using examples in Appendix A. | protocol is shown to be correct using examples in Appendix A. | |||
| Congestion notification involves the congested resource coding a | Congestion notification involves the congested resource coding a | |||
| congestion notification signal into the packet stream and the | congestion notification signal into the packet stream and the | |||
| transports decoding it. The idealised protocol uses two different | transports decoding it. The idealised protocol uses two different | |||
| fields in each datagram to signal congestion: one for byte congestion | fields in each datagram to signal congestion: one for byte congestion | |||
| skipping to change at page 10, line 9 | skipping to change at page 15, line 50 | |||
| these two flows into one to show that a flow with mixed packet sizes | these two flows into one to show that a flow with mixed packet sizes | |||
| would still be able to extract sufficient and correct information. | would still be able to extract sufficient and correct information. | |||
| Sufficient and correct congestion information means that there is | Sufficient and correct congestion information means that there is | |||
| sufficient information for the two different types of transport | sufficient information for the two different types of transport | |||
| requirements: | requirements: | |||
| Ratio-based: Established transport congestion controls like TCP's | Ratio-based: Established transport congestion controls like TCP's | |||
| [RFC2581] aim to achieve equal segment rates per RTT through the | [RFC2581] aim to achieve equal segment rates per RTT through the | |||
| same bottleneck--TCP friendliness [RFC3448]. They work with the | same bottleneck--TCP friendliness [RFC3448]. They work with the | |||
| ratio of marked to unmarked segments. The example scenarios show | ratio of dropped to delivered segments (or marked to unmarked | |||
| that these ratio-based transports are effectively the same whether | segments in the case of ECN). The example scenarios show that | |||
| counting in bytes or marks, because the units cancel out. | these ratio-based transports are effectively the same whether | |||
| counting in bytes or packets, because the units cancel out. | ||||
| (Incidentally, this is why TCP's bit rate is still proportional to | (Incidentally, this is why TCP's bit rate is still proportional to | |||
| packet size even when byte-counting is used, as recommended for | packet size even when byte-counting is used, as recommended for | |||
| TCP in [I-D.ietf-tcpm-rfc2581bis], mainly for orthogonal security | TCP in [I-D.ietf-tcpm-rfc2581bis], mainly for orthogonal security | |||
| reasons.) | reasons.) | |||
| Absolute-target-based: Other congestion controls proposed in the | Absolute-target-based: Other congestion controls proposed in the | |||
| research community aim to limit the volume of congestion caused to | research community aim to limit the volume of congestion caused to | |||
| a constant weight parameter. [MulTCP][WindowPropFair] are | a constant weight parameter. [MulTCP][WindowPropFair] are | |||
| examples of weighted proportionally fair transports designed for | examples of weighted proportionally fair transports designed for | |||
| cost-fair environments [Rate_fair_Dis]. In this case, the | cost-fair environments [Rate_fair_Dis]. In this case, the | |||
| transport requires a count (not a ratio) of dropped/marked bytes | transport requires a count (not a ratio) of dropped/marked bytes | |||
| in the bit-congestible case and of dropped/marked packets in the | in the bit-congestible case and of dropped/marked packets in the | |||
| packet congestible case. | packet congestible case. | |||
| 6. The State of the Art | 6. The State of the Art | |||
| The original 1993 paper on RED [RED93] proposed two options for the | The original 1993 paper on RED [RED93] proposed two options for the | |||
| RED active queue management algorithm: packet mode and byte mode. | RED active queue management algorithm: packet mode and byte mode. | |||
| Packet mode measured the queue length in packets and marked (or | Packet mode measured the queue length in packets and dropped (or | |||
| dropped) individual packets with a probability independent of their | marked) individual packets with a probability independent of their | |||
| size. Byte mode measured the queue length in bytes and marked an | size. Byte mode measured the queue length in bytes and marked an | |||
| individual packet with probability in proportion to its size | individual packet with probability in proportion to its size | |||
| (relative to the maximum packet size). In the paper's outline of | (relative to the maximum packet size). In the paper's outline of | |||
| further work, it was stated that no recommendation had been made on | further work, it was stated that no recommendation had been made on | |||
| whether the queue size should be measured in bytes or packets, but | whether the queue size should be measured in bytes or packets, but | |||
| noted that the difference could be significant. | noted that the difference could be significant. | |||
| When RED was recommended for general deployment in 1998 [RFC2309], | When RED was recommended for general deployment in 1998 [RFC2309], | |||
| the two modes were mentioned implying the choice between them was a | the two modes were mentioned implying the choice between them was a | |||
| question of performance, referring to a 1997 email [pktByteEmail] for | question of performance, referring to a 1997 email [pktByteEmail] for | |||
| skipping to change at page 11, line 21 | skipping to change at page 17, line 15 | |||
| buffer is measured in packets, the operator will have set the | buffer is measured in packets, the operator will have set the | |||
| thresholds mindful of a typical mix of packets sizes. Any AQM | thresholds mindful of a typical mix of packets sizes. Any AQM | |||
| algorithm on such a buffer will be oversensitive to high proportions | algorithm on such a buffer will be oversensitive to high proportions | |||
| of small packets, e.g. a DoS attack, and undersensitive to high | of small packets, e.g. a DoS attack, and undersensitive to high | |||
| proportions of large packets. But an operator can safely keep such a | proportions of large packets. But an operator can safely keep such a | |||
| legacy buffer because any undersensitivity during unusual traffic | legacy buffer because any undersensitivity during unusual traffic | |||
| mixes cannot lead to congestion collapse given the buffer will | mixes cannot lead to congestion collapse given the buffer will | |||
| eventually revert to tail drop, discarding proportionately more large | eventually revert to tail drop, discarding proportionately more large | |||
| packets. | packets. | |||
| Some modern router implementations give a choice for setting RED's | Some modern queue implementations give a choice for setting RED's | |||
| thresholds in byte-mode or packet-mode. This may merely be an | thresholds in byte-mode or packet-mode. This may merely be an | |||
| administrator-interface preference, not altering how the queue itself | administrator-interface preference, not altering how the queue itself | |||
| is measured but on some hardware it does actually change the way it | is measured but on some hardware it does actually change the way it | |||
| measures its queue. Whether a resource is bit-congestible or packet- | measures its queue. Whether a resource is bit-congestible or packet- | |||
| congestible is a property of the resource, so an admin SHOULD NOT | congestible is a property of the resource, so an admin SHOULD NOT | |||
| ever need to, or be able to, configure the way a queue measures | ever need to, or be able to, configure the way a queue measures | |||
| itself. | itself. | |||
| We believe the question of whether to measure queues in bytes or | We believe the question of whether to measure queues in bytes or | |||
| packets is fairly well understood these days. The only outstanding | packets is fairly well understood these days. The only outstanding | |||
| issues concern how to measure congestion when the queue is bit | issues concern how to measure congestion when the queue is bit | |||
| congestible but the resource is packet congestible or vice versa (see | congestible but the resource is packet congestible or vice versa (see | |||
| Section 4). | Section 4). But there is no controversy over what should be done. | |||
| It is just that one has to be an expert in probability to work out | ||||
| what should be done and, even then, it is not always easy to find a | ||||
| practical algorithm to implement it. | ||||
| 6.2. Congestion Coding: Status | 6.2. Congestion Coding: Status | |||
| 6.2.1. Network Bias when Encoding | 6.2.1. Network Bias when Encoding | |||
| The previously mentioned email [pktByteEmail] referred to by | The previously mentioned email [pktByteEmail] referred to by | |||
| [RFC2309] said that the choice over whether a packet's own size | [RFC2309] said that the choice over whether a packet's own size | |||
| should affect its drop probability "depends on the dominant end-to- | should affect its drop probability "depends on the dominant end-to- | |||
| end congestion control mechanisms". [Section 1 argues against this | end congestion control mechanisms". [Section 2 argues against this | |||
| approach, citing the excellent advice in RFC3426.] The referenced | approach, citing the excellent advice in RFC3426.] The referenced | |||
| email went on to argue that drop probability should depend on the | email went on to argue that drop probability should depend on the | |||
| size of the packet being considered for drop if the resource is bit- | size of the packet being considered for drop if the resource is bit- | |||
| congestible, but not if it is packet-congestible, but advised that | congestible, but not if it is packet-congestible, but advised that | |||
| most scarce resources in the Internet were currently bit-congestible. | most scarce resources in the Internet were currently bit-congestible. | |||
| The argument continued that if packet drops were inflated by packet | The argument continued that if packet drops were inflated by packet | |||
| size (byte-mode dropping), "a flow's fraction of the packet drops is | size (byte-mode dropping), "a flow's fraction of the packet drops is | |||
| then a good indication of that flow's fraction of the link bandwidth | then a good indication of that flow's fraction of the link bandwidth | |||
| in bits per second". This was consistent with a referenced policing | in bits per second". This was consistent with a referenced policing | |||
| mechanism being worked on at the time for detecting unusually high | mechanism being worked on at the time for detecting unusually high | |||
| bandwidth flows, eventually published in 1999 [pBox]. [The problem | bandwidth flows, eventually published in 1999 [pBox]. [The problem | |||
| could have been solved by making the policing mechanism count the | could have been solved by making the policing mechanism count the | |||
| volume of bytes randomly dropped, not the number of packets.] | volume of bytes randomly dropped, not the number of packets.] | |||
| A few months before RFC2309 was published, an addendum was added to | A few months before RFC2309 was published, an addendum was added to | |||
| the above archived email referenced from the RFC, in which the final | the above archived email referenced from the RFC, in which the final | |||
| paragraph seemed to partially retract what had previously been said. | paragraph seemed to partially retract what had previously been said. | |||
| It clarified that the question of whether the probability of marking | It clarified that the question of whether the probability of | |||
| a packet should depend on its size was not related to whether the | dropping/marking a packet should depend on its size was not related | |||
| resource itself was bit congestible, but a completely orthogonal | to whether the resource itself was bit congestible, but a completely | |||
| question. However the only example given had the queue measured in | orthogonal question. However the only example given had the queue | |||
| packets but packet drop depended on the byte-size of the packet in | measured in packets but packet drop depended on the byte-size of the | |||
| question. No example was given the other way round. | packet in question. No example was given the other way round. | |||
| In 2000, Cnodder et al [REDbyte] pointed out that there was an error | In 2000, Cnodder et al [REDbyte] pointed out that there was an error | |||
| in the part of the original 1993 RED algorithm that aimed to | in the part of the original 1993 RED algorithm that aimed to | |||
| distribute drops uniformly, because it didn't correctly take into | distribute drops uniformly, because it didn't correctly take into | |||
| account the adjustment for packet size. They recommended an | account the adjustment for packet size. They recommended an | |||
| algorithm called RED_4 to fix this. But they also recommended a | algorithm called RED_4 to fix this. But they also recommended a | |||
| further change, RED_5, to adjust drop rate dependent on the square of | further change, RED_5, to adjust drop rate dependent on the square of | |||
| relative packet size. This was indeed consistent with the stated | relative packet size. This was indeed consistent with one stated | |||
| motivation behind RED's byte mode drop--that we should reverse | motivation behind RED's byte mode drop--that we should reverse | |||
| engineer the network to improve the performance of dominant end-to- | engineer the network to improve the performance of dominant end-to- | |||
| end congestion control mechanisms. | end congestion control mechanisms. | |||
| By 2003, a further change had been made to the adjustment for packet | By 2003, a further change had been made to the adjustment for packet | |||
| size, this time in the RED algorithm of the ns2 simulator. Instead | size, this time in the RED algorithm of the ns2 simulator. Instead | |||
| of taking each packet's size relative to a `maximum packet size' it | of taking each packet's size relative to a `maximum packet size' it | |||
| was taken relative to a `mean packet size', intended to be a static | was taken relative to a `mean packet size', intended to be a static | |||
| value representative of the `typical' packet size on the link. We | value representative of the `typical' packet size on the link. We | |||
| have not been able to find a justification for this change in the | have not been able to find a justification for this change in the | |||
| literature, however Eddy and Allman conducted experiments [REDbias] | literature, however Eddy and Allman conducted experiments [REDbias] | |||
| that assessed how sensitive RED was to this parameter, amongst other | that assessed how sensitive RED was to this parameter, amongst other | |||
| things. No-one seems to have pointed out that this changed algorithm | things. No-one seems to have pointed out that this changed algorithm | |||
| can often lead to drop probabilities of greater than 1 [which should | can often lead to drop probabilities of greater than 1 [which should | |||
| ring alarm bells hinting that there's a mistake in the theory | ring alarm bells hinting that there's a mistake in the theory | |||
| somewhere]. On 10-Nov-2004, this variant of byte-mode packet drop | somewhere]. On 10-Nov-2004, this variant of byte-mode packet drop | |||
| was made the default in the ns2 simulator. | was made the default in the ns2 simulator. | |||
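The observation above about drop probabilities greater than 1 can be illustrated with a minimal sketch. It assumes the packet-size adjustment is a plain linear scaling of RED's base drop probability by packet size over a static reference size. The function name and the example sizes are hypothetical and are not taken from the ns2 code.

```python
def byte_mode_drop_prob(base_p, pkt_size, ref_size):
    """Scale a base drop probability linearly by relative packet size (assumed form)."""
    return base_p * pkt_size / ref_size

# Relative to a maximum packet size the ratio is at most 1,
# so the result can never exceed the base probability.
p_max_ref = byte_mode_drop_prob(0.5, 1500, 1500)

# Relative to a smaller 'mean' packet size, a large packet can be
# assigned a "probability" above 1.
p_mean_ref = byte_mode_drop_prob(0.5, 1500, 500)
```

Under these assumed numbers the second value is not a valid probability, which is the warning sign the text describes.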
| More recently, two drafts have proposed changes to TCP that make it | The byte-mode drop variant of RED is, of course, not the only | |||
| more robust against losing small control packets | possible bias towards small packets in queueing algorithms. We have | |||
| [I-D.ietf-tcpm-ecnsyn] [I-D.floyd-tcpm-ackcc]. In both cases they | already mentioned that tail-drop queues naturally tend to lock-out | |||
| note that the case for these TCP changes would be weaker if RED were | large packets once they are full. But also queues with fixed sized | |||
| biased against dropping small packets. We argue here that these two | buffers reduce the probability that small packets will be dropped if | |||
| proposals are a safer and more principled way to achieve TCP | (and only if) they allow small packets to borrow buffers from the | |||
| performance improvements than reverse engineering RED to benefit TCP. | pools for larger packets. As was explained in Section 4.1.1 on fixed | |||
| size buffer carving, borrowing effectively makes the maximum queue | ||||
| size for small packets greater than that for large packets, because | ||||
| more buffers can be used by small packets while less will fit large | ||||
| packets. | ||||
| However, in itself, the bias towards small packets caused by buffer | ||||
| borrowing is perfectly correct. Lower drop probability for small | ||||
| packets is legitimate in buffer borrowing schemes, because small | ||||
| packets genuinely congest the machine's buffer memory less than large | ||||
| packets, given they can fit in more spaces. The bias towards small | ||||
| packets is not artificially added (as it is in RED's byte-mode drop | ||||
| algorithm), it merely reflects the reality of the way fixed buffer | ||||
| memory gets congested. Incidentally, the bias towards small packets | ||||
| from buffer borrowing is nothing like as large as that of RED's byte- | ||||
| mode drop. | ||||
| Nonetheless, fixed-buffer memory with tail drop is still prone to | ||||
| lock-out large packets, purely because of the tail-drop aspect. So a | ||||
| good AQM algorithm like RED with packet-mode drop should be used with | ||||
| fixed buffer memories where possible. If RED is too complicated to | ||||
| implement with multiple fixed buffer pools, the minimum necessary to | ||||
| prevent large packet lock-out is to ensure smaller packets never use | ||||
| the last available buffer in any of the pools for larger packets. | ||||
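The minimum rule described above can be sketched as follows. This is only an illustration under stated assumptions: the pools are modelled as a dictionary from buffer size to free-buffer count, and the function name `allocate` is hypothetical. The draft does not specify a data structure.

```python
def allocate(free, pkt_size):
    """Return the buffer size used for the packet, or None if it is dropped.

    free maps each pool's buffer size in bytes to its count of free buffers
    (an assumed representation).
    """
    # Pools whose buffers can hold the packet, smallest first.
    candidates = sorted(size for size in free if size >= pkt_size)
    for i, size in enumerate(candidates):
        # The first candidate is the packet's own pool; any later one is borrowed.
        borrowing = i > 0
        # A borrower must leave the last buffer of a larger pool untouched.
        reserve = 1 if borrowing else 0
        if free[size] > reserve:
            free[size] -= 1
            return size
    return None
```

For example, with `free = {128: 0, 1500: 2}` a 64-byte packet may borrow one 1500-byte buffer, a second 64-byte packet is dropped, and a 1500-byte packet can still take the remaining buffer, so large packets are not locked out.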
| 6.2.2. Transport Bias when Decoding | 6.2.2. Transport Bias when Decoding | |||
| The above proposals to alter the network layer to fix TCP's | The above proposals to alter the network layer to give a bias towards | |||
| insensitivity to segment size have largely carried on outside the | smaller packets have largely carried on outside the IETF process | |||
| IETF process (unless one counts a reference in an informational RFC | (unless one counts a reference in an informational RFC to an archived | |||
| to an archived email!). | email!). Whereas, within the IETF, there are many different | |||
| proposals to alter transport protocols to achieve the same goals, | ||||
| i.e. either to make the flow bit-rate take account of packet size, or | ||||
| to protect control packets from loss. This memo argues that altering | ||||
| transport protocols is the more principled approach. | ||||
| Within the IETF, a recently approved experimental RFC adapts its | A recently approved experimental RFC adapts its transport layer | |||
| transport layer protocol to take account of packet sizes relative to | protocol to take account of packet sizes relative to typical TCP | |||
| typical TCP packet sizes. This proposes a new small-packet variant | packet sizes. This proposes a new small-packet variant of TCP- | |||
| of TCP-friendly rate control [RFC3448] called TFRC-SP [RFC4828]. | friendly rate control [RFC3448] called TFRC-SP [RFC4828]. | |||
| Essentially, it proposes a rate equation that inflates the flow rate | Essentially, it proposes a rate equation that inflates the flow rate | |||
| by the ratio of a typical TCP segment size (1500B including TCP | by the ratio of a typical TCP segment size (1500B including TCP | |||
| header) over the actual segment size [PktSizeEquCC]. (There are also | header) over the actual segment size [PktSizeEquCC]. (There are also | |||
| other important differences of detail relative to TFRC, such as using | other important differences of detail relative to TFRC, such as using | |||
| virtual packets [CCvarPktSize] to avoid responding to multiple losses | virtual packets [CCvarPktSize] to avoid responding to multiple losses | |||
| per round trip and using a minimum inter-packet interval.) | per round trip and using a minimum inter-packet interval.) | |||
| Section 4.5.1 of this TFRC-SP spec discusses the implications of | Section 4.5.1 of this TFRC-SP spec discusses the implications of | |||
| operating in an environment where routers have been configured to | operating in an environment where queues have been configured to drop | |||
| drop smaller packets with proportionately lower probability than | smaller packets with proportionately lower probability than larger | |||
| larger ones. But surprisingly, it only discusses TCP operating in | ones. But it only discusses TCP operating in such an environment, | |||
| such an environment, only mentioning TFRC-SP briefly when discussing | only mentioning TFRC-SP briefly when discussing how to define | |||
| how to define fairness with TCP. And it only discusses the byte-mode | fairness with TCP. And it only discusses the byte-mode dropping | |||
| dropping version of RED as it was before Cnodder et al pointed out it | version of RED as it was before Cnodder et al pointed out it didn't | |||
| didn't sufficiently bias towards small packets to make TCP | sufficiently bias towards small packets to make TCP independent of | |||
| independent of packet size. | packet size. | |||
| So the TFRC-SP spec doesn't address the issue of which of the network | So the TFRC-SP spec doesn't address the issue of which of the network | |||
| or the transport _should_ handle fairness between different packet | or the transport _should_ handle fairness between different packet | |||
| sizes. In its Appendix B.4 it discusses the possibility of both | sizes. In its Appendix B.4 it discusses the possibility of both | |||
| TFRC-SP and some network buffers duplicating each other's attempts to | TFRC-SP and some network buffers duplicating each other's attempts to | |||
| deliberately bias towards small packets. But the discussion is not | deliberately bias towards small packets. But the discussion is not | |||
| conclusive, instead reporting simulations of many of the | conclusive, instead reporting simulations of many of the | |||
| possibilities in order to assess performance rather than recommending | possibilities in order to assess performance but not recommending any | |||
| any action. | particular course of action. | |||
| The paper originally proposing TFRC with virtual packets (VP-TFRC) | The paper originally proposing TFRC with virtual packets (VP-TFRC) | |||
| [CCvarPktSize] proposed that there should perhaps be two variants to | [CCvarPktSize] proposed that there should perhaps be two variants to | |||
| cater for the different variants of RED. However, as the TFRC-SP | cater for the different variants of RED. However, as the TFRC-SP | |||
| authors point out, there is no way for a transport to know whether | authors point out, there is no way for a transport to know whether | |||
| some queues on its path have deployed RED with byte-mode packet drop | some queues on its path have deployed RED with byte-mode packet drop | |||
| (except if an exhaustive survey found that no-one has deployed it!-- | (except if an exhaustive survey found that no-one has deployed it!-- | |||
| see Section 6.2.3). Incidentally, VP-TFRC also proposed that byte- | see Section 6.2.4). Incidentally, VP-TFRC also proposed that byte- | |||
| mode RED dropping should really square the packet size compensation | mode RED dropping should really square the packet size compensation | |||
| factor (like that of RED_5, but apparently unaware of it). | factor (like that of RED_5, but apparently unaware of it). | |||
| Pre-congestion notification [I-D.ietf-pcn-architecture] is a proposal | Pre-congestion notification [I-D.ietf-pcn-architecture] is a proposal | |||
| to use a virtual queue for AQM marking for packets within one | to use a virtual queue for AQM marking for packets within one | |||
| Diffserv class in order to give early warning prior to any real | Diffserv class in order to give early warning prior to any real | |||
| queuing. The proposed PCN marking algorithms have been designed not | queuing. The proposed PCN marking algorithms have been designed not | |||
| to take account of packet size on routers. Instead the general | to take account of packet size when forwarding through queues. | |||
| principle has been to take account of the sizes of marked packets | Instead the general principle has been to take account of the sizes | |||
| when monitoring the fraction of marking at the edge of the network. | of marked packets when monitoring the fraction of marking at the edge | |||
| of the network. | ||||
| 6.2.3. Congestion Coding: Summary of Status | 6.2.3. Making Transports Robust against Control Packet Losses | |||
| Recently, two drafts have proposed changes to TCP that make it more | ||||
| robust against losing small control packets [I-D.ietf-tcpm-ecnsyn] | ||||
| [I-D.floyd-tcpm-ackcc]. In both cases they note that the case for | ||||
| these TCP changes would be weaker if RED were biased against dropping | ||||
| small packets. We argue here that these two proposals are a safer | ||||
| and more principled way to achieve TCP performance improvements than | ||||
| reverse engineering RED to benefit TCP. | ||||
| Although no proposals exist as far as we know, it would also be | ||||
| possible and perfectly valid to make control packets robust against | ||||
| drop by explicitly requesting a lower drop probability using their | ||||
| Diffserv code point [RFC2474] to request a scheduling class with | ||||
| lower drop. | ||||
| The re-ECN protocol proposal [Re-TCP] is designed so that transports | ||||
| can be made more robust against losing control packets. It gives | ||||
| queues an incentive to optionally give preference against drop to | ||||
| packets with the 'feedback not established' codepoint in the proposed | ||||
| 'extended ECN' field. Senders have incentives to use this codepoint | ||||
| sparingly, but they can use it on control packets to reduce their | ||||
| chance of being dropped. For instance, the proposed modification to | ||||
| TCP for re-ECN uses this codepoint on the SYN and SYN-ACK. | ||||
| Although not brought to the IETF, a simple proposal from Wischik | ||||
| [DupTCP] suggests that the first three packets of every TCP flow | ||||
| should be routinely duplicated after a short delay. It shows that | ||||
| this would greatly improve the chances of short flows completing | ||||
| quickly, but it would hardly increase traffic levels on the Internet, | ||||
| because Internet bytes have always been concentrated in the large | ||||
| flows. It further shows that the performance of many typical | ||||
| applications depends on completion of long serial chains of short | ||||
| messages. It argues that, given most of the value people get from | ||||
| the Internet is concentrated within short flows, this simple | ||||
| expedient would greatly increase the value of the best efforts | ||||
| Internet at minimal cost. | ||||
| 6.2.4. Congestion Coding: Summary of Status | ||||
| +-----------+----------------+-----------------+--------------------+ | +-----------+----------------+-----------------+--------------------+ | |||
| | transport | RED_1 (packet | RED_4 (linear | RED_5 (square byte | | | transport | RED_1 (packet | RED_4 (linear | RED_5 (square byte | | |||
| | cc | mode drop) | byte mode drop) | mode drop) | | | cc | mode drop) | byte mode drop) | mode drop) | | |||
| +-----------+----------------+-----------------+--------------------+ | +-----------+----------------+-----------------+--------------------+ | |||
| | TCP or | s/sqrt(p) | sqrt(s/p) | 1/sqrt(p) | | | TCP or | s/sqrt(p) | sqrt(s/p) | 1/sqrt(p) | | |||
| | TFRC | | | | | | TFRC | | | | | |||
| | TFRC-SP | 1/sqrt(p) | 1/sqrt(sp) | 1/(s.sqrt(p)) | | | TFRC-SP | 1/sqrt(p) | 1/sqrt(sp) | 1/(s.sqrt(p)) | | |||
| +-----------+----------------+-----------------+--------------------+ | +-----------+----------------+-----------------+--------------------+ | |||
| Table 1: Dependence of flow bit-rate per RTT on packet size s and | Table 1: Dependence of flow bit-rate per RTT on packet size s and | |||
| drop rate p when network and/or transport bias towards small packets | drop rate p when network and/or transport bias towards small packets | |||
| to varying degrees | to varying degrees | |||
| Table 1 aims to summarise the positions we may now be in. Each | Table 1 aims to summarise the positions we may now be in. Each | |||
| column shows a different possible AQM behaviour on different routers | column shows a different possible AQM behaviour in different queues | |||
| in the network, using the terminology of Cnodder et al outlined | in the network, using the terminology of Cnodder et al outlined | |||
| earlier (RED_1 is basic RED with packet-mode drop). Each row shows a | earlier (RED_1 is basic RED with packet-mode drop). Each row shows a | |||
| different transport behaviour: TCP [RFC2581] and TFRC [RFC3448] on | different transport behaviour: TCP [RFC2581] and TFRC [RFC3448] on | |||
| the top row with TFRC-SP [RFC4828] below. Suppressing all | the top row with TFRC-SP [RFC4828] below. Suppressing all | |||
| inessential details the table shows that independence from packet | inessential details the table shows that independence from packet | |||
| size should either be achievable by not altering the TCP transport in | size should either be achievable by not altering the TCP transport in | |||
| a RED_5 network, or using the small packet TFRC-SP transport in a | a RED_5 network, or using the small packet TFRC-SP transport in a | |||
| network without any byte-mode dropping RED (top right and bottom | network without any byte-mode dropping RED (top right and bottom | |||
| left). Top left is the `do nothing' scenario, while bottom right is | left). Top left is the `do nothing' scenario, while bottom right is | |||
| the `do-both' scenario in which bit-rate would become far too biased | the `do-both' scenario in which bit-rate would become far too biased | |||
| towards small packets. Of course, if any form of byte-mode dropping | towards small packets. Of course, if any form of byte-mode dropping | |||
| RED has been deployed on a selection of congested routers, each path | RED has been deployed on a selection of congested queues, each path | |||
| will present a different hybrid scenario to its transport. | will present a different hybrid scenario to its transport. | |||
| Whatever, we can see that the linear byte-mode drop column in the | Whatever, we can see that the linear byte-mode drop column in the | |||
| middle considerably complicates the Internet. It's a half-way house | middle considerably complicates the Internet. It's a half-way house | |||
| that doesn't bias enough towards small packets even if one believes | that doesn't bias enough towards small packets even if one believes | |||
| the network should be doing the biasing. We argue below that _all_ | the network should be doing the biasing. We argue below that _all_ | |||
| network layer bias towards small packets should be turned off--if | network layer bias towards small packets should be turned off--if | |||
| indeed any router vendors have implemented it--leaving packet size | indeed any equipment vendors have implemented it--leaving packet size | |||
| bias solely as the preserve of the transport layer (solely the | bias solely as the preserve of the transport layer (solely the | |||
| leftmost, packet-mode drop column). | leftmost, packet-mode drop column). | |||
| A survey has been conducted of 84 vendors to assess how widely drop | A survey has been conducted of 84 vendors to assess how widely drop | |||
| probability based on packet size has been implemented in RED. Prior | probability based on packet size has been implemented in RED. Prior | |||
| to the survey, an individual approach to Cisco received confirmation | to the survey, an individual approach to Cisco received confirmation | |||
| that, having checked the code-base for each of the product ranges, | that, having checked the code-base for each of the product ranges, | |||
| Cisco has not implemented any discrimination based on packet size in | Cisco has not implemented any discrimination based on packet size in | |||
| any AQM algorithm in any of its products. Also an individual | any AQM algorithm in any of its products. Also an individual | |||
| approach to Alcatel-Lucent drew a confirmation that it was very | approach to Alcatel-Lucent drew a confirmation that it was very | |||
| likely that none of their products contained RED code that | likely that none of their products contained RED code that | |||
| implemented any packet-size bias. | implemented any packet-size bias. | |||
| Turning to our more formal survey, about 19% of those surveyed have | Turning to our more formal survey (Table 2), about 19% of those | |||
| replied so far, giving a sample size of 16. Although we do not have | surveyed have replied so far, giving a sample size of 16. Although | |||
| permission to identify the respondents, we can say that those that | we do not have permission to identify the respondents, we can say | |||
| have responded include most of the larger vendors, covering a large | that those that have responded include most of the larger vendors, | |||
| fraction of the market. They range across the large network | covering a large fraction of the market. They range across the large | |||
| equipment vendors at L3 & L2, firewall vendors, wireless equipment | network equipment vendors at L3 & L2, firewall vendors, wireless | |||
| vendors, as well as large software businesses with a small selection | equipment vendors, as well as large software businesses with a small | |||
| of networking products. So far, all those who have responded have | selection of networking products. So far, all those who have | |||
| confirmed that they have not implemented the variant of RED with drop | responded have confirmed that they have not implemented the variant | |||
| dependent on packet size (2 are fairly sure they haven't but need to | of RED with drop dependent on packet size (2 are fairly sure they | |||
| check more thoroughly). | haven't but need to check more thoroughly). | |||
| +-------------------------------+----------------+-----------------+ | ||||
| | Response | No. of vendors | %age of vendors | | ||||
| +-------------------------------+----------------+-----------------+ | ||||
| | Not implemented | 14 | 17% | | ||||
| | Not implemented (probably) | 2 | 2% | | ||||
| | Implemented | 0 | 0% | | ||||
| | No response | 68 | 81% | | ||||
| | Total companies/orgs surveyed | 84 | 100% | | ||||
| +-------------------------------+----------------+-----------------+ | ||||
| Table 2: Vendor Survey on byte-mode drop variant of RED (lower drop | ||||
| probability for small packets) | ||||
| Where reasons have been given, the extra complexity of packet bias | Where reasons have been given, the extra complexity of packet bias | |||
| code has been most prevalent, though one vendor had a more principled | code has been most prevalent, though one vendor had a more principled | |||
| reason for avoiding it--similar to the argument of this document. We | reason for avoiding it--similar to the argument of this document. We | |||
| have established that Linux does not implement RED with packet size | have established that Linux does not implement RED with packet size | |||
| drop bias, although we have not investigated a wider range of open | drop bias, although we have not investigated a wider range of open | |||
| source code. | source code. | |||
| Finally, we repeat that RED's byte mode drop is not the only way to | ||||
| bias towards small packets--tail-drop tends to lock-out large packets | ||||
| very effectively. Our survey was of vendor implementations, so we | ||||
| cannot be certain about operator deployment. But we believe many | ||||
| queues in the Internet are still tail-drop. My own company (BT) has | ||||
| widely deployed RED, but there are bound to be many tail-drop queues, | ||||
| particularly in access network equipment and on middleboxes like | ||||
| firewalls, where RED is not always available. Routers using a memory | ||||
| architecture based on fixed size buffers with borrowing may also | ||||
| still be prevalent in the Internet. As explained in Section 6.2.1, | ||||
| these also provide a marginal (but legitimate) bias towards small | ||||
| packets. So even though RED byte-mode drop is not prevalent, it is | ||||
| likely there is still some bias towards small packets in the Internet | ||||
| due to tail drop and fixed buffer borrowing. | ||||
| 7. Outstanding Issues and Next Steps | 7. Outstanding Issues and Next Steps | |||
| 7.1. Bit-congestible World | 7.1. Bit-congestible World | |||
| For a connectionless network with only bit-congestible resources we | For a connectionless network with only bit-congestible resources we | |||
| believe the recommended position is now unarguably clear--that the | believe the recommended position is now unarguably clear--that the | |||
| network should not make allowance for packet sizes and the transport | network should not make allowance for packet sizes and the transport | |||
| should. This leaves two outstanding issues: | should. This leaves two outstanding issues: | |||
| o How to handle any legacy of AQM with byte-mode drop already | o How to handle any legacy of AQM with byte-mode drop already | |||
| deployed; | deployed; | |||
| o The need to start a programme to update transport congestion | o The need to start a programme to update transport congestion | |||
| control protocol standards to take account of packet size. | control protocol standards to take account of packet size. | |||
| The sample of returns from our vendor survey Section 6.2.3 suggest | The sample of returns from our vendor survey Section 6.2.4 suggest | |||
| that byte-mode packet drop seems not to be implemented at all let | that byte-mode packet drop seems not to be implemented at all let | |||
| alone deployed, or if it is, it is likely to be very sparse. | alone deployed, or if it is, it is likely to be very sparse. | |||
| Therefore, we do not really need a migration strategy from all but | Therefore, we do not really need a migration strategy from all but | |||
| nothing to nothing. | nothing to nothing. | |||
| A programme of standards updates to take account of packet size in | A programme of standards updates to take account of packet size in | |||
| transport congestion control protocols has started with TFRC-SP | transport congestion control protocols has started with TFRC-SP | |||
| [RFC4828], while weighted TCPs implemented in the research community | [RFC4828], while weighted TCPs implemented in the research community | |||
| [WindowPropFair] could form the basis of a future change to TCP | [WindowPropFair] could form the basis of a future change to TCP | |||
| congestion control [RFC2581] itself. | congestion control [RFC2581] itself. | |||
| skipping to change at page 16, line 46 | skipping to change at page 25, line 5 | |||
| complexity of each look-up and whether the pattern of arrivals is | complexity of each look-up and whether the pattern of arrivals is | |||
| amenable to caching or not. Further, this reminds us that any | amenable to caching or not. Further, this reminds us that any | |||
| solution must not require a forwarding engine to use excessive | solution must not require a forwarding engine to use excessive | |||
| processor cycles in order to decide how to say it has no spare | processor cycles in order to decide how to say it has no spare | |||
| processor cycles. | processor cycles. | |||
| The problem of signalling packet processing congestion is not | The problem of signalling packet processing congestion is not | |||
| pressing, as most if not all Internet resources are designed to be | pressing, as most if not all Internet resources are designed to be | |||
| bit-congestible before packet processing starts to congest. However, | bit-congestible before packet processing starts to congest. However, | |||
| given the IRTF ICCRG has set itself the task of reaching consensus on | given the IRTF ICCRG has set itself the task of reaching consensus on | |||
| generic router mechanisms that are necessary and sufficient to | generic forwarding mechanisms that are necessary and sufficient to | |||
| support the Internet's future congestion control requirements | support the Internet's future congestion control requirements | |||
| [I-D.irtf-iccrg-welzl-congestion-control-open-research], we must not | [I-D.irtf-iccrg-welzl-congestion-control-open-research], we must not | |||
| give this problem no thought at all, just because it is hard and | give this problem no thought at all, just because it is hard and | |||
| currently hypothetical. | currently hypothetical. | |||
| 8. Security Considerations | 8. Security Considerations | |||
| This draft recommends that queues do not bias drop probability | This draft recommends that queues do not bias drop probability | |||
| towards small packets as this creates a perverse incentive for | towards small packets as this creates a perverse incentive for | |||
| transports to break down their flows into tiny segments. One of the | transports to break down their flows into tiny segments. One of the | |||
| skipping to change at page 17, line 47 | skipping to change at page 26, line 9 | |||
| summary, it says that making drop probability depend on the size of | summary, it says that making drop probability depend on the size of | |||
| the packets that bits happen to be divided into simply encourages the | the packets that bits happen to be divided into simply encourages the | |||
| bits to be divided into smaller packets. Byte-mode drop would | bits to be divided into smaller packets. Byte-mode drop would | |||
| therefore irreversibly complicate any attempt to fix the Internet's | therefore irreversibly complicate any attempt to fix the Internet's | |||
| incentive structures. | incentive structures. | |||
| 9. Conclusions | 9. Conclusions | |||
| The strong conclusion is that AQM algorithms such as RED SHOULD NOT | The strong conclusion is that AQM algorithms such as RED SHOULD NOT | |||
| use byte-mode drop. More generally, the Internet's congestion | use byte-mode drop. More generally, the Internet's congestion | |||
| notification protocols (drop and ECN) SHOULD take account of packet | notification protocols (drop, ECN & PCN) SHOULD take account of | |||
| size when the notification is read by the transport layer, NOT when | packet size when the notification is read by the transport layer, NOT | |||
| it is written by the network layer. This approach offers sufficient | when it is written by the network layer. This approach offers | |||
| and correct congestion information for all known and future transport | sufficient and correct congestion information for all known and | |||
| protocols and also ensures no perverse incentives are created that | future transport protocols and also ensures no perverse incentives | |||
| would encourage transports to use inappropriately small packet sizes. | are created that would encourage transports to use inappropriately | |||
| small packet sizes. | ||||
| The alternative of deflating RED's drop probability for smaller | The alternative of deflating RED's drop probability for smaller | |||
| packet sizes (byte-mode drop) has no enduring advantages. It is more | packet sizes (byte-mode drop) has no enduring advantages. It is more | |||
| complex, it creates the perverse incentive to fragment segments into | complex, it creates the perverse incentive to fragment segments into | |||
| tiny pieces and it reopens the vulnerability to foods of small- | tiny pieces and it reopens the vulnerability to floods of small- | |||
| packets that drop-tail queues suffered from and AQM was designed to | packets that drop-tail queues suffered from and AQM was designed to | |||
| remove. Byte-mode drop is a change to the network layer that makes | remove. Byte-mode drop is a change to the network layer that makes | |||
| allowance for an omission from the design of TCP, effectively reverse | allowance for an omission from the design of TCP, effectively reverse | |||
| engineering the network layer to contrive to make two TCPs with | engineering the network layer to contrive to make two TCPs with | |||
| different packet sizes run at equal bit rates (rather than packet | different packet sizes run at equal bit rates (rather than packet | |||
| rates) under the same path conditions. It also improves TCP | rates) under the same path conditions. It also improves TCP | |||
| performance by reducing the chance that a SYN or a pure ACK will be | performance by reducing the chance that a SYN or a pure ACK will be | |||
| dropped, because they are small. But we SHOULD NOT hack the network | dropped, because they are small. But we SHOULD NOT hack the network | |||
| layer to improve or fix certain transport protocols. No matter how | layer to improve or fix certain transport protocols. No matter how | |||
| predominant a transport protocol is (even if it's TCP), trying to | predominant a transport protocol is (even if it's TCP), trying to | |||
| correct for its failings by biasing towards small packets in the | correct for its failings by biasing towards small packets in the | |||
| network layer creates a perverse incentive to break down all flows | network layer creates a perverse incentive to break down all flows | |||
| from all transports into tiny segments. | from all transports into tiny segments. | |||
| So far, our survey of over 100 vendors across the industry has drawn | So far, our survey of 84 vendors across the industry has drawn | |||
| responses from about 19%, none of whom have implemented the byte mode | responses from about 19%, none of whom have implemented the byte mode | |||
| packet drop variant of RED. Given there appears to be little, if | packet drop variant of RED. Given there appears to be little, if | |||
| any, installed base, recommending removal of byte-mode drop from RED | any, installed base, recommending removal of byte-mode drop from RED | |||
| is possibly only a paper exercise with few, if any, incremental | is possibly only a paper exercise with few, if any, incremental | |||
| deployment issues. | deployment issues. | |||
| If a vendor has implemented byte-mode drop, and an operator has | If a vendor has implemented byte-mode drop, and an operator has | |||
| turned it on, it is strongly RECOMMENDED that it SHOULD be turned | turned it on, it is strongly RECOMMENDED that it SHOULD be turned | |||
| off. Note that RED as a whole SHOULD NOT be turned off, as without | off. Note that RED as a whole SHOULD NOT be turned off, as without | |||
| it, a drop tail queue also biases against large packets. But note | it, a drop tail queue also biases against large packets. But note | |||
| skipping to change at page 19, line 17 | skipping to change at page 27, line 27 | |||
| it can handle a mix of bit-congestible and packet-congestible | it can handle a mix of bit-congestible and packet-congestible | |||
| resources. | resources. | |||
| 10. Acknowledgements | 10. Acknowledgements | |||
| Thank you to Sally Floyd, who gave extensive and useful review | Thank you to Sally Floyd, who gave extensive and useful review | |||
| comments. Also thanks for the reviews from Toby Moncaster and Arnaud | comments. Also thanks for the reviews from Toby Moncaster and Arnaud | |||
| Jacquet. I am grateful to Bruce Davie and his colleagues for | Jacquet. I am grateful to Bruce Davie and his colleagues for | |||
| providing a timely and efficient survey of RED implementation in | providing a timely and efficient survey of RED implementation in | |||
| Cisco's product range. Also grateful thanks to Toby Moncaster, Will | Cisco's product range. Also grateful thanks to Toby Moncaster, Will | |||
| Dormann, John Regnault, Simon Carter and Stefaan De Cnodder further | Dormann, John Regnault, Simon Carter and Stefaan De Cnodder who | |||
| helped survey the current status of RED implementation and deployment | further helped survey the current status of RED implementation and | |||
| and, finally, thanks to the anonymous individuals who responded. | deployment and, finally, thanks to the anonymous individuals who | |||
| responded. | ||||
| 11. Comments Solicited | 11. Comments Solicited | |||
| Comments and questions are encouraged and very welcome. They can be | Comments and questions are encouraged and very welcome. They can be | |||
| addressed to the IETF Transport Area working group mailing list | addressed to the IETF Transport Area working group mailing list | |||
| <tsvwg@ietf.org>, and/or to the authors. | <tsvwg@ietf.org>, and/or to the authors. | |||
| Editorial Comments | Editorial Comments | |||
| [Note_Variation] The algorithm of the byte-mode drop variant of RED | [Note_Variation] The algorithm of the byte-mode drop variant of RED | |||
| skipping to change at page 20, line 20 | skipping to change at page 28, line 30 | |||
| We will consider a 2x2 matrix of four scenarios: | We will consider a 2x2 matrix of four scenarios: | |||
| +-----------------------------+------------------+------------------+ | +-----------------------------+------------------+------------------+ | |||
| | resource type and | A) Equal bit | B) Equal pkt | | | resource type and | A) Equal bit | B) Equal pkt | | |||
| | congestion level | rates | rates | | | congestion level | rates | rates | | |||
| +-----------------------------+------------------+------------------+ | +-----------------------------+------------------+------------------+ | |||
| | i) bit-congestible, p_b | (Ai) | (Bi) | | | i) bit-congestible, p_b | (Ai) | (Bi) | | |||
| | ii) pkt-congestible, p_p | (Aii) | (Bii) | | | ii) pkt-congestible, p_p | (Aii) | (Bii) | | |||
| +-----------------------------+------------------+------------------+ | +-----------------------------+------------------+------------------+ | |||
| Table 2 | Table 3 | |||
| A.2. Bit-congestible resource, equal bit rates (Ai) | A.2. Bit-congestible resource, equal bit rates (Ai) | |||
| Starting with the bit-congestible scenario, for two flows to maintain | Starting with the bit-congestible scenario, for two flows to maintain | |||
| equal bit rates (Ai) the ratio of the packet rates must be the | equal bit rates (Ai) the ratio of the packet rates must be the | |||
| inverse of the ratio of packet sizes: u_2/u_1 = s_1/s_2. So, for | inverse of the ratio of packet sizes: u_2/u_1 = s_1/s_2. So, for | |||
| instance, a flow of 60B packets would have to send 25x more packets | instance, a flow of 60B packets would have to send 25x more packets | |||
| to achieve the same bit rate as a flow of 1500B packets. If a | to achieve the same bit rate as a flow of 1500B packets. If a | |||
| congested resource marks proportion p_b of packets irrespective of | congested resource marks proportion p_b of packets irrespective of | |||
| size, the ratio of marked packets received by each transport will | size, the ratio of marked packets received by each transport will | |||
| skipping to change at page 24, line 8 | skipping to change at page 32, line 23 | |||
| flows, the policer has to have an integrated view of all the | flows, the policer has to have an integrated view of all the | |||
| congestion an individual (not just one flow) has caused due to all | congestion an individual (not just one flow) has caused due to all | |||
| traffic entering the Internet from that individual. This is termed | traffic entering the Internet from that individual. This is termed | |||
| congestion accountability. | congestion accountability. | |||
| But with byte-mode drop, one dropped or marked packet is not | But with byte-mode drop, one dropped or marked packet is not | |||
| necessarily equivalent to another unless you know the MTU that caused | necessarily equivalent to another unless you know the MTU that caused | |||
| it to be dropped/marked. To have an integrated view of a user, we | it to be dropped/marked. To have an integrated view of a user, we | |||
| believe congestion policing has to be located at an individual's | believe congestion policing has to be located at an individual's | |||
| attachment point to the Internet [Re-TCP]. But from there it cannot | attachment point to the Internet [Re-TCP]. But from there it cannot | |||
| know the MTU of each remote router that caused each mark. Therefore | know the MTU of each remote queue that caused each drop/mark. | |||
| it cannot take an integrated approach to policing all the responses | Therefore it cannot take an integrated approach to policing all the | |||
| to congestion of all the transports of one individual. Therefore it | responses to congestion of all the transports of one individual. | |||
| cannot police anything. | Therefore it cannot police anything. | |||
| The security/incentive argument _for_ packet-mode drop is similar. | The security/incentive argument _for_ packet-mode drop is similar. | |||
| Firstly, confining RED to packet-mode drop would not preclude | Firstly, confining RED to packet-mode drop would not preclude | |||
| bottleneck policing approaches such as [pBox] as it seems likely they | bottleneck policing approaches such as [pBox] as it seems likely they | |||
| could work just as well by monitoring the volume of dropped bytes | could work just as well by monitoring the volume of dropped bytes | |||
| rather than packets. Secondly, packet-mode marking naturally allows | rather than packets. Secondly, packet-mode dropping/marking naturally | |||
| the congestion marking on packets to be globally meaningful without | allows the congestion notification of packets to be globally | |||
| relying on MTU information held elsewhere. | meaningful without relying on MTU information held elsewhere. | |||
| Because we recommend that a marked packet should be taken to mean | Because we recommend that a dropped/marked packet should be taken to | |||
| that all the bytes in the packet are congestion marked, a policer can | mean that all the bytes in the packet are dropped/marked, a policer | |||
| remain robust against bits being re-divided into different size | can remain robust against bits being re-divided into different size | |||
| packets or across different size flows [Rate_fair_Dis]. Therefore | packets or across different size flows [Rate_fair_Dis]. Therefore | |||
| policing would work naturally with just simple packet-mode drop in | policing would work naturally with just simple packet-mode drop in | |||
| RED. | RED. | |||
| In summary, making drop probability depend on the size of the packets | In summary, making drop probability depend on the size of the packets | |||
| that bits happen to be divided into simply encourages the bits to be | that bits happen to be divided into simply encourages the bits to be | |||
| divided into smaller packets. Byte-mode drop would therefore | divided into smaller packets. Byte-mode drop would therefore | |||
| irreversibly complicate any attempt to fix the Internet's incentive | irreversibly complicate any attempt to fix the Internet's incentive | |||
| structures. | structures. | |||
| Changes from Previous Versions | ||||
| To be removed by the RFC Editor on publication. | ||||
| From -00 to -01: | ||||
| Clarified applicability to drop as well as ECN. | ||||
| Highlighted DoS vulnerability. | ||||
| Emphasised that drop-tail suffers from similar problems to | ||||
| byte-mode drop, so only byte-mode drop should be turned off, | ||||
| not RED itself. | ||||
| Clarified the original apparent motivations for recommending | ||||
| byte-mode drop included protecting SYNs and pure ACKs more than | ||||
| equalising the bit rates of TCPs with different segment sizes. | ||||
| Removed some conjectured motivations. | ||||
| Added support for updates to TCP in progress (ackcc & ecn-syn- | ||||
| ack). | ||||
| Updated survey results with newly arrived data. | ||||
| Pulled all recommendations together into the conclusions. | ||||
| Moved some detailed points into two additional appendices and a | ||||
| note. | ||||
| Considerable clarifications throughout. | ||||
| Updated references | ||||
| 12. References | 12. References | |||
| 12.1. Normative References | 12.1. Normative References | |||
| [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
| Requirement Levels", BCP 14, RFC 2119, March 1997. | Requirement Levels", BCP 14, RFC 2119, March 1997. | |||
| [RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, | [RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, | |||
| S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G., | S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G., | |||
| Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, | Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, | |||
| S., Wroclawski, J., and L. Zhang, "Recommendations on | S., Wroclawski, J., and L. Zhang, "Recommendations on | |||
| Queue Management and Congestion Avoidance in the | Queue Management and Congestion Avoidance in the | |||
| Internet", RFC 2309, April 1998. | Internet", RFC 2309, April 1998. | |||
| [RFC2474] Nichols, K., Blake, S., Baker, F., and D. Black, | ||||
| "Definition of the Differentiated Services Field (DS | ||||
| Field) in the IPv4 and IPv6 Headers", RFC 2474, | ||||
| December 1998. | ||||
| [RFC2581] Allman, M., Paxson, V., and W. Stevens, "TCP Congestion | [RFC2581] Allman, M., Paxson, V., and W. Stevens, "TCP Congestion | |||
| Control", RFC 2581, April 1999. | Control", RFC 2581, April 1999. | |||
| [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition | [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition | |||
| of Explicit Congestion Notification (ECN) to IP", | of Explicit Congestion Notification (ECN) to IP", | |||
| RFC 3168, September 2001. | RFC 3168, September 2001. | |||
| [RFC3426] Floyd, S., "General Architectural and Policy | [RFC3426] Floyd, S., "General Architectural and Policy | |||
| Considerations", RFC 3426, November 2002. | Considerations", RFC 3426, November 2002. | |||
| [RFC3448] Handley, M., Floyd, S., Padhye, J., and J. Widmer, "TCP | [RFC3448] Handley, M., Floyd, S., Padhye, J., and J. Widmer, "TCP | |||
| Friendly Rate Control (TFRC): Protocol Specification", | Friendly Rate Control (TFRC): Protocol Specification", | |||
| RFC 3448, January 2003. | RFC 3448, January 2003. | |||
| [RFC4828] Floyd, S. and E. Kohler, "TCP Friendly Rate Control | [RFC4828] Floyd, S. and E. Kohler, "TCP Friendly Rate Control | |||
| (TFRC): The Small-Packet (SP) Variant", RFC 4828, | (TFRC): The Small-Packet (SP) Variant", RFC 4828, | |||
| April 2007. | April 2007. | |||
| [RFC5033] Floyd, S. and M. Allman, "Specifying New Congestion | ||||
| Control Algorithms", BCP 133, RFC 5033, August 2007. | ||||
| 12.2. Informative References | 12.2. Informative References | |||
| [CCvarPktSize] | [CCvarPktSize] | |||
| Widmer, J., Boutremans, C., and J-Y. Le Boudec, | Widmer, J., Boutremans, C., and J-Y. Le Boudec, | |||
| "Congestion Control for Flows with Variable Packet Size", | "Congestion Control for Flows with Variable Packet Size", | |||
| ACM CCR 34(2) 137--151, 2004, | ACM CCR 34(2) 137--151, 2004, | |||
| <http://doi.acm.org/10.1145/997150.997162>. | <http://doi.acm.org/10.1145/997150.997162>. | |||
| [DupTCP] Wischik, D., "Short messages", Royal Society workshop on | ||||
| networks: modelling and control, September 2007, <http:// | ||||
| www.cs.ucl.ac.uk/staff/ucacdjw/Research/shortmsg.html>. | ||||
| [ECNFixedWireless] | [ECNFixedWireless] | |||
| Siris, V., "Resource Control for Elastic Traffic in CDMA | Siris, V., "Resource Control for Elastic Traffic in CDMA | |||
| Networks", Proc. ACM MOBICOM'02, September 2002, <http:// | Networks", Proc. ACM MOBICOM'02, September 2002, <http:// | |||
| www.ics.forth.gr/netlab/publications/ | www.ics.forth.gr/netlab/publications/ | |||
| resource_control_elastic_cdma.html>. | resource_control_elastic_cdma.html>. | |||
| [Evol_cc] Gibbens, R. and F. Kelly, "Resource pricing and the | [Evol_cc] Gibbens, R. and F. Kelly, "Resource pricing and the | |||
| evolution of congestion control", Automatica 35(12)1969-- | evolution of congestion control", Automatica 35(12)1969-- | |||
| 1985, December 1999, | 1985, December 1999, | |||
| <http://www.statslab.cam.ac.uk/~frank/evol.html>. | <http://www.statslab.cam.ac.uk/~frank/evol.html>. | |||
| skipping to change at page 26, line 36 | skipping to change at page 34, line 29 | |||
| (XCP)", draft-falk-xcp-spec-03 (work in progress), | (XCP)", draft-falk-xcp-spec-03 (work in progress), | |||
| July 2007. | July 2007. | |||
| [I-D.floyd-tcpm-ackcc] | [I-D.floyd-tcpm-ackcc] | |||
| Floyd, S. and I. Property, "Adding Acknowledgement | Floyd, S. and I. Property, "Adding Acknowledgement | |||
| Congestion Control to TCP", draft-floyd-tcpm-ackcc-02 | Congestion Control to TCP", draft-floyd-tcpm-ackcc-02 | |||
| (work in progress), November 2007. | (work in progress), November 2007. | |||
| [I-D.ietf-pcn-architecture] | [I-D.ietf-pcn-architecture] | |||
| Eardley, P., "Pre-Congestion Notification Architecture", | Eardley, P., "Pre-Congestion Notification Architecture", | |||
| draft-ietf-pcn-architecture-01 (work in progress), | draft-ietf-pcn-architecture-03 (work in progress), | |||
| October 2007. | February 2008. | |||
| [I-D.ietf-tcpm-ecnsyn] | [I-D.ietf-tcpm-ecnsyn] | |||
| Floyd, S. and I. Property, "Adding Explicit Congestion | Floyd, S., "Adding Explicit Congestion Notification (ECN) | |||
| Notification (ECN) Capability to TCP's SYN/ACK Packets", | Capability to TCP's SYN/ACK Packets", | |||
| draft-ietf-tcpm-ecnsyn-03 (work in progress), | draft-ietf-tcpm-ecnsyn-05 (work in progress), | |||
| November 2007. | February 2008. | |||
| [I-D.ietf-tcpm-rfc2581bis] | [I-D.ietf-tcpm-rfc2581bis] | |||
| Allman, M., "TCP Congestion Control", | Allman, M., "TCP Congestion Control", | |||
| draft-ietf-tcpm-rfc2581bis-03 (work in progress), | draft-ietf-tcpm-rfc2581bis-03 (work in progress), | |||
| September 2007. | September 2007. | |||
| [I-D.irtf-iccrg-welzl-congestion-control-open-research] | [I-D.irtf-iccrg-welzl-congestion-control-open-research] | |||
| Papadimitriou, D., "Open Research Issues in Internet | Papadimitriou, D., "Open Research Issues in Internet | |||
| Congestion Control", | Congestion Control", | |||
| draft-irtf-iccrg-welzl-congestion-control-open-research-00 | ||||
| (work in progress), July 2007. | (work in progress), July 2007. | |||
| [IOSArch] Bollapragada, V., White, R., and C. Murphy, "Inside Cisco | ||||
| IOS Software Architecture", Cisco Press: CCIE Professional | ||||
| Development ISBN13: 978-1-57870-181-0, July 2000. | ||||
| [MulTCP] Crowcroft, J. and Ph. Oechslin, "Differentiated End to End | [MulTCP] Crowcroft, J. and Ph. Oechslin, "Differentiated End to End | |||
| Internet Services using a Weighted Proportional Fair | Internet Services using a Weighted Proportional Fair | |||
| Sharing TCP", CCR 28(3) 53--69, July 1998, <http:// | Sharing TCP", CCR 28(3) 53--69, July 1998, <http:// | |||
| www.cs.ucl.ac.uk/staff/J.Crowcroft/hipparch/pricing.html>. | www.cs.ucl.ac.uk/staff/J.Crowcroft/hipparch/pricing.html>. | |||
| [PCNcharter] | [PCNcharter] | |||
| IETF, "Congestion and Pre-Congestion Notification (pcn)", | IETF, "Congestion and Pre-Congestion Notification (pcn)", | |||
| IETF w-g charter, Feb 2007, | IETF w-g charter, Feb 2007, | |||
| <http://www.ietf.org/html.charters/pcn-charter.html>. | <http://www.ietf.org/html.charters/pcn-charter.html>. | |||
| skipping to change at page 27, line 49 | skipping to change at page 35, line 48 | |||
| March 2004. | March 2004. | |||
| [RFC4782] Floyd, S., Allman, M., Jain, A., and P. Sarolahti, "Quick- | [RFC4782] Floyd, S., Allman, M., Jain, A., and P. Sarolahti, "Quick- | |||
| Start for TCP and IP", RFC 4782, January 2007. | Start for TCP and IP", RFC 4782, January 2007. | |||
| [Rate_fair_Dis] | [Rate_fair_Dis] | |||
| Briscoe, B., "Flow Rate Fairness: Dismantling a Religion", | Briscoe, B., "Flow Rate Fairness: Dismantling a Religion", | |||
| ACM CCR 37(2)63--74, April 2007, | ACM CCR 37(2)63--74, April 2007, | |||
| <http://portal.acm.org/citation.cfm?id=1232926>. | <http://portal.acm.org/citation.cfm?id=1232926>. | |||
| [Re-TCP] Briscoe, B., Jacquet, A., Salvatori, A., Koyabi, M., and | [Re-TCP] Briscoe, B., Jacquet, A., Moncaster, T., and A. Smith, | |||
| T. Moncaster, "Re-ECN: Adding Accountability for Causing | "Re-ECN: Adding Accountability for Causing Congestion to | |||
| Congestion to TCP/IP", draft-briscoe-tsvwg-re-ecn-tcp-04 | TCP/IP", draft-briscoe-tsvwg-re-ecn-tcp-05 (work in | |||
| (work in progress), July 2007. | progress), January 2008. | |||
| [WindowPropFair] | [WindowPropFair] | |||
| Siris, V., "Service Differentiation and Performance of | Siris, V., "Service Differentiation and Performance of | |||
| Weighted Window-Based Congestion Control and Packet | Weighted Window-Based Congestion Control and Packet | |||
| Marking Algorithms in ECN Networks", Computer | Marking Algorithms in ECN Networks", Computer | |||
| Communications 26(4) 314--326, 2002, <http:// | Communications 26(4) 314--326, 2002, <http:// | |||
| www.ics.forth.gr/netgroup/publications/ | www.ics.forth.gr/netgroup/publications/ | |||
| weighted_window_control.html>. | weighted_window_control.html>. | |||
| [gentle_RED] | [gentle_RED] | |||
| skipping to change at page 29, line 7 | skipping to change at page 37, line 7 | |||
| Martlesham Heath | Martlesham Heath | |||
| Ipswich IP5 3RE | Ipswich IP5 3RE | |||
| UK | UK | |||
| Phone: +44 1473 645196 | Phone: +44 1473 645196 | |||
| Email: bob.briscoe@bt.com | Email: bob.briscoe@bt.com | |||
| URI: http://www.cs.ucl.ac.uk/staff/B.Briscoe/ | URI: http://www.cs.ucl.ac.uk/staff/B.Briscoe/ | |||
| Full Copyright Statement | Full Copyright Statement | |||
| Copyright (C) The IETF Trust (2007). | Copyright (C) The IETF Trust (2008). | |||
| This document is subject to the rights, licenses and restrictions | This document is subject to the rights, licenses and restrictions | |||
| contained in BCP 78, and except as set forth therein, the authors | contained in BCP 78, and except as set forth therein, the authors | |||
| retain all their rights. | retain all their rights. | |||
| This document and the information contained herein are provided on an | This document and the information contained herein are provided on an | |||
| "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS | "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS | |||
| OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND | OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND | |||
| THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS | THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS | |||
| OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF | OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF | |||
| End of changes. 67 change blocks. | ||||
| 295 lines changed or deleted | 615 lines changed or added | |||
This html diff was produced by rfcdiff 1.34. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ | ||||
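
Worked example for Appendix A.2 (not part of either draft version): the relation u_2/u_1 = s_1/s_2 quoted in the diff can be checked with a short calculation. The sketch below is illustrative only. The function names and the packet rate chosen for flow 1 (100 packets/s) are assumptions made for the example; only the relation itself and the 60B and 1500B packet sizes come from the draft text.

```python
from fractions import Fraction


def packet_rate_ratio(s_1: int, s_2: int) -> Fraction:
    """Return u_2/u_1, the packet-rate ratio two flows need for equal bit rates.

    s_1 and s_2 are the packet sizes (in bytes) of flow 1 and flow 2.
    Follows u_2/u_1 = s_1/s_2 from Appendix A.2.
    """
    if s_1 <= 0 or s_2 <= 0:
        raise ValueError("packet sizes must be positive")
    return Fraction(s_1, s_2)


def bit_rate(u: Fraction, s: int) -> Fraction:
    """Bit rate of a flow sending u packets per second of s bytes each."""
    return 8 * u * s


# Flow 1 sends 1500B packets; flow 2 sends 60B packets (sizes from the draft).
ratio = packet_rate_ratio(1500, 60)

# Assumed packet rate for flow 1, chosen only to make the check concrete.
u_1 = Fraction(100)
u_2 = ratio * u_1

# With that ratio the two flows carry the same number of bits per second.
assert bit_rate(u_1, 1500) == bit_rate(u_2, 60)
print(ratio)  # 25
```

Exact rational arithmetic (`Fraction`) is used so that the equality check is not affected by floating-point rounding.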