| < draft-ietf-tsvwg-aqm-dualq-coupled-24.txt | draft-ietf-tsvwg-aqm-dualq-coupled-25h.txt > | |||
|---|---|---|---|---|
| Transport Area working group (tsvwg) K. De Schepper | Transport Area working group (tsvwg) K. De Schepper | |||
| Internet-Draft Nokia Bell Labs | Internet-Draft Nokia Bell Labs | |||
| Intended status: Experimental B. Briscoe, Ed. | Intended status: Experimental B. Briscoe, Ed. | |||
| Expires: 8 January 2023 Independent | Expires: 1 March 2023 Independent | |||
| G. White | G. White | |||
| CableLabs | CableLabs | |||
| 7 July 2022 | 28 August 2022 | |||
| DualQ Coupled AQMs for Low Latency, Low Loss and Scalable Throughput | DualQ Coupled AQMs for Low Latency, Low Loss and Scalable Throughput | |||
| (L4S) | (L4S) | |||
| draft-ietf-tsvwg-aqm-dualq-coupled-24 | draft-ietf-tsvwg-aqm-dualq-coupled-25 | |||
| Abstract | Abstract | |||
| This specification defines a framework for coupling the Active Queue | This specification defines a framework for coupling the Active Queue | |||
| Management (AQM) algorithms in two queues intended for flows with | Management (AQM) algorithms in two queues intended for flows with | |||
| different responses to congestion. This provides a way for the | different responses to congestion. This provides a way for the | |||
| Internet to transition from the scaling problems of standard TCP | Internet to transition from the scaling problems of standard TCP | |||
| Reno-friendly ('Classic') congestion controls to the family of | Reno-friendly ('Classic') congestion controls to the family of | |||
| 'Scalable' congestion controls. These are designed for consistently | 'Scalable' congestion controls. These are designed for consistently | |||
| very Low queuing Latency, very Low congestion Loss and Scaling of | very Low queuing Latency, very Low congestion Loss and Scaling of | |||
| per-flow throughput (L4S) by using Explicit Congestion Notification | per-flow throughput (L4S) by using Explicit Congestion Notification | |||
| (ECN) in a modified way. Until the Coupled DualQ, these L4S senders | (ECN) in a modified way. Until the Coupled DualQ, these scalable L4S | |||
| could only be deployed where a clean-slate environment could be | congestion controls could only be deployed where a clean-slate | |||
| arranged, such as in private data centres. The coupling acts like a | environment could be arranged, such as in private data centres. | |||
| semi-permeable membrane: isolating the sub-millisecond average | ||||
| queuing delay and zero congestion loss of L4S from Classic latency | The specification first explains how a Coupled DualQ works. It then | |||
| and loss; but pooling the capacity between any combination of | gives the normative requirements that are necessary for it to work | |||
| Scalable and Classic flows with roughly equivalent throughput per | well. All this is independent of which two AQMs are used, but | |||
| flow. The DualQ achieves this indirectly, without having to inspect | pseudocode examples of specific AQMs are given in appendices. | |||
| transport layer flow identifiers and without compromising the | ||||
| performance of the Classic traffic, relative to a single queue. The | ||||
| DualQ design has low complexity and requires no configuration for the | ||||
| public Internet. | ||||
| Status of This Memo | Status of This Memo | |||
| This Internet-Draft is submitted in full conformance with the | This Internet-Draft is submitted in full conformance with the | |||
| provisions of BCP 78 and BCP 79. | provisions of BCP 78 and BCP 79. | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
| working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
| Drafts is at https://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| This Internet-Draft will expire on 8 January 2023. | This Internet-Draft will expire on 1 March 2023. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2022 IETF Trust and the persons identified as the | Copyright (c) 2022 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents (https://trustee.ietf.org/ | Provisions Relating to IETF Documents (https://trustee.ietf.org/ | |||
| license-info) in effect on the date of publication of this document. | license-info) in effect on the date of publication of this document. | |||
| Please review these documents carefully, as they describe your rights | Please review these documents carefully, as they describe your rights | |||
| and restrictions with respect to this document. Code Components | and restrictions with respect to this document. Code Components | |||
| extracted from this document must include Revised BSD License text as | extracted from this document must include Revised BSD License text as | |||
| described in Section 4.e of the Trust Legal Provisions and are | described in Section 4.e of the Trust Legal Provisions and are | |||
| provided without warranty as described in the Revised BSD License. | provided without warranty as described in the Revised BSD License. | |||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
| 1.1. Outline of the Problem . . . . . . . . . . . . . . . . . 3 | 1.1. Outline of the Problem . . . . . . . . . . . . . . . . . 3 | |||
| 1.2. Scope . . . . . . . . . . . . . . . . . . . . . . . . . . 6 | 1.2. Context, Scope & Applicability . . . . . . . . . . . . . 6 | |||
| 1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 7 | 1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 7 | |||
| 1.4. Features . . . . . . . . . . . . . . . . . . . . . . . . 9 | 1.4. Features . . . . . . . . . . . . . . . . . . . . . . . . 9 | |||
| 2. DualQ Coupled AQM . . . . . . . . . . . . . . . . . . . . . . 11 | 2. DualQ Coupled AQM . . . . . . . . . . . . . . . . . . . . . . 11 | |||
| 2.1. Coupled AQM . . . . . . . . . . . . . . . . . . . . . . . 11 | 2.1. Coupled AQM . . . . . . . . . . . . . . . . . . . . . . . 11 | |||
| 2.2. Dual Queue . . . . . . . . . . . . . . . . . . . . . . . 13 | 2.2. Dual Queue . . . . . . . . . . . . . . . . . . . . . . . 12 | |||
| 2.3. Traffic Classification . . . . . . . . . . . . . . . . . 13 | 2.3. Traffic Classification . . . . . . . . . . . . . . . . . 12 | |||
| 2.4. Overall DualQ Coupled AQM Structure . . . . . . . . . . . 14 | 2.4. Overall DualQ Coupled AQM Structure . . . . . . . . . . . 13 | |||
| 2.5. Normative Requirements for a DualQ Coupled AQM . . . . . 17 | 2.5. Normative Requirements for a DualQ Coupled AQM . . . . . 17 | |||
| 2.5.1. Functional Requirements . . . . . . . . . . . . . . . 17 | 2.5.1. Functional Requirements . . . . . . . . . . . . . . . 17 | |||
| 2.5.1.1. Requirements in Unexpected Cases . . . . . . . . 18 | 2.5.1.1. Requirements in Unexpected Cases . . . . . . . . 18 | |||
| 2.5.2. Management Requirements . . . . . . . . . . . . . . . 19 | 2.5.2. Management Requirements . . . . . . . . . . . . . . . 19 | |||
| 2.5.2.1. Configuration . . . . . . . . . . . . . . . . . . 20 | 2.5.2.1. Configuration . . . . . . . . . . . . . . . . . . 19 | |||
| 2.5.2.2. Monitoring . . . . . . . . . . . . . . . . . . . 21 | 2.5.2.2. Monitoring . . . . . . . . . . . . . . . . . . . 21 | |||
| 2.5.2.3. Anomaly Detection . . . . . . . . . . . . . . . . 22 | 2.5.2.3. Anomaly Detection . . . . . . . . . . . . . . . . 22 | |||
| 2.5.2.4. Deployment, Coexistence and Scaling . . . . . . . 22 | 2.5.2.4. Deployment, Coexistence and Scaling . . . . . . . 22 | |||
| 3. IANA Considerations (to be removed by RFC Editor) . . . . . . 22 | 3. IANA Considerations (to be removed by RFC Editor) . . . . . . 22 | |||
| 4. Security Considerations . . . . . . . . . . . . . . . . . . . 22 | 4. Security Considerations . . . . . . . . . . . . . . . . . . . 22 | |||
| 4.1. Low Delay without Requiring Per-Flow Processing . . . . . 22 | 4.1. Low Delay without Requiring Per-Flow Processing . . . . . 22 | |||
| 4.2. Handling Unresponsive Flows and Overload . . . . . . . . 23 | 4.2. Handling Unresponsive Flows and Overload . . . . . . . . 23 | |||
| 4.2.1. Unresponsive Traffic without Overload . . . . . . . . 24 | 4.2.1. Unresponsive Traffic without Overload . . . . . . . . 24 | |||
| 4.2.2. Avoiding Short-Term Classic Starvation: Sacrifice L4S | 4.2.2. Avoiding Short-Term Classic Starvation: Sacrifice L4S | |||
| Throughput or Delay? . . . . . . . . . . . . . . . . 25 | Throughput or Delay? . . . . . . . . . . . . . . . . 25 | |||
| 4.2.3. L4S ECN Saturation: Introduce Drop or Delay? . . . . 26 | 4.2.3. L4S ECN Saturation: Introduce Drop or Delay? . . . . 26 | |||
| 4.2.3.1. Protecting against Overload by Unresponsive | 4.2.3.1. Protecting against Overload by Unresponsive | |||
| ECN-Capable Traffic . . . . . . . . . . . . . . . . 28 | ECN-Capable Traffic . . . . . . . . . . . . . . . . 28 | |||
| 5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 28 | 5. References . . . . . . . . . . . . . . . . . . . . . . . . . 28 | |||
| 6. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 29 | 5.1. Normative References . . . . . . . . . . . . . . . . . . 28 | |||
| 7. References . . . . . . . . . . . . . . . . . . . . . . . . . 29 | 5.2. Informative References . . . . . . . . . . . . . . . . . 29 | |||
| 7.1. Normative References . . . . . . . . . . . . . . . . . . 29 | ||||
| 7.2. Informative References . . . . . . . . . . . . . . . . . 30 | ||||
| Appendix A. Example DualQ Coupled PI2 Algorithm . . . . . . . . 35 | Appendix A. Example DualQ Coupled PI2 Algorithm . . . . . . . . 35 | |||
| A.1. Pass #1: Core Concepts . . . . . . . . . . . . . . . . . 36 | A.1. Pass #1: Core Concepts . . . . . . . . . . . . . . . . . 35 | |||
| A.2. Pass #2: Edge-Case Details . . . . . . . . . . . . . . . 47 | A.2. Pass #2: Edge-Case Details . . . . . . . . . . . . . . . 46 | |||
| Appendix B. Example DualQ Coupled Curvy RED Algorithm . . . . . 52 | Appendix B. Example DualQ Coupled Curvy RED Algorithm . . . . . 51 | |||
| B.1. Curvy RED in Pseudocode . . . . . . . . . . . . . . . . . 52 | B.1. Curvy RED in Pseudocode . . . . . . . . . . . . . . . . . 51 | |||
| B.2. Efficient Implementation of Curvy RED . . . . . . . . . . 58 | B.2. Efficient Implementation of Curvy RED . . . . . . . . . . 57 | |||
| Appendix C. Choice of Coupling Factor, k . . . . . . . . . . . . 60 | Appendix C. Choice of Coupling Factor, k . . . . . . . . . . . . 59 | |||
| C.1. RTT-Dependence . . . . . . . . . . . . . . . . . . . . . 60 | C.1. RTT-Dependence . . . . . . . . . . . . . . . . . . . . . 59 | |||
| C.2. Guidance on Controlling Throughput Equivalence . . . . . 61 | C.2. Guidance on Controlling Throughput Equivalence . . . . . 60 | |||
| Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 64 | ||||
| Contributors . . . . . . . . . . . . . . . . . . . . . . . . . . 64 | ||||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 65 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 65 | |||
| 1. Introduction | 1. Introduction | |||
| This document specifies a framework for DualQ Coupled AQMs, which is | This document specifies a framework for DualQ Coupled AQMs, which can | |||
| the network part of the L4S architecture [I-D.ietf-tsvwg-l4s-arch]. | serve as the network part of the L4S | |||
| L4S enables both very low queuing latency (sub-millisecond on | architecture [I-D.ietf-tsvwg-l4s-arch]. A Coupled DualQ AQM consists | |||
| average) and high throughput at the same time, for ad hoc numbers of | of two queues: L4S and Classic. The L4S queue is intended for | |||
| capacity-seeking applications all sharing the same capacity. | Scalable congestion controls that can maintain very low queuing | |||
| latency (sub-millisecond on average) and high throughput at the same | ||||
| time. The Coupled DualQ acts like a semi-permeable membrane: the L4S | ||||
| queue isolates the sub-millisecond average queuing delay of L4S from | ||||
| Classic latency; while the coupling between the queues pools the | ||||
| capacity between both queues so that ad hoc numbers of capacity- | ||||
| seeking applications all sharing the same capacity can have roughly | ||||
| equivalent throughput per flow, whichever queue they use. The DualQ | ||||
| achieves this indirectly, without having to inspect transport layer | ||||
| flow identifiers and without compromising the performance of the | ||||
| Classic traffic, relative to a single queue. The DualQ design has | ||||
| low complexity and requires no configuration for the public Internet. | ||||
| 1.1. Outline of the Problem | 1.1. Outline of the Problem | |||
| Latency is becoming the critical performance factor for many (most?) | Latency is becoming the critical performance factor for many (most?) | |||
| applications on the public Internet, e.g. interactive Web, Web | applications on the public Internet, e.g. interactive Web, Web | |||
| services, voice, conversational video, interactive video, interactive | services, voice, conversational video, interactive video, interactive | |||
| remote presence, instant messaging, online gaming, remote desktop, | remote presence, instant messaging, online gaming, remote desktop, | |||
| cloud-based applications, and video-assisted remote control of | cloud-based applications, and video-assisted remote control of | |||
| machinery and industrial processes. In the developed world, further | machinery and industrial processes. Once access network bit rates | |||
| increases in access network bit-rate offer diminishing returns, | reach levels now common in the developed world, further increases | |||
| whereas latency is still a multi-faceted problem. In the last decade | offer diminishing returns unless latency is also addressed | |||
| or so, much has been done to reduce propagation time by placing | [Dukkipati06]. In the last decade or so, much has been done to | |||
| caches or servers closer to users. However, queuing remains a major | reduce propagation time by placing caches or servers closer to users. | |||
| intermittent component of latency. | However, queuing remains a major intermittent component of latency. | |||
| Traditionally very low latency has only been available for a few | Traditionally very low latency has only been available for a few | |||
| selected low rate applications, that confine their sending rate | selected low rate applications, that confine their sending rate | |||
| within a specially carved-off portion of capacity, which is | within a specially carved-off portion of capacity, which is | |||
| prioritized over other traffic, e.g. Diffserv EF [RFC3246]. Up to | prioritized over other traffic, e.g. Diffserv EF [RFC3246]. Up to | |||
| now it has not been possible to allow any number of low latency, high | now it has not been possible to allow any number of low latency, high | |||
| throughput applications to seek to fully utilize available capacity, | throughput applications to seek to fully utilize available capacity, | |||
| because the capacity-seeking process itself causes too much queuing | because the capacity-seeking process itself causes too much queuing | |||
| delay. | delay. | |||
| skipping to change at page 4, line 37 ¶ | skipping to change at page 4, line 33 ¶ | |||
| to induce an average queue that roughly doubles the base RTT, | to induce an average queue that roughly doubles the base RTT, | |||
| adding 5-15 ms of queuing on average (cf. 500 microseconds with | adding 5-15 ms of queuing on average (cf. 500 microseconds with | |||
| L4S for the same mix of long-running and web traffic). However, | L4S for the same mix of long-running and web traffic). However, | |||
| for many applications low delay is not useful unless it is | for many applications low delay is not useful unless it is | |||
| consistently low. With these AQMs, 99th percentile queuing delay | consistently low. With these AQMs, 99th percentile queuing delay | |||
| is 20-30 ms (cf. 2 ms with the same traffic over L4S). | is 20-30 ms (cf. 2 ms with the same traffic over L4S). | |||
| * Similarly, recent research into using e2e congestion control | * Similarly, recent research into using e2e congestion control | |||
| without needing an AQM in the network (e.g. BBR | without needing an AQM in the network (e.g. BBR | |||
| [I-D.cardwell-iccrg-bbr-congestion-control]) seems to have hit a | [I-D.cardwell-iccrg-bbr-congestion-control]) seems to have hit a | |||
| similar lower limit to queuing delay of about 20ms on average but | similar lower limit to queuing delay of about 20ms on average, but | |||
| there are also regular 25ms delay spikes due to bandwidth probes | there are also regular 25ms delay spikes due to bandwidth probes | |||
| and 60ms spikes due to flow-starts. | and 60ms spikes due to flow-starts. | |||
| L4S learns from the experience of Data Center TCP [RFC8257], which | L4S learns from the experience of Data Center TCP [RFC8257], which | |||
| shows the power of complementary changes both in the network and on | shows the power of complementary changes both in the network and on | |||
| end-systems. DCTCP teaches us that two small but radical changes to | end-systems. DCTCP teaches us that two small but radical changes to | |||
| congestion control are needed to cut the two major outstanding causes | congestion control are needed to cut the two major outstanding causes | |||
| of queuing delay variability: | of queuing delay variability: | |||
| 1. Far smaller rate variations (sawteeth) than Reno-friendly | 1. Far smaller rate variations (sawteeth) than Reno-friendly | |||
| skipping to change at page 5, line 27 ¶ | skipping to change at page 5, line 23 ¶ | |||
| with ECN, not drop, for the signalling: | with ECN, not drop, for the signalling: | |||
| 1. The smaller sawteeth allow an extremely shallow ECN packet- | 1. The smaller sawteeth allow an extremely shallow ECN packet- | |||
| marking threshold in the queue. | marking threshold in the queue. | |||
| 2. And no smoothing in the network means that every fluctuation of | 2. And no smoothing in the network means that every fluctuation of | |||
| the queue is signalled immediately. | the queue is signalled immediately. | |||
| Without ECN, either of these would lead to very high loss levels. | Without ECN, either of these would lead to very high loss levels. | |||
| But, with ECN, the resulting high marking levels are just signals, | But, with ECN, the resulting high marking levels are just signals, | |||
| not impairments. BBRv2 combines the best of both worlds - it works | not impairments. (Note that BBRv2 [BBRv2] combines the best of both | |||
| as a scalable congestion control when ECN is available, but also aims | worlds - it works as a scalable congestion control when ECN is | |||
| to minimize delay when it isn't. | available, but also aims to minimize delay when it isn't.) | |||
| However, until now, Scalable congestion controls (like DCTCP) did not | However, until now, Scalable congestion controls (like DCTCP) did not | |||
| co-exist well in a shared ECN-capable queue with existing ECN-capable | co-exist well in a shared ECN-capable queue with existing Classic | |||
| TCP Reno [RFC5681] or Cubic [RFC8312] congestion controls -- Scalable | (e.g. Reno [RFC5681] or Cubic [RFC8312]) congestion controls -- | |||
| controls are so aggressive that these 'Classic' algorithms would | Scalable controls are so aggressive that these 'Classic' algorithms | |||
| drive themselves to a small capacity share. Therefore, until now, | would drive themselves to a small capacity share. Therefore, until | |||
| L4S controls could only be deployed where a clean-slate environment | now, L4S controls could only be deployed where a clean-slate | |||
| could be arranged, such as in private data centres (hence the name | environment could be arranged, such as in private data centres (hence | |||
| DCTCP). | the name DCTCP). | |||
| This document specifies a `DualQ Coupled AQM' extension that solves | One way to solve the problem of coexistence between Scalable and | |||
| the problem of coexistence between Scalable and Classic flows, | Classic flows is to use a per-flow-queuing approach such as FQ- | |||
| without having to inspect flow identifiers. It is not like flow- | CoDel [RFC8290]. It classifies packets by flow identifier into | |||
| queuing approaches [RFC8290] that classify packets by flow identifier | separate queues in order to isolate sparse flows from the higher | |||
| into separate queues in order to isolate sparse flows from the higher | latency in the queues assigned to heavier flows. However, if a | |||
| latency in the queues assigned to heavier flows. If a flow needs | Classic flow needs both low delay and high throughput, having a queue | |||
| both low delay and high throughput, having a queue to itself does not | to itself does not isolate it from the harm it causes to itself. | |||
| isolate it from the harm it causes to itself. In contrast, DualQ | Also FQ approaches need to inspect flow identifiers, which is not | |||
| Coupled AQMs address the root cause of the latency problem -- they | always practical. | |||
| are an enabler for the smooth low latency scalable behaviour of | ||||
| Scalable congestion controls, so that every packet in every flow can | ||||
| potentially enjoy very low latency, then there would be no need to | ||||
| isolate each flow into a separate queue. | ||||
| 1.2. Scope | In summary, Scalable congestion controls address the root cause of | |||
| the latency, loss and scaling problems with Classic congestion | ||||
| controls. Both FQ and DualQ AQMs can be enablers for this smooth low | ||||
| latency scalable behaviour. The DualQ approach is particularly | ||||
| useful because identifying flows is sometimes not practical or | ||||
| desirable. | ||||
| 1.2. Context, Scope & Applicability | ||||
| L4S involves complementary changes in the network and on end-systems: | L4S involves complementary changes in the network and on end-systems: | |||
| Network: A DualQ Coupled AQM (defined in the present document) or a | Network: A DualQ Coupled AQM (defined in the present document) or a | |||
| modification to flow-queue AQMs (described in section 4.2.b of the | modification to flow-queue AQMs (described in section 4.2.b of the | |||
| L4S architecture [I-D.ietf-tsvwg-l4s-arch]); | L4S architecture [I-D.ietf-tsvwg-l4s-arch]); | |||
| End-system: A Scalable congestion control (defined in section 4 of | End-system: A Scalable congestion control (defined in section 4 of | |||
| the L4S ECN protocol [I-D.ietf-tsvwg-ecn-l4s-id]). | the L4S ECN protocol [I-D.ietf-tsvwg-ecn-l4s-id]). | |||
| skipping to change at page 6, line 43 ¶ | skipping to change at page 6, line 43 ¶ | |||
| intervention, applications can exploit this new network capability as | intervention, applications can exploit this new network capability as | |||
| their operating systems migrate to Scalable congestion controls, | their operating systems migrate to Scalable congestion controls, | |||
| which can then evolve _while_ their benefits are being enjoyed by | which can then evolve _while_ their benefits are being enjoyed by | |||
| everyone on the Internet. | everyone on the Internet. | |||
| The DualQ Coupled AQM framework can incorporate any AQM designed for | The DualQ Coupled AQM framework can incorporate any AQM designed for | |||
| a single queue that generates a statistical or deterministic mark/ | a single queue that generates a statistical or deterministic mark/ | |||
| drop probability driven by the queue dynamics. Pseudocode examples | drop probability driven by the queue dynamics. Pseudocode examples | |||
| of two different DualQ Coupled AQMs are given in the appendices. In | of two different DualQ Coupled AQMs are given in the appendices. In | |||
| many cases the framework simplifies the basic control algorithm, and | many cases the framework simplifies the basic control algorithm, and | |||
| requires little extra processing. Therefore it is believed the | requires little extra processing. Therefore, it is believed the | |||
| Coupled AQM would be applicable and easy to deploy in all types of | Coupled AQM would be applicable and easy to deploy in all types of | |||
| buffers; buffers in cost-reduced mass-market residential equipment; | buffers; buffers in cost-reduced mass-market residential equipment; | |||
| buffers in end-system stacks; buffers in carrier-scale equipment | buffers in end-system stacks; buffers in carrier-scale equipment | |||
| including remote access servers, routers, firewalls and Ethernet | including remote access servers, routers, firewalls and Ethernet | |||
| switches; buffers in network interface cards, buffers in virtualized | switches; buffers in network interface cards, buffers in virtualized | |||
| network appliances, hypervisors, and so on. | network appliances, hypervisors, and so on. | |||
| For the public Internet, nearly all the benefit will typically be | For the public Internet, nearly all the benefit will typically be | |||
| achieved by deploying the Coupled AQM into either end of the access | achieved by deploying the Coupled AQM into either end of the access | |||
| link between a 'site' and the Internet, which is invariably the | link between a 'site' and the Internet, which is invariably the | |||
| skipping to change at page 7, line 45 ¶ | skipping to change at page 7, line 45 ¶ | |||
| The main results have been validated independently when using the | The main results have been validated independently when using the | |||
| Prague congestion control [Boru20] (experiments are run using Prague | Prague congestion control [Boru20] (experiments are run using Prague | |||
| and DCTCP, but only the former are relevant for validation, because | and DCTCP, but only the former are relevant for validation, because | |||
| Prague fixes a number of problems with the Linux DCTCP code that make | Prague fixes a number of problems with the Linux DCTCP code that make | |||
| it unsuitable for the public Internet). | it unsuitable for the public Internet). | |||
| 1.3. Terminology | 1.3. Terminology | |||
| The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
| "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | |||
| document are to be interpreted as described in [RFC2119] when, and | document are to be interpreted as described in [RFC2119] [RFC8174] | |||
| only when, they appear in all capitals, as shown here. | when, and only when, they appear in all capitals, as shown here. | |||
| The DualQ Coupled AQM uses two queues for two services. Each of the | The DualQ Coupled AQM uses two queues for two services. Each of the | |||
| following terms identifies both the service and the queue that | following terms identifies both the service and the queue that | |||
| provides the service: | provides the service: | |||
| Classic service/queue: The Classic service is intended for all the | Classic service/queue: The Classic service is intended for all the | |||
| congestion control behaviours that co-exist with Reno [RFC5681] | congestion control behaviours that co-exist with Reno [RFC5681] | |||
| (e.g. Reno itself, Cubic [RFC8312], TFRC [RFC5348]). | (e.g. Reno itself, Cubic [RFC8312], TFRC [RFC5348]). | |||
| Low-Latency, Low-Loss Scalable throughput (L4S) service/queue: The | Low-Latency, Low-Loss Scalable throughput (L4S) service/queue: The | |||
| 'L4S' service is intended for traffic from scalable congestion | 'L4S' service is intended for traffic from scalable congestion | |||
| control algorithms, such as TCP Prague | control algorithms, such as TCP Prague | |||
| [I-D.briscoe-iccrg-prague-congestion-control], which was derived | [I-D.briscoe-iccrg-prague-congestion-control], which was derived | |||
| from Data Center TCP [RFC8257]. The L4S service is for more | from Data Center TCP [RFC8257]. The L4S service is for more | |||
| general traffic than just TCP Prague -- it allows the set of | general traffic than just TCP Prague -- it allows the set of | |||
| congestion controls with similar scaling properties to Prague to | congestion controls with similar scaling properties to Prague to | |||
| evolve, such as the examples listed earlier (Relentless, SCReAM, | evolve, such as the examples of Scalable congestion controls | |||
| etc.). | listed below (Relentless, SCReAM, etc.). | |||
| Classic Congestion Control: A congestion control behaviour that can | Classic Congestion Control: A congestion control behaviour that can | |||
| co-exist with standard TCP Reno [RFC5681] without causing | co-exist with standard TCP Reno [RFC5681] without causing | |||
| significantly negative impact on its flow rate [RFC5033]. With | significantly negative impact on its flow rate [RFC5033]. With | |||
| Classic congestion controls, such as Reno or Cubic, because flow | Classic congestion controls, such as Reno or Cubic, because flow | |||
| rate has scaled since TCP congestion control was first designed in | rate has scaled since TCP congestion control was first designed in | |||
| 1988, it now takes hundreds of round trips (and growing) to | 1988, it now takes hundreds of round trips (and growing) to | |||
| recover after a congestion signal (whether a loss or an ECN mark) | recover after a congestion signal (whether a loss or an ECN mark) | |||
| as shown in the examples in section 5.1 of the L4S | as shown in the examples in section 5.1 of the L4S | |||
| architecture [I-D.ietf-tsvwg-l4s-arch] and in [RFC3649]. | architecture [I-D.ietf-tsvwg-l4s-arch] and in [RFC3649]. | |||
| Therefore control of queuing and utilization becomes very slack, | Therefore, control of queuing and utilization becomes very slack, | |||
| and the slightest disturbances (e.g. from new flows starting) | and the slightest disturbances (e.g. from new flows starting) | |||
| prevent a high rate from being attained. | prevent a high rate from being attained. | |||
| Scalable Congestion Control: A congestion control where the average | Scalable Congestion Control: A congestion control where the average | |||
| time from one congestion signal to the next (the recovery time) | time from one congestion signal to the next (the recovery time) | |||
| remains invariant as the flow rate scales, all other factors being | remains invariant as the flow rate scales, all other factors being | |||
| equal. This maintains the same degree of control over queueing | equal. This maintains the same degree of control over queueing | |||
| and utilization whatever the flow rate, as well as ensuring that | and utilization whatever the flow rate, as well as ensuring that | |||
| high throughput is robust to disturbances. For instance, DCTCP | high throughput is robust to disturbances. For instance, DCTCP | |||
| averages 2 congestion signals per round-trip whatever the flow | averages 2 congestion signals per round-trip whatever the flow | |||
| rate, as do other recently developed scalable congestion controls, | rate, as do other recently developed scalable congestion controls, | |||
| e.g. Relentless TCP [Mathis09], TCP Prague | e.g. Relentless TCP [I-D.mathis-iccrg-relentless-tcp], TCP Prague | |||
| [I-D.briscoe-iccrg-prague-congestion-control], [PragueLinux], | [I-D.briscoe-iccrg-prague-congestion-control], [PragueLinux], | |||
| BBRv2 [BBRv2], [I-D.cardwell-iccrg-bbr-congestion-control] and the | BBRv2 [BBRv2], [I-D.cardwell-iccrg-bbr-congestion-control] and the | |||
| L4S variant of SCReAM for real-time media [SCReAM], [RFC8298]. | L4S variant of SCReAM for real-time media [SCReAM], [RFC8298]. | |||
| For the public Internet a Scalable transport has to comply with | For the public Internet a Scalable transport has to comply with | |||
| the requirements in Section 4 of [I-D.ietf-tsvwg-ecn-l4s-id] | the requirements in Section 4 of [I-D.ietf-tsvwg-ecn-l4s-id] | |||
| (aka. the 'Prague L4S requirements'). | (aka. the 'Prague L4S requirements'). | |||
| C: Abbreviation for Classic, e.g. when used as a subscript. | C: Abbreviation for Classic, e.g. when used as a subscript. | |||
| L: Abbreviation for L4S, e.g. when used as a subscript. | L: Abbreviation for L4S, e.g. when used as a subscript. | |||
| The terms Classic or L4S can also qualify other nouns, such as | The terms Classic or L4S can also qualify other nouns, such as | |||
| 'codepoint', 'identifier', 'classification', 'packet', 'flow'. | 'codepoint', 'identifier', 'classification', 'packet', 'flow'. | |||
| For example: an L4S packet means a packet with an L4S identifier | For example: an L4S packet means a packet with an L4S identifier | |||
| sent from an L4S congestion control. | sent from an L4S congestion control. | |||
| Both Classic and L4S services can cope with a proportion of | Both Classic and L4S services can cope with a proportion of | |||
| unresponsive or less-responsive traffic as well, but in the L4S | unresponsive or less-responsive traffic as well, but in the L4S | |||
| case its rate has to be smooth enough or low enough not to build a | case its rate has to be smooth enough or low enough not to build a | |||
| queue (e.g. DNS, VoIP, game sync datagrams, etc). The DualQ | queue (e.g. DNS, VoIP, game sync datagrams, etc.). The DualQ | |||
| Coupled AQM behaviour is defined to be similar to a single FIFO | Coupled AQM behaviour is defined to be similar to a single FIFO | |||
| queue with respect to unresponsive and overload traffic. | queue with respect to unresponsive and overload traffic. | |||
| Reno-friendly: The subset of Classic traffic that is friendly to the | Reno-friendly: The subset of Classic traffic that is friendly to the | |||
| standard Reno congestion control defined for TCP in [RFC5681]. | standard Reno congestion control defined for TCP in [RFC5681]. | |||
| Reno-friendly is used in place of 'TCP-friendly', given the latter | Reno-friendly is used in place of 'TCP-friendly', given the latter | |||
| has become imprecise, because the TCP protocol is now used with so | has become imprecise, because the TCP protocol is now used with so | |||
| many different congestion control behaviours, and Reno is used in | many different congestion control behaviours, and Reno is used in | |||
| non-TCP transports such as QUIC. | non-TCP transports such as QUIC. | |||
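The Classic versus Scalable distinction above can be made concrete with a minimal sketch. It assumes an idealised Reno-like sawtooth: the window halves on each congestion signal and then grows by one packet per round trip. It also assumes a Scalable control that averages a fixed number of signals per round trip, using 2 as in the DCTCP example. The function names are hypothetical.

```python
def reno_rtts_between_signals(window_pkts: float) -> float:
    """RTTs between congestion signals for an idealised Reno-like sawtooth.

    After each signal the window halves (W -> W/2), then grows by one packet
    per round trip, so it takes about W/2 round trips to get back to W.
    """
    return window_pkts / 2


def scalable_rtts_between_signals(signals_per_rtt: float = 2.0) -> float:
    """RTTs between congestion signals for a Scalable control that averages a
    fixed number of signals per round trip (about 2 for DCTCP), independent
    of the flow rate."""
    return 1 / signals_per_rtt


if __name__ == "__main__":
    for window in (100, 1_000, 10_000):
        print(f"W={window:>6} pkts: Reno-like {reno_rtts_between_signals(window):>6.0f} RTTs, "
              f"Scalable {scalable_rtts_between_signals():.1f} RTTs")
```

Under these assumptions, a 1,000-packet window means about 500 round trips between signals for the Reno-like model. The Scalable model stays at about half a round trip whatever the window.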
| skipping to change at page 9, line 40 ¶ | skipping to change at page 9, line 40 ¶ | |||
| ECN field are unchanged from those defined in [RFC3168]: Not ECT, | ECN field are unchanged from those defined in [RFC3168]: Not ECT, | |||
| ECT(0), ECT(1) and CE, where ECT stands for ECN-Capable Transport | ECT(0), ECT(1) and CE, where ECT stands for ECN-Capable Transport | |||
| and CE stands for Congestion Experienced. A packet marked with | and CE stands for Congestion Experienced. A packet marked with | |||
| the CE codepoint is termed 'ECN-marked' or sometimes just 'marked' | the CE codepoint is termed 'ECN-marked' or sometimes just 'marked' | |||
| where the context makes ECN obvious. | where the context makes ECN obvious. | |||
| 1.4. Features | 1.4. Features | |||
| The AQM couples marking and/or dropping from the Classic queue to the | The AQM couples marking and/or dropping from the Classic queue to the | |||
| L4S queue in such a way that a flow will get roughly the same | L4S queue in such a way that a flow will get roughly the same | |||
| throughput whichever queue it uses. Therefore both queues can feed into | throughput whichever queue it uses. Therefore, both queues can feed into | |||
| the full capacity of a link and no rates need to be configured for | the full capacity of a link and no rates need to be configured for | |||
| the queues. The L4S queue enables Scalable congestion controls like | the queues. The L4S queue enables Scalable congestion controls like | |||
| DCTCP or TCP Prague to give very low and predictably low latency, | DCTCP or TCP Prague to give very low and predictably low latency, | |||
| without compromising the performance of competing 'Classic' Internet | without compromising the performance of competing 'Classic' Internet | |||
| traffic. | traffic. | |||
| Thousands of tests have been conducted in a typical fixed residential | Thousands of tests have been conducted in a typical fixed residential | |||
| broadband setting. Experiments used a range of base round trip | broadband setting. Experiments used a range of base round trip | |||
| delays up to 100ms and link rates up to 200 Mb/s between the data | delays up to 100ms and link rates up to 200 Mb/s between the data | |||
| centre and home network, with varying amounts of background traffic | centre and home network, with varying amounts of background traffic | |||
| in both queues. For every L4S packet, the AQM kept the average | in both queues. For every L4S packet, the AQM kept the average | |||
| queuing delay below 1ms (or 2 packets where serialization delay | queuing delay below 1ms (or 2 packets where serialization delay | |||
| exceeded 1ms on slower links), with 99th percentile no worse than | exceeded 1ms on slower links), with 99th percentile no worse than | |||
| 2ms. No losses at all were introduced by the L4S AQM. Details of | 2ms. No losses at all were introduced by the L4S AQM. Details of | |||
| the extensive experiments are available [DualPI2Linux], [PI2], | the extensive experiments are available [DualPI2Linux], [PI2], | |||
| [DCttH19]. | [DCttH19]. Subjective testing using very demanding high-bandwidth, | |||
| | low-latency applications over a single shared access link is also | ||||
| | described in [L4Sdemo16] and summarized in the section about | ||||
| | applications in the L4S architecture [I-D.ietf-tsvwg-l4s-arch]. | ||||
| In all these experiments, the host was connected to the home network | In all these experiments, the host was connected to the home network | |||
| by fixed Ethernet, in order to quantify the queuing delay that can be | by fixed Ethernet, in order to quantify the queuing delay that can be | |||
| achieved by a user who cares about delay. It should be emphasized | achieved by a user who cares about delay. It should be emphasized | |||
| that L4S support at the bottleneck link cannot 'undelay' bursts | that L4S support at the bottleneck link cannot 'undelay' bursts | |||
| introduced by another link on the path, for instance by legacy WiFi | introduced by another link on the path, for instance by legacy Wi-Fi | |||
| equipment. However, if L4S support is added to the queue feeding the | equipment. However, if L4S support is added to the queue feeding the | |||
| _outgoing_ WAN link of a home gateway, it would be counterproductive | _outgoing_ WAN link of a home gateway, it would be counterproductive | |||
| not to also reduce the burstiness of the _incoming_ WiFi. Also, | not to also reduce the burstiness of the _incoming_ Wi-Fi. Also, | |||
| trials of WiFi equipment with an L4S DualQ Coupled AQM on the | trials of Wi-Fi equipment with an L4S DualQ Coupled AQM on the | |||
| _outgoing_ WiFi interface are in progress, and early results of an | _outgoing_ Wi-Fi interface are in progress, and early results of an | |||
| L4S DualQ Coupled AQM in a 5G radio access network testbed with | L4S DualQ Coupled AQM in a 5G radio access network testbed with | |||
| emulated outdoor cell edge radio fading are given in [L4S_5G]. | emulated outdoor cell edge radio fading are given in [L4S_5G]. | |||
| Subjective testing has also been conducted by multiple people all | ||||
| simultaneously using very demanding high bandwidth low latency | ||||
| applications over a single shared access link [L4Sdemo16]. In one | ||||
| application, each user could use finger gestures to pan or zoom their | ||||
| own high definition (HD) sub-window of a larger video scene generated | ||||
| on the fly in 'the cloud' from a football match. Another user | ||||
| wearing VR goggles was remotely receiving a feed from a 360-degree | ||||
| camera in a racing car, again with the sub-window in their field of | ||||
| vision generated on the fly in 'the cloud' dependent on their head | ||||
| movements. Even though other users were also downloading large | ||||
| amounts of L4S and Classic data, playing a gaming benchmark and | ||||
| watching videos over the same 40 Mb/s downstream broadband link, | ||||
| latency was so low that the football picture appeared to stick to the | ||||
| user's finger on the touch pad and the experience fed from the remote | ||||
| camera did not noticeably lag head movements. All the L4S data (even | ||||
| including the downloads) achieved the same very low latency. With an | ||||
| alternative AQM, the video noticeably lagged behind the finger | ||||
| gestures and head movements. | ||||
| Unlike Diffserv Expedited Forwarding, the L4S queue does not have to | Unlike Diffserv Expedited Forwarding, the L4S queue does not have to | |||
| be limited to a small proportion of the link capacity in order to | be limited to a small proportion of the link capacity in order to | |||
| achieve low delay. The L4S queue can be filled with a heavy load of | achieve low delay. The L4S queue can be filled with a heavy load of | |||
| capacity-seeking flows (TCP Prague etc.) and still achieve low delay. | capacity-seeking flows (TCP Prague etc.) and still achieve low delay. | |||
| The L4S queue does not rely on the presence of other traffic in the | The L4S queue does not rely on the presence of other traffic in the | |||
| Classic queue that can be 'overtaken'. It gives low latency to L4S | Classic queue that can be 'overtaken'. It gives low latency to L4S | |||
| traffic whether or not there is Classic traffic. The tail latency of | traffic whether or not there is Classic traffic. The tail latency of | |||
| traffic served by the Classic AQM is sometimes a little better, | traffic served by the Classic AQM is sometimes a little better, | |||
| sometimes a little worse, when a proportion of the traffic is L4S. | sometimes a little worse, when a proportion of the traffic is L4S. | |||
| skipping to change at page 13, line 8 ¶ | skipping to change at page 12, line 33 ¶ | |||
| the form: | the form: | |||
| p_C = ( p_CL / k )^2 (1) | p_C = ( p_CL / k )^2 (1) | |||
| where k is the constant of proportionality, which is termed the | where k is the constant of proportionality, which is termed the | |||
| coupling factor. | coupling factor. | |||
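Equation (1) can be evaluated directly, and so can its inverse. This sketch assumes probabilities are expressed as fractions in [0, 1]. The default k = 2 is only an illustrative value, since Appendix C.2 gives the actual guidance on choosing k. The function names are hypothetical.

```python
import math


def classic_from_coupled(p_cl: float, k: float = 2.0) -> float:
    """Equation (1): p_C = (p_CL / k)^2, where k is the coupling factor."""
    if k <= 0:
        raise ValueError("coupling factor k must be positive")
    return (p_cl / k) ** 2


def coupled_from_classic(p_c: float, k: float = 2.0) -> float:
    """Inverse of equation (1): p_CL = k * sqrt(p_C)."""
    if k <= 0:
        raise ValueError("coupling factor k must be positive")
    return k * math.sqrt(p_c)


if __name__ == "__main__":
    # With k = 2, a 20% coupled L4S marking level pairs with about 1% Classic drop/mark.
    print(classic_from_coupled(0.2))    # about 0.01
    print(coupled_from_classic(0.01))   # about 0.2
```

Because of the square, a modest Classic drop or marking level pairs with a much higher L4S marking level.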
| 2.2. Dual Queue | 2.2. Dual Queue | |||
| Classic traffic needs to build a large queue to prevent under- | Classic traffic needs to build a large queue to prevent under- | |||
| utilization. Therefore a separate queue is provided for L4S traffic, | utilization. Therefore, a separate queue is provided for L4S | |||
| and it is scheduled with priority over the Classic queue. Priority | traffic, and it is scheduled with priority over the Classic queue. | |||
| is conditional to prevent starvation of Classic traffic in certain | Priority is conditional to prevent starvation of Classic traffic in | |||
| conditions (see Section 2.4). | certain conditions (see Section 2.4). | |||
| Nonetheless, coupled marking ensures that giving priority to L4S | Nonetheless, coupled marking ensures that giving priority to L4S | |||
| traffic still leaves the right amount of spare scheduling time for | traffic still leaves the right amount of spare scheduling time for | |||
| Classic flows to each get equivalent throughput to DCTCP flows (all | Classic flows to each get equivalent throughput to DCTCP flows (all | |||
| other factors such as RTT being equal). | other factors such as RTT being equal). | |||
| 2.3. Traffic Classification | 2.3. Traffic Classification | |||
| Both the Coupled AQM and DualQ mechanisms need an identifier to | Both the Coupled AQM and DualQ mechanisms need an identifier to | |||
| distinguish L4S (L) and Classic (C) packets. Then the coupling | distinguish L4S (L) and Classic (C) packets. Then the coupling | |||
| skipping to change at page 15, line 10 ¶ | skipping to change at page 14, line 34 ¶ | |||
| p_L = max(p'_L, p_CL), (4) | p_L = max(p'_L, p_CL), (4) | |||
| which has also been found to work very well in practice. | which has also been found to work very well in practice. | |||
| The two transformations of p' in equations (2) and (3) implement the | The two transformations of p' in equations (2) and (3) implement the | |||
| required coupling given in equation (1) earlier. | required coupling given in equation (1) earlier. | |||
| The constant of proportionality or coupling factor, k, in equation | The constant of proportionality or coupling factor, k, in equation | |||
| (1) determines the ratio between the congestion probabilities (loss | (1) determines the ratio between the congestion probabilities (loss | |||
| or marking) experienced by L4S and Classic traffic. Thus k | or marking) experienced by L4S and Classic traffic. Thus, k | |||
| indirectly determines the ratio between L4S and Classic flow rates, | indirectly determines the ratio between L4S and Classic flow rates, | |||
| because flows (assuming they are responsive) adjust their rate in | because flows (assuming they are responsive) adjust their rate in | |||
| response to congestion probability. Appendix C.2 gives guidance on | response to congestion probability. Appendix C.2 gives guidance on | |||
| the choice of k and its effect on relative flow rates. | the choice of k and its effect on relative flow rates. | |||
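The sketch below assumes that the two transformations of the base AQM output p' are p_C = (p')^2 and p_CL = k * p'. With these forms, squaring p_CL / k recovers p_C, as equation (1) requires. The native L4S probability p'_L is simply passed in and combined using equation (4). Clamping each output to 1 is an added assumption, and equation (1) only holds exactly below that limit. Function and parameter names are hypothetical.

```python
import math


def coupled_probabilities(p_base: float, p_native_l: float, k: float = 2.0):
    """Return (p_C, p_CL, p_L) derived from the base AQM output p' (p_base).

    p_native_l is p'_L, the output of the native L4S AQM on the L queue.
    """
    p_c = min(p_base ** 2, 1.0)     # Classic drop/mark: squared base output
    p_cl = min(k * p_base, 1.0)     # coupled L4S marking: linear in base output
    p_l = max(p_native_l, p_cl)     # equation (4): the larger of the two applies
    return p_c, p_cl, p_l


if __name__ == "__main__":
    p_c, p_cl, p_l = coupled_probabilities(p_base=0.1, p_native_l=0.05)
    print(p_c, p_cl, p_l)
    # Equation (1) holds while neither output is clamped (k = 2 here):
    print(math.isclose(p_c, (p_cl / 2.0) ** 2))   # True
```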
| _________ | _________ | |||
| | | ,------. | | | ,------. | |||
| L4S (L) queue | |===>| ECN | | L4S (L) queue | |===>| ECN | | |||
| ,'| _______|_| |marker|\ | ,'| _______|_| |marker|\ | |||
| <' | | `------'\\ | <' | | `------'\\ | |||
| skipping to change at page 16, line 7 ¶ | skipping to change at page 15, line 43 ¶ | |||
| forwards their packets to the link. Even though the scheduler gives | forwards their packets to the link. Even though the scheduler gives | |||
| priority to the L queue, it is not as strong as the coupling from the | priority to the L queue, it is not as strong as the coupling from the | |||
| C queue. This is because, as the C queue grows, the base AQM applies | C queue. This is because, as the C queue grows, the base AQM applies | |||
| more congestion signals to L traffic (as well as C). As L flows | more congestion signals to L traffic (as well as C). As L flows | |||
| reduce their rate in response, they use less than the scheduling | reduce their rate in response, they use less than the scheduling | |||
| share for L traffic. So, because the scheduler is work preserving, | share for L traffic. So, because the scheduler is work preserving, | |||
| it schedules any C traffic in the gaps. | it schedules any C traffic in the gaps. | |||
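One way to make priority both work-conserving and conditional is a time-shifted comparison of head-of-line waiting times, sketched below. The L queue is served first unless the Classic head packet has waited longer than the L4S head packet by more than a margin, tshift. The Packet type, the tshift value and the function name are illustrative assumptions. They do not specify the scheduler that a DualQ has to use.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional


@dataclass
class Packet:
    label: str
    enqueued_at: float  # enqueue timestamp, seconds


def dequeue(l_queue: Deque[Packet], c_queue: Deque[Packet],
            now: float, tshift: float) -> Optional[Packet]:
    """Pick the next packet to forward from the L and C queues."""
    if not l_queue and not c_queue:
        return None                      # both queues empty
    if not c_queue:
        return l_queue.popleft()
    if not l_queue:
        return c_queue.popleft()         # work-conserving: C fills any gap
    l_wait = now - l_queue[0].enqueued_at
    c_wait = now - c_queue[0].enqueued_at
    # L4S has priority unless the Classic head has waited longer than the
    # L4S head by more than tshift, which bounds short-term C starvation.
    if c_wait > l_wait + tshift:
        return c_queue.popleft()
    return l_queue.popleft()


if __name__ == "__main__":
    l_q = deque([Packet("L1", 0.999)])
    c_q = deque([Packet("C1", 0.900)])
    print(dequeue(l_q, c_q, now=1.0, tshift=0.05).label)  # C1: waited ~100 ms vs ~1 ms
```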
| Giving priority to the L queue has the benefit of very low L queue | Giving priority to the L queue has the benefit of very low L queue | |||
| delay, because the L queue is kept empty whenever L traffic is | delay, because the L queue is kept empty whenever L traffic is | |||
| controlled by the coupling. Also there only has to be a coupling in | controlled by the coupling. Also, there only has to be a coupling in | |||
| one direction - from Classic to L4S. Priority has to be conditional | one direction - from Classic to L4S. Priority has to be conditional | |||
| in some way to prevent the C queue being starved in the short term | in some way to prevent the C queue being starved in the short term | |||
| (see Section 4.2.2) to give C traffic a means to push in, as | (see Section 4.2.2) to give C traffic a means to push in, as | |||
| explained next. With normal responsive L traffic, the coupled ECN | explained next. With normal responsive L traffic, the coupled ECN | |||
| marking gives C traffic the ability to push back against even strict | marking gives C traffic the ability to push back against even strict | |||
| priority, by congestion marking the L traffic to make it yield some | priority, by congestion marking the L traffic to make it yield some | |||
| space. However, if there is just a small finite set of C packets | space. However, if there is just a small finite set of C packets | |||
| (e.g. a DNS request or an initial window of data) some Classic AQMs | (e.g. a DNS request or an initial window of data) some Classic AQMs | |||
| will not induce enough ECN marking in the L queue, no matter how long | will not induce enough ECN marking in the L queue, no matter how long | |||
| the small set of C packets waits. Then, if the L queue happens to | the small set of C packets waits. Then, if the L queue happens to | |||
| skipping to change at page 16, line 40 ¶ | skipping to change at page 16, line 27 ¶ | |||
| DualPI2 uses a Proportional-Integral (PI) controller as the Base AQM. | DualPI2 uses a Proportional-Integral (PI) controller as the Base AQM. | |||
| Indeed, this Base AQM with just the squared output and no L4S queue | Indeed, this Base AQM with just the squared output and no L4S queue | |||
| can be used as a drop-in replacement for PIE [RFC8033], in which case | can be used as a drop-in replacement for PIE [RFC8033], in which case | |||
| it is just called PI2 [PI2]. PI2 is a principled simplification of | it is just called PI2 [PI2]. PI2 is a principled simplification of | |||
| PIE that is both more responsive and more stable in the face of | PIE that is both more responsive and more stable in the face of | |||
| dynamically varying load. | dynamically varying load. | |||
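The core of a PI Base AQM with squared output fits in a few lines. This sketch assumes a PIE-style update applied at a fixed interval: p' <- p' + alpha*(qdelay - target) + beta*(qdelay - qdelay_old). It also assumes the gains alpha and beta are already scaled to that interval. The trace and parameter values are placeholders, not recommended settings, and the function names are hypothetical.

```python
def pi_update(p_prime: float, qdelay: float, qdelay_old: float,
              target: float, alpha: float, beta: float) -> float:
    """One Proportional-Integral update of the base probability p'.

    The alpha term accumulates the error from the target delay (integral
    action); the beta term adds the change in queue delay since the last
    update (proportional action in incremental form).
    """
    p_prime += alpha * (qdelay - target) + beta * (qdelay - qdelay_old)
    return min(max(p_prime, 0.0), 1.0)   # keep p' a valid probability


def pi2_drop_probability(p_prime: float) -> float:
    """PI2: the Classic drop (or mark) probability is the square of p'."""
    return p_prime ** 2


if __name__ == "__main__":
    p = 0.0
    qdelay_old = 0.0
    # Hypothetical trace: queue delay (s) sampled at each update
    for qdelay in (0.020, 0.025, 0.030, 0.020, 0.015):
        p = pi_update(p, qdelay, qdelay_old, target=0.015, alpha=0.5, beta=5.0)
        qdelay_old = qdelay
        print(f"qdelay={qdelay * 1000:.0f} ms  p'={p:.4f}  drop={pi2_drop_probability(p):.6f}")
```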
| Curvy RED is derived from RED [RFC2309], except its configuration | Curvy RED is derived from RED [RFC2309], except its configuration | |||
| parameters are delay-based to make them insensitive to link rate and | parameters are delay-based to make them insensitive to link rate and | |||
| it requires less operations per packet than RED. However, DualPI2 is | it requires fewer operations per packet than RED. However, DualPI2 | |||
| more responsive and stable over a wider range of RTTs than Curvy RED. | is more responsive and stable over a wider range of RTTs than Curvy | |||
| As a consequence, at the time of writing, DualPI2 has attracted more | RED. As a consequence, at the time of writing, DualPI2 has attracted | |||
| development and evaluation attention than Curvy RED, leaving the | more development and evaluation attention than Curvy RED, leaving the | |||
| Curvy RED design not so fully evaluated. | Curvy RED design not so fully evaluated. | |||
| Both AQMs regulate their queue against targets configured in units of | Both AQMs regulate their queue against targets configured in units of | |||
| time rather than bytes. As already explained, this ensures | time rather than bytes. As already explained, this ensures | |||
| configuration can be invariant for different drain rates. With AQMs | configuration can be invariant for different drain rates. With AQMs | |||
| in a DualQ structure, this is particularly important because the drain | in a DualQ structure, this is particularly important because the drain | |||
| rate of each queue can vary rapidly as flows for the two queues | rate of each queue can vary rapidly as flows for the two queues | |||
| arrive and depart, even if the combined link rate is constant. | arrive and depart, even if the combined link rate is constant. | |||
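A small calculation shows why a time-based target stays meaningful as the drain rate changes, while a byte-based threshold does not. The function names and the 12,500-byte threshold are illustrative assumptions.

```python
def queue_delay_s(backlog_bytes: float, drain_rate_bps: float) -> float:
    """Time needed to drain a backlog at the given rate (bits per second)."""
    return backlog_bytes * 8 / drain_rate_bps


def backlog_for_delay(target_s: float, drain_rate_bps: float) -> float:
    """Backlog in bytes that corresponds to a delay target at this rate."""
    return target_s * drain_rate_bps / 8


if __name__ == "__main__":
    fixed_threshold_bytes = 12_500
    for rate in (100e6, 10e6):
        print(f"{rate / 1e6:.0f} Mb/s: {fixed_threshold_bytes} B = "
              f"{queue_delay_s(fixed_threshold_bytes, rate) * 1000:.1f} ms; "
              f"a 1 ms target = {backlog_for_delay(0.001, rate):.0f} B")
```

At 100 Mb/s the byte threshold corresponds to 1 ms. If one queue's drain rate falls to 10 Mb/s, for example because the other queue is taking most of the link, the same threshold allows 10 ms. A 1 ms time target, by contrast, keeps its meaning at both rates.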
| It would be possible to control the queues with other alternative | It would be possible to control the queues with other alternative | |||
| skipping to change at page 17, line 17 ¶ | skipping to change at page 17, line 9 ¶ | |||
| capitals) in Section 2.5 are observed. | capitals) in Section 2.5 are observed. | |||
| The two queues could optionally be part of a larger queuing | The two queues could optionally be part of a larger queuing | |||
| hierarchy, such as the initial example ideas in | hierarchy, such as the initial example ideas in | |||
| [I-D.briscoe-tsvwg-l4s-diffserv]. | [I-D.briscoe-tsvwg-l4s-diffserv]. | |||
| 2.5. Normative Requirements for a DualQ Coupled AQM | 2.5. Normative Requirements for a DualQ Coupled AQM | |||
| The following requirements are intended to capture only the essential | The following requirements are intended to capture only the essential | |||
| aspects of a DualQ Coupled AQM. They are intended to be independent | aspects of a DualQ Coupled AQM. They are intended to be independent | |||
| of the particular AQMs used for each queue. | of the particular AQMs implemented for each queue, but to still | |||
| define the DualQ framework built around those AQMs. | ||||
| 2.5.1. Functional Requirements | 2.5.1. Functional Requirements | |||
| A Dual Queue Coupled AQM implementation MUST comply with the | A Dual Queue Coupled AQM implementation MUST comply with the | |||
| prerequisite L4S behaviours for any L4S network node (not just a | prerequisite L4S behaviours for any L4S network node (not just a | |||
| DualQ) as specified in section 5 of [I-D.ietf-tsvwg-ecn-l4s-id]. | DualQ) as specified in section 5 of [I-D.ietf-tsvwg-ecn-l4s-id]. | |||
| These primarily concern classification and remarking as briefly | These primarily concern classification and remarking as briefly | |||
| summarized in Section 2.3 earlier. But there is also a subsection | summarized in Section 2.3 earlier. But there is also a subsection | |||
| (5.5) giving guidance on reducing the burstiness of the link | (5.5) giving guidance on reducing the burstiness of the link | |||
| technology underlying any L4S AQM. | technology underlying any L4S AQM. | |||
| skipping to change at page 21, line 40 ¶ | skipping to change at page 21, line 36 ¶ | |||
| two will measure proactive AQM discard; | two will measure proactive AQM discard; | |||
| * ECN packets marked, non-ECN packets dropped, ECN packets dropped, | * ECN packets marked, non-ECN packets dropped, ECN packets dropped, | |||
| which can be combined with the three total packet counts above to | which can be combined with the three total packet counts above to | |||
| calculate marking and dropping probabilities; | calculate marking and dropping probabilities; | |||
| * Queue delay (not including serialization delay of the head packet | * Queue delay (not including serialization delay of the head packet | |||
| or medium acquisition delay) - see further notes below. | or medium acquisition delay) - see further notes below. | |||
| Unlike the other statistics, queue delay cannot be captured in a | Unlike the other statistics, queue delay cannot be captured in a | |||
| simple accumulating counter. Therefore the type of queue delay | simple accumulating counter. Therefore, the type of queue delay | |||
| statistics produced (mean, percentiles, etc.) will depend on | statistics produced (mean, percentiles, etc.) will depend on | |||
| implementation constraints. To facilitate comparative evaluation | implementation constraints. To facilitate comparative evaluation | |||
| of different implementations and approaches, an implementation | of different implementations and approaches, an implementation | |||
| SHOULD allow mean and 99th percentile queue delay to be derived | SHOULD allow mean and 99th percentile queue delay to be derived | |||
| (per queue per sample interval). A relatively simple way to do | (per queue per sample interval). A relatively simple way to do | |||
| this would be to store a coarse-grained histogram of queue delay. | this would be to store a coarse-grained histogram of queue delay. | |||
| This could be done with a small number of bins with configurable | This could be done with a small number of bins with configurable | |||
| edges that represent contiguous ranges of queue delay. Then, over | edges that represent contiguous ranges of queue delay. Then, over | |||
| a sample interval, each bin would accumulate a count of the number | a sample interval, each bin would accumulate a count of the number | |||
| of packets that had fallen within each range. The maximum queue | of packets that had fallen within each range. The maximum queue | |||
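The coarse-grained histogram described above could look like the following sketch. The bin edges are configurable. A running sum of delays is also kept, which is an added assumption, so that the mean is exact. The 99th percentile is reported as the upper edge of the bin that contains it, which is as precise as a coarse histogram allows. Class and method names are hypothetical.

```python
import bisect


class DelayHistogram:
    """Per-queue queue-delay histogram for one sample interval.

    Bins are [0, e0), [e0, e1), ..., [e_last, inf) for edges e0 < e1 < ...
    """

    def __init__(self, edges_s):
        self.edges = sorted(edges_s)
        self.reset()

    def reset(self):
        """Start a new sample interval."""
        self.counts = [0] * (len(self.edges) + 1)
        self.total_delay_s = 0.0
        self.packets = 0

    def record(self, delay_s: float):
        """Count one packet's queue delay in the matching bin."""
        self.counts[bisect.bisect_right(self.edges, delay_s)] += 1
        self.total_delay_s += delay_s
        self.packets += 1

    def mean_s(self) -> float:
        return self.total_delay_s / self.packets if self.packets else 0.0

    def percentile_upper_bound_s(self, q: float = 0.99) -> float:
        """Upper edge of the bin holding the q-quantile (inf for the last bin)."""
        if not self.packets:
            return 0.0
        needed = q * self.packets
        cumulative = 0
        for i, count in enumerate(self.counts):
            cumulative += count
            if cumulative >= needed:
                return self.edges[i] if i < len(self.edges) else float("inf")
        return float("inf")


if __name__ == "__main__":
    hist = DelayHistogram([0.001, 0.002, 0.005, 0.010])   # hypothetical edges, seconds
    for d in [0.0005] * 98 + [0.0015, 0.004]:
        hist.record(d)
    print(hist.mean_s(), hist.percentile_upper_bound_s(0.99))   # 99th percentile <= 2 ms
```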
| skipping to change at page 22, line 16 ¶ | skipping to change at page 22, line 16 ¶ | |||
| An experimental DualQ Coupled AQM SHOULD asynchronously report the | An experimental DualQ Coupled AQM SHOULD asynchronously report the | |||
| following data about anomalous conditions: | following data about anomalous conditions: | |||
| * Start-time and duration of overload state. | * Start-time and duration of overload state. | |||
| A hysteresis mechanism SHOULD be used to prevent flapping in and | A hysteresis mechanism SHOULD be used to prevent flapping in and | |||
| out of overload causing an event storm. For instance, exit from | out of overload causing an event storm. For instance, exit from | |||
| overload state could trigger one report, but also latch a timer. | overload state could trigger one report, but also latch a timer. | |||
| Then, during that time, if the AQM enters and exits overload state | Then, during that time, if the AQM enters and exits overload state | |||
| any number of times, the duration in overload state is accumulated | any number of times, the duration in overload state is | |||
| but no new report is generated until the first time the AQM is out | accumulated, but no new report is generated until the first time | |||
| of overload once the timer has expired. | the AQM is out of overload once the timer has expired. | |||
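The hysteresis example above can be sketched as a small state machine. It is driven by overload entry and exit events, plus a periodic poll that flushes accumulated duration once the timer has expired. The holdoff length, the report format (start-time, accumulated duration) and all names are illustrative assumptions.

```python
class OverloadReporter:
    """Timer-latched reporting of overload episodes (sketch)."""

    def __init__(self, holdoff_s: float):
        self.holdoff_s = holdoff_s        # hypothetical latch-timer length
        self.in_overload = False
        self.entered_at = 0.0
        self.episode_start = None         # start-time of first unreported entry
        self.accumulated_s = 0.0          # overload duration not yet reported
        self.timer_expiry = None          # None means no timer is latched
        self.reports = []                 # (start_time, total_duration) tuples

    def _timer_expired(self, now: float) -> bool:
        return self.timer_expiry is None or now >= self.timer_expiry

    def _report(self, now: float) -> None:
        self.reports.append((self.episode_start, self.accumulated_s))
        self.episode_start = None
        self.accumulated_s = 0.0
        self.timer_expiry = now + self.holdoff_s   # latch the timer

    def enter(self, now: float) -> None:
        if self.in_overload:
            return
        self.in_overload = True
        self.entered_at = now
        if self.episode_start is None:
            self.episode_start = now

    def exit(self, now: float) -> None:
        if not self.in_overload:
            return
        self.in_overload = False
        self.accumulated_s += now - self.entered_at
        if self._timer_expired(now):      # otherwise keep accumulating silently
            self._report(now)

    def poll(self, now: float) -> None:
        """Call periodically: report pending duration once out of overload
        and the latched timer has expired."""
        if (not self.in_overload and self.episode_start is not None
                and self._timer_expired(now)):
            self._report(now)


if __name__ == "__main__":
    r = OverloadReporter(holdoff_s=10)
    r.enter(0); r.exit(1)                 # first exit: report and latch timer
    r.enter(2); r.exit(3)                 # flapping within holdoff: no report
    r.enter(4); r.exit(6)
    r.poll(11)                            # timer expired while out of overload
    print(r.reports)                      # [(0, 1.0), (2, 3.0)]
```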
| 2.5.2.4. Deployment, Coexistence and Scaling | 2.5.2.4. Deployment, Coexistence and Scaling | |||
| [RFC5706] suggests that deployment, coexistence and scaling should | [RFC5706] suggests that deployment, coexistence and scaling should | |||
| also be covered as management requirements. The raison d'etre of the | also be covered as management requirements. The raison d'etre of the | |||
| DualQ Coupled AQM is to enable deployment and coexistence of Scalable | DualQ Coupled AQM is to enable deployment and coexistence of Scalable | |||
| congestion controls - as incremental replacements for today's Reno- | congestion controls - as incremental replacements for today's Reno- | |||
| friendly controls that do not scale with bandwidth-delay product. | friendly controls that do not scale with bandwidth-delay product. | |||
| Therefore there is no need to repeat these motivating issues here | Therefore, there is no need to repeat these motivating issues here | |||
| given they are already explained in the Introduction and detailed in | given they are already explained in the Introduction and detailed in | |||
| the L4S architecture [I-D.ietf-tsvwg-l4s-arch]. | the L4S architecture [I-D.ietf-tsvwg-l4s-arch]. | |||
| The descriptions of specific DualQ Coupled AQM algorithms in the | The descriptions of specific DualQ Coupled AQM algorithms in the | |||
| appendices cover scaling of their configuration parameters, e.g. with | appendices cover scaling of their configuration parameters, e.g. with | |||
| respect to RTT and sampling frequency. | respect to RTT and sampling frequency. | |||
| 3. IANA Considerations (to be removed by RFC Editor) | 3. IANA Considerations (to be removed by RFC Editor) | |||
| This specification contains no IANA considerations. | This specification contains no IANA considerations. | |||
| skipping to change at page 23, line 14 ¶ | skipping to change at page 23, line 14 ¶ | |||
| The security considerations section of the L4S architecture also | The security considerations section of the L4S architecture also | |||
| includes subsections on policing of relative flow-rates (section 8.1) | includes subsections on policing of relative flow-rates (section 8.1) | |||
| and on policing of flows that cause excessive queuing delay (section | and on policing of flows that cause excessive queuing delay (section | |||
| 8.2). It explains that the interests of users do not collide in the | 8.2). It explains that the interests of users do not collide in the | |||
| same way for delay as they do for bandwidth. For someone to get more | same way for delay as they do for bandwidth. For someone to get more | |||
| of the bandwidth of a shared link, someone else necessarily gets less | of the bandwidth of a shared link, someone else necessarily gets less | |||
| (a 'zero-sum game'), whereas queuing delay can be reduced for | (a 'zero-sum game'), whereas queuing delay can be reduced for | |||
| everyone, without any need for someone else to lose out. It also | everyone, without any need for someone else to lose out. It also | |||
| explains that, on the current Internet, scheduling usually enforces | explains that, on the current Internet, scheduling usually enforces | |||
| separation between 'sites' (e.g. households, businesses or mobile | separation of bandwidth between 'sites' (e.g. households, businesses | |||
| users), but it is not common to need to schedule or police individual | or mobile users), but it is not common to need to schedule or police | |||
| application flows. | the bandwidth used by individual application flows. | |||
| By the above arguments, per-flow policing might not be necessary and | By the above arguments, per-flow rate policing might not be necessary | |||
| in trusted environments it is certainly unlikely to be needed. | and in trusted environments (e.g. private data centres) it is | |||
| Therefore, because it is hard to avoid complexity and unintended | certainly unlikely to be needed. Therefore, because it is hard to | |||
| side-effects with per-flow policing, it needs to be separable from a | avoid complexity and unintended side effects with per-flow rate | |||
| basic AQM, as an option, under policy control. On this basis, the | policing, it needs to be separable from a basic AQM, as an option, | |||
| DualQ Coupled AQM provides low delay without prejudging the question | under policy control. On this basis, the DualQ Coupled AQM provides | |||
| of per-flow policing. | low delay without prejudging the question of per-flow rate policing. | |||
| Nonetheless, the interests of users or flows might conflict, e.g. in | Nonetheless, the interests of users or flows might conflict, e.g. in | |||
| case of accident or malice. Then per-flow control could be | case of accident or malice. Then per-flow rate control could be | |||
| necessary. If flow-rate control is needed, it can be provided as a | necessary. If flow-rate control is needed, it can be provided as a | |||
| modular addition to a DualQ. And similarly, if protection against | modular addition to a DualQ. And similarly, if protection against | |||
| excessive queue delay is needed, a per-flow queue protection option | excessive queue delay is needed, a per-flow queue protection option | |||
| can be added to a DualQ (e.g. [I-D.briscoe-docsis-q-protection]). | can be added to a DualQ (e.g. [I-D.briscoe-docsis-q-protection]). | |||
| 4.2. Handling Unresponsive Flows and Overload | 4.2. Handling Unresponsive Flows and Overload | |||
| In the absence of any per-flow control, it is important that the | In the absence of any per-flow control, it is important that the | |||
| basic DualQ Coupled AQM gives unresponsive flows no more throughput | basic DualQ Coupled AQM gives unresponsive flows no more throughput | |||
| advantage than a single-queue AQM would, and that it at least handles | advantage than a single-queue AQM would, and that it at least handles | |||
| skipping to change at page 25, line 19 ¶ | skipping to change at page 25, line 19 ¶ | |||
| Section 2.5.1) to avoid short-term starvation of Classic. Otherwise, | Section 2.5.1) to avoid short-term starvation of Classic. Otherwise, | |||
| as explained in Section 2.4, even a lone responsive L4S flow could | as explained in Section 2.4, even a lone responsive L4S flow could | |||
| temporarily block a small finite set of C packets (e.g. an initial | temporarily block a small finite set of C packets (e.g. an initial | |||
| window or DNS request). The blockage would only be brief, but it | window or DNS request). The blockage would only be brief, but it | |||
| could be longer for certain AQM implementations that can only | could be longer for certain AQM implementations that can only | |||
| increase the congestion signal coupled from the C queue when C | increase the congestion signal coupled from the C queue when C | |||
| packets are actually being dequeued. There is then the question of | packets are actually being dequeued. There is then the question of | |||
| whether to sacrifice L4S throughput or L4S delay (or some other | whether to sacrifice L4S throughput or L4S delay (or some other | |||
| policy) to make the priority conditional: | policy) to make the priority conditional: | |||
| Sacrifice L4S throughput: By using weighted round robin as the | Sacrifice L4S throughput: By using weighted round-robin as the | |||
| conditional priority scheduler, the L4S service can sacrifice some | conditional priority scheduler, the L4S service can sacrifice some | |||
| throughput during overload. This can either be thought of as | throughput during overload. This can either be thought of as | |||
| guaranteeing a minimum throughput service for Classic traffic, or | guaranteeing a minimum throughput service for Classic traffic, or | |||
| as guaranteeing a maximum delay for a packet at the head of the | as guaranteeing a maximum delay for a packet at the head of the | |||
| Classic queue. | Classic queue. | |||
| Cautionary note: a WRR scheduler can only guarantee Classic | Cautionary note: a WRR scheduler can only guarantee Classic | |||
| throughput if Classic sources are sending enough to use it -- | throughput if Classic sources are sending enough to use it -- | |||
| congestion signals can undermine scheduling because they determine | congestion signals can undermine scheduling because they determine | |||
| how much responsive traffic of each class arrives for scheduling | how much responsive traffic of each class arrives for scheduling | |||
| skipping to change at page 28, line 33 ¶ | skipping to change at page 28, line 33 ¶ | |||
| addressing the saturation problem. At saturation, DualPI2 switches | addressing the saturation problem. At saturation, DualPI2 switches | |||
| into overload mode, where the base AQM is driven by the max delay of | into overload mode, where the base AQM is driven by the max delay of | |||
| both queues and it introduces probabilistic drop to both queues | both queues and it introduces probabilistic drop to both queues | |||
| equally. It leaves only a small range of congestion levels just | equally. It leaves only a small range of congestion levels just | |||
| below saturation where unresponsive traffic gains any advantage from | below saturation where unresponsive traffic gains any advantage from | |||
| using the ECN capability (relative to being unresponsive without | using the ECN capability (relative to being unresponsive without | |||
| ECN), and the advantage is hardly detectable (see [DualQ-Test] and | ECN), and the advantage is hardly detectable (see [DualQ-Test] and | |||
| section IV-E of [DCttH19]). Also, overload with an unresponsive ECT(1) | section IV-E of [DCttH19]). Also, overload with an unresponsive ECT(1) | |||
| flow gets no more bandwidth advantage than with ECT(0). | flow gets no more bandwidth advantage than with ECT(0). | |||
| 5. Acknowledgements | 5. References | |||
| Thanks to Anil Agarwal, Sowmini Varadhan, Gabi Bracha, Nicolas | ||||
| Kuhn, Greg Skinner, Tom Henderson, David Pullen, Mirja Kuehlewind, | ||||
| Gorry Fairhurst, Pete Heist, Ermin Sakic and Martin Duke for detailed | ||||
| review comments particularly of the appendices and suggestions on how | ||||
| to make the explanations clearer. Thanks also to Tom Henderson for | ||||
| insights on the choice of schedulers and queue delay measurement | ||||
| techniques. | ||||
| The early contributions of Koen De Schepper, Bob Briscoe, Olga | ||||
| Bondarenko and Inton Tsang were part-funded by the European Community | ||||
| under its Seventh Framework Programme through the Reducing Internet | ||||
| Transport Latency (RITE) project (ICT-317700). Contributions of Koen | ||||
| De Schepper and Olivier Tilmans were also part-funded by the 5Growth | ||||
| and DAEMON EU H2020 projects. Bob Briscoe's contribution was also | ||||
| part-funded by the Comcast Innovation Fund and the Research Council | ||||
| of Norway through the TimeIn project. The views expressed here are | ||||
| solely those of the authors. | ||||
| 6. Contributors | ||||
| The following contributed implementations and evaluations that | ||||
| validated and helped to improve this specification: | ||||
| Olga Albisser <olga@albisser.org> of Simula Research Lab, Norway | ||||
| (Olga Bondarenko during early drafts) implemented the prototype | ||||
| DualPI2 AQM for Linux with Koen De Schepper and conducted | ||||
| extensive evaluations as well as implementing the live performance | ||||
| visualization GUI [L4Sdemo16]. | ||||
| Olivier Tilmans <olivier.tilmans@nokia-bell-labs.com> of Nokia | ||||
| Bell Labs, Belgium prepared and maintains the Linux implementation | ||||
| of DualPI2 for upstreaming. | ||||
| Shravya K.S. wrote a model for the ns-3 simulator based on the -01 | ||||
| version of this Internet-Draft. Based on this initial work, Tom | ||||
| Henderson <tomh@tomh.org> updated that earlier model and created a | ||||
| model for the DualQ variant specified as part of the Low Latency | ||||
| DOCSIS specification, as well as conducting extensive evaluations. | ||||
| Ing Jyh (Inton) Tsang of Nokia, Belgium built the End-to-End Data | ||||
| Centre to the Home broadband testbed on which DualQ Coupled AQM | ||||
| implementations were tested. | ||||
| 7. References | ||||
| 7.1. Normative References | 5.1. Normative References | |||
| [I-D.ietf-tsvwg-ecn-l4s-id] | [I-D.ietf-tsvwg-ecn-l4s-id] | |||
| Schepper, K. D. and B. Briscoe, "Explicit Congestion | Schepper, K. D. and B. Briscoe, "Explicit Congestion | |||
| Notification (ECN) Protocol for Very Low Queuing Delay | Notification (ECN) Protocol for Very Low Queuing Delay | |||
| (L4S)", Work in Progress, Internet-Draft, draft-ietf- | (L4S)", Work in Progress, Internet-Draft, draft-ietf- | |||
| tsvwg-ecn-l4s-id-26, 7 July 2022, | tsvwg-ecn-l4s-id-28, 8 August 2022, | |||
| <https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg- | <https://datatracker.ietf.org/api/v1/doc/document/draft- | |||
| ecn-l4s-id-26>. | ietf-tsvwg-ecn-l4s-id/>. | |||
| [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
| Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
| DOI 10.17487/RFC2119, March 1997, | DOI 10.17487/RFC2119, March 1997, | |||
| <https://www.rfc-editor.org/info/rfc2119>. | <https://www.rfc-editor.org/info/rfc2119>. | |||
| [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition | [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition | |||
| of Explicit Congestion Notification (ECN) to IP", | of Explicit Congestion Notification (ECN) to IP", | |||
| RFC 3168, DOI 10.17487/RFC3168, September 2001, | RFC 3168, DOI 10.17487/RFC3168, September 2001, | |||
| <https://www.rfc-editor.org/info/rfc3168>. | <https://www.rfc-editor.org/info/rfc3168>. | |||
| [RFC8311] Black, D., "Relaxing Restrictions on Explicit Congestion | [RFC8311] Black, D., "Relaxing Restrictions on Explicit Congestion | |||
| Notification (ECN) Experimentation", RFC 8311, | Notification (ECN) Experimentation", RFC 8311, | |||
| DOI 10.17487/RFC8311, January 2018, | DOI 10.17487/RFC8311, January 2018, | |||
| <https://www.rfc-editor.org/info/rfc8311>. | <https://www.rfc-editor.org/info/rfc8311>. | |||
| 7.2. Informative References | 5.2. Informative References | |||
| [Alizadeh-stability] | [Alizadeh-stability] | |||
| Alizadeh, M., Javanmard, A., and B. Prabhakar, "Analysis | Alizadeh, M., Javanmard, A., and B. Prabhakar, "Analysis | |||
| of DCTCP: Stability, Convergence, and Fairness", ACM | of DCTCP: Stability, Convergence, and Fairness", ACM | |||
| SIGMETRICS 2011, June 2011, | SIGMETRICS 2011, June 2011, | |||
| <https://dl.acm.org/citation.cfm?id=1993753>. | <https://dl.acm.org/citation.cfm?id=1993753>. | |||
| [AQMmetrics] | [AQMmetrics] | |||
| Kwon, M. and S. Fahmy, "A Comparison of Load-based and | Kwon, M. and S. Fahmy, "A Comparison of Load-based and | |||
| Queue-based Active Queue Management Algorithms", Proc. | Queue-based Active Queue Management Algorithms", Proc. | |||
| Int'l Soc. for Optical Engineering (SPIE) 4866:35--46 DOI: | Int'l Soc. for Optical Engineering (SPIE) 4866:35--46 DOI: | |||
| 10.1117/12.473021, 2002, | 10.1117/12.473021, 2002, | |||
| <https://www.cs.purdue.edu/homes/fahmy/papers/ldc.pdf>. | <https://www.cs.purdue.edu/homes/fahmy/papers/ldc.pdf>. | |||
| [ARED01] Floyd, S., Gummadi, R., and S. Shenker, "Adaptive RED: An | [ARED01] Floyd, S., Gummadi, R., and S. Shenker, "Adaptive RED: An | |||
| Algorithm for Increasing the Robustness of RED's Active | Algorithm for Increasing the Robustness of RED's Active | |||
| Queue Management", ACIRI Technical Report, August 2001, | Queue Management", ACIRI Technical Report, August 2001, | |||
| <http://www.icir.org/floyd/red.html>. | <https://www.icir.org/floyd/red.html>. | |||
| [BBRv2] Cardwell, N., "TCP BBR v2 Alpha/Preview Release", github | [BBRv2] Cardwell, N., "TCP BBR v2 Alpha/Preview Release", GitHub | |||
| repository; Linux congestion control module, | repository; Linux congestion control module, | |||
| <https://github.com/google/bbr/blob/v2alpha/README.md>. | <https://github.com/google/bbr/blob/v2alpha/README.md>. | |||
| [Boru20] Boru Oljira, D., Grinnemo, K-J., Brunstrom, A., and J. | [Boru20] Boru Oljira, D., Grinnemo, K-J., Brunstrom, A., and J. | |||
| Taheri, "Validating the Sharing Behavior and Latency | Taheri, "Validating the Sharing Behavior and Latency | |||
| Characteristics of the L4S Architecture", ACM CCR | Characteristics of the L4S Architecture", ACM CCR | |||
| 50(2):37--44, May 2020, | 50(2):37--44, May 2020, | |||
| <https://dl.acm.org/doi/abs/10.1145/3402413.3402419>. | <https://dl.acm.org/doi/abs/10.1145/3402413.3402419>. | |||
| [CCcensus19] | [CCcensus19] | |||
| Mishra, A., Sun, X., Jain, A., Pande, S., Joshi, R., and | Mishra, A., Sun, X., Jain, A., Pande, S., Joshi, R., and | |||
| B. Leong, "The Great Internet TCP Congestion Control | B. Leong, "The Great Internet TCP Congestion Control | |||
| Census", Proc. ACM on Measurement and Analysis of | Census", Proc. ACM on Measurement and Analysis of | |||
| Computing Systems 3(3), December 2019, | Computing Systems 3(3), December 2019, | |||
| <https://doi.org/10.1145/3366693>. | <https://doi.org/10.1145/3366693>. | |||
| [CoDel] Nichols, K. and V. Jacobson, "Controlling Queue Delay", | [CoDel] Nichols, K. and V. Jacobson, "Controlling Queue Delay", | |||
| ACM Queue 10(5), May 2012, | ACM Queue 10(5), May 2012, | |||
| <http://queue.acm.org/issuedetail.cfm?issue=2208917>. | <https://queue.acm.org/issuedetail.cfm?issue=2208917>. | |||
| [CRED_Insights] | [CRED_Insights] | |||
| Briscoe, B., "Insights from Curvy RED (Random Early | Briscoe, B., "Insights from Curvy RED (Random Early | |||
| Detection)", BT Technical Report TR-TUB8-2015-003 | Detection)", BT Technical Report TR-TUB8-2015-003 | |||
| arXiv:1904.07339 [cs.NI], July 2015, | arXiv:1904.07339 [cs.NI], July 2015, | |||
| <https://arxiv.org/abs/1904.07339>. | <https://arxiv.org/abs/1904.07339>. | |||
| [DCttH19] De Schepper, K., Bondarenko, O., Tilmans, O., and B. | [DCttH19] De Schepper, K., Bondarenko, O., Tilmans, O., and B. | |||
| Briscoe, "`Data Centre to the Home': Ultra-Low Latency for | Briscoe, "`Data Centre to the Home': Ultra-Low Latency for | |||
| All", Updated RITE project Technical Report, July 2019, | All", Updated RITE project Technical Report, July 2019, | |||
| skipping to change at page 31, line 32 ¶ | skipping to change at page 30, line 36 ¶ | |||
| [DualPI2Linux] | [DualPI2Linux] | |||
| Albisser, O., De Schepper, K., Briscoe, B., Tilmans, O., | Albisser, O., De Schepper, K., Briscoe, B., Tilmans, O., | |||
| and H. Steen, "DUALPI2 - Low Latency, Low Loss and | and H. Steen, "DUALPI2 - Low Latency, Low Loss and | |||
| Scalable (L4S) AQM", Proc. Linux Netdev 0x13, March 2019, | Scalable (L4S) AQM", Proc. Linux Netdev 0x13, March 2019, | |||
| <https://www.netdevconf.org/0x13/session.html?talk- | <https://www.netdevconf.org/0x13/session.html?talk- | |||
| DUALPI2-AQM>. | DUALPI2-AQM>. | |||
| [DualQ-Test] | [DualQ-Test] | |||
| Steen, H., "Destruction Testing: Ultra-Low Delay using | Steen, H., "Destruction Testing: Ultra-Low Delay using | |||
| Dual Queue Coupled Active Queue Management", Masters | Dual Queue Coupled Active Queue Management", Master's | |||
| Thesis, Dept of Informatics, Uni Oslo, May 2017, | Thesis, Dept of Informatics, Uni Oslo, May 2017, | |||
| <https://www.duo.uio.no/bitstream/handle/10852/57424/ | <https://www.duo.uio.no/bitstream/handle/10852/57424/ | |||
| thesis-henrste.pdf?sequence=1>. | thesis-henrste.pdf?sequence=1>. | |||
| [Heist21] Heist, P. and J. Morton, "L4S Tests", github README, | [Dukkipati06] | |||
| Dukkipati, N. and N. McKeown, "Why Flow-Completion Time is | ||||
| the Right Metric for Congestion Control", ACM CCR | ||||
| 36(1):59--62, January 2006, | ||||
| <https://dl.acm.org/doi/10.1145/1111322.1111336>. | ||||
| [Heist21] Heist, P. and J. Morton, "L4S Tests", GitHub README, | ||||
| August 2021, <https://github.com/heistp/l4s- | August 2021, <https://github.com/heistp/l4s- | |||
| tests/#underutilization-with-bursty-traffic>. | tests/#underutilization-with-bursty-traffic>. | |||
| [I-D.briscoe-docsis-q-protection] | [I-D.briscoe-docsis-q-protection] | |||
| Briscoe, B. and G. White, "The DOCSIS(r) Queue Protection | Briscoe, B. and G. White, "The DOCSIS(r) Queue Protection | |||
| Algorithm to Preserve Low Latency", Work in Progress, | Algorithm to Preserve Low Latency", Work in Progress, | |||
| Internet-Draft, draft-briscoe-docsis-q-protection-06, 13 | Internet-Draft, draft-briscoe-docsis-q-protection-06, 13 | |||
| May 2022, <https://datatracker.ietf.org/doc/html/draft- | May 2022, | |||
| briscoe-docsis-q-protection-06>. | <https://datatracker.ietf.org/api/v1/doc/document/draft- | |||
| briscoe-docsis-q-protection/>. | ||||
| [I-D.briscoe-iccrg-prague-congestion-control] | [I-D.briscoe-iccrg-prague-congestion-control] | |||
| Schepper, K. D., Tilmans, O., and B. Briscoe, "Prague | Schepper, K. D., Tilmans, O., and B. Briscoe, "Prague | |||
| Congestion Control", Work in Progress, Internet-Draft, | Congestion Control", Work in Progress, Internet-Draft, | |||
| draft-briscoe-iccrg-prague-congestion-control-00, 9 March | draft-briscoe-iccrg-prague-congestion-control-01, 11 July | |||
| 2021, <https://datatracker.ietf.org/doc/html/draft- | 2022, <https://datatracker.ietf.org/api/v1/doc/document/ | |||
| briscoe-iccrg-prague-congestion-control-00>. | draft-briscoe-iccrg-prague-congestion-control/>. | |||
| [I-D.briscoe-tsvwg-l4s-diffserv] | [I-D.briscoe-tsvwg-l4s-diffserv] | |||
| Briscoe, B., "Interactions between Low Latency, Low Loss, | Briscoe, B., "Interactions between Low Latency, Low Loss, | |||
| Scalable Throughput (L4S) and Differentiated Services", | Scalable Throughput (L4S) and Differentiated Services", | |||
| Work in Progress, Internet-Draft, draft-briscoe-tsvwg-l4s- | Work in Progress, Internet-Draft, draft-briscoe-tsvwg-l4s- | |||
| diffserv-02, 4 November 2018, | diffserv-02, 2 July 2018, | |||
| <https://datatracker.ietf.org/doc/html/draft-briscoe- | <https://datatracker.ietf.org/api/v1/doc/document/draft- | |||
| tsvwg-l4s-diffserv-02>. | briscoe-tsvwg-l4s-diffserv/>. | |||
| [I-D.cardwell-iccrg-bbr-congestion-control] | [I-D.cardwell-iccrg-bbr-congestion-control] | |||
| Cardwell, N., Cheng, Y., Yeganeh, S. H., Swett, I., and V. | Cardwell, N., Cheng, Y., Yeganeh, S. H., Swett, I., and V. | |||
| Jacobson, "BBR Congestion Control", Work in Progress, | Jacobson, "BBR Congestion Control", Work in Progress, | |||
| Internet-Draft, draft-cardwell-iccrg-bbr-congestion- | Internet-Draft, draft-cardwell-iccrg-bbr-congestion- | |||
| control-02, 7 March 2022, | control-02, 7 March 2022, | |||
| <https://datatracker.ietf.org/doc/html/draft-cardwell- | <https://datatracker.ietf.org/api/v1/doc/document/draft- | |||
| iccrg-bbr-congestion-control-02>. | cardwell-iccrg-bbr-congestion-control/>. | |||
| [I-D.ietf-tsvwg-l4s-arch] | [I-D.ietf-tsvwg-l4s-arch] | |||
| Briscoe, B., Schepper, K. D., Bagnulo, M., and G. White, | Briscoe, B., Schepper, K. D., Bagnulo, M., and G. White, | |||
| "Low Latency, Low Loss, Scalable Throughput (L4S) Internet | "Low Latency, Low Loss, Scalable Throughput (L4S) Internet | |||
| Service: Architecture", Work in Progress, Internet-Draft, | Service: Architecture", Work in Progress, Internet-Draft, | |||
| draft-ietf-tsvwg-l4s-arch-18, 7 July 2022, | draft-ietf-tsvwg-l4s-arch-19, 27 July 2022, | |||
| <https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg- | <https://datatracker.ietf.org/api/v1/doc/document/draft- | |||
| l4s-arch-18>. | ietf-tsvwg-l4s-arch/>. | |||
| [I-D.mathis-iccrg-relentless-tcp] | ||||
| Mathis, M., "Relentless Congestion Control", Work in | ||||
| Progress, Internet-Draft, draft-mathis-iccrg-relentless- | ||||
| tcp-00, 4 March 2009, <https://www.ietf.org/archive/id/ | ||||
| draft-mathis-iccrg-relentless-tcp-00.txt>. | ||||
| [L4Sdemo16] | [L4Sdemo16] | |||
| Bondarenko, O., De Schepper, K., Tsang, I., and B. | Bondarenko, O., De Schepper, K., Tsang, I., and B. | |||
| Briscoe, "Ultra-Low Delay for All: Live Experience, Live | Briscoe, "Ultra-Low Delay for All: Live Experience, Live | |||
| Analysis", Proc. MMSYS'16 pp33:1--33:4, May 2016, | Analysis", Proc. MMSYS'16 pp33:1--33:4, May 2016, | |||
| <http://dl.acm.org/citation.cfm?doid=2910017.2910633 | <https://dl.acm.org/citation.cfm?doid=2910017.2910633 | |||
| (videos of demos: | (videos of demos: | |||
| https://riteproject.eu/dctth/#1511dispatchwg)>. | https://riteproject.eu/dctth/#1511dispatchwg)>. | |||
| [L4S_5G] Willars, P., Wittenmark, E., Ronkainen, H., Östberg, C., | [L4S_5G] Willars, P., Wittenmark, E., Ronkainen, H., Östberg, C., | |||
| Johansson, I., Strand, J., Lédl, P., and D. Schnieders, | Johansson, I., Strand, J., Lédl, P., and D. Schnieders, | |||
| "Enabling time-critical applications over 5G with rate | "Enabling time-critical applications over 5G with rate | |||
| adaptation", Ericsson - Deutsche Telekom White Paper BNEW- | adaptation", Ericsson - Deutsche Telekom White Paper BNEW- | |||
| 21:025455 Uen, May 2021, <https://www.ericsson.com/en/ | 21:025455 Uen, May 2021, <https://www.ericsson.com/en/ | |||
| reports-and-papers/white-papers/enabling-time-critical- | reports-and-papers/white-papers/enabling-time-critical- | |||
| applications-over-5g-with-rate-adaptation>. | applications-over-5g-with-rate-adaptation>. | |||
| skipping to change at page 33, line 16 ¶ | skipping to change at page 32, line 28 ¶ | |||
| Labovitz, C., Iekel-Johnson, S., McPherson, D., Oberheide, | Labovitz, C., Iekel-Johnson, S., McPherson, D., Oberheide, | |||
| J., and F. Jahanian, "Internet Inter-Domain Traffic", Proc. | J., and F. Jahanian, "Internet Inter-Domain Traffic", Proc. | |||
| ACM SIGCOMM; ACM CCR 40(4):75--86, August 2010, | ACM SIGCOMM; ACM CCR 40(4):75--86, August 2010, | |||
| <https://doi.org/10.1145/1851275.1851194>. | <https://doi.org/10.1145/1851275.1851194>. | |||
| [LLD] White, G., Sundaresan, K., and B. Briscoe, "Low Latency | [LLD] White, G., Sundaresan, K., and B. Briscoe, "Low Latency | |||
| DOCSIS: Technology Overview", CableLabs White Paper, | DOCSIS: Technology Overview", CableLabs White Paper, | |||
| February 2019, <https://cablela.bs/low-latency-docsis- | February 2019, <https://cablela.bs/low-latency-docsis- | |||
| technology-overview-february-2019>. | technology-overview-february-2019>. | |||
| [Mathis09] Mathis, M., "Relentless Congestion Control", PFLDNeT'09, | ||||
| May 2009, <http://www.hpcc.jp/pfldnet2009/ | ||||
| Program_files/1569198525.pdf>. | ||||
| [MEDF] Menth, M., Schmid, M., Heiss, H., and T. Reim, "MEDF - a | [MEDF] Menth, M., Schmid, M., Heiss, H., and T. Reim, "MEDF - a | |||
| simple scheduling algorithm for two real-time transport | simple scheduling algorithm for two real-time transport | |||
| service classes with application in the UTRAN", Proc. IEEE | service classes with application in the UTRAN", Proc. IEEE | |||
| Conference on Computer Communications (INFOCOM'03) Vol.2 | Conference on Computer Communications (INFOCOM'03) Vol.2 | |||
| pp.1116-1122, March 2003, | pp.1116-1122, March 2003, | |||
| <http://infocom2003.ieee-infocom.org/papers/27_04.PDF>. | <https://infocom2003.ieee-infocom.org/papers/27_04.PDF>. | |||
| [PI2] De Schepper, K., Bondarenko, O., Briscoe, B., and I. | [PI2] De Schepper, K., Bondarenko, O., Briscoe, B., and I. | |||
| Tsang, "PI2: A Linearized AQM for both Classic and | Tsang, "PI2: A Linearized AQM for both Classic and | |||
| Scalable TCP", ACM CoNEXT'16, December 2016, | Scalable TCP", ACM CoNEXT'16, December 2016, | |||
| <https://riteproject.files.wordpress.com/2015/10/ | <https://riteproject.files.wordpress.com/2015/10/ | |||
| pi2_conext.pdf>. | pi2_conext.pdf>. | |||
| [PI2param] Briscoe, B., "PI2 Parameters", Technical Report TR-BB- | [PI2param] Briscoe, B., "PI2 Parameters", Technical Report TR-BB- | |||
| 2021-001 arXiv:2107.01003 [cs.NI], July 2021, | 2021-001 arXiv:2107.01003 [cs.NI], July 2021, | |||
| <https://arxiv.org/abs/2107.01003>. | <https://arxiv.org/abs/2107.01003>. | |||
| skipping to change at page 35, line 11 ¶ | skipping to change at page 34, line 22 ¶ | |||
| Lightweight Control Scheme to Address the Bufferbloat | Lightweight Control Scheme to Address the Bufferbloat | |||
| Problem", RFC 8033, DOI 10.17487/RFC8033, February 2017, | Problem", RFC 8033, DOI 10.17487/RFC8033, February 2017, | |||
| <https://www.rfc-editor.org/info/rfc8033>. | <https://www.rfc-editor.org/info/rfc8033>. | |||
| [RFC8034] White, G. and R. Pan, "Active Queue Management (AQM) Based | [RFC8034] White, G. and R. Pan, "Active Queue Management (AQM) Based | |||
| on Proportional Integral Controller Enhanced (PIE) for | on Proportional Integral Controller Enhanced (PIE) for | |||
| Data-Over-Cable Service Interface Specifications (DOCSIS) | Data-Over-Cable Service Interface Specifications (DOCSIS) | |||
| Cable Modems", RFC 8034, DOI 10.17487/RFC8034, February | Cable Modems", RFC 8034, DOI 10.17487/RFC8034, February | |||
| 2017, <https://www.rfc-editor.org/info/rfc8034>. | 2017, <https://www.rfc-editor.org/info/rfc8034>. | |||
| [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | ||||
| 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | ||||
| May 2017, <https://www.rfc-editor.org/info/rfc8174>. | ||||
| [RFC8257] Bensley, S., Thaler, D., Balasubramanian, P., Eggert, L., | [RFC8257] Bensley, S., Thaler, D., Balasubramanian, P., Eggert, L., | |||
| and G. Judd, "Data Center TCP (DCTCP): TCP Congestion | and G. Judd, "Data Center TCP (DCTCP): TCP Congestion | |||
| Control for Data Centers", RFC 8257, DOI 10.17487/RFC8257, | Control for Data Centers", RFC 8257, DOI 10.17487/RFC8257, | |||
| October 2017, <https://www.rfc-editor.org/info/rfc8257>. | October 2017, <https://www.rfc-editor.org/info/rfc8257>. | |||
| [RFC8290] Hoeiland-Joergensen, T., McKenney, P., Taht, D., Gettys, | [RFC8290] Hoeiland-Joergensen, T., McKenney, P., Taht, D., Gettys, | |||
| J., and E. Dumazet, "The Flow Queue CoDel Packet Scheduler | J., and E. Dumazet, "The Flow Queue CoDel Packet Scheduler | |||
| and Active Queue Management Algorithm", RFC 8290, | and Active Queue Management Algorithm", RFC 8290, | |||
| DOI 10.17487/RFC8290, January 2018, | DOI 10.17487/RFC8290, January 2018, | |||
| <https://www.rfc-editor.org/info/rfc8290>. | <https://www.rfc-editor.org/info/rfc8290>. | |||
| skipping to change at page 35, line 36 ¶ | skipping to change at page 35, line 5 ¶ | |||
| [RFC8312] Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and | [RFC8312] Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and | |||
| R. Scheffenegger, "CUBIC for Fast Long-Distance Networks", | R. Scheffenegger, "CUBIC for Fast Long-Distance Networks", | |||
| RFC 8312, DOI 10.17487/RFC8312, February 2018, | RFC 8312, DOI 10.17487/RFC8312, February 2018, | |||
| <https://www.rfc-editor.org/info/rfc8312>. | <https://www.rfc-editor.org/info/rfc8312>. | |||
| [RFC8404] Moriarty, K., Ed. and A. Morton, Ed., "Effects of | [RFC8404] Moriarty, K., Ed. and A. Morton, Ed., "Effects of | |||
| Pervasive Encryption on Operators", RFC 8404, | Pervasive Encryption on Operators", RFC 8404, | |||
| DOI 10.17487/RFC8404, July 2018, | DOI 10.17487/RFC8404, July 2018, | |||
| <https://www.rfc-editor.org/info/rfc8404>. | <https://www.rfc-editor.org/info/rfc8404>. | |||
| [SCReAM] Johansson, I., "SCReAM", github repository, | [SCReAM] Johansson, I., "SCReAM", GitHub repository, | |||
| <https://github.com/EricssonResearch/scream/blob/master/ | <https://github.com/EricssonResearch/scream/blob/master/ | |||
| README.md>. | README.md>. | |||
| [SigQ-Dyn] Briscoe, B., "Rapid Signalling of Queue Dynamics", | [SigQ-Dyn] Briscoe, B., "Rapid Signalling of Queue Dynamics", | |||
| Technical Report TR-BB-2017-001 arXiv:1904.07044 [cs.NI], | Technical Report TR-BB-2017-001 arXiv:1904.07044 [cs.NI], | |||
| September 2017, <https://arxiv.org/abs/1904.07044>. | September 2017, <https://arxiv.org/abs/1904.07044>. | |||
| Appendix A. Example DualQ Coupled PI2 Algorithm | Appendix A. Example DualQ Coupled PI2 Algorithm | |||
| As a first concrete example, the pseudocode below gives the DualPI2 | As a first concrete example, the pseudocode below gives the DualPI2 | |||
| skipping to change at page 39, line 43 ¶ | skipping to change at page 39, line 9 ¶ | |||
| 28: } | 28: } | |||
| 29: return FALSE | 29: return FALSE | |||
| 30: } | 30: } | |||
| Figure 4: Example Dequeue Pseudocode for DualQ Coupled PI2 AQM | Figure 4: Example Dequeue Pseudocode for DualQ Coupled PI2 AQM | |||
| When packets arrive, first a common queue limit is checked as shown | When packets arrive, first a common queue limit is checked as shown | |||
| in line 2 of the enqueuing pseudocode in Figure 3. This assumes a | in line 2 of the enqueuing pseudocode in Figure 3. This assumes a | |||
| shared buffer for the two queues (Note b discusses the merits of | shared buffer for the two queues (Note b discusses the merits of | |||
| separate buffers). In order to avoid any bias against larger | separate buffers). In order to avoid any bias against larger | |||
| packets, 1 MTU of space is always allowed and the limit is | packets, 1 MTU of space is always allowed, and the limit is | |||
| deliberately tested before enqueue. | deliberately tested before enqueue. | |||
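A minimal Python sketch of the shared-buffer limit check described above, assuming both queues are measured in bytes and a 1500-byte MTU. The names `admit`, `l_queue_bytes`, `c_queue_bytes` and `limit_bytes` are hypothetical illustrations, not identifiers from Figure 3:

```python
MTU = 1500  # bytes; assumed value for illustration only


def admit(l_queue_bytes: int, c_queue_bytes: int, limit_bytes: int) -> bool:
    """Shared-buffer limit check, applied before the packet is enqueued.

    The arriving packet's own length is deliberately left out of the test.
    Instead, one MTU of space is always reserved, so a large packet is no
    more likely to be refused than a small one at the same occupancy.
    """
    return l_queue_bytes + c_queue_bytes + MTU <= limit_bytes
```

Because `MTU` rather than the arriving packet's length is added, a 64-byte packet and a 1500-byte packet that arrive at the same buffer occupancy get the same verdict.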
| If the limit is not exceeded, the packet is timestamped in line 4 (only | If the limit is not exceeded, the packet is timestamped in line 4 (only | |||
| if the sojourn time technique is being used to measure queue delay; | if the sojourn time technique is being used to measure queue delay; | |||
| see Note a for alternatives). | see Note a for alternatives). | |||
| At lines 5-9, the packet is classified and enqueued to the Classic or | At lines 5-9, the packet is classified and enqueued to the Classic or | |||
| L4S queue depending on the least significant bit of the ECN field in | L4S queue depending on the least significant bit of the ECN field in | |||
| the IP header (line 6). Packets with a codepoint having an LSB of 0 | the IP header (line 6). Packets with a codepoint having an LSB of 0 | |||
| (Not-ECT and ECT(0)) will be enqueued in the Classic queue. | (Not-ECT and ECT(0)) will be enqueued in the Classic queue. | |||
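The classification rule can be sketched in Python as below. It assumes the 2-bit ECN field has already been extracted from the IP header, uses the RFC 3168 codepoint values, and ignores any further classification an implementation might add; the constant and function names are illustrative only:

```python
# 2-bit ECN codepoints as defined in RFC 3168.
NOT_ECT = 0b00
ECT_1 = 0b01
ECT_0 = 0b10
CE = 0b11


def classify(ecn: int) -> str:
    """Return the queue for a packet, given its 2-bit ECN field."""
    if ecn & 0b01:
        return "L4S"      # LSB is 1: ECT(1) or CE
    return "Classic"      # LSB is 0: Not-ECT or ECT(0)
```

Testing a single bit means each packet is classified with one AND operation, with no table lookup over the four codepoints.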
| skipping to change at page 43, line 20 ¶ | skipping to change at page 42, line 32 ¶ | |||
| significant outlier and, on reflection, the experimental technique | significant outlier and, on reflection, the experimental technique | |||
| seemed inappropriate to the CDN market in China. | seemed inappropriate to the CDN market in China. | |||
| * g is taken as 0.38. The factor g is a geometry factor that | * g is taken as 0.38. The factor g is a geometry factor that | |||
| characterizes the shape of the sawteeth of prevalent Classic | characterizes the shape of the sawteeth of prevalent Classic | |||
| congestion controllers. The geometry factor is the fraction of | congestion controllers. The geometry factor is the fraction of | |||
| the amplitude of the sawtooth variability in queue delay that lies | the amplitude of the sawtooth variability in queue delay that lies | |||
| below the AQM's target. For instance, at low bit rate, the | below the AQM's target. For instance, at low bit rate, the | |||
| geometry factor of standard Reno is 0.5, but at higher rates it | geometry factor of standard Reno is 0.5, but at higher rates it | |||
| tends to just under 1. According to the census of congestion | tends to just under 1. According to the census of congestion | |||
| controllers conducted by Mishra _et al_ in Jul-Oct | controllers conducted by Mishra et al. in Jul-Oct | |||
| 2019 [CCcensus19], most Classic TCP traffic uses Cubic. And, | 2019 [CCcensus19], most Classic TCP traffic uses Cubic. And, | |||
| according to the analysis in [PI2param], if running over a PI2 | according to the analysis in [PI2param], if running over a PI2 | |||
| AQM, a large proportion of this Cubic traffic would be in its | AQM, a large proportion of this Cubic traffic would be in its | |||
| Reno-Friendly mode, which has a geometry factor of ~0.39 (all | Reno-Friendly mode, which has a geometry factor of ~0.39 (all | |||
| known implementations). The rest of the Cubic traffic would be in | known implementations). The rest of the Cubic traffic would be in | |||
| true Cubic mode, which has a geometry factor of ~0.36. Without | true Cubic mode, which has a geometry factor of ~0.36. Without | |||
| modelling the sawtooth profiles from all the other less prevalent | modelling the sawtooth profiles from all the other less prevalent | |||
| congestion controllers, we estimate a 7:3 weighted average of | congestion controllers, we estimate a 7:3 weighted average of | |||
| these two, resulting in an average geometry factor of 0.38. | these two, resulting in an average geometry factor of 0.38. | |||
| * f is taken as 2. The factor f is a safety factor that increases | * f is taken as 2. The factor f is a safety factor that increases | |||
| the target queue to allow for the distribution of RTT_typ around | the target queue to allow for the distribution of RTT_typ around | |||
| its mean. Otherwise the target queue would only avoid | its mean. Otherwise, the target queue would only avoid | |||
| underutilization for those users below the mean. It also provides | underutilization for those users below the mean. It also provides | |||
| a safety margin for the proportion of paths in use that span | a safety margin for the proportion of paths in use that span | |||
| beyond the distance between a user and their local CDN. Currently | beyond the distance between a user and their local CDN. | |||
| no data is available on the variance of queue delay around the | Currently, no data is available on the variance of queue delay | |||
| mean in each region, so there is plenty of room for this guess to | around the mean in each region, so there is plenty of room for | |||
| become more educated. | this guess to become more educated. | |||
| * [PI2param] recommends target = RTT_typ * g * f = 25ms * 0.38 * 2 = | * [PI2param] recommends target = RTT_typ * g * f = 25ms * 0.38 * 2 = | |||
| 19 ms. However a further adjustment is warranted, because target | 19 ms. However, a further adjustment is warranted, because target | |||
| is moving year on year. The paper is based on data collected in | is moving year-on-year. The paper is based on data collected in | |||
| 2019, and it mentions evidence from speedtest.net that suggests | 2019, and it mentions evidence from speedtest.net that suggests | |||
| RTT_typ reduced by 17% (fixed) or 12% (mobile) between 2020 and | RTT_typ reduced by 17% (fixed) or 12% (mobile) between 2020 and | |||
| 2021. Therefore we recommend a default of target = 15 ms at the | 2021. Therefore, we recommend a default of target = 15 ms at the | |||
| time of writing (2021). | time of writing (2021). | |||
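The arithmetic behind the g and target bullets above can be reproduced directly. This Python sketch is illustrative only. The function names are hypothetical, and the 7:3 split, the two geometry factors and RTT_typ = 25 ms are simply the figures quoted in the text. It yields g ~= 0.381 (rounded to 0.38 in the text) and target = 19 ms. The further reduction to the 15 ms default reflects the RTT trend described in the text and is not computed here.

```python
def geometry_factor(share_reno_friendly: float = 0.7,
                    g_reno_friendly: float = 0.39,
                    g_true_cubic: float = 0.36) -> float:
    """Weighted average geometry factor g of Classic (Cubic) traffic."""
    return (share_reno_friendly * g_reno_friendly
            + (1.0 - share_reno_friendly) * g_true_cubic)


def pi2_target_ms(rtt_typ_ms: float, g: float, f: float) -> float:
    """target = RTT_typ * g * f, as in the [PI2param] recommendation."""
    return rtt_typ_ms * g * f


if __name__ == "__main__":
    g = geometry_factor()
    print(f"g      ~= {g:.3f}")                              # 0.381
    print(f"target ~= {pi2_target_ms(25, 0.38, 2):.1f} ms")  # 19.0 ms
```

An operator following [PI2param] for its own environment would substitute its own RTT_typ, mix of congestion controls and safety factor f.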
| Operators can always use the data and discussion in [PI2param] to | Operators can always use the data and discussion in [PI2param] to | |||
| configure a more appropriate target for their environment. For | configure a more appropriate target for their environment. For | |||
| instance, an operator might wish to question the assumptions called | instance, an operator might wish to question the assumptions called | |||
| out in that paper, such as the goal of no underutilization for a | out in that paper, such as the goal of no underutilization for a | |||
| large majority of single flow transfers (given many large transfers | large majority of single flow transfers (given many large transfers | |||
| use multiple flows to avoid the scaling limitations of Classic | use multiple flows to avoid the scaling limitations of Classic | |||
| flows). | flows). | |||
| skipping to change at page 44, line 41 | skipping to change at page 44, line 4 | |||
| The choice of alpha and beta also determines the AQM's stable | The choice of alpha and beta also determines the AQM's stable | |||
| operating range. The AQM ought to change p' as fast as possible in | operating range. The AQM ought to change p' as fast as possible in | |||
| response to changes in load without over-compensating and therefore | response to changes in load without over-compensating and therefore | |||
| causing oscillations in the queue. Therefore, the values of alpha | causing oscillations in the queue. Therefore, the values of alpha | |||
| and beta also depend on the RTT of the expected worst-case flow | and beta also depend on the RTT of the expected worst-case flow | |||
| (RTT_max). | (RTT_max). | |||
| The maximum RTT of a PI controller (RTT_max in line 10 of Figure 2) | The maximum RTT of a PI controller (RTT_max in line 10 of Figure 2) | |||
| is not an absolute maximum, but more instability (more queue | is not an absolute maximum, but more instability (more queue | |||
| variability) sets in for long-running flows with an RTT above this | variability) sets in for long-running flows with an RTT above this | |||
| value. The propagation delay half way round the planet and back in | value. The propagation delay halfway round the planet and back in | |||
| glass fibre is 200 ms. However, hardly any traffic traverses such | glass fibre is 200 ms. However, hardly any traffic traverses such | |||
| extreme paths and, since the significant consolidation of Internet | extreme paths and, since the significant consolidation of Internet | |||
| traffic between 2007 and 2009 [Labovitz10], a high and growing | traffic between 2007 and 2009 [Labovitz10], a high and growing | |||
| proportion of all Internet traffic (roughly two-thirds at the time of | proportion of all Internet traffic (roughly two-thirds at the time of | |||
| writing) has been served from content distribution networks (CDNs) or | writing) has been served from content distribution networks (CDNs) or | |||
| 'cloud' services distributed close to end-users. The Internet might | 'cloud' services distributed close to end-users. The Internet might | |||
| change again, but for now, designing for a maximum RTT of 100ms is a | change again, but for now, designing for a maximum RTT of 100ms is a | |||
| good compromise between faster queue control at low RTT and some | good compromise between faster queue control at low RTT and some | |||
| instability on the occasions when a longer path is necessary. | instability on the occasions when a longer path is necessary. | |||
| skipping to change at page 46, line 14 | skipping to change at page 45, line 29 | |||
| Notes: | Notes: | |||
| a. The drain rate of the queue can vary if it is scheduled relative | a. The drain rate of the queue can vary if it is scheduled relative | |||
| to other queues, or to cater for fluctuations in a wireless | to other queues, or to cater for fluctuations in a wireless | |||
| medium. To auto-adjust to changes in drain rate, the queue needs | medium. To auto-adjust to changes in drain rate, the queue needs | |||
| to be measured in time, not bytes or packets [AQMmetrics], | to be measured in time, not bytes or packets [AQMmetrics], | |||
| [CoDel]. Queuing delay could be measured directly as the sojourn | [CoDel]. Queuing delay could be measured directly as the sojourn | |||
| time (aka. service time) of the queue, by storing a per-packet | time (aka. service time) of the queue, by storing a per-packet | |||
| time-stamp as each packet is enqueued, and subtracting this from | time-stamp as each packet is enqueued, and subtracting this from | |||
| the system time when the packet is dequeued. If time- stamping | the system time when the packet is dequeued. If time-stamping is | |||
| is not easy to introduce with certain hardware, queuing delay | not easy to introduce with certain hardware, queuing delay could | |||
| could be predicted indirectly by dividing the size of the queue | be predicted indirectly by dividing the size of the queue by the | |||
| by the predicted departure rate, which might be known precisely | predicted departure rate, which might be known precisely for some | |||
| for some link technologies (see for example in DOCSIS PIE | link technologies (see for example in DOCSIS PIE [RFC8034]). | |||
| [RFC8034]). | ||||
| However, sojourn time is slow to detect bursts. For instance, if | However, sojourn time is slow to detect bursts. For instance, if | |||
| a burst arrives at an empty queue, the sojourn time only fully | a burst arrives at an empty queue, the sojourn time only fully | |||
| measures the burst's delay when its last packet is dequeued, even | measures the burst's delay when its last packet is dequeued, even | |||
| though the queue has known the size of the burst since its last | though the queue has known the size of the burst since its last | |||
| packet was enqueued - so it could have signalled congestion | packet was enqueued - so it could have signalled congestion | |||
| earlier. To remedy this, each head packet can be marked when it | earlier. To remedy this, each head packet can be marked when it | |||
| is dequeued based on the expected delay of the tail packet behind | is dequeued based on the expected delay of the tail packet behind | |||
| it, as explained below, rather than based on the head packet's | it, as explained below, rather than based on the head packet's | |||
| own delay due to the packets in front of it. [Heist21] identifies | own delay due to the packets in front of it. [Heist21] identifies | |||
| a specific scenario where bursty traffic significantly hits | a specific scenario where bursty traffic significantly hits | |||
| utilization of the L queue. If this effect proves to be more | utilization of the L queue. If this effect proves to be more | |||
| widely applicable, it is believed that using the delay behind the | widely applicable, using the delay behind the head could improve | |||
| head would improve performance. | performance. | |||
| The delay behind the head can be implemented by dividing the | The delay behind the head can be implemented by dividing the | |||
| backlog at dequeue by the link rate or equivalently multiplying | backlog at dequeue by the link rate or equivalently multiplying | |||
| the backlog by the delay per unit of backlog. The implementation | the backlog by the delay per unit of backlog. The implementation | |||
| details will depend on whether the link rate is known; if it is | details will depend on whether the link rate is known; if it is | |||
| not, a moving average of the delay per unit backlog can be | not, a moving average of the delay per unit backlog can be | |||
| maintained. This delay consists of serialization as well as | maintained. This delay consists of serialization as well as | |||
| media acquisition for shared media. So the details will depend | media acquisition for shared media. So the details will depend | |||
| strongly on the specific link technology. This approach should be | strongly on the specific link technology. This approach should be | |||
| less sensitive to timing errors and cost less in operations and | less sensitive to timing errors and cost less in operations and | |||
| memory than the otherwise equivalent 'scaled sojourn time' | memory than the otherwise equivalent 'scaled sojourn time' | |||
| metric, which is the sojourn time of a packet scaled by the ratio | metric, which is the sojourn time of a packet scaled by the ratio | |||
| of the queue sizes when the packet departed and | of the queue sizes when the packet departed and | |||
| arrived [SigQ-Dyn]. | arrived [SigQ-Dyn]. | |||
| b. Line 2 of the dualpi2_enqueue() function (Figure 3) assumes an | b. Line 2 of the dualpi2_enqueue() function (Figure 3) assumes an | |||
| implementation where lq and cq share common buffer memory. An | implementation where lq and cq share common buffer memory. An | |||
| alternative implementation could use separate buffers for each | alternative implementation could use separate buffers for each | |||
| queue, in which case the arriving packet would have to be | queue, in which case the arriving packet would have to be | |||
| classified first to determine which buffer to check for available | classified first to determine which buffer to check for available | |||
| space. The choice is a trade off; a shared buffer can use less | space. The choice is a trade-off; a shared buffer can use less | |||
| memory whereas separate buffers isolate the L4S queue from | memory whereas separate buffers isolate the L4S queue from | |||
| tail-drop due to large bursts of Classic traffic (e.g. a Classic Reno | tail-drop due to large bursts of Classic traffic (e.g. a Classic Reno | |||
| TCP during slow-start over a long RTT). | TCP during slow-start over a long RTT). | |||
| c. There has been some concern that using the step function of DCTCP | c. There has been some concern that using the step function of DCTCP | |||
| for the Native L4S AQM requires end-systems to smooth the signal | for the Native L4S AQM requires end-systems to smooth the signal | |||
| for an unnecessarily large number of round trips to ensure | for an unnecessarily large number of round trips to ensure | |||
| sufficient fidelity. A ramp is no worse than a step in initial | sufficient fidelity. A ramp is no worse than a step in initial | |||
| experiments with existing DCTCP. Therefore, it is recommended | experiments with existing DCTCP. Therefore, it is recommended | |||
| that a ramp is configured in place of a step, which will allow | that a ramp is configured in place of a step, which will allow | |||
| skipping to change at page 47, line 30 | skipping to change at page 46, line 44 | |||
| effectively turn the ramp into a step function, as used by DCTCP, | effectively turn the ramp into a step function, as used by DCTCP, | |||
| by setting the range to zero. There will not be a divide by zero | by setting the range to zero. There will not be a divide by zero | |||
| problem at line 5 of Figure 5 because, if minTh is equal to | problem at line 5 of Figure 5 because, if minTh is equal to | |||
| maxTh, the condition for this ramp calculation cannot arise. | maxTh, the condition for this ramp calculation cannot arise. | |||
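Notes a and c together describe how a native L4S AQM could estimate queue delay behind the head and turn it into a marking probability. The Python sketch below combines the two under stated assumptions. The class and function names are hypothetical. The EWMA gain of 1/8 and the way samples are taken are illustrative choices, not part of the draft. Any thresholds used with it are placeholders rather than recommended defaults. The ramp follows the shape described in note c, including the degenerate step when minTh equals maxTh.

```python
class DelayPerByteEstimator:
    """Hypothetical estimator of the queuing delay 'behind the head'.

    If the link rate is known, delay per byte is simply 8 / rate_bps and
    update() need not be called.  Otherwise update() keeps an exponentially
    weighted moving average (EWMA) of measured delay per byte; the gain and
    the way samples are obtained are illustrative assumptions.
    """

    def __init__(self, link_rate_bps=None, gain=0.125):
        self.gain = gain
        self.s_per_byte = 8.0 / link_rate_bps if link_rate_bps else None

    def update(self, measured_delay_s, bytes_served):
        """Fold in one sample: time taken to serve `bytes_served` bytes."""
        if bytes_served <= 0:
            return
        sample = measured_delay_s / bytes_served
        if self.s_per_byte is None:
            self.s_per_byte = sample          # first sample seeds the EWMA
        else:
            self.s_per_byte += self.gain * (sample - self.s_per_byte)

    def delay_behind_head(self, backlog_bytes):
        """Expected delay of the tail packet: backlog * delay per byte."""
        if self.s_per_byte is None:
            return 0.0                        # no estimate yet
        return backlog_bytes * self.s_per_byte


def laqm(qdelay, min_th, max_th):
    """Native L4S AQM ramp: 0 up to min_th, 1 from max_th, linear between.

    If min_th == max_th this becomes a step: any qdelay > min_th is also
    >= max_th, so the division is never reached and cannot divide by zero.
    """
    if qdelay >= max_th:
        return 1.0
    if qdelay > min_th:
        return (qdelay - min_th) / (max_th - min_th)
    return 0.0
```

For example, with a known 12 Mb/s link, a 1500-byte backlog gives an expected delay behind the head of 1 ms, which laqm() then maps onto the ramp.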
| A.2. Pass #2: Edge-Case Details | A.2. Pass #2: Edge-Case Details | |||
| This section takes a second pass through the pseudocode adding | This section takes a second pass through the pseudocode adding | |||
| details of two edge-cases: low link rate and overload. Figure 7 | details of two edge-cases: low link rate and overload. Figure 7 | |||
| repeats the dequeue function of Figure 4, but with details of both | repeats the dequeue function of Figure 4, but with details of both | |||
| edge-cases added. Similarly Figure 8 repeats the core PI algorithm | edge-cases added. Similarly, Figure 8 repeats the core PI algorithm | |||
| of Figure 6, but with overload details added. The initialization, | of Figure 6, but with overload details added. The initialization, | |||
| enqueue, L4S AQM and recur functions are unchanged. | enqueue, L4S AQM and recur functions are unchanged. | |||
| The link rate can be so low that it takes a single packet queue | The link rate can be so low that it takes a single packet queue | |||
| longer to serialize than the threshold delay at which ECN marking | longer to serialize than the threshold delay at which ECN marking | |||
| starts to be applied in the L queue. Therefore, a minimum marking | starts to be applied in the L queue. Therefore, a minimum marking | |||
| threshold parameter in units of packets rather than time is necessary | threshold parameter in units of packets rather than time is necessary | |||
| (Th_len, default 1 packet in line 19 of Figure 2) to ensure that the | (Th_len, default 1 packet in line 19 of Figure 2) to ensure that the | |||
| ramp does not trigger excessive marking on slow links. Where an | ramp does not trigger excessive marking on slow links. Where an | |||
| implementation knows the link rate, it can set up this minimum at the | implementation knows the link rate, it can set up this minimum at the | |||
| skipping to change at page 51, line 10 | skipping to change at page 50, line 10 | |||
| 4: p_CL = p' * k % Coupled L4S prob = base prob * coupling factor | 4: p_CL = p' * k % Coupled L4S prob = base prob * coupling factor | |||
| 5: p_C = p'^2 % Classic prob = (base prob)^2 | 5: p_C = p'^2 % Classic prob = (base prob)^2 | |||
| 6: prevq = curq | 6: prevq = curq | |||
| 7: } | 7: } | |||
| Figure 8: Example PI-Update Pseudocode for DualQ Coupled PI2 AQM | Figure 8: Example PI-Update Pseudocode for DualQ Coupled PI2 AQM | |||
| (Including Overload Code) | (Including Overload Code) | |||
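The two probability lines of Figure 8 can be illustrated numerically. The Python sketch below is hypothetical. The function name is illustrative, and the default k = 2 is the Internet value this document recommends. It omits the overload handling that precedes these lines in the figure. It shows the coupled L4S signal growing linearly with p' while the Classic signal grows with its square, so p_C = (p_CL / k)^2.

```python
def coupled_probabilities(p_base, k=2.0):
    """Return (p_CL, p_C) from base probability p' (overload handling omitted)."""
    p_cl = p_base * k    # coupled L4S probability = base prob * coupling factor
    p_c = p_base ** 2    # Classic probability = (base prob)^2
    return p_cl, p_c


if __name__ == "__main__":
    for p_base in (0.01, 0.05, 0.1, 0.2):
        p_cl, p_c = coupled_probabilities(p_base)
        print(f"p'={p_base:.2f}  p_CL={p_cl:.3f}  p_C={p_c:.4f}")
```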
| The choice of scheduler technology is critical to overload protection | The choice of scheduler technology is critical to overload protection | |||
| (see Section 4.2.2). | (see Section 4.2.2). | |||
| * A well-understood weighted scheduler such as weighted round robin | * A well-understood weighted scheduler such as weighted round-robin | |||
| (WRR) is recommended. As long as the scheduler weight for Classic | (WRR) is recommended. As long as the scheduler weight for Classic | |||
| is small (e.g. 1/16), its exact value is unimportant because it | is small (e.g. 1/16), its exact value is unimportant because it | |||
| does not normally determine capacity shares. The weight is only | does not normally determine capacity shares. The weight is only | |||
| important to prevent unresponsive L4S traffic starving Classic | important to prevent unresponsive L4S traffic starving Classic | |||
| traffic in the short term (see Section 4.2.2). This is because | traffic in the short term (see Section 4.2.2). This is because | |||
| capacity sharing between the queues is normally determined by the | capacity sharing between the queues is normally determined by the | |||
| coupled congestion signal, which overrides the scheduler, by | coupled congestion signal, which overrides the scheduler, by | |||
| making L4S sources leave roughly equal per-flow capacity available | making L4S sources leave roughly equal per-flow capacity available | |||
| for Classic flows. | for Classic flows. | |||
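A weighted round-robin scheduler of the kind recommended above can be sketched as follows. This is a hypothetical, packet-counting Python illustration. A byte-based variant such as deficit round-robin would be closer to a real link scheduler. The class name and the 1-in-16 Classic share are assumptions chosen to match the example weight in the text. The sketch is work-conserving: when only one queue is backlogged, that queue gets the whole link, so the weight only matters when both queues compete.

```python
from collections import deque


class DualQueueWRR:
    """Hypothetical packet-based WRR between an L4S and a Classic queue.

    While both queues are backlogged, Classic gets one dequeue opportunity
    in every `classic_period` (e.g. 16 gives a Classic weight of 1/16).
    """

    def __init__(self, classic_period: int = 16):
        self.l_queue = deque()
        self.c_queue = deque()
        self.classic_period = classic_period
        self._turn = 0

    def dequeue(self):
        if self.l_queue and self.c_queue:
            # The last slot of each cycle goes to Classic, the rest to L4S.
            classic_turn = (self._turn % self.classic_period
                            == self.classic_period - 1)
            self._turn += 1
            if classic_turn:
                return self.c_queue.popleft()
            return self.l_queue.popleft()
        # Work-conserving: serve whichever queue has packets.
        if self.l_queue:
            return self.l_queue.popleft()
        if self.c_queue:
            return self.c_queue.popleft()
        return None
```

Even if an unresponsive L4S source keeps l_queue full, Classic still receives 1/16 of the dequeue opportunities. Normally, though, the coupled congestion signal, not this weight, sets the capacity shares.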
| skipping to change at page 59, line 45 | skipping to change at page 58, line 45 | |||
| 13: continue % continue to the top of the while loop | 13: continue % continue to the top of the while loop | |||
| 14: } | 14: } | |||
| 15: mark(pkt) | 15: mark(pkt) | |||
| 16: } | 16: } | |||
| 17: } | 17: } | |||
| 18: return(pkt) % return the packet and stop here | 18: return(pkt) % return the packet and stop here | |||
| 19: } | 19: } | |||
| 20: return(NULL) % no packet to dequeue | 20: return(NULL) % no packet to dequeue | |||
| 21: } | 21: } | |||
| Figure 11: Optimised Example Dequeue Pseudocode for Coupled DualQ | Figure 11: Optimised Example Dequeue Pseudocode for DualQ Coupled | |||
| AQM using Integer Arithmetic | AQM using Integer Arithmetic | |||
| The two ranges, range_L and range_C, are expressed as powers of 2 so | The two ranges, range_L and range_C, are expressed as powers of 2 so | |||
| that division can be implemented as a right bit-shift (>>) in lines 5 | that division can be implemented as a right bit-shift (>>) in lines 5 | |||
| and 10 of the integer variant of the pseudocode (Figure 11). | and 10 of the integer variant of the pseudocode (Figure 11). | |||
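A fixed-point version of the ramp shows why power-of-2 ranges help. This is a hypothetical Python sketch, not the draft's Figure 11. Probabilities are scaled to 2^32, so they can be compared directly with a uniform 32-bit random integer in the range 0 <= r < 2^32, as described for maxrand(). Division by the range then becomes a right shift. The names and the nanosecond time unit are assumptions.

```python
import random

PROB_BITS = 32                 # probabilities are scaled to 2**32
PROB_ONE = 1 << PROB_BITS      # fixed-point representation of 1.0


def ramp_prob_fixed(qdelay_ns: int, min_th_ns: int, log2_range_ns: int) -> int:
    """Integer ramp probability scaled to 2**32, using a right shift.

    The ramp range is assumed to be 2**log2_range_ns nanoseconds, so
    dividing by it is a right shift by log2_range_ns bits.
    """
    if qdelay_ns <= min_th_ns:
        return 0
    delta = qdelay_ns - min_th_ns
    if delta >= (1 << log2_range_ns):   # at or beyond maxTh
        return PROB_ONE
    return (delta << PROB_BITS) >> log2_range_ns


def should_mark(prob_fixed: int, rng: random.Random) -> bool:
    """Mark if a uniform integer 0 <= r < 2**32 falls below the probability."""
    return rng.getrandbits(PROB_BITS) < prob_fixed
```

In a fixed-width implementation, delta is below 2^log2_range_ns in the shift branch, so delta << 32 needs at most log2_range_ns + 32 bits; with 64-bit integers, log2_range_ns must therefore not exceed 32.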
| For the integer variant of the pseudocode, an integer version of the | For the integer variant of the pseudocode, an integer version of the | |||
| rand() function used at line 25 of the maxrand() function in Figure 10 | rand() function used at line 25 of the maxrand() function in Figure 10 | |||
| would be arranged to return an integer in the range 0 <= maxrand() < | would be arranged to return an integer in the range 0 <= maxrand() < | |||
| 2^32 (not shown). This would scale up all the floating point | 2^32 (not shown). This would scale up all the floating point | |||
| skipping to change at page 62, line 38 | skipping to change at page 61, line 38 | |||
| p_C = ( p_CL / k )^2 (1) | p_C = ( p_CL / k )^2 (1) | |||
| k* = 1.64 * (R_C / R_L) (7) | k* = 1.64 * (R_C / R_L) (7) | |||
| We say that this coupling factor is theoretical, because it is in | We say that this coupling factor is theoretical, because it is in | |||
| terms of two RTTs, which raises two practical questions: i) for | terms of two RTTs, which raises two practical questions: i) for | |||
| multiple flows with different RTTs, the RTT for each traffic class | multiple flows with different RTTs, the RTT for each traffic class | |||
| would have to be derived from the RTTs of all the flows in that class | would have to be derived from the RTTs of all the flows in that class | |||
| (actually the harmonic mean would be needed); ii) a network node | (actually the harmonic mean would be needed); ii) a network node | |||
| cannot easily know the RTT of any of the flows anyway. | cannot easily know the RTT of the flows anyway. | |||
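Equations (7) and (1) can be evaluated directly once per-class RTTs are known. Doing so makes the two practical objections concrete: each class needs a harmonic-mean RTT, and the node would have to know every flow's RTT. The Python sketch below is illustrative. The function names are hypothetical, and any RTT values passed to it are placeholders.

```python
def harmonic_mean(rtts):
    """Harmonic mean of a list of positive RTTs (same unit throughout)."""
    return len(rtts) / sum(1.0 / r for r in rtts)


def theoretical_k(rtts_classic, rtts_l4s):
    """Equation (7): k* = 1.64 * (R_C / R_L), using harmonic-mean RTTs."""
    return 1.64 * harmonic_mean(rtts_classic) / harmonic_mean(rtts_l4s)


def classic_prob_from_l4s(p_cl, k):
    """Equation (1): p_C = (p_CL / k)^2."""
    return (p_cl / k) ** 2
```

For instance, two flows with RTTs of 10 ms and 40 ms have a harmonic-mean RTT of 16 ms, noticeably below their arithmetic mean of 25 ms.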
| RTT-dependence is caused by window-based congestion control, so it | RTT-dependence is caused by window-based congestion control, so it | |||
| ought to be reversed there, not in the network. Therefore, we use a | ought to be reversed there, not in the network. Therefore, we use a | |||
| fixed coupling factor in the network, and reduce RTT-dependence in | fixed coupling factor in the network, and reduce RTT-dependence in | |||
| L4S senders. We cannot expect Classic senders to all be updated to | L4S senders. We cannot expect Classic senders to all be updated to | |||
| reduce their RTT-dependence. But solely addressing the problem in | reduce their RTT-dependence. But solely addressing the problem in | |||
| L4S senders at least makes RTT-dependence no worse - not just between | L4S senders at least makes RTT-dependence no worse - not just between | |||
| L4S senders, but also between L4S and Classic senders. | L4S senders, but also between L4S and Classic senders. | |||
| Traditionally, throughput equivalence has been defined for flows | Traditionally, throughput equivalence has been defined for flows | |||
| skipping to change at page 64, line 34 | skipping to change at page 63, line 34 | |||
| ~= 0.85 * (R_bC + target) / (1.22 * max(R_bL, R_typ)) | ~= 0.85 * (R_bC + target) / (1.22 * max(R_bL, R_typ)) | |||
| ~= (R_bC + target) / (1.4 * max(R_bL, R_typ)) | ~= (R_bC + target) / (1.4 * max(R_bL, R_typ)) | |||
| It can be seen that, for base RTTs below target (15 ms), both the | It can be seen that, for base RTTs below target (15 ms), both the | |||
| numerator and the denominator plateau, which has the desired effect | numerator and the denominator plateau, which has the desired effect | |||
| of limiting RTT-dependence. | of limiting RTT-dependence. | |||
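The approximation above can be evaluated to show the plateau. In this hypothetical Python sketch, target defaults to the 15 ms quoted in the text. The 25 ms default for R_typ is an assumption made only for illustration, because R_typ's value is not given in this excerpt. Below R_typ the denominator is fixed. Below target, the numerator changes by at most a factor of two. So k varies only modestly with base RTT.

```python
def approx_k(r_bc_ms, r_bl_ms, target_ms=15.0, r_typ_ms=25.0):
    """k ~= (R_bC + target) / (1.4 * max(R_bL, R_typ)).

    r_typ_ms = 25 is a hypothetical value used purely for illustration.
    """
    return (r_bc_ms + target_ms) / (1.4 * max(r_bl_ms, r_typ_ms))


if __name__ == "__main__":
    for r_b in (1, 5, 10, 25, 50, 100):
        print(f"R_b = {r_b:3} ms  ->  k ~= {approx_k(r_b, r_b):.2f}")
```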
| At the start of the above derivations, an explanation was promised | At the start of the above derivations, an explanation was promised | |||
| for why the L4S throughput equation in equation (6) did not need to | for why the L4S throughput equation in equation (6) did not need to | |||
| model RTT-independence. This is because we only use one point - at | model RTT-independence. This is because we only use one point - at | |||
| the the typical base RTT where the operator chooses to calculate the | the typical base RTT where the operator chooses to calculate the | |||
| coupling factor. Then, throughput equivalence will at least hold at | coupling factor. Then, throughput equivalence will at least hold at | |||
| that chosen point. Nonetheless, assuming Prague senders implement | that chosen point. Nonetheless, assuming Prague senders implement | |||
| RTT-independence over a range of RTTs below this, the throughput | RTT-independence over a range of RTTs below this, the throughput | |||
| equivalence will then extend over that range as well. | equivalence will then extend over that range as well. | |||
| Congestion control designers can choose different ways to reduce | Congestion control designers can choose different ways to reduce | |||
| RTT-dependence. And each operator can make a policy choice to decide on | RTT-dependence. And each operator can make a policy choice to decide on | |||
| a different base RTT, and therefore a different k, at which it wants | a different base RTT, and therefore a different k, at which it wants | |||
| throughput equivalence. Nonetheless, for the Internet, it makes | throughput equivalence. Nonetheless, for the Internet, it makes | |||
| sense to choose what is believed to be the typical RTT most users | sense to choose what is believed to be the typical RTT most users | |||
| skipping to change at page 65, line 8 | skipping to change at page 64, line 8 | |||
| derived from a typical RTT for the Internet. | derived from a typical RTT for the Internet. | |||
| As a non-Internet example, for localized traffic from a particular | As a non-Internet example, for localized traffic from a particular | |||
| ISP's data centre, using the measured RTTs, it was calculated that a | ISP's data centre, using the measured RTTs, it was calculated that a | |||
| value of k = 8 would achieve throughput equivalence, and experiments | value of k = 8 would achieve throughput equivalence, and experiments | |||
| verified the formula very closely. | verified the formula very closely. | |||
| But, for a typical mix of RTTs across the general Internet, a value | But, for a typical mix of RTTs across the general Internet, a value | |||
| of k=2 is recommended as a good workable compromise. | of k=2 is recommended as a good workable compromise. | |||
| Acknowledgements | ||||
| Thanks to Anil Agarwal, Sowmini Varadhan, Gabi Bracha, Nicolas Kuhn, | ||||
| Greg Skinner, Tom Henderson, David Pullen, Mirja Kuehlewind, Gorry | ||||
| Fairhurst, Pete Heist, Ermin Sakic and Martin Duke for detailed | ||||
| review comments particularly of the appendices and suggestions on how | ||||
| to make the explanations clearer. Thanks also to Tom Henderson for | ||||
| insights on the choice of schedulers and queue delay measurement | ||||
| techniques. And thanks to the area reviewers Christer Holmberg, Lars | ||||
| Eggert and Roman Danyliw. | ||||
| The early contributions of Koen De Schepper, Bob Briscoe, Olga | ||||
| Bondarenko and Inton Tsang were part-funded by the European Community | ||||
| under its Seventh Framework Programme through the Reducing Internet | ||||
| Transport Latency (RITE) project (ICT-317700). Contributions of Koen | ||||
| De Schepper and Olivier Tilmans were also part-funded by the 5Growth | ||||
| and DAEMON EU H2020 projects. Bob Briscoe's contribution was also | ||||
| part-funded by the Comcast Innovation Fund and the Research Council | ||||
| of Norway through the TimeIn project. The views expressed here are | ||||
| solely those of the authors. | ||||
| Contributors | ||||
| The following contributed implementations and evaluations that | ||||
| validated and helped to improve this specification: | ||||
| Olga Albisser <olga@albisser.org> of Simula Research Lab, Norway | ||||
| (Olga Bondarenko during early drafts) implemented the prototype | ||||
| DualPI2 AQM for Linux with Koen De Schepper and conducted | ||||
| extensive evaluations as well as implementing the live performance | ||||
| visualization GUI [L4Sdemo16]. | ||||
| Olivier Tilmans <olivier.tilmans@nokia-bell-labs.com> of Nokia | ||||
| Bell Labs, Belgium prepared and maintains the Linux implementation | ||||
| of DualPI2 for upstreaming. | ||||
| Shravya K.S. wrote a model for the ns-3 simulator based on the -01 | ||||
| version of this Internet-Draft. Based on this initial work, Tom | ||||
| Henderson <tomh@tomh.org> updated that earlier model and created a | ||||
| model for the DualQ variant specified as part of the Low Latency | ||||
| DOCSIS specification, as well as conducting extensive evaluations. | ||||
| Ing Jyh (Inton) Tsang of Nokia, Belgium built the End-to-End Data | ||||
| Centre to the Home broadband testbed on which DualQ Coupled AQM | ||||
| implementations were tested. | ||||
| Authors' Addresses | Authors' Addresses | |||
| Koen De Schepper | Koen De Schepper | |||
| Nokia Bell Labs | Nokia Bell Labs | |||
| Antwerp | Antwerp | |||
| Belgium | Belgium | |||
| Email: koen.de_schepper@nokia.com | Email: koen.de_schepper@nokia.com | |||
| URI: https://www.bell-labs.com/usr/koen.de_schepper | URI: https://www.bell-labs.com/about/researcher-profiles/koende_schepper/ | |||
| Bob Briscoe (editor) | Bob Briscoe (editor) | |||
| Independent | Independent | |||
| United Kingdom | United Kingdom | |||
| Email: ietf@bobbriscoe.net | Email: ietf@bobbriscoe.net | |||
| URI: http://bobbriscoe.net/ | URI: https://bobbriscoe.net/ | |||
| Greg White | Greg White | |||
| CableLabs | CableLabs | |||
| Louisville, CO, | Louisville, CO, | |||
| United States of America | United States of America | |||
| Email: G.White@CableLabs.com | Email: G.White@CableLabs.com | |||
| End of changes. 78 change blocks. | ||||
| 235 lines changed or deleted | 243 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ | ||||