<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<rfc category="std" docName="draft-briscoe-tsvwg-ecn-tunnel-00" ipr="full3978">
  <?xml-stylesheet type='text/xsl' href='http://xml.resource.org/authoring/rfc2629.xslt' ?>

  <!-- Alterations to I-D/RFC boilerplate -->

  <?rfc private="" ?>

  <!-- Default private="" Produce an internal memo 2.5pp shorter than an I-D or RFC -->

  <?rfc rfcprocack="yes" ?>

  <!-- Default rfcprocack="no" add a short sentence acknowledging xml2rfc -->

  <?rfc strict="no" ?>

  <!-- Default strict="no" Don't check I-D nits -->

  <?rfc rfcedstyle="no" ?>

  <!-- Default rfcedstyle="yes" attempt to closely follow finer details from the latest observable RFC-Editor style -->

  <!-- IETF process -->

  <?rfc iprnotified="no" ?>

  <!-- Default iprnotified="no" I haven't disclosed existence of IPR to IETF -->

  <!-- ToC format -->

  <?rfc toc="yes" ?>

  <!-- Default toc="no" No Table of Contents -->

  <!-- Cross referencing, footnotes, comments -->

  <?rfc symrefs="yes"?>

  <!-- Default symrefs="no" Don't use anchors, but use numbers for refs -->

  <?rfc sortrefs="yes"?>

  <!-- Default sortrefs="no" Don't sort references into order -->

  <?rfc comments="yes" ?>

  <!-- Default comments="no" Don't render comments -->

  <?rfc inline="yes" ?>

  <!-- Default inline="no" if comments is "yes", then render comments inline; otherwise render them in an `Editorial Comments' section -->

  <!-- Pagination control -->

  <?rfc compact="yes"?>

  <!-- Default compact="no" Start sections on new pages -->

  <?rfc subcompact="no"?>

  <!-- Default subcompact="(as compact setting)" yes/no is not quite as compact as yes/yes -->

  <!-- HTML formatting control -->

  <?rfc emoticonic="yes" ?>

  <!-- Default emoticonic="no" Doesn't prettify HTML format -->

  <front>
    <title abbrev="ECN Tunnelling">Layered Encapsulation of Congestion
    Notification</title>

    <author fullname="Bob Briscoe" initials="B." surname="Briscoe">
      <organization>BT</organization>

      <address>
        <postal>
          <street>B54/77, Adastral Park</street>

          <street>Martlesham Heath</street>

          <city>Ipswich</city>

          <code>IP5 3RE</code>

          <country>UK</country>
        </postal>

        <phone>+44 1473 645196</phone>

        <email>bob.briscoe@bt.com</email>

        <uri>http://www.cs.ucl.ac.uk/staff/B.Briscoe/</uri>
      </address>
    </author>

    <date day="30" month="June" year="2007" />

    <area>Transport</area>

    <workgroup>Transport Area Working Group</workgroup>

    <keyword>Congestion Control and Management</keyword>

    <keyword>Congestion Notification</keyword>

    <keyword>Information Security</keyword>

    <keyword>Tunnelling</keyword>

    <keyword>Protocol</keyword>

    <keyword>ECN</keyword>

    <keyword>IPsec</keyword>

    <abstract>
      <t>This document redefines how the explicit congestion notification
      (ECN) field of the outer IP header of a tunnel should be constructed. It
      brings all IP in IP tunnels (v4 or v6) into line with the way IPsec
      tunnels now construct the ECN field, ensuring that the outer header
      reveals any congestion experienced so far on the path. It specifies the
      default ECN tunnelling behaviour for any Diffserv per-hop behaviour
      (PHB), but also gives general principles to guide the design of
      alternate congestion marking behaviours for specific PHBs and for lower
      layer congestion notification schemes.</t>
    </abstract>

    <!-- ================================================================ -->
  </front>

  <middle>
    <!-- ================================================================ -->

    <section anchor="ecnpush_Introduction" title="Introduction">
      <t>This document redefines how the explicit congestion notification
      (ECN) field <xref target="RFC3168"></xref> of the outer IP header of a
      tunnel should be constructed. It brings all IP in IP tunnels (v4 or v6)
      into line with the way IPsec tunnels <xref target="RFC4301"></xref> now
      construct the ECN field, ensuring that the outer header reveals any
      congestion experienced so far on the path. Although this memo focuses on
      IP in IP tunnelling, it also gives generalised advice for any
      encapsulation by lower layer headers.</t>

      <t>ECN allows a congested resource to notify the onset of congestion
      without having to drop packets, by explicitly marking a proportion of
      packets with the congestion experienced (CE) codepoint. Congestion
      notification is unusual in that it propagates from the physical layer
      upwards to the transport layer, because congestion is exhaustion of a
      physical resource. The transport layer can directly detect loss of a
      packet (or frame) by a lower layer. But if a lower layer marks a packet
      (or frame) to notify incipient congestion, this marking has to be
      explicitly copied up the layers at every header decapsulation. So, at
      each decapsulation of an outer (lower layer) header a congestion marking
      has to be arranged to propagate into the forwarded (upper layer) header.
      It must continue upwards until it reaches the destination transport,
      which should feed congestion notification back to the source
      transport.</t>

      <t>Note that often lower layer resources are arranged to be protected by
      higher layer buffers, so instead of blocking occurring at the lower
      layer, it occurs when the higher layer queue overflows. Thus,
      non-blocking link and physical layer technologies do not have to
      implement congestion notification, which can be introduced solely in IP
      layer active queue management (AQM). However, if we want to use
      congestion notification, we have to arrange for it to be explicitly
      copied up the layers when IP is tunnelled in IP (and if a particular
      link layer technology isn't protected from blocking by network layer
      queues).</t>

      <t>IPsec tunnel mode is a specific form of tunnelling that can hide the
      inner headers. Because the ECN field has to be mutable, it cannot be
      covered by IPsec encryption or authentication calculations. Therefore
      concern has been raised in the past that the ECN field could be used as
      a low bandwidth covert channel to communicate with someone on the
      unprotected public Internet even if an end-host is restricted to only
      communicate with the public Internet through an IPsec gateway. However,
      the recently updated version of IPsec <xref target="RFC4301"></xref>
      chose not to block this covert channel, deciding that the threat could
      be managed given the channel bandwidth is so limited (ECN is a 2-bit
      field).</t>

      <t>An unfortunate sequence of standards actions leading up to this
      latest change in IPsec has left us with nearly the worst of all possible
      combinations of outcomes, despite the best endeavours of everyone
      concerned. Even though information about congestion experienced on the
      upstream path has various uses if it is revealed in the outer header of
      a tunnel, when ECN was standardised <xref target="RFC3168"></xref> it was
      decided that all IP in IP tunnels should hide upstream congestion
      information simply to avoid the extra complexity of two different
      mechanisms for IPsec and non-IPsec tunnels. However, now that <xref
      target="RFC4301"></xref> IPsec tunnels deliberately no longer hide this
      information, we are left in the perverse position where non-IPsec
      tunnels still hide congestion information unnecessarily. This document
      is designed to correct that anomaly.</t>

      <t>Specifically, RFC3168 says that, if a tunnel supports ECN (termed a
      'full-functionality' ECN tunnel), the tunnel ingress must not copy a CE
      marking from the inner header into the outer header that it creates.
      Instead the tunnel ingress has to set the ECN field of the outer header
      to ECT(0) (i.e. codepoint 10). We term this 'resetting' a CE codepoint.
      However, RFC4301 reverses this, stating that the tunnel ingress must
      simply copy the ECN field from the inner to the outer header. The main
      purpose of this document is to carry over this new relaxed attitude to
      covert channels from IPsec to all IP in IP tunnels, so all tunnel
      ingress nodes consistently copy the ECN field.</t>

      <t>The rest of the document deals with the knock-on effects of this
      apparently minor change. It is organised as follows: </t>

      <t><list style="symbols">
          <t>&sect;5 of RFC3168 permits the Diffserv codepoint (DSCP) <xref
          target="RFC2474"></xref> to 'switch in' different behaviours for
          marking the ECN field, just as it switches in different per-hop
          behaviours (PHBs) for scheduling. We therefore cannot confine the
          discussion to the ECN protocol that RFC3168 gives as a default; we
          also need to give guidance for possible alternative marking schemes.
          So, in <xref target="ecnpush_Design_Constraints"></xref>, we lay out
          the design constraints when tunnelling congestion notification.</t>

          <t>Then in <xref target="ecnpush_Design_Principles"></xref> we
          resolve the tensions between these constraints to give general
          design principles on how a tunnel should process congestion
          notification; principles that could apply to any marking behaviour
          for any PHB, not just the default in RFC3168. In particular, we
          examine the underlying principles behind whether CE should be reset
          or copied into the outer header at the ingress to a tunnel&mdash;or
          indeed at the ingress of any layered encapsulation of headers with
          congestion notification fields. </t>

          <t><xref target="ecnpush_ECN_Tunnel_Rules"></xref> then confirms the
          precise rules for the default ECN tunnelling behaviour based on the
          above design principles. These rules apply to all PHBs, unless
          stated otherwise in the specification of a PHB. There is no
          requirement for a PHB to state anything about ECN behaviour if the
          default behaviour is sufficient.</t>

          <t>Extending the new IPsec tunnel ingress behaviour to all IP in IP
          tunnels causes one further knock-on effect that is dealt with in
          <xref target="ecnpush_Backward_Compatibility"></xref> on Backward
          Compatibility. If one end of an IPsec tunnel is compliant with <xref
          target="RFC4301"></xref>, assuming IKEv2 key management is used, the
          other end can be guaranteed to also be <xref
          target="RFC4301"></xref> compliant. So there is no backward
          compatibility problem with IKEv2 RFC4301 IPsec tunnels. But once we
          extend our scope to any IP in IP tunnel, we have to cater for the
          possibility that a tunnel ingress compliant with this specification
          is sending to an egress that doesn't even understand ECN (e.g. a
          legacy <xref target="RFC2003"></xref> tunnel egress). If a tunnel
          ingress copied incoming ECN-capable headers into outer headers, then
          a legacy tunnel egress would discard any congestion markings added
          to the outer header within the tunnel. ECN-capable traffic sources
          would not see any congestion feedback and instead continually
          ratchet up their share of the bandwidth without realising that
          cross-flows from other ECN sources were continually having to
          ratchet down.</t>
        </list></t>

      <t>The scope of this document is all IP in IP tunnelling, irrespective
      of whether IPv4 or IPv6 is used for either of the inner and outer
      headers. The document only concerns wire protocol processing at tunnel
      endpoints and makes no changes or recommendations concerning algorithms
      for congestion marking or congestion response. The general design
      principles of <xref target="ecnpush_Design_Principles"></xref> may also
      be useful when any datagram/packet/frame with a congestion notification
      capability is encapsulated by a connectionless outer header <xref
      target="BBnet"></xref> that might also support a congestion notification
      capability in the future as discussed in &sect;9.3 of <xref
      target="RFC3168"></xref> (e.g. IP encapsulated in L2TP <xref
      target="RFC2661"></xref>, GRE <xref target="RFC1701"></xref> or PPTP
      <xref target="RFC2637"></xref>). However, of course, the IETF does not
      have standards authority over every link or tunnel protocol, so this
      document focuses only on IP in IP. <xref
      target="I-D.ietf-tsvwg-ecn-mpls"></xref> applies these principles to IP
      in MPLS and to MPLS in MPLS.</t>
    </section>

    <!-- ================================================================ -->

    <section anchor="ecnpush_Reqs_notation" title="Requirements notation">
      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
      "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
      document are to be interpreted as described in <xref
      target="RFC2119"></xref>.</t>
    </section>

    <!-- ================================================================ -->

    <section anchor="ecnpush_Design_Constraints" title="Design Constraints">
      <t>Tunnel processing of a congestion notification field has to meet
      congestion control needs without creating new information security
      vulnerabilities (if information security is required).</t>

      <section anchor="ecnpush_Security_Constraints"
               title="Security Constraints">
        <t>Information security can be assured by using various end to end
        security solutions (including IPsec in transport mode <xref
        target="RFC4301"></xref>), but a commonly used scenario involves the
        need to communicate between two physically protected domains across
        the public Internet. In this case there are certain management
        advantages to using IPsec in tunnel mode solely across the publicly
        accessible part of the path. The path followed by a packet then
        crosses security 'domains'; the ones protected by physical or other
        means before and after the tunnel and the one protected by an IPsec
        tunnel across the otherwise unprotected domain. We will use the
        scenario in <xref target="ecnpush_Fig_IPsec_Tunnel_Scenario"></xref>
        where endpoints 'A' and 'B' communicate through a tunnel with ingress
        'I' and egress 'E' within physically protected edge domains across an
        unprotected internetwork where there may be 'men in the middle',
        'M'.</t>

        <?rfc needLines="12"?>

        <figure anchor="ecnpush_Fig_IPsec_Tunnel_Scenario"
                title="IPsec Tunnel Scenario">
          <preamble></preamble>

          <artwork><![CDATA[          physically       unprotected     physically 
      <-protected domain-><--domain--><-protected domain->
      +------------------+            +------------------+
      |                  |      M     |                  |
      |    A-------->I=========>==========>E-------->B   |
      |                  |            |                  |
      +------------------+            +------------------+
                     <----IPsec secured---->
                             tunnel
]]></artwork>

          <postamble></postamble>
        </figure>

        <t>IPsec encryption is typically used to prevent 'M' seeing messages
        from 'A' to 'B'. IPsec authentication is used to prevent 'M'
        masquerading as the sender of messages from 'A' to 'B' or altering
        their contents. But 'I' can also use IPsec tunnel mode to allow 'A' to
        communicate with 'B' while imposing encryption to prevent 'A' leaking
        information to 'M'. Or 'E' can insist that 'I' uses tunnel mode
        authentication to prevent 'M' communicating information to 'B'.
        Mutable IP header fields such as the ECN field (as well as the TTL/Hop
        Limit and DS fields) cannot be included in the cryptographic
        calculations of IPsec. Therefore, if 'I' encrypts but copies these
        mutable fields into the outer header that is exposed across the tunnel
        it will have allowed a covert channel from 'A' to 'M'. And if 'E' copies
        these fields from the outer header to the inner, even if it validates
        authentication from 'I', it will have allowed a covert channel from
        'M' to 'B'.</t>

        <t>ECN at the IP layer is designed to carry information about
        congestion from a congested resource to some downstream node that will
        feed the information back somehow to the point upstream of the
        congestion that can regulate the load on the congested resource. In
        terms of the above scenario, ECN is effectively intended to create an
        information channel from 'M' to 'B', for 'B' to forward to 'A'.
        Therefore the goals of IPsec and ECN are mutually incompatible.</t>

        <!--{ToDo: Include also the issue of ECN capability, not just CE markings, in the trade-off discussion above.}-->

        <t>With respect to the DS or ECN fields, &sect;5.1.2 of RFC4301 says,
        "controls are provided to manage the bandwidth of this [covert]
        channel". Using the ECN processing rules of RFC4301, the channel
        bandwidth is two bits per datagram from 'A' to 'M' and one bit per
        datagram from 'M' to 'A' because 'E' limits the combinations it will
        copy. In both cases the covert channel bandwidth is further reduced by
        noise from any real congestion marking. RFC4301 therefore implies that
        these covert channels are sufficiently limited to be considered a
        manageable threat. However, with respect to the larger (6-bit) DS field,
        the same section of RFC4301 says not copying is the default, but a
        configuration option can allow copying "to allow a local administrator
        to decide whether the covert channel provided by copying these bits
        outweighs the benefits of copying". Of course, an administrator
        considering copying of the DS field has to take into account that it
        could be concatenated with the ECN field, giving a channel of 8 bits
        per datagram.</t>
      </section>

      <section anchor="ecnpush_Ctrl_Constraints" title="Control Constraints">
        <t>Congestion control requires that any congestion notification marked
        into packets by a resource will be able to traverse a feedback loop
        back to a node capable of controlling the load on that resource. To
        avoid ambiguity later, we will call this node the Load Regulator
        rather than the data source. This will allow us to deal with
        exceptional cases where load is not regulated by the data source, but
        usually the two will be synonymous. Note the term "a node <spanx
        style="emph">capable of</spanx> controlling the load" deliberately
        includes a source application that doesn't actually control the load
        but ought to (e.g. an application without congestion control that uses
        UDP).</t>

        <?rfc needLines="4"?>

        <figure anchor="ecnpush_Fig_Tunnel_Scenario"
                title="Simple Tunnel Scenario">
          <preamble></preamble>

          <artwork><![CDATA[           A--->R--->I=========>M=========>E-------->B
]]></artwork>

          <postamble></postamble>
        </figure>

        <t>We now consider a similar tunnelling scenario to the IPsec one just
        described, but without the different security domains, so that we can
        focus on ensuring the control loop and management monitoring can work
        (<xref target="ecnpush_Fig_Tunnel_Scenario"></xref>). If we want
        resources in the tunnel to be able to explicitly notify congestion and
        the feedback loop is from 'B' to 'A', it will certainly be necessary
        for 'E' to copy any CE marking from the outer header to the inner
        header for onward transmission to 'B', otherwise congestion
        notification from resources like 'M' cannot be fed back to the Load
        Regulator ('A'). But it doesn't seem necessary for 'I' to copy CE
        markings from the inner to the outer header. For instance, if resource
        'R' is congested, it can send congestion information to 'B' using the
        congestion field in the inner header without 'I' copying the
        congestion field into the outer header and 'E' copying it back to the
        inner header. 'E' can then write any additional congestion marking
        introduced across the tunnel into the congestion field of the inner
        header.</t>

        <t>Indeed, this arrangement can be extended to multi-level congestion
        marking (such as that proposed for PCN <xref
        target="PCN-arch"></xref>) as long as all the marks have unambiguously
        ranked values. For instance, if a hypothetical multi-level marking
        scheme for PCN had PCN-capable codepoints ranked 1, 2 and 3, then, if
        'I' reset the outer congestion field to the lowest ranked value that
        is PCN-capable (1), 'E' would simply write the highest ranked of the
        inner and outer congestion markings into the forwarded header. For
        instance, if the inner marking on arrival at 'I' was 3 and 'I' reset
        the outer to 1, but 'M' subsequently set it to 2, then the header
        forwarded by 'E' would be max(3,2) = 3.</t>
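        <t>The worked example above can be sketched in a few lines of Python.
        This is only an illustration under stated assumptions: the ranks 1, 2
        and 3 are the hypothetical codepoints from the text, and the names
        LOWEST_CAPABLE, ingress_reset and egress_combine are invented here
        rather than taken from any PCN specification.</t>

        <figure>
          <artwork><![CDATA[
```python
# Hypothetical ranked PCN-capable codepoints: 1 < 2 < 3 (assumption
# carried over from the text, not a real PCN encoding).
LOWEST_CAPABLE = 1


def ingress_reset(inner_mark: int) -> int:
    """'I' resets: the outer field starts at the lowest capable rank,
    whatever the inner marking is."""
    return LOWEST_CAPABLE


def egress_combine(inner_mark: int, outer_mark: int) -> int:
    """'E' forwards the highest ranked of the inner and outer markings."""
    return max(inner_mark, outer_mark)


inner = 3                      # marking on arrival at 'I'
outer = ingress_reset(inner)   # 'I' resets the outer field to 1
outer = max(outer, 2)          # 'M' subsequently raises the outer field to 2
forwarded = egress_combine(inner, outer)
print(forwarded)               # max(3, 2) = 3
```
]]></artwork>
        </figure>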

        <t>It might be useful for the tunnel egress to be able to tell whether
        congestion occurred across a tunnel or upstream of it. If outer header
        congestion marking was reset at the tunnel ingress ('I'), by the end
        of a tunnel ('E') the outer headers would indicate congestion
        experienced across the tunnel ('I' to 'E'), while the inner header
        would indicate congestion upstream of 'I'. But the same information
        could be gleaned even if the tunnel ingress copied the inner to the
        outer headers. By the end of the tunnel ('E'), any packet with an
        <spanx style="emph">extra</spanx> mark in the outer header relative to
        the inner header would indicate congestion across the tunnel ('I' to
        'E'), while the inner header would still indicate congestion upstream
        of 'I'.</t>
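        <t>As a rough sketch of this comparison for the 2-bit ECN field, the
        egress could classify each arriving packet as follows. The function
        names and the ingress_copies flag are hypothetical, introduced only
        for illustration; codepoints are written as the strings used elsewhere
        in this document.</t>

        <figure>
          <artwork><![CDATA[
```python
CE = "CE"  # congestion experienced


def marked_upstream(inner: str) -> bool:
    """The inner header records congestion upstream of 'I' in both cases."""
    return inner == CE


def marked_in_tunnel(inner: str, outer: str, ingress_copies: bool) -> bool:
    """Did this packet pick up a mark between 'I' and 'E'?

    If 'I' reset the outer field, any outer CE arose within the tunnel.
    If 'I' copied it, only an *extra* outer mark (outer CE while the
    inner is not CE) shows congestion within the tunnel.
    """
    if ingress_copies:
        return outer == CE and inner != CE
    return outer == CE


# A packet that entered the tunnel as ECT(0) and was marked by 'M':
print(marked_upstream("ECT(0)"), marked_in_tunnel("ECT(0)", CE, True))
```
]]></artwork>
        </figure>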

        <t>All this shows that 'E' can preserve the control loop irrespective
        of whether 'I' copies congestion notification into the outer header or
        resets it.</t>
      </section>

      <section anchor="ecnpush_Mgmt_Constraints"
               title="Management Constraints">
        <t>As well as control, there are also management constraints.
        Specifically, a management system may monitor congestion markings in
        passing packets, perhaps at the border between networks as part of a
        service level agreement. For instance, monitors at the borders of
        autonomous systems may need to measure how much congestion has
        accumulated since the original source to determine between them how
        much of the congestion is contributed by each domain.</t>

        <t>Therefore it should be clear how far back in the path the
        congestion markings have accumulated from. In this document we term
        this the baseline of the congestion marking, i.e. the source of the
        layer that last reset rather than copied the congestion notification
        field when creating an outer header. Given some tunnels cross domain
        borders (e.g. consider that 'M' in <xref
        target="ecnpush_Fig_Tunnel_Scenario"></xref> is monitoring a border),
        it is therefore desirable for 'I' to copy congestion accumulated so
        far into the outer headers exposed across the tunnel.</t>

        <t><xref target="ecnpush_In-path_Load_Regulation"></xref> discusses
        various scenarios where the Load Regulator lies in-path, not at the
        source host as we would typically expect. It concludes that the
        baseline for congestion notification should be determined by where the
        Load Regulator function is, whether it is at the source host or within
        the path. Therefore every tunnel ingress should copy the ECN field
        into the outer header it creates unless it is also a Load Regulator,
        in which case it should reset any CE markings, which is an exception
        to the normal copying rule for a tunnel ingress.</t>
      </section>
    </section>

    <section anchor="ecnpush_Design_Principles" title="Design Principles">
      <t>The constraints from the three perspectives of security, control and
      management in <xref target="ecnpush_Design_Constraints"></xref> are
      somewhat in tension as to whether a tunnel ingress should copy
      congestion markings into the outer header it creates or reset them. From
      the control perspective either copying or resetting works. From the
      management perspective copying is preferable (with the exception of an
      in-path load regulator). From the security perspective resetting is
      preferable but copying is now considered acceptable given the bandwidth
      of a 2-bit covert channel can be managed.</t>

      <t>Therefore an outer encapsulating header capable of carrying
      congestion markings SHOULD reflect accumulated congestion since the last
      interface designed to regulate load (the Load Regulator). This implies
      congestion notification SHOULD be copied into the outer header of each
      new encapsulating header that supports it&mdash;except at an in-path
      Load Regulator. An in-path Load Regulator knows its function is to
      regulate load, so if it also acts as the ingress to a tunnel, in every
      new outer header it creates it MUST reset any congestion marking.</t>

      <t>The Load Regulator is the node to which congestion feedback should be
      returned by the next downstream node with a transport layer function
      (typically but not always the data receiver). The Load Regulator is not
      always (or even typically) the same thing as the node identified by the
      source address of the outermost exposed header. In general the
      addressing of the outermost encapsulation header says nothing about the
      identifiers of either the upstream or the downstream transport layer
      functions. As long as the transport functions know each other's
      addresses, they don't have to be identified in the network layer or in
      any link layer. It was only a convenience that a TCP receiver assumed
      that the address of the source transport is the same as the network
      layer source address of a packet it receives.</t>

      <t>More generally, the return transport address could be identified
      solely in the transport layer protocol. For instance, a signalling
      protocol like RSVP <xref target="RFC2205"></xref> breaks up a path into
      transport layer hops and informs each hop of the address of its
      transport layer neighbour without any need to identify these hops in the
      network layer. RSVP can be arranged so that these transport layer hops
      are bigger than the underlying network layer hops. The host identity
      protocol (HIP) architecture <xref target="RFC4423"></xref> also supports
      the same principled separation (for mobility amongst other things),
      where the transport layer receiver identifies the transport layer sender
      using an identifier provided by the transport layer, which gets mapped
      to a network layer address below the transport layer.</t>

      <t>Note that this principle deliberately doesn't require a packet header
      to reveal the origin address of the baseline that congestion
      notification has accumulated from. It is not necessary for the network
      and lower layers to know the address of the Load Regulator. Only the
      destination transport needs to know that. With congestion notification,
      the network and link layers only notify congestion forwards, they aren't
      involved in feeding it backwards. If they are, e.g. backward congestion
      notification (BCN) in Ethernet <xref target="802.1au"></xref>, that
      should be considered as a transport function added to the lower layer,
      which must sort out its own addressing. Indeed, this is one reason why
      ICMP source quench is now deprecated <xref target="RFC1254"></xref>;
      when congestion occurs within a tunnel it is complex (particularly in
      the case of IPsec tunnels) to return the ICMP messages beyond the tunnel
      ingress back to the Load Regulator.</t>

      <t>Similarly, if a management system is monitoring congestion and needs
      to know the baseline of congestion notification, the management system
      has to find this out from the transport; in general it cannot tell
      solely by looking at the network or link layer headers.</t>

      <t>We have said that a tunnel ingress that is not a Load Regulator
      SHOULD (as opposed to MUST) copy incoming congestion notification into
      an outer encapsulating header that supports it. In the case of 2-bit
      ECN, the IETF security area has deemed that the benefit always outweighs the
      risk. Therefore for 2-bit ECN we can and we will say 'MUST' (<xref
      target="ecnpush_ECN_Tunnel_Rules"></xref>). But in this section where we
      are setting down general design principles, we leave it as a 'SHOULD'.
      This allows for future multi-bit congestion notification fields where
      the risk from the covert channel created by copying congestion
      notification might outweigh the congestion control benefit of
      copying.</t>
    </section>

    <section anchor="ecnpush_ECN_Tunnel_Rules"
             title="Default ECN Tunnelling Rules">
      <t>The following ECN tunnel processing rules are the default for a
      packet with any DSCP. If required, different ECN processing rules MAY be
      defined for the appropriate Diffserv PHB using the guidelines in <xref
      target="ecnpush_Design_Principles"></xref>.</t>

      <t>When a tunnel ingress creates an encapsulating IP header, the 2-bit
      ECN field of the inner IP header MUST be copied into the outer IP
      header, for all types of IP in IP tunnel (except if the tunnel ingress
      is in compatibility mode&mdash;see <xref
      target="ecnpush_Backward_Compatibility"></xref>). If the tunnel ingress
      is also a Load Regulator, it MUST instead reset the outer header to
      ECT(0).</t>
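      <t>A minimal sketch of this ingress rule follows. The codepoint values
      are those of RFC3168; everything else is an assumption made for
      illustration: the function name and its two flags are hypothetical, the
      Load Regulator case is read as resetting only CE (other codepoints are
      copied), and compatibility mode is assumed to take precedence, sending
      Not-ECT in the outer header as the Backward Compatibility section
      requires.</t>

      <figure>
        <artwork><![CDATA[
```python
# 2-bit ECN codepoints as defined in RFC3168.
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def outer_ecn_at_ingress(inner_ecn: int,
                         compatibility_mode: bool = False,
                         load_regulator: bool = False) -> int:
    """Choose the ECN field of the outer header created at the ingress."""
    if compatibility_mode:
        # Egress not known to understand ECN: never expose an
        # ECN-capable outer header (assumed to override other cases).
        return NOT_ECT
    if load_regulator and inner_ecn == CE:
        # An in-path Load Regulator starts a new baseline
        # (assumption: only CE is reset, to ECT(0)).
        return ECT0
    # Normal mode: copy the inner field unchanged.
    return inner_ecn


print(format(outer_ecn_at_ingress(CE), "02b"))                       # 11
print(format(outer_ecn_at_ingress(CE, load_regulator=True), "02b"))  # 10
```
]]></artwork>
      </figure>

      <t>In this reading, the only difference between an ordinary ingress and
      a Load Regulator is the treatment of an incoming CE marking.</t>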

      <t>To decapsulate the inner header at the tunnel egress, the outgoing
      inner header MUST be calculated from the combination of the incoming
      inner and outer headers setting the outgoing ECN field to the codepoints
      displayed in the body of <xref
      target="ecnpush_Tab_IP_IP_Decapsulation"></xref>.</t>

      <texttable align="center" anchor="ecnpush_Tab_IP_IP_Decapsulation"
                 title="IP in IP Decapsulation">
        <preamble>+--Incoming Outer Header---</preamble>

        <ttcol align="center">Incoming Inner Header</ttcol>

        <ttcol align="center">Not-ECT</ttcol>

        <ttcol align="center">ECT(0)</ttcol>

        <ttcol align="center">ECT(1)</ttcol>

        <ttcol align="center">CE</ttcol>

        <c>Not-ECT</c>

        <c>Not-ECT</c>

        <c>drop (!!!)</c>

        <c>drop (!!!)</c>

        <c>drop (!!!)</c>

        <c>ECT(0)</c>

        <c>ECT(0)</c>

        <c>ECT(0)</c>

        <c>ECT(0)</c>

        <c>CE</c>

        <c>ECT(1)</c>

        <c>ECT(1)</c>

        <c>ECT(1)</c>

        <c>ECT(1)</c>

        <c>CE</c>

        <c>CE</c>

        <c>CE</c>

        <c>CE (!!!)</c>

        <c>CE (!!!)</c>

        <c>CE</c>

        <postamble>+-----Outgoing Header------</postamble>
      </texttable>

      <t>The exclamation marks '(!!!)' in <xref
      target="ecnpush_Tab_IP_IP_Decapsulation"></xref> indicate that the
      combination of inner and outer headers should not be possible if only
      legal transitions have taken place. So, the decapsulator should drop the
      packet or set the ECN field as the table specifies, but it MAY also raise
      an appropriate alarm. It MUST NOT raise alarms so often that the illegal
      combinations would amplify into a flood of alarm messages.</t>
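
      <t>The encapsulation rule and the decapsulation table above can be
      sketched in code. The following Python is a minimal illustration under
      stated assumptions, not a normative implementation. The names ECN,
      encapsulate and decapsulate are hypothetical, the 2-bit values are the
      codepoints of RFC3168, and giving compatibility mode precedence over
      the Load Regulator reset is an assumption of this sketch, because the
      text above does not state an order.</t>

      <figure>
        <artwork><![CDATA[
```python
from enum import Enum


class ECN(Enum):
    """The four codepoints of the 2-bit ECN field (RFC3168)."""
    NOT_ECT = 0b00
    ECT1 = 0b01
    ECT0 = 0b10
    CE = 0b11


def encapsulate(inner, compatibility_mode=False, load_regulator=False):
    """Return the ECN field for the outer header built at the ingress."""
    if compatibility_mode:
        # Compatibility mode: all outer headers are Not-ECT.
        # (Checking this first is an assumption of the sketch.)
        return ECN.NOT_ECT
    if load_regulator:
        # An ingress that is also a Load Regulator resets to ECT(0).
        return ECN.ECT0
    # Default rule: copy the inner ECN field.
    return inner


def decapsulate(inner, outer):
    """Return (outgoing ECN field or None for drop, alarm flag).

    The alarm flag is True for the '(!!!)' cells of the table; the
    caller decides whether and how often to raise an alarm.
    """
    if inner is ECN.NOT_ECT:
        if outer is ECN.NOT_ECT:
            return ECN.NOT_ECT, False
        return None, True                  # drop (!!!)
    if inner is ECN.CE:
        # CE is kept; an ECT outer header is an illegal combination.
        return ECN.CE, outer in (ECN.ECT0, ECN.ECT1)
    # Inner header is ECT(0) or ECT(1).
    if outer is ECN.CE:
        return ECN.CE, False               # congestion within the tunnel
    return inner, False                    # inner ECT value is preserved
```
]]></artwork>
      </figure>

      <t>For example, in this sketch decapsulate(ECN.ECT0, ECN.CE) yields CE
      with no alarm, while decapsulate(ECN.NOT_ECT, ECN.CE) yields a drop
      with the alarm flag set. Rate limiting of alarms is left to the
      caller.</t>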
    </section>

    <section anchor="ecnpush_Backward_Compatibility"
             title="Backward Compatibility">
      <t>A legacy tunnel egress may not know how to process an ECN field, so
      it will most likely simply disregard all outer headers. Therefore,
      unless a compliant tunnel ingress has established that the tunnel egress
      understands ECN processing, it MUST only send packets with the ECN field
      set to Not-ECT in the outer header. Otherwise, if ECN-capable outer
      headers were sent towards a legacy egress, the egress would discard
      them and so dangerously remove information about congestion
      experienced within the tunnel.</t>

      <t>A tunnel ingress may establish whether its tunnel egress will
      understand ECN processing by configuration or by negotiation. Note that
      a <xref target="RFC4301"></xref> tunnel ingress that has used IKEv2 key
      management <xref target="RFC4306"></xref> can guarantee that the tunnel
      egress is also RFC4301-compliant and therefore need not negotiate ECN
      capabilities.</t>

      <t>To be compliant with this specification a tunnel ingress that does
      not know the egress ECN capability (e.g. by configuration) MUST
      implement a 'normal' mode and a 'compatibility' mode, and it MUST
      initiate each negotiated tunnel in compatibility mode. On the other
      hand, a compliant tunnel egress MUST implement only the one behaviour
      in <xref target="ecnpush_ECN_Tunnel_Rules"></xref>, which we term
      'full-functionality' mode.</t>

      <t>Before switching to normal mode, a compliant tunnel ingress that does
      not know the egress ECN capability (e.g. by configuration) MUST
      negotiate with the tunnel egress to establish whether the egress is in
      full-functionality mode. If the egress is in full-functionality mode,
      the ingress puts itself into normal mode. In normal mode the ingress
      follows the encapsulation rule in <xref
      target="ecnpush_ECN_Tunnel_Rules"></xref> (i.e. it copies the inner ECN
      field into the outer header). If the egress is not in full-functionality
      mode or doesn't understand the question, the tunnel ingress MUST remain
      in compatibility mode.</t>

      <t>A tunnel ingress in compatibility mode MUST set all outer headers to
      Not-ECT.</t>

      <t>The decapsulation rules for the egress of the tunnel in <xref
      target="ecnpush_ECN_Tunnel_Rules"></xref> have been defined in such a
      way that congestion control will still work safely if any of the earlier
      versions of ECN processing are used unilaterally at the encapsulating
      ingress of the tunnel. If a tunnel ingress tries to negotiate to use
      limited functionality mode or full functionality mode, a decapsulating
      tunnel egress compliant with this specification MUST agree to the
      request, even though its behaviour will be the same in both cases. For
      'forward compatibility', a compliant tunnel egress MUST raise a warning
      about any requests to enter modes it doesn't recognise, but it can
      continue operating. If no ECN-related mode is requested, no error or
      warning need be raised as the egress behaviour is compatible with all
      the legacy ingress behaviours that don't negotiate capabilities.</t>

      <!--{ToDo: Consider whether we can make life easier for further aspects of forward compatibility}-->

      <t>Note that if a compliant node is the ingress for multiple tunnels, a
      mode setting will need to be stored for each tunnel. However, if a node
      is the egress for multiple tunnels, no mode setting needs to be stored
      for any of them, because a compliant egress can only be in one
      mode.</t>
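
      <t>The ingress mode handling described in this section can be sketched
      as a small per-tunnel state holder. The following Python is only an
      illustration under stated assumptions: the class and method names are
      hypothetical, the negotiation protocol itself is not modelled (its
      outcome is passed in as a string or None), ECN codepoints are plain
      strings, and the Load Regulator exception is ignored.</t>

      <figure>
        <artwork><![CDATA[
```python
from enum import Enum


class IngressMode(Enum):
    COMPATIBILITY = "compatibility"
    NORMAL = "normal"


class TunnelIngress:
    """Mode state a compliant ingress keeps for one tunnel."""

    def __init__(self, egress_known_capable=False):
        # Without prior knowledge of the egress (e.g. by
        # configuration), a tunnel starts in compatibility mode.
        self.mode = (IngressMode.NORMAL if egress_known_capable
                     else IngressMode.COMPATIBILITY)

    def on_negotiation_reply(self, reply):
        # `reply` is a hypothetical summary of the egress's answer;
        # None stands for "no answer or question not understood".
        if reply == "full-functionality":
            self.mode = IngressMode.NORMAL
        # Any other reply leaves the mode unchanged, so an ingress
        # that started in compatibility mode remains there.

    def outer_ecn(self, inner_ecn):
        # Compatibility mode: every outer header is Not-ECT.
        # Normal mode: copy the inner ECN field.
        if self.mode is IngressMode.COMPATIBILITY:
            return "Not-ECT"
        return inner_ecn


# One mode setting per tunnel for which this node is the ingress.
tunnels = {
    "to-legacy": TunnelIngress(),
    "to-compliant": TunnelIngress(),
}
tunnels["to-legacy"].on_negotiation_reply(None)
tunnels["to-compliant"].on_negotiation_reply("full-functionality")
```
]]></artwork>
      </figure>

      <t>In this sketch the tunnel towards the legacy egress keeps sending
      Not-ECT outer headers, while the tunnel towards the compliant egress
      copies the inner ECN field once negotiation succeeds. No equivalent
      state is needed at an egress.</t>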
    </section>

    <section anchor="ecnpush_RFC_Changes" title="Changes from Earlier RFCs">
      <t>The rule that a tunnel ingress MUST copy any ECN field into the outer
      header is a change to RFC3168 (unless it is a Load Regulator as well, in
      which case there is no change).</t>

      <t>The rules for calculating the outgoing ECN field on decapsulation at
      a tunnel egress are in line with the full functionality mode of ECN in
      RFC3168 and with RFC4301, except that neither identified the need to
      raise an alarm if the inner header was CE but the outer header was
      ECT.</t>

      <t>The rules for how a tunnel establishes whether the egress has full
      functionality ECN capabilities are an update to RFC3168. For all the
      typical cases, RFC4301 is not updated by the ECN capability check in
      this specification, because a typical RFC4301 tunnel ingress will have
      already established that it is talking to an RFC4301 tunnel egress (e.g.
      if it uses IKEv2). However, there may be some corner cases (e.g. manual
      keying) where an RFC4301 tunnel ingress talks with an egress with
      limited functionality ECN handling. For such corner cases, the
      requirement to use compatibility mode in this specification updates
      RFC4301.</t>

      <t>The optional ECN Tunnel field in the IPsec security association
      database (SAD) and the optional ECN Tunnel Security Association
      Attribute defined in RFC3168 are no longer needed. The security
      association (SA) has no policy on ECN usage, because all RFC4301 tunnels
      now support ECN without any policy choice. </t>

      <t>RFC3168 defines a (required) limited functionality mode and an
      (optional) full functionality mode for a tunnel, but RFC4301 doesn't
      need modes. In this specification only the ingress might need two modes,
      unlike the modes of RFC3168 that were properties of the pair of tunnel
      endpoints after negotiation.</t>

      <!--{ToDo: Might not need this next line.}-->

      <t>All these ECN processing rules update RFC2003 on IP in IP
      tunnelling.</t>
    </section>

    <!-- ================================================================ -->

    <section anchor="ecnpush_IANA_Considerations" title="IANA Considerations">
      <t>This memo includes no request to IANA.</t>
    </section>

    <!-- ================================================================ -->

    <section anchor="ecnpush_Security_Considerations"
             title="Security Considerations">
      <t><xref target="ecnpush_Security_Constraints"></xref> discusses the
      security constraints imposed on ECN tunnel processing. The Design
      Principles of <xref target="ecnpush_Design_Principles"></xref> trade
      off security (covert channels) against congestion monitoring and
      control. In fact, ensuring congestion markings are not lost is itself
      another aspect of security, because if we allowed congestion
      notification to be lost, any attempt to enforce a response to congestion
      would be much harder.</t>

      <t>We keep the behaviour defined in both RFC3168 and RFC4301 where, if
      the inner and outer headers carry contradictory ECT values, the inner
      header is preserved for onward forwarding. However, in writing this
      document we noticed this behaviour would hide illegal suppression of
      congestion notification from the detection mechanism designed for this
      attack. One reason two ECT codepoints were defined was to enable the
      source to detect if a CE marking had been applied then subsequently
      removed. The source could detect this by weaving a pseudo-random
      sequence of ECT(0) and ECT(1) values into a stream of packets <xref
      target="RFC3540"></xref>. With the rules as they stand in RFC3168 and
      RFC4301, within a tunnel a CE marking could be added and subsequently
      removed by a non-compliant node without detection, because the evidence
      of such misbehaviour is removed by the decapsulator.</t>

      <t>We could have specified that an outer header value of ECT should
      overwrite a contradictory ECT value in the inner header to close this
      loophole. But we chose not to for two reasons: i) we wanted to avoid any
      changes to IPsec tunnelling behaviour; ii) allowing ECT values in the
      outer header to override the inner header would have increased the
      bandwidth of the covert channel through the egress gateway from 1 to 1.5
      bits per datagram, potentially threatening to upset the consensus
      established in the security area that says that the bandwidth of this
      covert channel can now be safely managed.</t>

      <!--If a security policy configures a legacy tunnel ingress to negotiate to turn off ECN processing, a compliant tunnel egress will say that it has turned off ECN processing but it will still copy CE markings from the outer to the forwarded header. Although the tunnel ingress 'I' will set all ECN fields in outer headers to Not-ECT, 'M' could still toggle CE on and off to communicate covertly with 'B', because we have specified that 'E' only has one mode regardless of what mode it says it has negotiated. We could have specified that 'E' should have a limited functionality mode and check for such behaviour, but we decided to avoid extra complexity on a compliant tunnel egress just to cater for a legacy security concern that is now considered manageable.-->
    </section>

    <!-- ================================================================ -->

    <section anchor="ecnpush_Conclusions" title="Conclusions">
      <t>This document updates the tunnelling treatment of RFC3168 ECN for all
      IP in IP tunnels to bring it into line with the new behaviour in the
      IPsec architecture of RFC4301.</t>

      <t>At the tunnel egress, header decapsulation for the default ECN
      marking behaviour is broadly unchanged except that one exceptional case
      has been catered for. At the ingress, for all forms of IP in IP tunnel,
      encapsulation has been brought into line with the new IPsec rules in
      RFC4301 which copy rather than reset CE markings when creating outer
      headers. Previously, upstream congestion information was not revealed in
      the outer header, which limited the scope of some management monitoring
      techniques and prevented certain active queue management algorithms from
      taking account of upstream congestion markings. The change ensures all
      IP in IP tunnels reflect the more relaxed attitude to revealing
      congestion information in the new IPsec architecture, which now deems
      that the threat from 2-bit covert channels can be managed without
      disabling ECN. </t>

      <t>Also, this document defines more generic principles to guide the
      design of alternate forms of tunnel processing of congestion
      notification, if required for specific Diffserv PHBs (as the work of the
      PCN working group will require) or for other lower-layer
      encapsulating protocols that might support congestion notification in
      the future (e.g. MPLS).</t>
    </section>

    <!-- ================================================================ -->

    <section anchor="ecnpush_Acknowledgements" title="Acknowledgements">
      <t>Thanks to David Black, Bruce Davie, Toby Moncaster and Gabriele
      Corliano for their careful review comments.</t>
    </section>

    <!-- ================================================================ -->

    <section anchor="ecnpush_Comments_Solicited" title="Comments Solicited">
      <t>Comments and questions are encouraged and very welcome. They can be
      addressed to the IETF Transport Area working group mailing list
      &lt;tsvwg@ietf.org&gt;, and/or to the authors.</t>
    </section>
  </middle>

  <back>
    <!-- ================================================================ -->

    <references title="Normative References">
      <?rfc include="reference.RFC.2003" ?>

      <?rfc include="reference.RFC.2119" ?>

      <?rfc include='reference.RFC.2474'?>

      <?rfc include='reference.RFC.3168'?>

      <?rfc include='reference.RFC.4301'?>
    </references>

    <references title="Informative References">
      <?rfc include='reference.RFC.1254'?>

      <?rfc include='reference.RFC.2205'?>

      <?rfc include='reference.RFC.2637'?>

      <?rfc include='reference.RFC.2661'?>

      <?rfc include='reference.RFC.1701'?>

      <?rfc include='reference.RFC.3426'?>

      <?rfc include='reference.RFC.3540'?>

      <?rfc include='reference.RFC.4423'?>

      <?rfc include='reference.RFC.4306'?>

      <?rfc include='localref.Reid97.ATM_SDH_Sonet.xml'?>

      <?rfc include='localref.IEEE802.1auCongNotif.xml'?>

      <?rfc include='reference.I-D.ietf-tsvwg-ecn-mpls.xml'?>

      <?rfc include='localref.I-D.eardley-pcn-architecture'?>

      <?rfc include='localref.I-D.shayman-ecn-mpls'?>

      <?rfc include="localref.IESG.PCN_charter" ?>

      <?rfc include='reference.I-D.rosen-pwe3-congestion'?>
    </references>

    <section anchor="ecnpush_In-path_Load_Regulation"
             title="In-path Load Regulation">
      <t>In the traditional Internet architecture one tends to think of the
      source host as the Load Regulator for a path. It is generally not
      desirable or practical for a node part way along the path to regulate
      the load. However, various reasonable proposals for in-path load
      regulation have been made from time to time (e.g. fair queuing, traffic
      engineering). Also the IETF has recently chartered a working group to
      standardise admission control across a part of a path using
      pre-congestion notification (PCN) <xref target="PCNcharter"></xref>,
      which involves in-path load regulation. This is of particular relevance
      here because it involves congestion notification with an in-path Load
      Regulator and it can involve tunnelling.</t>

      <t>We will use the more complex scenario in <xref
      target="ecnpush_Fig_Complex_Tunnel_Scenario"></xref> to tease out all
      the issues that arise when combining congestion notification and
      tunnelling with various possible in-path load regulation schemes. In
      this case 'I1' and 'E2' break up the path into three separate congestion
      control loops. The feedback for these loops is shown going right to left
      across the top of the figure. The 'V's are arrow heads representing the
      direction of feedback, not letters. But there are also two tunnels
      within the middle control loop: 'I1' to 'E1' and 'I2' to 'E2'. The two
      tunnels might be VPNs, perhaps over two MPLS core networks. 'M' is a
      congestion monitoring point, perhaps between two border routers where
      the same tunnel continues unbroken across the border.</t>

      <figure anchor="ecnpush_Fig_Complex_Tunnel_Scenario"
              title="Complex Tunnel Scenario">
        <artwork><![CDATA[      ______     _______________________________________      _____
     /      \   /                                        \   /     \
    V        \ V                                M         \ V       \
    A--->R--->I1===========>E1----->I2=========>==========>E2------->B
]]></artwork>

        <postamble></postamble>
      </figure>

      <t>The question is, should the congestion markings in the outer exposed
      headers of a tunnel represent congestion only since the tunnel ingress
      or over the whole upstream path from the source of the inner header
      (whatever that may mean)? Or put another way, should 'I1' and 'I2' copy
      or reset CE markings?</t>

      <!--{ToDo: In the management monitoring part, I have implied (but not highlighted) that if a non-IP protocol 
exposed header doesn't support congestion notification, then clearly a monitoring system won't need to 
know where the baseline is. But if it is encapsulated later on by a header that does support congestion 
notification (ie a sandwich of capable-incapable-capable), it won't be able to copy CE from the innermost 
to the outermost.}-->

      <t>The answer is that the baseline of congestion marking should be the
      nearest upstream interface designed to regulate traffic load&mdash;the
      Load Regulator. In <xref
      target="ecnpush_Fig_Complex_Tunnel_Scenario"></xref>, 'A', 'I1' and 'E2'
      are all Load Regulators. We have shown the feedback loops returning to
      each of these nodes so that they can regulate the load causing the
      congestion notification. So the baseline for congestion markings exposed
      to M should be 'I1' (the Load Regulator), not 'I2'. That is, 'I2' SHOULD
      copy any CE marking into the outer header it creates, while 'I1' is an
      exception because it is an in-path load regulator, so it should reset
      the ECN field in the outer header it creates.</t>

      <t>The following further examples illustrate how this answer might be
      applied:</t>

      <t><list style="symbols">
          <t>Preemption marking is currently defined for PCN <xref
          target="PCN-arch"></xref> so that the rate of unmarked packets at
          the end of a path of multiple bottlenecks determines the maximum
          sustainable aggregate bit rate over that path. To produce the
          correct marking by the end, each congested node must only consider
          packets to be eligible for marking if they have not already been
          marked by any previous bottleneck along a path that may span
          multiple tunnels (including MPLS encapsulations etc.). This scheme
          only results in the correct marking rate if the markings accumulated
          so far along the path are copied into the outer exposed header of
          each tunnel or encapsulation. Consider that 'I1' and 'E2' in the
          complex scenario of <xref
          target="ecnpush_Fig_Complex_Tunnel_Scenario"></xref> are edge
          gateways of a PCN region. Admission control based on PCN
          measurements is a form of load regulation, so 'I1' regulates the
          load on the PCN region. Therefore 'I1' should be the baseline of
          congestion marking for <spanx style="emph">both</spanx> tunnels
          within the scope of its feedback loop. Therefore 'I2' should follow
          the normal rules and copy congestion marking into the outer tunnel
          header, while 'I1' is an exception because it is also a load
          regulator, so it should reset CE markings in the outer header.</t>

          <t><xref target="Shayman"></xref> suggested feedback of ECN
          accumulated across an MPLS domain could cause the ingress to trigger
          re-routing to mitigate congestion. This case is more like the simple
          scenario of <xref target="ecnpush_Fig_Tunnel_Scenario"></xref>, with
          a feedback loop across the MPLS domain ('E' back to 'I'). The
          baseline for congestion exposed in outer headers in this case will
          be the tunnel ingress, which should therefore reset the ECN field in
          the outer headers it creates. But the reason it should act as the
          baseline is that it is an in-path load regulator (re-routing
          around congestion is a load regulation function), not just that
          it is a tunnel ingress.</t>

          <t>The PWE3 working group of the IETF is considering the problem of
          whether and how an aggregate pseudowire emulation should respond
          to congestion <xref target="I-D.rosen-pwe3-congestion"></xref>.
          Although the study is still at the requirements stage, some
          (controversial) solution proposals include in-path load regulation
          at the ingress to the tunnel that could lead to tunnel arrangements
          with similar complexity to that of <xref
          target="ecnpush_Fig_Complex_Tunnel_Scenario"></xref>.</t>
        </list></t>

      <t>These are not contrived scenarios&mdash;they could be a lot worse.
      For instance, a host may create a tunnel for IPsec which is placed
      inside a tunnel for Mobile IP over a remote part of its path. And around
      all this we may have MPLS labels being pushed and popped as packets pass
      across different core networks. Similarly, it is possible that subnets
      could be built from link technology (e.g. Ethernet switches) so that
      link headers being added and removed could involve congestion
      notification in future link headers with all the same issues as with IP
      in IP tunnels.</t>

      <t>The reason we introduced the concept of a Load Regulator was to allow
      for in-path load regulation. In the traditional Internet architecture
      one tends to think of a host and a Load Regulator as synonymous, but
      when considering tunnelling, even the definition of a host is too fuzzy,
      whereas a Load Regulator is a clearly defined function. Similarly, the
      concept of innermost header is too fuzzy to be able to (wrongly) say
      that the source address of the innermost header should be the baseline.
      Which is the innermost header when multiple encapsulations may be in
      use? Where do we stop? If we say the original source in the above
      IPsec-Mobile IP case is the host, how do we know it isn't tunnelling an
      encrypted packet stream on behalf of another host in a p2p network?</t>

      <t>The reason there has been so much confusion over the question of
      whether a tunnel ingress should copy or reset CE markings is that we
      have become used to thinking that only hosts regulate load. The end to
      end design principle advises that this is a good idea <xref
      target="RFC3426"></xref>, but it also advises that it is only a guiding
      principle intended to make the designer think very carefully before
      breaking it. We do have proposals where load regulation functions sit
      within a network path for good, if sometimes controversial, reasons,
      e.g. PCN edge admission control gateways <xref target="PCN-arch"></xref>
      or traffic engineering functions at domain borders to re-route around
      congestion <xref target="Shayman"></xref>.</t>
    </section>
  </back>
</rfc>