.. include:: replace.txt
.. highlight:: cpp

TCP models in ns-3
------------------

This chapter describes the TCP models available in |ns3|.

Generic support for TCP
***********************

|ns3| was written to support multiple TCP implementations. The implementations
inherit from a few common header classes in the ``src/network`` directory, so
that user code can swap out implementations with minimal changes to the
scripts.

There are two important abstract base classes:

* class :cpp:class:`TcpSocket`: This is defined in
  ``src/internet/model/tcp-socket.{cc,h}``. This class exists for hosting
  TcpSocket attributes that can be reused across different implementations.
  For instance, the attribute ``InitialCwnd`` can be used for any of the
  implementations that derive from class :cpp:class:`TcpSocket`.
* class :cpp:class:`TcpSocketFactory`: This is used by the layer-4 protocol
  instance to create TCP sockets of the right type.

There are presently three implementations of TCP available for |ns3|.

* a natively implemented TCP for ns-3
* support for the `Network Simulation Cradle (NSC) <http://www.wand.net.nz/~stj2/nsc/>`__
* support for `Direct Code Execution (DCE) <https://www.nsnam.org/overview/projects/direct-code-execution/>`__

It should also be mentioned that various ways of combining virtual machines
with |ns3| make some additional TCP implementations available as well, but
those are out of scope for this chapter.

ns-3 TCP
********

In brief, the native |ns3| TCP model supports a full bidirectional TCP with
connection setup and close logic. Several congestion control algorithms are
supported, with NewReno the default; Westwood, Hybla, HighSpeed, Vegas,
Scalable, Veno, Binary Increase Congestion Control (BIC), Yet Another
HighSpeed TCP (YeAH), Illinois, H-TCP and Low Extra Delay Background Transport
(LEDBAT) are also supported. The model also supports Selective
Acknowledgements (SACK). Multipath-TCP is not yet supported in the |ns3|
releases.

Model history
+++++++++++++

Until the ns-3.10 release, |ns3| contained a port of the TCP model from `GTNetS
<http://www.ece.gatech.edu/research/labs/MANIACS/GTNetS/index.html>`_.
This implementation was substantially rewritten by Adrian Tam for ns-3.10.
In 2015, the TCP module was redesigned in order to create a better environment
for creating and carrying out automated tests. One of the main changes
involves congestion control algorithms, and how they are implemented.

Before the ns-3.25 release, a congestion control was considered a stand-alone
TCP through an inheritance relation: each congestion control (e.g. TcpNewReno)
was a subclass of TcpSocketBase, reimplementing some inherited methods. The
architecture was redone to avoid this inheritance by making each congestion
control a separate class, and by defining an interface to exchange important
data between TcpSocketBase and the congestion modules. Similar modularity is
used in Linux.

Along with congestion control, the Fast Retransmit and Fast Recovery
algorithms have been modified; in previous releases, these algorithms were
delegated to TcpSocketBase subclasses. Starting from ns-3.25, they have been
merged inside TcpSocketBase. In future releases, they can be extracted as
separate modules, following the congestion control design.

Usage
+++++

In many cases, usage of TCP is set at the application layer by telling the
|ns3| application which kind of socket factory to use.

Using the helper functions defined in ``src/applications/helper`` and
``src/network/helper``, here is how one would create a TCP receiver::

  // Create a packet sink on the star "hub" to receive these packets
  uint16_t port = 50000;
  Address sinkLocalAddress (InetSocketAddress (Ipv4Address::GetAny (), port));
  PacketSinkHelper sinkHelper ("ns3::TcpSocketFactory", sinkLocalAddress);
  ApplicationContainer sinkApp = sinkHelper.Install (serverNode);
  sinkApp.Start (Seconds (1.0));
  sinkApp.Stop (Seconds (10.0));

Similarly, the below snippet configures an OnOffApplication traffic source to
use TCP::

  // Create the OnOff applications to send TCP to the server
  OnOffHelper clientHelper ("ns3::TcpSocketFactory", Address ());

The careful reader will note above that we have specified the TypeId of an
abstract base class :cpp:class:`TcpSocketFactory`. How does the script tell
|ns3| that it wants the native |ns3| TCP vs. some other one? Well, when
internet stacks are added to the node, the default TCP implementation that is
aggregated to the node is the |ns3| TCP. This can be overridden, as we show
below when using the Network Simulation Cradle. So, by default, when using the
|ns3| helper API, the TCP that is aggregated to nodes with an Internet stack
is the native |ns3| TCP.

To configure the behavior of TCP, a number of parameters are exported through
the |ns3| attribute system. These are documented in the `Doxygen
<http://www.nsnam.org/doxygen/classns3_1_1_tcp_socket.html>`_ for class
:cpp:class:`TcpSocket`. For example, the maximum segment size is a settable
attribute.

To set the default socket type before any internet stack-related objects are
created, one may put the following statement at the top of the simulation
program::

  Config::SetDefault ("ns3::TcpL4Protocol::SocketType", StringValue ("ns3::TcpNewReno"));

For users who wish to have a pointer to the actual socket (so that socket
operations like Bind(), setting socket options, etc. can be done on a
per-socket basis), TCP sockets can be created by using the
``Socket::CreateSocket()`` method. The TypeId passed to CreateSocket() must be
of type :cpp:class:`ns3::SocketFactory`, so configuring the underlying socket
type must be done by twiddling the attribute associated with the underlying
TcpL4Protocol object. The easiest way to get at this would be through the
attribute configuration system. In the below example, the Node container
"n0n1" is accessed to get the zeroth element, and a socket is created on this
node::

  // Create and bind the socket...
  TypeId tid = TypeId::LookupByName ("ns3::TcpNewReno");
  Config::Set ("/NodeList/*/$ns3::TcpL4Protocol/SocketType", TypeIdValue (tid));
  Ptr<Socket> localSocket =
    Socket::CreateSocket (n0n1.Get (0), TcpSocketFactory::GetTypeId ());

Above, the "*" wild card for node number is passed to the attribute
configuration system, so that all future sockets on all nodes are set to
NewReno, not just on node 'n0n1.Get (0)'. If one wants to limit it to just the
specified node, one would have to do something like::

  // Create and bind the socket...
  TypeId tid = TypeId::LookupByName ("ns3::TcpNewReno");
  std::stringstream nodeId;
  nodeId << n0n1.Get (0)->GetId ();
  std::string specificNode = "/NodeList/" + nodeId.str () + "/$ns3::TcpL4Protocol/SocketType";
  Config::Set (specificNode, TypeIdValue (tid));
  Ptr<Socket> localSocket =
    Socket::CreateSocket (n0n1.Get (0), TcpSocketFactory::GetTypeId ());

Once a TCP socket is created, one will want to follow conventional socket
logic and either connect() and send() (for a TCP client) or bind(), listen(),
and accept() (for a TCP server). Please note that applications usually create
the sockets they use automatically, so it is not straightforward to connect
directly to them using pointers. Please refer to the source code of your
preferred application to discover how and when it creates the socket.

TCP Socket interaction and interface with Application layer
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In the following there is an analysis of the public interface of the TCP
socket, and how it can be used to interact with the socket itself. An analysis
of the callbacks fired by the socket is also carried out. Please note that,
for the sake of clarity, we will use the terminology "Sender" and "Receiver"
to clearly divide the functionality of the socket. However, in TCP these two
roles can be applied at the same time (i.e. a socket could be a sender and a
receiver at the same time): our distinction does not lose generality, since
the following definitions can be applied to both sockets in case of
full-duplex mode.

----------

**TCP state machine (for convenience)**

.. _fig-tcp-state-machine:

.. figure:: figures/tcp-state-machine.*
   :align: center

   TCP State machine

In ns-3 we are fully compliant with the state machine depicted in
Figure :ref:`fig-tcp-state-machine`.

----------

**Public interface for receivers (e.g. servers receiving data)**

*Bind()*
  Bind the socket to an address, or to a general endpoint. A general endpoint
  is an endpoint with an ephemeral port allocation (that is, a random port
  allocation) on the 0.0.0.0 IP address. For instance, in current
  applications, data senders usually bind automatically after a *Connect()*
  over a random port. Consequently, the connection will start from this random
  port towards the well-defined port of the receiver. The IP 0.0.0.0 is then
  translated by lower layers into the real IP of the device.

*Bind6()*
  Same as *Bind()*, but for IPv6.

*BindToNetDevice()*
  Bind the socket to the specified NetDevice, creating a general endpoint.

*Listen()*
  Listen on the endpoint for an incoming connection. Please note that this
  function can be called only in the TCP CLOSED state, and it causes a
  transition to the LISTEN state. When an incoming request for connection is
  detected (i.e. the other peer invoked *Connect()*) the application will be
  signaled with the callback *NotifyConnectionRequest* (set in
  *SetAcceptCallback()* beforehand). If the connection is accepted (the
  default behavior, when the associated callback is a null one) the Socket
  will fork itself, i.e. a new socket is created to handle the incoming
  data/connection, in the state SYN_RCVD. Please note that this newly created
  socket is not connected anymore to the callbacks on the "father" socket
  (e.g. DataSent, Recv); the pointer of the newly created socket is provided
  in the Callback *NotifyNewConnectionCreated* (set beforehand in
  *SetAcceptCallback*), and should be used to connect new callbacks to
  interesting events (e.g. the Recv callback). After receiving the ACK of the
  SYN-ACK, the socket will set up the congestion control, move into the
  ESTABLISHED state, and then notify the application with
  *NotifyNewConnectionCreated*.

*ShutdownSend()*
  Signal a termination of send, or in other words prevent data from being
  added to the buffer. After this call, if the buffer is already empty, the
  socket will send a FIN, otherwise the FIN will go out when the buffer
  empties. Please note that this is useful only for modeling "Sink"
  applications. If you have data to transmit, please refer to the *Send()* /
  *Close()* combination of API.

*GetRxAvailable()*
  Get the amount of data that could be returned by the Socket in one or
  multiple calls to Recv or RecvFrom. Please use the Attribute system to
  configure the maximum available space on the receiver buffer (property
  "RcvBufSize").

*Recv()*
  Grab data from the TCP socket. Please remember that TCP is a stream socket,
  and it is allowed to concatenate multiple packets into bigger ones. If no
  data is present (i.e. *GetRxAvailable* returns 0) an empty packet is
  returned. Set the callback *RecvCallback* through *SetRecvCallback()* in
  order to have the application automatically notified when some data is
  ready to be read. It's important to connect that callback to the newly
  created socket in case of forks.

*RecvFrom()*
  Same as Recv, but with the source address as parameter.

-------------------

**Public interface for senders (e.g. clients uploading data)**

*Connect()*
  Set the remote endpoint, and try to connect to it. The local endpoint
  should be set before this call, or otherwise an ephemeral one will be
  created. The TCP will then be in the SYN_SENT state. If a SYN-ACK is
  received, the TCP will set up the congestion control, and then call the
  callback *ConnectionSucceeded*.

*GetTxAvailable()*
  Return the amount of data that can be stored in the TCP Tx buffer. Set this
  property through the Attribute system ("SndBufSize").

*Send()*
  Send the data into the TCP Tx buffer. From there, the TCP rules will decide
  if, and when, this data will be transmitted. Please note that, if the tx
  buffer has enough data to fill the congestion (or the receiver) window,
  dynamically varying the rate at which data is injected into the TCP buffer
  does not have any noticeable effect on the amount of data transmitted on
  the wire, which will continue to be decided by the TCP rules.

*SendTo()*
  Same as *Send()*.

*Close()*
  Terminate the local side of the connection, by sending a FIN (after all the
  data in the tx buffer has been transmitted). This does not prevent the
  socket from receiving data, and from employing the retransmit mechanism if
  losses are detected. If the application calls *Close()* with unread data in
  its rx buffer, the socket will send a reset. If the socket is in the state
  SYN_SENT, CLOSING, LISTEN or LAST_ACK, after that call the application will
  be notified with *NotifyNormalClose()*. In all the other cases, the
  notification is delayed (see *NotifyNormalClose()*).

-----------------------------------------

**Public callbacks**

These callbacks are called by the TCP socket to notify the application of
interesting events. We will refer to these with the protected names used in
socket.h, but we will provide the API functions to set the pointers to these
callbacks as well.

*NotifyConnectionSucceeded*: *SetConnectCallback*, 1st argument
  Called in the SYN_SENT state, before moving to ESTABLISHED. In other words,
  we have sent the SYN, and we received the SYN-ACK: the socket prepares the
  sequence numbers, sends the ACK for the SYN-ACK, tries to send out more
  data (in another segment) and then invokes this callback. After this
  callback, it invokes the NotifySend callback.

*NotifyConnectionFailed*: *SetConnectCallback*, 2nd argument
  Called after the SYN retransmission count goes to 0. The SYN packet was
  lost multiple times, and the socket gives up.

*NotifyNormalClose*: *SetCloseCallbacks*, 1st argument
  A normal close is invoked. A rare case is when we receive an RST segment
  (or a segment with bad flags) in normal states. All other cases are:

  - The application tries to *Connect()* over an already connected socket
  - Received an ACK for the sent FIN, with or without the FIN bit set (we are
    in LAST_ACK)
  - The socket reaches the maximum amount of retries in retransmitting the
    SYN
  - We receive a timeout in the LAST_ACK state
  - After 2*Maximum Segment Lifetime seconds have passed since the socket
    entered the TIME_WAIT state.

*NotifyErrorClose*: *SetCloseCallbacks*, 2nd argument
  Invoked when we send an RST segment (for whatever reason) or we reached the
  maximum amount of data retries.

*NotifyConnectionRequest*: *SetAcceptCallback*, 1st argument
  Invoked in the LISTEN state, when we receive a SYN. The return value
  indicates if the socket should accept the connection (return true) or
  should ignore it (return false).

*NotifyNewConnectionCreated*: *SetAcceptCallback*, 2nd argument
  Invoked when the socket passes from SYN_RCVD to ESTABLISHED, after setting
  up the congestion control and the sequence numbers, and after processing
  the incoming ACK. If there is some space in the buffer, *NotifySend* is
  called shortly after this callback. The Socket pointer, passed with this
  callback, is the newly created socket, after a Fork().

*NotifyDataSent*: *SetDataSentCallback*
  The Socket notifies the application that some bytes have been transmitted
  at the IP level. These bytes could still be lost in the node (traffic
  control layer) or in the network.

*NotifySend*: *SetSendCallback*
  Invoked if there is some space in the tx buffer when entering the
  ESTABLISHED state (e.g. after the ACK for the SYN-ACK is received), after
  the connection succeeds (e.g. after the SYN-ACK is received) and after each
  new ACK (i.e. one that advances SND.UNA).

*NotifyDataRecv*: *SetRecvCallback*
  Called when there are in-order bytes in the receiver buffer, and when in
  FIN_WAIT_1 or FIN_WAIT_2 the socket receives an in-sequence FIN (that can
  carry data).

Congestion Control Algorithms
+++++++++++++++++++++++++++++

Here follows a list of supported TCP congestion control algorithms. For an
academic peer-reviewed paper on these congestion control algorithms, see
http://dl.acm.org/citation.cfm?id=2756518 .

New Reno
^^^^^^^^

The New Reno algorithm introduces partial ACKs inside the well-established
Reno algorithm. This and other modifications are described in RFC 6582. There
are two possible congestion window increment strategies: slow start and
congestion avoidance. Taken from RFC 5681:

  During slow start, a TCP increments cwnd by at most SMSS bytes for
  each ACK received that cumulatively acknowledges new data. Slow
  start ends when cwnd exceeds ssthresh (or, optionally, when it
  reaches it, as noted above) or when congestion is observed. While
  traditionally TCP implementations have increased cwnd by precisely
  SMSS bytes upon receipt of an ACK covering new data, we RECOMMEND
  that TCP implementations increase cwnd, per Equation :eq:`newrenocongavoid`,
  where N is the number of previously unacknowledged bytes acknowledged
  in the incoming ACK.

.. math:: cwnd += min (N, SMSS)
   :label: newrenocongavoid

During congestion avoidance, cwnd is incremented by roughly 1 full-sized
segment per round-trip time (RTT), and for each congestion event, the slow
start threshold is halved.
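
The two increment rules above can be sketched as a plain function operating on
byte counts (a simplified model for illustration, not the ns-3
implementation; the function name is hypothetical)::

  #include <algorithm>
  #include <cassert>
  #include <cstdint>

  // NewReno-style window growth on an ACK acknowledging newAckedBytes
  // previously unacknowledged bytes. All quantities are in bytes.
  uint32_t
  NewRenoIncrease (uint32_t cwnd, uint32_t ssthresh, uint32_t segmentSize,
                   uint32_t newAckedBytes)
  {
    if (cwnd < ssthresh)
      {
        // Slow start: cwnd += min (N, SMSS), per RFC 5681
        cwnd += std::min (newAckedBytes, segmentSize);
      }
    else
      {
        // Congestion avoidance: roughly one segment per RTT, approximated
        // as SMSS * SMSS / cwnd per received ACK
        cwnd += std::max (1u, segmentSize * segmentSize / cwnd);
      }
    return cwnd;
  }

Note how, below ssthresh, each ACK grows the window by up to one segment,
while above it the per-ACK increment shrinks as the window grows.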

High Speed
^^^^^^^^^^

TCP HighSpeed is designed for high-capacity channels or, in general, for TCP
connections with large congestion windows. Conceptually, with respect to the
standard TCP, HighSpeed makes the cWnd grow faster during the probing phases
and accelerates the cWnd recovery from losses. This behavior is executed only
when the window grows beyond a certain threshold, which allows TCP HighSpeed
to be friendly with standard TCP in environments with heavy congestion,
without introducing new dangers of congestion collapse.

Mathematically:

.. math:: cWnd = cWnd + \frac{a(cWnd)}{cWnd}

The function a() is calculated using a fixed RTT value of 100 ms (the lookup
table for this function is taken from RFC 3649). For each congestion event,
the slow start threshold is decreased by a value that depends on the size of
the slow start threshold itself. Then, the congestion window is set to such
value.

.. math:: cWnd = (1-b(cWnd)) \cdot cWnd

The lookup table for the function b() is taken from the same RFC.
More information at: http://dl.acm.org/citation.cfm?id=2756518
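
The two update rules can be sketched as follows. Only the low-window regime of
RFC 3649 is encoded (for windows up to 38 segments, a = 1 and b = 0.5, i.e.
standard TCP behavior); larger windows would consult the RFC's lookup table,
which is not reproduced in this sketch::

  #include <cassert>

  // a() and b() restricted to the low-window regime of RFC 3649
  // (w <= 38 segments); beyond that, values come from the RFC lookup table.
  double HsA (double w) { assert (w <= 38.0); return 1.0; }
  double HsB (double w) { assert (w <= 38.0); return 0.5; }

  // Per-ACK growth: cWnd += a(cWnd) / cWnd  (cWnd in segments)
  double HighSpeedAckUpdate (double cWnd) { return cWnd + HsA (cWnd) / cWnd; }

  // On a congestion event: cWnd = (1 - b(cWnd)) * cWnd
  double HighSpeedLossUpdate (double cWnd) { return (1.0 - HsB (cWnd)) * cWnd; }

In this regime the rules reduce to Reno: one segment of growth per RTT and a
halving on loss, which is exactly the "friendly with standard TCP" property
described above.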

Hybla
^^^^^

The key idea behind TCP Hybla is to obtain for long-RTT connections the same
instantaneous transmission rate of a reference TCP connection with lower RTT.
With analytical steps, it is shown that this goal can be achieved by modifying
the time scale, in order for the throughput to be independent of the RTT. This
independence is obtained through the use of a coefficient rho.

This coefficient is used to calculate both the slow start threshold and the
congestion window when in slow start and in congestion avoidance,
respectively.

More information at: http://dl.acm.org/citation.cfm?id=2756518
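
The per-ACK growth rules from the Hybla paper can be sketched as below (a
simplified model, not the ns-3 code; rho = RTT / RTT0, where RTT0 is the
reference round-trip time, clamped to at least 1 so that short-RTT flows
behave like standard TCP)::

  #include <algorithm>
  #include <cassert>
  #include <cmath>

  // Hybla window growth per ACK (cWnd and ssthresh in segments, RTTs in
  // seconds). Slow start: cWnd += 2^rho - 1; congestion avoidance:
  // cWnd += rho^2 / cWnd.
  double
  HyblaAckUpdate (double cWnd, double ssthresh, double rtt, double rtt0)
  {
    double rho = std::max (rtt / rtt0, 1.0);
    if (cWnd < ssthresh)
      {
        return cWnd + std::pow (2.0, rho) - 1.0; // slow start
      }
    return cWnd + rho * rho / cWnd; // congestion avoidance
  }

With rho = 1 both rules reduce to standard TCP; with rho = 2 the window grows
four times as fast in congestion avoidance, compensating for the doubled RTT.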

Westwood
^^^^^^^^

Westwood and Westwood+ employ the AIAD (Additive Increase/Adaptive Decrease)
congestion control paradigm. When a congestion episode happens, instead of
halving the cwnd, these protocols try to estimate the network's bandwidth and
use the estimated value to adjust the cwnd. While Westwood performs the
bandwidth sampling on every ACK reception, Westwood+ samples the bandwidth
once every RTT.

More information at: http://dl.acm.org/citation.cfm?id=381704 and
http://dl.acm.org/citation.cfm?id=2512757
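
The adaptive decrease can be sketched as follows (a simplified model based on
the rule from the Westwood paper, ssthresh = BWE * RTTmin / MSS, not the ns-3
code; the function name is hypothetical)::

  #include <algorithm>
  #include <cassert>
  #include <cstdint>

  // After a congestion episode, set ssthresh from the bandwidth estimate
  // and the minimum observed RTT instead of blindly halving cwnd.
  //   bwEstimate in bytes/s, rttMin in seconds, result in segments.
  uint32_t
  WestwoodSsthresh (double bwEstimate, double rttMin, uint32_t segmentSize)
  {
    uint32_t ssthresh =
        static_cast<uint32_t> (bwEstimate * rttMin / segmentSize);
    return std::max (ssthresh, 2u); // never below 2 segments
  }

The intuition: BWE * RTTmin is the pipe size actually sustained by the path,
so after a random (non-congestive) loss the window shrinks only to what the
link can carry, rather than to half its previous value.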

Vegas
^^^^^

TCP Vegas is a pure delay-based congestion control algorithm implementing a
proactive scheme that tries to prevent packet drops by maintaining a small
backlog at the bottleneck queue. Vegas continuously samples the RTT, computes
the actual throughput a connection achieves as shown in Equation (1), and
compares it with the expected throughput calculated in Equation (2). The
difference between these two sending rates, in Equation (3), reflects the
amount of extra packets being queued at the bottleneck.

.. math::

   actual &= \frac{cWnd}{RTT} \\
   expected &= \frac{cWnd}{BaseRTT} \\
   diff &= expected - actual

To avoid congestion, Vegas linearly increases/decreases its congestion window
to ensure that the diff value falls between the two predefined thresholds,
alpha and beta. diff and another threshold, gamma, are used to determine when
Vegas should change from its slow-start mode to its linear increase/decrease
mode. Following the implementation of Vegas in Linux, we use 2, 4, and 1 as
the default values of alpha, beta, and gamma, respectively, but they can be
modified through the Attribute system.

More information at: http://dx.doi.org/10.1109/49.464716
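
The linear increase/decrease decision can be sketched as below (a simplified
model for illustration, not the ns-3 code; the rate difference is scaled by
BaseRTT so that the backlog is expressed in segments, comparable with alpha
and beta)::

  #include <cassert>

  // One Vegas adjustment step (cWnd in segments, RTTs in seconds).
  // diff * BaseRTT = cWnd * (RTT - BaseRTT) / RTT = backlog in segments.
  double
  VegasAdjust (double cWnd, double baseRtt, double rtt,
               double alpha = 2.0, double beta = 4.0)
  {
    double expected = cWnd / baseRtt; // rate with an empty queue
    double actual = cWnd / rtt;       // measured rate
    double backlog = (expected - actual) * baseRtt;
    if (backlog < alpha)
      {
        return cWnd + 1.0; // too little data queued: grow linearly
      }
    if (backlog > beta)
      {
        return cWnd - 1.0; // too much data queued: shrink linearly
      }
    return cWnd; // backlog within [alpha, beta]: hold
  }

When the measured RTT equals BaseRTT the backlog is zero and the window grows;
once queueing delay pushes the backlog above beta, the window is reduced.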

Scalable
^^^^^^^^

Scalable improves TCP performance to better utilize the available bandwidth
of a highspeed wide area network by altering the NewReno congestion window
adjustment algorithm. When congestion has not been detected, for each ACK
received in an RTT, Scalable increases its cwnd per:

.. math:: cwnd = cwnd + 0.01

Following the Linux implementation of Scalable, we use 50 instead of 100 to
account for delayed ACKs, i.e. the window grows by 1/50 per received ACK.

On the first detection of congestion in a given RTT, cwnd is reduced based on
the following equation:

.. math:: cwnd = cwnd - ceil(0.125 \cdot cwnd)

More information at: http://dl.acm.org/citation.cfm?id=956989
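
The two rules above can be sketched as follows (cwnd in segments; a
simplified model, not the ns-3 code)::

  #include <cassert>
  #include <cmath>
  #include <cstdint>

  // Increase: 1/aiCnt per ACK outside congestion (aiCnt = 50 as in Linux,
  // rather than the 100 implied by the 0.01 constant, to account for
  // delayed ACKs).
  double ScalableAckUpdate (double cWnd, uint32_t aiCnt = 50)
  {
    return cWnd + 1.0 / aiCnt;
  }

  // Decrease on the first congestion detection in an RTT:
  // cwnd = cwnd - ceil (0.125 * cwnd), i.e. a multiplicative factor of 1/8.
  double ScalableLossUpdate (double cWnd)
  {
    return cWnd - std::ceil (0.125 * cWnd);
  }

Unlike Reno, both the increase and the decrease are proportional to constants
rather than to the window size, which is what makes the recovery time scale
well on high-speed links.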

Veno
^^^^

TCP Veno enhances the Reno algorithm to deal more effectively with random
packet loss in wireless access networks, by employing Vegas's method of
estimating the backlog at the bottleneck queue to distinguish between
congestive and non-congestive states.

The backlog (the number of packets accumulated at the bottleneck queue) is
calculated using Equation (1):

.. math::

   N &= Actual \cdot (RTT - BaseRTT) \\
     &= Diff \cdot BaseRTT

where:

.. math::

   Diff &= Expected - Actual \\
        &= \frac{cWnd}{BaseRTT} - \frac{cWnd}{RTT}

Veno makes decisions on cwnd modification based on the calculated N and its
predefined threshold beta.

Specifically, it refines the additive increase algorithm of Reno so that the
connection can stay longer in the stable state, by incrementing cwnd by
1/cwnd for every other new ACK received after the available bandwidth has
been fully utilized, i.e. when N exceeds beta. Otherwise, Veno increases its
cwnd by 1/cwnd upon every new ACK receipt, as in Reno.

In the multiplicative decrease algorithm, when Veno is in the non-congestive
state, i.e. when N is less than beta, Veno decrements its cwnd by only 1/5,
because the loss encountered is more likely a corruption-based loss than a
congestion-based one. Only when N is greater than beta does Veno halve its
sending rate, as in Reno.

More information at: http://dx.doi.org/10.1109/JSAC.2002.807336
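
The backlog estimate and the multiplicative decrease can be sketched as below
(a simplified model, not the ns-3 code; the default threshold of 3 packets is
the value suggested in the Veno paper)::

  #include <cassert>

  // N = Actual * (RTT - BaseRTT): packets parked at the bottleneck queue
  // (cWnd in segments, RTTs in seconds).
  double VenoBacklog (double cWnd, double baseRtt, double rtt)
  {
    double actual = cWnd / rtt;
    return actual * (rtt - baseRtt);
  }

  // Window after a loss: cut by only 1/5 in the non-congestive state
  // (random loss suspected), by 1/2 when the backlog N exceeds beta
  // (congestive loss, as in Reno).
  double VenoLossUpdate (double cWnd, double n, double beta = 3.0)
  {
    return (n < beta) ? cWnd * 4.0 / 5.0 : cWnd / 2.0;
  }

The small cut in the non-congestive state is what preserves throughput over
lossy wireless links, where most losses do not signal congestion.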

Bic
^^^

In TCP Bic the congestion control problem is viewed as a search problem.
Taking as a starting point the current window value, and as a target point the
last maximum window value (i.e. the cWnd value just before the loss event), a
binary search technique can be used to update the cWnd value at the midpoint
between the two, directly or using an additive increase strategy if the
distance from the current window is too large.

This way, assuming a no-loss period, the congestion window logarithmically
approaches the maximum value of cWnd until the difference between it and cWnd
falls below a preset threshold. After reaching such a value (or if the
maximum window is unknown, i.e. the binary search does not start at all) the
algorithm switches to probing the new maximum window with a 'slow start'
strategy.

If a loss occurs in either of these phases, the current window (before the
loss) can be treated as the new maximum, and the reduced (with a
multiplicative decrease factor Beta) window size can be used as the new
minimum.

More information at: http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=1354672
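
One step of the binary-search increase can be sketched as below (a simplified
model, not the ns-3 code; sMax caps the per-step growth, with 32 segments
used here purely for illustration)::

  #include <cassert>

  // One BIC growth step (windows in segments). If the midpoint between the
  // current window and the last maximum is farther than sMax, use additive
  // increase; otherwise jump to the midpoint.
  double
  BicIncrease (double cWnd, double lastMaxCwnd, double sMax = 32.0)
  {
    if (cWnd >= lastMaxCwnd)
      {
        // Maximum unknown or already passed: probe beyond it (the real
        // algorithm uses a slow-start-like strategy here).
        return cWnd + 1.0;
      }
    double midpoint = (cWnd + lastMaxCwnd) / 2.0;
    if (midpoint - cWnd > sMax)
      {
        return cWnd + sMax; // additive increase: step limited to sMax
      }
    return midpoint; // binary search: jump to the midpoint
  }

Repeated steps halve the distance to the last maximum, giving the logarithmic
approach described above.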

YeAH
^^^^

YeAH-TCP (Yet Another HighSpeed TCP) is a heuristic designed to balance
various requirements of a state-of-the-art congestion control algorithm:

1. fully exploit the link capacity of high BDP networks while inducing a
   small number of congestion events
2. compete friendly with Reno flows
3. achieve intra and RTT fairness
4. be robust to random losses
5. achieve high performance regardless of buffer size

YeAH operates between two modes: Fast and Slow mode. In the Fast mode, when
the queue occupancy is small and the network congestion level is low, YeAH
increments its congestion window according to the aggressive STCP rule. When
the number of packets in the queue grows beyond a threshold and the network
congestion level is high, YeAH enters its Slow mode, acting as Reno with a
decongestion algorithm. YeAH employs Vegas' mechanism for calculating the
backlog, as in Equation :eq:`q_yeah`. The estimation of the network
congestion level is shown in Equation :eq:`l_yeah`.

.. math:: Q = (RTT - BaseRTT) \cdot \frac{cWnd}{RTT}
   :label: q_yeah

.. math:: L = \frac{RTT - BaseRTT}{BaseRTT}
   :label: l_yeah

To ensure TCP friendliness, YeAH also implements an algorithm to detect the
presence of legacy Reno flows. Upon the receipt of 3 duplicate ACKs, YeAH
decreases its slow start threshold according to Equation (3) if it's not
competing with Reno flows. Otherwise, the ssthresh is halved as in Reno:

.. math:: ssthresh = min(max(\frac{cWnd}{8}, Q), \frac{cWnd}{2})

More information: http://www.csc.lsu.edu/~sjpark/cs7601/4-YeAH_TCP.pdf
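
Equations :eq:`q_yeah` and :eq:`l_yeah` and the ssthresh rule above can be
sketched as follows (cWnd in segments, RTTs in seconds; a simplified model,
not the ns-3 code)::

  #include <algorithm>
  #include <cassert>

  // Backlog at the bottleneck queue, in packets (Equation q_yeah)
  double YeahQ (double cWnd, double baseRtt, double rtt)
  {
    return (rtt - baseRtt) * cWnd / rtt;
  }

  // Network congestion level (Equation l_yeah)
  double YeahL (double baseRtt, double rtt)
  {
    return (rtt - baseRtt) / baseRtt;
  }

  // ssthresh on 3 duplicate ACKs when no competing Reno flows are detected
  double YeahSsthresh (double cWnd, double q)
  {
    return std::min (std::max (cWnd / 8.0, q), cWnd / 2.0);
  }

The clamping keeps the decongestion step between a 1/8 cut (when the backlog
is tiny) and the Reno-style halving (when the backlog is large).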
| 565 |
| 566 Illinois |
| 567 ^^^^^^^^ |
| 568 |
| 569 TCP Illinois is a hybrid congestion control algorithm designed for |
| 570 high-speed networks. Illinois implements a Concave-AIMD (or C-AIMD) |
| 571 algorithm that uses packet loss as the primary congestion signal to |
| 572 determine the direction of window update and queueing delay as the |
| 573 secondary congestion signal to determine the amount of change. |
| 574 |
| 575 The additive increase and multiplicative decrease factors (denoted as |
| 576 alpha and beta, respectively) are functions of the current average queueing |
| 577 delay da as shown in Equations (1) and (2). To improve the protocol |
| 578 robustness against sudden fluctuations in its delay sampling, |
| 579 Illinois allows the increment of alpha to alphaMax |
| 580 only if da stays below d1 for a some (theta) amount of time. |
| 581 |
| 582 .. math:: |
| 583 alpha &= |
| 584 \begin{cases} |
| 585 \quad alphaMax & \quad \text{if } da <= d1 \\ |
| 586 \quad k1 / (k2 + da) & \quad \text{otherwise} \\ |
| 587 \end{cases} \\ |
| 588 \\ |
| 589 beta &= |
| 590 \begin{cases} |
| 591 \quad betaMin & \quad \text{if } da <= d2 \\ |
| 592 \quad k3 + k4 \, da & \quad \text{if } d2 < da < d3 \\ |
| 593 \quad betaMax & \quad \text{otherwise} |
| 594 \end{cases} |
| 595 ····························· |
| 596 where the calculations of k1, k2, k3, and k4 are shown in the following: |
| 597 |
| 598 .. math:: |
| 599 |
| 600 k1 &= \frac{(dm - d1) \cdot alphaMin \cdot alphaMax}{alphaMax - alphaMin} \\ |
| 601 \\ |
| 602 k2 &= \frac{(dm - d1) \cdot alphaMin}{alphaMax - alphaMin} - d1 \\ |
| 603 \\ |
| 604 k3 &= \frac{alphaMin \cdot d3 - alphaMax \cdot d2}{d3 - d2} \\ |
| 605 \\ |
| 606 k4 &= \frac{alphaMax - alphaMin}{d3 - d2} |
| 607 |
| 608 Other parameters include da (the current average queueing delay), and |
| 609 Ta (the average RTT, calculated as sumRtt / cntRtt in the implementation) and |
| 610 Tmin (baseRtt in the implementation) which is the minimum RTT ever seen. |
| 611 dm is the maximum (average) queueing delay, and Tmax (maxRtt in the |
| 612 implementation) is the maximum RTT ever seen. |
| 613 |
| 614 .. math:: |
| 615 |
| 616 da &= Ta - Tmin |
| 617 |
| 618 dm &= Tmax - Tmin |
| 619 |
| 620 d_i &= eta_i \cdot dm |
| 621 |
| 622 Illinois only executes its adaptation of alpha and beta when cwnd exceeds a thre
shold |
| 623 called winThresh. Otherwise, it sets alpha and beta to the base values of 1 and
0.5, |
| 624 respectively. |
| 625 |
| 626 Following the implementation of Illinois in the Linux kernel, we use the followi
ng |
| 627 default parameter settings: |
| 628 |
| 629 * alphaMin = 0.3 (0.1 in the Illinois paper) |
| 630 * alphaMax = 10.0 |
| 631 * betaMin = 0.125 |
| 632 * betaMax = 0.5 |
| 633 * winThresh = 15 (10 in the Illinois paper) |
| 634 * theta = 5 |
| 635 * eta1 = 0.01 |
| 636 * eta2 = 0.1 |
| 637 * eta3 = 0.8 |
| 638 |
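The alpha and beta schedules above can be sketched in plain C++ with the
default parameter values. The names below are illustrative, and this is a
standalone sketch rather than the ns-3 ``TcpIllinois`` API:

.. code-block:: c++

   #include <cassert>

   struct IllinoisParams
   {
     double alphaMin = 0.3, alphaMax = 10.0;
     double betaMin = 0.125, betaMax = 0.5;
     double eta1 = 0.01, eta2 = 0.1, eta3 = 0.8;
   };

   // alpha stays at alphaMax while da <= d1, then decays toward alphaMin
   double CalcAlpha (const IllinoisParams &p, double da, double dm)
   {
     double d1 = p.eta1 * dm;
     if (da <= d1) return p.alphaMax;
     // k1, k2 make alpha continuous at d1 and equal to alphaMin at dm
     double k1 = (dm - d1) * p.alphaMin * p.alphaMax / (p.alphaMax - p.alphaMin);
     double k2 = (dm - d1) * p.alphaMin / (p.alphaMax - p.alphaMin) - d1;
     return k1 / (k2 + da);
   }

   // beta moves linearly from betaMin (da <= d2) to betaMax (da >= d3)
   double CalcBeta (const IllinoisParams &p, double da, double dm)
   {
     double d2 = p.eta2 * dm, d3 = p.eta3 * dm;
     if (da <= d2) return p.betaMin;
     if (da >= d3) return p.betaMax;
     double k3 = (p.betaMin * d3 - p.betaMax * d2) / (d3 - d2);
     double k4 = (p.betaMax - p.betaMin) / (d3 - d2);
     return k3 + k4 * da;
   }

   int main ()
   {
     IllinoisParams p;
     double dm = 0.2; // assume a 200 ms maximum average queueing delay
     assert (CalcAlpha (p, 0.0, dm) == p.alphaMax); // low delay: fast increase
     assert (CalcBeta (p, dm, dm) == p.betaMax);    // high delay: strong backoff
     return 0;
   }

Substituting k1 and k2 shows that alpha is continuous at d1 (where it equals
alphaMax) and reaches alphaMin at dm.
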
| 639 More information: http://www.doi.org/10.1145/1190095.1190166 |
| 640 |
| 641 H-TCP |
| 642 ^^^^^ |
| 643 |
H-TCP has been designed for high-BDP (Bandwidth-Delay Product) paths. It is
a dual-mode protocol: in normal conditions, it behaves like traditional TCP,
with the same rate of increase and decrease for the congestion window.
However, in high-BDP networks, when it finds no congestion on the path
after ``deltal`` seconds, it increases the window size according to the
following alpha function:
| 650 |
.. math::

   alpha(delta)=1+10(delta-deltal)+0.25(delta-deltal)^2
| 654 |
where ``deltal`` is a threshold in seconds for switching between the modes and
``delta`` is the time elapsed since the last congestion event. During
congestion, it reduces the window size by multiplying by the beta function
given in the reference paper; the throughput measured between the last two
consecutive congestion events is used in the beta calculation.
| 660 |
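As a rough standalone sketch, the alpha function can be written as follows,
using the definition from the H-TCP Internet Draft; ``deltal`` is assumed to
be 1 second here, and this is not the ns-3 ``TcpHtcp`` API:

.. code-block:: c++

   #include <cassert>

   // alpha(delta) as described above; delta and deltal are in seconds
   double HtcpAlpha (double delta, double deltal = 1.0)
   {
     if (delta <= deltal)
       {
         return 1.0; // normal mode: behave like traditional TCP
       }
     double d = delta - deltal; // time elapsed beyond the threshold
     return 1.0 + 10.0 * d + 0.25 * d * d; // quadratic high-BDP growth
   }

   int main ()
   {
     assert (HtcpAlpha (0.5) == 1.0);   // below the threshold: standard TCP
     assert (HtcpAlpha (2.0) == 11.25); // 1 + 10 * 1 + 0.25 * 1 * 1
     return 0;
   }
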
The transport ``TcpHtcp`` can be selected in the program
``examples/tcp/tcp-variants-comparison`` to perform an experiment with H-TCP,
although it is useful to increase the bandwidth in this example (e.g.
to 20 Mb/s) to create a higher-BDP link, such as
| 665 |
| 666 :: |
| 667 |
| 668 ./waf --run "tcp-variants-comparison --transport_prot=TcpHtcp --bandwidth=20Mb
ps --duration=10" |
| 669 |
| 670 More information (paper): http://www.hamilton.ie/net/htcp3.pdf |

More information (Internet Draft): https://tools.ietf.org/html/draft-leith-tcp-htcp-06
| 673 |
| 674 LEDBAT |
| 675 ^^^^^^ |
| 676 |
| 677 Low Extra Delay Background Transport (LEDBAT) is an experimental delay-based· |
| 678 congestion control algorithm that seeks to utilize the available bandwidth on |
| 679 an end-to-end path while limiting the consequent increase in queueing delay· |
| 680 on that path. LEDBAT uses changes in one-way delay measurements to limit· |
| 681 congestion that the flow itself induces in the network. |
| 682 |
| 683 As a first approximation, the LEDBAT sender operates as shown below: |
| 684 |
| 685 on receipt of an ACK: |
| 686 |
| 687 .. math:: |
| 688 currentdelay = acknowledgement.delay |
| 689 basedelay = min (basedelay, currentdelay) |
| 690 queuingdelay = currentdelay - basedelay |
| 691 offtarget = (TARGET - queuingdelay) / TARGET |
| 692 cWnd += GAIN * offtarget * bytesnewlyacked * MSS / cWnd |
| 693 |
| 694 ``TARGET`` is the maximum queueing delay that LEDBAT itself may introduce in the |
| 695 network, and ``GAIN`` determines the rate at which the cwnd responds to changes
in· |
| 696 queueing delay; ``offtarget`` is a normalized value representing the difference
between |
| 697 the measured current queueing delay and the predetermined TARGET delay. offtarge
t can· |
| 698 be positive or negative; consequently, cwnd increases or decreases in proportion
to· |
| 699 offtarget. |
| 700 |
| 701 Following the recommendation of RFC 6817, the default values of the parameters a
re: |
| 702 |
* TargetDelay = 100 ms
| 704 * baseHistoryLen = 10 |
| 705 * noiseFilterLen = 4 |
| 706 * Gain = 1 |
| 707 |
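The per-ACK update shown above can be sketched as self-contained C++; the
names are illustrative, and this is not the ns-3 ``TcpLedbat`` implementation
(TARGET = 100 ms, GAIN = 1, and only the MIN filter for the base delay):

.. code-block:: c++

   #include <algorithm>
   #include <cassert>

   struct LedbatState
   {
     double baseDelayMs = 1e9; // minimum one-way delay observed so far
     double cWnd = 0;          // congestion window, in bytes
   };

   // Update cWnd on each ACK, following the pseudocode above
   void OnAck (LedbatState &s, double currentDelayMs, double bytesNewlyAcked,
               double mss, double targetMs = 100.0, double gain = 1.0)
   {
     s.baseDelayMs = std::min (s.baseDelayMs, currentDelayMs);
     double queuingDelay = currentDelayMs - s.baseDelayMs;
     double offTarget = (targetMs - queuingDelay) / targetMs;
     // cWnd grows while queueing delay is below TARGET, shrinks above it
     s.cWnd += gain * offTarget * bytesNewlyAcked * mss / s.cWnd;
   }

   int main ()
   {
     LedbatState s;
     s.cWnd = 10000;
     OnAck (s, 50.0, 1000.0, 1000.0);  // no queueing delay: cWnd grows
     assert (s.cWnd == 10100.0);
     OnAck (s, 250.0, 1000.0, 1000.0); // 200 ms queueing delay: cWnd shrinks
     assert (s.cWnd < 10100.0);
     return 0;
   }
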
| 708 To enable LEDBAT on all TCP sockets, the following configuration can be used: |
| 709 |
| 710 :: |
| 711 |
| 712 Config::SetDefault ("ns3::TcpL4Protocol::SocketType", TypeIdValue (TcpLedbat::
GetTypeId ())); |
| 713 |
| 714 To enable LEDBAT on a chosen TCP socket, the following configuration can be used
: |
| 715 |
| 716 :: |
| 717 |
| 718 Config::Set ("$ns3::NodeListPriv/NodeList/1/$ns3::TcpL4Protocol/SocketType", T
ypeIdValue (TcpLedbat::GetTypeId ())); |
| 719 |
| 720 The following unit tests have been written to validate the implementation of LED
BAT: |
| 721 |
* LEDBAT should operate the same as NewReno during slow start
* LEDBAT should operate the same as NewReno if timestamps are disabled
| 724 * Test to validate cwnd increment in LEDBAT |
| 725 * Test to validate cwnd decrement in LEDBAT |
| 726 |
| 727 In comparison to RFC 6817, the scope and limitations of the current LEDBAT |
| 728 implementation are: |
| 729 |
| 730 * It assumes that the clocks on the sender side and receiver side are synchronis
ed |
* In line with the Linux implementation, the one-way delay is calculated at the sender side by using the timestamps option in the TCP header
* Only the MIN function is used for noise filtering
| 733 |
| 734 More information about LEDBAT is available in RFC 6817: https://tools.ietf.org/h
tml/rfc6817 |
| 735 |
| 736 Support for Explicit Congestion Notification (ECN) |
| 737 ++++++++++++++++++++++++++++++++++++++++++++++++++ |
| 738 |
| 739 ECN provides end-to-end notification of network congestion without dropping |
| 740 packets. It uses two bits in the IP header: ECN Capable Transport (ECT bit) |
| 741 and Congestion Experienced (CE bit), and two bits in the TCP header: Congestion |
| 742 Window Reduced (CWR) and ECN Echo (ECE).· |
| 743 |
| 744 More information is available in RFC 3168: https://tools.ietf.org/html/rfc3168 |
| 745 |
| 746 The following ECN states are declared in ``src/internet/model/tcp-socket.h`` |
| 747 |
| 748 :: |
| 749 |
| 750 typedef enum |
| 751 { |
| 752 ECN_DISABLED = 0, //!< ECN disabled traffic· |
| 753 ECN_IDLE, //!< ECN is enabled but currently there is no action per
taining to ECE or CWR to be taken· |
| 754 ECN_CE_RCVD, //!< This state indicates that the receiver has received
a packet with CE bit set in IP header· |
| 755 ECN_ECE_SENT, //!< This state indicates that the receiver has sent an
ACK with ECE bit set in TCP header |
| 756 ECN_ECE_RCVD, //!< This state indicates that the sender has received a
n ACK with ECE bit set in TCP header· |
| 757 ECN_CWR_SENT //!< This state indicates that the sender has reduced th
e congestion window, and sent a packet |
| 758 with CWR bit set in TCP header |
| 759 } EcnStates_t; |
| 760 |
| 761 The following are some important ECN parameters |
| 762 |
| 763 :: |
| 764 |
| 765 // ECN parameters |
| 766 bool m_ecn; //!< Socket ECN capability |
| 767 TracedValue<EcnStates_t> m_ecnState; //!< Current ECN State, represente
d as combination of EcnState values |
  TracedValue<SequenceNumber32> m_ecnEchoSeq; //!< Sequence number of the last received ECN Echo
| 769 |
| 770 Enabling ECN |
| 771 ^^^^^^^^^^^^ |
| 772 |
| 773 By default, support for ECN is disabled in TCP sockets. To enable, change |
| 774 the value of the attribute ``ns3::TcpSocketBase::UseEcn`` from false to true. |
| 775 |
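For example, assuming ``UseEcn`` is a boolean attribute as described above,
it can be changed for all sockets through the attribute system:

::

  Config::SetDefault ("ns3::TcpSocketBase::UseEcn", BooleanValue (true));
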
| 776 ECN negotiation |
| 777 ^^^^^^^^^^^^^^^ |
| 778 |
| 779 ECN capability is negotiated during the three-way TCP handshake: |
| 780 |
| 781 1. Sender sends SYN + CWR + ECE |
| 782 |
| 783 :: |
| 784 |
| 785 if (m_ecn) |
| 786 {· |
| 787 SendEmptyPacket (TcpHeader::SYN | TcpHeader::ECE | TcpHeader::CWR); |
| 788 } |
| 789 else |
| 790 { |
| 791 SendEmptyPacket (TcpHeader::SYN); |
| 792 } |
| 793 m_ecnState = ECN_DISABLED; |
| 794 |
| 795 2. Receiver sends SYN + ACK + ECE |
| 796 |
| 797 :: |
| 798 |
| 799 if (m_ecn && (tcpHeader.GetFlags () & (TcpHeader::CWR | TcpHeader::ECE)) ==
(TcpHeader::CWR | TcpHeader::ECE)) |
| 800 { |
| 801 SendEmptyPacket (TcpHeader::SYN | TcpHeader::ACK |TcpHeader::ECE); |
| 802 m_ecnState = ECN_IDLE; |
| 803 } |
| 804 else |
| 805 { |
| 806 SendEmptyPacket (TcpHeader::SYN | TcpHeader::ACK); |
| 807 m_ecnState = ECN_DISABLED; |
| 808 } |
| 809 |
| 810 3. Sender sends ACK |
| 811 |
| 812 :: |
| 813 |
| 814 if (m_ecn && (tcpHeader.GetFlags () & (TcpHeader::CWR | TcpHeader::ECE)) ==
(TcpHeader::ECE)) |
| 815 { |
| 816 m_ecnState = ECN_IDLE; |
| 817 } |
| 818 else |
| 819 { |
| 820 m_ecnState = ECN_DISABLED; |
| 821 } |
| 822 |
Once ECN negotiation is successful, the sender sends data packets with the
ECT bit set in the IP header.
| 825 |
| 826 Note: As mentioned in Section 6.1.1 of RFC 3168, ECT bits should not be set |
| 827 during ECN negotiation. The ECN negotiation implemented in |ns3| follows· |
| 828 this guideline. |
| 829 |
| 830 ECN State Transitions |
| 831 ^^^^^^^^^^^^^^^^^^^^^ |
| 832 |
| 833 1. Initially both sender and receiver have their m_ecnState set as ECN_DISABLED |
| 834 2. Once the ECN negotiation is successful, their states are set to ECN_IDLE· |
| 835 3. Upon receipt of a packet with CE bits set in IP header, the |
| 836 receiver changes its state to ECN_CE_RCVD |
| 837 4. When the receiver sends an ACK with ECE bit set, its state is set as· |
| 838 ECN_ECE_SENT |
| 839 5. When the sender receives an ACK with ECE bit set from receiver, its state· |
| 840 is set as ECN_ECE_RCVD |
| 841 6. When the sender sends the packet with CWR bit set, its state is set as· |
| 842 ECN_CWR_SENT |
| 843 7. When the receiver receives the packet with CWR bit set, its state is set· |
| 844 as ECN_IDLE |
| 845 |
| 846 RFC 3168 compliance |
| 847 ^^^^^^^^^^^^^^^^^^^ |
| 848 |
| 849 Based on the suggestions provided in RFC 3168, the following behavior has |
| 850 been implemented: |
| 851 |
| 852 1. Pure ACK packets should not have the ECT bit set (Section 6.1.4). |
| 853 2. Retransmitted packets should not have the ECT bit set in order to prevent DoS |
| 854 attack (Section 6.1.5).· |
3. The sender should reduce the congestion window only once in each
   window (Section 6.1.2).
| 857 4. The receiver should ignore the CE bits set in a packet arriving out of |
| 858 window (Section 6.1.5).· |
| 859 5. The sender should ignore the ECE bits set in the packet arriving out of |
| 860 window (Section 6.1.2). |
| 861 |
| 862 Open issues |
| 863 ^^^^^^^^^^^ |
| 864 |
| 865 The following issues are yet to be addressed: |
| 866 |
| 867 1. Retransmitted packets should not have the CWR bit set (Section 6.1.5). |
| 868 |
| 869 2. Despite the congestion window size being 1 MSS, the sender should reduce its |
| 870 congestion window by half when it receives a packet with the ECE bit set. The |
| 871 sender must reset the retransmit timer on receiving the ECN-Echo packet when |
| 872 the congestion window is one. The sending TCP will then be able to send a |
| 873 new packet only when the retransmit timer expires (Section 6.1.2). |
| 874 ·· |
| 875 3. Support for separately handling the enabling of ECN on the incoming and |
| 876 outgoing TCP sessions (e.g. a TCP may perform ECN echoing but not set the |
| 877 ECT codepoints on its outbound data segments). |

Validation
++++++++++

| 882 The following tests are found in the ``src/internet/test`` directory. In |
| 883 general, TCP tests inherit from a class called :cpp:class:`TcpGeneralTest`, |
| 884 which provides common operations to set up test scenarios involving TCP |
| 885 objects. For more information on how to write new tests, see the |
| 886 section below on :ref:`Writing-tcp-tests`. |
| 887 |
| 888 * **tcp:** Basic transmission of string of data from client to server |
| 889 * **tcp-bytes-in-flight-test:** TCP correctly estimates bytes in flight under lo
ss conditions |
| 890 * **tcp-cong-avoid-test:** TCP congestion avoidance for different packet sizes |
| 891 * **tcp-datasentcb:** Check TCP's 'data sent' callback |
| 892 * **tcp-endpoint-bug2211-test:** A test for an issue that was causing stack over
flow |
| 893 * **tcp-fast-retr-test:** Fast Retransmit testing |
| 894 * **tcp-header:** Unit tests on the TCP header |
| 895 * **tcp-highspeed-test:** Unit tests on the Highspeed congestion control |
| 896 * **tcp-htcp-test:** Unit tests on the H-TCP congestion control |
| 897 * **tcp-hybla-test:** Unit tests on the Hybla congestion control |
| 898 * **tcp-vegas-test:** Unit tests on the Vegas congestion control |
| 899 * **tcp-veno-test:** Unit tests on the Veno congestion control |
| 900 * **tcp-scalable-test:** Unit tests on the Scalable congestion control |
| 901 * **tcp-bic-test:** Unit tests on the BIC congestion control |
| 902 * **tcp-yeah-test:** Unit tests on the YeAH congestion control |
| 903 * **tcp-illinois-test:** Unit tests on the Illinois congestion control |
| 904 * **tcp-ledbat-test:** Unit tests on the LEDBAT congestion control |
| 905 * **tcp-option:** Unit tests on TCP options |
* **tcp-pkts-acked-test:** Unit test the number of times that PktsAcked is called
| 907 * **tcp-rto-test:** Unit test behavior after a RTO timeout occurs |
| 908 * **tcp-rtt-estimation-test:** Check RTT calculations, including retransmission
cases |
| 909 * **tcp-slow-start-test:** Check behavior of slow start |
| 910 * **tcp-timestamp:** Unit test on the timestamp option |
| 911 * **tcp-wscaling:** Unit test on the window scaling option |
| 912 * **tcp-zero-window-test:** Unit test persist behavior for zero window condition
s |
| 913 * **tcp-ecn-test:** Unit tests on explicit congestion notification |
| 914 |
Several tests have dependencies outside of the ``internet`` module, so they
are located in a system test directory called ``src/test/ns3tcp``. Three of
these tests involve use of the Network Simulation Cradle, and they are
disabled if NSC is not enabled in the build.
| 919 |
| 920 * **ns3-tcp-cwnd:** Check to see that ns-3 TCP congestion control works against
liblinux2.6.26.so implementation |
| 921 * **ns3-tcp-interoperability:** Check to see that ns-3 TCP interoperates with li
blinux2.6.26.so implementation |
| 922 * **ns3-tcp-loss:** Check behavior of ns-3 TCP upon packet losses |
| 923 * **nsc-tcp-loss:** Check behavior of NSC TCP upon packet losses |
* **ns3-tcp-no-delay:** Check that ns-3 TCP Nagle's algorithm works correctly and that it can be disabled
| 925 * **ns3-tcp-socket:** Check that ns-3 TCP successfully transfers an application
data write of various sizes |
| 926 * **ns3-tcp-state:** Check the operation of the TCP state machine for several ca
ses |
| 927 · |
| 928 Several TCP validation test results can also be found in the |
| 929 `wiki page <http://www.nsnam.org/wiki/New_TCP_Socket_Architecture>`_· |
| 930 describing this implementation. |
| 931 |
| 932 TCP ECN operation is tested in the ARED and RED tests that are documented in the
traffic-control· |
| 933 module documentation. |
| 934 |
| 935 Writing a new congestion control algorithm |
| 936 ++++++++++++++++++++++++++++++++++++++++++ |
| 937 |
Writing (or porting) a congestion control algorithm from scratch (or from
another system) is a process completely separate from the internals of
TcpSocketBase.
| 941 |
| 942 All operations that are delegated to a congestion control are contained in |
| 943 the class TcpCongestionOps. It mimics the structure tcp_congestion_ops of |
| 944 Linux, and the following operations are defined: |
| 945 |
| 946 .. code-block:: c++ |
| 947 |
| 948 virtual std::string GetName () const; |
| 949 virtual uint32_t GetSsThresh (Ptr<const TcpSocketState> tcb, uint32_t bytesInF
light); |
| 950 virtual void IncreaseWindow (Ptr<TcpSocketState> tcb, uint32_t segmentsAcked); |
   virtual void PktsAcked (Ptr<TcpSocketState> tcb, uint32_t segmentsAcked, const Time& rtt);
| 952 virtual Ptr<TcpCongestionOps> Fork (); |
| 953 |
| 954 The most interesting methods to write are GetSsThresh and IncreaseWindow. |
| 955 The latter is called when TcpSocketBase decides that it is time to increase |
| 956 the congestion window. Much information is available in the Transmission |
| 957 Control Block, and the method should increase cWnd and/or ssThresh based |
| 958 on the number of segments acked. |
| 959 |
GetSsThresh is called whenever the socket needs an updated value of the
slow start threshold. This happens after a loss; congestion control algorithms
are then asked to lower that value and return it.
| 963 |
| 964 PktsAcked is used in case the algorithm needs timing information (such as |
| 965 RTT), and it is called each time an ACK is received. |
| 966 |
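The interface can be mimicked outside of |ns3| to reason about the logic.
Below is a simplified, standalone NewReno-style sketch using stand-in types;
it is not the real ns-3 ``TcpCongestionOps``/``TcpSocketState`` code:

.. code-block:: c++

   #include <algorithm>
   #include <cassert>
   #include <cstdint>
   #include <string>

   // Stand-in for the Transmission Control Block fields used here
   struct Tcb
   {
     uint32_t cWnd;        // congestion window, in bytes
     uint32_t ssThresh;    // slow start threshold, in bytes
     uint32_t segmentSize; // bytes per segment
   };

   struct MiniNewReno
   {
     std::string GetName () const { return "MiniNewReno"; }

     // After a loss: halve the in-flight data, but never below 2 segments
     uint32_t GetSsThresh (const Tcb &tcb, uint32_t bytesInFlight) const
     {
       return std::max (2 * tcb.segmentSize, bytesInFlight / 2);
     }

     // Slow start: one segment per ACKed segment (exponential growth);
     // congestion avoidance: roughly one segment per RTT (linear growth)
     void IncreaseWindow (Tcb &tcb, uint32_t segmentsAcked) const
     {
       if (tcb.cWnd < tcb.ssThresh)
         {
           tcb.cWnd += segmentsAcked * tcb.segmentSize;
         }
       else if (segmentsAcked > 0)
         {
           double adder = static_cast<double> (tcb.segmentSize) *
                          tcb.segmentSize / tcb.cWnd;
           tcb.cWnd += static_cast<uint32_t> (std::max (1.0, adder));
         }
     }
   };

   int main ()
   {
     MiniNewReno cc;
     Tcb tcb = { 1000, 4000, 500 };
     cc.IncreaseWindow (tcb, 2); // slow start: grows by two segments
     assert (tcb.cWnd == 2000);
     assert (cc.GetSsThresh (tcb, 3000) == 1500); // half the in-flight data
     return 0;
   }

In the real interface, Fork () returns a copy of the congestion control
object, and GetName () returns the algorithm's name.
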
| 967 TCP SACK and non-SACK |
| 968 +++++++++++++++++++++ |

To avoid code duplication and the effort of maintaining two different versions
of the TCP core, namely RFC 6675 (TCP-SACK) and RFC 5681 (TCP congestion
control), we have merged RFC 6675 into the current code base. If the receiver
supports the option, the sender bases its retransmissions on the received
SACK information. However, in the absence of that option, the best it can do
is to follow the RFC 5681 specification (on Fast Retransmit/Recovery) and
employ NewReno modifications in case of partial ACKs.
| 976 |
The merge work consisted of implementing an emulation of SACK options in
the sender (when the receiver does not support SACK) following RFC 5681 rules.
The generation is straightforward: each duplicate ACK (following the definition
of RFC 5681) carries a new SACK option that indicates (in increasing order)
the blocks transmitted after SND.UNA, not including the block starting from
SND.UNA itself.
| 983 |
With this emulated SACK information, the sender behaviour is unified in the
two cases. By carefully generating these SACK blocks, we are able to employ all
the algorithms outlined in RFC 6675 (e.g. Update(), NextSeg(), IsLost()) during
non-SACK transfers. Of course, in the case of RTO expiration, no guess about
SACK blocks can be made, so they are not generated (consequently, the
implementation will re-send all segments starting from SND.UNA, even the ones
correctly received). Please note that the SACK options generated by the sender
(in the case of a non-SACK receiver) never leave the sender node itself; they
are created locally by the TCP implementation and then consumed.
| 993 |
| 994 A similar concept is used in Linux with the function tcp_add_reno_sack. Our |
| 995 implementation resides in the TcpTxBuffer class that implements a scoreboard |
| 996 through two different lists of segments. TcpSocketBase actively uses the API |
| 997 provided by TcpTxBuffer to query the scoreboard; please refer to the Doxygen |
| 998 documentation (and to in-code comments) if you want to learn more about this |
| 999 implementation. |
| 1000 |
When the SACK attribute is enabled for the receiver socket, the sender will
not craft any SACK option, relying only on what it receives from the network.
| 1003 |
| 1004 Current limitations |
| 1005 +++++++++++++++++++ |
| 1006 |
| 1007 * TcpCongestionOps interface does not contain every possible Linux operation |
| 1008 * Fast retransmit / fast recovery are bound with TcpSocketBase, thereby preventi
ng easy simulation of TCP Tahoe |
| 1009 |
| 1010 .. _Writing-tcp-tests: |
| 1011 |
| 1012 Writing TCP tests |
| 1013 +++++++++++++++++ |
| 1014 |
| 1015 The TCP subsystem supports automated test |
| 1016 cases on both socket functions and congestion control algorithms. To show |
| 1017 how to write tests for TCP, here we explain the process of creating a test |
| 1018 case that reproduces a bug (#1571 in the project bug tracker). |
| 1019 |
| 1020 The bug concerns the zero window situation, which happens when the receiver can |
| 1021 not handle more data. In this case, it advertises a zero window, which causes |
| 1022 the sender to pause transmission and wait for the receiver to increase the |
| 1023 window. |
| 1024 |
| 1025 The sender has a timer to periodically check the receiver's window: however, in |
| 1026 modern TCP implementations, when the receiver has freed a "significant" amount |
| 1027 of data, the receiver itself sends an "active" window update, meaning that |
| 1028 the transmission could be resumed. Nevertheless, the sender timer is still |
| 1029 necessary because window updates can be lost. |
| 1030 |
.. note::
   Throughout this section, we assume some knowledge of the general design
   of the TCP test infrastructure, which is explained in detail in the
   Doxygen documentation. As a brief summary, the strategy is to have a class
   that sets up a TCP connection and calls protected members of itself.
   In this way, subclasses can implement the necessary members, which will
   be called by the main TcpGeneralTest class when events occur. For example,
   after processing an ACK, the method ProcessedAck will be invoked. Subclasses
   interested in checking particular things that must have happened during
   ACK processing should implement the ProcessedAck method and check the
   values of interest inside it. For a list of available methods, please
   check the Doxygen documentation.
| 1043 |
We describe the writing of two test cases, covering both situations: the
sender's zero-window probing and the receiver's "active" window update. Our
focus will be on dealing with the reported problems, which are:
| 1047 |
| 1048 * an ns-3 receiver does not send "active" window update when its receive buffer |
| 1049 is being freed; |
| 1050 * even if the window update is artificially crafted, the transmission does not |
| 1051 resume. |
| 1052 |
| 1053 However, other things should be checked in the test: |
| 1054 |
| 1055 * Persistent timer setup |
| 1056 * Persistent timer teardown if rWnd increases |
| 1057 |
| 1058 To construct the test case, one first derives from the TcpGeneralTest class: |
| 1059 |
| 1060 The code is the following: |
| 1061 |
| 1062 .. code-block:: c++ |
| 1063 |
| 1064 TcpZeroWindowTest::TcpZeroWindowTest (const std::string &desc) |
| 1065 : TcpGeneralTest (desc) |
| 1066 { |
| 1067 } |
| 1068 |
| 1069 Then, one should define the general parameters for the TCP connection, which |
| 1070 will be one-sided (one node is acting as SENDER, while the other is acting as |
| 1071 RECEIVER): |
| 1072 |
| 1073 * Application packet size set to 500, and 20 packets in total (meaning a stream |
| 1074 of 10k bytes) |
| 1075 * Segment size for both SENDER and RECEIVER set to 500 bytes |
| 1076 * Initial slow start threshold set to UINT32_MAX |
| 1077 * Initial congestion window for the SENDER set to 10 segments (5000 bytes) |
| 1078 * Congestion control: NewReno |
| 1079 |
| 1080 We have also to define the link properties, because the above definition does |
| 1081 not work for every combination of propagation delay and sender application behav
ior. |
| 1082 |
| 1083 * Link one-way propagation delay: 50 ms |
| 1084 * Application packet generation interval: 10 ms |
| 1085 * Application starting time: 20 s after the starting point |
| 1086 |
To define the properties of the environment (e.g. properties which should be
set before the object creation, such as the propagation delay), one next
implements the method ConfigureEnvironment:
| 1090 |
| 1091 .. code-block:: c++ |
| 1092 |
| 1093 void |
| 1094 TcpZeroWindowTest::ConfigureEnvironment () |
| 1095 { |
| 1096 TcpGeneralTest::ConfigureEnvironment (); |
| 1097 SetAppPktCount (20); |
| 1098 SetMTU (500); |
| 1099 SetTransmitStart (Seconds (2.0)); |
| 1100 SetPropagationDelay (MilliSeconds (50)); |
| 1101 } |
| 1102 |
For other properties, set after the object creation, one can use
ConfigureProperties ().
The difference is that some values, such as the initial congestion window
or the initial slow start threshold, are applicable only to a single instance,
not to every instance we have. Usually, methods that require an id and a value
are meant to be called inside ConfigureProperties (). Please see the Doxygen
documentation for an exhaustive list of the tunable properties.
| 1110 |
| 1111 .. code-block:: c++ |
| 1112 |
| 1113 void |
| 1114 TcpZeroWindowTest::ConfigureProperties () |
| 1115 { |
| 1116 TcpGeneralTest::ConfigureProperties (); |
| 1117 SetInitialCwnd (SENDER, 10); |
| 1118 } |
| 1119 |
| 1120 To see the default value for the experiment, please see the implementation of |
| 1121 both methods inside TcpGeneralTest class. |
| 1122 |
| 1123 .. note:: |
| 1124 If some configuration parameters are missing, add a method called |
| 1125 "SetSomeValue" which takes as input the value only (if it is meant to be |
| 1126 called inside ConfigureEnvironment) or the socket and the value (if it is |
| 1127 meant to be called inside ConfigureProperties). |
| 1128 |
| 1129 To define a zero-window situation, we choose (by design) to initiate the connect
ion |
| 1130 with a 0-byte rx buffer. This implies that the RECEIVER, in its first SYN-ACK, |
| 1131 advertises a zero window. This can be accomplished by implementing the method |
| 1132 CreateReceiverSocket, setting an Rx buffer value of 0 bytes (at line 6 of the |
| 1133 following code): |
| 1134 |
| 1135 .. code-block:: c++ |
| 1136 :linenos: |
| 1137 :emphasize-lines: 6,7,8 |
| 1138 |
| 1139 Ptr<TcpSocketMsgBase> |
| 1140 TcpZeroWindowTest::CreateReceiverSocket (Ptr<Node> node) |
| 1141 { |
| 1142 Ptr<TcpSocketMsgBase> socket = TcpGeneralTest::CreateReceiverSocket (node); |
| 1143 |
| 1144 socket->SetAttribute("RcvBufSize", UintegerValue (0)); |
| 1145 Simulator::Schedule (Seconds (10.0), |
| 1146 &TcpZeroWindowTest::IncreaseBufSize, this); |
| 1147 |
| 1148 return socket; |
| 1149 } |
| 1150 |
Even so, to check the active window update, we should schedule an increase
of the buffer size. We do this at lines 7 and 8, scheduling the function
IncreaseBufSize.
| 1154 |
| 1155 .. code-block:: c++ |
| 1156 |
| 1157 void |
| 1158 TcpZeroWindowTest::IncreaseBufSize () |
| 1159 { |
| 1160 SetRcvBufSize (RECEIVER, 2500); |
| 1161 } |
| 1162 |
This function uses the SetRcvBufSize method to edit the RxBuffer object of the
RECEIVER. As said before, check the Doxygen documentation for the class
TcpGeneralTest to be aware of the various possibilities that it offers.
| 1166 |
.. note::
   By design, we chose to maintain a close relationship between TcpSocketBase
   and TcpGeneralTest: they are connected by a friendship relation. Since
   friendship is not passed through inheritance, if one discovers a need to
   access or modify a private (or protected) member of TcpSocketBase, one can
   do so by adding a method to the class TcpGeneralSocket. An example of such
   a method is SetRcvBufSize, which allows TcpGeneralSocket subclasses to
   forcefully set the RxBuffer size.
| 1175 |
| 1176 .. code-block:: c++ |
| 1177 |
| 1178 void |
| 1179 TcpGeneralTest::SetRcvBufSize (SocketWho who, uint32_t size) |
| 1180 { |
| 1181 if (who == SENDER) |
| 1182 { |
| 1183 m_senderSocket->SetRcvBufSize (size); |
| 1184 } |
| 1185 else if (who == RECEIVER) |
| 1186 { |
| 1187 m_receiverSocket->SetRcvBufSize (size); |
| 1188 } |
| 1189 else |
| 1190 { |
| 1191 NS_FATAL_ERROR ("Not defined"); |
| 1192 } |
| 1193 } |
| 1194 |
| 1195 Next, we can start to follow the TCP connection: |
| 1196 |
#. At time 0.0 s the connection is opened on the sender side, with a SYN
   packet sent from SENDER to RECEIVER
#. At time 0.05 s the RECEIVER gets the SYN and replies with a SYN-ACK
#. At time 0.10 s the SENDER gets the SYN-ACK and replies with an ACK.
| 1201 |
While the general structure is defined and the connection is started,
we need to define a way to check the rWnd field on the segments. To this aim,
we can implement the methods Rx and Tx in the TcpGeneralTest subclass,
checking each time the actions of the RECEIVER and the SENDER. These methods
are defined in TcpGeneralTest, and they are attached to the Rx and Tx traces
of TcpSocketBase. One should write small tests for every detail that one wants
to ensure during the connection (this will prevent the test from changing over
time, and it ensures that the behavior stays consistent across releases). We
start by ensuring that the first SYN-ACK has 0 as the advertised window size:
| 1211 |
| 1212 .. code-block:: c++ |
| 1213 |
| 1214 void |
| 1215 TcpZeroWindowTest::Tx(const Ptr<const Packet> p, const TcpHeader &h, SocketWh
o who) |
| 1216 { |
| 1217 ... |
| 1218 else if (who == RECEIVER) |
| 1219 { |
| 1220 NS_LOG_INFO ("\tRECEIVER TX " << h << " size " << p->GetSize()); |
| 1221 |
| 1222 if (h.GetFlags () & TcpHeader::SYN) |
| 1223 { |
| 1224 NS_TEST_ASSERT_MSG_EQ (h.GetWindowSize(), 0, |
| 1225 "RECEIVER window size is not 0 in the SYN-AC
K"); |
| 1226 } |
| 1227 } |
| 1228 .... |
| 1229 } |
| 1230 |
Practically, we are checking that every SYN packet sent by the RECEIVER has
the advertised window set to 0. The same check is done in the Rx method,
verifying that each SYN received by the SENDER has the advertised window set
to 0. Thanks to the log subsystem, we can print what is happening through
messages. If we run the experiment with logging enabled, we can see the
following:
| 1236 |
| 1237 .. code-block:: bash |
| 1238 |
| 1239 ./waf shell |
| 1240 gdb --args ./build/utils/ns3-dev-test-runner-debug --test-name=tcp-zero-windo
w-test --stop-on-failure --fullness=QUICK --assert-on-failure --verbose |
| 1241 (gdb) run |
| 1242 |
| 1243 0.00s TcpZeroWindowTestSuite:Tx(): 0.00 SENDER TX 49153 > 4477 [SYN] Seq
=0 Ack=0 Win=32768 ns3::TcpOptionWinScale(2) ns3::TcpOptionTS(0;0) size 36 |
| 1244 0.05s TcpZeroWindowTestSuite:Rx(): 0.05 RECEIVER RX 49153 > 4477 [SYN] S
eq=0 Ack=0 Win=32768 ns3::TcpOptionWinScale(2) ns3::TcpOptionTS(0;0) ns3::TcpOpt
ionEnd(EOL) size 0 |
| 1245 0.05s TcpZeroWindowTestSuite:Tx(): 0.05 RECEIVER TX 4477 > 49153 [SYN|AC
K] Seq=0 Ack=1 Win=0 ns3::TcpOptionWinScale(0) ns3::TcpOptionTS(50;0) size 36 |
| 1246 0.10s TcpZeroWindowTestSuite:Rx(): 0.10 SENDER RX 4477 > 49153 [SYN|ACK]
Seq=0 Ack=1 Win=0 ns3::TcpOptionWinScale(0) ns3::TcpOptionTS(50;0) ns3::TcpOpti
onEnd(EOL) size 0 |
| 1247 0.10s TcpZeroWindowTestSuite:Tx(): 0.10 SENDER TX 49153 > 4477 [ACK] Seq
=1 Ack=1 Win=32768 ns3::TcpOptionTS(100;50) size 32 |
| 1248 0.15s TcpZeroWindowTestSuite:Rx(): 0.15 RECEIVER RX 49153 > 4477 [ACK] S
eq=1 Ack=1 Win=32768 ns3::TcpOptionTS(100;50) ns3::TcpOptionEnd(EOL) size 0 |
| 1249 (...) |
| 1250 |
The output is cut to show the three-way handshake. As we can see from the
headers, the rWnd of the RECEIVER is set to 0, and our tests are not failing.
Now we need to test for the persistent timer, which should be started by
the SENDER after it receives the SYN-ACK. Since the Rx method is called before
any computation on the received packet, we should use another method, namely
ProcessedAck, which is called after each processed ACK. In the following,
we show how to check whether the persistent event is running after the
processing of the SYN-ACK:
| 1259 |
| 1260 .. code-block:: c++ |
| 1261 |
| 1262 void |
| 1263 TcpZeroWindowTest::ProcessedAck (const Ptr<const TcpSocketState> tcb, |
| 1264 const TcpHeader& h, SocketWho who) |
| 1265 { |
| 1266 if (who == SENDER) |
| 1267 { |
| 1268 if (h.GetFlags () & TcpHeader::SYN) |
| 1269 { |
| 1270 EventId persistentEvent = GetPersistentEvent (SENDER); |
| 1271 NS_TEST_ASSERT_MSG_EQ (persistentEvent.IsRunning (), true, |
| 1272 "Persistent event not started"); |
| 1273 } |
| 1274 } |
| 1275 } |
| 1276 |
| 1277 Since we programmed the increase of the buffer size after 10 simulated seconds, |
| 1278 we expect the persistent timer to fire before any rWnd change. When it fires, |
| 1279 the SENDER should send a window probe, and the receiver should reply by again |
| 1280 reporting a zero-window situation. First, we investigate what the sender sends: |
| 1281 |
| 1282 .. code-block:: c++ |
| 1283 :linenos: |
| 1284 :emphasize-lines: 1,6,7,11 |
| 1285 |
| 1286 if (Simulator::Now ().GetSeconds () <= 6.0) |
| 1287 { |
| 1288 NS_TEST_ASSERT_MSG_EQ (p->GetSize () - h.GetSerializedSize(), 0, |
| 1289 "Data packet sent anyway"); |
| 1290 } |
| 1291 else if (Simulator::Now ().GetSeconds () > 6.0 && |
| 1292 Simulator::Now ().GetSeconds () <= 7.0) |
| 1293 { |
| 1294 NS_TEST_ASSERT_MSG_EQ (m_zeroWindowProbe, false, "Sent another probe")
; |
| 1295 |
| 1296 if (! m_zeroWindowProbe) |
| 1297 { |
| 1298 NS_TEST_ASSERT_MSG_EQ (p->GetSize () - h.GetSerializedSize(), 1, |
| 1299 "Data packet sent instead of window probe")
; |
| 1300 NS_TEST_ASSERT_MSG_EQ (h.GetSequenceNumber(), SequenceNumber32 (1)
, |
| 1301 "Data packet sent instead of window probe")
; |
| 1302 m_zeroWindowProbe = true; |
| 1303 } |
| 1304 } |
| 1305 |
| 1306 We divide the events by simulated time. At line 1, we check everything that |
| 1307 happens before the 6.0 seconds mark; for instance, that no data packets are sent
, |
| 1308 and that the state remains OPEN for both sender and receiver. |
| 1309 |
| 1310 Since the persist timeout is initialized to 6 seconds (exercise left for the |
| 1311 reader: edit the test to obtain this value from the Attribute system), we need |
| 1312 to check (line 6) that the probe is sent between 6.0 and 7.0 simulated seconds. |
| 1313 Only one probe is allowed, and this is the reason for the check at line 11. |
| 1314 |
| 1315 .. code-block:: c++ |
| 1316 :linenos: |
| 1317 :emphasize-lines: 6,7 |
| 1318 |
| 1319 if (Simulator::Now ().GetSeconds () > 6.0 && |
| 1320 Simulator::Now ().GetSeconds () <= 7.0) |
| 1321 { |
| 1322 NS_TEST_ASSERT_MSG_EQ (h.GetSequenceNumber(), SequenceNumber32 (1), |
| 1323 "Data packet sent instead of window probe"); |
| 1324 NS_TEST_ASSERT_MSG_EQ (h.GetWindowSize(), 0, |
| 1325 "No zero window advertised by RECEIVER"); |
| 1326 } |
| 1327 |
| 1328 For the RECEIVER, the interval between 6 and 7 seconds is when the zero-window |
| 1329 segment is sent. |
| 1330 |
| 1331 Other checks are redundant; the safest approach is to deny any other packet |
| 1332 exchange between the 7- and 10-second marks. |
| 1333 |
| 1334 .. code-block:: c++ |
| 1335 |
| 1336 else if (Simulator::Now ().GetSeconds () > 7.0 && |
| 1337 Simulator::Now ().GetSeconds () < 10.0) |
| 1338 { |
| 1339 NS_FATAL_ERROR ("No packets should be sent before the window update"); |
| 1340 } |
| 1341 |
| 1342 The state checks are performed at the end of the methods, since they are valid |
| 1343 in every condition: |
| 1344 |
| 1345 .. code-block:: c++ |
| 1346 |
| 1347 NS_TEST_ASSERT_MSG_EQ (GetCongStateFrom (GetTcb(SENDER)), TcpSocketState::CA_
OPEN, |
| 1348 "Sender State is not OPEN"); |
| 1349 NS_TEST_ASSERT_MSG_EQ (GetCongStateFrom (GetTcb(RECEIVER)), TcpSocketState::C
A_OPEN, |
| 1350 "Receiver State is not OPEN"); |
| 1351 |
| 1352 Now, the interesting part of the Tx method is to check that, after the 10.0 |
| 1353 seconds mark (when the RECEIVER sends the active window update), the value of |
| 1354 the window is greater than zero (precisely, set to 2500): |
| 1355 |
| 1356 .. code-block:: c++ |
| 1357 |
| 1358 else if (Simulator::Now().GetSeconds() >= 10.0) |
| 1359 { |
| 1360 NS_TEST_ASSERT_MSG_EQ (h.GetWindowSize(), 2500, |
| 1361 "Receiver window not updated"); |
| 1362 } |
| 1363 |
| 1364 To be sure that the sender receives the window update, we can use the Rx |
| 1365 method: |
| 1366 |
| 1367 .. code-block:: c++ |
| 1368 :linenos: |
| 1369 :emphasize-lines: 5 |
| 1370 |
| 1371 if (Simulator::Now().GetSeconds() >= 10.0) |
| 1372 { |
| 1373 NS_TEST_ASSERT_MSG_EQ (h.GetWindowSize(), 2500, |
| 1374 "Receiver window not updated"); |
| 1375 m_windowUpdated = true; |
| 1376 } |
| 1377 |
| 1378 We check every packet after the 10 seconds mark to verify that it carries the |
| 1379 updated window. At line 5, we also set a boolean variable to true, so that we |
| 1380 can later verify that this check was actually reached. |
| 1381 |
| 1382 Last but not least, we also implement the NormalClose() method, to check that |
| 1383 the connection ends successfully: |
| 1384 |
| 1385 .. code-block:: c++ |
| 1386 |
| 1387 void |
| 1388 TcpZeroWindowTest::NormalClose (SocketWho who) |
| 1389 { |
| 1390 if (who == SENDER) |
| 1391 { |
| 1392 m_senderFinished = true; |
| 1393 } |
| 1394 else if (who == RECEIVER) |
| 1395 { |
| 1396 m_receiverFinished = true; |
| 1397 } |
| 1398 } |
| 1399 |
| 1400 The method is called only if all bytes are transmitted successfully. Then, in |
| 1401 the FinalChecks() method, we check all the flags, which should be true (which |
| 1402 indicates that the connection was closed cleanly). |
| 1403 |
| 1404 .. code-block:: c++ |
| 1405 |
| 1406 void |
| 1407 TcpZeroWindowTest::FinalChecks () |
| 1408 { |
| 1409 NS_TEST_ASSERT_MSG_EQ (m_zeroWindowProbe, true, |
| 1410 "Zero window probe not sent"); |
| 1411 NS_TEST_ASSERT_MSG_EQ (m_windowUpdated, true, |
| 1412 "Window has not updated during the connection"); |
| 1413 NS_TEST_ASSERT_MSG_EQ (m_senderFinished, true, |
| 1414 "Connection not closed successfully (SENDER)"); |
| 1415 NS_TEST_ASSERT_MSG_EQ (m_receiverFinished, true, |
| 1416 "Connection not closed successfully (RECEIVER)"); |
| 1417 } |
| 1418 |
| 1419 To run the test, the usual way is |
| 1420 |
| 1421 .. code-block:: bash |
| 1422 |
| 1423 ./test.py -s tcp-zero-window-test |
| 1424 |
| 1425 PASS: TestSuite tcp-zero-window-test |
| 1426 1 of 1 tests passed (1 passed, 0 skipped, 0 failed, 0 crashed, 0 valgrind err
ors) |
| 1427 |
| 1428 To see INFO messages, use a combination of ./waf shell and gdb (really useful): |
| 1429 |
| 1430 .. code-block:: bash |
| 1431 |
| 1432 |
| 1433 ./waf shell && gdb --args ./build/utils/ns3-dev-test-runner-debug --test-nam
e=tcp-zero-window-test --stop-on-failure --fullness=QUICK --assert-on-failure --
verbose |
| 1434 |
| 1435 and then type ``run`` at the gdb prompt. |
| 1436 |
| 1437 .. note:: |
| 1438 This code magically runs without any reported errors; however, in real cases, |
| 1439 when you discover a bug you should expect the existing test to fail (this |
| 1440 could indicate a well-written test and a badly-written model, or a |
| 1441 badly-written test; hopefully the former). Correcting bugs is an iterative |
| 1442 process. For instance, commits created to make this test case run without |
| 1443 errors are 11633:6b74df04cf44, (others to be merged). |
| 1444 |
| 1445 Network Simulation Cradle |
| 1446 ************************* |
| 1447 |
| 1448 The `Network Simulation Cradle (NSC) <http://www.wand.net.nz/~stj2/nsc/>`_ is a |
| 1449 framework for wrapping real-world network code into simulators, allowing |
| 1450 simulation of real-world behavior at little extra cost. This work has been |
| 1451 validated by comparing situations using a test network with the same situations |
| 1452 in the simulator. To date, it has been shown that the NSC is able to produce |
| 1453 extremely accurate results. NSC supports four real world stacks: FreeBSD, |
| 1454 OpenBSD, lwIP and Linux. Emphasis has been placed on not changing any of the |
| 1455 network stacks by hand. Not a single line of code has been changed in the |
| 1456 network protocol implementations of any of the above four stacks. However, a |
| 1457 custom C parser was built to programmatically change source code. |
| 1458 |
| 1459 NSC has previously been ported to |ns2| and OMNeT++, and was |
| 1460 added to |ns3| in September 2008 (ns-3.2 release). This section |
| 1461 describes the |ns3| port of NSC and how to use it. |
| 1462 |
| 1463 To some extent, NSC has been superseded by the Linux kernel support within |
| 1464 `Direct Code Execution (DCE) <http://www.nsnam.org/docs/dce/manual/singlehtml/index.html>`__. However, NSC is still available through the bake build |
| 1465 system. NSC supports Linux kernels 2.6.18 and 2.6.26, but newer |
| 1466 versions of the kernel have not been ported. |
| 1467 |
| 1468 Prerequisites |
| 1469 +++++++++++++ |
| 1470 |
| 1471 Presently, NSC has been tested and shown to work on these platforms: |
| 1472 Linux i386 and Linux x86-64. NSC does not support powerpc. Use on |
| 1473 FreeBSD or OS X is unsupported (although it may be able to work). |
| 1474 |
| 1475 Building NSC requires the packages flex and bison. |
| 1476 |
| 1477 Configuring and Downloading |
| 1478 +++++++++++++++++++++++++++ |
| 1479 |
| 1480 As of ns-3.17 or later, NSC must either be downloaded separately from |
| 1481 its own repository, or downloaded when using the |
| 1482 `bake build system <http://www.nsnam.org/docs/tutorial/html/getting-started.html#downloading-ns3-using-bake>`_ of |
| 1483 |ns3|. |
| 1484 |
| 1485 For ns-3.17 or later releases, when using bake, one must configure NSC as |
| 1486 part of an "allinone" configuration, such as: |
| 1487 |
| 1488 .. sourcecode:: bash |
| 1489 |
| 1490 $ cd bake |
| 1491 $ python bake.py configure -e ns-allinone-3.19 |
| 1492 $ python bake.py download |
| 1493 $ python bake.py build |
| 1494 |
| 1495 Instead of a released version, one may use the ns-3 development version |
| 1496 by specifying "ns-3-allinone" to the configure step above. |
| 1497 |
| 1498 NSC may also be downloaded from |
| 1499 `its download site <http://research.wand.net.nz/software/nsc.php>`_ |
| 1500 using Mercurial: |
| 1501 |
| 1502 .. sourcecode:: bash |
| 1503 |
| 1504 $ hg clone https://secure.wand.net.nz/mercurial/nsc |
| 1505 |
| 1506 Prior to the ns-3.17 release, NSC was included in the allinone tarball and |
| 1507 the released version did not need to be separately downloaded. |
| 1508 |
| 1509 Building and validating |
| 1510 +++++++++++++++++++++++ |
| 1511 |
| 1512 NSC may be built as part of the bake build process; alternatively, one |
| 1513 may build NSC by itself using its build system; e.g.: |
| 1514 |
| 1515 .. sourcecode:: bash |
| 1516 |
| 1517 $ cd nsc-dev |
| 1518 $ python scons.py |
| 1519 |
| 1520 Once NSC has been built either manually or through the bake system, change |
| 1521 into the |ns3| source directory and try running the following configuration: |
| 1522 |
| 1523 .. sourcecode:: bash |
| 1524 |
| 1525 $ ./waf configure |
| 1526 |
| 1527 If NSC has been previously built and found by waf, then you will see: |
| 1528 |
| 1529 .. sourcecode:: bash |
| 1530 |
| 1531 Network Simulation Cradle : enabled |
| 1532 |
| 1533 If NSC has not been found, you will see: |
| 1534 |
| 1535 .. sourcecode:: bash |
| 1536 |
| 1537 Network Simulation Cradle : not enabled (NSC not found (see option --with-
nsc)) |
| 1538 |
| 1539 In this case, you must pass the relative or absolute path to the NSC libraries |
| 1540 with the "--with-nsc" configure option; e.g. |
| 1541 |
| 1542 .. sourcecode:: bash |
| 1543 |
| 1544 $ ./waf configure --with-nsc=/path/to/my/nsc/directory |
| 1545 |
| 1546 For |ns3| releases prior to ns-3.17, when using the ``build.py`` |
| 1547 script in the ns-3-allinone directory, NSC is built by default unless the |
| 1548 platform does not support it. To explicitly disable it when building |ns3|, |
| 1549 type: |
| 1550 |
| 1551 .. sourcecode:: bash |
| 1552 |
| 1553 $ ./waf configure --enable-examples --enable-tests --disable-nsc |
| 1554 |
| 1555 If waf detects NSC, then building |ns3| with NSC is performed the same way |
| 1556 with waf as without it. Once |ns3| is built, try running the following |
| 1557 test suite: |
| 1558 |
| 1559 .. sourcecode:: bash |
| 1560 |
| 1561 $ ./test.py -s ns3-tcp-interoperability |
| 1562 |
| 1563 If NSC has been successfully built, the following test should show up |
| 1564 in the results: |
| 1565 |
| 1566 .. sourcecode:: text |
| 1567 |
| 1568 PASS TestSuite ns3-tcp-interoperability |
| 1569 |
| 1570 This confirms that NSC is ready to use. |
| 1571 |
| 1572 Usage |
| 1573 +++++ |
| 1574 |
| 1575 There are a few example files. Try: |
| 1576 |
| 1577 .. sourcecode:: bash |
| 1578 |
| 1579 $ ./waf --run tcp-nsc-zoo |
| 1580 $ ./waf --run tcp-nsc-lfn |
| 1581 |
| 1582 These examples will deposit some ``.pcap`` files in your directory, |
| 1583 which can be examined by tcpdump or wireshark. |
| 1584 |
| 1585 Let's look at the ``examples/tcp/tcp-nsc-zoo.cc`` file for some typical |
| 1586 usage. How does it differ from using native |ns3| TCP? There is one main |
| 1587 configuration line, when using NSC and the |ns3| helper API, that needs to be |
| 1588 set:: |
| 1589 |
| 1590 InternetStackHelper internetStack; |
| 1591 |
| 1592 internetStack.SetNscStack ("liblinux2.6.26.so"); |
| 1593 // This switches nodes 0 and 1 to NSC's Linux 2.6.26 stack. |
| 1594 internetStack.Install (n.Get(0)); |
| 1595 internetStack.Install (n.Get(1)); |
| 1596 |
| 1597 |
| 1598 The key line is the call to ``SetNscStack``. This tells the InternetStack |
| 1599 helper to aggregate instances of NSC TCP instead of native |ns3| TCP |
| 1600 on the nodes where the stack is subsequently installed. It is important that |
| 1601 this function be called **before** calling the ``Install()`` function, as shown above. |
| 1602 |
| 1603 Which stacks are available to use? Presently, the focus has been on |
| 1604 Linux 2.6.18 and Linux 2.6.26 stacks for |ns3|. To see which stacks |
| 1605 were built, one can execute the following find command at the |ns3| top level |
| 1606 directory: |
| 1607 |
| 1608 .. sourcecode:: bash |
| 1609 |
| 1610 $ find nsc -name "*.so" -type f |
| 1611 nsc/linux-2.6.18/liblinux2.6.18.so |
| 1612 nsc/linux-2.6.26/liblinux2.6.26.so |
| 1613 |
| 1614 This tells us that we may either pass the library name liblinux2.6.18.so or |
| 1615 liblinux2.6.26.so to the above configuration step. |
| 1616 |
| 1617 Stack configuration |
| 1618 +++++++++++++++++++ |
| 1619 |
| 1620 NSC TCP shares the same configuration attributes that are common across TCP |
| 1621 sockets, as described above and documented in `Doxygen |
| 1622 <http://www.nsnam.org/doxygen/classns3_1_1_tcp_socket.html>`_. |
| 1623 |
| 1624 Additionally, NSC TCP exports a lot of configuration variables into the |
| 1625 |ns3| attribute system, via a `sysctl <http://en.wikipedia.org/wiki/Sysctl>`_-like interface. In the ``examples/tcp/tcp-nsc-zoo`` example, you |
| 1626 can see the following configuration:: |
| 1627 |
| 1628 |
| 1629 // This disables TCP SACK, wscale and timestamps on node 1 |
| 1630 // (the attributes represent sysctl values). |
| 1631 Config::Set ("/NodeList/1/$ns3::Ns3NscStack<linux2.6.26>/net.ipv4.tcp_sack", |
| 1632 StringValue ("0")); |
| 1633 Config::Set ("/NodeList/1/$ns3::Ns3NscStack<linux2.6.26>/net.ipv4.tcp_timestamps", |
| 1634 StringValue ("0")); |
| 1635 Config::Set ("/NodeList/1/$ns3::Ns3NscStack<linux2.6.26>/net.ipv4.tcp_window_scaling", |
| 1636 StringValue ("0")); |
| 1637 |
| 1638 These additional configuration variables are not available to native |ns3| TCP. |
| 1639 |
| 1640 Also note that default values for TCP attributes in |ns3| TCP may differ from those of the NSC TCP implementation. Specifically, in |ns3|: |
| 1641 |
| 1642 1) the TCP default MSS is 536 |
| 1643 2) the TCP Delayed Ack count is 2 |
| 1644 |
| 1645 Therefore, when making comparisons between results obtained using NSC and |ns3| TCP, care must be taken to ensure these values are set appropriately. See ``examples/tcp/tcp-nsc-comparison.cc`` for an example. |
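A minimal sketch of aligning these defaults before any sockets are created (a configuration fragment, not a standalone program: it assumes an ns-3 build context, and the chosen values are purely illustrative):

```cpp
// Config fragment -- requires ns-3; values are illustrative, not recommendations.
#include "ns3/config.h"
#include "ns3/uinteger.h"

using namespace ns3;

void
AlignTcpDefaults (void)
{
  // Native ns-3 TCP defaults to a 536-byte MSS; a Linux stack on
  // Ethernet typically negotiates 1460.
  Config::SetDefault ("ns3::TcpSocket::SegmentSize", UintegerValue (1460));
  // Native ns-3 TCP delays ACKs up to 2 segments; set 1 to acknowledge
  // every segment instead.
  Config::SetDefault ("ns3::TcpSocket::DelAckCount", UintegerValue (1));
}
```

Note that ``Config::SetDefault`` affects the native ``TcpSocket`` attributes; the NSC stack's own values are changed through the sysctl-like paths shown above.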
| 1646 |
| 1647 NSC API |
| 1648 +++++++ |
| 1649 |
| 1650 This subsection describes the API that NSC presents to |ns3| or any other |
| 1651 simulator. NSC provides its API in the form of a number of classes that are |
| 1652 defined in ``sim/sim_interface.h`` in the nsc directory. |
| 1653 |
| 1654 * **INetStack** INetStack contains the 'low level' operations for the operating |
| 1655 system network stack, e.g. input and output functions from and to the network |
| 1656 stack (think of this as the 'network driver interface'). There are also |
| 1657 functions to create new TCP or UDP sockets. |
| 1658 * **ISendCallback** This is called by NSC when a packet should be sent out to |
| 1659 the network. The simulator should use this callback to re-inject the packet |
| 1660 into the simulator so the actual data can be delivered/routed to its |
| 1661 destination, where it will eventually be handed into Receive() (and eventually |
| 1662 back to the receiver's NSC instance via INetStack->if_receive()). |
| 1663 * **INetStreamSocket** This is the structure defining a particular connection |
| 1664 endpoint (file descriptor). It contains methods to operate on this endpoint, |
| 1665 e.g. connect, disconnect, accept, listen, send_data/read_data, ... |
| 1666 * **IInterruptCallback** This contains the wakeup callback, which is called by |
| 1667 NSC whenever something of interest happens. Think of wakeup() as a replacement |
| 1668 for the operating system's wakeup function: whenever the operating system would |
| 1669 wake up a process that has been waiting for an operation to complete (for |
| 1670 example the TCP handshake during connect()), NSC invokes the wakeup() callback |
| 1671 to allow the simulator to check for state changes in its connection endpoints. |
| 1672 |
| 1673 ns-3 implementation |
| 1674 +++++++++++++++++++ |
| 1675 |
| 1676 The |ns3| implementation makes use of the above NSC API, and is implemented as |
| 1677 follows. |
| 1678 |
| 1679 The three main parts are: |
| 1680 |
| 1681 * :cpp:class:`ns3::NscTcpL4Protocol`: a subclass of Ipv4L4Protocol (and of two NSC |
| 1682 classes: ISendCallback and IInterruptCallback) |
| 1683 * :cpp:class:`ns3::NscTcpSocketImpl`: a subclass of TcpSocket |
| 1684 * :cpp:class:`ns3::NscTcpSocketFactoryImpl`: a factory to create new NSC |
| 1685 sockets |
| 1686 |
| 1687 ``src/internet/model/nsc-tcp-l4-protocol`` is the main class. Upon |
| 1688 initialization, it loads an NSC network stack to use (via dlopen()). Each |
| 1689 instance of this class may use a different stack. The stack (= shared library) to |
| 1690 use is set using the SetNscLibrary() method (at this time it is called indirectly |
| 1691 via the internet stack helper). The NSC stack is then set up accordingly (timers, |
| 1692 etc.). The NscTcpL4Protocol::Receive() function hands the packet it receives |
| 1693 (which must be a complete TCP/IP packet) to the NSC stack for further processing. To |
| 1694 be able to send packets, this class implements the NSC send_callback method. |
| 1695 This method is called by NSC whenever the NSC stack wishes to send a packet out |
| 1696 to the network. Its arguments are a raw buffer, containing a complete TCP/IP |
| 1697 packet, and a length value. This method therefore has to convert the raw data to |
| 1698 a Ptr<Packet> usable by |ns3|. In order to avoid various IPv4 header issues, |
| 1699 the NSC IP header is not included. Instead, the TCP header and the actual |
| 1700 payload are put into the Ptr<Packet>; after this, the Packet is passed down to |
| 1701 layer 3 for sending the packet out (no further special treatment is needed in |
| 1702 the send code path). |
| 1703 |
| 1704 This class calls ``ns3::NscTcpSocketImpl`` both from the nsc wakeup() callback |
| 1705 and from the Receive path (to ensure that possibly queued data is scheduled for |
| 1706 sending). |
| 1707 |
| 1708 ``src/internet/model/nsc-tcp-socket-impl`` implements the NSC socket interface. |
| 1709 Each instance has its own nscTcpSocket. Data passed to Send() will be handed to |
| 1710 the NSC stack via m_nscTcpSocket->send_data() (and not to nsc-tcp-l4; this is |
| 1711 the major difference compared to |ns3| TCP). The class also queues up data |
| 1712 passed to Send() before the underlying descriptor has entered an ESTABLISHED state. |
| 1713 This class is called from the nsc-tcp-l4 class when the nsc-tcp-l4 wakeup() |
| 1714 callback is invoked by NSC. nsc-tcp-socket-impl then checks the current |
| 1715 connection state (SYN_SENT, ESTABLISHED, LISTEN, ...) and schedules appropriate |
| 1716 callbacks as needed; e.g., a LISTEN socket will schedule Accept to see if a new |
| 1717 connection must be accepted, while an ESTABLISHED socket schedules any pending |
| 1718 data for writing and schedules a read callback, etc. |
| 1719 |
| 1720 Note that ``ns3::NscTcpSocketImpl`` does not interact with nsc-tcp directly: |
| 1721 instead, data is redirected to NSC. nsc-tcp calls the nsc-tcp sockets of a node |
| 1722 when its wakeup callback is invoked by NSC. |
| 1723 |
| 1724 Limitations |
| 1725 +++++++++++ |
| 1726 |
| 1727 * NSC only works on single-interface nodes; attempting to run it on a |
| 1728 multi-interface node will cause a program error. |
| 1729 * Cygwin and OS X PPC are not supported; OS X Intel is not supported but may work. |
| 1730 * The non-Linux stacks of NSC are not supported in |ns3|. |
| 1731 * Not all socket API callbacks are supported. |
| 1732 |
| 1733 For more information, see `this wiki page <http://www.nsnam.org/wiki/Network_Sim
ulation_Cradle_Integration>`_. |