The Complete Guide to TCP & UDP
Introduction

Welcome to the complete guide to the Transmission Control Protocol (TCP) and User Datagram Protocol (UDP). These two protocols form the bedrock of communication over the internet, operating at the Transport Layer (Layer 4) of the TCP/IP model. While both are used to send data between applications, they offer fundamentally different services.

  • TCP is the reliable, connection-oriented protocol. It's like sending a registered letter with tracking: you get confirmation of delivery, and the pages arrive in the correct order.
  • UDP is the simple, connectionless protocol. It's like sending a postcard: it's fast and has low overhead, but there's no guarantee it will arrive, or that multiple postcards will arrive in the order they were sent.

This guide will explore their mechanics, differences, and use cases in depth, answering a wide array of technical questions for students, developers, and network engineers.

TCP vs. UDP: Header Comparison

The differences between TCP and UDP are most apparent in their headers. A TCP header is a minimum of 20 bytes, packed with fields for managing reliability, flow, and congestion. A UDP header is a mere 8 bytes, reflecting its simplicity.

UDP Header (8 bytes)

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Source Port          |       Destination Port        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            Length             |            Checksum           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

TCP Header (20+ bytes)

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Source Port          |       Destination Port        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Sequence Number                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Acknowledgment Number                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Data |           |U|A|P|R|S|F|                               |
| Offset| Reserved  |R|C|S|S|Y|I|            Window             |
|       |           |G|K|H|T|N|N|                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            Checksum           |         Urgent Pointer        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Options                    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Key Fields in TCP but not in UDP (and why):

  • Sequence Number (32 bits): Tracks the byte number of the first byte in the segment. Essential for reordering out-of-order packets and detecting duplicates. UDP doesn't guarantee order, so it's not needed.
  • Acknowledgment Number (32 bits): Contains the value of the next sequence number the sender of the ACK is expecting to receive. This is the core mechanism for confirming receipt of data. UDP is connectionless and doesn't acknowledge data.
  • Control Flags (e.g., SYN, ACK, FIN): Used to manage the state of the connection (setup, teardown, reset). UDP has no connection state to manage.
  • Window Size (16 bits): Used for flow control. It tells the sender how much data (in bytes) the receiver is currently willing to accept. UDP has no flow control.
  • Urgent Pointer (16 bits): Used with the URG flag to indicate urgent data that should be processed out of band. UDP has no concept of urgent data.
  • Options (Variable): Allows for extensions to TCP, such as Maximum Segment Size (MSS), Selective Acknowledgments (SACK), and Window Scaling. UDP's simplicity precludes such options.
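As a rough illustration of how small the UDP header is, here is a sketch that packs and unpacks its four 16-bit fields with Python's `struct` module. The field values are made up for the example (checksum 0 is legal in IPv4, meaning "unused"):

```python
import struct

def parse_udp_header(segment: bytes):
    """Unpack the four 16-bit fields of an 8-byte UDP header.

    Network byte order is big-endian, hence the '!' format prefix.
    """
    src_port, dst_port, length, checksum = struct.unpack("!HHHH", segment[:8])
    return {"src_port": src_port, "dst_port": dst_port,
            "length": length, "checksum": checksum}

# A hand-built example header: source port 53, destination port 33000,
# length 8 (header only, no payload), checksum 0.
header = struct.pack("!HHHH", 53, 33000, 8, 0)
print(parse_udp_header(header))
```

The entire header fits in a single `struct` format string; doing the same for TCP would also require handling the variable-length Options field.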

Encapsulation vs. Decapsulation

Encapsulation and decapsulation are fundamental processes in layered networking models like TCP/IP.

  • Encapsulation (Sending): As data moves down the protocol stack (from Application to Physical layer), each layer adds its own header (and sometimes a trailer). This is like putting a letter (data) into an envelope (TCP/UDP header), then putting that envelope into a larger package (IP header), and finally putting a shipping label on it (Ethernet header).
  • Decapsulation (Receiving): As data moves up the stack on the receiving end, each layer strips off its corresponding header, processes it, and passes the remaining payload up to the next layer. The process is reversed until the original application data is delivered.
     Sender (Encapsulation)                 Receiver (Decapsulation)
+-------------------+                     +-------------------+
| Application Data  |                     | Application Data  |
+-------------------+                     +-------------------+
        |                                         ^
        v                                         |
+-----+-------------+   TCP/UDP Header   +-----+-------------+
| TCP |    Data     |                    | TCP |    Data     |
+-----+-------------+                    +-----+-------------+
        |                                         ^
        v                                         |
+----+-----+--------+      IP Header     +----+-----+--------+
| IP | TCP |  Data  |                    | IP | TCP |  Data  |
+----+-----+--------+                    +----+-----+--------+
        |                                         ^
        v                                         |
+-----+----+-----+---+   Frame Header   +-----+----+-----+---+
| Eth | IP | TCP | D |                   | Eth | IP | TCP | D |
+-----+----+-----+---+                   +-----+----+-----+---+

Deep Dive into TCP: The Reliable Protocol

TCP Flags Explained

TCP flags (or control bits) are 1-bit fields in the TCP header used to control the connection state and handle data.

  • SYN (Synchronize): Used to initiate a connection. Sent in the first packet of the three-way handshake to synchronize sequence numbers.
  • ACK (Acknowledgment): Indicates that the Acknowledgment Number field is significant. It's used to acknowledge received data. Once a connection is established, virtually all packets have this flag set.
  • FIN (Finish): Used to gracefully terminate a connection. It indicates that the sender has no more data to send.
  • RST (Reset): Abruptly terminates a connection. It's sent in response to invalid segments or to refuse a connection attempt. Unlike FIN, it immediately closes the connection and discards any data in the buffers.
  • PSH (Push): Tells the receiving TCP stack to immediately "push" the data up to the receiving application instead of waiting for its buffer to fill. Often used in interactive applications like SSH.
  • URG (Urgent): Indicates that the Urgent Pointer field is significant. This mechanism is rarely used in modern applications but was designed to allow the sender to notify the receiver of urgent data.
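The six classic flags occupy single bits of the flag byte in the TCP header (bit positions per RFC 793). A small decoding sketch (the function name is ours):

```python
# Bit positions of the six classic flags, low bit first:
# FIN, SYN, RST, PSH, ACK, URG.
TCP_FLAGS = {0x01: "FIN", 0x02: "SYN", 0x04: "RST",
             0x08: "PSH", 0x10: "ACK", 0x20: "URG"}

def decode_flags(flag_byte: int):
    """Return the names of the flags set in a TCP flag byte."""
    return [name for bit, name in sorted(TCP_FLAGS.items()) if flag_byte & bit]

print(decode_flags(0x12))  # 0x12 = SYN + ACK, the second handshake packet
```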

Real-World Use: The Three-Way Handshake

Client                                  Server
  | -------- SYN (seq=x) ----------->    |  (Request to connect)
  |                                      |
  | <--- SYN, ACK (seq=y, ack=x+1) --    |  (Acknowledge request, also request to connect)
  |                                      |
  | ---- ACK (seq=x+1, ack=y+1) ---->    |  (Acknowledge server's request; connection established)
  |                                      |

Understanding the Three-Way Handshake Process

The process involves two computers, a Client (the one initiating the connection, like your web browser) and a Server (the one being connected to, like google.com).

The handshake uses special messages called TCP segments. Three pieces of the TCP header are key here: two flags and one field:

  • SYN (Synchronize) flag: "I want to start a connection."
  • ACK (Acknowledge) flag: "I have received your message."
  • SEQ (Sequence Number) field: A number used to keep track of the data being sent.

Here is the step-by-step process:

Step 1: Client → Server (SYN)

The client wants to start a conversation. It sends a TCP segment with the SYN flag set.

Client says: "Hello, Server! I'd like to establish a connection. My starting sequence number is x."

In technical terms: The client sends a TCP segment where SYN = 1 and it includes an Initial Sequence Number (ISN), let's call it SEQ = x. This number is chosen randomly for security.

Step 2: Server → Client (SYN-ACK)

The server receives the client's request. If it's able and willing to connect, it responds. This response does two things: it acknowledges the client's request AND it proposes its own connection parameters.

Server says: "Hello, Client! I got your message and I'm ready to connect. I acknowledge your sequence number x (by sending x+1). My own starting sequence number is y."

In technical terms: The server sends a TCP segment where SYN = 1 and ACK = 1.

  • It sets its own Initial Sequence Number: SEQ = y.
  • It sets the Acknowledgment Number to the client's sequence number plus one: ACK = x + 1. This confirms it received the first packet correctly.

Step 3: Client → Server (ACK)

The client receives the server's response. It now knows the server is ready. The final step is for the client to acknowledge the server's message.

Client says: "Got it! I acknowledge your sequence number y (by sending y+1). Let's start sending data!"

In technical terms: The client sends a TCP segment where ACK = 1.

  • It sets the Acknowledgment Number to the server's sequence number plus one: ACK = y + 1.
  • The sequence number for this packet is now x+1.

The Connection is Established!

At this point, both the client and server have received an acknowledgment of their connection request. The three-way handshake is complete, and a full-duplex (two-way) reliable connection is established. Now, they can begin sending and receiving actual application data (like the HTML for a webpage).
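The sequence-number arithmetic of the three steps can be sketched in a few lines. This is an illustrative simulation of the bookkeeping, not a real TCP implementation:

```python
import random

def three_way_handshake():
    """Simulate the SEQ/ACK arithmetic of the three-way handshake.

    Returns the three (flags, seq, ack) tuples exchanged.
    """
    x = random.randrange(2**32)   # client's randomly chosen ISN
    y = random.randrange(2**32)   # server's randomly chosen ISN
    syn     = ("SYN",     x,                None)            # step 1
    syn_ack = ("SYN-ACK", y,                (x + 1) % 2**32) # step 2
    ack     = ("ACK",     (x + 1) % 2**32,  (y + 1) % 2**32) # step 3
    return syn, syn_ack, ack

for step in three_way_handshake():
    print(step)
```

Note that each side acknowledges the other's ISN plus one, and arithmetic wraps modulo 2^32 because the sequence-number space is 32 bits.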

Ensuring Reliability: In-Order Delivery

TCP's reliability is built on three core mechanisms: Sequence Numbers, Acknowledgments, and Timers.

  • Sequence Numbers (SEQ): TCP views data as a stream of bytes. Each byte is numbered with a sequence number. The SEQ field in a TCP segment contains the number of the first byte of data in that segment.
  • Acknowledgments (ACK): When a receiver gets a segment, it sends back an ACK. The ACK number is *cumulative*; it specifies the sequence number of the *next* byte it expects to receive. For example, if it receives bytes 1-1000, it sends an ACK of 1001.

Handling Packet Issues:

  • Out-of-Order Packets: If the receiver gets segment 2001-3000 before 1001-2000, it will buffer the later segment. It will continue sending ACKs for 1001 until the missing segment arrives. Once segment 1001-2000 arrives, it can process both segments and will send an ACK for 3001.
  • Lost Packets: The sender starts a retransmission timer for every segment it sends. If it doesn't receive an ACK for that segment before the timer expires, it assumes the packet was lost and retransmits it.
  • Duplicate Packets: The receiver uses the sequence number to identify duplicates. If it receives a segment with a sequence number it has already processed and acknowledged, it simply discards the duplicate segment.
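The receiver-side handling described above (cumulative ACKs, buffering out-of-order segments, discarding duplicates) can be sketched as a toy model, using the same byte numbering as the examples:

```python
class Receiver:
    """Toy in-order receiver: buffers out-of-order segments, always ACKs
    the next byte it expects (cumulative ACK), and drops duplicates."""

    def __init__(self):
        self.expected = 1   # next byte we expect (1-based, as in the text)
        self.buffer = {}    # seq -> segment length, held out of order

    def on_segment(self, seq, length):
        if seq < self.expected:          # duplicate: already delivered
            return self.expected         # re-send the same cumulative ACK
        self.buffer[seq] = length
        # Deliver any contiguous run starting at `expected`.
        while self.expected in self.buffer:
            self.expected += self.buffer.pop(self.expected)
        return self.expected             # cumulative ACK

r = Receiver()
print(r.on_segment(1, 1000))     # bytes 1-1000 arrive    -> ACK 1001
print(r.on_segment(2001, 1000))  # 2001-3000 out of order -> ACK 1001 again
print(r.on_segment(1001, 1000))  # gap filled             -> ACK 3001
```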

Flow Control: The Receive Buffer & Window Size

Flow control prevents a fast sender from overwhelming a slow receiver. This is managed by the receiver, not the network.

  • Receive Buffer: The receiver has a finite buffer to store incoming data before the application reads it.
  • Window Size (rwnd): The receiver advertises its available buffer space to the sender via the "Window" field in the TCP header. This is the amount of data the sender is allowed to send before it must wait for an ACK that updates the window.

What happens if the TCP receive buffer is full?

  1. The receiver's OS sees that the buffer is full (the application isn't reading data fast enough).
  2. In the next ACK it sends to the sender, it will set the Window Size to 0. This is called a "zero-window advertisement."
  3. Impact on Sender: The sender receives the zero-window ACK and must stop sending data. However, it doesn't shut down the connection. It starts a "persist timer."
  4. Impact on Receiver: The receiver processes data. Once the application reads some data and space becomes available in the buffer, the receiver will send a new ACK with a non-zero window size (a "window update"), telling the sender it's okay to resume.
  5. Persist Timer: If the window update packet gets lost, the connection would be deadlocked. To prevent this, the sender's persist timer periodically expires, prompting it to send a small "window probe" packet (typically 1 byte). This probe forces the receiver to re-send its current window size, breaking the deadlock.

Flow Control vs. Congestion Control: Flow control is about the receiver's capacity. Congestion control is about the network's capacity. They are distinct but related mechanisms.

Congestion Control: Slow Start, Avoidance & Recovery

Congestion control prevents a single TCP connection from overwhelming the network. TCP uses a Congestion Window (cwnd), a state variable held by the sender, which limits the amount of unacknowledged data it can have in flight. The effective window is `min(cwnd, rwnd)`.

TCP transitions between several phases:

  1. Slow Start:
    • Trigger: A new connection begins.
    • Behavior: cwnd starts at a small value (e.g., 1-10 MSS). For every ACK received, cwnd is increased by 1 MSS. This results in an exponential increase of the sending rate (cwnd doubles approximately every Round-Trip Time (RTT)).
    • Goal: To quickly probe for available network bandwidth.
  2. Congestion Avoidance:
    • Trigger: cwnd reaches the Slow Start Threshold (ssthresh).
    • Behavior: The growth of cwnd becomes linear instead of exponential. For each full window of data acknowledged, cwnd increases by just 1 MSS. This is a more cautious increase to avoid causing congestion.
  3. Fast Retransmit / Fast Recovery:
    • Trigger: The sender receives three duplicate ACKs for the same sequence number. This is interpreted as a single packet loss, but that the network is otherwise okay (since subsequent packets are getting through).
    • Behavior (TCP Reno):
      1. Set ssthresh to `cwnd / 2`.
      2. Set cwnd to `ssthresh + 3 MSS`.
      3. Retransmit the missing segment immediately without waiting for a timeout.
      4. Enter the Fast Recovery phase, where each additional duplicate ACK inflates `cwnd`, allowing more packets to be sent while waiting for the ACK of the retransmitted segment. Once that new ACK arrives, TCP enters Congestion Avoidance.
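A back-of-the-envelope simulation of the first two phases, counting cwnd in whole MSS per round trip (illustrative only; real stacks track bytes and react to losses):

```python
def cwnd_per_rtt(rtts, initial=1, ssthresh=16):
    """Track cwnd (in MSS) over successive RTTs: exponential growth in
    Slow Start, then linear (+1 MSS per RTT) in Congestion Avoidance."""
    cwnd, history = initial, []
    for _ in range(rtts):
        history.append(cwnd)
        cwnd = cwnd * 2 if cwnd < ssthresh else cwnd + 1
    return history

print(cwnd_per_rtt(8))   # [1, 2, 4, 8, 16, 17, 18, 19]
```

The doubling up to ssthresh and the +1 per RTT afterwards are exactly the exponential-then-linear shape described above.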

Timeout vs. Triple Duplicate ACKs

The network's response to these two events is very different because they signal different levels of problems:

  • On Triple Duplicate ACK (minor congestion):
    • ssthresh = cwnd / 2
    • cwnd = ssthresh + 3 MSS (Reno's Fast Recovery); modern variants such as CUBIC instead apply a gentler multiplicative decrease (roughly 0.7× rather than 0.5×)
    • TCP enters Fast Recovery, a less punitive state.
  • On Retransmission Timeout (severe congestion):
    • ssthresh = cwnd / 2
    • cwnd = 1 MSS (reset to the beginning)
    • TCP enters Slow Start again. This is a very aggressive back-off, assuming the network is in a bad state.
 cwnd
  |                       (Timeout)
  |            ssthresh_2 . . . . x . . . . . . . . . . . .
  |                              / \
  |                             /   \  (Triple Dup ACK)
  |                            /     `-- Fast Recovery
  |            ssthresh_1 . . / . . . . . . . /. . . . . .
  |                          /               / <- Congestion
  |                         /               /     Avoidance
  |   Exponential Growth ->/               /
  |      (Slow Start)     /               /
  +------------------------------------------------------- Time -->

Handling Duplicate ACKs

Duplicate ACKs are a crucial signal for TCP's congestion control.

Scenario: Sender sends packets 1, 2, 3, 4, 5. Packet 2 gets lost, but 3, 4, and 5 arrive at the receiver.

  1. Receiver gets packet 1, sends ACK=2.
  2. Packet 2 is lost.
  3. Receiver gets packet 3. It sees a gap. It discards packet 3 (or buffers it) and resends its last expected ACK: ACK=2. (First Duplicate ACK).
  4. Receiver gets packet 4. Again, it sees the gap and sends ACK=2. (Second Duplicate ACK).
  5. Receiver gets packet 5. Again, it sends ACK=2. (Third Duplicate ACK).

Indication: When the sender receives the third duplicate ACK (for a total of four ACKs for packet 2), it assumes that packet 2 was lost but that the network is still flowing (since subsequent packets are arriving). This is the trigger for Fast Retransmit, which is much faster than waiting for a full retransmission timeout.

TCP and Multiple Receivers

No. A standard TCP session does not allow one host to send segments to multiple receivers simultaneously.

Explanation: TCP is a strictly point-to-point protocol. A TCP connection is uniquely defined by a 4-tuple: `(source_ip, source_port, dest_ip, dest_port)`. The entire state machine of TCP—sequence numbers, acknowledgments, window sizes, and connection state—is tied to a single sender and a single receiver. Maintaining this state for multiple receivers within one "session" is not possible with standard TCP.

To send data to multiple receivers, an application must establish a separate, independent TCP connection to each one.

Deep Dive into UDP: The Fast Protocol

UDP Fragmentation at the IP Layer

When an application sends a large UDP datagram, it gets encapsulated in an IP packet. If this IP packet is larger than the Maximum Transmission Unit (MTU) of a network link it needs to cross (e.g., standard Ethernet MTU is 1500 bytes), the router at that link must fragment the IP packet.

How Fragmentation Occurs:

  1. A router receives an IP packet of 4000 bytes, but the outgoing link has an MTU of 1500 bytes.
  2. The router creates smaller IP packets (fragments) that fit within the MTU. The IP header itself is typically 20 bytes, so the data payload of each fragment must be `1500 - 20 = 1480` bytes.
  3. Fragment Offset Calculation: The `Fragment Offset` field in the IP header is used to reassemble the packet. It specifies the offset of the fragment's data, in 8-byte units, from the start of the original IP packet's data.
    • Fragment 1: Data bytes 0-1479. `Fragment Offset = 0 / 8 = 0`. `More Fragments (MF) flag = 1`.
    • Fragment 2: Data bytes 1480-2959. `Fragment Offset = 1480 / 8 = 185`. `MF flag = 1`.
    • Fragment 3: Remaining data. `Fragment Offset = 2960 / 8 = 370`. `MF flag = 0` (this is the last fragment).
  4. The destination host receives these fragments and uses the IP Identification, Flags, and Fragment Offset fields to reassemble the original IP packet before passing the complete UDP datagram up to the application.
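The offset arithmetic above can be checked with a short sketch (the function name and return shape are ours; real routers also copy the IP Identification field into every fragment):

```python
def fragment(total_len, mtu=1500, ip_header=20):
    """Split an IP packet's payload across fragments that fit the MTU.

    Returns one (offset_in_8_byte_units, payload_len, more_fragments)
    tuple per fragment. Every fragment payload except the last must be
    a multiple of 8 bytes, since the offset field counts 8-byte units.
    """
    payload = total_len - ip_header
    per_frag = (mtu - ip_header) // 8 * 8   # 1480 for a 1500-byte MTU
    frags, sent = [], 0
    while sent < payload:
        size = min(per_frag, payload - sent)
        more = sent + size < payload
        frags.append((sent // 8, size, more))
        sent += size
    return frags

# The 4000-byte packet from the text, crossing a 1500-byte-MTU link:
print(fragment(4000))   # [(0, 1480, True), (185, 1480, True), (370, 1020, False)]
```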

Risks and Limitations of Fragmentation:

  • Reliability Nightmare: If even one IP fragment is lost, the entire original datagram is lost. The destination host cannot reassemble the packet and will discard all other received fragments for that datagram. UDP has no mechanism to request retransmission of just the lost fragment.
  • Performance Overhead: Fragmentation and reassembly consume CPU resources on routers and the destination host.
  • Security Issues: Fragmentation can be exploited in attacks like the "teardrop attack."
  • Path MTU Discovery (PMTUD): To avoid fragmentation, systems often use PMTUD to determine the smallest MTU along a path and size their packets accordingly. However, PMTUD can be unreliable if ICMP messages are blocked by firewalls.

Trade-offs for Real-Time Applications

UDP is often the protocol of choice for real-time applications like VoIP, online gaming, and live video streaming.

Advantages (The Trade-offs):

  • Low Latency: No connection setup (handshake), no retransmission delays, no head-of-line blocking. Data can be sent immediately.
  • Timeliness over Reliability: In real-time applications, receiving old data late is often worse than not receiving it at all. A retransmitted video frame from 2 seconds ago is useless.
  • Simplicity: Less processing overhead on the endpoints.

Disadvantages (The Risks):

  • Unreliability: Packets can be lost, duplicated, or arrive out of order, leading to glitches, artifacts, or dropouts in the audio/video stream.
  • No Flow/Congestion Control: A naive UDP application can easily cause network congestion, leading to more packet loss for itself and other users. Well-behaved real-time applications must implement their own congestion control.

Implementing Reliability on UDP

Yes, it is possible for an application using UDP to provide reliable data delivery. This is done by implementing reliability mechanisms at the application layer. This approach allows developers to pick and choose the features they need, creating a "TCP-lite" protocol tailored to their needs.

How to Implement It:

  1. Sequence Numbers: Add a sequence number to the payload of each UDP datagram. This allows the receiver to detect lost or out-of-order packets.
  2. Acknowledgments (ACKs): The receiver sends back custom ACK packets to the sender to confirm receipt. The design can be simple (ACK every packet) or more complex (selective ACKs for missing packets).
  3. Retransmission Timers: The sender maintains a timer for unacknowledged packets. If an ACK isn't received within a certain time, the sender retransmits the data.
  4. Custom Logic: The application can decide what to do on packet loss. For a file transfer, it must retransmit. For a video stream, it might choose to skip the lost frame or request a lower-quality version.

This is precisely what modern protocols like QUIC (used by HTTP/3) do. QUIC runs over UDP but provides a sophisticated, reliable, and secure transport layer with features that even surpass TCP, such as eliminating head-of-line blocking.

Practical Applications & Scenarios

Systems Using Both TCP & UDP

Many complex applications use both protocols to leverage the strengths of each.

  • DNS (Domain Name System):
    • UDP (Port 53): Used for standard queries. A single request/response packet is fast and efficient. If the UDP packet is lost, the client's resolver simply times out and retries.
    • TCP (Port 53): Used for two main cases:
      1. When the response data size exceeds 512 bytes (the traditional size limit for a DNS message carried over UDP, before EDNS).
      2. For zone transfers between DNS servers, where a large amount of data must be transferred reliably.
  • VoIP (Voice over IP) and Video Conferencing (e.g., SIP, RTP):
    • UDP (for RTP): The actual voice/video data is sent over UDP via the Real-time Transport Protocol (RTP). Timeliness is critical, and a lost packet is preferable to a delayed one.
    • TCP (for SIP): The session control and signaling (call setup, teardown, ringing) are often handled over TCP via the Session Initiation Protocol (SIP) to ensure these critical control messages are delivered reliably.
  • Online Gaming:
    • UDP: Used for real-time game state updates like player position, actions, and physics. Latency is paramount.
    • TCP: Used for account login, chat messages, and downloading game assets, where reliability is more important than speed.

Scenario: Frequent Updates to Multiple Receivers

Protocol Choice: UDP.

Reasoning:

  • Efficiency for Multiple Receivers: UDP supports multicast and broadcast. A single packet can be sent from the source and delivered by the network to multiple subscribed receivers (multicast) or all hosts on a subnet (broadcast). This is vastly more efficient than establishing and maintaining a separate TCP connection to every single receiver.
  • Low Overhead: For frequent, small updates (e.g., stock prices, game state), the overhead of TCP's 20-byte header and acknowledgment packets would be significant compared to the payload size. UDP's 8-byte header is much more efficient.
  • State Management: TCP would require the server to manage the state (buffers, timers, sequence numbers) for every single receiver, which would not scale well.

Scenario: Airplane Location Reporting

Protocol Choice: UDP.

Reasoning:

  • Idempotency and Timeliness: Each location report is a self-contained, independent piece of information. A report from 5 minutes ago is now stale; the latest report is what matters. If a UDP packet is lost, it's better to wait for the next update in 5 minutes than to delay new data by trying to retransmit the old, outdated report. TCP's reliability mechanisms would be counterproductive.
  • Simplicity and Cost: The communication might happen over expensive or low-bandwidth satellite links. UDP's minimal header and lack of ACK traffic reduce bandwidth consumption and cost. The protocol is simpler to implement on potentially resource-constrained avionics hardware.
  • No Connection State: Airplanes may move between network coverage areas. UDP's connectionless nature means it doesn't need to re-establish a session if connectivity is temporarily lost; it just sends the next datagram when it can.

Scenario: DNS Queries over UDP

DNS primarily uses UDP for its speed and efficiency. However, this introduces potential issues.

Potential Issues:

  1. Packet Loss: A DNS query or response can be lost in transit. Since UDP is unreliable, neither side would know.
  2. Response Truncation: If a DNS response is too large to fit in a single UDP packet (traditionally > 512 bytes), the server will send a truncated response with the `TC` (Truncation) bit set.
  3. IP Fragmentation: If the response is large but still sent over UDP, it may be fragmented at the IP layer. As discussed before, loss of a single fragment means the entire response is lost.
  4. Spoofing/Cache Poisoning: Because UDP is connectionless, an attacker can more easily spoof the source IP of a DNS server and send a malicious response to a client, potentially poisoning its cache.

Mitigations in Practice:

  1. Application-Level Retry: The DNS client (resolver) implements a simple timeout and retry mechanism. If no response is received after a few seconds, it re-sends the query.
  2. Fallback to TCP: If a client receives a truncated (`TC`) response, it is required to retry the same query over TCP, which can handle arbitrarily large responses reliably.
  3. EDNS (Extension Mechanisms for DNS): EDNS allows clients to specify a larger UDP buffer size they can handle, reducing the need for truncation and fallback to TCP.
  4. DNSSEC (DNS Security Extensions): Mitigates spoofing by using digital signatures to verify the authenticity and integrity of DNS responses.
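The first mitigation, the resolver's timeout-and-retry loop, can be sketched as follows. `send_query` here is a hypothetical stand-in for the real UDP exchange, and the returned address is just example data:

```python
import socket

def resolve_with_retry(send_query, retries=3, timeout=2.0):
    """Stub-resolver retry loop: re-send the query on timeout, doubling
    the wait each time, and give up after `retries` attempts."""
    for _ in range(retries):
        try:
            return send_query(timeout)
        except socket.timeout:
            timeout *= 2          # back off before retrying
    raise TimeoutError(f"no DNS response after {retries} attempts")

# Demo with a fake exchange that "loses" the first two queries.
attempts = []
def fake_query(timeout):
    attempts.append(timeout)
    if len(attempts) < 3:
        raise socket.timeout
    return "93.184.216.34"

print(resolve_with_retry(fake_query))   # answered on the third try
```

Real resolvers layer more on top of this (rotating among servers, honoring the `TC` bit by switching to TCP), but the timeout-and-resend core is the same.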

Security Considerations

UDP Security: Spoofing & Amplification

UDP's connectionless nature makes it more susceptible to certain types of attacks.

  • IP Spoofing: An attacker can easily forge the source IP address in a UDP datagram because there is no handshake to validate the source. The receiver has no built-in way to verify that the packet actually came from the claimed sender.
  • Amplification Attacks (DDoS): This exploits UDP and spoofing. An attacker sends a small query to a public server (like a misconfigured DNS or NTP server) but spoofs the source IP to be the victim's IP. The server then sends a much larger response to the victim. The attacker "amplifies" their attack traffic using the server. A small request from the attacker results in a large flood of traffic hitting the victim.
    Attacker ---(small query, spoofed source = Victim)---> Public Server
                                                                 |
                                                         (large response,
                                                          sent to Victim)
                                                                 v
    Victim <-----------------------------------------------------+

TCP Security Implications

While the three-way handshake makes spoofing harder, TCP is not immune to attack.

  • SYN Flood Attack (DDoS): An attacker exploits the connection setup process. They send a high volume of `SYN` packets to a server, often with spoofed source IPs. The server responds with `SYN-ACK` and allocates resources (memory for the connection state) while waiting for the final `ACK`. Since the final `ACK` never arrives (the spoofed IP is unreachable or not participating), the server's resources become exhausted keeping track of these half-open connections, and it can no longer accept legitimate requests.

Coding & Implementation

Socket Differences: TCP vs. UDP

The choice of protocol deeply impacts how sockets are used in an application.

TCP Sockets (SOCK_STREAM)

  • Type: A stream-oriented socket. It provides a continuous, reliable, bidirectional stream of data. The OS handles packetization, reordering, and retransmissions.
  • Connection: Requires a connection to be established (connect() on the client, listen() and accept() on the server) before data can be sent.
  • Data Transfer: Uses send() and recv() (or write()/read()). The calls do not necessarily correspond to single packets. You might need to call recv() multiple times to get a single message sent by the peer.
  • Analogy: A telephone call. You must dial and establish a connection before you can talk.

UDP Sockets (SOCK_DGRAM)

  • Type: A datagram-oriented socket. It works with discrete messages (datagrams).
  • Connection: Connectionless. No initial setup is required.
  • Data Transfer: Uses sendto() and recvfrom(). Each call to sendto() sends one UDP datagram. Each call to recvfrom() reads one entire datagram. The destination address must be specified with each sendto() call.
  • Analogy: Sending a letter in the mail. You write the address on each one and drop it in the mailbox.
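To make the contrast concrete, here is a minimal TCP echo pair on localhost (a sketch using a background thread; error handling omitted). Note the connect()/accept() step, which has no counterpart in UDP code:

```python
import socket
import threading

def echo_once(listener):
    """Accept a single connection and echo one message back."""
    conn, _ = listener.accept()        # completes the three-way handshake
    with conn:
        conn.sendall(conn.recv(1024))

# Listener on an OS-assigned port (port 0) on the loopback interface.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(('127.0.0.1', 0))
listener.listen()
t = threading.Thread(target=echo_once, args=(listener,))
t.start()

# Client side: unlike UDP, connect() must succeed before any data moves.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as c:
    c.connect(listener.getsockname())
    c.sendall(b'Hello, TCP Server!')
    data = c.recv(1024)

t.join()
listener.close()
print(f"Received echo: '{data.decode()}'")
```

For such a short message a single recv() suffices; in general, stream sockets may deliver a message across several recv() calls, as noted above.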

Code Example: UDP Client & Server (Python)

This simple Python example demonstrates the core logic of a UDP server that listens for a message and a client that sends one.

UDP Server (udp_server.py)


import socket

# Server configuration
HOST = '127.0.0.1'  # Standard loopback interface address (localhost)
PORT = 65432        # Port to listen on (non-privileged ports are > 1023)

# Create a UDP socket
# AF_INET is for IPv4, SOCK_DGRAM is for UDP
with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
    # Bind the socket to the address and port
    s.bind((HOST, PORT))
    print(f"Server listening on {HOST}:{PORT}")

    while True:
        # Wait for a message. recvfrom returns the data and the address of the client
        data, addr = s.recvfrom(1024)  # buffer size is 1024 bytes
        print(f"Received message: '{data.decode()}' from {addr}")

        # Echo the message back to the client, converting to uppercase
        response = data.upper()
        s.sendto(response, addr)
        print(f"Sent response: '{response.decode()}' to {addr}")

UDP Client (udp_client.py)


import socket

# Server configuration
HOST = '127.0.0.1'
PORT = 65432
MESSAGE = b'Hello, UDP Server!'

# Create a UDP socket
with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
    # Send a message to the server
    s.sendto(MESSAGE, (HOST, PORT))
    print(f"Sent message: '{MESSAGE.decode()}'")

    # Wait for the response
    data, server_addr = s.recvfrom(1024)
    print(f"Received response: '{data.decode()}' from {server_addr}")

Implementing Custom Reliability on UDP

To implement reliability for a chat application on UDP, you need to add mechanisms to handle lost and out-of-order messages. Here is a conceptual approach.

The Approach:

  1. Packet Format: Define a custom header for your data payload.
    [SeqNum (4 bytes)][ACK_Num (4 bytes)][Flags (1 byte)][  Message Data  ]
    Flags could have bits for DATA, ACK, FIN, etc.
  2. Sender Logic:
    • Maintain a `next_seq_num` counter. Increment it for each new data packet sent.
    • Store sent but unacknowledged packets in a buffer (an "in-flight window").
    • For each packet in the window, start a retransmission timer.
    • When an ACK is received, remove the corresponding packet from the buffer.
    • If a timer expires, retransmit the packet from the buffer and restart its timer.
  3. Receiver Logic:
    • Maintain an `expected_seq_num`.
    • If a packet arrives with `seq_num == expected_seq_num`, process it, increment `expected_seq_num`, and send an ACK for the new `expected_seq_num`.
    • If a packet arrives with `seq_num > expected_seq_num` (out-of-order), buffer it. Send a duplicate ACK for the current `expected_seq_num` to signal a missing packet.
    • If a packet arrives with `seq_num < expected_seq_num`, it's a duplicate. Discard it but re-send the ACK for the current `expected_seq_num`.
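A minimal sketch of the sender side of this design (the names are ours, and the `transmit` callable stands in for an actual sendto()):

```python
import time

class ReliableSender:
    """Sketch of the sender logic above: number packets, keep unACKed
    ones in flight, and retransmit when a packet's timer expires."""

    def __init__(self, transmit, rto=0.5):
        self.transmit = transmit       # callable that puts bytes on the wire
        self.next_seq = 0
        self.in_flight = {}            # seq -> (payload, deadline)
        self.rto = rto

    def send(self, payload):
        seq = self.next_seq
        self.next_seq += 1
        self.in_flight[seq] = (payload, time.monotonic() + self.rto)
        self.transmit(seq, payload)

    def on_ack(self, seq):
        self.in_flight.pop(seq, None)  # delivered: stop tracking it

    def tick(self):
        """Retransmit anything whose timer has expired, restarting it."""
        now = time.monotonic()
        for seq, (payload, deadline) in list(self.in_flight.items()):
            if now >= deadline:
                self.in_flight[seq] = (payload, now + self.rto)
                self.transmit(seq, payload)

wire = []
s = ReliableSender(lambda seq, p: wire.append((seq, p)), rto=0.0)
s.send(b'hi'); s.tick()        # rto=0 forces an immediate "retransmit"
print(wire)                    # [(0, b'hi'), (0, b'hi')]
s.on_ack(0)
print(s.in_flight)             # {}
```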

Challenges:

  • Timer Management: Choosing a good Retransmission Timeout (RTO) value is difficult. It should be based on a dynamically calculated Round-Trip Time (RTT).
  • Congestion Control: This simple design lacks congestion control. A naive implementation could flood the network, making things worse. A robust solution needs to implement its own version of slow start and congestion avoidance.
  • Flow Control: The design also lacks flow control, potentially overwhelming the receiver.

This exercise essentially involves re-inventing many of TCP's core features. It highlights the complexity that TCP handles for us automatically.