Write a Linux packet sniffer from scratch: part one - PF_PACKET socket and promiscuous mode

Background

When we talk about network packet sniffers, famous and popular tools like tcpdump come to mind. I have shown you how to capture network packets with such tools in my previous articles. But have you ever thought about writing a packet sniffer from scratch, without depending on any third-party libraries? We need to dig deep into the operating system and find the weapons needed to build this tool. Sounds complex, right? In this article, let's do it. After reading it, you will find that it is not as difficult as you might think.

Note that different operating system kernels have different internal network implementations. This article will focus on the Linux platform.

Introduction

Firstly, we need to review how tcpdump is implemented. According to the official documentation, tcpdump is built on the library libpcap, which is based on the remarkable packet-capture research from Berkeley; for details, you can refer to this paper.

As you know, different operating systems have different internal implementations of their network stacks. libpcap covers all of these differences and provides a system-independent interface for user-level packet capture. I want to focus on the Linux platform, so how does libpcap work on a Linux system? According to some documents, it turns out that libpcap uses the PF_PACKET socket to capture packets on a network interface.

So the next question is: what is the PF_PACKET socket?

PF_PACKET socket

In my previous article, we mentioned that the socket interface is TCP/IP’s window on the world. In most modern systems incorporating TCP/IP, the socket interface is the only way applications can use the TCP/IP suite of protocols.

It is correct. This time, let's dig deeper into sockets by examining the system call executed when we create a new one:

int socket(int domain, int type, int protocol);

When you create a socket with the above system call, you have to specify which domain (or protocol family) you want to use as the first argument. The most commonly used family is PF_INET, which is for communication based on the IPv4 protocol (when you create a TCP server, you use this family). You also have to specify a type for your socket as the second argument, and the possible values depend on the family you specified. For example, for the PF_INET family, the values for type include SOCK_STREAM (for TCP) and SOCK_DGRAM (for UDP). For other details about the socket system call, you can refer to the socket(2) man page.

You can find one potential value for the domain argument as follows:

AF_PACKET    Low-level packet interface

Note: AF_PACKET and PF_PACKET are the same. Historically it was called PF_PACKET, and it was later renamed AF_PACKET. PF stands for protocol family, and AF stands for address family. In this article, I use PF_PACKET.

Unlike a PF_INET socket, which delivers data only after the TCP/IP stack has processed the packet, a PF_PACKET socket gives you the raw Ethernet frame, bypassing the usual upper-layer handling of the TCP/IP stack. It might sound a little crazy, but that is exactly what happens: any packet received on the interface is passed directly to the application.

For a better understanding of the PF_PACKET socket, let us go deeper and roughly trace the path of a received packet from the network interface to the application level.

When the network interface card (NIC) receives a packet, it is handled by the driver. The driver maintains a structure called a ring buffer internally and writes the packet into kernel memory (pre-allocated for the ring buffer) with direct memory access (DMA). The packet is then placed inside a structure called sk_buff, one of the most important structures in the kernel network subsystem.

After entering kernel space, the packet goes through the protocol stack layer by layer, with IP processing followed by TCP/UDP processing, and finally reaches the application via the socket interface. You already know this familiar path very well.

But for a PF_PACKET socket, the packet in the sk_buff is cloned, and the clone skips the protocol stack and goes directly to the application. The kernel needs the clone operation because one copy is consumed by the PF_PACKET socket, while the other goes through the usual protocol stack.

In future articles, I’ll demonstrate more about Linux kernel network internals.

Next, let us see how to create a PF_PACKET socket at the code level. For brevity, I omit some code and only show the essential part. You can refer to this Github repo for details.

if ((sock = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_IP))) < 0) {
    perror("socket");
    exit(1);
}

Make sure to include the necessary system header files: <sys/socket.h> and <sys/types.h> (plus <arpa/inet.h> for htons and <linux/if_ether.h> for ETH_P_IP).

Bind to one network interface

Without additional settings, the sniffer captures all the packets received on all network devices. As the next step, let us bind the sniffer to a specific network device.

Firstly, you can use the ifconfig command to list all the available network interfaces on your machine. A network interface is a software interface to the networking hardware.

For example, the following output shows the information of the network interface eth0:

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.230.49  netmask 255.255.240.0  broadcast 192.168.239.255
        inet6 fe80::215:5dff:fefb:e31f  prefixlen 64  scopeid 0x20<link>
        ether 00:15:5d:fb:e3:1f  txqueuelen 1000  (Ethernet)
        RX packets 260  bytes 87732 (87.7 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 178  bytes 29393 (29.3 KB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

Let’s bind the sniffer to eth0 as follows:

// bind to eth0 interface only
const char *opt;
opt = "eth0";
if (setsockopt(sock, SOL_SOCKET, SO_BINDTODEVICE, opt, strlen(opt) + 1) < 0) {
    perror("setsockopt bind device");
    close(sock);
    exit(1);
}

We do it by calling the setsockopt system call. I leave its detailed usage to you.

Now the sniffer captures only the network packets received on the specified network interface.

Non-promiscuous and promiscuous mode

By default, each network card minds its own business and reads only the frames directed to it: the network card discards all packets whose destination MAC address is not its own. This is called non-promiscuous mode.

Next, let us make the sniffer work in promiscuous mode, so that it retrieves all packets, even the ones that are not addressed to its host.

To set a network interface to promiscuous mode, all we have to do is issue the ioctl() system call to an open socket on that interface.

/* set the network card in promiscuous mode */
// An ioctl() request has encoded in it whether the argument is an in parameter or out parameter
// SIOCGIFFLAGS 0x8913 /* get flags */
// SIOCSIFFLAGS 0x8914 /* set flags */
struct ifreq ethreq;
strncpy(ethreq.ifr_name, "eth0", IF_NAMESIZE);
if (ioctl(sock, SIOCGIFFLAGS, &ethreq) == -1) {
    perror("ioctl");
    close(sock);
    exit(1);
}
ethreq.ifr_flags |= IFF_PROMISC;
if (ioctl(sock, SIOCSIFFLAGS, &ethreq) == -1) {
    perror("ioctl");
    close(sock);
    exit(1);
}

ioctl stands for I/O control; it manipulates the underlying device parameters of special files. ioctl takes three arguments:

  • The first argument must be an open file descriptor. We use the socket file descriptor bound to the network interface in our case.
  • The second argument is a device-dependent request code. You can see we called ioctl twice. The first call uses request code SIOCGIFFLAGS to get flags, and the second call uses request code SIOCSIFFLAGS to set flags. Do not be fooled by these two constant values, which are spelled alike.
  • The third argument passes data to, or returns data from, the requesting process; in our case it is a pointer to a struct ifreq.

Now the sniffer can retrieve all the data packets received on the network card, no matter which host the packets are addressed to.

Summary

This article examined what the PF_PACKET socket is, how it works, and why the application can get raw Ethernet packets from it. Furthermore, we discussed how to bind the sniffer to a specific network interface and how to make the sniffer work in promiscuous mode. The next article will examine how to implement packet-filtering functionality, which is very useful for a network sniffer.

How HTTP1.1 protocol is implemented in Golang net/http package: part two - write HTTP message to socket

Background

In the previous article, I introduced the main workflow of an HTTP request implemented inside the Golang net/http package. In this second article of the series, I'll focus on how the HTTP message is passed to the TCP/IP stack so that it can be transported over the network.

Architecture diagram

When the client application sends an HTTP request, it determines the next step based on whether there is an available persistent connection in the cached connection pool. If not, a new TCP connection is established. If there is, a persistent connection is selected.

The details of the connection pool are not in this article's scope; I'll discuss them in the next article. For now, you can regard the pool as a black box.

The overall diagram for this article goes as follows; we will review each piece of it in the sections below.

persistConn

The key structure in this part is persistConn:

type persistConn struct {
	alt RoundTripper
	t *Transport
	cacheKey connectMethodKey
	conn net.Conn // underlying TCP connection
	tlsState *tls.ConnectionState
	br *bufio.Reader
	bw *bufio.Writer // buffer io for writing data
	nwrite int64
	reqch chan requestAndChan
	writech chan writeRequest // channel for writing request
	closech chan struct{}
	isProxy bool
	sawEOF bool
	readLimit int64
	writeErrCh chan error
	writeLoopDone chan struct{}
	idleAt time.Time
	idleTimer *time.Timer
	mu sync.Mutex
	numExpectedResponses int
	closed error
	canceledErr error
	broken bool
	reused bool
	mutateHeaderFunc func(Header)
}

There are many fields defined in persistConn, but we can focus on these three:

  • conn: of type net.Conn, which represents a TCP connection in Golang;
  • bw: of type *bufio.Writer, which provides buffered I/O;
  • writech: a channel, used to communicate and synchronize data between goroutines in Golang.

In the next sections, let's investigate how persistConn is used to write the HTTP message to the socket.

New connection

First, let's see how a new TCP connection is established and bound to the persistConn structure. The job is done inside the dialConn method of Transport:

// dialConn in transport.go file
func (t *Transport) dialConn(ctx context.Context, cm connectMethod) (pconn *persistConn, err error) {
// construct a new persistConn
pconn = &persistConn{
t: t,
cacheKey: cm.key(),
reqch: make(chan requestAndChan, 1),
writech: make(chan writeRequest, 1),
closech: make(chan struct{}),
writeErrCh: make(chan error, 1),
writeLoopDone: make(chan struct{}),
}
trace := httptrace.ContextClientTrace(ctx)
wrapErr := func(err error) error {
if cm.proxyURL != nil {
return &net.OpError{Op: "proxyconnect", Net: "tcp", Err: err}
}
return err
}
if cm.scheme() == "https" && t.hasCustomTLSDialer() {
var err error
// dial secure TCP connection, assign to field pconn.conn
pconn.conn, err = t.customDialTLS(ctx, "tcp", cm.addr())
if err != nil {
return nil, wrapErr(err)
}
if tc, ok := pconn.conn.(*tls.Conn); ok {
if trace != nil && trace.TLSHandshakeStart != nil {
trace.TLSHandshakeStart()
}
if err := tc.Handshake(); err != nil {
go pconn.conn.Close()
if trace != nil && trace.TLSHandshakeDone != nil {
trace.TLSHandshakeDone(tls.ConnectionState{}, err)
}
return nil, err
}
cs := tc.ConnectionState()
if trace != nil && trace.TLSHandshakeDone != nil {
trace.TLSHandshakeDone(cs, nil)
}
pconn.tlsState = &cs
}
} else {
// dial TCP connection
conn, err := t.dial(ctx, "tcp", cm.addr())
if err != nil {
return nil, wrapErr(err)
}
// assign to pconn.conn
pconn.conn = conn
if cm.scheme() == "https" {
var firstTLSHost string
if firstTLSHost, _, err = net.SplitHostPort(cm.addr()); err != nil {
return nil, wrapErr(err)
}
if err = pconn.addTLS(firstTLSHost, trace); err != nil {
return nil, wrapErr(err)
}
}
}
switch {
case cm.proxyURL == nil:
case cm.proxyURL.Scheme == "socks5":
conn := pconn.conn
d := socksNewDialer("tcp", conn.RemoteAddr().String())
if u := cm.proxyURL.User; u != nil {
auth := &socksUsernamePassword{
Username: u.Username(),
}
auth.Password, _ = u.Password()
d.AuthMethods = []socksAuthMethod{
socksAuthMethodNotRequired,
socksAuthMethodUsernamePassword,
}
d.Authenticate = auth.Authenticate
}
if _, err := d.DialWithConn(ctx, conn, "tcp", cm.targetAddr); err != nil {
conn.Close()
return nil, err
}
case cm.targetScheme == "http":
pconn.isProxy = true
if pa := cm.proxyAuth(); pa != "" {
pconn.mutateHeaderFunc = func(h Header) {
h.Set("Proxy-Authorization", pa)
}
}
case cm.targetScheme == "https":
conn := pconn.conn
hdr := t.ProxyConnectHeader
if hdr == nil {
hdr = make(Header)
}
if pa := cm.proxyAuth(); pa != "" {
hdr = hdr.Clone()
hdr.Set("Proxy-Authorization", pa)
}
connectReq := &Request{
Method: "CONNECT",
URL: &url.URL{Opaque: cm.targetAddr},
Host: cm.targetAddr,
Header: hdr,
}

connectCtx := ctx
if ctx.Done() == nil {
newCtx, cancel := context.WithTimeout(ctx, 1*time.Minute)
defer cancel()
connectCtx = newCtx
}

didReadResponse := make(chan struct{})
var (
resp *Response
err error
)

go func() {
defer close(didReadResponse)
err = connectReq.Write(conn)
if err != nil {
return
}
br := bufio.NewReader(conn)
resp, err = ReadResponse(br, connectReq)
}()
select {
case <-connectCtx.Done():
conn.Close()
<-didReadResponse
return nil, connectCtx.Err()
case <-didReadResponse:

}
if err != nil {
conn.Close()
return nil, err
}
if resp.StatusCode != 200 {
f := strings.SplitN(resp.Status, " ", 2)
conn.Close()
if len(f) < 2 {
return nil, errors.New("unknown status code")
}
return nil, errors.New(f[1])
}
}

if cm.proxyURL != nil && cm.targetScheme == "https" {
if err := pconn.addTLS(cm.tlsHost(), trace); err != nil {
return nil, err
}
}

if s := pconn.tlsState; s != nil && s.NegotiatedProtocolIsMutual && s.NegotiatedProtocol != "" {
if next, ok := t.TLSNextProto[s.NegotiatedProtocol]; ok {
alt := next(cm.targetAddr, pconn.conn.(*tls.Conn))
if e, ok := alt.(http2erringRoundTripper); ok {
return nil, e.err
}
return &persistConn{t: t, cacheKey: pconn.cacheKey, alt: alt}, nil
}
}
pconn.br = bufio.NewReaderSize(pconn, t.readBufferSize())
// buffer io wrapper for writing request
pconn.bw = bufio.NewWriterSize(persistConnWriter{pconn}, t.writeBufferSize())
// read loop
go pconn.readLoop()
// write loop
go pconn.writeLoop()
return pconn, nil
}

Near the top of the method, it creates a new persistConn object, which is also the method's return value.

It then establishes a new TCP connection by calling the dial method (or customDialTLS, which handles the TLS case). In Golang, a TCP connection is represented by the net.Conn type, and the new connection is bound to the conn field of persistConn.

Now that we have the TCP connection, how can we use it? We'll skip many lines of code and go to the end of this function.

Near the end, it creates a bufio.Writer on top of persistConn. Buffered I/O is an interesting topic; for details you can refer to my previous article. In short, it can improve performance by reducing the number of system calls. In the current case, it avoids issuing too many socket write system calls.

Finally, it starts a goroutine that executes the writeLoop method. Let's take a look at it.

writeLoop

// writeLoop method in transport.go file
func (pc *persistConn) writeLoop() {
	defer close(pc.writeLoopDone)
	for {
		select {
		// receive request from writech channel
		case wr := <-pc.writech:
			startBytesWritten := pc.nwrite
			// call write method
			err := wr.req.Request.write(pc.bw, pc.isProxy, wr.req.extra, pc.waitForContinue(wr.continueCh))
			if bre, ok := err.(requestBodyReadError); ok {
				err = bre.error
				wr.req.setError(err)
			}
			if err == nil {
				err = pc.bw.Flush()
			}
			if err != nil {
				wr.req.Request.closeBody()
				if pc.nwrite == startBytesWritten {
					err = nothingWrittenError{err}
				}
			}
			pc.writeErrCh <- err // to the body reader, which might recycle us
			wr.ch <- err // to the roundTrip function
			if err != nil {
				pc.close(err)
				return
			}
		case <-pc.closech:
			return
		}
	}
}

As the name writeLoop implies, there is a for loop that keeps receiving data from the writech channel. Every time it receives a request from the channel, it calls the write method on the request. Let's review what message that method actually writes:

// write method in request.go
func (r *Request) write(w io.Writer, usingProxy bool, extraHeaders Header, waitForContinue func() bool) (err error) {
trace := httptrace.ContextClientTrace(r.Context())
if trace != nil && trace.WroteRequest != nil {
defer func() {
trace.WroteRequest(httptrace.WroteRequestInfo{
Err: err,
})
}()
}
host := cleanHost(r.Host)
if host == "" {
if r.URL == nil {
return errMissingHost
}
host = cleanHost(r.URL.Host)
}
host = removeZone(host)
ruri := r.URL.RequestURI()
if usingProxy && r.URL.Scheme != "" && r.URL.Opaque == "" {
ruri = r.URL.Scheme + "://" + host + ruri
} else if r.Method == "CONNECT" && r.URL.Path == "" {
ruri = host
if r.URL.Opaque != "" {
ruri = r.URL.Opaque
}
}
if stringContainsCTLByte(ruri) {
return errors.New("net/http: can't write control character in Request.URL")
}
var bw *bufio.Writer
if _, ok := w.(io.ByteWriter); !ok {
bw = bufio.NewWriter(w)
w = bw
}
// write HTTP request line
_, err = fmt.Fprintf(w, "%s %s HTTP/1.1\r\n", valueOrDefault(r.Method, "GET"), ruri)
if err != nil {
return err
}
// write HTTP request Host header
_, err = fmt.Fprintf(w, "Host: %s\r\n", host)
if err != nil {
return err
}
if trace != nil && trace.WroteHeaderField != nil {
trace.WroteHeaderField("Host", []string{host})
}

userAgent := defaultUserAgent
if r.Header.has("User-Agent") {
userAgent = r.Header.Get("User-Agent")
}
if userAgent != "" {
// write HTTP request User-Agent header
_, err = fmt.Fprintf(w, "User-Agent: %s\r\n", userAgent)
if err != nil {
return err
}
if trace != nil && trace.WroteHeaderField != nil {
trace.WroteHeaderField("User-Agent", []string{userAgent})
}
}

tw, err := newTransferWriter(r)
if err != nil {
return err
}
err = tw.writeHeader(w, trace)
if err != nil {
return err
}

err = r.Header.writeSubset(w, reqWriteExcludeHeader, trace)
if err != nil {
return err
}

if extraHeaders != nil {
err = extraHeaders.write(w, trace)
if err != nil {
return err
}
}
// write blank line after HTTP request headers
_, err = io.WriteString(w, "\r\n")
if err != nil {
return err
}

if trace != nil && trace.WroteHeaders != nil {
trace.WroteHeaders()
}

if waitForContinue != nil {
if bw, ok := w.(*bufio.Writer); ok {
err = bw.Flush()
if err != nil {
return err
}
}
if trace != nil && trace.Wait100Continue != nil {
trace.Wait100Continue()
}
if !waitForContinue() {
r.closeBody()
return nil
}
}

if bw, ok := w.(*bufio.Writer); ok && tw.FlushHeaders {
if err := bw.Flush(); err != nil {
return err
}
}
err = tw.writeBody(w)
if err != nil {
if tw.bodyReadError == err {
err = requestBodyReadError{err}
}
return err
}

if bw != nil {
return bw.Flush()
}
return nil
}

We will not go through every line of the function above, but you will find plenty of familiar information. It first writes the HTTP request line as the opening of the HTTP message, then continues with HTTP headers such as Host and User-Agent, and finally adds the blank line after the headers. The HTTP request message is built up bit by bit.

Bufio and underlying writer

The next piece of the puzzle is how this relates to the underlying TCP connection.

Note this method call in the write loop:

// write method call in writeLoop
wr.req.Request.write(pc.bw, pc.isProxy, wr.req.extra, pc.waitForContinue(wr.continueCh))

The first parameter is the pc.bw mentioned above, so it's time to take a closer look at it. pc.bw, a bufio.Writer, is created by the following call to the bufio package:

// pconn.bw is created by this method call
pconn.bw = bufio.NewWriterSize(persistConnWriter{pconn}, t.writeBufferSize())

Note that this bufio.Writer isn't built on persistConn directly; instead, a simple wrapper over persistConn called persistConnWriter is used here.

// persistConnWriter in transport.go file
type persistConnWriter struct {
	pc *persistConn
}

What we need to understand is that bufio.Writer wraps an io.Writer object, creating another writer that implements the same interface but adds buffering. bufio.Writer's Flush method writes the buffered data to the underlying io.Writer.

In this case, the underlying io.Writer is persistConnWriter. Its Write method will be used to write the buffered data:

// persistConnWriter in transport.go file
func (w persistConnWriter) Write(p []byte) (n int, err error) {
	n, err = w.pc.conn.Write(p) // TCP socket Write system call is called here!
	w.pc.nwrite += int64(n)
	return
}

Internally, it delegates the task to the TCP connection bound to pconn.conn!

roundTrip

As we mentioned above, writeLoop keeps receiving requests from the writech channel, which means the requests must be sent into this channel somewhere. That is implemented inside the roundTrip method:

// roundTrip in transport.go file
func (pc *persistConn) roundTrip(req *transportRequest) (resp *Response, err error) {
testHookEnterRoundTrip()
if !pc.t.replaceReqCanceler(req.cancelKey, pc.cancelRequest) {
pc.t.putOrCloseIdleConn(pc)
return nil, errRequestCanceled
}
pc.mu.Lock()
pc.numExpectedResponses++
headerFn := pc.mutateHeaderFunc
pc.mu.Unlock()

if headerFn != nil {
headerFn(req.extraHeaders())
}

requestedGzip := false
if !pc.t.DisableCompression &&
req.Header.Get("Accept-Encoding") == "" &&
req.Header.Get("Range") == "" &&
req.Method != "HEAD" {
requestedGzip = true
req.extraHeaders().Set("Accept-Encoding", "gzip")
}

var continueCh chan struct{}
if req.ProtoAtLeast(1, 1) && req.Body != nil && req.expectsContinue() {
continueCh = make(chan struct{}, 1)
}

if pc.t.DisableKeepAlives && !req.wantsClose() {
req.extraHeaders().Set("Connection", "close")
}

gone := make(chan struct{})
defer close(gone)

defer func() {
if err != nil {
pc.t.setReqCanceler(req.cancelKey, nil)
}
}()

const debugRoundTrip = false
startBytesWritten := pc.nwrite
writeErrCh := make(chan error, 1)
// send requet to pc.writech channel
pc.writech <- writeRequest{req, writeErrCh, continueCh}

resc := make(chan responseAndError)
pc.reqch <- requestAndChan{
req: req.Request,
cancelKey: req.cancelKey,
ch: resc,
addedGzip: requestedGzip,
continueCh: continueCh,
callerGone: gone,
}

var respHeaderTimer <-chan time.Time
cancelChan := req.Request.Cancel
ctxDoneChan := req.Context().Done()
for {
testHookWaitResLoop()
select {
case err := <-writeErrCh:
if debugRoundTrip {
req.logf("writeErrCh resv: %T/%#v", err, err)
}
if err != nil {
pc.close(fmt.Errorf("write error: %v", err))
return nil, pc.mapRoundTripError(req, startBytesWritten, err)
}
if d := pc.t.ResponseHeaderTimeout; d > 0 {
if debugRoundTrip {
req.logf("starting timer for %v", d)
}
timer := time.NewTimer(d)
defer timer.Stop()
respHeaderTimer = timer.C
}
case <-pc.closech:
if debugRoundTrip {
req.logf("closech recv: %T %#v", pc.closed, pc.closed)
}
return nil, pc.mapRoundTripError(req, startBytesWritten, pc.closed)
case <-respHeaderTimer:
if debugRoundTrip {
req.logf("timeout waiting for response headers.")
}
pc.close(errTimeout)
return nil, errTimeout
case re := <-resc:
if (re.res == nil) == (re.err == nil) {
panic(fmt.Sprintf("internal error: exactly one of res or err should be set; nil=%v", re.res == nil))
}
if debugRoundTrip {
req.logf("resc recv: %p, %T/%#v", re.res, re.err, re.err)
}
if re.err != nil {
return nil, pc.mapRoundTripError(req, startBytesWritten, re.err)
}
return re.res, nil
case <-cancelChan:
pc.t.cancelRequest(req.cancelKey, errRequestCanceled)
cancelChan = nil
case <-ctxDoneChan:
pc.t.cancelRequest(req.cancelKey, req.Context().Err())
cancelChan = nil
ctxDoneChan = nil
}
}
}

You can see it clearly in the middle of the method, where the writeRequest is sent into pc.writech. In the last article, we saw that pconn.roundTrip is the end of the HTTP request workflow. Now we have put all the parts together.

Summary

In this article (the second part of this series), we reviewed how the HTTP request message is written to the TCP/IP stack through the socket interface.

How HTTP1.1 protocol is implemented in Golang net/http package: part one - request workflow

Background

In this article, I'll write about one topic: how to implement the HTTP protocol. I have been planning to write about it for a long time. In my previous articles, I have already covered the HTTP protocol from several angles:

I recommend reading the articles above before this one.

As you know, the HTTP protocol sits in the application layer, which is the closest one to the end user in the protocol stack.

So, relatively speaking, the HTTP protocol is not as mysterious as the protocols in the lower layers of the stack. Software engineers use HTTP every day and take it for granted. But have you ever thought about how to implement a fully functional HTTP protocol library?

It turns out to be a large and complex piece of software engineering. Frankly speaking, I can't work it out by myself in a short period. So in this article, we'll try to understand how it is done by investigating the Golang net/http package as an example. We'll read a lot of source code and draw diagrams to aid your understanding of it.

Note that the HTTP protocol itself has evolved a lot, from HTTP1.1 to HTTP2 and HTTP3, not to mention HTTPS. In this article, we'll focus on the mechanics of HTTP1.1, but what you learn here will help you understand the newer versions of the protocol.

Note that the HTTP protocol is based on the client-server model. This article focuses on the client side; I'll cover the HTTP server part in another article.

Main workflow of http.Client

An HTTP client request starts with the application's call to the Get function of the net/http package and ends with the HTTP message being written to the TCP socket. The whole workflow can be simplified into the following diagram:

First, the public Get function calls the Get method of DefaultClient, which is a global variable of type Client:

// Get method
func Get(url string) (resp *Response, err error) {
	return DefaultClient.Get(url)
}

// DefaultClient is a global variable in net/http package
var DefaultClient = &Client{}

// struct type Client
type Client struct {
	Transport RoundTripper

	CheckRedirect func(req *Request, via []*Request) error

	Jar CookieJar

	Timeout time.Duration
}

Then, NewRequest method is used to construct a new request of type Request:

func (c *Client) Get(url string) (resp *Response, err error) {
	req, err := NewRequest("GET", url, nil)
	if err != nil {
		return nil, err
	}
	return c.Do(req)
}

func NewRequest(method, url string, body io.Reader) (*Request, error) {
	return NewRequestWithContext(context.Background(), method, url, body)
}

I'll not show the body of NewRequestWithContext here, since it's very long; the block of code that actually builds the Request object goes as follows:

req := &Request{
	// omit some code
	Proto: "HTTP/1.1", // the default HTTP protocol version is set to 1.1
	ProtoMajor: 1,
	ProtoMinor: 1,
	// omit some code
}

Note that by default the HTTP protocol version is set to 1.1. If you want to send an HTTP2 request, you need other solutions, which I'll write about in other articles.

Next, the Do method is called, which delegates the work to the private do method.

func (c *Client) Do(req *Request) (*Response, error) {
	return c.do(req)
}

The do method handles HTTP redirect behavior, which is very interesting, but since the code block is too long I'll not show its body here. You can refer to its source code here.

Next, the send method of Client is called, which goes as follows:

func (c *Client) send(req *Request, deadline time.Time) (resp *Response, didTimeout func() bool, err error) {
	if c.Jar != nil {
		for _, cookie := range c.Jar.Cookies(req.URL) {
			req.AddCookie(cookie)
		}
	}
	// call send method here
	resp, didTimeout, err = send(req, c.transport(), deadline)
	if err != nil {
		return nil, didTimeout, err
	}
	if c.Jar != nil {
		if rc := resp.Cookies(); len(rc) > 0 {
			c.Jar.SetCookies(req.URL, rc)
		}
	}
	return resp, nil, nil
}

It handles the cookies for the request, then calls the private send function with three parameters.

We already talked about the first parameter above. Let’s take a look at the second parameter c.transport() as follows:

func (c *Client) transport() RoundTripper {
	if c.Transport != nil {
		return c.Transport
	}
	return DefaultTransport
}

Transport is extremely important to the HTTP client workflow. Let's examine how it works bit by bit. First of all, it is of type RoundTripper, which is an interface:

// this interface is defined inside client.go file

type RoundTripper interface {
	// RoundTrip executes a single HTTP transaction, returning
	// a Response for the provided Request.
	RoundTrip(*Request) (*Response, error)
}

The RoundTripper interface defines only one method: RoundTrip.

If you don't set anything special, DefaultTransport is used as c.Transport above.

DefaultTransport is defined as follows:

// defined in transport.go
var DefaultTransport RoundTripper = &Transport{
	Proxy: ProxyFromEnvironment,
	DialContext: (&net.Dialer{
		Timeout: 30 * time.Second,
		KeepAlive: 30 * time.Second,
		DualStack: true,
	}).DialContext,
	ForceAttemptHTTP2: true,
	MaxIdleConns: 100,
	IdleConnTimeout: 90 * time.Second,
	TLSHandshakeTimeout: 10 * time.Second,
	ExpectContinueTimeout: 1 * time.Second,
}

Note that its concrete type is Transport, shown below:

type Transport struct {
	idleMu sync.Mutex
	closeIdle bool // user has requested to close all idle conns
	idleConn map[connectMethodKey][]*persistConn // most recently used at end
	idleConnWait map[connectMethodKey]wantConnQueue // waiting getConns
	idleLRU connLRU
	reqMu sync.Mutex
	reqCanceler map[cancelKey]func(error)
	altMu sync.Mutex // guards changing altProto only
	altProto atomic.Value // of nil or map[string]RoundTripper, key is URI scheme
	connsPerHostMu sync.Mutex
	connsPerHost map[connectMethodKey]int
	connsPerHostWait map[connectMethodKey]wantConnQueue // waiting getConns
	Proxy func(*Request) (*url.URL, error)
	DialContext func(ctx context.Context, network, addr string) (net.Conn, error)
	Dial func(network, addr string) (net.Conn, error)
	DialTLSContext func(ctx context.Context, network, addr string) (net.Conn, error)
	DialTLS func(network, addr string) (net.Conn, error)
	TLSClientConfig *tls.Config
	TLSHandshakeTimeout time.Duration
	DisableKeepAlives bool
	DisableCompression bool
	MaxIdleConns int
	MaxIdleConnsPerHost int
	MaxConnsPerHost int
	IdleConnTimeout time.Duration
	ResponseHeaderTimeout time.Duration
	ExpectContinueTimeout time.Duration
	TLSNextProto map[string]func(authority string, c *tls.Conn) RoundTripper
	ProxyConnectHeader Header
	MaxResponseHeaderBytes int64
	WriteBufferSize int
	ReadBufferSize int
	nextProtoOnce sync.Once
	h2transport h2Transport // non-nil if http2 wired up
	tlsNextProtoWasNil bool // whether TLSNextProto was nil when the Once fired
	ForceAttemptHTTP2 bool
}

I list the full Transport struct here; it contains many fields, and many of them will not be discussed in this article.

As we just mentioned, Transport satisfies the RoundTripper interface, so it must implement the method RoundTrip, right?

You can find the RoundTrip method of the Transport struct in the roundtrip.go file:

// RoundTrip method in roundtrip.go
func (t *Transport) RoundTrip(req *Request) (*Response, error) {
	return t.roundTrip(req)
}

At first I expected this method to live inside the transport.go file, but it is defined in a separate file.

Let’s get back to the send method, which takes c.transport() as its second argument:

// send method in client.go
func send(ireq *Request, rt RoundTripper, deadline time.Time) (resp *Response, didTimeout func() bool, err error) {
	req := ireq // req is either the original request, or a modified fork

	if rt == nil {
		req.closeBody()
		return nil, alwaysFalse, errors.New("http: no Client.Transport or DefaultTransport")
	}

	if req.URL == nil {
		req.closeBody()
		return nil, alwaysFalse, errors.New("http: nil Request.URL")
	}

	if req.RequestURI != "" {
		req.closeBody()
		return nil, alwaysFalse, errors.New("http: Request.RequestURI can't be set in client requests")
	}

	// forkReq forks req into a shallow clone of ireq the first
	// time it's called.
	forkReq := func() {
		if ireq == req {
			req = new(Request)
			*req = *ireq // shallow clone
		}
	}

	// Most the callers of send (Get, Post, et al) don't need
	// Headers, leaving it uninitialized. We guarantee to the
	// Transport that this has been initialized, though.
	if req.Header == nil {
		forkReq()
		req.Header = make(Header)
	}

	if u := req.URL.User; u != nil && req.Header.Get("Authorization") == "" {
		username := u.Username()
		password, _ := u.Password()
		forkReq()
		req.Header = cloneOrMakeHeader(ireq.Header)
		req.Header.Set("Authorization", "Basic "+basicAuth(username, password))
	}

	if !deadline.IsZero() {
		forkReq()
	}
	stopTimer, didTimeout := setRequestCancel(req, rt, deadline)

	resp, err = rt.RoundTrip(req)
	if err != nil {
		stopTimer()
		if resp != nil {
			log.Printf("RoundTripper returned a response & error; ignoring response")
		}
		if tlsErr, ok := err.(tls.RecordHeaderError); ok {
			// If we get a bad TLS record header, check to see if the
			// response looks like HTTP and give a more helpful error.
			// See golang.org/issue/11111.
			if string(tlsErr.RecordHeader[:]) == "HTTP/" {
				err = errors.New("http: server gave HTTP response to HTTPS client")
			}
		}
		return nil, didTimeout, err
	}
	if resp == nil {
		return nil, didTimeout, fmt.Errorf("http: RoundTripper implementation (%T) returned a nil *Response with a nil error", rt)
	}
	if resp.Body == nil {
		// The documentation on the Body field says “The http Client and Transport
		// guarantee that Body is always non-nil, even on responses without a body
		// or responses with a zero-length body.” Unfortunately, we didn't document
		// that same constraint for arbitrary RoundTripper implementations, and
		// RoundTripper implementations in the wild (mostly in tests) assume that
		// they can use a nil Body to mean an empty one (similar to Request.Body).
		// (See https://golang.org/issue/38095.)
		//
		// If the ContentLength allows the Body to be empty, fill in an empty one
		// here to ensure that it is non-nil.
		if resp.ContentLength > 0 && req.Method != "HEAD" {
			return nil, didTimeout, fmt.Errorf("http: RoundTripper implementation (%T) returned a *Response with content length %d but a nil Body", rt, resp.ContentLength)
		}
		resp.Body = ioutil.NopCloser(strings.NewReader(""))
	}
	if !deadline.IsZero() {
		resp.Body = &cancelTimerBody{
			stop:          stopTimer,
			rc:            resp.Body,
			reqDidTimeout: didTimeout,
		}
	}
	return resp, nil, nil
}

The key call in the send method above is:

resp, err = rt.RoundTrip(req)

The RoundTrip method is called to send the request. Based on the comments in the source code, you can understand it in the following way:

  • RoundTripper is an interface representing the ability to execute a single HTTP transaction, obtaining the Response for a given Request.

Next, let’s go to roundTrip method of Transport:

// roundTrip method in transport.go, which is called by the RoundTrip method internally

// roundTrip implements a RoundTripper over HTTP.
func (t *Transport) roundTrip(req *Request) (*Response, error) {
	t.nextProtoOnce.Do(t.onceSetNextProtoDefaults)
	ctx := req.Context()
	trace := httptrace.ContextClientTrace(ctx)

	if req.URL == nil {
		req.closeBody()
		return nil, errors.New("http: nil Request.URL")
	}
	if req.Header == nil {
		req.closeBody()
		return nil, errors.New("http: nil Request.Header")
	}
	scheme := req.URL.Scheme
	isHTTP := scheme == "http" || scheme == "https"
	if isHTTP {
		for k, vv := range req.Header {
			if !httpguts.ValidHeaderFieldName(k) {
				req.closeBody()
				return nil, fmt.Errorf("net/http: invalid header field name %q", k)
			}
			for _, v := range vv {
				if !httpguts.ValidHeaderFieldValue(v) {
					req.closeBody()
					return nil, fmt.Errorf("net/http: invalid header field value %q for key %v", v, k)
				}
			}
		}
	}

	origReq := req
	cancelKey := cancelKey{origReq}
	req = setupRewindBody(req)

	if altRT := t.alternateRoundTripper(req); altRT != nil {
		if resp, err := altRT.RoundTrip(req); err != ErrSkipAltProtocol {
			return resp, err
		}
		var err error
		req, err = rewindBody(req)
		if err != nil {
			return nil, err
		}
	}
	if !isHTTP {
		req.closeBody()
		return nil, badStringError("unsupported protocol scheme", scheme)
	}
	if req.Method != "" && !validMethod(req.Method) {
		req.closeBody()
		return nil, fmt.Errorf("net/http: invalid method %q", req.Method)
	}
	if req.URL.Host == "" {
		req.closeBody()
		return nil, errors.New("http: no Host in request URL")
	}

	for {
		select {
		case <-ctx.Done():
			req.closeBody()
			return nil, ctx.Err()
		default:
		}

		// treq gets modified by roundTrip, so we need to recreate for each retry.
		treq := &transportRequest{Request: req, trace: trace, cancelKey: cancelKey}
		cm, err := t.connectMethodForRequest(treq)
		if err != nil {
			req.closeBody()
			return nil, err
		}

		// Get the cached or newly-created connection to either the
		// host (for http or https), the http proxy, or the http proxy
		// pre-CONNECTed to https server. In any case, we'll be ready
		// to send it requests.
		pconn, err := t.getConn(treq, cm)
		if err != nil {
			t.setReqCanceler(cancelKey, nil)
			req.closeBody()
			return nil, err
		}

		var resp *Response
		if pconn.alt != nil {
			// HTTP/2 path.
			t.setReqCanceler(cancelKey, nil) // not cancelable with CancelRequest
			resp, err = pconn.alt.RoundTrip(req)
		} else {
			resp, err = pconn.roundTrip(treq)
		}
		if err == nil {
			resp.Request = origReq
			return resp, nil
		}

		// Failed. Clean up and determine whether to retry.
		if http2isNoCachedConnError(err) {
			if t.removeIdleConn(pconn) {
				t.decConnsPerHost(pconn.cacheKey)
			}
		} else if !pconn.shouldRetryRequest(req, err) {
			// Issue 16465: return underlying net.Conn.Read error from peek,
			// as we've historically done.
			if e, ok := err.(transportReadFromServerError); ok {
				err = e.err
			}
			return nil, err
		}
		testHookRoundTripRetried()

		// Rewind the body if we're able to.
		req, err = rewindBody(req)
		if err != nil {
			return nil, err
		}
	}
}

There are three key points:

  • First, a new variable treq of type transportRequest, which embeds Request, is created; it gets modified by roundTrip, so it is recreated for each retry.
  • Then the getConn method is called. It implements the cached connection pool that supports the persistent connection mode. Of course, if no cached connection is available, a new connection will be created and added to the pool. I will explain this behavior in detail in the next section.
  • Finally, pconn.roundTrip is called (or pconn.alt.RoundTrip on the HTTP/2 path). The name of the variable pconn is self-explanatory: it is of type persistConn.

The transportRequest is passed as a parameter to the getConn method, which returns pconn, and pconn.roundTrip is called to execute the HTTP request. We have now covered all the steps in the workflow diagram above.

Summary

In this first article of the series, we walked through the workflow of sending an HTTP request step by step. I’ll discuss how the HTTP message is handed to the TCP stack in the second article.

Understand how HTTP/1.1 persistent connection works based on Golang: part two - concurrent requests

Background

In the last post, I showed you how HTTP/1.1 persistent connection works in a simple demo app that sends sequential requests.

We observed the underlying TCP connection behavior with the network analysis tools netstat and tcpdump.

In this article, I will modify the demo app to make it send concurrent requests. This way, we can gain a better understanding of HTTP/1.1’s persistent connection.

Concurrent requests

The demo code goes as follows:

package main

import (
	"fmt"
	"io"
	"io/ioutil"
	"log"
	"net/http"
	"sync"
	"time"
)

func startHTTPserver() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(time.Duration(50) * time.Microsecond)
		fmt.Fprintf(w, "Hello world")
	})

	go func() {
		http.ListenAndServe(":8080", nil)
	}()
}

func startHTTPRequest(index int, wg *sync.WaitGroup) {
	counter := 0
	for i := 0; i < 10; i++ {
		resp, err := http.Get("http://localhost:8080/")
		if err != nil {
			panic(fmt.Sprintf("Error: %v", err))
		}
		io.Copy(ioutil.Discard, resp.Body) // fully read the response body
		resp.Body.Close()                  // close the response body
		log.Printf("HTTP request #%v in Goroutine #%v", counter, index)
		counter += 1
		time.Sleep(time.Duration(1) * time.Second)
	}
	wg.Done()
}

func main() {
	startHTTPserver()
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go startHTTPRequest(i, &wg)
	}
	wg.Wait()
}

We create 10 goroutines that run concurrently, and each goroutine sends 10 sequential requests.

Note: In the HTTP/1.1 protocol, concurrent requests will establish multiple TCP connections. That’s a restriction of HTTP/1.1; the way around it is HTTP/2, which can multiplex multiple parallel HTTP requests over one TCP connection. HTTP/2 is not in the scope of this post. I will talk about it in another article.

Note that in the above demo, we fully read the response body and close it, so based on the discussion in the last article, the HTTP requests should work in the persistent connection model.

Before we use the network tool to analyze the behavior, let’s imagine how many TCP connections will be established. As there are 10 concurrent goroutines, 10 TCP connections should be established, and all the HTTP requests should re-use these 10 TCP connections, right? That’s our expectation.

Next, let’s verify our expectation with netstat as follows:

It shows that the number of TCP connections is much larger than 10. The persistent connection does not work as we expected.

After reading the source code of the net/http package, I found the following hints:

Client, defined inside client.go, is the type for the HTTP client, and Transport is one of its fields.

type Client struct {
	Transport RoundTripper

	CheckRedirect func(req *Request, via []*Request) error

	Jar CookieJar

	Timeout time.Duration
}

Transport is defined in transport.go like this:

// DefaultTransport is the default implementation of Transport and is
// used by DefaultClient. It establishes network connections as needed
// and caches them for reuse by subsequent calls. It uses HTTP proxies
// as directed by the $HTTP_PROXY and $NO_PROXY (or $http_proxy and
// $no_proxy) environment variables.
var DefaultTransport RoundTripper = &Transport{
	Proxy: ProxyFromEnvironment,
	DialContext: (&net.Dialer{
		Timeout:   30 * time.Second,
		KeepAlive: 30 * time.Second,
	}).DialContext,
	ForceAttemptHTTP2:     true,
	MaxIdleConns:          100,
	IdleConnTimeout:       90 * time.Second,
	TLSHandshakeTimeout:   10 * time.Second,
	ExpectContinueTimeout: 1 * time.Second,
}

// DefaultMaxIdleConnsPerHost is the default value of Transport's
// MaxIdleConnsPerHost.
const DefaultMaxIdleConnsPerHost = 2

Transport is of type RoundTripper, an interface representing the ability to execute a single HTTP transaction, obtaining the Response for a given Request. RoundTripper is a very important structure in the net/http package; we’ll review and analyze its source code in the next article, so we won’t discuss the details here.

Note that two fields of Transport are relevant here:

  • MaxIdleConns: controls the maximum number of idle (keep-alive) connections across all hosts.
  • MaxIdleConnsPerHost: controls the maximum idle (keep-alive) connections to keep per-host. If zero, DefaultMaxIdleConnsPerHost is used.

By default, MaxIdleConns is 100 and MaxIdleConnsPerHost is 2.

In our demo case, ten goroutines send requests to the same host (localhost:8080). Although MaxIdleConns is 100, only 2 idle connections can be cached for this host because MaxIdleConnsPerHost is 2. That’s why you see many more TCP connections being established.

Based on this analysis, let’s refactor the code as follows:

package main

import (
	"fmt"
	"io"
	"io/ioutil"
	"log"
	"net/http"
	"sync"
	"time"
)

var (
	httpClient *http.Client
)

func init() {
	httpClient = &http.Client{
		Transport: &http.Transport{
			MaxIdleConnsPerHost: 10, // set connection pool size for each host
			MaxIdleConns:        100,
		},
	}
}

func startHTTPserver() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(time.Duration(50) * time.Microsecond)
		fmt.Fprintf(w, "Hello world")
	})

	go func() {
		http.ListenAndServe(":8080", nil)
	}()
}

func startHTTPRequest(index int, wg *sync.WaitGroup) {
	counter := 0
	for i := 0; i < 10; i++ {
		resp, err := httpClient.Get("http://localhost:8080/")
		if err != nil {
			panic(fmt.Sprintf("Error: %v", err))
		}
		io.Copy(ioutil.Discard, resp.Body) // fully read the response body
		resp.Body.Close()                  // close the response body
		log.Printf("HTTP request #%v in Goroutine #%v", counter, index)
		counter += 1
		time.Sleep(time.Duration(1) * time.Second)
	}
	wg.Done()
}

func main() {
	startHTTPserver()
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go startHTTPRequest(i, &wg)
	}
	wg.Wait()
}

This time we don’t use the default HTTP client; instead, we create a customized client that sets MaxIdleConnsPerHost to 10. This changes the size of the connection pool, which can now cache 10 idle TCP connections per host.

Verify the behavior with netstat again:

Now the result is what we expect.

Summary

In this article, we discussed how to make HTTP/1.1 persistent connection work in the concurrent case by tuning the parameters of the connection pool. In the next article, let’s review the source code to study how the HTTP client is implemented.

Understand how HTTP/1.1 persistent connection works based on Golang: part one - sequential requests

Background

Initially, HTTP was a single request-and-response model. An HTTP client opens the TCP connection, requests a resource, gets the response, and the connection is closed. Establishing and terminating each TCP connection is a resource-consuming operation (for details, you can refer to my previous article). As web applications become more and more complex, displaying a single page may require several HTTP requests, and too many TCP connection operations have a bad impact on performance.

So the persistent-connection (also called keep-alive) model was created in the HTTP/1.1 protocol. In this model, TCP connections stay open between several successive requests, and in this way, the time needed to open new connections is reduced.

In this article, I will show you how persistent connection works based on a Golang application. We will do some experiments with the demo app and verify the TCP connection behavior with some popular network packet analysis tools. In short, after reading this article, you will learn:

  • Golang http.Client usage (and a little bit source code analysis)
  • network analysis with netstat and tcpdump

You can find the demo Golang application in this Github repo.

Sequential requests

Let’s start from the simple case where the client keeps sending sequential requests to the server. The code goes as follows:

package main

import (
	"fmt"
	"log"
	"net/http"
	"time"
)

func startHTTPserver() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(time.Duration(50) * time.Microsecond)
		fmt.Fprintf(w, "Hello world")
	})

	go func() {
		http.ListenAndServe(":8080", nil)
	}()
}

func startHTTPRequest() {
	counter := 0
	for i := 0; i < 10; i++ {
		_, err := http.Get("http://localhost:8080/")
		if err != nil {
			panic(fmt.Sprintf("Error: %v", err))
		}
		log.Printf("HTTP request #%v", counter)
		counter += 1
		time.Sleep(time.Duration(1) * time.Second)
	}
}

func main() {
	startHTTPserver()

	startHTTPRequest()
}

We start an HTTP server in a goroutine and keep sending ten sequential requests to it. Let’s run the application and check the number and status of the TCP connections.

After running the above code, you can see the following output:

When the application stops running, we can run the following netstat command:

netstat -n  | grep 8080

The TCP connections are listed as follows:

Obviously, the 10 HTTP requests are not persistent since 10 TCP connections are opened.

Note: the last column of the netstat output shows the state of each TCP connection. The TCP connection termination process can be explained with the following image:

I will not cover the details in this article, but we need to understand the meaning of TIME-WAIT.

In the four-way termination handshake, the client sends the final ACK packet to terminate the connection, but the TCP state can’t immediately go to CLOSED. The client has to wait for some time, and the state during this waiting period is called TIME-WAIT. The TCP connection needs this TIME-WAIT state for two main reasons.

  • The first is to provide enough time to ensure the final ACK is received by the other peer.
  • The second is to provide a buffer period between the end of the current connection and any subsequent ones. Without this period, packets from different connections could be mixed. For details, you can refer to this book.

In our demo application’s case, if you wait for a while after the program stops and run the netstat command again, no TCP connections will be listed in the output since they’re all closed.

Another tool to verify the TCP connections is tcpdump, which can capture every network packet sent to your machine. In our case, you can run the following tcpdump command:

sudo tcpdump -i any -n host localhost

It will capture all the network packets sent from or to localhost (we’re running the server on localhost, right?). tcpdump is a great tool to help you understand networking; you can refer to its documentation for more help.

Note: in our demo code above, we send 10 HTTP requests in sequence, which makes the tcpdump capture too long to read. So I modified the for loop to send only 2 sequential requests, which is enough to verify the behavior of persistent connection. The result goes as follows:

In tcpdump output, the Flag [S] represents the SYN flag, which is used to establish a TCP connection. The above snapshot contains two Flag [S] packets. The first Flag [S] is triggered by the first HTTP call, and the following packets are the HTTP request and response. Then you can see a second Flag [S] packet opening a new TCP connection, which means the second HTTP request did not reuse a persistent connection as we hoped.

Next step, let’s see how to make HTTP work as a persistent connection in Golang.

In fact, this is a well-known issue in the Golang ecosystem; you can find the information in the official documentation:

  • If the returned error is nil, the Response will contain a non-nil Body which the user is expected to close. If the Body is not both read to EOF and closed, the Client’s underlying RoundTripper (typically Transport) may not be able to re-use a persistent TCP connection to the server for a subsequent “keep-alive” request.

The fix is straightforward: just add two more lines of code as follows:

package main

import (
	"fmt"
	"io"
	"io/ioutil"
	"log"
	"net/http"
	"time"
)

func startHTTPServer() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(time.Duration(50) * time.Microsecond)
		fmt.Fprintf(w, "Hello world")
	})

	go func() {
		http.ListenAndServe(":8080", nil)
	}()
}

func startHTTPRequest() {
	counter := 0
	for i := 0; i < 10; i++ {
		resp, err := http.Get("http://localhost:8080/")
		if err != nil {
			panic(fmt.Sprintf("Error: %v", err))
		}
		io.Copy(ioutil.Discard, resp.Body) // read the response body
		resp.Body.Close()                  // close the response body
		log.Printf("HTTP request #%v", counter)
		counter += 1
		time.Sleep(time.Duration(1) * time.Second)
	}
}

func main() {
	startHTTPServer()

	startHTTPRequest()
}

Let’s verify by running the netstat command; the result goes as follows:

This time, the 10 sequential HTTP requests establish only one TCP connection. This behavior is just what we hoped for: persistent connection.

We can double verify it by doing the same experiment as above: run two HTTP requests in sequence and capture packets with tcpdump:

This time, only one Flag [S] packet is there! The two sequential HTTP requests reuse the same underlying TCP connection.
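Besides tcpdump, Go itself can tell us whether a connection was reused, via the net/http/httptrace package: the GotConn callback receives a GotConnInfo whose Reused field is true when the request rides on an existing connection. A small sketch (secondRequestReused is my own helper; it uses a throwaway httptest server instead of our :8080 demo server):

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httptrace"
	"time"
)

// secondRequestReused sends two sequential GETs, draining and closing
// each body, and reports whether the second one reused the first
// request's TCP connection according to httptrace.
func secondRequestReused() bool {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "Hello world")
	}))
	defer srv.Close()

	client := &http.Client{Transport: &http.Transport{}} // fresh pool for this experiment

	var reused bool
	trace := &httptrace.ClientTrace{
		GotConn: func(info httptrace.GotConnInfo) { reused = info.Reused },
	}

	for i := 0; i < 2; i++ {
		req, _ := http.NewRequest("GET", srv.URL, nil)
		req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))
		resp, err := client.Do(req)
		if err != nil {
			panic(err)
		}
		io.Copy(io.Discard, resp.Body) // fully read the body...
		resp.Body.Close()              // ...and close it, so the conn is pooled
		time.Sleep(20 * time.Millisecond)
	}
	return reused // value recorded for the second request
}

func main() {
	fmt.Println("second request reused connection:", secondRequestReused())
}
```

If you delete the io.Copy and Close lines, the second request dials a new connection and Reused comes back false, matching what we saw with tcpdump earlier.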

Summary

In this article, we showed how HTTP persistent connection works in the case of sequential requests. In the next article, we’ll look at the case of concurrent requests.