Wednesday, October 31, 2012

Does Tomcat bite more than it can chew?

This is an interesting blog post for me, since its about the root cause for an issue we saw with Tomcat back in May 2011, which remained unresolved. Under high concurrency and load, Tomcat would reset (i.e. TCP level RST) client connections, without refusing to accept them - as one would expect. I posted this again to the Tomcat user list a few days back, but then wanted to find out the root cause for myself, since it would surely come up again in the future.

Background

This issue initially became evident when we ran high concurrency load tests at a customer location in Europe, where the customer had back-end services deployed on multiple Tomcat instances, and wanted to use the UltraESB for routing messages with load balancing and fail-over. For the ESB Performance Benchmark, we had been using an EchoService written over the Apache HttpComponents/Core NIO library that scaled extremely well and behaved well at the TCP level, even under load. However, at the client site, they wanted the test run against real services deployed on Tomcat - to analyse a more realistic scenario. We used a Java based clone of ApacheBench called the 'Java Bench' which is also a part of the Apache HttpComponents project, to generate load. The client would go up-to concurrency levels of 2560, pushing as many messages as possible through the ESB, to back end services deployed over Tomcat.

Under high load, the ESB would start to see errors while talking to Tomcat, and the cause would be IO errors such as "Connection reset by peer". Now the problem to the ESB is that it had already started to send out an HTTP request / payload over an accepted TCP connection, and thus it does not know if it can fail-over safely by default to another node, since the backend service might have  performed some processing over the request it may have already received. Of-course, the ESB could be configured to retry on such errors as well, but our default behaviour was to fail-over only on the safer connection refused or connect timeout errors (i.e. a connection could not be established within the allocated time) - which ensures correct operation, even for non-idempotent services.

Recent Observations

We recently experienced the same issue with Tomcat when a customer wanted to perform a load test scenario where a back-end service would block for 1-5 seconds randomly, to simulate realistic behaviour. Here, again we saw that Tomcat was resetting accepted TCP connections, and we were able to capture this with Wireshark as follows, using JavaBench directly against a Tomcat based servlet


As can be seen in the trace, the client initiated a TCP connection with the source port 9386, and Tomcat running on port 9000 accepted the connection - note “1”. The client kept sending packets of a 100K request, and Tomcat kept acknowledging them. The last such case is annotated with note “2”. Note that the request payload was not complete at this time from the client – note “3”. Suddenly, Tomcat resets the connection – note “4”

Understanding the root cause

After failing to locate any code in the Tomcat source code that resets established connections, I wanted to simulate the behaviour with a very simple Java program. Luckily the problem was easy to reproduce with a simple program as follows:

import java.net.ServerSocket;
import java.net.Socket;

public class TestAccept1 {

    public static void main(String[] args) throws Exception {
        ServerSocket serverSocket = new ServerSocket(8280, 0);
        Socket socket = serverSocket.accept();
        Thread.sleep(3000000); // do nothing
    }
}


We just open a server socket on port 8280, with a backlog of 0 and start listening for connections. Since the backlog is 0, one could assume that only one client connection would be allowed - BUT, I could open more than that via telnet as follows, and even send some data afterwards by typing it in and pressing the enter key.

telnet localhost 8280
hello world

A netstat command now confirms that more than one connection is opene:

netstat -na | grep 8280
tcp        0      0 127.0.0.1:34629         127.0.0.1:8280          ESTABLISHED
tcp        0      0 127.0.0.1:34630         127.0.0.1:8280          ESTABLISHED
tcp6       0      0 :::8280                 :::*                    LISTEN    
tcp6      13      0 127.0.0.1:8280          127.0.0.1:34630         ESTABLISHED
tcp6      13      0 127.0.0.1:8280          127.0.0.1:34629         ESTABLISHED

However, the Java program has only accepted ONE socket, although at the OS level, two would appear. It seems like the OS also allows more than two connections to be opened, even when the backlog is specified as 0. On Ubuntu 12.04 x64, the netstat command would not show me the actual listen queue length - but I believe it was not 0. However, before and after this test, I did not see a difference in the reported statistics for "listen queue" overflow, which I could see with the "netstat -sp tcp | fgrep listen" command

Next I used the JavaBench from the SOA ToolBox and issued a small payload at concurrency 1024, with a single iteration against the same port 8280


As expected, all requests failed, but my Wireshark trace on port 8280 did not detect any connection resets. Pushing the concurrency to 2560 and the iterations to 10 started to show tcp level RSTs - which were similar to those seen on Tomcat, though not exactly the same.

 

Can Tomcat do better?

Yes, Possibly .. What an end user would expect from Tomcat is that it refuses to accept new connections when under load, and not to accept connections and then reset them halfway through. But one would ask if that is achievable? Especially considering the behaviour seen with the simple Java example we discussed.

Well, the solution could be to perform better handling of the low level HTTP connections and the sockets, and this is already done by the free and open source high performance Enterprise Service Bus UltraESB, which utilizes the excellent Apache HttpComponents project underneath.

How does the UltraESB behave

One could easily test this by using the 'stopNewConnectionsAt' property of our NIO listener. If you set it to 2, you wont be able to even open a Telnet session to the socket beyond 2.

The first would work, the second too
But the third would see a "Connection refused"
And the UltraESB would report the following on its logs:

  INFO HttpNIOListener HTTP Listener http-8280 paused  
  WARN HttpNIOListener$EventLogger Enter maintenance mode as open connections reached : 2

Although it refuses to accept new connections, already accepted connections executes without any hindrance to completion. Thus a hardware level load balancer in front of an UltraESB cluster can safely load balance if an UltraESB node is loaded beyond its configured limits, without having to deal with any connection resets. Once a connection slot becomes free, the UltraESB will start accepting new connections as applicable.

Analysing a corresponding TCP dump

To analyse the corresponding behaviour, we wrote a simple Echo proxy service on the UltraESB, that also slept for 1 to 5 seconds before it replied, and tested this with the same JavaBench under 2560 concurrent users, each trying to push 10 messages in iteration.

Out of the 25600 requests, 7 completed successfully, while 25593 failed, as expected. We also saw many tcp level RSTs on the Wireshark dump - which must have been issued by the underlying operating system.


However, what's interesting to note is the difference - the RSTs occur immediately on receiving the SYN packet from the client - and are not established HTTP or TCP connections, but elegant "Connection Refused" errors - which would be what the client can expect. Thus the client can safely fail-over to another node without any doubt, overhead or delay.

Appendix : Supporting high concurrency in general

During testing we also saw that the Linux OS could detect the opening of many concurrent connections at the same time as a SYN flood attack, and then start using SYN cookies. You would see messages such as 
-->
Possible SYN flooding on port 9000. Sending cookies

displayed on the output of a "sudo dmesg", if this happens. Hence, for a real load, it would be better to disable SYN cookies by turning it off as follows as the root user
-->
# echo 0 > /proc/sys/net/ipv4/tcp_syncookies

To make the change persist over reboots, add the following line to your /etc/sysctl.conf
-->
net.ipv4.tcp_syncookies = 0

To allow the Linux OS to accept more connections, its also recommended that the 'net.core.somaxconn' be increased - as it usually defaults to 128 or so. This could be performed by the root user as follows,
-->
# echo 1024 > /proc/sys/net/core/somaxconn

To persist the change, append the following to the /etc/sysctl.conf
-->
net.core.somaxconn = 1024


Kudos!

The UltraESB could not have behaved gracefully without the support of the underlying Apache HttpComponents library, and the help and support received from that project community, especially by Oleg Kalnichevski - whose code and help has always fascinated me!



2 comments:

Peter Hofman said...

Nice and clear article!

Andreas Veithen said...

The behavior you saw with the TestAccept1 program can actually be explained if you look at how TCP backlog is implemented in Linux. First of all, the check to see if the accept queue is full is implemented in an inline function called sk_acceptq_is_full in include/net/sock.h. It uses the following condition:

sk->sk_ack_backlog > sk->sk_max_ack_backlog

Since the comparison operator used here is ">" and not ">=", the effective size of the accept queue is actually backlog+1. This explains why you see up to 2 connections in state ESTABLISHED on the server side (one that the code in TestAccept1 accepted and one that is still in the accept queue).

The fact that the size of the accept queue is backlog+1 and not backlog may be a bug, but don't forget that anyway the size of the accept queue can never be 0. The reason is that a connection must go through the accept queue before it can be accepted by the application.

You also mention that "It seems like the OS also allows more than two connections to be opened, even when the backlog is specified as 0". That is actually true for Linux, but the connection will only appear as ESTABLISHED to the client. The server side socket will remain in state SYN_RCVD. The reason for this is explained in the following blog post:

http://veithen.blogspot.be/2014/01/how-tcp-backlog-works-in-linux.html

You can see the trace shown in my post is almost identical to the one in your third screenshot.

PS: Note that this doesn't explain your observations about Tomcat.