Saturday, February 23, 2013

What drives talent out?


A few days back I saw the famous presentation from Netflix about its culture, and three interesting slides in it were about what drives talent out of a company.


Today I came across a poster created by හිච්චි කොලුවා (Hichchi Koluwa), which shows another aspect from a more Sri Lankan point of view, and calls it the 'bitter truth' (තිත්ත ඇත්ත). Interesting points to think about!



PS: If you aim for the moon, believe that you WILL get there. If a plane's limit is the clouds, get onto a rocket. Do not deviate from achieving your goal!

Monday, November 5, 2012

How to stop biting, when you can't chew more..

This is a follow-up to my earlier post 'Does Tomcat bite more than it can chew?', and illustrates a pure Java program that uses Java NIO to stop accepting new connections when it is not able to handle the load, without depending on the TCP backlog.

Program Implementation

We open a selector, and invoke the startListening() method, that opens a ServerSocketChannel and then binds it to port 8280. The channel is configured as non-blocking, and finally we register our interest in OP_ACCEPT to handle incoming connections.

    private void startListening() throws IOException {
        server = ServerSocketChannel.open();
        server.socket().bind(new InetSocketAddress(8280), 0);
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
        System.out.println("\nI am ready to listen for new messages now..");
    }


If you telnet to the port from a command line after the server starts up, you will see the message "Hi there! type a word". The server accepts the incoming connection as a non-blocking connection, and registers OP_READ to read the text typed in.


To illustrate how the server can prevent other clients from connecting while it serves the currently connected one, it prints "I accepted this one.. but not any more now" on its console, cancels the server socket's SelectionKey and closes the ServerSocketChannel.

A new telnet session will see the "Connection refused" error as expected.
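To check this from code instead of telnet, a client can tell a refused connection apart from other failures: with plain java.net sockets, a refused connect typically surfaces as a java.net.ConnectException. The sketch below is a hypothetical probe (the class, enum and method names are mine, not part of the original program) and assumes the server from this post is listening on localhost:8280 unless another port is given.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ConnectProbe {

    /** Possible outcomes of a single connection attempt. */
    public enum Outcome { ACCEPTED, REFUSED, OTHER_ERROR }

    /** Tries a plain TCP connect and classifies the result. */
    public static Outcome probe(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            // The OS completed the handshake; this does not prove the
            // application has called accept() yet.
            return Outcome.ACCEPTED;
        } catch (ConnectException e) {
            // Typically "Connection refused": nothing is listening on the port
            return Outcome.REFUSED;
        } catch (IOException e) {
            // e.g. a connect timeout
            return Outcome.OTHER_ERROR;
        }
    }

    public static void main(String[] args) {
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 8280;
        System.out.println("localhost:" + port + " -> " + probe("localhost", port, 2000));
    }
}
```

Run while the first telnet session is still connected, it should report REFUSED; after the word has been typed and the listener re-opened, ACCEPTED.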

Next, I type a short word into the first telnet session; the server prints it on its console and closes the connection.

At the same time, it prints the message "I am ready to listen for new messages now.." on its console, and invokes the above startListening() method again - making it ready to listen and accept a new client.

We retry the connection from the same command prompt that received the 'Connection refused' error earlier, and, as expected, are greeted with the welcome message again.
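Since the program closes and re-binds port 8280 after every client, it may be worth setting SO_REUSEADDR explicitly before binding: the java.net.ServerSocket Javadoc says its initial setting is not defined, and without it a re-bind can fail with a BindException while a just-closed connection on that port is still in TIME_WAIT. A minimal sketch of an alternative startListening(), assuming Java 7 or later for ServerSocketChannel.bind() and setOption(); the class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.StandardSocketOptions;
import java.nio.channels.ServerSocketChannel;

public class Relisten {

    /** Opens a non-blocking listener, asking for SO_REUSEADDR before bind(). */
    public static ServerSocketChannel openListener(int port, int backlog) throws IOException {
        ServerSocketChannel server = ServerSocketChannel.open();
        // Must be set before bind() to have any effect on the bind itself
        server.setOption(StandardSocketOptions.SO_REUSEADDR, true);
        server.bind(new InetSocketAddress(port), backlog);
        server.configureBlocking(false);
        return server;
    }

    public static void main(String[] args) throws IOException {
        // Close and re-open the listener a few times, as TestAccept2 does
        // each time it finishes serving a client.
        for (int round = 1; round <= 3; round++) {
            ServerSocketChannel server = openListener(8280, 0);
            System.out.println("round " + round + ": listening on " + server.getLocalAddress());
            server.close();
        }
    }
}
```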

What did we learn?

A low-level NIO server can stop accepting new connections if it determines that it's not able to serve new clients. Once it stops accepting connections this way, any client connection attempt will see a 'Connection refused' error. This may not be the case if the server were implemented differently (see my last article, its example, and how Tomcat behaves under load).

Complete source code

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class TestAccept2 {

    private ServerSocketChannel server = null;
    private Selector selector = null;

    public static void main(String[] args) throws Exception {
        new TestAccept2().run();
    }

    private void run() throws Exception {
        selector = Selector.open();
        startListening();

        while (true) {
            selector.select();

            for (Iterator<SelectionKey> i = selector.selectedKeys().iterator(); i.hasNext(); ) {
                SelectionKey key = i.next();
                i.remove();
                if (key.isAcceptable()) {
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.socket().setTcpNoDelay(true);
                    client.register(selector, SelectionKey.OP_READ);

                    System.out.println("I accepted this one.. but not any more now");
                    key.cancel();
                    key.channel().close();
                    sayHello(client);

                } else if (key.isReadable()) {
                    readDataFromSocket(key);
                }
            }
        }
    }

    private void startListening() throws IOException {
        server = ServerSocketChannel.open();
        server.socket().bind(new InetSocketAddress(8280), 0);
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
        System.out.println("\nI am ready to listen for new messages now..");
    }

    private void sayHello(SocketChannel channel) throws Exception {
        channel.write(ByteBuffer.wrap("Hi there! type a word\r\n".getBytes()));
    }

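    private void readDataFromSocket(SelectionKey key) throws Exception {
        SocketChannel socketChannel = (SocketChannel) key.channel();
        ByteBuffer buffer = ByteBuffer.allocate(32);
        int bytesRead = socketChannel.read(buffer);
        if (bytesRead > 0) {
            // print the word typed into the telnet session
            buffer.flip();
            byte[] bytearr = new byte[buffer.remaining()];
            buffer.get(bytearr);
            System.out.print(new String(bytearr));
        }
        if (bytesRead != 0) {
            // done with this client (or it disconnected, bytesRead == -1):
            // close it and start accepting the next one
            socketChannel.close();
            startListening();
        }
    }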

}


Wednesday, October 31, 2012

Does Tomcat bite more than it can chew?

This is an interesting blog post for me, since it's about the root cause of an issue we saw with Tomcat back in May 2011, which remained unresolved. Under high concurrency and load, Tomcat would accept client connections and then reset them (i.e. with a TCP-level RST), instead of refusing to accept them as one would expect. I posted this again to the Tomcat user list a few days back, but then wanted to find the root cause myself, since it would surely come up again in the future.

Background

This issue initially became evident when we ran high-concurrency load tests at a customer location in Europe, where the customer had back-end services deployed on multiple Tomcat instances and wanted to use the UltraESB to route messages with load balancing and fail-over. For the ESB Performance Benchmark, we had been using an EchoService written over the Apache HttpComponents/Core NIO library, which scaled extremely well and behaved well at the TCP level, even under load. However, at the client site, they wanted the test run against real services deployed on Tomcat, to analyse a more realistic scenario. To generate load, we used 'Java Bench', a Java-based clone of ApacheBench that is also part of the Apache HttpComponents project. The client would go up to concurrency levels of 2560, pushing as many messages as possible through the ESB to back-end services deployed on Tomcat.

Under high load, the ESB would start to see errors while talking to Tomcat, caused by IO errors such as "Connection reset by peer". The problem for the ESB is that it had already started to send an HTTP request and its payload over an accepted TCP connection, so by default it cannot know whether it is safe to fail over to another node: the back-end service might already have performed some processing on the part of the request it received. Of course, the ESB could be configured to retry on such errors as well, but our default behaviour was to fail over only on the safer 'connection refused' or 'connect timeout' errors (i.e. a connection could not be established within the allocated time), which ensures correct operation even for non-idempotent services.
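That default fail-over policy can be sketched with plain java.net sockets: only errors raised while the connection is being established (connection refused, or a connect timeout) move on to the next node, because no request bytes have left the client at that point. This is a minimal illustration under that assumption, not the UltraESB's actual code; the class name and the node addresses in main() are hypothetical.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.util.Arrays;
import java.util.List;

public class SafeFailover {

    /** Returns a socket connected to the first node that accepts the connection. */
    public static Socket connectFirstAvailable(List<InetSocketAddress> nodes, int connectTimeoutMillis)
            throws IOException {
        IOException last = null;
        for (InetSocketAddress node : nodes) {
            Socket socket = new Socket();
            try {
                socket.connect(node, connectTimeoutMillis);
                // The caller sends the request; I/O errors after this point
                // (e.g. "Connection reset by peer") are NOT retried here.
                return socket;
            } catch (ConnectException | SocketTimeoutException e) {
                // Refused or timed out before anything was sent: safe to try the next node
                socket.close();
                last = e;
            } catch (IOException e) {
                // Any other error: do not guess, let the caller decide
                socket.close();
                throw e;
            }
        }
        throw last != null ? last : new IOException("no nodes configured");
    }

    public static void main(String[] args) throws IOException {
        List<InetSocketAddress> nodes = Arrays.asList(
                new InetSocketAddress("localhost", 9000),    // hypothetical back-end nodes
                new InetSocketAddress("localhost", 9001));
        try (Socket socket = connectFirstAvailable(nodes, 2000)) {
            System.out.println("connected to " + socket.getRemoteSocketAddress());
        }
    }
}
```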

Recent Observations

We recently experienced the same issue with Tomcat when a customer wanted to run a load test in which a back-end service would block randomly for 1-5 seconds, to simulate realistic behaviour. Here again, we saw Tomcat resetting accepted TCP connections, and we were able to capture this with Wireshark as follows, using JavaBench directly against a Tomcat-based servlet.


As can be seen in the trace, the client initiated a TCP connection from source port 9386, and Tomcat, running on port 9000, accepted the connection (note “1”). The client kept sending the packets of a 100 KB request, and Tomcat kept acknowledging them; the last such acknowledgement is annotated with note “2”. Note that the client had not yet finished sending the request payload at this point (note “3”). Suddenly, Tomcat resets the connection (note “4”).

Understanding the root cause

After failing to locate any code in the Tomcat source code that resets established connections, I wanted to simulate the behaviour with a very simple Java program. Luckily the problem was easy to reproduce with a simple program as follows:

import java.net.ServerSocket;
import java.net.Socket;

public class TestAccept1 {

    public static void main(String[] args) throws Exception {
        ServerSocket serverSocket = new ServerSocket(8280, 0);
        Socket socket = serverSocket.accept();
        Thread.sleep(3000000); // do nothing
    }
}


We just open a server socket on port 8280 with a backlog of 0 and start listening for connections. Since the backlog is 0, one could assume that only one client connection would be allowed - BUT I could open more than that via telnet as follows, and even send some data afterwards by typing it in and pressing the Enter key.
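The same observation can be reproduced without telnet. Two facts help explain it: the Javadoc for java.net.ServerSocket says that a backlog of 0 or less makes the implementation use a default value instead, and the OS completes the TCP handshake for queued connections before the application ever calls accept(). The sketch below (hypothetical class name; it picks a free port instead of 8280) never calls accept() and simply counts how many client connects succeed.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

public class BacklogDemo {

    /** Listens with backlog 0, never accepts, and counts completed client connects. */
    public static int connectWithoutAccept(int clients, int timeoutMillis) throws IOException {
        List<Socket> sockets = new ArrayList<>();
        try (ServerSocket server = new ServerSocket(0, 0)) {   // port 0 = any free port
            int connected = 0;
            for (int i = 0; i < clients; i++) {
                Socket s = new Socket();
                sockets.add(s);
                try {
                    s.connect(new InetSocketAddress("127.0.0.1", server.getLocalPort()), timeoutMillis);
                    connected++;   // handshake completed by the OS, not by our code
                } catch (IOException e) {
                    // refused or timed out: the accept queue is full
                }
            }
            return connected;
        } finally {
            for (Socket s : sockets) {
                s.close();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("connected: " + connectWithoutAccept(5, 1000) + " of 5");
    }
}
```

On Linux one would expect all five connects to succeed, consistent with the netstat output that follows.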

telnet localhost 8280
hello world

A netstat command now confirms that more than one connection is open:

netstat -na | grep 8280
tcp        0      0 127.0.0.1:34629         127.0.0.1:8280          ESTABLISHED
tcp        0      0 127.0.0.1:34630         127.0.0.1:8280          ESTABLISHED
tcp6       0      0 :::8280                 :::*                    LISTEN    
tcp6      13      0 127.0.0.1:8280          127.0.0.1:34630         ESTABLISHED
tcp6      13      0 127.0.0.1:8280          127.0.0.1:34629         ESTABLISHED

However, the Java program has accepted only ONE socket, although two appear at the OS level. It seems the OS allows even more than two connections to be opened, although the backlog is specified as 0. On Ubuntu 12.04 x64, the netstat command would not show me the actual listen queue length, but I believe it was not 0. Also, I did not see any difference before and after this test in the "listen queue" overflow statistics reported by the "netstat -sp tcp | fgrep listen" command.

Next, I used JavaBench from the SOA ToolBox to issue a small payload at concurrency 1024, with a single iteration, against the same port 8280.


As expected, all requests failed, but my Wireshark trace on port 8280 did not show any connection resets. Pushing the concurrency to 2560 and the iterations to 10 started to show TCP-level RSTs, which were similar to those seen with Tomcat, though not exactly the same.

 

Can Tomcat do better?

Yes, possibly. What an end user would expect from Tomcat is that it refuses new connections when under load, rather than accepting connections and then resetting them halfway through. But is that achievable, especially considering the behaviour seen with the simple Java example we discussed?

Well, the solution could be better handling of the low-level HTTP connections and sockets, and this is already done by the free and open source high performance Enterprise Service Bus UltraESB, which uses the excellent Apache HttpComponents project underneath.

How does the UltraESB behave?

One could easily test this using the 'stopNewConnectionsAt' property of our NIO listener. If you set it to 2, you won't even be able to open more than 2 Telnet sessions to the socket.

The first would work, the second too
But the third would see a "Connection refused"
And the UltraESB would report the following on its logs:

  INFO HttpNIOListener HTTP Listener http-8280 paused  
  WARN HttpNIOListener$EventLogger Enter maintenance mode as open connections reached : 2

Although it refuses new connections, already accepted connections run to completion without any hindrance. Thus a hardware load balancer in front of an UltraESB cluster can safely balance the load even if an UltraESB node is loaded beyond its configured limits, without having to deal with any connection resets. Once a connection slot becomes free, the UltraESB starts accepting new connections again.
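This pause/resume behaviour can be approximated with the same NIO technique used in TestAccept2. The sketch below is hypothetical and is not how the UltraESB or Apache HttpComponents implement 'stopNewConnectionsAt': an echo server that closes its listener once `limit` connections are open and re-binds it when one of them closes. Two caveats, stated as assumptions: on recent JDKs a closed channel that is still registered with a selector may only release its socket at the next select(), hence the guard in the loop; and on Linux, connections the kernel has already queued but the program has not yet accepted when the listener closes are likely to be reset rather than refused.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.StandardSocketOptions;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

/** Hypothetical sketch: echo server that stops listening at `limit` open connections. */
public class GatedEchoServer implements Runnable {
    private final int port;
    private final int limit;
    private final Selector selector;
    private ServerSocketChannel server;   // null while paused
    private int open;                     // client connections currently open

    public GatedEchoServer(int port, int limit) throws IOException {
        this.port = port;
        this.limit = limit;
        this.selector = Selector.open();
        resume();
    }

    @Override
    public void run() {
        try {
            while (true) {
                selector.select(250);     // timeout lets a postponed resume() run without new events
                boolean pausedThisRound = false;
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (!key.isValid()) continue;
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        if (client == null) continue;     // nothing was pending after all
                        client.configureBlocking(false);
                        client.register(selector, SelectionKey.OP_READ);
                        if (++open >= limit) {
                            pause();                      // new connects now see "Connection refused"
                            pausedThisRound = true;
                        }
                    } else if (key.isReadable()) {
                        echoOrClose(key);
                    }
                }
                // Never re-bind in the round that paused: the old socket may not be released yet
                if (server == null && open < limit && !pausedThisRound) {
                    resume();
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private void echoOrClose(SelectionKey key) throws IOException {
        SocketChannel client = (SocketChannel) key.channel();
        ByteBuffer buffer = ByteBuffer.allocate(256);
        int n;
        try {
            n = client.read(buffer);
        } catch (IOException e) {
            n = -1;                                       // treat a reset like a close
        }
        if (n > 0) {
            buffer.flip();
            client.write(buffer);                         // best-effort echo
        } else if (n < 0) {
            client.close();                               // also cancels the client's key
            open--;
        }
    }

    private void pause() throws IOException {
        server.close();                                   // also cancels the listener's key
        server = null;
    }

    private void resume() throws IOException {
        server = ServerSocketChannel.open();
        server.setOption(StandardSocketOptions.SO_REUSEADDR, true);
        server.bind(new InetSocketAddress(port), 0);      // 0 = JDK default backlog
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    public static void main(String[] args) throws IOException {
        new GatedEchoServer(8280, 2).run();               // limit of 2, as in the telnet example
    }
}
```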

Analysing a corresponding TCP dump

To analyse the corresponding behaviour, we wrote a simple Echo proxy service on the UltraESB, that also slept for 1 to 5 seconds before it replied, and tested this with the same JavaBench under 2560 concurrent users, each trying to push 10 messages in iteration.

Out of the 25600 requests, 7 completed successfully while 25593 failed, as expected. We also saw many TCP-level RSTs in the Wireshark dump, which must have been issued by the underlying operating system.


However, the interesting difference is that these RSTs are sent immediately on receiving the SYN packet from the client. They do not reset established HTTP or TCP connections; instead, the client sees a clean "Connection refused" error, which is what it would expect. Thus the client can safely fail over to another node without any doubt, overhead or delay.

Appendix : Supporting high concurrency in general

During testing, we also saw that Linux could treat the opening of many concurrent connections at the same time as a SYN flood attack, and start using SYN cookies. You would see messages such as
Possible SYN flooding on port 9000. Sending cookies

displayed in the output of "sudo dmesg" if this happens. Hence, under real load, it would be better to turn SYN cookies off, as the root user, as follows:
# echo 0 > /proc/sys/net/ipv4/tcp_syncookies

To make the change persist over reboots, add the following line to your /etc/sysctl.conf:
net.ipv4.tcp_syncookies = 0

To allow the Linux OS to accept more connections, it's also recommended to increase 'net.core.somaxconn', as it usually defaults to 128 or so. This can be done by the root user as follows:
# echo 1024 > /proc/sys/net/core/somaxconn

To persist the change, append the following to /etc/sysctl.conf:
net.core.somaxconn = 1024
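Raising somaxconn alone may not be enough: on Linux the backlog an application passes to listen() is silently capped at net.core.somaxconn, so the server must also request a large backlog for the higher limit to matter. The sketch below (hypothetical class and method names) reads the current limit from /proc and reports the backlog the kernel would actually use; it returns -1 where that file does not exist, for example on non-Linux systems.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SomaxconnCheck {

    /** Reads net.core.somaxconn on Linux; returns -1 where the file does not exist. */
    public static int readSomaxconn() throws IOException {
        Path path = Paths.get("/proc/sys/net/core/somaxconn");
        if (!Files.exists(path)) {
            return -1;   // not Linux, or /proc not available
        }
        String value = new String(Files.readAllBytes(path), StandardCharsets.US_ASCII).trim();
        return Integer.parseInt(value);
    }

    /** Backlog the kernel would use for a listen() request, when somaxconn is known. */
    public static int effectiveBacklog(int requested, int somaxconn) {
        return somaxconn > 0 ? Math.min(requested, somaxconn) : requested;
    }

    public static void main(String[] args) throws IOException {
        int requested = args.length > 0 ? Integer.parseInt(args[0]) : 1024;
        int max = readSomaxconn();
        System.out.println("somaxconn = " + max + ", requested backlog = " + requested
                + ", effective backlog <= " + effectiveBacklog(requested, max));
    }
}
```

For example, requesting 1024 on a host where somaxconn is still 128 would report an effective backlog of at most 128.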


Kudos!

The UltraESB could not have behaved gracefully without the support of the underlying Apache HttpComponents library, and the help and support received from that project community, especially from Oleg Kalnichevski, whose code and help have always fascinated me!



Friday, September 7, 2012


Thursday, August 23, 2012

First they ignore you, then they try to ridicule you, then they fight you, then you win

"First they ignore you, then they try to ridicule you, then they fight you, then you win" - is thought to have been said by Gandhi.


Tuesday, August 21, 2012

AdroitLogic Announces API Management Solution Based On Its High Performance ESB

We've just announced the APIDirector, an API and services management solution based on our high performance Enterprise Service Bus, the UltraESB.


One of the key differences of the APIDirector is that it will offer both API and Services management features for enterprises, including support for AS2 and other legacy/traditional B2B and service protocols.

We announced the results of Round 6 of the ESB Performance Benchmarking earlier in August, although I missed blogging about it at the time, as my father was ill during those few days. The benchmark results showed both the extreme performance and the stability of the UltraESB, which are key elements of any API management solution.

Since an API management solution will be the entry point for the trading partners, customers and users accessing your exposed APIs, it MUST be able to withstand extreme load, as well as deliberate security attacks, without crashing. The Round 6 results and the related information show how some of the ESBs fail to withstand even legitimate and relatively small loads, let alone an external attack.

The APIDirector performs functions such as credential management, service logging, auditing and performance management support, with an easy to use graphical administration interface that also provides analytical features. We also ship the AS2Gateway as an optional module of the APIDirector, and this allows users to deploy a custom AS2 trading gateway - similar to the AS2Gateway we have hosted publicly.

Next week we will be deploying the APIDirector at one of our beta customer sites in the US, where it will initially deal with defining S/FTP file exchanges and SOAP based services, as well as AS2 based B2B exchanges. The APIDirector will be generally available to the public during the first quarter of 2013, although we will be happy to work with customers on beta releases before general availability. Please contact us to learn more about the APIDirector and how you could participate in the beta!

Original News Release: http://www.prweb.com/releases/prweb9820781.htm

Tuesday, July 31, 2012

Electronic Invoicing Announced on the AS2Gateway

We've just announced support for electronic invoicing on AS2Gateway, our free, cloud-based B2B trading gateway! This allows users to invoice trading partners via EDIFACT INVOIC messages over AS2, using a simple and intuitive web interface.


Recurring invoicing is simplified, as any invoice can be saved as a template and reused multiple times. The AS2Gateway currently converts all the invoice details into an EDIFACT D93A INVOIC message, and will soon support other versions, as well as X12 based messages such as the 810.

In the near future, the platform will add support for parsing and generating more message types, which will allow users to easily generate an invoice based on a received purchase order, an advance shipping notification (ASN), etc.

The AS2Gateway offers a free tier to support most SMEs that are required to electronically invoice trading partners for payment. For larger users, a premium tier is available with advanced options (to be announced shortly!), and for retailers or large corporations, the AS2Gateway is available for on-premise private deployment.