Java REST API Benchmark: Tomcat vs Jetty vs Grizzly vs Undertow


It is early 2016, and over and over again the question arises: which Java web container should you use? The question has become more pressing with the rise of microservices, where containers are embedded directly into the application.

Recently, we have been facing the very same question. Should we go with:

  1. Jetty, well known for its performance, speed and stability?
  2. Grizzly, which is embedded by default into Jersey?
  3. Tomcat, the de-facto standard web container?
  4. Undertow, the new kid on the block, praised for its simplicity, modularity and performance?

Our use case is mainly about delivering Java REST APIs using JAX-RS.
Since we were already using Spring, we were also looking into leveraging frameworks such as Spring Boot.

Spring Boot supports Tomcat, Jetty and Undertow out of the box.
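As an illustration, switching the embedded container in a Spring Boot application is mostly a dependency change. The following is a hypothetical pom.xml fragment (not taken from the benchmark project) showing how the default Tomcat container can be replaced with Jetty:

```xml
<!-- Hypothetical pom.xml fragment: exclude the default Tomcat starter
     and pull in the Jetty starter instead -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
    <exclusions>
        <exclusion>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-tomcat</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-jetty</artifactId>
</dependency>
```

The same pattern applies for spring-boot-starter-undertow.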

This post discusses which web container to use when it comes to delivering a fast, reliable and highly available JAX-RS REST API.

For this article, Jersey is being used as the implementation.
We are comparing 4 of the most popular containers:

  1. Tomcat (8.0.30),
  2. Jetty (9.2.14),
  3. Grizzly (2.22.1) and
  4. Undertow (1.3.10.Final).

The implemented API returns a very simple constant JSON response; no extra processing is involved.

The code has been kept deliberately very simple. The very same API code is executed on all containers.

For more detail about the code, please look at the link in the resource section.
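A minimal resource of this kind might look like the following sketch (the class name, path and payload here are hypothetical, not the repository's actual code, and it requires the JAX-RS API and Jersey on the classpath):

```java
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

// Hypothetical JAX-RS resource returning a constant JSON payload,
// mirroring the kind of no-op endpoint used in this benchmark.
@Path("/benchmark")
public class BenchmarkResource {

    private static final String RESPONSE = "{\"message\":\"Hello, World!\"}";

    @GET
    @Produces(MediaType.APPLICATION_JSON)
    public String get() {
        // No computation, no I/O: the response is a precomputed constant,
        // so measured differences come from the container, not the app code.
        return RESPONSE;
    }
}
```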
We ran the load test using ApacheBench with concurrency levels of 1, 4, 16, 64 and 128.
The ranking of fastest to slowest container does not change with the concurrency level,
so I am publishing only the results for 1 and 128 concurrent users.
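For reference, an ApacheBench run of this shape would look something like the following (the URL and exact flags are assumptions, not the precise command used in the benchmark):

```shell
# 10 million requests, 128 concurrent connections, keep-alive enabled
# (hypothetical endpoint; adjust host, port and path to your setup)
ab -k -n 10000000 -c 128 http://localhost:8080/benchmark
```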

System Specification

This benchmark was executed on my laptop.

Number of Concurrent Users = 1


Response Time for 10 million requests for 1 concurrent user

Throughput for 10 million requests for 1 concurrent user

From the two graphs above, Grizzly leads the benchmark, followed by Jetty, then Undertow. Tomcat comes last in this benchmark.

Number of Concurrent Users = 128

 


Response Time for 10 million requests and 128 concurrent users

Throughput for 10 million requests and 128 concurrent users

Note that Grizzly still leads here, and that a concurrency level of 128 did not change which server is best or worst.

Note that we also tested concurrency levels of 4, 16 and 64, and the final result is much the same.

Conclusion

For this benchmark, a very simple Jersey REST API implementation is being used.

Grizzly seems to give us the best throughput and response time, no matter the concurrency level.

In this test, I have been using the default web container settings.
And as we all know, no one puts a container into production with its default settings.

In the next blog post, I will change the server configuration and rerun the very same tests.

Resources

The source code is available on GitHub.


5 Comments

Konstantinos Vandikas

August 4, 2016 at 1:40 pm

Great article! On GitHub you mention that you are using “worker-threads = 16 * numberOfCores” as suggested by the Undertow people. I was wondering if they gave more details behind this recommendation.

Thanks

Konstantinos Vandikas

August 5, 2016 at 9:18 am

Nevermind, please ignore my question – it’s the basic rule of thumb between the boss thread and the worker thread going on.

Thank you for writing this post

Klaus Grønbæk

December 1, 2016 at 12:18 pm

Fine article, but you are missing a couple of important nuances.

Most of the time, the bulk of the time (and CPU) spent processing a request is not spent in the servlet container code; it is spent inside your code. In our system, building even the simplest DTO for a REST response takes 0.14 ms, and when you start adding data to the model it may take 1 ms (pure CPU time).
If you look at the results for 1 concurrent request, it looks like Tomcat is 50% slower than Grizzly, which seems enormous; but in a realistic scenario, where your code uses 1 ms to build the DTO, the result becomes quite different: (1.12 − 1.08) / 1.08 = 3.7%, and if you spend 2 ms creating the DTO you are down to a mere 1.9% difference.
In the greater scheme of things I doubt you will be able to measure any real difference between these 4 servlet containers if you are testing a realistic scenario. So I would use the one with the best documentation, the best forum/user group, and the special features you need.
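The commenter's arithmetic can be sketched as a small calculation. The millisecond figures below (0.12 ms vs 0.08 ms of container time, 1-2 ms of application work) are the comment's illustrative numbers, not new measurements:

```java
// Shows how container overhead differences shrink as per-request
// application work grows (the comment's illustrative figures).
public class OverheadDilution {

    // Relative slowdown of the slower container once appMs of
    // application work is added to every request.
    static double relativeDifference(double slowMs, double fastMs, double appMs) {
        return ((slowMs + appMs) - (fastMs + appMs)) / (fastMs + appMs);
    }

    public static void main(String[] args) {
        System.out.printf("no app work:   %.1f%%%n",
                100 * relativeDifference(0.12, 0.08, 0.0)); // 50.0%
        System.out.printf("1 ms app work: %.1f%%%n",
                100 * relativeDifference(0.12, 0.08, 1.0)); // 3.7%
        System.out.printf("2 ms app work: %.1f%%%n",
                100 * relativeDifference(0.12, 0.08, 2.0)); // 1.9%
    }
}
```

The fixed 0.04 ms container gap is divided by an ever larger total, which is why the measured 50% difference becomes negligible under realistic workloads.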

Also, it makes very little sense to me to run this test with more concurrent requests than you have CPU cores. In a servlet container each request is processed by a thread, and each thread is basically independent. Assuming the client is using keep-alive, the only synchronisation between requests is obtaining the thread that will process it, which means scalability will be near linear until you reach a bottleneck, which in this case will be CPU. Once you reach the bottleneck, maximum throughput is reached, and if you add more concurrent requests you will just get longer response times. This means you end up testing the JVM's/OS's ability to context switch. Running a test with a high number of concurrent users only makes sense if you add think-time between requests. If you really need to handle a continuous load of 128 parallel requests and response time is important, you would need 32 4-core servers (assuming there is no IO; if you have 50% IO time you only need 16 4-core servers).

Also, if you have 128 continuous parallel requests, a servlet container is probably not the right tool for the job, as it can only use NIO for part of the request processing, since it eventually needs a thread for processing the request (you can limit the time spent on the thread by using the asynchronous paradigm introduced in the Servlet 3.0 spec, but this just moves the problem somewhere else). Instead you want to use a full NIO server like Netty, or Netflix's Ratpack (built on top of Netty). In Netty you only have one processing thread per CPU core, which greatly limits the context switching; however, Netty does not implement the servlet spec, so I only recommend it if you have a problem where you know a servlet container will not be a good fit.

    ayush

    March 3, 2017 at 10:34 am

    Klaus,
    Thanks for the great explanation.
    I wanted to understand this in a bit more detail: since NIO is supported by all the containers mentioned above, is it that only Netty uses full NIO, while the others use NIO for only part of the request processing?
    Also, from your explanation above, if we have more NIO calls in each request then we should go with the Netty server, and if we have more concurrent users but less NIO then we can go for Tomcat/Jetty/Undertow?

Amit Pawar

August 28, 2017 at 9:30 am

Hi,

I need to test the performance of Grizzly NIO. Can you suggest how to do so? I can't find any reference document explaining how to tune it for better performance.
