Deploying SolrCloud across multiple Data Centers (DC): Performance


After deploying our search platform across multiple DCs, we load tested the Search API.

We were not too impressed by the initial results.

We had issues like:
– high response time,
– high network traffic,
– long running queries.

After investigation, it turned out that a large volume of search results was being transferred between the SolrCloud nodes and the Search API.

This is because clients were requesting a large number of documents.
It turned out that this was a business requirement and we could not put a cap on this.

HTTP compression to the rescue

Solr supports HTTP compression. This support is provided by the underlying Jetty Servlet Container.

To enable HTTP compression for Solr, two steps are required:

  1. Server Configuration

    To configure Solr 5 for HTTP compression, edit the file
    server/contexts/solr-jetty-context.xml and add the following XML snippet just before the closing </Configure> tag:

    
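    <Call name="addFilter">
            <Arg>org.eclipse.jetty.servlets.GzipFilter</Arg>
            <Arg>/*</Arg>
            <Arg>
                <Call class="java.util.EnumSet" name="of">
                    <Arg><Get class="javax.servlet.DispatcherType" name="REQUEST"/></Arg>
                </Call>
            </Arg>
            <Call name="setInitParameter">
                <Arg>mimetypes</Arg>
                <Arg>text/html,text/xml,text/plain,text/css,text/javascript,text/json,application/x-javascript,application/javascript,application/json,application/xml,application/xml+xhtml,image/svg+xml</Arg>
            </Call>
            <Call name="setInitParameter">
                <Arg>methods</Arg>
                <Arg>GET,POST</Arg>
            </Call>
    </Call>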

    The next step is to set the gzip header on the client.

  2. Client Configuration

    The SolrJ client needs to send the HTTP header Accept-Encoding: gzip, deflate to the server. Only then will the server respond with compressed data.
    To achieve this, we use the org.apache.solr.client.solrj.impl.HttpClientUtil utility class:

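    // Reuse the HTTP client that backs the CloudSolrClient's load-balancing client.
    DefaultHttpClient httpClient = (DefaultHttpClient) cloudSolrClient.getLbClient().getHttpClient();
    // Advertise gzip/deflate support so the server may compress its responses.
    HttpClientUtil.setAllowCompression(httpClient, true);
    // Connection pool limits (total and per host).
    HttpClientUtil.setMaxConnections(httpClient, maxTotalConnections);
    HttpClientUtil.setMaxConnectionsPerHost(httpClient, defaultMaxConnectionsPerRoute);
    // Socket read timeout and connection timeout, both in milliseconds.
    HttpClientUtil.setSoTimeout(httpClient, readTimeout);
    HttpClientUtil.setConnectionTimeout(httpClient, connectTimeout);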
    

    Note that in the code above we not only enable compression on the client, but we also set soTimeout and connectionTimeout on the client.
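
    To make the header negotiation concrete, here is a minimal JDK-only sketch of what a compression-aware client has to do with a response: gunzip the body when the server answers with Content-Encoding: gzip, and pass it through otherwise. This is an illustration, not SolrJ's actual implementation; the class GzipResponseSketch and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper (not part of SolrJ) illustrating gzip content negotiation.
final class GzipResponseSketch {

    // Value of the Accept-Encoding request header the client advertises.
    static final String ACCEPT_ENCODING = "gzip, deflate";

    // Compresses a payload the way a gzip-enabled server would before sending it.
    static byte[] gzip(byte[] plain) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(plain);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Decodes a response body: gunzips it when Content-Encoding is "gzip",
    // otherwise returns the bytes unchanged.
    static byte[] decodeBody(String contentEncoding, byte[] body) {
        if (contentEncoding == null || !contentEncoding.trim().equalsIgnoreCase("gzip")) {
            return body;
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(body))) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] original = "{\"response\":{\"numFound\":1,\"docs\":[{\"id\":\"1\"}]}}"
                .getBytes(StandardCharsets.UTF_8);
        // Simulate the server side (gzip) and the client side (decodeBody).
        byte[] decoded = decodeBody("gzip", gzip(original));
        System.out.println(new String(decoded, StandardCharsets.UTF_8));
    }
}
```

    With SolrJ none of this has to be written by hand; the sketch only shows why both the server filter and the client setting are needed for compression to take effect.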

  3. The result

    1. Before enabling compression, total network traffic was 12000KB/s.
    2. After the change, it dropped to 3000KB/s: just 25% of the original traffic, in other words a 75% reduction in network traffic!
    3. Response time also dropped by more than 60%.
    4. There is a price to pay for all of this: we noticed a slight increase in CPU usage.
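
    The arithmetic behind these figures, and the reason large search responses compress so well, can be sketched with the JDK alone. The example below computes the percentage reduction for the traffic numbers above and gzips a synthetic, repetitive JSON payload to compare sizes. The payload is made up for illustration and is not our production data; the class CompressionSavings and its methods are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for estimating compression savings.
final class CompressionSavings {

    // Percentage by which `after` is smaller than `before`.
    static double percentReduction(double before, double after) {
        return (before - after) / before * 100.0;
    }

    // Number of bytes the payload occupies once gzipped.
    static int gzippedSize(byte[] plain) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(plain);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.size();
    }

    // Synthetic stand-in for a large search response: many documents that
    // share the same field names, which is what makes the text compressible.
    static String syntheticResponse(int docs) {
        StringBuilder sb = new StringBuilder("{\"response\":{\"docs\":[");
        for (int i = 0; i < docs; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"id\":\"").append(i)
              .append("\",\"title\":\"Sample document ").append(i)
              .append("\",\"category\":\"books\"}");
        }
        return sb.append("]}}").toString();
    }

    public static void main(String[] args) {
        // Traffic figures from the load test: 12000KB/s before, 3000KB/s after.
        System.out.println(percentReduction(12000, 3000)); // prints 75.0

        byte[] raw = syntheticResponse(1000).getBytes(StandardCharsets.UTF_8);
        int compressed = gzippedSize(raw);
        System.out.println("raw bytes: " + raw.length + ", gzipped bytes: " + compressed);
        System.out.println("reduction %: " + percentReduction(raw.length, compressed));
    }
}
```

    The ratio obtained on synthetic data will differ from what a real index produces; the point is only that responses with many similarly shaped documents shrink considerably, at the cost of extra CPU on both ends.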

Conclusion

Although HTTP compression can be very beneficial when serving large responses, it is not always the answer.
If possible, it’s better to serve small responses (for instance 10-40 items per page).
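
For readers who can page, Solr expresses a page through its start and rows query parameters. The sketch below, assuming zero-based page numbers, shows how a page number and page size map to those two parameters; the class PagingSketch and its method are hypothetical names, not part of SolrJ.

```java
// Hypothetical helper that maps a page number to Solr's start/rows parameters.
final class PagingSketch {

    // Builds the paging part of a query string for a zero-based page number.
    static String pagingParams(int page, int pageSize) {
        if (page < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("page must be >= 0 and pageSize > 0");
        }
        // Use long arithmetic so large page numbers cannot overflow an int.
        long start = (long) page * pageSize;
        return "start=" + start + "&rows=" + pageSize;
    }

    public static void main(String[] args) {
        // Third page (index 2) with 20 items per page.
        System.out.println(pagingParams(2, 20)); // prints start=40&rows=20
    }
}
```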

In the next post, I will share some of the challenges we have been facing.