Monday, January 15, 2007

Web Request Quality Of Service

Recently a query in the BEA newsgroup prompted me to think about this simple-looking question.

How can we achieve quality of service (QOS) for web requests by managing their response time? Here the meaning of QOS is broad and simple. It has two basic requirements.

1) A response time of more than X seconds is not acceptable.

2) If a request cannot complete execution in X seconds, return a meaningful error message and free up the associated resources.
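
The two requirements can be illustrated with a small plain-Java sketch. This is not WebLogic-specific code: the class name BoundedRequest, the helper runWithDeadline and the error strings are my own assumptions for illustration, and inside an application server you would normally rely on container-managed threads rather than your own pool. The idea is only to show a deadline of X milliseconds, an error message on timeout, and cancellation of the work so that its thread is released.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedRequest {

    /** Runs the work with a deadline; returns its result or an error message (hypothetical helper). */
    static String runWithDeadline(Callable<String> work, long timeoutMillis, ExecutorService pool) {
        Future<String> future = pool.submit(work);
        try {
            // Requirement 1: wait no longer than the agreed response time.
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Requirement 2: interrupt the worker to free it, then report a readable error.
            future.cancel(true);
            return "ERROR: request did not complete in " + timeoutMillis + " ms";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return "ERROR: request interrupted";
        } catch (ExecutionException e) {
            return "ERROR: request failed: " + e.getCause().getMessage();
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // A slow task that exceeds the 200 ms deadline.
            System.out.println(runWithDeadline(() -> { Thread.sleep(5000); return "late"; }, 200, pool));
            // A fast task that completes well within its deadline.
            System.out.println(runWithDeadline(() -> "done", 2000, pool));
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Note that cancellation only frees the thread if the work responds to interruption. A thread blocked on a socket that ignores interrupts stays stuck, which is why the timeouts of each participating resource matter.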

I think there are three aspects to the QOS issue: 1) the application’s performance management, 2) the timeout and threshold configuration of participating resources, and 3) returning an error message by gracefully handling timed-out requests and unresponsive resources.

1) Application's performance management.

The following two are well-known prerequisites for achieving consistent quality of service for web requests.

a) An application deployed with enough capacity to achieve the target QOS/performance

b) An application that is well designed, implemented, tuned and tested against the target QOS/performance

As the BEA documents have lots of detail on capacity planning and performance tuning, I am not going to discuss these further.

2) Participating resources: timeouts and thresholds configuration.

Typically a web request passes through load balancers, firewalls, web servers, the proxy plug-in, application server subsystems (e.g. servlet engine, EJB container, JDBC pool, JTA, JMS server), the database and other external servers such as a report server.

It is important to review each participating resource and configure proper timeouts in line with the application’s performance benchmarks and overall QOS requirements.

The common timeout and threshold parameters are listed below.

WebLogic proxy plug-in: WLSocketTimeoutSecs, ConnectTimeoutSecs
WebLogic core: Maximum Open Sockets, Accept Backlog, Execute Thread Queue Length, Socket Reader Threads, Stuck Thread Max Time, HTTP Message Timeout
WebLogic JTA: Timeout Seconds, Abandon Timeout Seconds, Max Transactions
WebLogic JDBC: Inactive Connection Timeout, Maximum Waiting For Connection, Statement Timeout
Outbound HTTP connections: weblogic.net.http.HttpURLConnection.setReadTimeout(), org.apache.commons.httpclient.HttpConnection.setSoTimeout()

3) Return error message by graceful handling of the timed out requests and unresponsive resources.

Once you have figured out the threshold and timeout configuration of each subsystem and resource, the next area of focus is how to handle timeout and resource overflow situations so that a meaningful error message is delivered to upstream clients.

Depending on the type of resources and interfaces, the solution can range from simple to complex.

For example, resources with a Java API may throw an exception in timeout situations. Proper exception handling makes it possible to return an appropriate message to the client. Examples of such resources are a JDBC statement, HttpURLConnection (outbound HTTP connection) and JTA resources (transaction timeout).
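
As a rough illustration of this exception-handling approach, here is a minimal sketch using the standard java.net.HttpURLConnection (not the WebLogic-specific class). The class name OutboundTimeoutExample, the helper fetchStatus, the /report path and the error strings are assumptions made up for this example. A local server socket that accepts connections but never replies stands in for an unresponsive external server.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.net.URI;

public class OutboundTimeoutExample {

    /** Calls the URL and maps a timeout to a readable error message (hypothetical helper). */
    static String fetchStatus(String url, int timeoutMillis) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(timeoutMillis); // limit for establishing the connection
            conn.setReadTimeout(timeoutMillis);    // limit for waiting on response data
            return "OK: HTTP " + conn.getResponseCode();
        } catch (SocketTimeoutException e) {
            // The resource did not answer in time: report it instead of hanging.
            return "ERROR: backend did not respond within " + timeoutMillis + " ms";
        } catch (IOException e) {
            return "ERROR: backend unavailable";
        } finally {
            if (conn != null) {
                conn.disconnect(); // release the underlying socket
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // The OS completes the TCP handshake, but nothing ever reads or answers the request.
        try (ServerSocket silent = new ServerSocket(0)) {
            String url = "http://127.0.0.1:" + silent.getLocalPort() + "/report";
            System.out.println(fetchStatus(url, 500));
        }
    }
}
```

The same pattern applies to a JDBC statement with a query timeout or to a JTA transaction timeout: catch the specific timeout exception, release the resource in a finally block, and translate the failure into a message the client can understand.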

In certain situations the Java interfaces of external resources (e.g. JDBC driver, outbound HTTP requests) fail to deliver timeout responses because of an anomaly in the network or in the external resource. This can result in stuck threads if a request hangs for longer than the configured “Stuck Thread Max Time”. This behavior can lead to execute thread queue exhaustion and request queue build-up. It is possible to generate monitoring alerts; however, it is not possible to kill individual stuck threads or requests.

In WebLogic 9.2, a work manager is designed to reject requests once it reaches its request capacity (queued and executing requests). Depending on the constraint configuration, a certain number of stuck threads can also trigger shutdown of the work manager. This allows graceful handling of subsequent incoming requests.
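
The capacity idea can be sketched with the standard java.util.concurrent classes. To be clear, this is an analogy and not the WebLogic work manager API: the class name CapacityLimitedPool, the helpers newPool and submit, and the messages are assumptions for illustration. A pool with one thread and a queue of length one accepts two requests (one executing, one queued) and rejects the third immediately instead of letting the queue build up.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CapacityLimitedPool {

    /** Builds a pool whose capacity is threads + queueLength (hypothetical helper). */
    static ThreadPoolExecutor newPool(int threads, int queueLength) {
        return new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueLength),
                new ThreadPoolExecutor.AbortPolicy()); // reject when both threads and queue are full
    }

    /** Submits work and turns a rejection into a readable message. */
    static String submit(ExecutorService pool, Runnable work) {
        try {
            pool.execute(work);
            return "ACCEPTED";
        } catch (RejectedExecutionException e) {
            return "ERROR: server busy, try again later";
        }
    }

    public static void main(String[] args) {
        CountDownLatch release = new CountDownLatch(1);
        // Simulates a request that stays blocked until it is released.
        Runnable stuck = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        ThreadPoolExecutor pool = newPool(1, 1);
        System.out.println(submit(pool, stuck)); // taken by the single thread
        System.out.println(submit(pool, stuck)); // waits in the queue
        System.out.println(submit(pool, stuck)); // over capacity, rejected
        release.countDown();
        pool.shutdown();
    }
}
```

Rejecting early keeps the response time of the failure itself short, which is the point of requirement 2: the client gets an error message quickly rather than waiting behind stuck requests.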

AbstractAsyncServlet can also help to handle unresponsive requests and return an error message in the response. It does that by decoupling the response from the incoming request. I wrote a separate blog entry, “Asynchronous Servlet”, which describes this in more detail.

The proxy web server also offers an opportunity to return an error page to the client in the event of proxy plug-in timeouts, i.e. WLSocketTimeoutSecs and ConnectTimeoutSecs.

Other front-end infrastructure components (e.g. firewalls, load balancers) and O/S socket resources are generally shared between multiple applications in an enterprise environment. It is not practical to tune shared resources beyond a certain limit to meet application-specific QOS requirements.