Panic in the Data Center!: Design Web Apps That Don't Hurt to Scale

Let's look at some design concepts regarding ease of scalability. It's a dense subject that cannot be completely addressed in generalities, so we'll personalize it a bit:

We join our protagonist, designer of the newest killer app to hit the Web, three weeks before the grand opening. Budgets have been tight, so the development team has been getting by with a single development server; fortunately, QA went smoothly. The team has just built the production cluster, and the CEO wants to be the first person to try it out.

This is the kind of scenario that could result in either a successful launch or layoffs. Unfortunately, there is nothing our protagonist can do to influence his fate at this point. The result of this demo is determined by decisions that were (or weren't) made in the design phase of the application and the team's fidelity in adhering to those architectural decisions.

The fundamental challenge of building a web application to work in a clustered configuration is that the runtime environment is separated into multiple distinct instances of the application, each with a separate memory space. A thread running in one instance has no way of accessing in-memory data residing in another instance.

There are four basic approaches to dealing with this problem, which can be used in combination:

Ensure all threads that require specific data always run within the application instance which houses it. This is a method of managing load rather than managing data, and provides no fault tolerance. Each application instance is unaware of, and has no interaction with, other instances. Instead, load is routed through some external method to the instance housing the required in-memory data. In the context of web applications, this approach depends on an external load-balancer to keep track of which user sessions are affiliated with which application instances.

Build remote memory access methods to provide mutual data access between application instances. This is the approach used in Beowulf clusters and other grid computing engines. Implementations of remote memory access are usually done through some sort of messaging interface, although there are more direct methods, such as RMI calls in Java. Because each application instance houses a distinct set of in-memory data, this method does not provide fault tolerance. On the upside, data is only represented in a single application instance; this is great for applications which require large amounts of in-memory data. On the downside, the application must have some way of determining which application instance holds the data it needs. Overall, this approach is best suited for data-intensive rather than real-time applications.
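One simple way for an instance to determine which peer holds a given piece of data is to derive the owner from the key itself. The following is a minimal sketch of that idea, not a full remote-access layer: the instance names and the `owner_of` helper are hypothetical, and the actual remote call (messaging, RMI, or otherwise) is left out.

```python
import hashlib

def owner_of(key: str, instances: list[str]) -> str:
    """Return the instance assumed to hold `key` in its memory space."""
    # A stable hash (unlike Python's built-in hash()) gives the same
    # answer on every instance and across restarts.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(instances)
    return instances[index]

# Hypothetical instance names, for illustration only.
instances = ["app-1", "app-2", "app-3"]
owner = owner_of("customer:42", instances)
# The caller would now send its read or write request to `owner`.
```

Every instance computes the same owner without consulting a directory. The trade-off is that changing the number of instances remaps most keys with this simple modulo scheme.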

Replicate shared data between application instances so that it is always accessible to all threads. When data is created or modified in one application instance, the updates are communicated to all application instances. This approach is best suited for small amounts of relatively static data that is unlikely to be used simultaneously by multiple application instances since most data locking mechanisms are impractical with the latency inherent in network connections. One of the greatest benefits of this method is that, because all data exists everywhere, it does provide fault tolerance.
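The replication idea can be sketched in a single process, with ordinary objects standing in for application instances. The `ReplicatedCache` class below is hypothetical, and the direct method call to each peer stands in for what would be a network message in a real cluster; there is no locking or conflict handling here.

```python
class ReplicatedCache:
    """Toy stand-in for one application instance's in-memory data."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.peers = []

    def connect(self, *peers):
        self.peers.extend(peers)

    def put(self, key, value):
        # Apply the change locally, then push it to every other instance.
        self._apply(key, value)
        for peer in self.peers:
            peer._apply(key, value)  # a network message in a real cluster

    def _apply(self, key, value):
        self.data[key] = value

    def get(self, key):
        # Reads are always local, because every instance holds all the data.
        return self.data.get(key)

a, b, c = ReplicatedCache("a"), ReplicatedCache("b"), ReplicatedCache("c")
a.connect(b, c)
b.connect(a, c)
c.connect(a, b)

a.put("tax_rate", 0.07)
```

After the `put`, instances `b` and `c` can each answer `get("tax_rate")` from their own memory, which is where the fault tolerance comes from. Note that every write costs one message per peer, which is part of why this suits small, relatively static data.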

Persist shared data in an external data store. In short: let the database manage the data. Each application instance is unaware of the others and relies entirely on external data storage to house data instead of in-memory storage. Data storage and retrieval incurs the additional cost of a database call, but because the data is centrally managed, this approach handles concurrent data access quite well.
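As a rough illustration of letting the database manage the data, the sketch below uses Python's built-in `sqlite3` module as a stand-in for a central database server. The `counters` table and the `open_store` and `record_hit` helpers are assumptions made for the example; two connections to the same file play the role of two application instances.

```python
import os
import sqlite3
import tempfile

def open_store(path):
    """Open a connection to the shared store, creating the table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS counters ("
        "name TEXT PRIMARY KEY, value INTEGER NOT NULL)"
    )
    conn.commit()
    return conn

def record_hit(conn, name):
    """Increment a counter; nothing is kept in the instance's own memory."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO counters (name, value) VALUES (?, 0)", (name,)
        )
        conn.execute(
            "UPDATE counters SET value = value + 1 WHERE name = ?", (name,)
        )
    row = conn.execute(
        "SELECT value FROM counters WHERE name = ?", (name,)
    ).fetchone()
    return row[0]

# Two "application instances" sharing one external store.
path = os.path.join(tempfile.mkdtemp(), "shared.db")
instance_a = open_store(path)
instance_b = open_store(path)

record_hit(instance_a, "home")
total = record_hit(instance_b, "home")
```

Neither instance knows the other exists, yet the second call sees the first one's update because the increment happens inside the store rather than in application memory.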

As the CEO types the URL, our friend reviews the architectural decisions made months ago. The presentation (user interface) layer is designed to maintain active user sessions, persisting session information in memory; if a node drops, users with state information on that server will be kicked out, but they can log back in with ease. In contrast, the business logic layer, where all of the core processing is handled, is completely stateless, storing all persistent data in a central database; fault tolerance is seamless.

This design requires not one, but two levels of load balancing with different configurations:

The presentation layer of this web application is designed to assume that users with active sessions will always be served by the same application instance, so the load balancer must be capable of distinguishing one user from another. A common way of implementing "sticky sessions" is for the web application to assign each user session a unique identifier which is passed to the web browser in the form of a cookie. The cookie is then passed along with every request, which the load balancer can use to choose the correct server to handle the request. Since the load balancer must be able to understand the content of HTTP requests, it must be of the type labeled "application layer" or "layer 7" capable.
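The cookie-based routing described above might look roughly like the following. This is a simplified sketch rather than how any particular load balancer product works: the `StickyBalancer` class, the `SESSIONID` cookie name, and the server names are all hypothetical, and for brevity the balancer issues the session identifier itself, whereas in the design described here the web application assigns it.

```python
import uuid
from http.cookies import SimpleCookie

class StickyBalancer:
    COOKIE = "SESSIONID"  # hypothetical cookie name

    def __init__(self, servers):
        self.servers = list(servers)
        self.affinity = {}  # session id -> server that holds the session
        self._next = 0

    def route(self, cookie_header=""):
        """Return (server, session_id) for a request's Cookie header."""
        cookie = SimpleCookie()
        cookie.load(cookie_header)
        morsel = cookie.get(self.COOKIE)
        session_id = morsel.value if morsel else None

        if session_id not in self.affinity:
            # New (or unrecognized) session: pick a server in rotation
            # and remember the choice for later requests.
            session_id = uuid.uuid4().hex
            server = self.servers[self._next % len(self.servers)]
            self.affinity[session_id] = server
            self._next += 1
        return self.affinity[session_id], session_id

balancer = StickyBalancer(["web-1", "web-2"])
first_server, sid = balancer.route()               # no cookie yet
again_server, _ = balancer.route(f"SESSIONID={sid}")  # cookie presented
```

The second request lands on the same server as the first because the balancer read the cookie out of the HTTP request, which is exactly the capability that requires a "layer 7" device.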

The business logic layer is completely stateless, requiring no intelligent routing of requests: any instance can handle any request. This kind of load balancing does not require any of the extra features provided by an "application layer" load balancer.
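By contrast, balancing a tier that needs no routing intelligence can be as simple as taking turns. The sketch below assumes a plain round-robin policy, which is one common choice and not necessarily the one used here; the `RoundRobinBalancer` class and server names are hypothetical.

```python
class RoundRobinBalancer:
    """Hands out servers in rotation without inspecting the request."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._index = 0

    def add(self, server):
        # Growing the pool requires no coordination with existing servers.
        self.servers.append(server)

    def next_server(self):
        server = self.servers[self._index % len(self.servers)]
        self._index += 1
        return server

pool = RoundRobinBalancer(["logic-1", "logic-2"])
picks = [pool.next_server() for _ in range(4)]
pool.add("logic-3")  # extra capacity joins the rotation immediately
```

Because the balancer never looks at request content or remembers past decisions, adding capacity amounts to appending another server to the list.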

While the CEO is checking out all of the bells and whistles of the application, smiling the whole time, the marketing director bursts into the room. "Our product launch announcement has just been Slashdotted! It looks like we're going to easily double our traffic estimates on the first day."

Neither the presentation layer nor the business logic layer has a limitation on the number of parallel instances since there is no inter-instance data replication; if additional application instances are required, they can simply be added into the load balancing pools.

Scalability of this web application is effectively limited only by the capacity of the database server. Most commercially available database server products have some clustering capability, although the license cost is usually staggering. Budget-constrained projects often opt for a single high-capacity server and a really solid backup strategy. Naturally, there are some risks associated with the frugal approach, so it is wise to put some serious thought into the decision.

"I don't see a problem with that," is our friend's cool reply. "All we need to do is double up on the application server hardware and licenses. Where's the checkbook?"

Saturday, June 21, 2008
