Monday, November 15, 2010

Why using REST will kill your project


REST is fun, but as with most things that are fun, it can also kill you.  First off, if you don't know what it is (and don't be ashamed, because I missed the boat for several years after it came out), go to the Wikipedia entry.  It seemed like a cool idea, and I've designed several APIs in the REST paradigm.  However, there is a dark side to REST that can kill, and in many cases has killed, projects big and small.  I'm going to use real examples here, but without the names of the companies or participants, to protect the guilty.

Let's look at the two major paths software projects start down these days and where REST fits in.  First there's waterfall, where you design your high-level architecture, APIs, and components up front, and let's say you decide on REST.  So you've got a big list of APIs with annotations, and in the case of REST these are actual URLs you could call, along with their JSON or XML data structures.  On the other hand, if you're agile, you start out with a minimal set of REST APIs in the first iteration and add more as you go, until you eventually end up in the same place: a bunch of REST APIs with their JSON or XML backing.


To implement this you're going to have an app server (JBoss/Django/Rails/etc.) with code in Java, Python, PHP, Ruby, whatever, some Apache, a proxy, a local MySQL/Postgres/Oracle/whatever DB, maybe a cloud DB thrown in there, maybe your whole thing is running in the cloud.  It doesn't really matter for the purposes of this discussion; what matters is the resulting architecture driven by REST.  Here's an example diagram approximating an actual large-scale system I've worked on:



Everything looks nice and standard and "best practices" buzzword-compliant and future-proof, doesn't it?  Well, in an enterprise system it's actually quite a bit more complicated, with additional load balancers and rewrite engines and gateways and proxies and components, but this is the gist of it.  The important points to note are the multiple REST components and the prevalence of proxies, because this is what will end up leading to our demise.

You see, the root of the problem, I believe, is that REST, as it is practiced over HTTP, is a heavyweight protocol.  When you are designing the APIs in waterfall, you try to alleviate this by chunking requests and having fat methods that return lots of data in one go.  In agile you don't notice it at first, but when you start to scale you notice how slow everything is and how your traditional optimization strategies don't seem to work so well.  Basing an application on REST is like basing an assembly program on interrupts, as the early Macintosh I/O was, or like putting every memory access over a glacially paced network bus.  The simple act of an HTTP request/response, especially for anything complicated, is quite slow compared to an on-box call.  When you start multiplying components, and have REST requests that can only complete by calling other REST requests in a cascading tree, the problem becomes quite troublesome.
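To make the cascading-tree point concrete, here is a rough back-of-the-envelope model in Python.  Everything in it is an assumption made up for illustration: the service names, the call tree, the 1-2 ms of work per service, and the 20 ms per-hop HTTP overhead are not measurements from any real system.  It assumes each service calls its children one after another and pays a fixed overhead per call.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Call:
    """One request handler: its own work plus the downstream calls it makes."""
    name: str
    own_ms: float  # time spent doing real work inside this service
    children: List["Call"] = field(default_factory=list)


def total_latency_ms(call: Call, hop_ms: float) -> float:
    """Latency when children are called sequentially and each call costs hop_ms."""
    return call.own_ms + sum(
        hop_ms + total_latency_ms(child, hop_ms) for child in call.children
    )


def build_tree() -> Call:
    """A hypothetical checkout request that fans out into six downstream calls."""
    return Call("checkout", 2.0, [
        Call("auth", 1.0, [Call("user-db", 1.0)]),
        Call("catalog", 1.0, [Call("pricing", 1.0), Call("inventory", 1.0)]),
        Call("payment", 1.0),
    ])


tree = build_tree()
print(total_latency_ms(tree, hop_ms=0.0))   # on-box calls, overhead ignored: 8.0
print(total_latency_ms(tree, hop_ms=20.0))  # assumed 20 ms per HTTP hop: 128.0
```

The real work is the same 8 ms in both cases; under these assumed numbers the other 120 ms is nothing but six hops of request/response overhead, and it grows with every component added to the tree.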

There are standard approaches to remedy this situation, all of them ineffective in the long run.  You can try to add more boxes, virtualized or not, cloud or not, with smart load balancing.  Unfortunately this just reduces the load on each box; it is not going to speed up your shortest path.  We don't even need to invoke Amdahl's Law here, but even if we did, completing a REST request in enterprise software involves enough non-parallelizable operations that just adding components won't fix everything.

So we also try adding proxies so we can cache requests; after all, the best request is the one that is never made but just served from an in-memory cache.  Sounds nice, but soon you find this adds a whole new level of complexity to the system, because you have lots of state-sensitive calls, such as authentication, authorization, purchasing, downloads, and transactions, where you must not cache the responses.  However, simply marking all of these no-cache denies you most of the benefit of using the proxy, so you then have to design added complexity to figure out when you can rely on the cached copy and when you cannot.  So now it's another component, more synchronization, more complexity, more subtle bugs, and more slow-downs.  Then you find that the very approach you thought would save you becomes your downfall.
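As a rough sketch of the kind of logic that ends up living in (or next to) the proxy, here is a deliberately conservative "may I cache this?" check in Python.  The path prefixes and the function name are hypothetical, invented for this example, and the rules are a simplification: real HTTP caching semantics are considerably more nuanced than this.

```python
# Hypothetical URL prefixes for the state-sensitive calls mentioned above.
SENSITIVE_PREFIXES = ("/auth", "/purchase", "/download", "/transaction")

# Response directives that this sketch treats as "do not serve from a shared cache".
BLOCKING_DIRECTIVES = {"no-store", "no-cache", "private"}


def is_cacheable(method, path, request_headers, response_headers):
    """Return True only if a shared proxy could plausibly reuse this response."""
    # Only safe, read-only methods are candidates at all.
    if method.upper() not in ("GET", "HEAD"):
        return False
    # State-sensitive endpoints are never cached (assumed prefixes).
    if any(path.startswith(prefix) for prefix in SENSITIVE_PREFIXES):
        return False
    # Simplification: treat any authenticated request as per-user and skip it.
    if "authorization" in {name.lower() for name in request_headers}:
        return False
    # Honor the origin server's Cache-Control response header.
    response = {name.lower(): value for name, value in response_headers.items()}
    cache_control = response.get("cache-control", "")
    directives = {
        part.strip().split("=", 1)[0].lower()
        for part in cache_control.split(",")
        if part.strip()
    }
    return not (directives & BLOCKING_DIRECTIVES)


print(is_cacheable("GET", "/catalog/items", {}, {"Cache-Control": "max-age=60"}))
print(is_cacheable("GET", "/purchase/42", {}, {}))
```

Even this toy version has four separate rules, and every one of them is a place where a new endpoint can be misclassified and a stale or private response can be served to the wrong caller.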

The core problem is architectural: using a high-latency high-overhead interface in internal APIs.  This is where REST fails and is doomed to fail by its very nature.  Complexity kills.  KISS: Keep It Simple, Stupid.

This is not to say that REST has no place in software design.  As an external-facing API it can be an excellent solution, especially as a replacement for the monstrosity that was SOAP and WSDL.  It is here that REST demonstrates its greatest strength: providing an intuitive, simple, and easy way to access resources and perform operations over the web.

But keep REST away from your internal architecture, as it will in the end impale your beautiful design on the pike of reality.