Monday, November 15, 2010

Why using REST will kill your project


REST is fun, but as with most things that are fun, it can also kill you.  First off, if you don't know what it is (and don't be ashamed, because I missed the boat for several years after it came out), go to the Wikipedia entry.  It seemed like a cool idea, and I've designed several APIs in the REST paradigm.  However, there is a dark side to REST, one that can kill, and in many cases has killed, projects big and small.  I'm going to use real examples here, but without the names of the companies or participants, to protect the guilty.

Let's look at the two major paths software projects start down these days and where REST fits in.  We've got waterfall where you go and design your high-level architecture and APIs and components and let's say you decide on REST.  So you've got a big list of APIs with annotations and in the case of REST these are actual URLs you could call along with the JSON or XML data structures.  On the other hand if you're agile you're going to start out with a minimal set of REST APIs in the first iteration and then add more as you go on until you eventually get to the same place of a bunch of REST APIs with their JSON or XML backing.


To implement this you're going to have an app server (JBoss/Django/Rails/etc.) with code in Java, Python, PHP, Ruby, whatever, some Apache, a proxy, a local MySQL/Postgres/Oracle/whatever DB, maybe a cloud DB thrown in there, maybe your whole thing is running in the cloud.  It doesn't really matter for the purposes of this discussion; what matters is the resulting architecture driven by REST.  Here's an example diagram approximating an actual large-scale system I've worked on:



Everything looks nice and standard and "best practices" buzzword compliant future-proof, doesn't it?  Well in an enterprise system it's actually quite a bit more complicated, such as additional load balancers and rewrite engines and gateways and proxies and components, but this is the gist of it.  The important points to note are the multiple REST components and prevalence of proxies, because this is what will end up leading to our demise.

You see, the root of the problem, I believe, is that REST is a heavyweight protocol.  At first, when you are designing the APIs in waterfall, you try to alleviate this by chunking requests and having fat methods that return lots of data in one go.  In agile you don't notice it at first, but when you start to scale you notice how slow everything is and how your traditional optimization strategies don't seem to work so well.  Basing an application on REST is like basing an assembly program on interrupts, as the early Macintosh I/O was, or like putting every memory access over a relatively glacially paced network bus.  The simple act of an HTTP request/response, especially for anything complicated, is quite slow compared to on-box requests.  When you start multiplying components, and have REST requests that must call other REST requests in a cascading tree in order to complete, the problem becomes quite troublesome.
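
To put rough numbers on that cascading tree, here is a minimal Python sketch.  It is a back-of-the-envelope model, not a measurement: the `Call` class, the `total_latency_ms` function, the service names and the 5 ms per-hop cost are all assumptions made up for illustration, and it assumes each request calls its children one after another.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Call:
    """One request in a call tree; children are the requests it must make to finish."""
    name: str
    work_ms: float  # time spent computing inside the handler itself
    children: List["Call"] = field(default_factory=list)


def total_latency_ms(call: Call, hop_overhead_ms: float) -> float:
    """Latency of `call` when each child is invoked sequentially and every
    invocation pays `hop_overhead_ms` (serialization, round trip, parsing)."""
    latency = call.work_ms
    for child in call.children:
        latency += hop_overhead_ms + total_latency_ms(child, hop_overhead_ms)
    return latency


# Hypothetical request: checkout calls auth and pricing; pricing calls inventory.
tree = Call("checkout", 2.0, [
    Call("auth", 1.0),
    Call("pricing", 3.0, [Call("inventory", 1.0)]),
])

in_process = total_latency_ms(tree, hop_overhead_ms=0.001)  # plain function calls
over_http = total_latency_ms(tree, hop_overhead_ms=5.0)     # assumed 5 ms per HTTP hop
print(f"in-process: {in_process:.3f} ms, over HTTP: {over_http:.3f} ms")
```

With these made-up numbers the same 7 ms of real work takes 22 ms once the three calls become HTTP hops, and every extra level in the tree adds another hop to the critical path.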

There are standard approaches to remedy this situation, all of them ineffective in the long run.  You can try to add more boxes, virtualized or not, cloud or not, with smart load balancing.  Unfortunately this just reduces the load on each box; it is not going to speed up your shortest path.  We don't even need to invoke Amdahl's Law here, but even if we did, completing a REST request in enterprise software involves enough non-parallelizable operations that just adding components won't fix everything.  So we also try adding proxies so we can cache requests; after all, the best request is the one that is never made but just served from an in-memory cache.  Sounds nice, but soon you find this adds a whole new level of complexity to the system, as you have lots of state-sensitive calls, such as authentication, authorization, purchasing, downloads, and transactions, where you must not cache the responses.  However, just marking these no-cache denies you most of the benefit of using the proxy, so you then have to design added complexity to figure out when you can rely on the cached copy and when you cannot.  So now it's another component, more synchronization, more complexity, more subtle bugs, and more slow-downs.  Then you find that the very approach you thought would save you becomes your downfall.
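
Both halves of that argument can be sketched in a few lines of Python.  The first function is the standard Amdahl's Law formula; the second is a deliberately naive version of the "can I serve this from cache?" decision.  The path prefixes and the `is_cacheable` name are hypothetical, invented here for illustration, and a real proxy would need far more rules than this, which is rather the point.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Amdahl's Law: overall speedup when only `parallel_fraction` of the work
    can be spread across `workers`; the remainder stays serial."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Hypothetical rule set: path prefixes whose responses depend on per-user state.
STATE_SENSITIVE_PREFIXES = ("/auth", "/purchase", "/download", "/transaction")


def is_cacheable(method: str, path: str) -> bool:
    """Return True only for read-only requests outside the state-sensitive prefixes."""
    if method.upper() not in ("GET", "HEAD"):
        return False
    return not path.startswith(STATE_SENSITIVE_PREFIXES)


# Assume half of a request's work is serial; see how little extra boxes help.
for boxes in (1, 2, 8, 64):
    print(boxes, round(amdahl_speedup(0.5, boxes), 2))

print(is_cacheable("GET", "/catalog/items"))
print(is_cacheable("GET", "/auth/session"))
```

If half of the work on a request is serial, no number of boxes gets you past a 2x speedup, and every state-sensitive call you add to the system is one more entry in a rule list like the one above that somebody has to keep correct.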

The core problem is architectural: using a high-latency high-overhead interface in internal APIs.  This is where REST fails and is doomed to fail by its very nature.  Complexity kills.  KISS: Keep It Simple, Stupid.

This is not to say that REST has no place in software design.  As an external-facing API it can be an excellent solution, especially in reducing the monstrosity that was SOAP and WSDL.  It is here that REST demonstrates its greatest strength of providing an intuitive, simple and easy way to access resources and perform operations over the web.

But keep REST away from your internal architecture, as it will in the end impale your beautiful design on the pike of reality.

22 comments:

  1. Uh... I don't think you understand what REST is meant to do...

    It's not meant to be an internal API; it is meant to serve external requests.

  2. Why would REST ever be used internally? Why would anyone even allow that to be created???? It sounds like you've been involved in projects with bad management that suffered from a lack of technological understanding.

    I find the best/most flexible implementation for REST APIs is a two-tiered one. Tier 1 is an internal function to be used as the internal API for other code, and tier 2 is the externally callable API that simply calls the internal function and outputs the result in the appropriate format. Problem solved.

  3. I don't know about you guys who keep saying he is just doing it wrong (using it for internal arch.), but I'd venture to say that this problem is endemic right now. With technologies such as WCF (and *sigh*, cloud-computing) being drummed up as a de-facto standard for your data layer by Microsoft, this problem is only bound to get worse, and we see the same exact problems with the SOAP/WSDL flavor. Common sense goes out the window.

    From what I've seen so far, 90% of the time it is being used for internal stuff, it completely destroys performance, makes for awkward interaction with UI<->BL and effectively kills any opportunity to leverage OOP/reuse, since this architecture becomes a self-imposed barrier between your layers (especially in static languages). And oh, did I mention that it's a bloody mess to debug and that it is a bitch to scale? The upside to that is that apparently it looks /really/ good on powerpoint.

  4. I've *never* seen it used as an internal resource like this--I've *only* seen it as a way to expose internal services, and the internal services are what's used internally. (With the possible exception of some very minor cross-platform traffic it didn't make sense to do any other way.)

  5. I think REST is not really the problem -- the problem really is the distributed nature of your setup -- you will end up having the same problem whether you use REST or any other mechanism if you need to access info across a range of systems to complete a single action.

    The only thing I see is that if you didn't use REST, you might have architected your systems/software differently to account for performance -- but you likely can get the same benefit via REST -- via architecting your REST implementation carefully.

    Replies
    1. Sounds like a reasonable point, assuming you have control over all of the required components, that none needs to wrap services of external components, and that none of them enforces RESTful APIs exclusively.

      So I agree it's fair to say REST isn't the problem necessarily, but rather the lack of localised API alternatives... except for augmented services from remote servers (like maps, for instance)... then I guess you'd benefit from a richer-per-HTTP-transmit protocol other than REST. So wait... I guess REST really is the problem. ;)

    2. But seriously. Any protocol can become a problem where the cost of transmission bloats beyond necessary tolerances. Observing that there's an alternative HTTP-based protocol, perhaps allowing for more efficient (batched) transmission of compound types, isn't necessarily saying that it is the solution, but it is fair to say that it wouldn't have the same degree of risk if it were to "blow out". SOAP is more architecturally elastic, yes. But it's also far uglier, and not necessarily the right answer to the issue of sudden performance degradation due to the architectural issues described. Having said that, it does serve as a reminder to be vigilant about where REST is employed.

  6. What you're describing isn't REST-specific: it's a general performance problem that occurs when you try to replace functions that used to be computed locally with a service-oriented architecture.

    You could actually make the case that using REST will kill your project _less_ than other, more heavyweight SOA protocols, because the transactions can be made smaller and faster. That's not a given, because fanatical adherence to the principles of REST may result in you making additional transactions or sending redundant data over the wire. But at least you're spared the agony of wrapping and unwrapping SOAP messages with all the XML munging that that entails.

  7. It's an inflammatory title obviously, but REST isn't the issue here. The latency and optimisation impact is a consequence of moving to an SOA/distributed architecture. It sounds like the separation of services hasn't been well thought out. That has nothing to do with REST. A better title would be "why using SOA will kill your project".

  8. Yes, this has lots of problems.

    1. REST may be used internally; it is needed. Internal doesn't mean the solution has no need for distributed datum (or a large hypermedia distributed system), but you have to take into account that any non-coupled communication will slow things down, especially if you have transformations around.

    2. REST != Just HTTP. Here I see one of the most common errors. There is a confusion. REST is NOT A PROTOCOL, in the first place. And it is particularly NOT HTTP. That means REST is a style that has much, much more than just the Uniform Interface, which is where using Hypermedia, representations and all that stuff falls in. It also has the client/server considerations, the layered system (where all the other parts mentioned, like balancers, fall in), etc. That is, REST is a style for a full architecture, not for a component. And that is another point:

    3. REST != Component. No. It isn't. You may have a subsystem (component) that has an internal architecture using REST, with a high speed API to connect to other components, fine. But any distributed system should have its communications FAST, and services are not the answer. You can read here (http://wmp-archi.blogspot.com/2010/10/distributed-integrated-or-networked.html) the distinction.

    4. REST = Dist. Datum. As I mention, REST is for large hypermedia distributed systems. It is not for high-processing or even for high-control systems. It is Datum oriented, and that means it should be used when datum transfer is a priority between uncontrolled nodes (that is, many other parties can add nodes and remove them). REST is for networked systems, not for distributed or integrated ones. If you want distribution (one whole system split into little parts), use coupling. If you want integration, use services (not necessarily REST, as REST is not made for services), and if you want a networked system based on Datum, use REST.

    We need an architectural oriented guide for REST. It is really needed.

  9. "We need an architectural oriented guide for REST. It is really needed."

    Excellent idea. If you have any articles you're working on or aware of along this path, please let me know.

  10. There are quite a few good talks on infoq.com on REST. The talks and articles by Stefan Tilkov and Jim Webber help to understand certain misconceptions about REST including a lot of the points William Martinez Pomares mentioned.

  11. There is a chapter in the book, "Beautiful Architecture" (http://goo.gl/YkLlO) that talks about REST as an application architecture method. It's Chapter 5.

  12. You've described perfectly the problems of remote procedure calls, however you are laying all the problems under the REST heading, and only mentioning in passing at the end that the real problem is calling remote procedures in the same manner as local ones.

    Incidentally, this is a problem that is explored in quite some detail in Martin Fowler's 'Patterns of Enterprise Application Architecture', a book that is 8 years old and something you really should read if you intend to do any form of enterprise application programming ;)

  13. REST is neither good nor bad 100% of the time; however, I can guarantee that picking an inappropriate technology will be bad all of the time. This is where enterprise / technical architects can earn their salt: you must have an understanding of the hardware and software in the architecture to make complex solutions work.

    I tend to argue, however, that HTTP is fairly lightweight and can be used where some latency can be tolerated. Of course it's not as lightweight as raw TCP or sockets, but REST is not HTTP, and you can avoid some of these overheads with a little extra work.

    You should also set up your architecture carefully. Rewriting requests usually involves lots of string parsing and regex, which is itself hugely resource intensive; only rewrite what you need, and exclude the REST calls from that parsing.

    As suggested by others on here, crossing any boundary is costly in terms of performance. A boundary could mean a web / REST service, server, processor, or processor core, and before that you even cared about swapping stuff in and out of registers.

    I'm not saying we need to think at the level of processor cores all the time; you just need to understand the problem domain and pick the appropriate solution.

  14. I tend to find that REST works best as an interface to a complex sub-system/resource, which is general purpose and thus ought not to be bound to your main application. It is somewhat easy to get carried away when you adopt new ideas. This is why I take issue with the ideas of Agile. Agile developers end up picking the immediately easiest route, but fail in the long run.
