What's In a Name

Juliet: What's in a name? That which we call a rose
By any other name would smell as sweet.

Some of us remember the days when it was common for companies to have their own server farms. Servers were huddled together in temples called Server Rooms, where nothing was left to chance: temperature, humidity, air filtering, power supply, fire suppression... everything was controlled. Only the priests (called System Administrators) would normally be allowed to enter. Touching the holy of holies, the actual computer, required a special Cleansing ritual to get rid of bad spirits (in the form of static electricity, dispelled by Grounding).

Since the beginning of modern computing, computers, which were then awe-inspiring heaps of vacuum tubes and wires, were given most appropriate, awesome names: Colossus, ENIAC, Whirlwind, UNIVAC, WITCH, Pegasus, Golem, Electron... Each computer was one of a kind, like a mythological god. This was probably out of respect (or boredom).

In time, as computers became more accessible and popular, and as computer networking became more or less standard, uniquely naming servers became a must. After all, if two computers want to 'talk' to each other, they can't both be named ENIAC. Some cold-hearted bastards would still give their servers strictly functional names (frontend-1-us-east-8, file-server, etc.), but the rest of us, those with some sense of humor, would give our servers names like Elvenpath, Yoda, Constitution, Dogbert, and even Lucy.

[Image: Lucy the computer]

In time, (useful) software started to appear, and the rest is history. But as everyone soon figured out, running important applications that depended on a single computer was problematic, since computers sometimes fail. Hardware vendors were expected to reduce hardware faults in any way possible, including crazy things like oven-testing their hardware. But that wasn't enough. It became clear that preventing individual failures is not the right approach. The next line of thought was: can we build something that tolerates failure?

Sh*t happens. If you're lucky - it happens to you at least once a day.

Light-bulb moment (or maybe panic?). Redundancy!! Two power supplies! Two UPS systems! Two of everything!!

Back in the 1960s, Greg Pfister introduced the concept of the Computer Cluster. It was originally aimed at cases where a single computer could not handle the load, but later, throughout the '70s and '80s, it was adapted for high availability, with Digital's (may it rest in peace) VAXcluster, Tandem's Himalaya (a circa-1994 high-availability product), and the IBM S/390 Parallel Sysplex.

It's 1988, all of a sudden. Time flies when you're reading someone else's blog. Three guys (David Patterson, Garth A. Gibson, and Randy Katz) presented a paper at the SIGMOD conference titled "A Case for Redundant Arrays of Inexpensive Disks (RAID)": let's take a lot of cheap hard disks and turn them into one virtual disk that can tolerate disk failures. And RAID was born. Light and joy filled every living soul. Except for hardware vendors ("did that guy just say inexpensive?!"), who quickly rebounded with 'RAID Edition' disks.
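
To make the "tolerate failures" part concrete, here's a toy sketch of XOR parity, the idea behind single-disk fault tolerance in RAID 4/5-style schemes. It's a deliberately simplified illustration, not the full scheme from the paper: no striping, no controllers, and it only survives the loss of one "disk".

```python
# Toy sketch of XOR parity (the idea behind RAID 4/5-style fault tolerance).
# Not real RAID: no striping, no controllers, no performance tricks.

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            result[i] ^= b
    return bytes(result)

# Three "data disks" holding one block each, plus a parity "disk".
data_disks = [b"ENIAC...", b"Lucy....", b"Yoda...."]
parity_disk = xor_blocks(data_disks)

# Disk 1 ("Lucy") fails. Rebuild its block from the survivors plus parity.
surviving = [data_disks[0], data_disks[2], parity_disk]
rebuilt = xor_blocks(surviving)

assert rebuilt == data_disks[1]  # Lucy's data is back
print(rebuilt)  # b'Lucy....'
```

Lose one disk and XOR-ing everything that's left gives you back exactly what was on it; that's the whole trick, and the paper builds several array organizations on top of it.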

10 years later... you guessed right, it's 1998. A group of very smart people (Diane Greene, Mendel Rosenblum, Scott Devine, Edward Wang and Edouard Bugnion) realized that this is a never-ending loop. Hardware is hardware, and like anything physical, it's eventually going to break, and fixing a broken server can take time. They started VMware.

Lucy is still a piece of (hopefully) cold silicon, surrounded by wires, plastic, gold, other metals, a pinch of quartz... So Lucy can still fail. But with VMware, if the vessel holding Lucy's spirit breaks, Lucy's spirit can move to another vessel! A stronger, better one. Good for you, Lucy!

Let it fail!

So by now it's clear that anything can fail. Software fails, hardware fails, heck - even Grand Temples of Cloud fail, and more frequently than they would like you to know.

Around the early 2000s, software started shifting from monolithic, single-point-of-failure applications toward service-oriented (and eventually microservice-oriented) designs, à la the Let It Fail philosophy: build it so that when (not if) something fails, something else takes its place, without disruption to the service.

Lean Manufacturing, spearheaded by Toyota with the Toyota Production System (TPS), and later adopted by the Agile software movement and the Manifesto for Agile Software Development, taught us to embrace change. The Let It Fail philosophy is teaching us to embrace failure.

Luckily, we're at a point in time where the stars are aligning just right. Enough real-life experience has been gained by 'doers' (in the words of Nassim Nicholas Taleb), and a lot of concepts and technologies have surfaced and matured enough that, with some luck, and if put together exactly right, they could be the spark that starts a new revolution. A new kind of thinking, to use Albert Einstein's words.

So what are these technologies and ideas?

Micro-Services

Collective experience has taught us that large, monolithic, stateful services are hard to manage. So a (new? sounds familiar) software architecture called Microservices is taking hold. As Martin Fowler and James Lewis put it, "the microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API".
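
Just to make the term concrete, here's a minimal sketch of what "a small service, running in its own process, speaking an HTTP resource API" can look like, using only Python's standard library. The greeter service, its /greet endpoint, and port 8080 are made up for illustration; they're not from Fowler and Lewis's article.

```python
# A minimal "microservice" sketch: one small, stateless process exposing an
# HTTP resource API. The /greet endpoint and port 8080 are illustrative.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class GreetingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path.startswith("/greet"):
            body = json.dumps({"service": "greeter", "message": "hello"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

if __name__ == "__main__":
    # Each microservice runs in its own process; others talk to it over HTTP.
    HTTPServer(("0.0.0.0", 8080), GreetingHandler).serve_forever()
```

Another service would call this one over HTTP rather than importing its code, which is exactly what keeps the pieces small, independently deployable, and independently replaceable when one of them fails.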

Services, and true service-oriented architectures, go extremely well with another technology that has recently started to mature: containerization. Docker is the poster child of this technology and has gained tremendous traction and attention, and rightfully so.

Docker

In one sentence, it's a way to containerize processes that is much more lightweight than virtual machines (the smaller-is-faster design principle, big time), at the cost of less isolation. IT'S NOT A VM, especially not from a security standpoint, but it's a very good fit for microservices and statelessness. Docker is still young, but there are literally tens of thousands of very smart people around the world who spend significant amounts of time with this technology and come up with clever things to do with it on a daily basis.

I previously said that it's much easier to manage small, well-defined, stateless services, which brings me to a great example of how real-world, collective experience has resulted in an exceptionally clear approach to 'how things should be done'. This again reminds me of Nassim Nicholas Taleb's take on 'doers' vs. talkers in his book Antifragile: Things That Gain from Disorder (Incerto). The approach below just screams the real-world, collective experience (including a lot of failures) of many people:

12 Factor Apps

Heroku, a PaaS provider, released a blueprint for idealized, modern, cloud-enabled applications, called the 12-Factor App:

The twelve-factor app is a methodology for building software-as-a-service apps that:

  • Use declarative formats for setup automation, to minimize time and cost for new developers joining the project;
  • Have a clean contract with the underlying operating system, offering maximum portability between execution environments;
  • Are suitable for deployment on modern cloud platforms, obviating the need for servers and systems administration;
  • Minimize divergence between development and production, enabling continuous deployment for maximum agility;
  • And can scale up without significant changes to tooling, architecture, or development practices.

In the 12-factor list of principles, principle VI, Processes, states that you should "Execute the app as one or more stateless processes".
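
To make the principle concrete, here's a tiny sketch of exactly what it warns against: state that lives only inside one process. The sessions dict and the login/whoami functions are made up for illustration.

```python
# What principle VI warns against: state that lives only inside one process.
# The sessions dict and these functions are made up for illustration.
sessions = {}  # in-memory only; gone when this process dies or moves hosts

def login(user: str) -> str:
    token = f"token-for-{user}"
    sessions[token] = user  # sticky: only THIS process knows about the token
    return token

def whoami(token: str) -> str:
    # Works only if the request lands on the same process that issued the
    # token; restart the process (or Lucy) and everyone is logged out.
    return sessions.get(token, "unknown")
```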

State is the root of all evil. That's what my Computer Science professor taught me when I was struggling with Lisp. Unfortunately, that didn't quite sink in at the time... if(x) --> lambda (x,x)?! Wh..what?!

15 years later, I started playing with Docker and clustering, and I realized that life would be a whole lot better if an application (or service) had no state, so that I could stop it from running in one place (Lucy?) and start it in another (Yoda?). Sadly, although this is true in some cases, most applications are not 12-factor by a long shot. State is here, and it's never going away. But luckily, modern, specialized stateful services have been built with the possibility (and in the more modern ones, the high probability) of failure in mind: Cassandra, MongoDB, Elasticsearch, Redis, clustered file systems (Ceph), and so on.
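
Here's a minimal sketch of the cure for the sticky-session problem above: the process stays stateless, and the state it needs lives in a service built to hold it. This assumes a Redis instance on localhost:6379 and the redis-py client; the visit counter and key names are made up for illustration.

```python
# Sketch: keep the service process stateless by pushing its state into Redis,
# a specialized stateful service. Assumes Redis on localhost:6379 and the
# redis-py client (pip install redis); the counter and keys are illustrative.
import redis

r = redis.Redis(host="localhost", port=6379)

def record_visit(user_id: str) -> int:
    """Count a visit. The counter lives in Redis, not in this process, so the
    process can be killed on Lucy and restarted on Yoda without losing it."""
    return r.incr(f"visits:{user_id}")

if __name__ == "__main__":
    print(record_visit("juliet"))  # 1, 2, 3... no matter which host runs this
```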

That allows software architects to design systems where data (state) is held within services that can handle failure (some more easily than others), which allows many of the other services to now be state-free, at least to some degree.

BTW: A system that relies on the existence and availability of a server called Lucy actually has state: the name Lucy is enough state to take your whole system down when Lucy takes a nap. It's implicit state, and that's probably the worst kind.
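
A tiny, hypothetical illustration of the difference: baking the name into the code versus treating the endpoint as configuration (in the spirit of 12-factor principle III, Config). The BACKEND_URL variable name is invented for this example.

```python
# Implicit state vs. configuration: don't bake "lucy" into the code.
# BACKEND_URL is a made-up environment variable, used here for illustration.
import os

# Implicit state: the whole system quietly depends on a box called Lucy.
BAD_BACKEND = "http://lucy:8080/greet"

# Better: the endpoint is configuration, injected at deploy time, so the
# caller follows wherever "the greeter" happens to be running today.
BACKEND = os.environ.get("BACKEND_URL", "http://localhost:8080/greet")

print(f"calling {BACKEND}")
```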

Why are you boring us with these obvious points?

More and more companies are moving to the cloud. Moving what? Servers. They're moving servers. Instead of having a small temple at work, we tear down our own private temple and instead make a monthly sacrifice to the Grand Temple of Amazon (now estimated at about 45% of the market). They have a whole clan of Amazon priests who know how to take care of servers. And we can still give them names!! Lucy is now in the Sky. Poetic!!!

So moving to the cloud changed very little for most cloud customers: they still rely on servers, only now these servers are usually virtual, and running on someone else's infrastructure.

One day, and that day is coming soon, individual computers will no longer be important. They will be created, maintained, assigned work, and terminated by software, composed of microservices, running on clusters that span multiple 'clouds', and optimized in real time. When that day comes, productivity will go up, availability will go up, and cost will go down. The priests of Amazon will still have an extremely important role, but the monthly tribute will be much smaller, and spread between a number of Cloud Temples. Using multiple clouds simultaneously will be the norm; applications and services will move from cloud to cloud as if there were no barrier, according to "constraints and policies as specified by the service description". That day will come, sooner than you think. That's what Multicloud is all about.

Servers are nothing. Services are everything

To address everything that practical Computer Science has taught us, a modern (2015) architecture would try to break the application into small, ideally stateless microservices, and place data/state in specialized services that can handle failure effectively and efficiently. Failure is not a question of 'if'; it's a question of 'when'. Something needs to govern these clusters of servers and services.

P.S. When you flip your light switch, do you know or care where the electrons that power your light bulb come from?

Servers are the electrons that make your software light bulb shine.

Interesting reads:

  • A paper on Software Reliability by CMU
  • Reliability Engineering from Wikipedia