Apr 13, 2015 - Current State of Microservices

Microservices are definitely a paradigm shift in software-architectural thinking. The shift didn't happen overnight; it stands on the shoulders of all the knowledge and practical experience gained over the last few decades around software architecture, design and operations.

Although compartmentalization is as old and time-tested as it gets (and obviously not just in software), its implementation in the form of Microservices is completely greenfield. Luckily, we're at a time when the idea can be realized in production without requiring a significant change in the way software is developed. Docker is a great example of that, especially in the sense that it made compartmentalization accessible to the public. Maybe that's what was missing in previous attempts. There's no cookbook on how to deploy Microservices at scale, or in production, or 'for real'. People like us, all over the world, are trying to figure this out. All this is spawning immense growth, which we're all a part of.

Let me tell you, there will be some nice scars to show in a few years, and some ideas and technologies won't make it. This is natural, and it's exactly what happened with Computer Networking. Who remembers Token Ring? It's the fact that many ideas exist, and that some will fail, that makes the whole field more robust, or antifragile (thanks, Nassim Nicholas Taleb).

It's absolutely logical to re-examine everything we know at a time like this, and to try to push the boundaries of the new concepts and technologies to their limit. For example, how should Microservices discover each other? How should they communicate?

Personally - and I may be wrong - when I think about production and scale, I think about stability. And when it comes to stability, old and time-tested things tend to work well. These ideas have survived for a reason. Networking technology is old and time-tested. DNS, proxy technologies, IP, DHCP, TCP, etc. are very mature, well documented and well defined. I'm not ready to throw them out just yet. Using existing, time-tested networking technology (maybe with a new twist, like Weave) to solve new (or old) problems seems to me like a very promising path forward.
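
To make the DNS idea concrete, here's a minimal sketch in Python of DNS-based service discovery: the client resolves a well-known service name at connection time and picks one of the returned addresses. The service name and port below are made-up placeholders, not something any particular tool prescribes.

```python
import socket

def discover(service_name, port):
    """Resolve a service name via plain old DNS and return candidate addresses.

    A round-robin A record (or a DNS-speaking registry answering the query)
    may return several entries; the client simply picks one and connects.
    """
    results = socket.getaddrinfo(service_name, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); keep the address.
    return [sockaddr[0] for _, _, _, _, sockaddr in results]

# 'orders.service.internal' is a hypothetical name - use whatever your resolver serves.
print(discover("orders.service.internal", 8080))
```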

Time will tell.


Jan 12, 2015 - Servers are nothing. Services are everything

What's In a Name

Juliet: What's in a name? That which we call a rose
By any other name would smell as sweet.

Some of us remember the days when it was common for companies to have their own server farms. Servers were huddled together in temples called Server Rooms, where nothing was left to chance - everything was controlled: temperature, humidity, air filtering, power supply, fire suppression... Only the priests (called System Administrators) would normally be allowed to enter. Touching the holy of holies - the actual computer - required a special cleansing ritual to get rid of bad spirits (in the form of static electricity, cleansed by grounding).

Since the beginning of modern computing, computers - which were then awe-inspiring heaps of vacuum tubes and wires - were given most appropriate, awesome names: Colossus, ENIAC, Whirlwind, UNIVAC, WITCH, Pegasus, Golem, Electron... Each computer was one of a kind, like a mythological god. This was probably out of respect (or boredom).

In time, as computers became more accessible and popular, and as computer networking became more or less a standard, uniquely naming servers became a must. After all, if two computers want to 'talk' with each other, they can't both be named ENIAC. Some cold-hearted bastards would still give their servers strictly functional names - frontend-1-us-east-8, file-server, etc.; the rest of us, those with some sense of humor, would give our servers names like Elvenpath, Yoda, Constitution, Dogbert, and even Lucy.

(Image: Lucy the computer)

In time, (useful) software started to appear, and the rest is history. But as everyone soon figured out, running important applications that depended on a single computer was problematic, since computers sometimes fail. Hardware vendors were expected to reduce hardware faults in any way possible, including crazy things like oven-testing their hardware. But that wasn't enough. It became clear that preventing individual failure is not the right approach. The next line of thought was: can we build something that would tolerate failure?

Sh*t happens. If you're lucky - it happens to you at least once a day.

Light-bulb moment (or maybe panic?). Redundancy!! Two power supplies! Two UPS systems! Two of everything!!

Back in the 1960s, the concept of the Computer Cluster emerged (Greg Pfister literally wrote the book on it, In Search of Clusters). Originally it was aimed at cases where a single computer could not handle the load; later, throughout the 70s and 80s, it was adapted for high availability, with Digital's (may it rest in peace) VAX clusters, Tandem's Himalaya (a circa-1994 high-availability product) and the IBM S/390 Parallel Sysplex.

It's 1988, all of a sudden. Time flies when you're reading someone else's blog. Three guys (David Patterson, Garth A. Gibson, and Randy Katz) presented a paper at the SIGMOD conference, titled "A Case for Redundant Arrays of Inexpensive Disks (RAID)": let's take a lot of cheap hard disks and turn them into one virtual disk that can tolerate N failures. And RAID was born. Light and joy filled every living soul. Except for hardware vendors ("did that guy just say inexpensive?!"), who quickly rebounded - with 'RAID Edition' disks.

10 years later... you guessed right, it's 1998. A group of very smart people (Diane Greene, Mendel Rosenblum, Scott Devine, Edward Wang and Edouard Bugnion) realized that this is a never-ending loop. Hardware is hardware, and like anything physical, it's eventually going to break - and fixing a broken server can take time. They started VMware.

Lucy is still a piece of (hopefully) cold silicon, surrounded by wires, plastic, gold, other metals, a pinch of quartz... So Lucy can still fail. But with VMware, if the vessel holding Lucy's spirit breaks, Lucy's spirit can move to another vessel! A stronger, better one. Good for you, Lucy!

Let it fail!

So by now it's clear that anything can fail. Software fails, hardware fails, heck - even Grand Temples of Cloud fail, and more frequently than they would like you to know.

Around the early 2000s, software started shifting from monolithic, single-point-of-failure applications to microservice-oriented ones, a la the Let it Fail philosophy: build it so that when (not if) something fails, something else takes its place, without disruption to the Service.

Lean Manufacturing, spearheaded by Toyota with TPS, and later adopted by the Agile software movement and the Manifesto for Agile Software Development, taught us to embrace change. The Let it Fail philosophy is teaching us to embrace failure.

Luckily, we're at a point in time where the stars are aligning just right. Enough real-life experience has been gained by 'doers' (in the words of Nassim Nicholas Taleb), and a lot of concepts and technologies have started to surface and mature, so that with some luck - if put together exactly right - they could be the spark that starts a new revolution. A new kind of thinking, in Albert Einstein's words.

So what are these technologies and knowledge?

Micro-Services

Collective experience has taught us that large, monolithic, stateful services are hard to manage. So a (new? sounds familiar) software architecture called Microservices is taking hold. As Martin Fowler and James Lewis put it, "The microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API".
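
As a rough illustration of that definition (my sketch, not Fowler and Lewis's code), here is about the smallest possible service exposing a lightweight HTTP resource API, using only Python's standard library; the /greeting resource and port 8080 are arbitrary choices:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class GreetingHandler(BaseHTTPRequestHandler):
    """A tiny, single-purpose service: one process, one lightweight HTTP resource."""

    def do_GET(self):
        if self.path == "/greeting":
            body = json.dumps({"message": "hello from a very small service"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

if __name__ == "__main__":
    # Port 8080 is an arbitrary choice for this sketch.
    HTTPServer(("", 8080), GreetingHandler).serve_forever()
```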

Services, and true service oriented architectures, go extremely well with another technology that has recently started to mature - containerization. Docker is the poster child of this technology, and has gained tremendous traction and attention - and rightfully so.

Docker

In one sentence: it's a way to containerize processes that is much more lightweight than Virtual Machines (the smaller-is-faster design principle, big time), but at the cost of less isolation. IT'S NOT A VM, especially not from a security standpoint, but it's a very good fit for Microservices and statelessness. Docker is still young, but there are literally tens of thousands of very smart people around the world who spend significant amounts of time with this technology and come up with clever things to do with it on a daily basis.

I previously said that it's much easier to manage small, well-defined, stateless services, which brings me to a great example of how real-world, collective experience has resulted in an exceptionally clear approach to 'how things should be done'. This again reminds me of Nassim Nicholas Taleb's take on 'doers' vs. talkers in his book Antifragile: Things That Gain from Disorder (Incerto). The approach below just screams the real-world, collective experience (including a lot of failures) of many people:

12 Factor Apps

Heroku, a PaaS provider, has released a blueprint for idealized, modern-day, cloud-enabled applications, called the 12-Factor App:

The twelve-factor app is a methodology for building software-as-a-service apps that:

  • Use declarative formats for setup automation, to minimize time and cost for new developers joining the project;
  • Have a clean contract with the underlying operating system, offering maximum portability between execution environments;
  • Are suitable for deployment on modern cloud platforms, obviating the need for servers and systems administration;
  • Minimize divergence between development and production, enabling continuous deployment for maximum agility;
  • And can scale up without significant changes to tooling, architecture, or development practices.

In the 12-factor list of principles, factor VI (Processes) states that you should "Execute the app as one or more stateless processes".
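
To make that concrete, here's a minimal sketch of factors III (config) and VI (processes) in practice; the environment variable names are hypothetical, not mandated by the methodology:

```python
import os

# Factor III: config lives in the environment, not in the code or in local files.
# DATABASE_URL and PORT are made-up names for this sketch.
DATABASE_URL = os.environ["DATABASE_URL"]
LISTEN_PORT = int(os.environ.get("PORT", "8080"))

def handle_request(payload):
    # Factor VI: the process keeps nothing between requests; anything worth
    # keeping goes to the backing service behind DATABASE_URL.
    return {"echo": payload, "served_on_port": LISTEN_PORT}
```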

State is the root of all evil. That's what my Computer Science professor taught me when I was struggling with Lisp. Unfortunately, it didn't quite sink in at the time... if(x) --> lambda (x,x) ?! wh..what?!

15 years later, I started playing with Docker and clustering, and I realized that life would be a whole lot better if an application (or service) had no state, so that I could stop it from running in one place (Lucy?) and start it in another (Yoda?). Sadly, although this is true in some cases, most applications are not 12-factor by a long shot. State is here, and it's never going away. But luckily, modern, specialized stateful services have been built with the possibility (and in the more modern ones, the high probability) of failure in mind: Cassandra, MongoDB, Elasticsearch, Redis, clustered file systems (Ceph), and so on.

That allows software architects to design systems where data (state) is held within services that can handle failure (some more easily than others), letting many of the other services be state-free, at least to some degree.
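
For example, even something as small as a visit counter is state; parking it in Redis instead of inside the process means the process can be killed and restarted anywhere without losing anything. A rough sketch using the redis-py client (the host variable and key name are my own illustrative choices):

```python
import os
import redis

# Where Redis lives comes from the environment, so this process stays portable.
store = redis.Redis(host=os.environ.get("REDIS_HOST", "localhost"), port=6379)

def record_visit(user_id):
    """The only state lives in Redis; kill this process and nothing is lost."""
    return store.incr("visits:" + str(user_id))
```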

BTW: a system that relies on the existence and availability of a server called Lucy actually has state - the name Lucy is enough state to make your whole system go down when Lucy takes a nap. It's implicit state, and that's probably the worst kind.

Why are you boring us with these obvious points?

More and more companies are moving to the cloud. Moving what? Servers. They're moving servers. Instead of keeping a small temple at work, let's tear down our own private temple and make a monthly sacrifice to the Grand Temple of Amazon (now estimated at about 45% of the market). They have a whole clan of Amazon priests that know how to take care of servers. And we can still give them names!! Lucy is now in the Sky. Poetic!!!

So moving to the cloud changed very little for most cloud customers: they still rely on servers, only now those servers are usually virtual, and running on someone else's infrastructure.

One day - and that day is coming soon - individual computers will no longer be important. They will be created, maintained, assigned work, and terminated by software composed of Microservices, running on clusters that span multiple 'clouds', and optimized in real time. When that day comes, productivity will increase, availability will increase, and cost will go down. The priests of Amazon will still have an extremely important role, but the monthly tribute will be much smaller, and spread across a number of Cloud Temples. Using multiple clouds simultaneously will be the norm, and applications and services will move from cloud to cloud as if there were no barrier, according to "constraints and policies as specified by the service description". That day will come, sooner than you think. That's what Multicloud is all about.

Servers are nothing. Services are everything

To address everything that practical Computer Science has taught us, a modern, 2015 architecture would try to break the application into small, ideally stateless microservices, and place data (state) in specialized services that can handle failure effectively and efficiently. Failure is not a question of 'if'; it's a question of 'when'. Something needs to govern these clusters of servers and services.

P.S. When you turn on your light switch, do you care, or know, where the electrons that power your lightbulb come from?

Servers are the electrons that make your software light bulb shine.

Interesting reads:

  • A paper on Software Reliability by CMU
  • Reliability Engineering from Wikipedia

Nov 30, 2014 - Necessity is the mother of invention

My previous startup, WhatsOn, which created personalized, curated social feeds for media companies, required significant computational resources. Being a bootstrapping entrepreneur, short on cash, my goal was to "make do" with whatever I could get for free, or at the lowest possible cost. My background in software architecture and my 'can do' attitude put me on a path towards an architecture that could do just that.

But as I worked on my solution, I realized that this was a private case of a much broader issue. When companies migrate to the cloud, they expect their cloud spending to be efficient and effective. The reality of the matter is that moving to the cloud didn't make IT governance and capital efficiency better - quite the opposite: over-spending is the norm rather than the exception, and the degree of over-spending is significant.

When that penny dropped, I wanted to kick myself. I should have realized this sooner. Back in 2004-2007 I was the Chief Architect of Mercury's APM product line (now HP), called Business Availability Center (BAC), AKA Topaz. Mercury practically invented the term 'Business Technology Optimization' (BTO). I met numerous customers - Bank of America, Wells Fargo, Lockheed Martin, you name it - and I saw with my own eyes how BTO delivered value to them. A lot has changed since then, but it seems the problems have only gotten bigger: moving to the cloud created a whole set of new optimization challenges. By 2017, enterprise spending on cloud computing is projected to reach $235.1B (IHS). How much of that could be saved, and put to better use?

So the following question popped into my mind: the techniques I'd used to save a few hundred dollars per month - could they be generalized and applied to organizations that spend hundreds of thousands, and up to tens of millions, of dollars per month on 'Cloud' (private, public, hybrid, on-premises)?

I decided to take a methodical approach to this question. I started off by meeting with cloud customers (focusing on companies that spend ~$100k per month on Cloud) and interviewing them.

I quickly followed up with thorough research, mostly hands-on, in order to learn all the tools and disciplines out there that represent the state of the art.

My next step was to define a functional spec that would address a subset of the market needs, and an architecture that could support it, along with all the features on the path I had chosen.

To make a long story short: the answer is yes. There's a better way to do Cloud, but it requires a different kind of thinking than the one that brought us to what we all know today as The Cloud.

Join me in this journey towards a better cloud - or cloud technology optimization.