The weather in the Eastern US is a personal tragedy for some people, and is making daily life troublesome for many others.

For much of the Silicon Valley and startup crowd, it has brought down some major web services, such as Netflix, Pinterest, Instagram, this website and a number of BootstrapLabs servers and services.

We may all be upset about the downtime, and think that Amazon has not built its supposedly [almost] fail-safe infrastructure properly. My comments are:

  • Yes, you can in theory avoid these disasters: a startup could build its infrastructure using Amazon and other cloud/hosting solutions to actually be tolerant of failures like this (though the last few bits of ensuring tolerance to that level are quite expensive).
  • However, I have run multi-site web-based services (multiple tier 3 sites, hosted off-site, with a clustered and replicated MySQL database) that went down much harder in similar conditions. Cost of running these: 30 x what it would cost on Amazon.
  • Some of our MySQL (Amazon RDS) instances were affected by the outage, and failed completely. That took down several of our services and web sites for BootstrapLabs. It took us less than 30 minutes to get back online, with no lost data (it took longer for us to start addressing the problem, since we have no staff on call for these things over weekends at this time). This was mostly due to the automated snapshot and backup functionality that Amazon provides. To get even close to the availability we have with Amazon today using our own servers (instead of their cloud infrastructure), we would need co-located servers in multiple datacenters, redundant power, off-site backups, and a replicated SAN environment between the datacenters (which requires some decent bandwidth). It would again cost us about 30 x what we pay today, and we would need more manual labour on top of that. With Amazon we get higher availability for a footprint that fits our budget.
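
To make the snapshot-based recovery above a bit more concrete, here is a minimal Python sketch of the first decision in such a restore: picking the newest usable snapshot taken before the outage started. This is not Amazon's API and not our actual tooling. The function name `latest_snapshot_before`, the record fields (`id`, `created`, `status`) and all the sample values are made up for illustration.

```python
from datetime import datetime, timezone


def latest_snapshot_before(snapshots, outage_start):
    """Return the newest completed snapshot taken before the outage, or None.

    `snapshots` is a list of dicts with hypothetical keys:
    "id", "created" (timezone-aware datetime) and "status".
    """
    candidates = [
        s for s in snapshots
        if s["status"] == "available" and s["created"] < outage_start
    ]
    # max() with default=None covers the case where nothing qualifies.
    return max(candidates, key=lambda s: s["created"], default=None)


# Made-up snapshot metadata, purely for illustration.
snapshots = [
    {"id": "snap-a", "created": datetime(2000, 1, 1, 3, 0, tzinfo=timezone.utc),
     "status": "available"},
    {"id": "snap-b", "created": datetime(2000, 1, 2, 3, 0, tzinfo=timezone.utc),
     "status": "available"},
    # Still being written when the outage hit, so it must be skipped.
    {"id": "snap-c", "created": datetime(2000, 1, 3, 2, 55, tzinfo=timezone.utc),
     "status": "creating"},
]
outage_start = datetime(2000, 1, 3, 3, 0, tzinfo=timezone.utc)

chosen = latest_snapshot_before(snapshots, outage_start)
print(chosen["id"] if chosen else "no usable snapshot")
```

In a real setup the list would come from the provider's snapshot listing and the chosen snapshot would be handed to its restore call; the point is only that with automated snapshots in place, recovery is mostly a matter of choosing the right one.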

So in essence I would not complain about Amazon’s issues here. It is in fact possible to design web-based services on Amazon that fail over gracefully for [most] events like this, so if that is what you need, make sure you build your platform accordingly. Amazon is most likely a great fit while you have not yet pinned down your exact scale and needs; once you grow to a fairly large and stable load on your servers, there might be more efficient providers and solutions.

So, Amazon, thanks for making a very powerful and efficient product available.

Just my 5 cents.