As a startup, you always have to decide what you want to spend your time on. For us, it's essential that we spend as much time as possible on improving our product and talking to our customers - in short, creating value for them. That's why we initially outsourced our entire hosting setup to Heroku, even though we have plenty of in-house hosting/ops knowledge. Customers don't care much where we host our solution, as long as it's safe and has high uptime.
Initially it was very easy to get started on Heroku. The toolbelt was great and the add-ons were many. We used Heroku PostgreSQL as our database and Open Redis for caching and pub/sub, and that served our first couple of customers very well. As we added more and more customers, we started to receive complaints about errors. They were seeing the Heroku application error screen - but strangely, we didn't get any errors in Opbeat, and when we tried to access the website it looked fine. The latest log lines did not show any signs of issues either, until one day I experienced it myself. I immediately checked the logs and saw that we had run out of memory on one of our Dynos. I upgraded to 2X Dynos, which solved the most immediate problem, but I was left with a bad taste in my mouth. We were paying for a completely managed hosting solution and no one cared to tell me that we were out of RAM? Furthermore, because of the way Heroku is designed, you have very little access to the running environments, and it would have been close to impossible to get any useful information, such as what specifically was taking up all the RAM.
Over the next couple of months, we experienced frequent problems with deploying as well as general downtime, and we decided we didn't want to keep overpaying for a service we felt we weren't really getting. We looked into the alternatives: VPSs, co-location and cloud solutions. We wanted something with the same flexibility in scaling that Heroku provided and something that could offer an easy deployment process. With Heroku we used CircleCI, which had support for automatically pushing the code to Heroku once it passed the unit tests.
In the end we decided to use AWS; specifically, we wanted to use their Auto Scaling Groups (ASG). An ASG effectively works like Heroku, except you have to do a little more work. While Heroku builds an image of your application on git push, with ASG you need to make an Amazon Machine Image (AMI) yourself. However, once you have the image you have the same flexibility as on Heroku - just tell Amazon how many instances you want and they will take care of the rest, including killing crashed instances and spawning new ones. As a bonus, you can let it scale automatically based on metrics such as CPU usage; however, we do not use this feature at the moment.
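To make "just tell Amazon how many instances you want" a bit more concrete, here is a minimal Python sketch. It is an illustration, not the setup described in this post: it assumes a client object shaped like boto3's Auto Scaling client (whose `set_desired_capacity` call takes `AutoScalingGroupName`, `DesiredCapacity` and `HonorCooldown`), and `scale_group` and `RecordingClient` are hypothetical names used only for this example.

```python
class RecordingClient:
    """Hypothetical stand-in for an Auto Scaling client, for dry runs."""

    def __init__(self):
        self.calls = []

    def set_desired_capacity(self, **kwargs):
        # Record the request instead of sending it to AWS.
        self.calls.append(kwargs)


def scale_group(client, group_name, desired):
    """Ask the Auto Scaling Group `group_name` to run `desired` instances.

    `client` is assumed to expose set_desired_capacity() the way
    boto3.client("autoscaling") does; any object with that method works.
    """
    if desired < 0:
        raise ValueError("desired must be non-negative")
    client.set_desired_capacity(
        AutoScalingGroupName=group_name,
        DesiredCapacity=desired,
        HonorCooldown=False,  # apply the change without waiting for a cooldown
    )
    return desired


# Dry run against the recorder; "web-production" is a made-up group name.
client = RecordingClient()
scale_group(client, "web-production", 4)
```

With real credentials you would pass `boto3.client("autoscaling")` instead of the recorder. The requested number has to lie between the group's configured minimum and maximum size, otherwise AWS rejects the request.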
As I began to test ASG late on a Sunday night, I realised that the AWS UI wasn't very intuitive and that making a new deployment was a long process. I knew it was very unlikely that I would get my good friend and business partner Ales to use this tool. I had just heard of Netflix's open source stack, and since I knew they were running on AWS as well, I decided to check it out. They had built a simple web-based management tool for AWS named Asgard, which lets you set up and manage ASGs with just a few clicks. I was immediately intrigued by the name, which refers to Norse mythology, and decided to try it out despite my bad preconception of Java, which I knew from Java applets and the horrible "Introduction to Programming" classes at university. It worked really well and I immediately knew it was the right tool for the job.
Now I had a great user interface for managing our services and deploying them once we had an AMI; however, I did not have an easy way to build an AMI automatically from CircleCI. I had a look around and found Packer, which seemed fit to solve the problem. It worked great for building the "base image", which would have all the requirements. However, for the application image that I would create after each build, I felt like I was writing the same bootstrap code over and over again. So I made a small Python script called dynpacker, which lets you create multiple AMIs (e.g. web, worker, scheduler, etc.) using Packer by taking a base AMI, a zip of your code and the processes that need to run, rather than a bunch of configuration files. This worked great with CircleCI, since I could now just run dynpacker after each successful build and Packer would take care of the rest. Dynpacker effectively replaced the Procfile from Heroku.
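The idea of generating one Packer template per process type can be sketched roughly as follows. This is not dynpacker's actual code or interface, only an illustration under assumptions: the function names, the `/srv/app` layout, the `app-<role>-{{timestamp}}` naming, the `ubuntu` SSH user, the region and the instance type are all made up for the example. The template fields themselves (`amazon-ebs` builder, `file` and `shell` provisioners) are standard Packer JSON.

```python
import json
import shlex


def build_template(role, command, base_ami, code_zip,
                   region="us-east-1", instance_type="t1.micro"):
    """Return a Packer template (as a dict) for one process type."""
    return {
        "builders": [{
            "type": "amazon-ebs",
            "region": region,
            "source_ami": base_ami,          # the pre-built base image
            "instance_type": instance_type,
            "ssh_username": "ubuntu",        # assumption: Ubuntu base AMI
            # {{timestamp}} is expanded by Packer at build time.
            "ami_name": "app-" + role + "-{{timestamp}}",
        }],
        "provisioners": [
            # Upload the zipped code produced by the CI build.
            {"type": "file", "source": code_zip, "destination": "/tmp/app.zip"},
            {"type": "shell", "inline": [
                "sudo mkdir -p /srv/app",
                "sudo unzip -o /tmp/app.zip -d /srv/app",
                # Hypothetical convention: store the process command in a
                # file that the base image's init system reads on boot.
                "echo {} | sudo tee /srv/app/command".format(shlex.quote(command)),
            ]},
        ],
    }


def build_templates(processes, base_ami, code_zip):
    """Map each role in a Procfile-like dict to its Packer template."""
    return {
        role: build_template(role, command, base_ami, code_zip)
        for role, command in processes.items()
    }


# Example input, similar in spirit to a Heroku Procfile (commands are made up).
templates = build_templates(
    {"web": "gunicorn app:app", "worker": "celery worker"},
    base_ami="ami-12345678",
    code_zip="build.zip",
)
web_json = json.dumps(templates["web"], indent=2)
```

Each resulting dict could be written to a file such as `web.json` and passed to `packer build web.json` as a CI step after the tests pass.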
We have now been running on AWS for around a month and it has been a great pleasure. We are even starting to move services that we previously had managed for us into ASG using Asgard - one example is Elasticsearch.