Zero-downtime server updates with HTTPS, Tomcat, Nginx and Amazon Load Balancer (ELB)

Hi all,
it is strange for me to write on a tech blog about something not directly related to coding, but you know, you never stop learning here at Balsamiq.

I will describe our recent experience setting up a load balancer with SSL on Amazon Web Services. Our goal was to achieve zero downtime during upgrades for the server running our internal back-office webapp. We also wanted to redirect all http traffic to https to enforce encryption. It was not trivial, but Peldi and Luis did great work!!

One of our webapp's tasks is to listen for incoming transactions posted by our online seller and store them in our database. It's based on Grails and runs on an Amazon EC2 instance. Every day we develop and release small new features, and upgrading our server is quite a problem because incoming transactions are lost while it is down for maintenance...

Initial setup

 

We needed a solution that makes upgrading our webapp simple and reduces downtime to zero! We also wanted to keep the ability to redirect http traffic to https, as in our single-server setup.
So, let's go for it using Amazon services!

ELB setup

The first step is to set up an Amazon Elastic Load Balancer, which is really easy using the Amazon Management Console. We want our load balancer to receive all incoming traffic on ports 80 and 443 and simply forward it to the same ports on our instances.

Since the load balancer will handle SSL, we had to provide a certificate. If you have already uploaded your certificate to Amazon, it will appear in a drop-down list for you to choose; otherwise you will have to paste the certificate and the key issued by your certificate authority into the corresponding text boxes. Do not forget (like I did!!) to add the chain certificate too. Note that if you want to modify the certificate after your ELB has been created, you can use command line tools as described here (we used a certificate issued by GoDaddy with no problems).

The next step is to configure the health check for your instances so that the load balancer knows which ones are down and where it can safely route its traffic. We used an HTTP check on port 443 (remember, SSL is managed by the load balancer!) of each EC2 instance.

Amazon provides out of the box a fancy public DNS name for your load balancer, but you will certainly want to use your own sober domain name. To do this, create a CNAME record pointing to the load balancer's DNS name, as described in the Amazon docs. For more information about CNAME records, see the CNAME Record Wikipedia article.

Ok, now we have a simple load balancer but we have no instances yet!

EC2 instances setup

Our idea was that each instance should run a servlet container with our Grails application and listen for incoming connections on ports 80 and 443. Connections on port 80 should be redirected to 443 to enforce encryption through the load balancer. We started from a standard Ubuntu 10 image and customized it for our purposes.

We used a divide et impera technique here, thanks to a brilliant idea from Luis!
We had 2 different tasks:

  • Running a webapp
  • Doing some proxying/url rewriting stuff

So we used two different pieces of software on the same machine, each doing what it does best: Tomcat would handle our application and Nginx would act as our proxy. Installing both products was straightforward.

Tomcat configuration was really nice and easy. Since Tomcat's only purpose was to run our webapp, we just put a simple http connector on port 8080 in the configuration file server.xml, with no encryption and no redirects:
<Connector port="8080" protocol="HTTP/1.1"
connectionTimeout="20000"
URIEncoding="UTF-8" />
Then we just set our webapp as the ROOT one. See the Tomcat reference for how to do that.

Nginx configuration was a little trickier. We created a virtual host config file in the Nginx folder /etc/nginx/sites-available and created a link to it inside the folder /etc/nginx/sites-enabled. The file contains the Nginx directives for url rewriting and redirecting. Detailed info on virtual hosts can be found on the Nginx wiki.

The configuration file uses two server statements, the first looks like this:
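server {
  listen   80; ## plain http forwarded by the load balancer
  server_name  localhost;
  access_log  /var/log/nginx/website.redirector.access.log;
  location / {
    ## send every request to the https home page
    rewrite ^ https://public.website.com permanent;
  }
}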

This tells Nginx to listen on port 80 and, for any requested location (/), redirect the client to https://public.website.com, which is the main website over https. Now all http connections will be redirected to the home page using https!

Alternatively, you can use this location block to redirect to the same page that was requested, still forcing https:
location / {
  rewrite ^ https://public.website.com$uri permanent;
}
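
If your Nginx version supports the return directive with a URL, the same redirect can probably be written without a regular expression. This is only a sketch of an equivalent form, not the configuration described in this post; $request_uri is the built-in variable holding the original request path together with its query string:

```nginx
location / {
  ## 301 is the same permanent redirect that "rewrite ... permanent" sends
  return 301 https://public.website.com$request_uri;
}
```

Either form should behave the same for clients; the return version simply skips the regex match on every request.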

Ok, now we just need to create the last server statement:
server {
  listen   443; ## listen for ipv4
  listen   [::]:443 default ipv6only=on; ## listen for ipv6
  server_name  localhost;
  access_log  /var/log/nginx/website.access.log;
  location / {
    proxy_pass      http://127.0.0.1:8080;
  }
}

This is needed to handle connections on port 443. Connections arriving on this port are unencrypted because the load balancer handles the SSL certificate and decryption. We want these connections to be passed on to our Tomcat server listening on port 8080 on the same machine, so we set the proxy_pass value to http://127.0.0.1:8080.

Done! Connections on port 443 are decrypted by our load balancer and passed to port 443 of our instance, where Nginx forwards them to Tomcat running on port 8080!

Just one little issue remains to take care of...
Our webapp is totally ignorant of our complex-but-beautiful two-stage proxy configuration, so when it sends a redirect to the client browser it uses a location containing an address like http://127.0.0.1:8080 !!!
Nginx to the rescue! We can add a rule to our configuration to take care of all those messy redirects. This is the final config, which translates redirects from http://127.0.0.1:8080 to our pretty https://public.website.com:
server {
  listen   443; ## listen for ipv4
  listen   [::]:443 default ipv6only=on; ## listen for ipv6
  server_name  localhost;
  access_log  /var/log/nginx/website.access.log;
  location / {
    proxy_pass      http://127.0.0.1:8080;
    proxy_redirect http://127.0.0.1:8080 https://public.website.com;
  }
}
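
A possible extension, offered as a sketch under assumptions rather than as part of the setup above: for HTTP and HTTPS listeners the ELB adds X-Forwarded-For and X-Forwarded-Proto headers to the requests it forwards, so Nginx can log the real client address and pass both headers on to Tomcat. The log format name elb is made up for this example, and the log_format line assumes the virtual host file is included inside the http block of nginx.conf (as the sites-enabled files are on Ubuntu).

```nginx
## hypothetical log format: client address and scheme as reported by the ELB
log_format elb '$http_x_forwarded_for $http_x_forwarded_proto '
               '"$request" $status $body_bytes_sent';

server {
  listen   443; ## listen for ipv4
  listen   [::]:443 default ipv6only=on; ## listen for ipv6
  server_name  localhost;
  access_log  /var/log/nginx/website.access.log elb;
  location / {
    proxy_pass      http://127.0.0.1:8080;
    proxy_redirect http://127.0.0.1:8080 https://public.website.com;
    ## append the ELB address to the client chain and keep the original scheme
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $http_x_forwarded_proto;
  }
}
```

The Host header is deliberately left alone here: the proxy_redirect rule relies on the webapp generating redirects that start with http://127.0.0.1:8080, and overriding Host would change the addresses it produces.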

Now our setup is finally ok!
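
For comparison, here is an untested sketch of a single-port variant, assuming the ELB is reconfigured to forward both of its listeners (80 and 443) to port 80 on the instance. That is not how the setup in this post is wired. Nginx then decides whether to redirect by looking at the X-Forwarded-Proto header that the ELB sets to the scheme the client originally used:

```nginx
server {
  listen   80;
  server_name  localhost;
  access_log  /var/log/nginx/website.access.log;

  ## the client reached the ELB over plain http: force https
  if ($http_x_forwarded_proto = "http") {
    return 301 https://public.website.com$request_uri;
  }

  location / {
    proxy_pass      http://127.0.0.1:8080;
    proxy_redirect http://127.0.0.1:8080 https://public.website.com;
  }
}
```

This trades the second server block for a header check. Requests that arrive without the header are proxied straight to Tomcat, so it is worth checking how your health check behaves before relying on it.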

Final setup

Automatic updates: the icing on the cake

The last thing we did was add some automation to our instance. We created a startup job that executes the following tasks:

  1. Download the latest build of our webapp from our build server and put it into the Tomcat war folder
  2. Start Tomcat so that it will deploy the new war as the ROOT webapp and start listening on port 8080
  3. Start Nginx that will serve requests on ports 80 and 443 as per our configuration

After this last piece of configuration we created an AMI out of our instance.

Each time we want to update our server we simply launch a new instance and add it to the load balancer. As soon as the load balancer recognizes the instance as "in service" we just stop the old one... voilà, our webapp is updated with zero downtime! :)

Nearly automatic updates

 

I hope this will help someone out there, or at least give some ideas on how to approach similar problems.
Ciao!

Paolo

Comments (16)

  1. Thanks for sharing Paolo, it’s nice to hear from other people using Grails on EC2, even if the platform doesn’t really matter in this case.

    I’m curious about a few things.
    * Do the instances talk to a db on a different server or does each instance have its own db?
    * Let’s say I’m working on a mockup on the old server while you’re bringing up a new server, how do you make sure I don’t lose my work?
    * How do you handle changes to the database, do you use a tool like Liquibase?

    Again, thanks for the detailed description and the pretty diagrams :-), it’ll help us in the future when we start using a load balancer.

  2. Thanks for the writeup Paolo!

    A couple of things worth adding:
    - Tomcat is essentially running without security, BUT the security is given by the fact that the EC2 that’s running Tomcat doesn’t even have a public IP at all, the ELB is the only server that can reach it by the internal Amazon id.
    - The zero-downtime update is only possible for new releases that don’t require database migration, which in our case is almost all releases. If you do need to migrate data, you’ll have to take everything down or it won’t work.

  3. Really thank you for sharing this, Paolo.
    I think that this further article found some days ago can add something about this topic:
    Moving a Production MySQL Database to Amazon RDS with Minimal Downtime (http://geehwan.posterous.com/moving-a-production-mysql-database-to-amazon).

  4. @Dror:

    * Do the instances talk to a db on a different server or does each instance have its own db?

    They talk to the same DB on RDS.

    * Let’s say I’m working on a mockup on the old server while you’re bringing up a new server, how do you make sure I don’t lose my work?

    The old server will finish its transaction, the new one won’t try to do anything to the DB until people use it. And even when they use it, it will have its own transaction.

    * How do you handle changes to the database, do you use a tool like Liquibase?

    We do. And as I wrote above, this strategy only works for updates that don’t require database migration.

  5. This is great. Thanks a lot for sharing.
    It always interests me to see how people are using amazon’s services as I am always skeptical on whether to use them for some of the work I’m doing or not. What made it more interesting for me is that you are using Grails :) it’s nice to hear more and more about people using Grails.

    omarello
  6. You can also achieve the same just with Nginx using its load balancing features.

  7. I’m running into an issue with the load balancer’s health check and Nginx. If Tomcat goes down, Nginx serves a friendly error message that Amazon’s load balancers don’t identify as an error.

    Short of recompiling Nginx, do you have any ideas how to mitigate this problem?

  8. The elb can forward both requests from port 80 and 443 to 8080.

    I am assuming you needed some rewriting done by nginx that’s why you decided to add nginx to proxy port 80 and 443 to forward to tomcat 8080.

    I think you can eliminate the nginx listener on port 443 as it acts just like port 80 (non-SSL). Therefore your elb ports 80 and 443 can be forwarded to instance port 80.

  9. Pingback: The Development Community « brianlovescode

  10. Pingback: The Development Community | brian_loves_code

  11. Hi
    I just bought your product 2 weeks ago and I have been using it quite a lot. I am using your product for all the mock-ups for the website that I am trying to build.

    When I was doing some research on the technology to be used, load balancing etc, I googled across this website. Found some pretty interesting stuff..

    Just a quick question — The mySQL is a non-managed instance right ? Do you have a dedicated DBA to take care of it ? In my little experience, I always found that databases are never stable in the longer run especially if you have replication.

    Also, on a side note, who takes care of patching the OS ? Do you have dedicated staff for it again ?

    Siva Nalavenkata
  12. The MySQL instance we use for our internal webapp is hosted on Amazon RDS. We don’t need read replicas because the amount of data is small, so it is quite easy for me alone to manage it. Occasionally I perform a db schema update when the new webapp version requires it, with minimal downtime.
    When I deploy the webapp I usually check for major software updates on the EC2 instance and in that case I create a new AMI with the updated software.

  13. Thanks for the quick reply Paolo. I have been evaluating Azure and AWS for our app and for now, since we are just starting up, I think we might go with Azure (with all infra support handled by Azure, since my friend is a .NET developer, and of course for starters it’s cheaper I guess). Will have to check how good the response times are going to be.

    On a different note, the more I use your mock-ups product, the more I am loving it. I guess it’s the ‘informal’ look-n-feel that’s making it really cool. I can sense the amount of effort you have put in to make it look the way it is.

    Peldi,
    I just saw your interview on Mixergy. Very good stuff. Appreciate you being so open about your ideas.

    Good Luck

    Siva Nalavenkata
  14. With ELB you can do it better, you can have the ELB take care of SSL and the EC2 instance would be running on port 80 only. When the ELB gets a request on port 443, it will take care of the SSL termination on its own and talk to the instance on port 80. Saves some CPU cycles for the instance, and makes it even easier to configure.

  15. Another thing you can do is pass the X-Forwarded-For and the X-Forwarded-Proto headers, and use these in nginx to log the “real ip” and protocol (http/https) that called your application (the one ELB answered to). Useful in many occasions, for example for redirecting the non-https address to the https one when you see that X-Forwarded-Proto is “http” instead of “https”.

  16. Hi,

    I just stumbled upon your site and read it. I am looking for a solution for a certain problem, didn’t find the answer here, but read your blog anyway and have a suggestion for you:

    WAR parallel deployment.

    Your WAR files can have different versions, so while one is active and working, the next one can be uploaded and will replace the old one.

    More info here: http://tomcat.apache.org/tomcat-7.0-doc/config/context.html
