We always like to assess how our systems performed and share the results publicly.
We have a real-time map that shows registrations and page views as they happen. It was fun to watch when registration opened:
Performance was great for users. As usual, response time decreased (faster!) because a greater percentage of users were on computers with fast connections rather than mobile phones.
We were very pleased with our infrastructure. We recently did a set of upgrades, and we ran with our “everyday” configuration to handle the load of more than 20 registrations per second. Here are the graphs:
NGINX Load Balancers – 3 servers – 2 m3.large and 1 c4.large (we wanted to see the difference between the m3 and c4, but as seen below, we did not get enough load to determine the true max bandwidth):
Web Servers – We recently switched to running 4 c4.2xlarge servers. This is where most of the stress happens in the system, and we reached a peak of about 50% – so we could have handled twice the load under our everyday operating environment. Of course we can grow the number of web servers within a few minutes if load gets too high.
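The "twice the load" estimate can be written out as a quick back-of-the-envelope calculation. This is only a sketch: it assumes utilization grows linearly with the registration rate, which real systems only approximate, and the function name is made up for illustration. Since the observed rate was "more than 20" per second, the result is a rough lower bound.

```python
def headroom_factor(peak_utilization: float) -> float:
    """Return how many times the observed load would fit before hitting 100%.

    Assumes utilization scales linearly with load (a simplification).
    """
    if not 0 < peak_utilization <= 1:
        raise ValueError("peak_utilization must be in the range (0, 1]")
    return 1.0 / peak_utilization


# Figures taken from the post: a bit over 20 registrations per second,
# with the web servers peaking at about 50%.
observed_rate = 20.0
web_peak = 0.50

estimated_max_rate = observed_rate * headroom_factor(web_peak)
print(f"Estimated capacity at this configuration: {estimated_max_rate:.0f} registrations/sec")
```

The same function applied to the other tiers (12% for memcache, 12% and 30% for the databases) shows why the web servers are the tier to watch first.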
Memcache Data Servers – We run 8 m3.medium memcache data servers. This really reduces the load on the database in our environment and is one of the things that makes our system so fast. These were not stressed, reaching a max of 12%.
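For readers wondering how a cache layer takes load off the database, here is a minimal sketch of the common cache-aside pattern. It is not our production code: `DictCache` is a hypothetical in-memory stand-in for a memcache client, and `load_event` is a made-up placeholder for a database query.

```python
from typing import Callable, Dict, List, Optional


class DictCache:
    """Hypothetical in-memory stand-in for a memcache client."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


def get_with_cache(cache: DictCache, key: str, load_from_db: Callable[[str], str]) -> str:
    """Cache-aside read: try the cache first, fall back to the database on a miss."""
    value = cache.get(key)
    if value is None:
        value = load_from_db(key)  # the expensive call we want to avoid repeating
        cache.set(key, value)      # later reads of this key skip the database
    return value


db_calls: List[str] = []


def load_event(key: str) -> str:
    """Placeholder for a real database query; records each call."""
    db_calls.append(key)
    return f"row for {key}"


cache = DictCache()
first = get_with_cache(cache, "event:42", load_event)
second = get_with_cache(cache, "event:42", load_event)
print(first, "| database calls:", len(db_calls))
```

Only the first read reaches the database; the second is served from the cache. A real deployment also needs expiry and invalidation when the underlying data changes, which this sketch leaves out.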
Memcache Session Servers – 8 m3.medium. These hold session information for each browser and enable us to dynamically move users from one server to another for high availability. Again, these had plenty of capacity.
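The idea of moving a user between web servers without losing their place can be illustrated with a small sketch. The class names and the "steps" data are assumptions made for illustration, and a plain dictionary stands in for the session servers; the point is only that the web servers themselves keep no session state.

```python
from typing import Dict, List


class SharedSessionStore:
    """Hypothetical stand-in for the external session servers."""

    def __init__(self) -> None:
        self._sessions: Dict[str, List[str]] = {}

    def load(self, session_id: str) -> List[str]:
        # Return a copy so callers must save explicitly, as with a remote store.
        return list(self._sessions.get(session_id, []))

    def save(self, session_id: str, steps: List[str]) -> None:
        self._sessions[session_id] = list(steps)


class WebServer:
    """A web server that holds no session state of its own."""

    def __init__(self, name: str, sessions: SharedSessionStore) -> None:
        self.name = name
        self.sessions = sessions

    def handle(self, session_id: str, step: str) -> List[str]:
        steps = self.sessions.load(session_id)
        steps.append(f"{step}@{self.name}")
        self.sessions.save(session_id, steps)
        return steps


store = SharedSessionStore()
web1 = WebServer("web1", store)
web2 = WebServer("web2", store)

web1.handle("browser-abc", "start-registration")
# The next request lands on a different server, e.g. after a rebalance.
history = web2.handle("browser-abc", "submit-payment")
print(history)
```

Because any server can pick up any browser's session, a server can be added or removed without interrupting registrations in progress.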
Database – Our main database runs on RDS (we are part of the test program with AWS and will move to Aurora later this year, which should improve our availability and performance even more). We run a db.m2.2xlarge instance with high availability and redundancy turned on. We also have a read-only database instance that runs on a db.m3.xlarge and a shard that also runs on a db.m3.xlarge. These all performed very well and showed plenty of capacity. The main database hit a max of 12%. The read-only instance hit 30% and is very easy to duplicate and expand.