The Spring Lake 5 Mile opened registration with us for the second year. They sold out their 12,000 slots in the first 9 hours. They opened at 5AM and had nearly 2,000 registrations in the first 6 minutes!
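To put those numbers in perspective, here is a quick back-of-the-envelope calculation of the registration rates. It is only a sketch: it treats "nearly 2,000" as exactly 2,000, and the variable names are ours for illustration.

```python
# Figures quoted above; "nearly 2,000" is rounded up to 2,000 for simplicity.
TOTAL_SLOTS = 12_000
SELLOUT_HOURS = 9
BURST_REGISTRATIONS = 2_000
BURST_MINUTES = 6


def per_second(count, seconds):
    """Average number of registrations per second over a time window."""
    return count / seconds


burst_rate = per_second(BURST_REGISTRATIONS, BURST_MINUTES * 60)
average_rate = per_second(TOTAL_SLOTS, SELLOUT_HOURS * 3600)

print(f"opening burst: {burst_rate:.2f} registrations/sec")
print(f"sell-out average: {average_rate:.2f} registrations/sec")
print(f"burst is about {burst_rate / average_rate:.0f}x the average")
```

That works out to roughly 5.6 registrations per second in the opening burst, about 15 times the average rate over the 9 hours it took to sell out.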
We monitored our systems throughout and are really happy with the scalability and performance. We maintained a response time of approximately 2 seconds the whole way. Here are some of the graphs and charts.
This first one is interesting: the average page load time (what users see in their browsers – cool!) went down when registration opened at 5AM. The reason is that most people registering were doing so from home, so they had high-speed internet. Before then, about half of the users were accessing the site via mobile devices with slower connection speeds.
At the peak we were averaging a response time of less than 2 seconds for the user and 135 milliseconds on our servers – no degradation with the spike.
This shows how the number of users spiked quickly at 5AM, along with the mix of browsers. Yes, Google Chrome is the most widely used browser, and still the fastest.
These were the 10 web servers we had running. The key resource here is CPU utilization at less than 10% – meaning we could have handled about 10X the load with this configuration (and we can scale much higher).
We use NGINX for our front-end load balancers. We only had 2 of them running because the load was not expected to be very high. As you can see, these are very efficient and only reached 4% utilization, so we have room for lots more load and can add more instances to the two we run at all times.
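The "10X the load" estimate is just the inverse of peak CPU utilization. Here is a minimal sketch of that arithmetic, assuming CPU use grows linearly with load and that nothing else (database, network, memory) becomes the bottleneck first. The function name is ours, and the results are rough figures rather than guarantees.

```python
def headroom_factor(utilization):
    """Rough multiple of the current load a tier could absorb before CPU saturates.

    Assumes CPU grows linearly with load and no other resource runs out first.
    `utilization` is a fraction, e.g. 0.10 for 10%.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be a fraction in (0, 1]")
    return 1.0 / utilization


web_headroom = headroom_factor(0.10)  # web servers peaked below 10% CPU
lb_headroom = headroom_factor(0.04)   # NGINX load balancers peaked at 4%

print(f"web servers: about {web_headroom:.0f}x current load")
print(f"load balancers: about {lb_headroom:.0f}x current load")
```

By this estimate the web tier had about 10x headroom and the load balancers about 25x, before adding any more instances.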
Another view of our Web Servers from the Amazon AWS console.
These are our database servers. The first graph is Average and the second is Max. It looks like we need to upgrade the size of our Shard 1, since it had a peak around 75%, but that was a very quick spike, and our memcache servers probably reduced the load greatly once that initial set of data had been loaded into cache. The good news is that our main database server never went over 10%, so there was plenty of horsepower left there.
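To illustrate why a cache can turn a database spike into a brief blip, here is a minimal sketch of the general cache-aside pattern. This is not our production code: a plain dict stands in for a memcache client, and the class, function, and key names are hypothetical.

```python
class CountingDatabase:
    """Stand-in for a database shard; counts how many queries reach it."""

    def __init__(self, rows):
        self.rows = rows
        self.queries = 0

    def fetch(self, key):
        self.queries += 1
        return self.rows[key]


def get_with_cache(key, cache, database):
    """Cache-aside read: try the cache first, fall back to the database on a miss."""
    if key in cache:
        return cache[key]
    value = database.fetch(key)
    cache[key] = value  # later reads of this key skip the database
    return value


db = CountingDatabase({"race:spring-lake": {"slots": 12_000}})
cache = {}  # plain dict standing in for a memcache client (assumption)

for _ in range(1_000):
    get_with_cache("race:spring-lake", cache, db)

print(db.queries)  # 1
```

Only the first read reaches the database; the other 999 are served from the cache, which is the kind of effect that would make the initial spike short-lived.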
The memcache Session and State servers (we run 4 of each) were all well below 10% as well.