One of our Amazon servers hung yesterday (always on a Sunday ;-). As we have written before, all of our servers are set up redundantly, but there are always “gotchas” that you can only learn from experience.
We did not have any downtime, but the site was slow for a while until we manually worked around the problem.
Today, we implemented a new level of monitoring and redundancy that will take hung memcached servers out of rotation and automatically fire up new ones.
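For the technically curious, here is a rough sketch of what this kind of check can look like. It is not our production code, just an illustration under a few assumptions: each cache server is probed with memcached's plain-text `version` command using a short timeout, and anything that does not answer in time is treated as hung. The names `is_responsive`, `refresh_pool`, and `launch_replacement` are made up for this example, and `launch_replacement` stands in for whatever actually starts a new server.

```python
import socket


def is_responsive(host, port=11211, timeout=2.0):
    """Return True if a memcached server answers 'version' within the timeout.

    A hung server often still accepts the TCP connection but never replies,
    so the check waits for an actual response rather than just connecting.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"version\r\n")
            return conn.recv(64).startswith(b"VERSION")
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


def refresh_pool(pool, probe=is_responsive, launch_replacement=None):
    """Split a pool into healthy and hung servers, replacing the hung ones.

    `launch_replacement` is a hypothetical callback that takes the hung host
    and returns the address of a freshly started server.
    """
    healthy, hung = [], []
    for host in pool:
        (healthy if probe(host) else hung).append(host)

    if launch_replacement is not None:
        for host in hung:
            healthy.append(launch_replacement(host))

    return healthy, hung
```

A monitor would call `refresh_pool` on a schedule and hand the returned `healthy` list to the application as the new rotation. Passing the probe in as a parameter keeps the logic easy to test without a real cache server.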
Onward in our pursuit of perfection for race directors and timers!