The Case of the Recurring Network Timeout

This post was co-authored with Roshan Revankar.

At Ticketmaster we’re passionate about monitoring our production systems. As a result, we occasionally come across interesting issues affecting our services that would otherwise go unnoticed. Unfortunately, monitoring only indicates the symptoms of what’s wrong and not necessarily the cause. Digging in deeper and getting to the root cause is a whole different ball game. This is one such example.

Tools Shaping Culture

Ticketmaster’s Mark Maun recently presented at the Southern California Linux Expo on how great tools can be a driving factor for cultural change at scale. Ticketmaster’s DevOps culture has been transformed largely through the use of open source tools. In his SCALE 13x presentation, Mark walks through the motivations for change and shares examples of how great tooling has improved Ticketmaster’s product velocity and overall system reliability at scale. Mark’s presentation starts at 3:43:00.

An expanded description of the presentation is available on the SCALE website.

Getting Over the Performance Hump with Apache Camel

This post goes over some of our findings in improving the performance of one of our Apache Camel based web services. Over the past year, Ticketmaster has seen tremendous growth in our new product, TM+, of which our service is a key component. With the increase in traffic, we have been working hard to ensure acceptable response times for our customer-facing services. For the most part, our service has performed well, with a majority of requests completing within our desired duration. However, occasional requests took longer to execute, at seemingly random intervals, so we set out to investigate and fix these worst-case performance spikes.