5 Reasons Why You’ll Love Our Devjams

Back in January, we held our very first developer outreach and mini hackathon event, aka devjam, in Scottsdale, AZ, to a sold-out audience. Since then, we’ve had three of these one-day events in Durham, NC, Québec City, Canada, and Austin, TX, with one coming up in Los Angeles in June.

So far, the response has been very positive. The feedback received has helped us build better APIs and a better developer experience. The devjams’ Net Promoter Score (NPS) is 72, which is pretty high. So thank you to everyone involved in this success :-)


Devjams are our way to, in the parlance of Steve Blank, get out of the building and engage our customers. Since my team’s mission is to create APIs developers love and make our platform accessible, devjams are a great, fun way to engage the developers using those APIs.


Whether you’re a developer or someone interested in Ticketmaster tech in general, here are 5 reasons why we believe you too will love being at one of our devjams:

#5 Meet the team

Devjams are a great opportunity to meet the people of Ticketmaster who are opening up the platform, building APIs and shipping products. We believe that the best innovations happen when people connect at a human level, and devjams are a fantastic setting for that. Once you meet the team, you will feel the stoke!


#4 Join the team

It takes a special kind of person to spend their Saturday with a bunch of strangers talking APIs and product innovation. That’s exactly the kind of talent we’re interested in, and we have various opportunities where they can contribute. Who knows, we might be working together soon on some awesome products 😉


#3 Get inspired

Serendipitous “eureka” moments are a conversation or a keystroke away. It happens every single time without fail. You’ll find inspiration in the demos others give and in the cross-functional collaboration throughout the day. Given how intimate our events are (30-70 people), it’s easy to have meaningful conversations that will set your imagination free.


#2 Give feedback

As mentioned earlier, we focus very heavily on the developer experience, or DX. You’re the customer and we’re here to listen to you. We want your feedback. We want your input. As a developer myself, I love when companies do that because it shows they’re invested in me. We’re definitely invested in you, so come on out and share your thoughts with us!


#1 Have FUN!

By far the best reason for coming out to one of our devjams! We’re a fun bunch and like to connect with others. Our events are informal, laid back, and are as much about enjoying oneself as they are about coding and APIs. There will be plenty of food and beer for everyone. Just bring your jolly self and have a great time!


We hope to see you very soon at one of our devjams this year. You can also follow us on Twitter to stay abreast of our exciting journey to open up the platform at Ticketmaster.

We would love to hear from you directly. Please leave a comment below or reach out to me on Twitter.

Let’s build amazing products together!🙂

How Agile Inceptions Are Changing Ticketmaster’s Team Culture

The Ticketmaster Way

An agile inception is a collaborative discovery workshop that visually gets all stakeholders and team members to a common understanding prior to the start of a project. The first time I observed an agile inception, I watched three transformations occur. API developers’ eyes widened as they realized for the first time how their “piece” fit into the larger ecosystem and how different applications used it; developers were visibly excited to understand the business problem they were trying to solve; and the team rallied together to solve that problem and ideate based on a common understanding. Product, engineering, UX, program and our stakeholders were bound together by a common mission. It was one of the most inspiring things I have ever witnessed a team go through. A year and a half later, the Ticketmaster coaching organization regularly facilitates inceptions as a way to kick off a new project or team, and the inception process is now part of our technology culture. “The Ticketmaster Way” became the term we used to describe how our culture was changing as we began to experiment with new processes and techniques to deliver ideas to our fans and clients.

How we got here

Like many organizations, Scrum gave us a base and common understanding around agile principles and ceremonies. But it became a very dogmatic approach, and many of the principles were driven by the project management organization; the value was not fully understood throughout the team. Although we were doing agile things, we were not agile minded. It was time to change (yes, this was hard!). At the start of our transformation, most of our teams were made up of a single skillset working on one part of the system or site. We piloted the idea of creating a cross-functional, product-focused pod. To bring this group of people, newly assembled as a team with varied understandings of systems, architecture and applications, to a common starting point, we piloted our first inception. The entire team moved to a separate building, which gave them ample opportunity to form autonomously. Ticketmaster’s first inception lasted 2 weeks. During this time, we also introduced other agile frameworks into the team, including Kanban, Lean and some Design Thinking workshops.


The Inception

We have since refined our process considerably. After reforming most of our teams into pods, and more than 40 inceptions later, we have found that 2-3 days at the beginning of the project is the right amount of time to get to a common understanding. Inceptions are critical in helping the team spend much less time in the forming and storming stages of team development and shift to norming and performing much more quickly (therefore, eliminating waste!).

Figure 1. The Tuckman Model of Group Development

Inceptions are not only Ticketmaster’s way of kicking off a new project or team; most importantly, they align the team around a concise mission with measurable business value – outcomes vs. outputs. In fact, we no longer call projects “projects”. We deliver on Promises of Value (POVs).

Who comes to the inception and what is the outcome?

We bring all team members, key stakeholders and customers into the same room (yes, we fly remote members out for this). The inception is an investment as part of the POV. We keep the tone neutral by ensuring that it is facilitated by two facilitators from outside the delivery team. We break the inception into three key areas: “The Strategy”, “The Work” and “The Team”. Although “The Work” and “The Team” are equally important, strategy is most often set at the product or business level and the team rallies around the work. Think of the team as “The Shark Tank”: they need to be fully bought in and invested in the product they are building to be inspired and motivated. Because of this, I am going to focus on “The Strategy” portion of the inception in this blog post.

Figure 2: Illustration via Jeff Patton & Luke Barrett for Thoughtworks, who re-created the cartoon from an unknown origin.

The Strategy – The Mission

The mission is not about telling the team what they will deliver; it is about understanding the problem they are trying to solve. Product and design help the team align on some pre-inception materials. These should be light, but with enough thought to help the team understand who our competitors are, the target users, the user journey, the business drivers and the investment. UX will then help put some visual context, prototypes, personas and design around what we are trying to solve. The mission statement is powerful. Why is it so important? It is what the team believes in, and it clearly states the problem the team will solve. Product may come to the table with a mission statement already written; however, I typically have the team rewrite the mission so it is in their voice. They own it. It needs to be something that any person on the team understands, is able to state and is passionate about. This is where it all starts. In writing the mission statement, I actually prefer the hypothesis statement format over an elevator statement. It clearly states what the team will deliver, the value it will bring to the business and how they will know they are successful. At the end of the day, we really are building a hypothesis. We have yet to prove its value.

We will deliver _______________________________

Which will enable the business to ________________

We will know we are successful when _____________

We are fans too! Bring the team members into the problem

The teams at Ticketmaster are passionate. Employees go to events; they experience shows and connect with the fan. We are fans! This passion becomes a critical piece in binding the team to the mission. We begin the inception by letting the team fully experience the problem that our end-users are facing and empathize with them. We found that creating personas for the team was not enough; having the team fully immersed in what the fan or client was experiencing sparked instant emotion, processing, and then ideation. For example, in one inception we had the team call customer service and make changes to an order. We recognize that this is an experience that can be much better today, so what better way to have the team understand the pain than to go through it! This is the hand-off. It is at this point that the team fully understands the WHY. They are not being told it will save the company x amount of dollars, or that it will improve the bottom line. They understand the pain and can see that this is a problem that can be solved.

When possible, we bring our client or customer into the inception and have them demo directly to the team how the product is being used and the many steps they have to go through to find a solution.

Measure Success

How will we know that what the team has been working on has been successful? Output is not as critical as outcomes. Just because we deliver something does not mean that it is meaningful to anyone. This final step is critical to motivate the team and help them feel successful in their journey. Are they going down the right path? Should they pivot? The team, just like the business, wants to know from the customer themselves. What do these measurements look like?

I often hear in these workshops that “we want to improve our Net Promoter Score or improve conversion”. But it’s not clear how this product or feature will move the needle in the many products we are delivering throughout the organization. It needs to be tangible. I usually ask 3 basic questions:

  1.  How do you know that you’ve achieved your mission for your slice or POV?
  2.  What does that look like and how will that help you make decisions or validate what to work on next?
  3.  When can the team celebrate their next win?

I like to ask when the team can celebrate their next win because, more often than not, these are measurements that the team can fully get their head around – “We integrated with X and validated that the system is fully functional”. “We tested with internal users and validated that the user interacted with the application as expected”…

It helps to break down the victory into much smaller wins which keeps the team motivated and excited.

Ticketmaster’s culture is changing. At the core of this change are people who are passionate about solving hard problems. Given context, room to innovate, alignment and a way to measure their success, teams become the change agents. Invest in an inception; it will allow your teams to align early and deliver value quickly. The key to kicking off a successful inception is bringing the team into the strategy as early as possible.

  1. Get the team to collaboratively write (or rewrite) the mission statement.
  2. Demo the problem to the team. Get every member of the team to feel the pain, and give the demo again for any new team members who come on board.
  3. Understand the team’s success metrics and then measure success at the end of each slice/MVP.

Teams that are motivated feel empowered and driven.  Empowered teams produce amazing results.


Our journey to shorter release cycles

The team that is responsible for the development of the ticketing websites of Ticketmaster International has been around for many years.

Where we came from

When we started, we had a rather undefined, project-based process where we released new versions of the site when a project was ready. It was generally many months between each new version, sometimes over a year. We were using a waterfall approach.

The business made project requests and handed their requirements over to the product team, who analysed the project and wrote a large specification that they handed over to the visual design team. When the designs were done, the engineering team analysed the specification and made a technical design. After that, we broke the project down into smaller work items and worked until we were done. Then the testers got to work and reported bugs for anything that didn’t follow the specifications. After a few loops of bug fixing and re-testing, the new version was tested again for regression issues.

Then the new version was handed over to the operational team that deployed it to a staging environment where the business could see the result. It was common that the end result did not meet the expectations of the stakeholders. All the handovers in the process meant that the original intent was lost.


Where we come from (Image courtesy of iquestgroup.com)

One step at a time

Our first step towards improving this was to bring the testers into the same team as the developers. That meant they could test right after something was done. The next step was to regularly demo what we had done to the product team and the business. That meant we got feedback earlier if we were not delivering what was asked for.

We started doing sprints, originally three weeks long, with a demo after each sprint. Then we brought product managers into our daily meetings to be closer to the development work, and made the engineering team more involved in defining the work. We broke the project down into stories that would bring value to the end user, with the development team actively helping product managers break the project down into user stories.

We shortened our sprints to two weeks. We also started doing test-driven development, a way of driving good technical design by writing automated tests before the functionality is developed. We also wrote higher-level tests that exercised the system from the outside. We decided that a new release would happen every 8 weeks.
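For anyone unfamiliar with the practice, here is a minimal, hypothetical illustration of the test-driven idea (not code from our product): the test is written first and fails, and we then write just enough code to make it pass before refactoring.

```java
import org.junit.Test;
import static org.junit.Assert.assertEquals;

// Hypothetical example: this test is written before BasketPricer exists,
// fails first, and drives the simplest implementation that makes it pass.
public class BasketPricerTest {

    @Test
    public void appliesTenPercentDiscountAboveThreshold() {
        BasketPricer pricer = new BasketPricer(100.0); // discount threshold
        // 120.0 is above the threshold, so a 10% discount should apply.
        assertEquals(108.0, pricer.priceFor(120.0), 0.001);
    }
}

// The simplest implementation that satisfies the test; refactored later as needed.
class BasketPricer {
    private final double discountThreshold;

    BasketPricer(double discountThreshold) {
        this.discountThreshold = discountThreshold;
    }

    double priceFor(double basketTotal) {
        return basketTotal > discountThreshold ? basketTotal * 0.9 : basketTotal;
    }
}
```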

After doing that for a while, we moved down to a 5-week cycle. We still had low coverage of automated tests, so we had to test the product manually for regressions at the end of each release. We got stuck in this state for quite some time, but we were happy with the way things worked. After all, we had come a long way from how we used to work.

We took further steps to work better together. We moved people into the same office, started to create tools for improving the efficiency of our deployments, and generally automated as many manual tasks as possible. Coverage of automated tests grew over time.

Our most recent step

We wanted to take a further step and reduce our cycle again. This time, the goal was to release to production after every sprint. It had a more profound impact than the changes we had made before: all stakeholders would be affected, as everyone would need to do their part more often. We asked all stakeholders what such a change would mean for them and what was needed to enable a 2-week release cycle, and then started tackling the identified issues.

Of course, our main priority in this change is to maintain the high-quality standards we have previously set for ourselves. We measure the rate at which high priority bugs are reported from production and how often and for how long business disruptions occur.  

One leap of faith we took when releasing after every sprint was removing the dedicated regression testing step we used to have at the end of the cycle. That’s not to say we don’t test for regressions: every story includes manual exploratory testing, and our automated tests catch regression issues, which are fixed within minutes of being introduced. We also need to deploy the new version of the code without disrupting sales, and the local staff in each country that use our product must be able to translate any new text into their local languages before the production deployment.

What we learned

We now have some 2-week releases under our belt and things have gone well. The shorter cycle has uncovered lots of inefficiencies that were hidden before and made everyone rethink their ways of working. Doing something that feels wasteful can be tolerated when you only do it every fifth week, but doing it every other week gets you thinking much harder about ways to improve efficiency. It has also made us take the incremental approach much more seriously.

I’m very proud of what our team has accomplished so far. It has taken hard work and dedication to reach the point where we are now. The next step is to reach the point where we release code as soon as something is done, no matter how small the change.

 

What Ticketmaster is doing about professionalising software careers

“My wife is busy with her accountancy exams tonight, so I’m staying in to look after the kids” is something you expect to hear in everyday conversation. But what about “My partner is doing his software engineering training tonight, so I’m unable to come out for a beer”? Why is this still unusual? Are software developers still a product of their bedroom and dorm room past, hacking things together and so often saying ‘yes’ when no other professional would in a similar situation?

Software development, let’s face it, is an immature industry. We build software, but not many of our parents’ generation did. What we often fail to realise is the number of different roles we play. When we started out in software, we most likely felt the excitement of building something and seeing it work. But to get it to work properly, we likely had to do some testing. As our initial projects mushroomed, we started to learn about various constraints, so without realising it, we started thinking about the design. Then, when we had something that other people started using, they might have found a problem with it, so we needed to do some investigative work. Then we found additional requirements that couldn’t be met by the current design, so we started to think about the architecture.

What we haven’t realised is that we’re doing all the parts of what is known as the Software Development Lifecycle (SDLC): Requirements, Design, Development, Testing & Support.

Let’s think about the other roles we play in our lives. We may be parents, sons and daughters, friends and neighbours. But what about surfers, fans, climbers, shoppers and garage organisers? What we learn from looking at our lives that way is that if we don’t plan for these roles, or put any time into them, there are most likely negative consequences. If as a parent we put no time or effort into parenting, or into trying to be better as a parent, that’s likely to have a detrimental effect on us and our children. If as a garage organiser we put no time or effort into planning or doing the organising, our garage will remain a mess, and it’ll be impossible to find things when we need them.


So the same goes for us as software engineers. We need to plan, prepare and learn how to master the different roles we play, or there will be negative consequences. Software support is often seen as less glamorous than development, but it requires sharpening our detective skills to find the metaphorical needle in a haystack, a completely different set of skills from development. So in order to take our profession more seriously, we need to gain an awareness of these different roles and mature in how we develop our skills in each of these areas.

Here at Ticketmaster we’ve been Agile for over 6 years now, and we’ve come to the realisation that with that level of maturity, multi-disciplined skills are required in order for development teams to continue to grow in a healthy way. When we reviewed our career progression processes across multiple engineering locations, we also found a lack of consistency across job roles and criteria for career progression. We came to the conclusion that without a transparent process that was consistent across all our teams, it was hard to set expectations and provide our engineers with the optimum experience for growing how they wanted to in their chosen profession.

So what we’ve done at Ticketmaster to address these issues is create a career map – a flexible set of pathways built from what we do as engineers day to day, providing a range of options for career progression along with the steps for how to get there. This enables mastery of any chosen career progression route, with a clear vision of the outcome you’re looking for as an engineer.

How did we do this? We used a number of industry standard resources, building on the SWECOM model created by the IEEE, and using influences from other industry models such as SFIA. This has enabled us to build a three part progression guide:


  1. SDLC skills – these are built up from the SWECOM model and outline 5 maturity steps in each professional skill, built from the SDLC.
  2. Technical skills – these have been built up using a platform and technology neutral approach, again providing a 5 step program of technical skill for all areas pertinent to Ticketmaster’s engineering strategy.
  3. Behavioural skills – we have outlined four areas for awareness and growing maturity that we believe make the biggest difference to our success as a people first organisation: Team work, Innovation, Results orientation, Professionalism. We can’t and don’t want to mandate behaviour, so we provide lots of examples of what maturity looks like at each level.

So armed with a clear map of what is required to progress, engineers are empowered to make this journey with the support of their colleagues. How do we do that? We used the guiding principles from Drive by Daniel Pink.

Autonomy

It was really important to us that we avoid this scenario, which you may be familiar with:

Again, we considered the engineering profession from the viewpoint of different roles, and saw the role of the manager as a coach, like Boris Becker working with Novak Djokovic. We want to create an environment in which engineers can excel, rather than be micro-managed or stressed by unnecessary distractions. So we trained all our managers to operate with a coaching model in mind at all times, supporting and only intervening from the sidelines when required.

Mastery

We’ve empowered our engineering teams by giving them the tools and resources required to progress their careers – and then giving them responsibility for that progression. The progression guide makes it clear what is required to move to the next role, and it was put together through numerous rounds of consultation to ensure everyone had a say in how it was constructed. In order to satisfy the requirements of the next role, we have introduced a peer review and assessment process, whereby individuals submit evidence to show they have mastered the professional, technical and behavioural skills to the appropriate level. What we learnt from engineering-led organisations such as Google, as well as from other industries, is not to have a single peer review the evidence, but a minimum of three reviewers, to ensure bias is minimised and the process is as objective, fair and transparent as possible.

We’ve rolled out training across all our engineering teams on how the career map process works, and run workshops on how to create suitable objectives in order to reach the requirements for each role. At each stage we have looked to put responsibility for progression with the engineer, but provided suitable support structures to enable and facilitate progression. This includes an online tool specifically created to help manage progress, providing a simple view for instantly seeing what steps are required and a convenient way to store evidence.

Purpose

Software engineering is constantly exploring uncharted territory. That’s why agile has won out over waterfall as a methodology: iterative exploration and learning is almost always better than educated guesswork. Like software projects, software careers are also subject to a high level of flux because of the nature of the industry. Established skills can be rendered redundant overnight by the introduction of a new technology, product or service (e.g. the iPhone, or Uber). By providing engineers with multiple options for charting their career path, we believe we can reduce exposure to the redundancy of specific technologies by building opportunities for both career breadth and depth.

This enables engineers not only to change as the industry does, but also to be proactive rather than constantly reactive; being reactive is far more stressful, and undue stress does not contribute to the healthy working environment we’re constantly striving for.

But we’re also looking to professionalise our industry, and make software engineering an even more exciting destination than it was a generation ago, when it was really in its infancy. By building our approach on existing standards and the best disciplines from other industries, we’re aiming to show that we’re part of an extremely vibrant but also professional industry which will endure for millennia to come.

Ticketmaster win big at the Techworld Techie awards


It was a huge night for Ticketmaster last night, taking four prizes at the Techworld Techie awards! The Ticketmaster tech team were a very happy bunch, taking the awards for Most Innovative Team, Rockstar Developer of the Year and Best Place for Developers to Work, and the big one…the Techies 2016 Grand Prix award.

“This is great recognition for great people working for a great company” said Gerry McDonnell, SVP of Technology for Ticketmaster International who was especially proud of his team and their achievements.

On taking the prize for most innovative team, Adam Gustavsson said “Our team story is about a journey, a journey that started over four years ago in an attempt to answer a simple question: how can we move faster? That journey has taken us from a team that released once or twice a year to one that releases every other week, with even shorter cycles in sight. But more importantly, it is a story demonstrating that seismic change can be driven from the bottom up, even within large organisations.”

Nicolo Taddei was overjoyed on receiving the award for Rockstar Developer of the Year, “I personally believe that the biggest impact didn’t come directly from my software, but more from my personality on the team of engineers within Ticketmaster. With a positive mind set I enabled myself and my colleagues to take more action and calculated risks to improve our developer experience, code quality and innovation which have a real and positive impact on our clients and customers.”

We’re super proud of winning the best place for developers to work too. We love live entertainment, we live and breathe it. Everyone at Ticketmaster is a fan, hence our mission to bring out the fan in all! The move to our new offices in Angel three years ago came as part of a pivot in our company strategy, reinventing ourselves as a tech leader and the owner of the largest ticketing ecosystem in the world. “We look at ourselves as a technology company. A few years ago we didn’t do that. We had to redefine ourselves culturally”, Mark Yovich, President of Ticketmaster International said. With this award, it’s fair to say that this transformation is complete, as we now look to build on the solid base we have to provide fans and clients with the best possible experiences.

A big thanks to everyone at Techworld for the recognition and organising such a fun and inspiring evening, and hats off to Ticketmaster for sweeping the Techies 2016!

Black Friday – Celebrating savings through Kaizen


Whilst the internet whipped itself up into a frenzy about discounted televisions, here at Ticketmaster International we were celebrating savings of a different nature. For a few years, we have been using Kaizen techniques to improve efficiency and quality. “Kaizen”, a term borrowed from lean manufacturing, can be translated from Japanese to mean “good change”: it is a continuous improvement technique which recognises that the individual doing the job is the expert on that job and encourages individuals to make small, incremental changes that are within their power to implement.

Kaizen has been embedded in our business in different ways. For our developers, who already use agile techniques including Scrum and Kanban, Kaizen was already at the heart of their operation: regular retrospective sessions are used to identify, shape and get commitment to incremental changes to development and team behaviour. But whilst Kaizen is familiar to agile developers, it was an alien term to people elsewhere in the business. We initiated a programme to encourage wider adoption of Kaizen and empowered individuals to improve the processes they work with on a daily basis. A reward scheme was introduced, and a shortlist of Kaizen implementations is reviewed regularly by the Exec team and celebrated through a number of internal communication channels.

Recently we have seen teams such as Ticketmaster Denmark and our contact centre team in the UK taking the Kaizen programme and making it their own. The contact centre team have started an initiative focussing on incremental changes that improve the customer journey. In Denmark, a Kaizen working group has been formed, not only to encourage Kaizen in their business, but to look at the Kaizen improvements that have been made elsewhere and see how they can be adapted and reused.

Black Friday is a US shopping tradition that is being exported worldwide through major online retailers such as Amazon. So with all the hype about savings around us, I asked people from our international business to share their Kaizen time savings and improvements. Here are some of their stories:

  • “We simplified the process for customer services engaging with event promoters. By cutting out intermediary steps they calculate they have saved 6 hours per month.”
  • “Our team have been doing some work to reduce the solution size, which has had a massive impact on [software] build times. So far we have reduced the build time of the solution from around 7 minutes to 2.8 minutes which is a saving of nearly 5 hours per day across the team.”
  • “By standardising print codes in Ireland we reduced duplication and conflict, saving 62 hours per month”
  • “One of my colleagues wrote me a script to automate the collection of KPI data. Hugely helpful and saves me at least 1 day each month.”
  • “We improved the guidance for visitors to the Copenhagen Office which led to a £255 reduction in metro fines and delays in the first 9 months.”

It is great to see Kaizen continuous improvement becoming more mainstream in our business and the benefits of small, incremental change being celebrated by such a diverse group. We will continue to evangelise Kaizen and provide coaching to teams who are interested in adopting it as well as other lean techniques and look forward to sharing more success stories with you on Black Friday next year!

Third-Party Components – Hidden Technical Debt

I was recently reminded of something I learned many years ago, before coming to Ticketmaster, from people much smarter and more experienced than myself. Back then I was pushing to introduce a set of third-party libraries to help lay the groundwork for a replacement for our flagship product, a mainframe-based mail and groupware system. The logic, I thought, was flawless: the libraries would give us cross-platform support for a number of key functional areas including network communication, database access for many different database systems, the file system, threading, you name it. Writing cross-platform code is pretty straightforward until you have to touch the metal, and then it can be…challenging. Why re-invent the wheel, I thought, when somebody else had already invented some very nice wheels?

The company selling the libraries – yes, there was a time before Github and the explosion of open source libraries – was successful, well respected, produced quality libraries and offered great support. I did my research, readied my arguments and presented it all to management and senior developers. They were, in a word, underwhelmed. When I asked why they didn’t think it was a good idea I got the simple answer, “We’ve had nothing but bad experiences with these types of things”.

I was disappointed, but there was a lot of work to do so I just let it go. But it did stick with me. I mean, why would seemingly smart and experienced developers turn their noses up at re-usable components solving common problems? Over the years, however, as I gathered my own experience with third-party components, I started to understand their reluctance. Nothing truly catastrophic, mind you, just a lot of time spent wrestling with the devil in the details. And that is what I was reminded of the other day at Ticketmaster.

A Simple Job

The job seemed simple enough: Upgrade several open source components we use, all from the same group, from version 2.5 to 2.6. Certainly there couldn’t be any major changes, and the previous upgrade went smoothly enough. What could possibly go wrong? So we upgraded the components, ran the tests and BLAM, the first sign of trouble: a bunch of our tests were broken. Well, not just the tests. Our app was broken. In the end, it took a couple of people a couple of days to work through all of the issues discovered. And while QA always intended to perform a smoke test after the upgrade, testing was much more extensive than planned because of the issues during the upgrade.

This story would have ended happily enough, except our app, a web-based e-commerce site, went out to production and BLAM: two showstopper bugs that required a rollback and immediate fixes. And both could be tied directly to changes in the third-party components we had just upgraded. This is not to say that bugs in the components caused the problem. Rather, changes in the component code combined with our existing or new code led to unintended, and more importantly, undetected side effects.

The Devil IS the Details

In one case, the behavior of a component method had changed. Combined with some new, and seemingly unrelated, changes in our code, the side effect showed itself in a very specific scenario, with the result that a large group of site visitors would be unable to buy things on the site without first encountering an error. In the second case, a deprecated method for initializing a widely used component had to be replaced with newer and less clear methods. Here we simply implemented the new method wrong, with a small but very important side effect: we were passing the proxy server’s IP address to backend systems instead of the client’s IP address, and the actual client IP address is an important part of the anti-fraud system.
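The post doesn’t show the component in question, but as a hypothetical sketch of how easily this kind of mix-up happens: behind a proxy, the address of the incoming connection is the proxy itself, and the original client IP normally has to be read from a forwarding header instead.

```java
import javax.servlet.http.HttpServletRequest;

// Hypothetical illustration only; not the actual Ticketmaster code or the component involved.
final class ClientIpResolver {

    // The buggy flavour: behind a proxy, getRemoteAddr() returns the proxy's address,
    // which is effectively what ended up being sent to the anti-fraud system.
    static String clientIpNaive(HttpServletRequest request) {
        return request.getRemoteAddr();
    }

    // A more careful flavour: prefer the forwarding header set by the proxy,
    // falling back to the connection address when the header is absent.
    static String clientIp(HttpServletRequest request) {
        String forwardedFor = request.getHeader("X-Forwarded-For");
        if (forwardedFor != null && !forwardedFor.isEmpty()) {
            // The header may hold a comma-separated chain; the original client is listed first.
            return forwardedFor.split(",")[0].trim();
        }
        return request.getRemoteAddr();
    }
}
```

The point is not this particular header; it is that a one-line difference like this can pass every existing test and still quietly change what downstream systems receive.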

So what’s the lesson in all this? Well, some would say more tests are the answer. And they’d be wrong. In the first case, the error appeared in only one very specific scenario with a very specific set of pre-conditions. It was triggered by changes in our code, which we knew about, interacting with changes in the behavior of the third-party component, which we didn’t know about. Couple this with the previously unknown set of preconditions needed to trigger the error and you see that nobody could have foreseen the potential error and written a test to cover it.

In the second case, where we implemented the new method incorrectly, we had a test covering it. The problem was that the test was wrong. And this was owing to a tiny detail in the implementation of one of the component’s internal methods. And for the test-first proponents out there, yes, the test failed, was implemented and then passed. Problem is it was a bug in the test that made it pass.

To me the lesson is pretty simple: Think long and hard before pulling third-party stuff into your code-base. Don’t be blinded by “how easy things are to integrate” or “look at all the cool stuff we get” or even “everybody else is using it”. You really need to understand what you are getting yourself into and have a solid plan for how to maintain what has now become part of your code base. Because in the end, this is technical debt that you will be living with for quite a while.

Implementing a DevOps Strategy across multiple locations & product teams

Over the last 18 months, a change has begun within the Ticketmaster International Team. Barriers are being broken down between the engineering and operational teams, our different product delivery teams are being aligned and knowledge sharing across teams is happening more and more. What’s changed? We developed a strategy based around DevOps to create a leaner higher performing organisation and our journey is underway.

Like many large, mature international companies, our organisation is distributed. Our product delivery & TechOps teams span 5 geographical locations – Belgrade (Serbia), Gothenburg (Sweden), London (UK), Quebec (Canada) and Stoke (UK) – and across these teams we manage about 15 different platforms. Our challenge was to create a DevOps strategy and implement change in a flexible manner across all delivery teams.

As with any distributed organisation, we have suffered from communication barriers, although tooling such as Skype, Slack and Zoom is helping to break them down. More fundamental issues existed, however, such as differing terminology, multiple tools being used for the same job, differences in skills and abilities between locations, and silos. One example of a silo was our TechOps team being a separate, centralised group with different goals from the engineering teams. When groups that need to work together are not aligned and have different goals, this causes friction. In our case, because of the way we were organised, the multiple concurrent requests coming into TechOps from the various engineering teams made it hard for them to service all teams at the same time, which caused delays.

The differences in tooling and processes have created a barrier that slows us all down. We needed a new approach and developing a DevOps strategy has been one of the answers for us.

Our DevOps Strategy

In developing our DevOps strategy we wanted all teams to speak the same language and have a shared understanding and shared skills. We wanted to break down the silos that had been built up over time, bringing teams closer together and aligning resources to delivering products, so that we can be more agile and nimble, developing and releasing high-quality products quickly, efficiently and reliably. Echoing the Agile manifesto principles:

Our highest priority is to satisfy the customer through early and continuous delivery of valuable software – Principle #1

Deliver working software frequently, from a couple of weeks to a couple of months, with a preference to the shorter timescale – Principle #3

Coalescing our ambitions and desires, and mindful of the agile manifesto principles, we defined 4 main objectives for our DevOps strategy:

  • Maximise for delivering business value.
  • Maximise the efficiency and quality of the development process.
  • Maximise the reliability of applications and environments.
  • Maximise for service delivery.

With these objectives we started to define requirements to achieve them. Quickly we ran into a mountain of requirements and with that a prioritisation nightmare: how to prioritise the requirements across 5 global locations and 15+ delivery teams, each with different needs.

The Maturity Models

After several rounds of attempting to prioritise in a sensible way we began to arrange the requirements into themes and with that a Maturity Model evolved; one maturity model for each objective.

Maximise for delivering business value. This goal is centred on continuous delivery, creating fast runways down which we can launch our applications.


Maximise the efficiency and quality of the development process. This goal is centred on continuous integration, creating the environment to launch a battery of automated tests and gain fast feedback to be able to evolve code.


Maximise the reliability of applications and environments. This goal is centred on instrumentation, creating the visibility into the inner workings of our applications for root cause analysis and fault tolerance.


Maximise for service delivery. This goal is centred on organisational change, creating alignment of cross-functional teams responsible for delivering software.


The Maturity Models are great; they provide a vision of what our strategy is. By defining the steps required to progress to advanced levels of DevOps, we can set long-term and short-term targets on different themes or levels to be reached. They’re modular, so we can change the strategy if improved technology or processes become apparent, and fill in gaps where they exist.

Flexible Planning

The nice thing about the maturity models is the flexibility they provide. They are maps that can guide you from a low maturity to a high maturity level of DevOps. If you imagine using a map to plan a route from A to B, then depending on various conditions, such as day of week, time of day, potential traffic density, road speed, road works, etc., the route you choose will be the one most appropriate to your circumstances.


The DevOps maturity models are true roadmaps, as opposed to a linear list of requirements, allowing each delivery team to navigate its own path based on its context and on what is most important to it, or what concerns it has, at any point in time. Furthering this flexibility, the Maturity Models allow teams to change their routes and reprioritise their plans in concert with business changes and needs.

When a team selects and completes a portion of the maturity model that no other team has yet reached, there is an additional benefit: the problems solved by that team can be shared with all other teams, allowing them to complete the same work faster and avoid the pitfalls the early-adopting team has already learnt about.

Even though all product delivery teams have the flexibility to select their own routes to achieving our DevOps objectives, ultimately everyone ends up at the same location. So the maturity models enable various programs of work to be planned across different teams with very different needs and abilities.

Standardisation

As good as our maturity models are, they couldn’t solve a couple of issues that still existed: we were using multiple tools to do the same jobs, and we spoke different languages because we used different terminology for the same things. To solve this, prior to kicking off our strategy we set up focused working groups to define and agree a set of standards for tooling, definitions of terms (e.g. naming conventions), best practices (e.g. code reviews) and core specifications (e.g. logging, heartbeats & health checks).

Our Core Tooling

  • Git – Source Control
  • GitLab – Git Management & Code Reviews
  • Jenkins – Application Builds
  • SonarQube – Code Quality Reporting
  • Sonatype Nexus – Package Management
  • Rundeck – Operational Support
  • Octopus Deploy – Deployment (Windows only)
  • Chef – Configuration Management

Standardising our tooling and specifications for implementing instrumentation meant we could reduce support overheads, share knowledge and solve problems once. Guidelines and best practices meant we’re working in the same ways and all have shared understanding. Definition of Terms meant we could all speak the same language and avoid confusion.
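The specifications themselves aren’t reproduced in this post, but as a rough, hypothetical sketch, the kind of heartbeat/health-check endpoint such a spec might standardise can be very small; here using the JDK’s built-in HTTP server, with the path and payload invented for illustration:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a standardised heartbeat endpoint; not Ticketmaster's actual spec.
public class Heartbeat {
    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/healthcheck", exchange -> {
            // A real implementation would also check downstream dependencies here.
            byte[] body = "{\"status\":\"UP\"}".getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
    }
}
```

Agreeing on one such shape across teams is what lets monitoring and deployment tooling treat every service the same way.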

With the maturity models and standards we have created a shared vision and given each product delivery team the flexibility to plan what they want to work on. We have created a framework that enables all product delivery teams to start working towards the DevOps objectives in parallel, while focusing on what’s important to their own needs at any given point in time.

Sold-out Event Kicks Off Developer Outreach at Ticketmaster

As we prepare to roll out a ton of new technology and deliver APIs that developers love this year, we’ve started to engage the developer community to get their feedback on our redesigned developer portal and the updates to our APIs before we launch them.

Last Saturday, we held our very first event in Scottsdale, AZ, to a sold-out audience of enthusiastic geeks. The event helped set the tone for all future engagements with the developer community. It also clearly showed how critical external developer feedback is to the health and stability of a platform, and why we need to hold more of these events, more often.


Over 75% of registered participants showed up; the market average for free events like this is about 40%. I think we did well 🙂

The turnout showed how much the community is interested in what we have to offer. Some of the attendees flew in from Virginia, NY, Washington and California. The energy and the feedback we received were far more than I had hoped for.

Also, five API demos were given at the end of the day, which was great to see🙂


The Results

The verbatim feedback we received from developers was overwhelmingly positive, with lots of suggestions on what we could do better. Here are some examples of what they said:

  • “The whole platform is very clean and simple. It’s very easy for any developer to jump right into it.”
  • “all good…took a few minutes to get oriented to the various parts.”
  • “I found myself flipping between the interactive demo and the static documentation a lot. It would be helpful if I didn’t need to do that.”
  • “API response docs should be present”
  • “So far I can see it is very easy to use, with clean documentation and quick starter.”
  • “I’m a biz dev guy…the fact that I can understand any of it makes it pretty impressive from my pov.”
  • “The API needs to consistently deal with images for non-events. We also need a way to get a link to an event itself, not its attraction(s). Also, it needs to have a way to identify the base URI for relative URLs.”

A few common themes emerged:

  • API bugs
  • Data inconsistencies
  • Documentation completeness
  • Need for tools and SDKs

We gathered detailed feedback throughout the day and the team is currently addressing it. The state and quality of the platform will be a whole lot better by the time we hold our next event in Los Angeles in February (stay tuned).

Open Platform NPS = 65

Overall, our initial Net Promoter Score, or NPS, is 65. This is a pretty high score (Amazon’s is 65) and shows a lot of goodwill from the developer community. Our goal is to keep our NPS above 70. Now that would be amazing!🙂

Aside from the written feedback, we also asked developers to rate us on various aspects of the platform. Here’s how we fared:


Stay Connected

Follow us on Twitter and subscribe to our Medium Publication to be the first to learn when we launch the developer portal and make our APIs publicly available. Exciting times ahead for the Ticketmaster Open Platform🙂

Tests and Comic Strips: How Dilbert Explains a Philosophy of Testing

At Ticketmaster, we’re committed to testing. Automated tests improve our stability, helping us sell tickets to more fans and add new features. As a developer, I have an additional motivation to write tests well: tests illustrate the intended behavior of my code. Besides achieving good code coverage, an articulate test quickly teaches another person what my code does. Here are a few cues that I’ve settled on, inspired by comic strips.

Testing speeds things up.


DILBERT © 2015 Scott Adams. Used By permission of UNIVERSAL UCLICK. All rights reserved.

Keep it short

Three frames, four lines of dialogue: that’s limited space for pictures and words. Readers will skip over a fifty-line test block with glazed, lizard eyes. Do you want your readers to pay attention? Keep it tight, then.


DILBERT © 2015 Scott Adams. Used By permission of UNIVERSAL UCLICK. All rights reserved.

Stay focused

Test one feature at a time.


DILBERT © 2014 Scott Adams. Used By permission of UNIVERSAL UCLICK. All rights reserved.

Do you love Game of Thrones? Complex subplots, amirite? Intertwining narratives are fun, but testing many things at once is just confusing. Instead of writing one script that covers all possible cases, I separate each behavior into its own test. One strip, one joke; one test, one case. Isn’t that a Marley song?

Be self-contained

The Boss wants data-driven product releases.


DILBERT © 1994 Scott Adams. Used By permission of UNIVERSAL UCLICK. All rights reserved.

Do you recognize Dilbert and the Boss? Of course! Did you forget what happened before this strip from 1994? Maybe a fanboy remembers, but everything you need to know to get the joke is in these frames. I describe this as episodic style. The tests form a series of short narratives with familiar actors, but the events of one test do not impact the outcome of another. When tests rely on the same data, I generate fresh objects before each test. Tweaking the data for a particular test won’t pollute the tests that come after it, and my readers won’t need to track state as they scroll down the page.
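In JUnit terms, that episodic style might look something like this hypothetical sketch (the fixture here is deliberately trivial): the data is rebuilt before every test, so one test’s tweaks never leak into the next.

```java
import java.util.ArrayList;
import java.util.List;

import org.junit.Before;
import org.junit.Test;
import static org.junit.Assert.assertEquals;

// Hypothetical sketch of "episodic" tests: a fresh fixture before every test,
// so no state carries over from one test to the next.
public class BasketTest {
    private List<String> basket;

    @Before
    public void freshBasket() {
        basket = new ArrayList<>();
        basket.add("Floor ticket");
    }

    @Test
    public void clearingTheBasketEmptiesIt() {
        basket.clear();
        assertEquals(0, basket.size());
    }

    @Test
    public void aFreshBasketStillHoldsOneTicket() {
        // Unaffected by the clear() in the previous test.
        assertEquals(1, basket.size());
    }
}
```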

Clear narrative

Have you noticed the similarity between Given-When-Then stanzas and the Arrange-Act-Assert pattern? These templates both resemble story structure because people remember narratives. While writing a test, I map the logic into exposition, rising action, and resolution. If any phase is longer than a line or two, I use comments or wrapper functions to delineate the boundaries. But I can’t hide too much detail—the punch line loses its sting if you don’t know the Boss practices feng shui martial arts.
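As a hypothetical sketch (the allocator is invented purely for illustration), the same three-beat structure reads like a very short story:

```java
import org.junit.Test;
import static org.junit.Assert.assertEquals;

// Hypothetical sketch of Arrange-Act-Assert; TicketAllocator exists only for this example.
public class TicketAllocatorTest {

    @Test
    public void allocatingMoreSeatsThanRemainCapsAtTheRemainder() {
        // Arrange (Given): an allocator with three seats left.
        TicketAllocator allocator = new TicketAllocator(3);

        // Act (When): a fan asks for five seats.
        int allocated = allocator.allocate(5);

        // Assert (Then): only the remaining three are granted.
        assertEquals(3, allocated);
    }
}

class TicketAllocator {
    private int seatsRemaining;

    TicketAllocator(int seatsRemaining) {
        this.seatsRemaining = seatsRemaining;
    }

    int allocate(int requested) {
        int granted = Math.min(requested, seatsRemaining);
        seatsRemaining -= granted;
        return granted;
    }
}
```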

Don’t leave your app defenseless.


DILBERT © 2015 Scott Adams. Used By permission of UNIVERSAL UCLICK. All rights reserved.

Tie it all together

There are many elements to a comic strip: the dialogue, the characters, the scenery. These layers all converge on a single message. Do you see Rodney’s wrinkled tie? This subtle detail emphasizes just how unsafe the Boss’ new product is. Without pictures, variable names must evoke imagery that reinforces the behavior being tested.

Poor Rodney—no wonder his tie is wrinkled.


DILBERT © 2010 Scott Adams. Used By permission of UNIVERSAL UCLICK. All rights reserved.

That’s all. I write short tests that focus on a single behavior. I name variables and functions carefully, and isolate state within each test. I ensure that the sequence of actions conveys a narrative structure. These elements of style focus the reader’s attention on my code’s behavior and foster clear understanding in a memorable way. I hope you find these cues helpful.

Love it? Hate it? Know any good jokes? Tell me what you think by dropping a note to matt@ticketmaster.com.

 

Using Multiple Languages in Your Development Environment

Many modern software engineers work on multiple projects over the course of their careers, each with different requirements.  These requirements often cause us to consider different tools and even different languages to get our work done faster and more efficiently.  The Commerce Order Processing and Order Details team took a different approach and decided to integrate multiple languages into its development environment, using Java and the JVM as a base.

The Transition to Groovy

Java is a great language.  It’s well supported, it’s standard in many large corporate enterprises, and it’s taught in virtually every CS/CE program, so you have a large pool of talent to draw from.  It allows you to create highly structured programs and enables code reuse to a large extent.

But sometimes Java is frustratingly inflexible.  Code that should be dead simple becomes very complicated because the language forces you down a particular philosophy.  Worse, experienced Java developers tend to create designs that encourage complication and lots of structure.  For a team that wants to be more flexible and agile, this is not good.  Java 8 introduced some mechanisms designed to address some of this, but they are in the very early stages.

The Order Details team at Ticketmaster wanted to experiment with design philosophies to encourage faster development and more flexibility; however, we had a large code base that we needed to add features to and we didn’t want to rewrite most of this code.  So instead, we decided to test something out – introducing a language that wouldn’t be as verbose as Java.  Enter Groovy.

Groovy is a language that compiles to the Java Virtual Machine.  Its syntax builds on top of Java, which makes it very good as a transitional language – you rename a file’s extension from .java to .groovy, and it just works.  Groovy has a lot of features that make it attractive: true closures, automatic field access, syntactic sugar to compact code, etc.

This isn’t possible without the JVM itself, which allows you to use multiple languages in the same program.

Add in some Clojure

We needed to add a feature to our Order Details service that involved custom data translation.  The details for that will be in a future blog entry, but eventually we decided to use Clojure as our implementation language, as it’s naturally suited to this type of feature.  A similar feature that existed in one of the older versions of our service was 1-2 orders of magnitude larger in terms of lines of code, so we were motivated to make this change.

However, Clojure is also a more difficult language to adopt.  To help ease the transition, the members of our team read documentation and books on the subject; we also used pair programming heavily to spread knowledge through the team quickly.

It took around 1-2 months to finish the feature; however, the schemas for creating the translations were easy to understand, and by and large the engine needed very little modification once it was written, so the approach was effective.

Advantages to the Approach

The main advantage of this approach is that you can write code that is particularly well suited to a particular problem without having to throw out existing code.  The Clojure code, had it been written in Java, would have been several times larger and much harder to debug.  In addition, if we needed something that was better done in Groovy, we wrote it in Groovy.  This included unit and integration tests, which are not well supported in Clojure.  The Groovy change allowed us to slowly transition into a language that let us move faster, without having to slow down and learn an entirely new language.

Disadvantages to the Approach

However, languages need to be supported, and new team members (whether they’re new hires or other teams working with our code base) need more time to ramp up on the code, especially if they come from a mainly Java background.  We decided the code for the feature was isolated enough that the transition could be done over time, but it’s still an issue.

In addition, there are certain architectural problems you won’t be able to solve with solely a language change.  In that case, it may be better to start from scratch.

Who should use this approach and why would they benefit?

I don’t think this approach should be used by all teams, but teams with the characteristics below could be well served by it:

  • A high degree of autonomy and engineers who understand multiple languages
  • A domain-specific problem set better suited with another language
  • An experimental approach to target a specific technology without leaving their current development environment

 

Conclusion

Using multiple languages in the same development environment can be useful when you have a specific problem domain, or when you want to transition to an approach you believe will let you work more efficiently.  While our approach uses the JVM as the target compilation platform, JavaScript can be, and has been, used as a platform as well.  The approach isn't for all teams, but it can yield great gains when used correctly.

2015 London tmTechEvent – Ticketmaster’s in-house tech conference

hotel

“Welcome to the tmTechEvent 2015,” John McIntyre, head of PMO at Ticketmaster International announced as he scanned my badge. The scanner beeped appreciatively as it recognised my credentials and I was granted entry – just like going to the O2! Apart from the fact that this was our annual Ticketmaster technology summit meeting in a conference room just up the road from our offices at the edge of London’s Silicon Roundabout. But this wasn’t any old conference. Being in the live entertainment space, we had quizzes (using Kahoot.it), a live Twitter feed (@TMTechEvent) and a party in our London HQ’s basement bar complete with gourmet burger van!

As I entered the room there was a palpable sense of tension and nervous excitement in the air as I greeted my colleagues from the Sports division: with the Rugby World Cup starting on Friday, their systems would be under the spotlight – or was it just a matter of doubt over England’s ability to deal with a tricky opening tie with Fiji? Ultimately both fears proved unfounded – all the events ran smoothly and England prevailed.

Along with leaders from all of Ticketmaster’s other technology teams, we had joined together to take stock of our progress in revamping our technology real estate. The four-day event was packed full of seminars, workshops and group sessions, with the overall aim of evaluating our strategies and determining where to correct our course. We followed the guiding principle of “focus where you want to go, not what you fear.”

wheretogoquote

We pulled in leaders from all over the business to help shape our vision of where we’re going. We combined this with focused feedback sessions on the various aspects of our strategy to determine how best to adapt to the changing landscape. Using workshops to facilitate a rich exchange of ideas, we covered subjects from staff satisfaction to talent management to deeper technical subject matter such as engineering KPIs and creating reference architectures.

Other highlights included live demos of in-house tools, including one created to give visibility into the progress of our DevOps program. It was really cool to see each team’s progress in one place across our home-grown four-part maturity model. Even better was sharing the whole event with colleagues across Engineering and Operations and feeling a real sense of unity, proving that good DevOps is about culture change and not just a bunch of new processes!

conference 2015

The sheer scope and depth of material covered was brain-bending. We rolled out our new career mapping program, providing a structured career map and promotion process across all of our engineering teams. We had a thorough review of the initial results of our innovative and much-talked-about technical debt management program that we rolled out at the start of the year. We reviewed the progress being made on our employee feedback survey, to ensure that the concerns of our engineers are being taken seriously (it’s not just about having more opportunity to play ping pong!).

Overall this was an inspiring event. There was a tangible confidence and will to achieve our ambitions, based on a very real sense of achievement from how much we had already changed things for the better in Ticketmaster Engineering. The desire to increase collaboration and tackle the bigger challenges ahead together was strong.

Having reset our sights on the organisation’s vision, our technology vision and our engineering vision, we left as a noticeably energised and motivated group of rock-star tech leaders, ready to take that vision back out to our teams and the company. The future of better live entertainment starts here!

In Search of a Reactive Framework (or: How we select new technologies)

About seven or eight months ago we started looking at new and improved ways of creating services instead of our tried-and-tested Spring Framework based approaches. The Spring framework certainly has its merits and we have used it with much success; however, the service we were about to write, called Gateway, was to route millions of requests to other services further down in our architecture layers:

GatewayDiagram_v1

We knew that the service had to run efficiently (thereby saving on hardware) and scale effectively. The event driven, reactive approach was one that we were looking to embrace. After much research we had some concerns. Not only is that space full of different frameworks at different levels of maturity, but we also had to consider our current skill set, which is predominantly Java with a small amount of Scala, Ruby, Clojure (which our data science guys use) and a handful of other languages we’ve picked up through company acquisitions. How could we adopt this new paradigm in the easiest possible way?

This blog post details the approach we used to select the framework. It’s a tried and tested approach that we’ve used before and will continue to use and improve upon in the future.

How we did it

Before we describe the stages, suffice it to say that there is no point in doing a technology selection without a business context and an idea of the service(s) the framework will be used to build.

The technology selection was broken up into the following steps:

  • Identify principles – these are the rules that the framework must adhere to. These are properties that a framework can meet in different ways and contain a degree of flexibility
  • Identify constraints – these are rules that cannot be broken or deviated from in any way. If any are broken, the framework is no longer a candidate.
  • Create a short list of 5 – 10 candidate frameworks
  • Determine high-level requirements and rank using MoSCoW prioritisation
    • Read the documentation, Google groups and other relevant articles and trend data to determine conformance to the requirements, including all options and workarounds if applicable.
    • If any of the Musts are broken by a candidate then it is dropped
    • Create a short list of, ideally, three candidates
  • Create a second set of more detailed requirements, rank using MoSCoW and weight in terms of importance
  • Determine the architecturally significant business stories and error scenarios that the service to be built from the framework needs to implement:
    • Write the end-to-end acceptance tests for these stories. The primary scenario with one or two error scenarios is sufficient.
    • Implement these end-to-end acceptance tests in all three frameworks – this will give an idea of how well each framework fits the service’s paradigm(s) and how easy it is to work with in the develop/test cycle. Also make sure to post on the message boards or mailing lists to see how quickly a response arrives from the framework maintainers.
    • Update the second set of more detailed requirements with the results of this experience

Our results are pretty detailed, so they have been added to a separate PDF that you can download here. We have left the results in raw format and hope they will be a good reference for others.

Outcome and experiences to date

As you can see from the PDF of results, we chose Vertx. It won out not only because of its raw power, but because of its fantastic architecture, implementation, ease of use and Google Groups support, and the fact that Red Hat employs a small team to develop and maintain it. Indeed, a few weeks after we selected it, it was announced that Red Hat had hired two more engineers to work on Vertx.
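If you haven't seen Vertx before, here is a minimal sketch of the event-driven style it encourages: a non-blocking HTTP server whose request handler is just a callback. This is purely illustrative and not our Gateway code:

import io.vertx.core.Vertx;

public class HelloGateway {

    public static void main(String[] args) {
        Vertx vertx = Vertx.vertx();

        // The request handler runs on the event loop, so nothing here
        // blocks a thread while waiting for I/O.
        vertx.createHttpServer()
            .requestHandler(request ->
                request.response()
                    .putHeader("content-type", "text/plain")
                    .end("Hello from Vertx"))
            .listen(8080);
    }
}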

So overall we have been very happy with our selection of Vertx. We had version 2.1.5 running in production for several months and recently upgraded to Vertx 3. The maintainers’ swift responses on the Vertx Google Group definitely helped during our initial development phase and during the upgrade to version 3. Performance-wise, the framework is extremely fast, and we know that any slowdown is most likely due to what we have implemented on top of it. Adoption has been a success: from a team of two developers, we scaled to four and now eight. Choosing a Java-based framework has been a boon, as the only additional complexity developers joining the team needed to learn was the event-driven nature of Vertx (i.e. the framework itself). Had we chosen Scala/Play it would have been much harder. Indeed, with the success of Vertx, our decision to standardise on the JVM as a platform and our embracing of the reactive approach, we now have a couple of services being built using Scala and one using Scala/Play.

It would be great to hear of your experiences using reactive frameworks. Which ones did you choose? How easy were they to adopt? Please leave a comment below.

Shipping High Quality Mobile Apps

I want to share a story about a recent career transition I made – moving from program management to quality engineering. I was very nervous about the change, since I wasn’t 100% sure whether the decision I made was the right one. There was a huge opportunity on the table to deliver more value to our fans more often. The risk, of course, was that quality would slip.

Read on to hear how we became featured in the Google Play Store and in Android Central as “AC editors’ Apps of the Week”!

Joining the Android Team

As I joined the passionate team of Android engineers, I knew this was the beginning of something great and challenging for all of us. As a team we quickly built a strong app foundation and developed a culture of teamwork and support for one another. Our fundamental rule was: check your ego at the door! Everyone on the team supported each other like family. We had a few goals in mind:

  • Reduce Regression time from 2-3 weeks to 8 hrs
  • Build once and ship everywhere
  • Ship high quality software together

In this post, I want to share my personal experience as well as how my team was able to set up our automation framework.

Why Automation?

It all started with a few automation training sessions with our Quality Architect – Eric Lee – in a small conference room on the 2nd floor of our building. As I remember, the room was packed with eager QA engineers wanting to learn about making automation work for our mobile apps. Eric demoed sample tests that navigated through a few pages of our app using Cucumber, Appium and Java. When we came out of the training session, we realized how we could use these types of tests to help us meet our main goal of reducing regression time from 2-3 weeks to 8 hrs.

Regression test
One of our regression tests

We used our 10% time to set up a small device lab. We started with one device, since we came across some challenges running the Ticketmaster Android app across multiple devices due to our Appium configuration. We updated our configuration to assign each instance of Appium a unique port, as well as a unique port for each of the devices.

Config Changes
Screen Shot 2015-10-23 at 12.06.12 PM

After making these changes, we were able to execute our test suite across four different devices simultaneously. Our regression tests run overnight and detect any ongoing problems while development is in progress. The team gets a detailed report of what failed in our automation tests, making everyone aware of any issues that may have been introduced during development. This helps our team focus on critical release regressions and allows us to collaborate with one another to deliver the highest quality app.
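As a rough sketch of what that looks like from the test side (device names, serial numbers, app paths and ports here are all hypothetical), each Appium server instance is started separately on its own port, and each driver is bound to a specific device by its udid:

import java.net.URL;
import org.openqa.selenium.remote.DesiredCapabilities;
import io.appium.java_client.android.AndroidDriver;

public class ParallelDrivers {

    public static void main(String[] args) throws Exception {
        // Device 1: talks to an Appium server started on port 4723
        DesiredCapabilities caps1 = new DesiredCapabilities();
        caps1.setCapability("platformName", "Android");
        caps1.setCapability("deviceName", "Nexus 5");          // hypothetical device
        caps1.setCapability("udid", "0a1b2c3d");                // hypothetical serial
        caps1.setCapability("app", "/builds/ticketmaster.apk"); // hypothetical path
        AndroidDriver driver1 = new AndroidDriver(new URL("http://127.0.0.1:4723/wd/hub"), caps1);

        // Device 2: talks to a second Appium server instance on its own port (4733)
        DesiredCapabilities caps2 = new DesiredCapabilities();
        caps2.setCapability("platformName", "Android");
        caps2.setCapability("deviceName", "Galaxy S5");         // hypothetical device
        caps2.setCapability("udid", "4e5f6a7b");                 // hypothetical serial
        caps2.setCapability("app", "/builds/ticketmaster.apk");
        AndroidDriver driver2 = new AndroidDriver(new URL("http://127.0.0.1:4733/wd/hub"), caps2);

        // ...run suites against each driver in parallel, then quit.
        driver1.quit();
        driver2.quit();
    }
}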

Android Build pipeline
AndroidBuildpipeline

Automation report
Automationreport

In 2014, with no automation coverage, we released our apps only 4 times. This year, on the other hand, we have published more than 19 versions of our TM app to the marketplace. What a difference a year makes! To date, our automation coverage is up to 63%, focused on the critical path of our app. We are still not done, and we are continuing to add more features to our automation suite to get closer to our goal of reducing regression time to 8 hrs. With automation in place, we are equipped to release to the marketplace more frequently, bringing key features to our fans quicker.

Detailed automation report

Next up for the Android team is publishing our newly designed app across all international markets. This will give our team a more maintainable code base supporting all regions. As you can imagine, our automation is critical to supporting more markets. The ability to run the same tests across the different binaries for each region, and get detailed test reports for very little additional effort, is invaluable. Our monthly releases going forward will support a total of 5 markets – build once and ship everywhere!

Culture Change

Our Android team has the “do it and let’s do it together” attitude that makes me want to come to work every day. During the past few months, the team has made significant improvements to how we manage our releases and our sprint schedules. With help from our project and product managers organizing our stories, our grooming sessions have become more engaging and interactive. That helps our team focus on the features we will be developing. Our demos and retrospectives have improved as well – our engineers showcase their work to stakeholders and other teams, giving each engineer ownership of the work they completed. One of our stakeholders said this after one of our demos:

“This is a TEAM!

Well done guys. Your pride shows in your work, and it also showed in the demos you gave at the last showcase. I am inspired and impressed. You have improved over the past couple of months, and that shows in this latest release of your app.”

Our quality and software engineers are very passionate about delivering the highest quality features to our fans. For each and every submission and release, the team’s ownership and responsibility is second to none. Whether it’s looking at the reviews on the Google Play store, our crash-free rate, daily active users – the team has great pride and commitment to deliver a high quality app.

Being featured on Google Play Store and Android Central as “AC editors’ Apps of the Week” was a great achievement thanks to the hard work and dedication of the team. We’re continuing to make changes every day to help our team be more efficient. A great example of this is our recent decision to include an app release within each sprint, inclusive of the sprint’s work (as opposed to releasing in the subsequent sprint). Our Android team has become like a family. We enjoy working with each other and we build software in a cohesive way. We are a dedicated team and committed to a mission to be better each day when we come to work at Ticketmaster.

References:

Jenkins: https://jenkins-ci.org/

Appium: http://appium.io/

Selenium: http://www.seleniumhq.org/projects/grid/

Cucumber: https://cucumber.io/docs

Ticketmaster and Button: A Mobile Commerce Experience

Ticketmaster is hard at work creating world-class mobile consumer experiences, but even when you are a major player in ticketing, finding engaged audiences with intent is always a challenge. We need to continually be there for our fans while they are in the process of exploring the things they love – bands, sports teams, theater – in best-of-breed publishers, and find a way to seamlessly deliver them into our apps. Apps that are iterating at what feels like warp speed.

In the last year alone, we have released 11 versions of our iOS app. During this time we have delivered: a Universal version (iPad compatibility), In-Venue Seat Upgrades, ApplePay, Search Suggest, iCloud Account Sync, Seat/Section Preview, Sign-In after Offering, Accepting Transferred Tickets, Camera-Scanning Credit Cards, and iOS9 App-Content Searching.

As a global market leader and incumbent in the space, we are continually finding better ways to iterate, test and evolve more quickly. This doesn’t always mean developing technology in-house. We frequently test and implement exciting new ideas in the marketplace through third-parties. Through one partnership with Button, we get access to many other companies that are philosophically aligned with our own customer acquisition goals.

Fans use a variety of mobile apps to consume content like music, videos, news, or sports scores and stats. Most apps focus on a particular form of media, like music streaming, or a particular group of fans, like hockey fans. This leads to a fragmented mobile commerce marketplace, which is something we’re constantly thinking about and developing for. For example: how do we enable discovery of related content across many disparate apps?

Button provides a deep-linking connective tissue between these disparate apps. In fact, Button’s integration techniques are actually quite straightforward and easy to use. This is how it works:

Let’s say we have a Great Music App that provides some amazing music streaming services. That app may want to offer users a button that links to more content by a band. The content could be videos, news, or in our case: concert tickets. In the app, this “button” can be created using the Button SDK. This is a subclass of UIControl which includes some code that creates a deep link into an external app. Button will also provide an Affiliate ID to make sure this app gets credit for any purchases made due to the link.

There is a little bit of a tricky part here: First, we need some kind of common name or ID for the artist to make sure we land the user in the right place in the linked app. Second, depending on whether the linked app is installed or not, your button could open a link to the Apple AppStore or a deep-link directly into the external app.

The deep link opens the Ticketmaster app directly onto a page listing upcoming concerts by the specified artist. If the user then purchases tickets, the Order ID and Amount are sent to Button along with the original app’s Affiliate ID. It’s convenient for us and secure for the fan.

Music App Example:

button_flow

From the linked-app side, everything is very simple. The app only needs to handle two events:

  1. App has opened with a deep-link and an affiliate ID
  2. User has placed an Order (bought tickets)

We handle the app opening in the appDelegate. All we really need to do here is store the Affiliate ID for later. The Button SDK can help here:

button_ad_code

Next, when a purchase is completed, we look to see if we have a stored Affiliate ID and send it to Button along with the Order Number and Price. The Button SDK handles this for us as well:

button_order_code

So two lines of code and done!

Observations:

Now, given how simple these operations are, you might question the need for the Button SDK at all. Button has already thought of this and also provided a simple network API that your app can call directly to get everything you need. The API is a little more code, but allows the transparency and flexibility needed to make sure Button integrates perfectly with your existing security and coding standards.

I feel like Button could do a little more to solve that tricky business I mentioned earlier in the originating app, but for the linked app, their implementation couldn’t be simpler.

Button has provided an amazing first step to solving the big problem of linking content across the diversity of media-rich apps found on mobile devices today and in the future. The future of the mobile commerce marketplace – better for the developer and better for the fan.

Experimenting With Efficiency

Have you ever felt bogged down by the weight of process? I’m experimenting with increasing efficiency and reducing workload on my team at Ticketmaster by applying lessons from Gene Kim’s book on devops – “The Phoenix Project”. By learning and implementing what the book calls “The Three Ways”, we hope to drastically increase our productivity and quality of code, all while reducing our workload.

The First Way is defined as “understanding how to create fast flow of work as it moves from one work center to another.”1 A ‘work center’ can be either a team or an individual who has a hand in working as a part of a larger process. A major part of creating fast flow of work relies on improving the process of hand-offs between different teams. By working on improving visibility of the flow of work, one is able to both get a better understanding of the current workflow and identify which work centers act as bottlenecks.

The Second Way, “shortening and amplifying feedback loops, so we can fix quality at the source and avoid work”2 is about being able to understand and respond to the needs of internal and external customers. In order to shorten feedback loops, one should find ways to reduce the number of work centers or the number of steps it takes to complete a task (including but not limited to combining teams, removing steps altogether, or automating certain processes). The other part of the Second Way requires reducing work at the bottlenecks or otherwise finding ways to remove work from the system, so that the feedback for the work left in the system can be emphasized.

The Third Way is to “create a culture that simultaneously fosters experimentation, learning from failure, and understanding that repetition and practice are the prerequisites to mastery”3. Major components of the Third Way include allocating time for the improvement of daily work, introducing faults into the system to increase resilience, and creating rituals, such as code katas or fire drills, that can expose people to new ways of doing things, or help them master the current system.

In our workplace, we are working on applying the Three Ways to improve our daily lives. Currently, we have several teams in different geographical locations working on the same codebase. Initially, this led to many dependency conflicts, lots of tasks being blocked by other teams, and many issues with communication.

We recently started using a kanban board in order to give us better visibility into our workflow, and have added a column on the board for every hand-off between teams. The focus is now on finding ways to reduce the wait time between columns. We have put together checklists in order to aid with communication and improve quality, so that wait times might be reduced. Simultaneously, we are working on ways to remove our reliance on other teams for things such as code review. There are still problems regarding story blockers, but it is hoped that these problems can be solved in the long run by either re-structuring the team’s responsibilities to match the system’s design, or vice versa.

kanban2

Figure 1. Our team’s kanban board

Applying the Three Ways is still a work in progress, but we are already seeing benefits.  Whereas before our product people were creating tasks faster than us developers could work on them, now they are scrambling to keep up with us. Although we still have to deal with stories from other teams acting as blockers, the flow of communication has greatly improved, and the decrease in wait time has been noticeable. Any time gained by our team is being used for “10% time”, which is time dedicated to either research or tasks that will help our team improve daily work and overall efficiency.


1. Gene Kim, The Phoenix Project, Page 89
2. Gene Kim, The Phoenix Project, Page 89
3. Gene Kim, The Phoenix Project, Page 90

Ticketmaster’s Interactive Seat Map Technology from Flash to the Future

If you have a mobile device or have read the news lately, you may have noticed that there are issues with browser plug-ins such as Flash Player. Visiting a website developed with Flash can expose you to security issues if your plug-in is not up-to-date, and if the plug-in is disabled or unavailable, you can experience reduced functionality.

Currently, the Interactive Seat Map (ISM) feature on our website, ticketmaster.com, is powered at its core by a Flash component. We are doing our best to continue to ensure a smooth and safe ticket buying experience in this rapidly changing environment, such as making sure click-to-play works against our current Flash ISM. At the same time, we are researching and developing multiple new rendering technologies – from building a JavaScript SVG and HTML5-compatible ISM, to an OpenGL ISM for use in native mobile applications, to server-side rendering technology. Further, these tools will give our clients access to customized seat maps for reports and to power our fan views when there is not a need for interaction.

Fans should begin seeing these improvements today on trial events through our mobile website and in the coming months in our new responsive website that we are sending traffic to for some events. Until then, I hope everyone who wants to pick their seat and get a ticket enjoys our distinctive feature of being able to see and select not only verified but exact seats using the ISM.

I know I do, as I used it to purchase three Chicago White Sox tickets two days before a game in July for a family outing. We were able to pick the seats we wanted in row two of the Chris Sale K-Zone section, and see Chris Sale beat Mark Buehrle in a two-hour game in perfect seats. What a great time!

ism

For help with the interactive seat map, please see our FAQ.


Brad Bensen is a Software Architect for the Inventory domain.

Symptom-Based Monitoring at Ticketmaster

monitoring_dash
When Rob Ewaschuk – a former SRE at Google – jotted down his philosophy on alerting, it resonated with us almost immediately. We had been trying to figure out our alerting strategy around our then relatively new Service-Oriented Architecture – the term microservices hadn’t quite entered the zeitgeist at the time.

It’s not that we didn’t have any alerting. In fact, we had too many alerts – running the gamut from system alerts like high CPU and low memory to health check alerts. However, these weren’t doing the job for us. In a system that is properly load balanced, a single node having high CPU does not necessarily mean the customer is impacted. More so, in an SOA, a single bad node in one service is extremely unlikely to result in a customer-impacting issue. It’s no surprise, then, that with all the alerting we had, we still ended up with multiple customer-impacting issues that were either detected too late or – even worse – detected through customer support calls.

Rob’s post hit the nail on the head with his differentiation of “symptom-based monitoring” vs “cause-based monitoring”:

I call this “symptom-based monitoring,” in contrast to “cause-based monitoring”. Do your users care if your MySQL servers are down? No, they care if their queries are failing. (Perhaps you’re cringing already, in love with your Nagios rules for MySQL servers? Your users don’t even know your MySQL servers exist!) Do your users care if a support (i.e. non-serving-path) binary is in a restart-loop? No, they care if their features are failing. Do they care if your data push is failing? No, they care about whether their results are fresh.

It was obvious to us that we had to change course and focus on the symptoms rather than the causes. We started by looking at what tools we had at our disposal to get symptom-based monitoring up and running as soon as possible. At the time, we were using Nimbus for alerting and OpenTSDB for time series data, and then we had Splunk. Splunk is an industry leader for aggregating machine data – typically log files – and deriving business and operational intelligence from that data. We had always used Splunk for business analytics and for searching within logs while investigating production issues, but we had never effectively used Splunk to alert us to those issues in the first place. For a symptom-based monitoring tool, Splunk now stood out as an obvious candidate for the following reasons:

  • Since Splunk aggregates logs from multiple nodes, it is possible to get a sense of the scale and scope of the issue.
  • It also allowed us to set up alerting based on our existing logs without requiring code changes. Though, over time, based on what we learnt, we did enhance our logging to enable additional alerts.

Since the objective was to alert on issues that impact the user, we started by identifying user flows that were of most importance to us, e.g., add to cart, place order, and add a payment method. For each flow, we then identified possible pain points like errors, latency and timeouts, and defined appropriate thresholds. Rob talks about alerting from the spout, indicating that the best place to set up alerts is from the client’s perspective in a client server architecture. For us, that was the front end web service and the API layer that our mobile apps talk to. We set up most of our symptom-based alerts in those layers.

When our symptom-based alerts first went live, we used a brand-spanking new technology called email – we simply sent these alerts out to a wide distribution of engineering teams. Noisy alerts had to be quickly fine-tuned and fixed, since there is nothing worse than your alerts being treated as spam. Email worked surprisingly well for us as a first step. Engineers would respond to alerts and either investigate them themselves or escalate to other teams for resolution. It also had an unintended benefit: greater visibility among different teams into the problems in the system. But email alerts only go so far – they don’t do well when issues occur outside of business hours, they are easy to miss amidst the deluge that can hit an inbox, and there is no reliable tracking.

We decided to use PagerDuty as our incident management platform. Setting up on-call schedules and escalation policies in PagerDuty was a breeze and our engineers took to it right away – rather unexpected for something meant to wake you up in the middle of the night. Going with email had allowed us to punt on a pesky conundrum – in a service-oriented architecture, who do you page? But now we needed to solve that problem. For some issues, we can use the error code in the alert to determine which service team has to be paged. But other symptom-based alerts – for example, latency in add to cart – could be caused by any one of the services participating in that flow. We ended up with somewhat of a compromise: for each user flow, we identified a primary team and a secondary team based on which of the services had the most work in that flow. For example, for the add to cart flow, the Cart Service might be primary and the Inventory Service secondary. In PagerDuty, we then set up escalation policies that looked like this:

PagerDuty Escalation

Another key guideline – nay, rule that Rob calls out – is that pages must be actionable. An issue we’ve occasionally had is that we get a small spike of errors that is enough to trigger an alert but doesn’t continue to occur. These issues need to be tracked and looked into, but they don’t need the urgency of a page. This is another instance where we haven’t really found the best solution, but we found something that works for us. In Splunk, we set the trigger condition based on the rate of errors:

splunk-alert

The custom condition in the alert is set to:

stats count by date_minute|stats count|search count>=5

The “stats count by date_minute” tabulates the count of errors for each minute. The next “stats count” counts the number of rows in the previous table. And finally, since we’re looking at a 5-minute span, we trigger the alert when the number of rows is 5, implying that there was at least one error in each minute. This obviously does not work well for all use cases. If you know of other ways to determine whether an error is continuing, do let us know in the comments.
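Put together as a single search over the last five minutes, the same logic might look something like this, with a purely hypothetical index, sourcetype and error condition:

index=web sourcetype=access_combined status>=500
| stats count by date_minute
| stats count
| search count>=5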

This is just the beginning and we’re continuing to evolve our strategies based on what we learn in production. We still have work to do around improving tracking and accountability of our alerts. Being able to quickly detect the root cause once an alert fires is also something we need to get better at. Overall, our shift in focus to symptom-based alerting has paid dividends and has allowed us to detect issues and react faster, making the site more stable and providing a better experience for our fans. Doing this while ensuring that our developers don’t get woken up by noisy alerts also makes for happier developers.

Designing an API That Developers Love

api-love2

It’s an exciting time at Ticketmaster. The company is growing and innovating faster than ever. We’re rolling out new products, most recently our client-facing Ticketmaster ONE, as well as experimenting with new concepts at a very high cadence.

A big part of that agility is attributed to our API (what’s an API?).

To meet the high demand for growth and innovation, and given the sheer size of our company, API development at Ticketmaster is distributed across many teams in various international locations. That makes it all the more important, albeit difficult, for us to speak the same language as we develop this critical capability. We’re at a point where we need principles and guidelines for developing a world-class API that delights both internal and external developers.

Yes, we will be opening up our APIs to the larger developer community soon. I know, I’m stoked too! More on that in a later post :)

API Design Principles

So in order to get our decentralized engineering team to build APIs that look and feel like they came out of the same company, we need to establish certain API design principles. If you dig deep into APIs with a strong and loyal developer following (e.g. Amazon, Stripe, Flickr, Edmunds), you’ll notice that they follow what I like to call the PIE principle: Predictable, Intuitive and Efficient APIs.

1. Predictable

They behave in a way that’s expected and do it in a consistent manner. No surprises. No Gotchas. Software is a repeatable process and a predictable API makes it easy to build software. Developers love that.

2. Intuitive

They have a simple and easy interface and deliver data that’s easy to understand. They are “as simple as possible, but not simpler,” to quote Einstein. This is critical for onboarding developers. If the API isn’t easy to use, they’ll move on to the competitor’s.

3. Efficient

They ask for the required input and deliver the expected output as fast as possible. Nothing more, nothing less.

These are APIs that make sense. That’s why they delight and engage developers. Documentation, code samples and SDKs are important, especially to external developers, but the real battle here is ensuring the API itself is as easy as PIE.

API Design Guidelines

To ensure our own API is PIE-compliant, we’ll need to address and reconcile the following areas across all our API development:

1. Root URL

This should be the easiest one to address. All Ticketmaster APIs should have the same root URL. Something like https://app.ticketmaster.com OR https://api.ticketmaster.com. One or the other.

// Good API
https://app.ticketmaster.com/endpoint1/
https://app.ticketmaster.com/endpoint2/
https://app.ticketmaster.com/endpoint3/
// Bad API
https://app.ticketmaster.com/endpoint1/
https://www.ticketmaster.com/api/endpoint2/
https://api.ticketmaster.com/endpoint3/

At a global company like ours, some could argue that we need a separate root URL per market (e.g. US, EU, AU). Logically, that makes sense. But from a developer experience perspective, it’s better to put the localization in the URI path, which is what we’ll discuss next.

2. URI Path

Agreeing on a URI path pattern is going to be one of the most critical decisions our team will have to make. This will heavily impact how predictable, intuitive and efficient our API is. For Ticketmaster, I think the following pattern makes sense:

/{localization}/{resource}/{version}/{identifiers}?[optional params]

localization: The market whose data we’re handling (e.g. us, eu, au)
resource: The domain whose data we’re handling (e.g. artists, leagues, teams, venues, events, commerce, search)
version: The version of the resource NOT the API.
identifiers: The required parameters needed to get a valid response from this API call
optional params: The optional parameters needed to filter or transform the response.

I believe this pattern could help us create endpoints that make sense and are PIE-compliant. Here are some examples:

// sample endpoints
/us/commerce/v1/cart/create
/us/commerce/v1/ticket/22355050403
/us/artists/v1/taylor+swift
/au/artists/v1/all
/ae/events/v1/all
/us/leagues/v1/nfl/all

What matters here is not the URI pattern itself, but rather sticking to one pattern across all endpoints, which helps make the API predictable and intuitive for developers.

3. HTTP Status Codes

The most important guideline for HTTP header usage in an API context is ensuring the API response status code is a) accurate and b) consistent with the response body. This is key in making the API predictable to developers, since status codes are the standard way of communicating the status of the API response and whether or not a problem has occurred. The main status codes that need to be implemented are:

200 OK
201 CREATED
204 NO CONTENT
400 INVALID REQUEST
401 UNAUTHORIZED
404 NOT FOUND
500 INTERNAL SERVER ERROR

We might also want to define some custom status codes around API quota limits, etc. Whatever we end up deciding, we’ll make sure it’s consistent across all our endpoints.

4. Versioning

Versioning is essential to any growing API like ours. It’ll help us manage any backward incompatible changes to the API interface or response. Versioning should be used judiciously as a last resort when backward compatibility cannot be maintained. Here are some guidelines around versioning:

  • As mentioned earlier, make the API version part of the API URI path instead of the Header to make version upgrades explicit and to make debugging and API exploration easy for developers.
  • The API version will be defined in the URI path using the prefix ‘v’ with simple ordinal numbers, e.g. v1, v2.
  • Dot notation will not be used, e.g. v1.1, v1.2.
  • First deployment will be released as version v1 in the URI path.
  • Versions will be defined at the resource level, not at the API level.

Versioning eliminates the guessing game, making a developer’s life much easier.

5. Payload Spec

Another key area affecting PIE compliance is using a payload that developers can easily understand and parse. Luckily, JSON API offers a standard specification for building APIs in JSON:

If you’ve ever argued with your team about the way your JSON responses should be formatted, JSON API is your anti-bikeshedding weapon.

By following shared conventions, you can increase productivity, take advantage of generalized tooling, and focus on what matters: your application.

Clients built around JSON API are able to take advantage of its features around efficiently caching responses, sometimes eliminating network requests entirely.

Sold! JSON API is well supported with many client libraries, which is guaranteed to put a smile on any developer’s face. It did on mine🙂

So what about XML? Are we going to support it? I personally think it’s time to say goodbye to XML. It’s verbose and hard to read, which makes it a major buzz kill for any developer. Also, XML is losing market share to JSON. It’s time. Goodbye, XML.

I’d like to call out a few things in the JSON API spec that we should pay close attention to:

5.1 Links and Pagination

A hypermedia API is discoverable and easy to program against, which in turn gets it closer to being PIE-compliant. The links spec in JSON API helps with that. For data collections, providing a standard mechanism to paginate through the result set is very important, and that’s also done via links.

A server MAY choose to limit the number of resources returned in a response to a subset (“page”) of the whole set available.

A server MAY provide links to traverse a paginated data set (“pagination links”).

Pagination links MUST appear in the links object that corresponds to a collection. To paginate the primary data, supply pagination links in the top-level links object. To paginate an included collection returned in a compound document, supply pagination links in the corresponding links object.

The following keys MUST be used for pagination links:

  • first: the first page of data
  • last: the last page of data
  • prev: the previous page of data
  • next: the next page of data

Keys MUST either be omitted or have a null value to indicate that a particular link is unavailable.

Concepts of order, as expressed in the naming of pagination links, MUST remain consistent with JSON API’s sorting rules.

The page query parameter is reserved for pagination. Servers and clients SHOULD use this key for pagination operations.
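To make that concrete, here is a hypothetical, heavily trimmed response for an events collection showing top-level pagination links. The endpoint, page parameters and values are illustrative only:

// GET /us/events/v1/all?page[number]=2&page[size]=10   (hypothetical request)
{
  "links": {
    "first": "/us/events/v1/all?page[number]=1&page[size]=10",
    "prev": "/us/events/v1/all?page[number]=1&page[size]=10",
    "next": "/us/events/v1/all?page[number]=3&page[size]=10",
    "last": "/us/events/v1/all?page[number]=50&page[size]=10"
  },
  "data": [
    { "type": "events", "id": "8001", "attributes": { "name": "Example Event", "date": "2016-06-01" } }
  ]
}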

5.2 Sorting

The spec on sorting is as follows: use the sort query parameter with fields separated by commas. All sorts are ascending by default unless the field is prefixed by “-”, in which case it’s descending.

// Examples of sort
/us/events/v1/all?sort=artist,-date
/us/artists/v1/323232/reviews?sort=-rating,date

5.3 Filtering

Using filters to control the result set of the API response is a great way for us to deliver an efficient API to our developers. We’ll need to discuss our filtering strategy as a team before deciding on how to do it.

5.4 Error Handling

Eventually, things will go wrong. A timeout, a server error, data issues, you name it. Part of being a predictable API is communicating errors back to the developer with some actionable next steps. The error object spec in JSON API helps with that:

Error objects provide additional information about problems encountered while performing an operation. Error objects MUST be returned as an array keyed by errors in the top level of a JSON API document.

An error object MAY have the following members:

  • id: a unique identifier for this particular occurrence of the problem.
  • links: a links object containing the following members:
    • about: a link that leads to further details about this particular occurrence of the problem.
  • status: the HTTP status code applicable to this problem, expressed as a string value.
  • code: an application-specific error code, expressed as a string value.
  • title: a short, human-readable summary of the problem that SHOULD NOT change from occurrence to occurrence of the problem, except for purposes of localization.
  • detail: a human-readable explanation specific to this occurrence of the problem.
  • source: an object containing references to the source of the error, optionally including any of the following members:
    • pointer: a JSON Pointer [RFC6901] to the associated entity in the request document [e.g. "/data" for a primary data object, or "/data/attributes/title" for a specific attribute].
    • parameter: a string indicating which query parameter caused the error.
  • meta: a meta object containing non-standard meta-information about the error.
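As an illustration, a hypothetical error response for a request with a bad date filter might look like the following (the code, title and values are made up for the example):

// HTTP status: 400
{
  "errors": [
    {
      "status": "400",
      "code": "INVALID_PARAMETER",
      "title": "Invalid query parameter",
      "detail": "The date filter '2015-13-45' is not a valid date.",
      "source": { "parameter": "date" }
    }
  ]
}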

6. Authentication

In our business, we’d always want to know exactly who is making API calls and getting our data. Therefore, solid and secure authentication is required to give anyone access to that data. The authorization standard in the marketplace today is OAuth 2.0. The trick here is making it dead simple for developers to get their access token so they can make API calls as quickly as possible.

I believe those six API design guidelines will help us develop Predictable, Intuitive and Efficient API capabilities for us and our developer community. I told you this was an exciting time at Ticketmaster🙂

Your Feedback

We want you to get involved to help guide this process. Do you think we’re missing something? What are some of the APIs you love? Why do you love them? What are some of the APIs you’d expect us to deliver?

You can join us on this very exciting journey by subscribing to this blog. You can also follow us on Twitter, Facebook and Medium.

Happy Coding!

2015: Year of the Android

On the last night before employees were leaving for Christmas break and the 2015 New Year, desks on the 10th floor of the Ticketmaster Hollywood office were littered with red plastic cups foaming with champagne, and greasy slices of fresh Raffallo’s pizza, a Hollywood staple. The normally tepid office was buzzing with enthusiastic conversation and clapping from our Engineering, QA and Product teams, still high on adrenaline from last-minute bug crushing and testing.

The mobile team was celebrating an audacious feat of full-stack software engineering. In a span of only 2 months, the iOS team, along with their counterparts across many teams and offices at Ticketmaster, had successfully implemented Apple Pay as a new payment method within the iOS Ticketmaster app. This was a daunting and technically ambitious success story for a company with as many discrete payment and ticketing systems, not to mention business and legal requirements, as Ticketmaster. With the new version of the app being released before Christmas, Apple had selected the Ticketmaster iOS app to be featured in the App Store as one of the first adopters of the new payment system.

Meanwhile, the Android team hadn’t released a new version in over 6 months. Buried under a mountain of technical debt, a backlog of style updates and new feature requirements, the Android app probably looked, from the outside, like it was falling even further behind its iOS brother. Internally, though, as the Android developers left the office for their vacation, there was reason for optimism within the development and QA teams. Only one message was scribbled along the corner of the whiteboard in the Android development area:

“2015: Year of the Android”

Build Happiness

I moved to California to take up a Sr Android Developer position at Ticketmaster in September 2014.

I was excited about the great Los Angeles weather but I was also pleased to see the metrics associated with the application team I was joining. The metrics showed great year-over-year growth for both the Android application and mobile in general. More and more people were purchasing tickets using their mobile devices, ensuring that the role would have an immediate impact on the company’s short and long term strategy.

Fig_1-tm-stats

Since 2014, there has been a 20% increase in mobile visits and a 35% increase in mobile ticket sales

Unfortunately, this wasn’t the full story. For years the Android app had accumulated technical debt, and there were other issues that required immediate attention before we could turn to some of the exciting opportunities we had identified.

One of the growing philosophies at Ticketmaster is Lean: thinking of products in terms of continuous improvement. The idea can be concisely summarized with the sketch from Henrik Kniberg’s “Succeeding with Lean Software Development.”

Fig_2-mvp

Deliver usable products to allow learning to take place. Illustration by Henrik Kniberg.

To help visualize the Android Application in this manner, I liked to pretend the app was itself a concert venue. Just like its real world counterpart, the Android app was serving a growing line of consumers who were migrating from the desktop web application to the mobile app. The app was already serving thousands of people per day, but it had plenty of room to grow. It was this metaphor that made a lot of the problems jump into focus.

Fig_3-android-concert-venue

A metaphor for the Ticketmaster Android app and a small concert venue

Our user base was growing, but how were we handling this increased load?

Fig_4-low-security

Very little security or organization in the main entrance

The line of users looks a little bit unorganized. What exactly are they doing while they are waiting in line? It doesn’t seem like we are providing very much guidance. Was everyone coming to the mobile app from the same place? Were we funneling everyone to the right place? Also, the security sure seemed a little bit light at the entrance.

Fig_5-outdated-navigation

Users confused by outdated and ambiguous navigation elements

Once people actually got into our application (and our “venue” in this metaphor), they were often confused by ambiguous navigation and UI elements. The app was a lot more complex and confusing in terms of UX than most of our users were expecting. A lot of this was due to legacy code and design: many of the designs from 2011 persisted in our 2014 app, despite plenty of design stories in the backlog.

Maybe this wasn’t such a great opportunity after all! How do we address all of these issues at once? It was going to be a long time before the Android team was ready to celebrate its own release.

The first key was to make building the app a source of joy for developers and the QA team. The 10-minute build times using Maven and an increasingly confusing set of dependencies had to go. Luckily, there was a new company whose motto was “build happiness.”

Gradle To the Rescue

gradle

The migration to Gradle provided challenges to both our development team and our QA team. But once the migration was made, everything was easier. Build times went from 10 minutes to under 2. With advice we got at the Gradle Summit (a conference hosted in Santa Clara this year), we were able to decrease our development build times to under 40 seconds using the Gradle daemon and incubating features like parallel builds and configure-on-demand.

Fig_6 slack_comment

An Android team member discusses the improvements to our build time over Slack

When I was flying home from the Gradle Summit, I used the offline and no-rebuild flags so I could build against my most recently downloaded dependencies while offline. Gradle plugins, such as the Jenkins plugin, allow synchronisation between your local Gradle build arguments and the ones you are using in continuous integration.
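For reference, the daemon, parallel and configure-on-demand features can be switched on from a project’s gradle.properties (or on the command line). The values below are just a starting point, not a recommendation for every build:

# gradle.properties
org.gradle.daemon=true
org.gradle.parallel=true
org.gradle.configureondemand=true

# Example command line for building with cached dependencies while offline:
# ./gradlew assembleDebug --offline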

Gradle has been a star tool for Android development, with new training available at:
http://gradle.org/getting-started-android/ and https://www.udacity.com/course/gradle-for-android-and-java–ud867

Every Android developer needs to learn how best to take advantage of this awesome tool. Gradle is more than a build and dependency management tool; it is truly a Swiss Army knife, as we quickly learned this past year.

As an example, we had to figure out a way to allow our automated UI tests to bypass some of our security measures without risking leaking these bypass mechanisms into the production app. These complex requirements became easy once we approached them as gradle tasks.

Track and manage Technical Debt

One of the lucky parts of starting at Ticketmaster is that I never felt like I was alone on an island as a developer. The QA team at Ticketmaster is the strongest I have ever worked with. They are engineers more than testers and their knowledge and experience with the Android app allowed me to grow into my role. The QA engineers rapidly built up our automated UI testing suite using Cucumber and Appium and added it into our continuous integration pipeline. Even though the app had a long way to go, it was definitely an aha moment when we got our first 100% passing UI tests report that was completed overnight.

Fig_7_1-automated_testing Fig_7_2_sonar

Automated testing and nightly reports

Our automated tests would immediately detect any problems with our ongoing development. But the UI tests that our QA team was building weren’t enough. They were great at catching new issues within our user experience, but they didn’t account for technical debt and insidious bugs within the code base.

Most software developers have heard of the testing pyramid, and with the Android QA team pulling its weight with Automated UI testing, the need for unit tests was even more glaring.

We found the first step to cutting down our technical debt was measuring it, and SonarQube provided a way for us to do this within our build pipeline. Sonar calculates a “technical debt” figure using a combination of factors: unit and integration test coverage, Java logic issues and duplicate code.

Adding unit test coverage was not easy. We had to figure out the JaCoCo Gradle plugin so that we could calculate coverage numbers that could then be fed into our Sonar console.

When we first got Sonar integrated into our build pipeline, the numbers were grim: over 700 days of technical debt for the Android app alone. At least we were now measuring it and had a starting point. From then on, conversations between developers about technical debt had a point of reference we could come back to as we debated approaches to implementing new features and bug fixes. Oftentimes there were opportunities and “low-hanging fruit.”

If a developer completed a new feature, we could immediately examine its impact on technical debt. Were we adding any new major java logic issues? Had we added any technical debt to our already large code base? Were there deprecated classes or features we no longer needed that could be tackled within this story?

In January we were able to reduce our technical debt estimate by 158 days, and increase our unit test coverage by 20%. Part of this was easy; we had some unit and integration tests that hadn’t been measured by sonar. Other gains were made by getting rid of deprecated code that was no longer needed.

Code Quality Summary (Jan 01- Jan 30)

Technical Debt – Reduced by 158 days
Unit Test Coverage – Increased by 19.4%
Major Issues – Reduced by 458

Fig_8-Sonar-stats

SonarQube measures delta improvement to unit test coverage

In February we ripped off another big chunk, 298 additional days of technical debt removed from the app. As we moved closer to releasing our new re-skinned app, we were also cutting down technical debt. This two in one combo of adding features and making the app easier to maintain allowed us to make quicker and quicker improvements for our users. Not to mention, this also improved our build times even further.

Security isn’t an Afterthought

For many months, our focus was on getting the “Android Design Update” out the door, and we were aiming for a February release. Unfortunately, there were security issues that forced us to change our priorities and release schedule.

Going back to our “Event Venue” metaphor, we were hearing more and more from our API team and our data science team that the Android App was getting abused by malicious users.

Fig_9-bots

Bots were grabbing some of the best seats in the Ticketmaster Android app

This was a major issue that we needed to address. In Android, the basics of security come from three places:

  • Secure coding practices
  • Code obfuscation and minification using Proguard
  • TLS network security

The Android app was using these practices already but it wasn’t doing enough. With a high profile application like Ticketmaster, malicious users are likely to go to extra effort to defeat traditional measures. We needed something extra.

Enter Dexguard. If you haven’t heard of Dexguard, it is an extra layer of security provided by the makers of Proguard but designed specifically for Android applications.

Dexguard allowed us to add not just one, but many additional layers of security.

Fig_10-dexguard

Dexguard added lots of new security features to the Ticketmaster Android App

Some additional security layers provided with Dexguard that were not possible with Proguard:

  • name obfuscation
  • string encryption
  • method reflection
  • tamper detection (at runtime)
  • environment checks (at runtime)
  • class encryption

Dexguard makes the Android app much more difficult to reverse engineer with decompilation tools. It allowed us to push a lot of the bot traffic out of the Android app immediately.

Continuous Integration

The 6-month delay between releases was too long. If not for the security updates, we would have gone even longer without delivering improvements to fans. With Gradle, Sonar, and automated UI testing, we had the tools we needed to deliver constant updates without sacrificing code quality. With each commit, we trigger our tests, our lint and Sonar code quality metrics, and our automated UI tests. All of these build steps are triggered by Gradle tasks.

Fig_11_2-CI-pipeline Fig_11_1-CI-pipeline

With a revamped CI pipeline with automated testing, new features can be added more reliably without incurring additional technical debt

After we released our Android Design Update in late February, we started with bi-weekly updates. After going through only 2-3 updates (sometimes only for security reasons) in 2014, we have released more than 10 updates to the Android app since February. This is the reason the Android developers were secretly happy back in December: we knew we were close to putting it all together.

Think Big Picture

As an Android developer, it’s very easy to get sucked into the issues that affect you on a daily basis. How do I make this list update faster and make this UI consume less memory? What is the best way to create a clickable span within an expandable list? But it’s equally important to see the big picture. How will this app become what it is capable of becoming?

Fixing the process and the tracking was the most important step. Taking some time each week to think about the pipeline and how it fits into the larger picture was key. For mobile architecture, I strongly recommend using C4 diagrams, which allow developers to layer in additional detail for more sophisticated engineers.

When I am introducing new members to the Android code base, I can start with a context diagram and, once I feel they understand it, move on to the containers, components and, of course, our existing technical debt and long-term goals. Using C4 diagrams, it is easy to visually pinpoint which parts of the codebase hold the most technical debt. It’s a springboard to software fluency.

There will always be new unforeseen challenges. A few weeks after we released our Android Design Update, most bots moved to attack the iOS app. Other more advanced hackers found a new way to exploit our existing security, requiring us to display Google’s reCAPTCHA to all users attempting to purchase tickets on the Android app. Our users were not happy.

Working with our API team, we are going to implement new features to overcome this issue, and there will always be new challenges. With all the improvements that have happened to the Android app over the past year, it really validates the team’s work when we see feedback like this in our user reviews:

review

The scribble on the whiteboard was just an idea. A simple idea: that our team could make the kind of improvements that had started to become part of our process, and that releases could become routine.

And then it happened.


References:

Gradle: http://gradle.org
Sonar: http://www.sonarqube.org
Dexguard: https://www.guardsquare.com/dexguard
Jenkins: https://jenkins-ci.org
C4/Structurizr: https://www.structurizr.com/
JaCoCo Gradle Plugin: https://docs.gradle.org/current/userguide/jacoco_plugin.html

Jeff Kelsey is a Sr. Android Developer at Ticketmaster