
Under the hood

Posted by on Oct 4, 2013 in Technology


The new TED.com is more of a rebuild than a redesign. While what you see on screen will look very familiar, what’s radically different is the backend of the site. TED.com was born in 2006, the year the Motorola RAZR ruled and Suri Cruise was born. It’s 2013, and it’s clearly past time for us to move from our old framework to a modern set of software tools, but doing that is never easy. In fact, it’s kind of like replacing an engine on an airplane mid-flight.

Here, a conversation with our chief technology officer Gavin Hall, software architect Michael Twentyman (better known as 20) and front-end developer Joe Bartlett about what’s different in TED 2.0. They discuss what the new code will mean for the user — and why they are open-sourcing every bit of it.

Will you talk us through the evolution of the engine behind TED.com? How did it begin, and how did it evolve from there?

Michael Twentyman: The first version of the site started fairly simply, with a development team of four at the agency Method, using a popular PHP web framework known as Symfony. It was rapidly prototyped over four months in 2006-2007. After the initial build and launch, development moved in-house so TED could guide its own destiny. What followed, though, was a lengthy period of rapid organizational growth that was mirrored in the software that drove the website; over time, it became unwieldy, with interconnected modules and one-off requirements. Now, we are splitting the code’s logical entities into separate services, making each one far more independently replaceable than before. The big benefit is that when one of those modules is no longer working for us, it’s much simpler to replace. The separate entities remain loosely coupled and better able to evolve independently.

Gavin Hall: The old system, like 20 said, is a giant, monolithic system where everything is plugged together. It’s how websites have been built for a very long time. Moving forward, we’re switching to a small-engine mentality, what’s known as service-oriented architecture. This allows us to swap out pieces wholesale if they don’t work for us. For example, let’s say that we have our commenting engine in and running. If we wanted to decouple it from the current 1.0 site, we would have to go into the whole thing, rip the guts out and put a new one in. With 2.0, if we decide six months, or six years, down the road that this commenting engine no longer meets our needs, we can just pull that one piece out and put another one in, without ever having to touch any other systems. It gives us more flexibility moving forward.
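As a rough illustration of the swap-out idea, here is a minimal Python sketch (not TED’s code, which is Ruby on Rails) built around a hypothetical `CommentService` contract. Everything runs in one process for brevity; in a service-oriented setup the contract would sit behind a network API, but the principle is the same: the page depends only on the contract, so one commenting engine can replace another.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Comment:
    talk_id: str
    author: str
    body: str


class CommentService(Protocol):
    """Hypothetical contract the site depends on. Any engine that
    implements these two methods can be dropped in."""

    def post(self, comment: Comment) -> None: ...

    def list_for(self, talk_id: str) -> list[Comment]: ...


@dataclass
class InMemoryComments:
    """Stand-in for the current commenting engine."""
    _store: dict[str, list[Comment]] = field(default_factory=dict)

    def post(self, comment: Comment) -> None:
        self._store.setdefault(comment.talk_id, []).append(comment)

    def list_for(self, talk_id: str) -> list[Comment]:
        return list(self._store.get(talk_id, []))


@dataclass
class ModeratedComments:
    """A replacement engine with the same contract but different behavior:
    it silently drops comments containing a blocked word."""
    blocked: set[str]
    _inner: InMemoryComments = field(default_factory=InMemoryComments)

    def post(self, comment: Comment) -> None:
        words = set(comment.body.lower().split())
        if words.isdisjoint(self.blocked):
            self._inner.post(comment)

    def list_for(self, talk_id: str) -> list[Comment]:
        return self._inner.list_for(talk_id)


def render_talk_page(talk_id: str, comments: CommentService) -> str:
    """Page code knows only the contract, not which engine is behind it."""
    lines = [f"Talk {talk_id}"]
    lines += [f"{c.author}: {c.body}" for c in comments.list_for(talk_id)]
    return "\n".join(lines)
```

Swapping engines means passing `ModeratedComments(blocked={...})` instead of `InMemoryComments()`; `render_talk_page` itself never changes.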

What is Symfony? And why was PHP selected as the language of the old site?

Gavin: Symfony is one of the many frameworks out there; at the time, it just met our needs. PHP was selected because, when we started building in 2006, it was a very popular web language and, more important than anything else, the developers knew how to use it. On top of that, a lot of web-scale properties are built in PHP, from much of Yahoo! to Facebook, so scaling PHP is a known trick.

20: The only thing I would add is that it was fairly performant relative to the alternatives at the time. The issue became that it was not very easily upgradeable; in fact, migrating from 1.0 to 1.2 was pretty troublesome. When we looked at upgrading, there was a Symfony 2.0, but moving to it would have meant a wholesale rewrite of everything. It’s just not the same framework.

Was there an aha moment when you said, “We have to retool”?

20: We ran into issues where things that should take a day or two in a more modern framework ended up taking us a couple of weeks, just because there was so much old rapid-prototype code. That kind of code tends to work, but it isn’t very well structured.

Gavin: Overall, we decided to make the switch because we had moved from a waterfall method, which, again, was a very popular way of building applications in 2006, to what is known as an agile method. The Symfony framework, as we were using it, doesn’t support that as well as some of the frameworks and languages out there. We need to be able to build small, rapid-prototyping tests so that we can try something really quickly, see if it works and then pull it back out or change it. And that’s what 2.0 is all about: How do we make sure that what we’re doing has the biggest impact possible? It’s easier for us to test that in our new system than in our old one.

How did you guys decide to go with Ruby on Rails?

Gavin: The team we’ve built has a lot of Rails expertise, and Rails met our goals for rapid prototyping. We offload a lot of our heavy lifting, such as our video player, to CDNs, and caching has evolved a great deal over the five to seven years since we built the site. So for us, it is more important to be able to move fast and patch things than to go low-level with something else.

Why service-oriented architecture?

Gavin: Great question. Service-oriented architecture allows us to be agile, allows us to move quickly and test a lot, it allows us to make sure that we’re never getting into a system or into a situation that we can’t easily pull back out of. On the flip side, service-oriented architecture takes a lot more planning. It takes a larger team. You have to really understand how all the pieces fit together. Our team seems to be big enough now to be able to handle that correctly.

20: One of the concepts in computer science that you try to design your programs around is loose coupling — and this is very much related to that. It’s making sure that the points of contact between either individual systems or libraries are fairly lightweight, so that when and if you need to replace them, you’re not stuck with a huge migration task. Think of it as gears — it’s just a few little pieces of machinery that need to work together, and that’s it. You don’t need to have a horrible enmeshing of thousands.

How is this beneficial to the user? What will they see?

Gavin: TED is a culture of questioning our basic assumptions and trying everything in different ways. What modular decoupling, service-oriented architecture and agile give us is the ability to try new things, and never to rest and say, “That’s a good idea, but we don’t know how we would do that,” or “It’d take too long to do.” This allows us to try. Users will see the site iterate quickly: each time you visit, you might see tweaks that help it perform better, help spread a talk, help expand an idea and help you spend more time finding ideas that you love.

Joe Bartlett: It also opens the door to much more focused QA testing. Currently, really the only way we can QA-test the site is to write a complete integration test, which is extremely time-consuming. With a service-oriented architecture, we can run tests directly against the APIs we’re producing to connect the individual parts together.
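To make the difference concrete, here is a hedged sketch of a test that targets one API’s contract rather than the whole site. The talk endpoint, its fields and the `TALK_CONTRACT` table are hypothetical; in practice the test would make an HTTP request to the running service instead of calling a local stub.

```python
import json

# Hypothetical contract for a "talk" endpoint: field name -> expected JSON type.
TALK_CONTRACT: dict[str, type] = {
    "id": int,
    "title": str,
    "speaker": str,
    "duration_seconds": int,
}


def contract_violations(payload: str, contract: dict[str, type]) -> list[str]:
    """Return human-readable problems; an empty list means the payload conforms."""
    data = json.loads(payload)
    problems: list[str] = []
    for name, expected in contract.items():
        if name not in data:
            problems.append(f"missing field: {name}")
        elif type(data[name]) is not expected:  # exact check, so True is not accepted as an int
            problems.append(
                f"{name}: expected {expected.__name__}, got {type(data[name]).__name__}"
            )
    return problems


def talk_endpoint(talk_id: int) -> str:
    """Local stand-in for the real service's response body."""
    return json.dumps({"id": talk_id, "title": "Example talk",
                       "speaker": "A. Speaker", "duration_seconds": 1080})
```

Because the test only checks the boundary between two parts, it can run in isolation and in seconds, unlike a full integration test that drives the whole site.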

Have there been any pain points with this re-platforming?

Gavin: The biggest pain point is that we are running two sites in parallel. This wasn’t one of those things where we could just go off into a corner and say, “Okay. When we started this site in 2006, we thought we were putting a couple of talks online; we didn’t have a talk every weekday, we didn’t have TEDx, we didn’t have TED Books, we didn’t have our blog. We now have a better idea of what we’re doing and where we’re going.” Ideally, we would’ve just gone away for a couple of years and then come back out with a brand-new version. I don’t think this problem is unique to TED in any way; it’s a very typical engineering problem that many people deal with, but for us, it’s been a big one. There’s an overlap where both sites will be running for about six months. We’ll slowly try to move people over and make sure that the experience is what everyone expects; meanwhile, we’re getting feedback and iterating.

20: Fundamentally, running things in parallel does make it tricky, but you’re also working through the whole of the old code base, which does a lot of things in many disparate ways. There’s a lot of context-switching going on: How do I move this, when this isn’t like that? You’re touching five- or six-year-old code that hasn’t been looked at since, and you ask: Why did you do it this way? There are just a lot of unexpected things you run into. If you’re used to working in a more modern framework and you switch back to an older one, it does things in somewhat strange ways.

Joe: To elaborate on that a little more, the key part of a service-oriented architecture is that we have these different APIs that, in theory, run as separate services, which we aggregate together in the Rails app. The Rails app just reaches out to each of them individually. But because we have to run the site in parallel, it’s exactly what 20 said: since the old site is still running at the same time, we’re actually having to develop most of those APIs in the old site itself, using it to serve the data, because that’s where the data happens to be held right now. One of the problems with that, from my perspective, is that it increases the time to generate each individual page quite a bit, just because each request has to go through two complete systems before it reaches me.
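A minimal sketch of a page being assembled from several services, assuming three hypothetical endpoints (talk details, speaker, related talks) simulated here with `asyncio.sleep`. Issuing the independent calls concurrently with `asyncio.gather` means the page waits roughly as long as the slowest call rather than the sum of all of them, which matters more when every call is also routed through the legacy site. This is one common mitigation, not necessarily what TED did.

```python
import asyncio


# Hypothetical service stubs. In production each would be an HTTP call to a
# separate service (or, during the transition, to the legacy site).
async def fetch_talk(talk_id: int) -> dict:
    await asyncio.sleep(0.01)  # simulated network latency
    return {"id": talk_id, "title": "Example talk"}


async def fetch_speaker(talk_id: int) -> dict:
    await asyncio.sleep(0.01)
    return {"name": "A. Speaker"}


async def fetch_related(talk_id: int) -> list[int]:
    await asyncio.sleep(0.01)
    return [talk_id + 1, talk_id + 2]


async def build_talk_page(talk_id: int) -> dict:
    # The three calls are independent, so run them concurrently; gather
    # returns results in the order the awaitables were passed.
    talk, speaker, related = await asyncio.gather(
        fetch_talk(talk_id), fetch_speaker(talk_id), fetch_related(talk_id)
    )
    return {"talk": talk, "speaker": speaker, "related": related}


page = asyncio.run(build_talk_page(42))
```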

What was the reaction in the company when you said, “We’re going to be rebuilding the site from scratch?” How did you explain to people why this was something that had to be done?

Gavin: I think there was a lot of support in the company, for two reasons. The direction the site was originally built for and the direction we wanted to take it in were so different from a technology-stack standpoint that rebuilding it was really our only choice. So when we laid out for the rest of the organization the same information we were looking at as a technology team, it was pretty obvious that this was the right choice to make.

It’s not just the needs of TED that have changed. When TED.com launched, you had either “register for the conference” or “here are some of these great talks we want you to look at on the web.” Not only have our needs changed, but how people consume content on the internet has drastically changed. In 2006, there was no iPhone. The browser on your mobile phone wasn’t even thought of. No one was watching TED on their TVs. The only thing you had to plan for was different browsers: How did this look in Internet Explorer, how did this look in Firefox? Google Chrome wasn’t even out at that point.

Our users’ needs have changed. To get there (and we don’t want to follow the trend, we want to be ahead of it), we’re building everything with API backends, which is how service-oriented architecture works. That way, no matter which device our users want to watch talks on, they’ll be able to. If they want to watch on their phone on the Metro, switch to their TV when they get home, and use their lunch breaks at work to discover and add new talks to their queue, all of that’s available. “Platform agnostic” would be the technical way to describe that.
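One way to picture “platform agnostic”: the queue lives in a service and is exposed as plain JSON, so any client that can parse JSON renders it in its own way. The `QueueService` class, its fields and the two view functions below are hypothetical illustrations, not TED’s actual API.

```python
import json
from dataclasses import dataclass, field


@dataclass
class QueueService:
    """Hypothetical "my queue" service. State lives on the server, keyed by
    user, so every device sees the same queue."""
    _queues: dict[str, list[int]] = field(default_factory=dict)

    def add(self, user: str, talk_id: int) -> str:
        queue = self._queues.setdefault(user, [])
        if talk_id not in queue:  # adding the same talk twice is a no-op
            queue.append(talk_id)
        return self.get(user)

    def get(self, user: str) -> str:
        # Plain JSON: a phone app, a TV app or a browser can all parse it.
        return json.dumps({"user": user, "queue": self._queues.get(user, [])})


def phone_view(payload: str) -> str:
    """A compact summary, as a small screen might show it."""
    data = json.loads(payload)
    return f"{len(data['queue'])} talks queued"


def tv_view(payload: str) -> list[str]:
    """A list of play buttons, as a TV interface might show it."""
    data = json.loads(payload)
    return [f"Play talk {talk_id}" for talk_id in data["queue"]]
```

A talk added from a work browser at lunch shows up in both `phone_view` and `tv_view`, because neither client owns the data; each only formats the same response.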

20: The other thing we put forward is that there were technical reasons why we didn’t really have much of a choice. The Symfony framework itself has a pretty bad lag in generating a page: even the least complicated page that we would want to render takes 160 milliseconds to generate, and that’s just on the server. That’s a handicap going forward. We can’t get any faster than that without gutting the whole thing anyway. Think of it as trying to win the Indy 500 with a parking boot on your car.

Gavin: Like I said, it wasn’t difficult to convince the organization.

How will you make sure that the new system won’t feel like a hindrance in five years?

Gavin: We definitely won’t say it’s totally future-proof, because that would just be deceiving ourselves. But the service-oriented approach we’re taking, and this API-based way of moving forward, gives us a standard way to consume the information coming from our services. That allows us to be, again, agnostic to whatever platform we want to build on going forward. So if you want to build for the Xbox 2, the iPhone 5, the iPad or something else that doesn’t even exist yet, as long as it can understand how our information is formatted, you’ll be fine.

With the modular design, we can take out pieces that no longer work. So if there’s some giant leap in how blogs or talk recommendations are done, we can always pull out the piece that runs our blog or our recommendations and replace it without having to replace the whole site.

20: The other really critical thing is that we have much better resources today than years ago. So we’ll actually be able to do the maintenance that’s required to keep things running in tip-top shape.

What element or outcome of the new site do you feel the most personally proud of?

20: What I’m most proud of is the speed. Right now, it’s not as fast as it will be, because we still have a lot of work to do on some of the caching methods we’ll use. But the framework is capable of delivering a page much, much, much faster. The other thing is that Joe has split up all the JavaScript so it loads very efficiently. Part of the plan is that we’ll be able to cache the entire page and deliver it via our CDN, so the time to fetch an individual page will be drastically lower. I think that’s one of the best things we’re going to be doing for ourselves and our users.
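Full-page caching on a CDN usually relies on standard HTTP caching headers. In `Cache-Control`, `max-age` applies to browsers, while `s-maxage` applies only to shared caches such as a CDN edge. The sketch below pairs a helper that emits those headers with a toy edge cache that honors `s-maxage`; the TTL values are arbitrary, the class names are hypothetical, and a real CDN also handles directives such as `private` and `no-store`, which this toy ignores.

```python
import time
from typing import Callable


def page_headers(edge_ttl: int, browser_ttl: int) -> dict[str, str]:
    # s-maxage is honored by shared caches (the CDN); max-age by browsers.
    return {"Cache-Control": f"public, max-age={browser_ttl}, s-maxage={edge_ttl}"}


class EdgeCache:
    """Toy model of a CDN edge: serves a stored copy until s-maxage expires."""

    def __init__(self, origin: Callable[[str], tuple[str, dict[str, str]]],
                 clock: Callable[[], float] = time.monotonic):
        self.origin = origin          # returns (body, headers) for a path
        self.clock = clock
        self.store: dict[str, tuple[float, str]] = {}  # path -> (expires_at, body)
        self.origin_hits = 0

    def get(self, path: str) -> str:
        now = self.clock()
        cached = self.store.get(path)
        if cached and cached[0] > now:
            return cached[1]          # fresh copy: the origin is not touched
        body, headers = self.origin(path)
        self.origin_hits += 1
        ttl = self._s_maxage(headers.get("Cache-Control", ""))
        if ttl:                       # store only if shared caching is allowed
            self.store[path] = (now + ttl, body)
        return body

    @staticmethod
    def _s_maxage(cache_control: str) -> int:
        for part in cache_control.split(","):
            name, _, value = part.strip().partition("=")
            if name == "s-maxage":
                return int(value)
        return 0
```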

Gavin: For me, it’s empowerment. This new platform empowers our whole technology team, not just our engineers, not just our operations team, but all the way to our UX people and our product people, to try new things. We can try things and test them without huge effort. We’re able to say, “Hey, let’s go ahead and try this,” and we can get it out the door. It allows us to do more playing: people can come up with a great idea at night and then, the next morning, jump in, try it out and move forward with it.

Another thing I’d like to mention is our infrastructure. Before, we were in a single data center; now we’ll be in multiple data centers, which means the site will be fast anywhere in the world, like 20 said, as we move to a more global site. Also, the chance of it going down is much lower, because something like Hurricane Sandy or Hurricane Katrina couldn’t take out our one data center and bring the site down. Knocking out all of them at once is hard to do.
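The resilience argument can be sketched as ordered failover: try the preferred data center first and fall back to the next one if it is unreachable. Real multi-data-center setups often do this at the DNS or load-balancer level rather than in application code; the Python below is only a hedged illustration, and `DataCenterDown`, `fetch_with_failover` and the data center names are hypothetical.

```python
from typing import Callable


class DataCenterDown(Exception):
    """Raised by a (hypothetical) fetch function when its data center is unreachable."""


def fetch_with_failover(
    path: str, datacenters: list[tuple[str, Callable[[str], str]]]
) -> tuple[str, str]:
    """Try each data center in preference order (e.g. nearest first) and
    return (name, body) from the first one that answers."""
    failures: list[str] = []
    for name, fetch in datacenters:
        try:
            return name, fetch(path)
        except DataCenterDown as exc:
            failures.append(f"{name}: {exc}")
    # Only reached if every data center failed.
    raise DataCenterDown("; ".join(failures))
```

With one data center, a single outage means no site; with two or more in the list, the request simply moves to the next healthy one.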

Joe: For me, it’s our responsive system for working with CSS. We have a mobile-first system that is working really nicely for us. We’re making heavier use of Stash, which is a preprocessor, and using some of its features to standardize the way we write CSS, so that we always write it with mobile devices in mind as the primary recipient of style sheets. I think it’s going to speed up page rendering in general. There are always going to be opportunities for us to improve, but I think it gives us a strong footing.

Also, something a little obscure: one of the things we’re doing with Rails is a little unusual. Whenever I’ve mentioned it to my friends and other developers, their eyes kind of glisten and they’re like, “Wow! You’re doing that? That’s awesome.” Instead of using the default Rails asset pipeline, we’re using a build system called Grunt (Grunt JS) to compile all our front-end assets. It gives us a little more control to fine-tune the way assets are built. Rails’ out-of-the-box defaults for front-end code are great for a lot of websites, but we wanted a little more control to fine-tune performance, and a build system like Grunt is one way to get it.

Gavin: At TED, we’re trying to spread ideas as much as possible. Internally, we have the motto of “radical openness”: we try to share as much information as possible. We’re trying to take that same approach to technology. We are a 30-year-old nonprofit, but we don’t behave like one. Especially on the technology side, we run very much like a high-tech startup. And because of that, we want to go in two different directions.

One, we want to open-source a lot of the work that we’re doing, and we’ve started on that process with some of the things that we can. We want to help other nonprofits — and other people at large — that don’t necessarily have a technology team as large as ours. That’s something that excites me personally, because it’s always great to give back to the community and be able to help out. You never know what will happen.

The second thing is our data. Not only do we want to provide code in an open-source way, so people can do what they want with it and use it for their own nonprofits, but we want to provide all of our data from every single event we’re doing, so people can mash it up however they want: they can ingest it, pull stuff out of it and really see what else the community can come up with. The best ideas aren’t always expressed through the spoken word. Sometimes they might come in a different format, maybe a visualization. We would love to see what the community is able to do once we publish all of our data.

 

Michael Twentyman is a software architect. He also excels at Tuvan throat singing.

Gavin Hall is chief technology officer as well as chief baby teleconferencer at TED.

Joe Bartlett is a front-end engineer, and perfect front-man for your next band: Mobile Friendly.

Kate Torgovnick is a writer at TED. She can solve a Rubik’s Cube in 1:58.