Reading "The Streaming Book" by @ballmatthew

What We’ve Learned About Content (Data)

One of the central themes of early streaming video was how data would produce better content and better batting averages. In 2011, Netflix boasted about the role of data in greenlights for shows such as House of Cards, which went on to be a commercial and critical hit. Hollywood found the notion absurd—the series was an adaptation of a very successful British dramatic series initiated and directed by the universally acclaimed David Fincher and set to star two-time Academy Award winner Kevin Spacey. One need not read thousands of server racks of “personally identifiable information” (PII) or engagement data to know this might be a good swing—the very reason that both HBO and FX bid on it. In the years since House of Cards, Netflix has progressively downplayed the role of the data in greenlights. By 2015, Ted Sarandos was saying their greenlight decisions were based 70% on data, 30% on judgment, but the judgment “comes first.” By 2018, it was the reverse—70% judgment, 30% data. By 2019, it was 80% judgment, 20% data.

Some argue that the shift in Netflix’s talking points was strategic. To build up its streaming empire, Netflix required tens of billions in debt and equity support from Wall Street. At that time, Netflix benefited from telling investors that even as a late entrant to “Hollywood,” the company’s abundant data collection and analysis would provide its development executives with the ability to “moneyball” the content made with investor dollars. But eventually, after the access wave had crested, Netflix’s empire had been established, and the content wave had begun, it was more important that Netflix could outcompete for creative talent, not investor capital.

Irrespective of intentions, a few things are obvious in 2023. First, “data” might marginally improve a show’s chances of success, but this benefit is dwarfed by the role of creative development, casting, production, marketing, timing, the ability to win top projects and talent, IP, etc. Second, the analysis performed by streamers is not remarkably more helpful (nor more utilized!) than the data that Hollywood has used for decades. The difference is that while Hollywood could only issue surveys, collect broad “ratings,” and observe test audiences, streamers had access to real-time and customer-specific datapoints such as “rewatch rate” or “completion rate” or “time to finish.” We also know that evaluating a series on total viewers or cost per hour leads to misguided valuations and perverse incentives, such as the conflation of “watched” with “loved” and bloated episode counts and run times.

The role of data in recommendations, too, has declined. Log onto Netflix today and the rows of personalized recommendations have been displaced by popularity-based lists of “Trending Now,” “Top 10,” and “Popular on Netflix,” as well as Netflix’s featured Originals. This seems to reiterate the aforementioned role of network effects: while Viewer A might prefer Show A to Show B, ceteris paribus, they’ll enjoy Show B more if their friends B–E are watching it too. I’ve argued in the past that the importance of product is negatively correlated with the intentionality of viewing. When Netflix had millions of titles in its DVD service, none of which were Original and thus of particular financial or strategic importance to Netflix, it was essential that the company could pick the “right” titles for its viewers, or at least help. But when Netflix shifted to streaming, its catalogue was cut down to tens of thousands of titles. Furthermore, recommending the wrong pick became far less costly. It was a big deal for a customer to receive a DVD they didn’t love, as that would leave them with nothing to watch on Saturday night, while Netflix would be out roundtrip postage, would tie up costly inventory (DVD), and would be at risk of losing a customer. But on a streaming service, the cost of sampling shrinks to minutes, and no customer unsubscribes because a recommendation was not of interest. And as Netflix’s licensors became competitors who wanted their own content, Netflix’s catalogue shrank further still. In the United States today, Netflix has 7,000 titles, meaning that every time a user logs on, they might see 1-2% of all options, and 10%+ in a month. This doesn’t require nearly the same sieve that Netflix’s DVD services did; after all, tens of percentage points of the catalogue is relatively easy to rule out on a customer-by-customer basis (not many Van Damme fans watch Cocomelon). 
And as the content wave kicked off, services drove subscriber acquisition and marketplace differentiation through their marquee exclusives. This meant customers would subscribe to, and log on for, access to a specific title, rather than access to some ill-defined “content.” Meanwhile, it became strategically essential that big-bet originals become hits for their service, so services push them heavily, even to users who would enjoy a licensed title more.

The launch of Disney+ in late 2019 was another brilliant example of how intentionality relates to quality of technical product. At the time, the service was constantly crashing, the UI barely worked, and recommendations were nonexistent. The home screen would load the wrong title name (e.g., The Mandalorian) for the thumbnail (WALL-E) and the service rarely supported basic “continue watching” functionality; viewers had to manually scrub to where they had left off. While the service did recommend titles, they were effectively hard-coded. If you watched Cars and Cars 2 back to back, Disney+ wouldn’t even recommend Cars 3. But this didn’t really matter. Those who wanted Disney wanted Disney+, and it wasn’t that hard to click on “Pixar” and find Cars 3, even if technically, Disney should have recommended it—nor would you decline to play it for your kid even if it was hard to find. And at the end of the day, many Disney+ users don’t want to discover a new title – they just want to watch Frozen a 200th time.

Where data has been essential is in slate composition. In the pay-TV era, the sole goal of a network was to maximize its viewership in every time slot. This is because every 1% increase in viewership meant, all things being equal, that the network could “sell” roughly 1% more viewers to advertisers and collect another 1% in ad revenue (or even more, as the ad slots became scarce). In addition, the more popular a network’s shows, the more it could charge pay-TV distributors. Due to these incentives, networks were broadly indifferent to the “cohesion” of their programming, or what was scheduled at 8 p.m., 9 p.m., 9:30 p.m., 10 p.m., and so on. A night might therefore swing from a live reality singing competition to a half-hour scripted comedy series and then a 60-minute crime drama. Similar tones, genres, and styles helped only insofar as they aided audience acquisition and retention (think Thursday night comedies on NBC): if the 8 p.m. and 9 p.m. shows appealed to the same audience, the two shows could be marketed together and the viewers at 8 p.m. just might stick around for the 9 p.m. show, while fans of the 9 p.m. show might show up early and discover the 8 p.m. one. But if the 9 p.m. slot would reach more viewers by focusing on a different genre, despite losing a greater share of the 8 p.m. audience, then a network would swap in a different show. This mercenary approach was furthered by the finite number of timeslots. A broadcast network, for example, had only 15 hours of “primetime” programming per week (8–11 p.m., Monday to Friday), meaning it could pick 22 or so shows to air a week (some of these lasted 30 minutes, hence 22 exceeding the 15 hours). For related reasons, it was relatively simple for pundits to predict whether a given primetime show might be renewed or canceled: they’d just compare its rating to the average rating of a network’s overall primetime programming.
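The slot arithmetic and the pundits’ rule of thumb described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the sample ratings, and the strict “below the network average means at risk” cutoff are hypothetical, and the schedule split assumes every show runs either exactly 30 or exactly 60 minutes.

```python
from statistics import mean


def split_schedule(total_hours: float, total_shows: int) -> tuple[int, int]:
    """Return (half_hour_shows, hour_long_shows) for a weekly schedule.

    Solves h + f = total_shows and 0.5*h + f = total_hours,
    which gives h = 2 * (total_shows - total_hours).
    Assumes only 30- and 60-minute shows (a simplification).
    """
    half_hour = round(2 * (total_shows - total_hours))
    hour_long = total_shows - half_hour
    return half_hour, hour_long


def renewal_outlook(ratings: dict[str, float]) -> dict[str, str]:
    """Label each show by comparing its rating to the network's primetime average."""
    network_avg = mean(ratings.values())
    return {
        show: "likely renewed" if rating >= network_avg else "at risk"
        for show, rating in ratings.items()
    }


# 15 primetime hours and 22 shows implies 14 half-hour and 8 hour-long shows.
print(split_schedule(15, 22))

# Hypothetical ratings, invented for illustration only.
sample_ratings = {"Show A": 2.4, "Show B": 1.1, "Show C": 1.7}
print(renewal_outlook(sample_ratings))
```

The point of the sketch is how little information the linear-era heuristic needed: one number per show and one network-wide average.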

Streaming services operate very differently from a linear network. The majority of revenue comes from a monthly subscription, meaning success comes from earning another month’s fee, not another minute of attention. The slate, meanwhile, has no preset cap of 15 hours or thirty 30-minute blocks. And the service? It wants every single video watcher to subscribe, and because it can air multiple titles at the same time, picking one target audience doesn’t mean ignoring all others. These attributes produce considerable complexity because they make performance metrics so much more diffuse. As such, we see a different approach to programming and investment.

In the streaming era, a show’s viewership still matters: more people watching means more customer engagement. But a streamer’s goal is to maximize both ARPU (which is advantaged by engagement growth) and the number of subscribers. This means that, keeping all things constant, it’s better to have two hit shows addressing different customer segments than to hit the same segment with two hit shows. But at the same time, it’s better to richly address several segments than thinly address many. Data is essential to these objectives, enabling a service to understand exactly which customer segments are consuming which sorts of content and what additional content they need to retain these engaged customers. Consideration is also paid to how a title contributes directly to retention and/or prompts additional consumption.

Data is also invaluable when it comes to predicting the headroom for an ongoing series. A streamer will review how much of its overall subscribership and a series’ likely viewers have already tried the title as well as how many viewers complete it versus just sample it or finish the first three episodes, and whether additional marketing spend or on-platform promotion leads to greater-than-average clickthroughs.
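As a rough illustration of the headroom checks described above, the sketch below computes three ratios for a single series. Every field name and every number here is hypothetical. Streamers do not publish their metric definitions, so this is one plausible reading of the paragraph, not any service’s actual method.

```python
from dataclasses import dataclass


@dataclass
class SeriesStats:
    likely_viewers: int      # subscribers judged likely to watch (assumed estimate)
    tried: int               # likely viewers who started at least one episode
    completed: int           # viewers who finished the series or season
    promo_ctr: float         # clickthrough rate when this series is promoted
    platform_avg_ctr: float  # average clickthrough rate across promoted titles


def headroom_report(s: SeriesStats) -> dict[str, float]:
    """Summarize how much audience a series might still gain."""
    if s.likely_viewers <= 0 or s.tried <= 0 or s.platform_avg_ctr <= 0:
        raise ValueError("counts and average CTR must be positive")
    return {
        # Share of the likely audience that has not yet sampled the title.
        "untapped_share": 1 - s.tried / s.likely_viewers,
        # Share of samplers who went on to finish.
        "completion_rate": s.completed / s.tried,
        # Above 1.0 means promotion outperforms the platform average.
        "promo_lift": s.promo_ctr / s.platform_avg_ctr,
    }


# Invented numbers, for illustration only.
stats = SeriesStats(
    likely_viewers=1_000_000,
    tried=400_000,
    completed=300_000,
    promo_ctr=0.06,
    platform_avg_ctr=0.04,
)
print(headroom_report(stats))
```

A large untapped share combined with a high completion rate and above-average promotional lift would suggest remaining headroom; a small untapped share suggests the series has already reached most of the audience it is going to reach.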

Such complex mathematics explains why so many streamer decisions feel inscrutable: why so many seemingly popular series are canceled or otherwise end after only a few seasons; why HBO Max removed hundreds of episodes of Looney Tunes while leaving a few hundred on the service; why services such as Disney+ are reducing, not growing, their Marvel output despite having more subscribers and revenue than ever. And so on.

But can “complex mathematics” make better shows? Or at least, tell us more about better shows?