Monday, May 2, 2016

SETTING THE SCHEDULE - PILOT TESTING

The most dreaded part of the pilot process, especially for the creative executives, is sitting down with the Network research team and hearing the results of the testing on the shows that the creatives have devoted their time and effort to over the past half year. This is the moment when the voice of the viewers, represented by the research execs, is heard for the first time in the process. It’s often not very pretty. I have sat in several rooms where, following the test results, we all looked at each other and realized that we didn’t have the goods to put together a schedule and we needed to scramble. I have seen development executives updating their resumes in their minds while we went over the research.

Now don’t get me wrong. A lot of poor testing pilots get on the schedule every season, either because we need to fill all the slots or because of a rejection of the testing. Here’s what I know to be true: A pilot that is rejected in testing will never succeed, but a high testing pilot is no guarantee of success. We get a lot of false positives; we rarely get a false negative. The art of using testing is to look at all those shows that deliver an average test and determine if there is something in the data that indicates there may be a television show here. It is also important to understand why a pilot did not test as well as expected and then to determine whether those issues can be addressed in series or whether they are at the essence of the show.

One of my favorite examples is “Mad About You”, a show that came along early in my scheduling career and ran for seven seasons despite having a below average test. We all loved the pilot but the testing was a disappointment. This was 1992, nearly twenty-five years ago. The world and what was on television were different than they are today. Some of you may remember that “Mad” was about newlyweds Paul and Jamie Buchman. In the pilot the couple were wondering whether the spark was out of their marriage. The pilot ended with Paul and Jamie having sex on their kitchen counter while their family was in the next room of their small apartment.

That final scene put away the pilot in the eyes of the respondents to the pilot test. We really needed this show since we had weak comedy development that year. What was clear from the testing was that the idea of a show about a recently married couple was appealing and the respondents liked the two leads. They just didn’t like them having sex on their kitchen counter. Problem solved. By the way, “Mad About You” was originally titled “Loved By You”. The pilot featured the James Taylor rather than the Marvin Gaye version because… well, you know. Anyway, when “Loved By You” didn’t clear I suggested “That’s My Shiksa”. It was rejected.

The point of this is that there is an art to reading pilot testing results, and often the only thing that programming execs hear is whether it was a strong test or a weak test.

Strong testing pilots fail for several reasons. I have all sorts of theories as to why. The most obvious is the amount of money put on the screen for a pilot. The cost of the pilot far exceeds the budget of an episode. You often get a different director after the pilot so you have a different vision. I don’t know how many times I looked at someone in the screening room and said, “You’re never going to see that again.” A pilot can test well for several reasons but those reasons are sometimes not articulated to the producers by the creative executives and, by episode two, the show is off on a path that was not reflected in the testing.

I have always believed that the genre with the largest number of false positives is Sci-Fi/Fantasy. As a group, these shows test above the average for drama pilots. They are generally more expensive than more conventional pilots but they are often concept driven rather than character driven. When you look at the testing among this group of shows you often see high scores for the “idea” of the show but mediocre scores for the characters. Most other genres are character driven, so if you have strong leads you can overcome a show that has a conventional idea. Weaker testing procedurals will succeed more often than Sci-Fi/Fantasy because they are more character driven. Sci-Fi/Fantasy pilots have another flaw, which is that there is often a secret driving the show and once the secret is revealed the show is over. If the secret is not revealed fairly quickly to the audience, viewers often start to wonder if the creators even know where the show is heading. “Alcatraz” was a pilot that we did at FOX and it went to series although no one, including the creators, had a clue as to why prisoners were returning from the past. So a good rule for me was that any pilot where the idea scores dominated the character scores had a better than average chance of failing. I have no idea what the testing was for “Lost” but I have to believe it scored high with several characters. Over at FOX, “Prison Break” hit the sweet spot of a strong idea with several strong characters. “Fringe” found that sweet spot until it went off the tracks.

Execs often get very excited about these Sci-Fi/Fantasy shows. Marketing execs love them because they generally don’t have to do much work to sell them. They are noisy. Since they generally test well (even if for the wrong reasons), program execs generally put them on the schedule in their minds even before seeing the testing. That happened this season at FOX, but my favorite example was a pilot called “Them”. It was an aliens-among-us concept. Melva Benoit, who ran FOX research and reported to me, and I scratched our heads when we read the script and felt even more certain this was a disaster once we saw the pilot. Other executives did not share that opinion. They were convinced that they had a hit on their hands. That year, for whatever reason, the top execs asked us not to reveal the testing to anyone (including them) until after all the pilots were screened. The evening before the testing results were going to be made public, Melva and I sat down with the top two programming execs to share the results. “Them” was the lowest testing drama pilot that season and one of the lowest testing drama pilots ever. That’s saying a lot for a sci-fi pilot. The next morning, after we shared the research with the larger group, we were actually accused of fixing the results. That’s how strongly some believed in this pilot. Fortunately we had video of the focus groups that showed the reaction of the groups to this pilot. It was not pretty. The results of the testing and the groups were so strong that the pilot did not make the schedule.

Over in comedy, one of the biggest drivers of a false positive has to do with whether the pilot is about “People Together” or “People Apart”. A simpler way to put it is to determine whether it is a premise comedy pilot or not. A premise pilot (people apart) generally sets up the idea of the show and usually ends with the words “wait”, “don’t go”, “hold on” or some variation of those words as the star of the show is walking out the door. You often feel good at the end of a premise pilot but you have no idea what the series is, and often the producers don’t either. “People Together” comedy pilots start with another day in the life of a group of people (family or friends) who care about each other. Some event may happen in their lives in the pilot (Rachel running into the coffee shop, Mitch and Cam bringing their adopted child home, Jess coming to the loft) but there’s no “wait” moment. These people like each other and care about each other and you do too. Premise comedies, by contrast, are far more likely to result in a false positive. Two of my favorite comedy pilots in the last few years were “Modern Family” and “Jane the Virgin”, both of which have strong, well-defined relationships at the start of the pilot.

Pilot testing often varies among the networks. When I was at NBC we would test the pilots on cable systems throughout the country. We would put the pilot on a channel and then recruit viewers in the market to tune in at a designated time to watch the pilot. We would then call them after the airing and they would answer a series of questions. Anyone on the cable system could wander onto the channel and watch the pilot. We knew we were on to something with “ER” (one of the highest testing pilots ever) when cable operators who were carrying the pilot told us that they were being deluged with calls from customers who came upon “ER” and wanted to know when the second episode would run.

When I first came to FOX we would send out cassettes to subjects who would then be contacted for their input. In recent years we have been doing mall intercepts throughout the country where the subject would watch the pilot and then answer questions on a screen. So what does the test tell us? I’m sure all the networks do some version of asking the subjects to rank the show on a scale of Excellent to Poor. They then compare the score for a pilot with the average score of all previously tested pilots. Next they will ask a question to try to determine how much of an effort the subject will make to view another episode of this show. Next the characters are evaluated. There are generally norms for leads and support characters. High testing pilots are above the norms in Excellent, Special Effort and Characters (you want to see several characters pop). Finally, a series of diagnostic questions will be asked to determine network fit, the strength of the idea, level of involvement, etc.
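
For readers who think in code, the comparison against norms described above can be sketched roughly as follows. This is only an illustration: the norm values, the field names, and the threshold of two characters for "several characters pop" are all assumptions made up for the example, not figures from any network's testing.

```python
from dataclasses import dataclass
from typing import Dict, Set

# Hypothetical norms: averages over previously tested pilots (invented numbers).
NORMS = {"excellent": 20.0, "special_effort": 25.0, "lead": 60.0, "support": 45.0}


@dataclass
class PilotTest:
    title: str
    excellent: float              # % of subjects rating the pilot "Excellent"
    special_effort: float         # % who would make a special effort to watch again
    characters: Dict[str, float]  # appeal score per character
    leads: Set[str]               # names of the characters scored against the lead norm


def compare_to_norms(test, norms=NORMS, min_popping=2):
    """Compare one pilot's scores with the norms (min_popping is an assumed cutoff)."""
    # A character "pops" when it beats the norm for its role (lead or support).
    popping = sorted(
        name
        for name, score in test.characters.items()
        if score > norms["lead" if name in test.leads else "support"]
    )
    excellent_gap = test.excellent - norms["excellent"]
    effort_gap = test.special_effort - norms["special_effort"]
    return {
        "excellent_vs_norm": excellent_gap,
        "special_effort_vs_norm": effort_gap,
        "characters_above_norm": popping,
        # "High testing" here means above norm on all three measures.
        "high_testing": excellent_gap > 0 and effort_gap > 0
        and len(popping) >= min_popping,
    }
```

Calling `compare_to_norms` on a `PilotTest` returns the gaps against each norm, so the same output can answer both "was it a strong test?" and "where exactly did it fall short?".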

We also do theater testing in the Los Angeles area. We try to recruit equal groups of men/women, young/old (18-34 and 35-49). While they watch the pilot in a theater they are asked to move a lever up (positive) or down (negative) to express their feelings about what they are seeing. At any point during the screening the subject can indicate that they have “tuned out” the show. Research execs and others watch screens with an overlay of the lines broken out into the four quadrants.

We would be looking at the age and gender split and we would be interested in the growth of the lines over the course of the pilot. In a perfect world you would want to see little difference in the four quadrants and you are looking for the line to build over the course of the pilot. You can see points where the pilot drags and how long it takes to get the subjects invested in what is going on. “New Girl” had the classic line with all four quadrants in sync and moving upward throughout the episode.
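
The way we read those lines can be approximated in a few lines of code. The sketch below is a loose illustration only: it assumes each quadrant's line has already been reduced to one average lever reading per minute, and the function name and the three summary measures are my own invention rather than anything a research department actually produces.

```python
from statistics import mean


def read_dial_lines(lines):
    """Summarize dial-test lines.

    lines maps a quadrant name (e.g. "women 18-34") to a list of average
    lever readings, one per minute of the pilot (an assumed data layout).
    """
    lengths = {len(series) for series in lines.values()}
    if len(lengths) != 1 or 0 in lengths:
        raise ValueError("every quadrant needs the same, non-zero number of readings")
    n = lengths.pop()

    # Average across the quadrants at each minute.
    overall = [mean(series[i] for series in lines.values()) for i in range(n)]
    # Gap between the highest and lowest quadrant at each minute.
    spread = [
        max(series[i] for series in lines.values())
        - min(series[i] for series in lines.values())
        for i in range(n)
    ]
    return {
        # Did each line build from the start of the pilot to the end?
        "growth": {q: series[-1] - series[0] for q, series in lines.items()},
        # How far apart did the quadrants ever get?
        "max_spread": max(spread),
        # Minutes where the overall line dipped: candidates for where the pilot drags.
        "drag_minutes": [i for i in range(1, n) if overall[i] < overall[i - 1]],
    }
```

A pilot with the classic line would show positive growth in all four quadrants, a small maximum spread, and few or no drag minutes.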

To me, the importance of audience testing was to try to find the “why” in a show. I was less concerned with what viewers liked about a specific show than with the universal elements that can be found in all successful shows. I would often ask our Research Department (at both NBC and FOX) to do some testing on successful shows on other networks to see if we could get at the essence of why the show was working. Over the years I discovered a couple of recurring themes:
·      Ordinary people in extraordinary situations
·      Man (or woman) on a mission
·      Fish out of water
I’ll leave it up to you to think of successful shows (both scripted and unscripted) where you find these elements.

For procedurals we have found that the core elements are:
·      Two leads with a pinch of sexual tension
·      One lead a cop, FBI, CIA, whatever
·      The other lead has a “super power” used to solve crimes
·      Support group of really smart people.
I just want to be clear that if a procedural has these characteristics there is no guarantee of success; it just seems to increase the chances of success. By success, of course, I mean ratings.

According to Brandon Tartikoff “all hits are flukes”. It’s hard to argue with that but I have always looked at this business in terms of reducing failure and investing in success. Testing is one of the ways to do that. At FOX when we reported on the testing we always presented the data (after we showed the top line) in terms of what needs to be addressed if this pilot will be moving to series. The point being if these issues were not addressed we were probably increasing the chances that the pilot would not succeed in series.

At NBC Don Ohlmeyer asked Eric Cardinal, our head of research, and me to look at all of our pilot testing and see if we could come up with a set of “Research Homilies”. These were truisms that were found in the successful pilots. If none of these were found in the pilot the chances of success decreased. Here’s what we delivered to Don:



Against our wishes, Don passed these out as we were about to screen the pilots one year. They were not well received by the creative executives, who thought we were reducing the pilot process to a cookbook. Someone even leaked the homilies to TV Guide. All we were trying to say was that successful pilots share several of these elements while failed pilots are often lacking in them… that’s all.

This business is going through significant changes and I think why, how, and even whether we test shows will change with it. I do know we need audience feedback that goes beyond ratings. It’s always helpful to understand why something is resonating with an audience. It’s dangerous to leave that totally in the hands of critics. You need to listen to the consumer. I have a feeling, though, that in the next few weeks there will still be a meeting where the band-aid will be painfully ripped off. There will be good news and bad news. There will be surprises. The only thing I hope is that everyone listens.


1 comment:

  1. Hi,

    In your experience, which methodologies are most robust for minimizing false negatives (FGD with first episode, fMRI, any others)? Which methods are promising w.r.t. decreasing false positives?

    I work in a country where, culturally, people rarely like to talk "badly" about free samples. Hence, we can easily avoid false negatives (if a pilot rates badly with this kind of polite crowd, it is definitely not going to get good ratings). But we do have a huge problem with false positives (we try to benchmark the amount of "politeness" but still, there is room for improvement).

    Would love to know your opinion on the methodologies
