Background
Netflix has 65 million members in more than 50 countries, who together watch over 100 million hours of television shows and movies every day. By some estimates, Netflix accounts for one-third of all peak-hour Internet traffic in the United States. The data collected from the viewing behaviour of millions of subscribers is analysed to understand viewing patterns. But Netflix’s data is big in more than the literal sense: because it combines this data with sophisticated analytical techniques, Netflix qualifies as a genuine Big Data company.
What problem is Big Data helping to solve?
William Goldman, one of the most successful screenwriters in the history of Hollywood, is credited with saying: “Nobody, nobody – not now, not ever – knows the least goddam thing about what is or isn’t going to work at the box office.”
He said this before the advent of the Internet and Big Data. Since then, Netflix has made it its mission to prove him wrong by building its entire business on accurately anticipating our viewing habits.
How is Big Data used in practice?
A quick look at Netflix’s careers page is all it takes to see how seriously the company takes data and analytics. Specialists are recruited into teams that apply analytics to particular business areas, such as personalization, messaging, content distribution, and device analytics. Netflix uses Big Data in every aspect of the business, but predicting what viewers will want to watch has been the holy grail for the company. Big Data analytics is the driving force behind its recommendation engines.
The first steps were taken in 2006, when the company was still operating primarily as a DVD mailing service (streaming began a year later). That year it announced the Netflix Prize, which offered $1 million to the group that came up with the best algorithm for predicting how customers would rate a movie based on their previous ratings. The concepts behind the winning entry are still an essential component of the recommendation engine, although the algorithms are constantly being improved and extended.
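To make the rating-prediction task concrete, here is a minimal sketch in Python of a simple baseline predictor: the overall average rating, adjusted by how generous each customer tends to be and how well each movie tends to be rated. It is not the winning Netflix Prize algorithm and not Netflix's actual method. The data, the 1 to 5 rating scale, and every name in it (RATINGS, fit_baseline, predict) are assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical ratings: (customer_id, movie_id, rating on an assumed 1-5 scale).
RATINGS = [
    ("c1", "m1", 5), ("c1", "m2", 4),
    ("c2", "m1", 4), ("c2", "m3", 2),
    ("c3", "m2", 3), ("c3", "m3", 1),
]


def fit_baseline(ratings):
    """Return the global mean plus per-customer and per-movie offsets."""
    global_mean = sum(r for _, _, r in ratings) / len(ratings)

    # How far each movie sits above or below the global mean.
    by_movie = defaultdict(list)
    for _, movie, r in ratings:
        by_movie[movie].append(r - global_mean)
    movie_bias = {m: sum(v) / len(v) for m, v in by_movie.items()}

    # How far each customer rates above or below what the movie offsets predict.
    by_customer = defaultdict(list)
    for customer, movie, r in ratings:
        by_customer[customer].append(r - global_mean - movie_bias[movie])
    customer_bias = {c: sum(v) / len(v) for c, v in by_customer.items()}

    return global_mean, customer_bias, movie_bias


def predict(model, customer, movie):
    """Predict a rating; unseen customers or movies fall back to the mean."""
    global_mean, customer_bias, movie_bias = model
    raw = global_mean + customer_bias.get(customer, 0.0) + movie_bias.get(movie, 0.0)
    return min(5.0, max(1.0, raw))  # keep the prediction on the 1-5 scale


model = fit_baseline(RATINGS)
print(predict(model, "c1", "m3"))  # customer c1 has not rated movie m3
```

A predictor like this only uses the four fields available in the early days (customer, movie, rating, date, with the date ignored here), which is why it can serve as a rough picture of what analysts could do before streaming data arrived.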
In the beginning, analysts were limited by the small amount of data they had to work with: customer ID, movie ID, rating, and the date the movie was watched. When streaming took over as the primary means of delivery, a vast amount of previously unavailable information about customers became accessible. With this additional data, Netflix was able to build models that predict the “perfect storm” in which customers are constantly served movies they will enjoy. In the end, happy customers are much more likely to keep paying for the service.
Netflix uses a method called “tagging” to help determine which movies its subscribers will enjoy. The company pays people to watch movies and tag them according to their content. Netflix then suggests other titles that share tags with the ones you enjoyed. This is the origin of the “suggestions,” which are sometimes a little out of the ordinary, such as a teen comedy featuring a strong female lead. It is also the reason why the site recommends (very frequently in my case) that I watch movies with poor ratings from other users. That seems to run counter to the aim of entertaining me, but the weight given to those ratings is outweighed by the prediction that the content of the movie will appeal to the viewer. In effect, Netflix has established 80,000 new movie “micro-genres” based on user preferences.
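The idea of suggesting titles that share tags with something you enjoyed can be sketched in a few lines of Python. This is an illustration only: the titles, the tags, and the choice of Jaccard overlap as the similarity measure are all assumptions, not a description of how Netflix actually scores its content.

```python
# Hypothetical tags assigned by paid viewers; titles and tags are made up.
TAGS = {
    "Title A": {"comedy", "teen", "strong female lead"},
    "Title B": {"comedy", "teen", "high school"},
    "Title C": {"thriller", "spy", "strong female lead"},
    "Title D": {"documentary", "nature"},
}


def jaccard(a, b):
    """Share of tags two titles have in common, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_titles(liked, tags=TAGS, top_n=2):
    """Rank the other titles by tag overlap with a title the viewer liked."""
    scores = [
        (jaccard(tags[liked], other_tags), title)
        for title, other_tags in tags.items()
        if title != liked
    ]
    # Highest overlap first; ties are broken alphabetically by title.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for score, title in scores[:top_n] if score > 0]


print(similar_titles("Title A"))
```

Note that user ratings play no part in this ranking. A title can be suggested purely because its content resembles something the viewer liked, which is consistent with poorly rated movies still turning up as recommendations.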

More recently, however, Netflix has been repositioning its brand to emphasise its role not only as a distributor of content but also as a creator of original content. This strategy is also driven by data, which showed that subscribers have a huge appetite for shows directed by David Fincher and starring Kevin Spacey. After winning the bidding war for House of Cards, Netflix was so confident that the show fitted its predictive model of the “perfect television programme” that it skipped the pilot and ordered two full seasons straight away (26 episodes in total). Netflix used data to inform every stage of the production; for example, the colour palette of the show’s cover art was carefully selected to pique viewers’ attention.
Netflix’s ultimate goal is to increase the number of hours its subscribers spend using the service each month. The data shows that subscribers who use the service infrequently are more likely to cancel, because they do not feel they are getting their money’s worth. To this end, models are built to study how the “quality of experience” is affected by a wide range of factors. For example, modelling how the physical location of content affects the viewing experience can inform decisions about data placement, which in turn helps ensure the best possible service for as many homes as possible.
What were the findings?
In a letter to shareholders in April 2015, Netflix detailed the effectiveness of its Big Data approach. It added 4.9 million subscribers in the first quarter of 2015, up from four million in the first quarter of 2014. Netflix attributes a large part of this success to its “constantly evolving content,” which includes original shows like “House of Cards” and “Orange Is the New Black.” This original content is a major factor in attracting new members and retaining existing ones. Ninety percent of Netflix’s members have watched at least one piece of its original content. Clearly, a significant part of the company’s success comes from its ability to predict what its audience will enjoy.
What about the key metric, the total amount of time that all of its users spend using the service? Netflix users watched a total of 10 billion hours of content in the first three months of 2015. As Netflix continues to hone its Big Data strategy, that figure is likely to increase.
What Kind of Data Was Utilized?
The content that users watch, when they watch it, how long they spend picking movies, how often playback is halted (by the user or due to network constraints), and ratings are all factors that go into the formulation of recommendation algorithms and the selection of content. In order to evaluate the overall quality of the viewing experience, Netflix gathers customer data such as rebuffer rate (which refers to the delays caused by buffering), bitrate (which determines the visual quality), and consumer location.
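As a rough illustration of how quality-of-experience figures like these might be derived from playback data, the Python sketch below computes a rebuffer rate and an average bitrate per session and then averages them by location. The Session record, its field names, and the exact definitions (rebuffer rate as the fraction of session time spent stalled, bitrate as megabits delivered per second of playback) are assumptions for the example, not Netflix's actual schema or formulas.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """One hypothetical playback session; all field names are assumptions."""
    location: str
    play_seconds: float        # time spent actually playing video
    rebuffer_seconds: float    # time spent stalled while buffering
    megabits_delivered: float  # total video data delivered


def rebuffer_rate(s):
    """Fraction of the session spent waiting on the buffer."""
    total = s.play_seconds + s.rebuffer_seconds
    return s.rebuffer_seconds / total if total else 0.0


def average_bitrate_mbps(s):
    """Megabits delivered per second of playback."""
    return s.megabits_delivered / s.play_seconds if s.play_seconds else 0.0


def summarise_by_location(sessions):
    """Average both metrics across the sessions seen in each location."""
    grouped = {}
    for s in sessions:
        grouped.setdefault(s.location, []).append(s)
    return {
        loc: {
            "rebuffer_rate": sum(rebuffer_rate(s) for s in group) / len(group),
            "bitrate_mbps": sum(average_bitrate_mbps(s) for s in group) / len(group),
        }
        for loc, group in grouped.items()
    }


sessions = [
    Session("Region A", 90, 10, 450),
    Session("Region A", 100, 0, 300),
    Session("Region B", 50, 50, 100),
]
print(summarise_by_location(sessions))
```

Grouping by location is one simple way to see where viewers are getting a poorer experience, which is the kind of signal that could feed decisions about where copies of the content should be placed.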

What exactly are the nuts and bolts of the analysis?
Netflix’s vast collection of movies and television episodes is kept on Amazon Web Services (AWS), but it is also mirrored on servers hosted by Internet service providers (ISPs) and other hosts in different parts of the world. This improves the user experience by reducing lag when streaming video around the world, and it saves ISPs money because they do not have to download the content from the Netflix server before passing it on to viewers at home.
As of 2013, their entire collection was estimated to take up more than three petabytes of storage. This massive volume is explained by the vast number of devices capable of streaming Netflix content: to support them, the company stores many of its titles in up to 120 different video formats.
In the beginning, they relied on Oracle databases, but later on, they switched to NoSQL and Cassandra so that they could do more complex, Big Data-driven analyses of unstructured data.
Kurt Brown, who leads the Data Platform team at Netflix, gave a presentation at the Strata + Hadoop World conference in which he emphasised the company’s ongoing efforts to improve its data infrastructure. Alongside more traditional business intelligence tools such as Teradata and MicroStrategy, Netflix’s data architecture includes Big Data technologies such as Hadoop, Hive, and Pig, as well as Netflix’s own open-source tools, Lipstick and Genie. Like the rest of Netflix’s critical infrastructure, it is hosted in the cloud on Amazon Web Services (AWS). Netflix plans to explore Spark for use cases involving streaming, machine learning, and analytics, and to develop additional capabilities for its own open-source suite.
What were the challenges in finding this solution?
A large portion of the metadata collected by Netflix (such as which actors a viewer likes to watch and what time of day they watch films or TV) is straightforward, easily quantifiable structured data. However, Netflix recognised early on that a lot of valuable data is also stored in the messy, unstructured content of video and audio, and its early success can be attributed in part to that insight.
For this information to be usable in computerised analysis, and for its potential value to be realised, it first had to be quantified in some way. Netflix accomplished this by paying thousands of people to watch hours of TV and tag everything interesting that they saw.
These paid viewers, all working from the same 32-page guidebook, noted the various themes, topics, and motifs they saw on screen, such as a hero having a religious epiphany or a strong female character making a difficult moral choice. From this information, Netflix has developed a classification system that sorts its content into about 80,000 “micro-genres,” such as “comedy films with talking animals” and “historical dramas with gay or lesbian themes.” As a result, Netflix can predict what you will like to watch far more accurately than it could simply by knowing, for instance, that you enjoy thriller or spy movies. This gives the seemingly chaotic data a structure within which it can be assessed statistically, which is one of the fundamental ideas underlying Big Data.
It has been reported that Netflix has already begun automating this process by developing scripts that capture snapshots of the footage in JPEG format and analyse what is happening on screen using technologies such as facial recognition and colour analysis. These snapshots can be taken at predetermined intervals or whenever the user performs an action, such as pausing playback or starting it again. If the system finds that a user fits the profile of viewers who tend to tune out during violent or sexually explicit scenes, it can suggest something more sedate the next time that person sits down to watch.
Important Ideas and Concepts
Television networks, distributors, and producers (all roles that Netflix now fills in the media industry) make a lot of money by betting on what people will want to see next. Although Netflix is the dominant provider in the market, you can be sure that competing services such as Hulu, Amazon Instant Box Office, and (eventually) Apple are hard at work refining and expanding their own analytics systems. This fierce competition is expected to drive the development of predictive content programming in the years to come.
Netflix has begun laying the groundwork for “personalised TV,” in which each viewer will have their own entertainment schedule based on data gathered about their viewing behaviour. TV networks have been talking about this idea for years, but now that we live in the era of Big Data, it is finally becoming a reality.