Video and User Insights for Media: AI and Big Data Analytics

As data becomes increasingly ubiquitous in media, broadcasters are looking for more effective and creative ways to use that data to optimize all aspects of their business. At Prizma, we have been innovating around collecting and using data to optimize video discovery since we launched, but along the way we realized that we could use the same systems to generate actionable analytics for product, editorial, and marketing teams - leading us to begin developing our new Media Intelligence Layer. Prizma was invited to co-present the new offering at NAB on stage alongside the Google Cloud product team.

The Prizma Media Engagement and Intelligence Solution is powered by artificial intelligence (AI). It enables premium media brands and digital platforms to effectively engage their target users with content across various digital environments, and to develop a deep understanding of their audiences.

Our solution can be leveraged for a variety of business use cases: predictive behavioral analytics that inform editorial teams on what content to create, guidance on how and where to market video content, optimized distribution of media assets, and automated, personalized content discovery experiences on owned and operated digital networks.

The Prizma Offering:

Many media clients are interested in using a combination of content metadata, user data (demographics, usage behavior, psychographics), video views, web traffic data, and monetization data to answer critical business questions, such as:

  • Which kinds of videos resonate with different audience segments? Are there specific categories, topics, or personalities that seem to generate better (or worse) engagement with different target segments?
  • How can we compare performance across a variety of distribution channels to extract generalized, usable insights for various teams and uses?
  • What kinds of videos should my content teams be producing (enable rapid editorial response to viewer demand and predictive performance analytics)?
  • Which videos should I distribute on each platform?
  • Which traffic sources give me the most engaged users?
  • How can we predict video engagement to help inform editorial, distribution and marketing decisions (especially by user segment)? 

Google Cloud Media Offering:

The Google Cloud Media and Prizma teams demonstrated how media customers can easily extract business insights using the Google Cloud stack and data analytics pipeline (especially data from Google Services, such as YouTube, Google Analytics or DoubleClick), while layering Prizma’s Media Intelligence Solution on top of user and video data collected by Prizma. Using BigQuery to compile the data from multiple sources and Data Studio for rapid and flexible visualization, the Prizma team was able to demonstrate how, over a particular time period, different stories, celebrities, and topics resonated with audiences on YouTube vs. O&O and provided insights on how to create higher levels of engagement for different audience segments by platform.

The two teams also demonstrated the seamless integration of these components together, and how the combined offering can deliver better business results quickly.

    Subscribe to Prizma Blog Updates

    Modeling Long-Term Drivers of User Engagement

    Video engagement is driven by many factors. On the one hand, topics can increase in popularity based on events and social traction in ways that are highly variable, but on the other hand, users also have fairly stable preferences over the kinds of content they engage with in general. At Prizma, in addition to using highly responsive adaptive learning systems to continuously respond to what your users are engaging with right now, we also utilize advanced machine learning algorithms to predict the longer-term, deeper and more abstract drivers of your users' interests.

    We track a variety of user interactions with our videos, and use this data over relatively long periods of time to train models that detect persistent, more general patterns in what your users find interesting and enjoy watching. These models provide a priori estimates of video performance even before any user data is collected, enabling us to ensure high user engagement as soon as new content is available.
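To make the idea concrete, here is a minimal sketch of an a priori scorer: a linear model over metadata features whose weights would come from training on historical engagement. The feature names and weights below are entirely hypothetical, standing in for learned coefficients, and are not Prizma's actual model.

```python
# Illustrative cold-start scorer: a linear model over metadata features.
# The feature names and weights are hypothetical placeholders for
# coefficients learned from historical engagement data.
FEATURE_WEIGHTS = {
    "topic:politics": 0.4,
    "motivation:wanting_to_laugh": 0.7,
    "entity:comedian": 0.5,
    "format:long_form": -0.2,
}

def a_priori_score(features):
    """Estimate engagement for a video before any views are observed."""
    return sum(FEATURE_WEIGHTS.get(f, 0.0) for f in features)

new_video = ["topic:politics", "motivation:wanting_to_laugh"]
print(round(a_priori_score(new_video), 2))  # 1.1
```

A new video can thus be ranked against existing content the moment it is ingested, before a single impression has been served.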

    The feature space used in these models consists of a variety of textual features extracted from video metadata, including keywords, sequences of words, closely related words, and the important people, places, and things that reveal the topic and substance of a video. We include detailed, standardized relationships between key entities, which enable us to understand more deeply the abstract characteristics that interest your users. We also use these features to go further and infer other psychographic dimensions, including “motivations”: the reasons why someone might be watching a video. When we build our models, these broader, more abstract features are often among the most important pieces of information for predicting viewer engagement.
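A toy version of this kind of textual feature extraction might look like the following; the metadata fields and the feature naming scheme are illustrative assumptions, not our production schema.

```python
# Sketch of extracting textual features from video metadata: unigrams
# and word sequences (bigrams) from the title, plus any tagged entities.
# Field names ("title", "entities") are illustrative only.
def extract_features(metadata):
    words = metadata["title"].lower().split()
    unigrams = set(words)
    bigrams = {" ".join(pair) for pair in zip(words, words[1:])}
    entities = {f"entity:{e}" for e in metadata.get("entities", [])}
    return unigrams | bigrams | entities

video = {"title": "Cubs win the World Series", "entities": ["Chicago Cubs"]}
features = extract_features(video)
```

In a real pipeline these sparse features would then be enriched with entity relationships and psychographic dimensions before being fed to the model.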

    Below is a simple visualization of the relative contributions of thousands of video features, including keywords, named entities, and Prizma’s psychographic features, to video engagement on one of our partners' sites over the last six weeks. As you can see, the vast majority of these features are relatively neutral, with a handful of salient features showing real positive or negative predictive value over time.

    In this model, the top features contributing to video engagement included (1) motivations, such as “wanting to laugh”, “wanting to take care of yourself”, and “wanting to know other people's opinions”, and (2) details about the key celebrities, e.g. whether they are comedians or politicians. With respect to predictive power, these more abstract features often carry more weight than their more specific counterparts in the traditional metadata, exceeding even those that are highly correlated with them.

    The ability to estimate a video’s performance a priori significantly reduces the time and data required to maximize user engagement. This is especially important in environments where the popular topics vary rapidly, such as news sites, where less data is available due to low traffic, or for partners with large and rapidly growing video libraries. It is also helpful when the usual contextual or personalization-based signals are weaker, for example on a site’s home page.

    These estimates have resulted in significant improvements in video engagement for our partners. In one A/B test on tens of thousands of users, we compared the performance of our recommendations with and without predicted scores. We found that using these long-term performance predictors increased the number of initiated views by about 20%, and had an even larger impact on the completion rate for those views, which increased by more than 40%. Because these models emphasize the deeper interests of your users while also improving click rates, they produce much larger improvements in downstream signals of user retention. The results are summarized below.
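For readers who want the arithmetic, lift is measured relative to the control arm. The counts below are made-up illustrative numbers; only the formula reflects how percentages like those above are derived.

```python
# A/B lift relative to the control arm. The inputs here are
# illustrative, not the actual test data.
def lift(control, treatment):
    return (treatment - control) / control

initiated_lift = lift(1000, 1200)    # initiated views: 1000 -> 1200
completion_lift = lift(0.30, 0.42)   # completion rate: 30% -> 42%
print(f"{initiated_lift:.0%}, {completion_lift:.0%}")  # prints "20%, 40%"
```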

    The success of these algorithms is determined partly by our extensive, in-depth human-centered feature space. Because these dimensions are interpretable and usable, these algorithms can provide deep actionable insights about the wider interests and motivations of your users. We will not only continue to use these models to ensure the best experience for your users, but we also hope to provide insights from these models to help you understand your users better.


    How we built our analytics pipeline


    At Prizma, analytics are our lifeblood. We collect up to 50M events per day, which we use to display contextually relevant and personalized video content. These events let us track performance in real time, continuously improve our recommendations, and enable personalization, as well as provide critical metrics to our partners via the Prizma dashboard.

    We needed a solution for storing this data that allowed us to query it in real time while managing costs (after all, we’re a startup).  We explored a number of different solutions before we found one that was a fit for us.  This blog post will explore our process and share our final conclusions. The intended audience is other engineers and data scientists, although we won’t get too far into the weeds technically.

    Choosing a data warehouse

    The most important decision in designing our analytics infrastructure was choosing a data warehouse. We had been using a managed solution for storing and aggregating event data. However, we found ourselves approaching the limit of the queries we could run over our data: anything more complex than a single unnested SELECT statement required custom code to orchestrate the execution, and queries that joined our event data with other sources of data were infeasible.

    Another sticking point was pricing. We were being charged by the number of events ingested, and our event volume was putting us at the limit of our pricing bucket. We didn’t want our decisions about what data to collect to be driven by cost, and furthermore, we knew that the underlying storage and bandwidth were cheap enough that there had to be a more cost-effective solution.

    Having had positive experiences with columnar data stores previously, I knew that they were the way to go for Prizma’s data warehouse.  Since we’re a small team and didn’t want to manage our own infrastructure, this left us deciding between Amazon RedShift and Google BigQuery, the two most popular managed columnar data stores.

    RedShift vs BigQuery

    RedShift is Amazon’s product in this space. It runs on virtual machines that Amazon provisions on your behalf. BigQuery, on the other hand, is a fully managed service: you don’t have to worry about virtual machines, you just give BigQuery your data and tell it what queries to run. We are heavy AWS users, which would seem to make RedShift the more attractive option, but the pricing concerned us. To model the total cost, you need to know how many instances you’ll need, and Amazon’s documentation is of little help here; all it tells you is that the number and type of instances depend on the queries you will run. In other words, to determine our pricing, we’d have to build a RedShift cluster and test real queries on real data. BigQuery, by contrast, is priced on the amount of data accessed by your queries. This is straightforward to estimate if you know roughly the size of your data sets and the queries you’ll be running, without actually having to build anything out.
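The difference shows up in a back-of-the-envelope estimate. BigQuery's on-demand pricing at the time was around $5 per terabyte scanned (check current pricing before relying on this); the query size below is purely illustrative.

```python
# Back-of-the-envelope BigQuery cost estimate: on-demand queries are
# billed by bytes scanned (roughly $5/TB at the time of writing).
# Estimating RedShift cost would instead require sizing a cluster first.
def estimate_query_cost(bytes_scanned, price_per_tb=5.0):
    return bytes_scanned / 1e12 * price_per_tb

# e.g. a daily reporting query scanning 50 GB of event data:
daily = estimate_query_cost(50e9)
print(f"${daily:.2f}/day, ${daily * 30:.2f}/month")  # prints "$0.25/day, $7.50/month"
```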

    Since we wouldn’t be able to do an accurate price comparison without investing engineering resources in RedShift, and since two of our engineers already had experience with BigQuery, BigQuery was the clear choice.  We also liked that the billing model meant that we wouldn’t be paying for compute time when no queries were being run.   There were some other BigQuery features that helped sway us, like support for streaming inserts and nested and repeated record types.

    Event pipeline

    Now that we had settled on a data warehouse, we needed a way to get our events into it. We were already using fluentd as our event collector, which meant that changing our data store was a simple configuration change. We had a choice here between BigQuery’s streaming inserts feature and regular load jobs. With streaming inserts, you can add records as often as you’d like, with or without batching. Load jobs, on the other hand, are free, but require batching since you are limited in the number of jobs you can run per day. In the end, we decided that even though we could batch inserts with fluentd, streaming inserts were cheap enough that it wasn’t worth worrying about hitting load-job limits.
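To illustrate what batching for load jobs would have involved (and what fluentd can handle for you), here is a minimal sketch of a size-based event batcher. It is a simplification for illustration, not our production configuration.

```python
# Minimal size-based event batcher: buffer events and flush them to a
# sink when the batch is full. With streaming inserts each flush is just
# an API call; with load jobs, flushes count against a daily job limit.
class EventBatcher:
    def __init__(self, flush_size, sink):
        self.flush_size = flush_size
        self.sink = sink          # callable that receives a list of events
        self.buffer = []

    def add(self, event):
        self.buffer.append(event)
        if len(self.buffer) >= self.flush_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.sink(self.buffer)
            self.buffer = []

batches = []
batcher = EventBatcher(flush_size=2, sink=batches.append)
for e in ["play", "pause", "complete"]:
    batcher.add(e)
batcher.flush()  # drain whatever is left
print(batches)   # [['play', 'pause'], ['complete']]
```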


    Fluentd is an open-source daemon that sits between data sources, like event streams or application logs, and data stores, like S3 or MongoDB. It decouples the concerns of data collection and storage, while handling details that don’t fit nicely into the request-oriented nature of web applications, like batching. It’s also blazingly fast, with an advertised throughput of around 13K events/second/core. Since fluentd already had a plugin for BigQuery, we were able to make the configuration change and have events written to BigQuery with only a few hours’ work (mostly setting up access credentials). We also used fluentd to stream events to our backup storage on S3.

    The pipeline

    In building this pipeline, we optimized for simplicity and flexibility. That let us get an event aggregation solution off the ground that collects a large amount of data and processes it in real time while managing costs. However, since we don’t pre-aggregate any data, our queries end up performing some redundant calculations. If we did pre-aggregate, we would have to choose between aggregating in real time or in batches, each with its own downside: with real-time aggregation, new metrics have to be backfilled, while batched aggregation means forgoing real-time metrics. In the future, we may explore tools like Google Cloud Dataflow, which has a novel computational model that supports both real-time and batch processing, potentially offering the best of both worlds.
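A tiny sketch of the real-time side of that trade-off: a streaming aggregator keeps running counts as events arrive, so queries don't recompute them, but any metric added later must be backfilled from the raw events. The event fields here are illustrative, not our actual schema.

```python
# Streaming aggregation sketch: maintain running per-video view counts
# as events arrive, instead of recomputing them from raw events on
# every query. New metrics would require a backfill over raw events.
from collections import Counter

def aggregate_stream(events):
    views = Counter()
    for e in events:
        if e["type"] == "view":
            views[e["video_id"]] += 1
    return views

events = [
    {"type": "view", "video_id": "a"},
    {"type": "impression", "video_id": "b"},
    {"type": "view", "video_id": "a"},
]
print(aggregate_stream(events))  # Counter({'a': 2})
```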


    Why you should care about psychographics: taking a human-centered approach to engagement

    At Prizma we are always trying to understand the “why” of what people are watching.  It’s one thing to know that people are watching a lot of videos about politics, but it’s another to know whether what they’re watching is coming from a desire for information, or driven by outrage around the news, or the desire to empathize with other people.  Understanding that “why” is part of how Prizma is able to drive consistently high performance while surprising and delighting users.  


    In order to do this, we use machine learning to generate a “psychographic” feature space that covers some of the underlying reasons why users might be engaging with content.  This enhanced feature space informs our entire approach to both recommendations and optimization and allows us to pinpoint not only which videos are doing well, but why.


    For example, when we think about people’s preferences, we often point to topics, or perhaps favorite celebrities (including the ones they love to hate), as what drives their engagement. However, these kinds of tags are often incapable of capturing the emotions or driving forces behind that engagement. When we looked at data from the last several weeks on one of our partners’ sites, we found increased performance from videos that covered politics and the incoming President: both did a little shy of 50% better than videos that didn’t cover those subjects. But when we considered our tag of “outrage”, we were able to identify the videos that drove nearly 3× the engagement of other videos. Clearly, while people have a renewed interest in politics, “outrage” is one of the key emotions that gets users to watch videos in the current climate.


    In practice, our ability to infer these more general features translates into tangible value for our customers. In one A/B test we compared the performance of our recommendations pipeline with and without our extended feature space. We found that by utilizing psychographic features we were able to increase the number of initiated views by 10% and the number of completed views by >30%, indicating that while these features are important to the “click”, they are even more important for generating sustained user attention and engagement.

    We have calibrated and honed our psychographic dimensions based on how well they describe and distinguish our partners’ content, with the aid of intuitive psychological models that draw on the language users and creators alike use to describe content. This human-centered approach can provide actionable insights for our partners. We train these models on highly diverse data and use these dimensions throughout our pipeline. Our psychographic dimensions reduce the resources and data points required to generate more abstract representations of both videos and user preferences, allowing us to create high-quality video discovery experiences for any publisher and environment.

    We believe that this human-centered approach to understanding content is the key to driving deeper video engagement. As we expand our offering, we hope to go beyond using these psychographics to inform our own optimizations and to provide deeper insights to content creators and advertisers, helping them understand their users better and create the most engaging content for their audiences.



    A Prizma Powered Year in Review

    By all accounts, 2016 was both a year to remember and one that many of us would like to forget. There were outstanding surprises, such as Beyoncé captivating the world and social media with the surprise release of her new album and short film, Lemonade. There were also surprise outcomes that still leave many of us wondering and aghast, most notably the turn of events in the 2016 Presidential Election and the campaign media cycle that all of us endured (thank you, Saturday Night Live, for the comic relief!).

    We sat on the edge of our seats and watched the astounding Chicago Cubs win their first World Series since 1908. And of course, what would 2016 be without the artistic phenomenon of Hamilton, the musical about the life of Treasury Secretary and founding father Alexander Hamilton, which set Broadway afire and stands as a prescient political and cultural statement at a critical moment in American history?

    Prizma works with some of the best content creators in the industry. To commemorate the inspiring as well as the challenging moments of 2016, we wanted to feature some of the best videos of the year from our partners, videos that we think best capture the feelings and sentiments of 2016. 

    Enjoy! Best wishes to all in 2017.


    1. One of the most touching videos of the year, this video features an observant and compassionate teen who uses his wits and kindness to save a woman from her kidnapper. [Source: Complex Media] 

    2. Dear Prince, Thank you. Remembering your Purpleness. [Source: Complex Media]

    3. The 2016 election was one twist and turn after another, but when the dust settled, Dave Chappelle and Kate McKinnon emotionally awed audiences on SNL. [Source: Complex]

    4. And in one of the year's most stunning developments, the Chicago Cubs came from behind in the Series to win Game Seven. [Source: ABC7 Chicago]

    5. We learned some cool things in how the pyramids were built! [Source: Discovery Digital Networks]

    6. ...and what it will be like to be a human being in 2116!  [Source: Discovery Digital Networks] 

    7. Beyoncé sent Twitter and the music world into a frenzy with her surprise album and short film, Lemonade [Source: Complex]

    8. We all heard what President Elect Donald Trump had to say about Muslims during his campaign...but did you hear what these Muslim girls had to say to him? [Source: Fusion]

    9. Keira Knightley protested the Photoshopping of women's bodies with a very unique photo shoot. [Source: Complex Media] 

    10. In time for the Holiday Season, here are Kids trying 100 years of Christmas desserts - cute and hilarious! [Source: Bon Appetit]

    11. Shakira says Zootopia has an important message about defying stereotypes.


    Introducing Per-Video Analytics with Prizma

    When you're creating lots of videos, it's often helpful to see how they're performing in aggregate: how are you tracking against your goal views for this month? But sometimes seeing how a single video is doing is just as important: how is our new series performing? Are viewers watching the whole thing? Prizma Analytics now makes it possible to quickly answer these questions on a per-video basis.

    Top & Trending Videos Modals

    Now, when you visit your Top Videos and Trending Videos panels, you can see exactly how individual videos are doing. Simply click on one to summon a modal like this one:

    The metrics you expect are there: 

    • Views: this is the Views metric you've been seeing with Prizma all along, a representation of the number of play events each video has had. This means a user clicked a thumbnail (or watched a whole video and started another one) to see your video.
    • Total impressions: this is the number of times this video's thumbnail was seen by viewers, whether or not they ended up clicking on it.
    • View rate: this is some simple division: the number of views divided by the number of impressions. This gives you an idea of how often viewers are actually watching the video when they have the chance to.
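In code, the view-rate metric is exactly that division; the counts below are illustrative.

```python
# View rate = views / impressions, i.e. how often a thumbnail
# impression converts into an actual play.
def view_rate(views, impressions):
    return views / impressions if impressions else 0.0

print(f"{view_rate(250, 1000):.0%}")  # prints "25%"
```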

    But the big feature is the Audience Retention graph: it shows you, at any given five-second interval, how many viewers were still tuned in to the video. Additionally, clicking any point on the graph will cause the video to play from that spot, meaning any stark changes in activity can be traced directly to the content. Would a video benefit from being clipped? Are certain segments in need of retooling? These are the questions new per-video analytics enable you to answer, and adjust your video strategy accordingly.
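Conceptually, a retention curve like the one described can be built by counting, at each five-second mark, how many viewers watched at least that far. This sketch uses illustrative watch durations, not our actual event schema.

```python
# Audience retention sketch: for each 5-second mark, count how many
# viewers were still watching at that point.
def retention_curve(watch_durations, video_length, bucket=5):
    points = []
    for t in range(0, video_length + 1, bucket):
        still_watching = sum(1 for d in watch_durations if d >= t)
        points.append((t, still_watching))
    return points

# Five viewers, watch durations in seconds, for a 30-second video:
durations = [4, 12, 30, 30, 18]
print(retention_curve(durations, 30))
# [(0, 5), (5, 4), (10, 4), (15, 3), (20, 2), (25, 2), (30, 2)]
```

A sharp drop between two adjacent points flags the exact moment in the content where viewers bail out.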

    What do you want to know about your video performance? How does knowing about individual performance affect your content creation strategy? Let us know in the comments!


    Prizma is Google Tag Manager friendly!

    Here at Prizma we’ve been working hard to make it easier for publishers to integrate our widget on their sites. Today we are pleased to announce that Prizma can be used in Google Tag Manager! This means you, the publisher, can quickly and easily configure our tag within your account and specify where on the page you want the widget to trigger. So how does it work? A detailed walkthrough can be found here.

    Getting started

    First, log in to Google Tag Manager and choose the Container that will contain the Prizma snippet. Then, create a new tag, select the Custom HTML Tag option, and drop in the script. Keep in mind that the id parameter can be named whatever you like. Next, create a trigger that will either fire on all pages or some pages. For optimal performance, we highly recommend enabling it on all pages. Once completed, save and publish the tag.

    You're almost there

    Now, pat yourself on the back because you’ve set up Prizma in Google Tag Manager. But wait! There are just two more steps and you’ll be home free.

    Within your website’s HTML structure, insert the Prizma div tag where you want the widget to automatically appear on the page. Please make sure the div’s id parameter matches the script’s id parameter you set in Google Tag Manager. Then, go to your website and you should see the Prizma widget in all its glory displaying the most compelling and relevant content.

    Still have questions?

    Head on over to our documentation page. It provides more detailed, step-by-step instructions on how to implement the widget. As always, if something is unclear, feel free to send us an email.


    Track your Audience Interests at a Glance with Prizma Analytics 2.1

    At Prizma, we strive to not only promote content that your customers will enjoy — we also aim to empower you, the publisher, to engage with your core audience in exciting new ways. That's why today we're excited to announce a new addition to the Prizma Dashboard: the Trending Now Module. With it you can now track the stories, movements and topics that are on your audience's minds at this very moment — we make real-time engagement and marketing as easy as a glance.

    The Trending Now Module currently consists of two related panels, with more to arrive in the near future. The Trending Videos Panel shows you the most engaging video content of the moment: one glance reveals the topics your audience wants to see right now. The Trending Personalities Panel goes a step further, analyzing your trending content to identify who is driving the latest trend, looking across multiple videos to understand more deeply what your audience is engaging with. The red bars show the relative “trending score” of each personality. Soon we hope to add further panels to make your real-time engagement even simpler.
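As a rough illustration of what a trending score can capture (not necessarily Prizma's actual formula), one can compare an entity's recent engagement to its longer-term baseline:

```python
# Illustrative trending score: recent engagement relative to a
# longer-term baseline. Higher means "hotter right now".
def trending_score(recent_views, baseline_views):
    # Smooth the baseline slightly so brand-new entities don't divide by zero.
    return recent_views / (baseline_views + 1.0)

scores = {
    "Personality A": trending_score(900, 100),  # sudden spike
    "Personality B": trending_score(400, 400),  # steady as usual
}
top = max(scores, key=scores.get)  # "Personality A"
```

Under this kind of ratio, a personality with a sudden spike in attention outranks one with higher but steady traffic, which is what a "trending" panel is meant to surface.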

    As with the rest of the Prizma Dashboard, the Trending Videos and Trending Personalities Panels are built on Prizma's proprietary continuous learning platform. We analyze the people, places, and things that are buzzworthy right now using our sophisticated natural language processing (NLP) technology, which automatically parses and identifies real-world entities. We measure and analyze your traffic and adjust our model to fit your website's individual needs: whether you are a small blog or an established media behemoth, the Prizma Platform customizes and reinvents itself to precisely suit you and your audience.

    Give it a go today — understand and interact with your audience at lightning speed!


    A new way to track and understand your videos: Prizma Analytics 2.0

    Today marks the official launch of version 2.0 of the Prizma dashboard, and it's one of our most feature-packed product updates ever.

    There are tons of insights that Prizma's data can help surface for partners, but until now, most of them were hidden away on our servers rather than displayed front and center. With the goal of showcasing the kinds of information partners can report on, use, and learn from, we've completely reorganized the dashboard and packed it with new insights. The new Prizma dashboard is organized into three main modules: Views, Audience, and Videos.

    Views module

    The views module is your go-to spot for understanding how many people who see a Prizma widget convert to viewing a video, the breakdown of devices your views come from, and other metrics from version 1.0. Everything's presented in a way that makes metric-to-metric relationships much more clear.

    There's also plenty of new stuff: overlay the previous month's performance on this one's to get a sense of how you're tracking relative to your goals, see how many videos each viewer is watching, and how long they're sticking around. These metrics, plus fine date controls and device filters, make learning about how much video your audience is watching easier than ever.

    Audience module

    The audience module is your headquarters for understanding the people who are watching your content: where they're coming from, both on the Web and around the world. Discover what content categories interest them most, what sources (Facebook, Google, other websites) convert to video views, and which countries are watching the most.

    We're really excited about the Audience panel. Expect it to grow lots over the coming weeks and months. We'll be surfacing even more of what we understand about your videos—the secret sauce that makes our recommendation engine an industry leader—to help you understand what's resonating, and create more of it. 

    Videos module

    Finally, the videos module now includes panels to help you understand your library—how many videos Prizma has ingested, which ones are garnering the most views, and even which ones are trending. And for partners big and small, we're launching the ability to play videos directly from the dashboard, because sometimes there's no context for video analytics like the videos themselves.

    Wrapping it up

    With a more powerful, more useful dashboard, Prizma today is better than ever. And of course, there's plenty more exciting news to come for Prizma Analytics. Click below to get started today.

    Are there things you want your video engagement platform to track? Let us know in the comments!
