FingerprinTV Dataset

The FingerprinTV dataset is a set of network traces resulting from systematic, automated tests of the top-1000 most reviewed applications (“apps”) on the Apple TV, Fire TV, and Roku smart TV platforms. The dataset is used in the paper “FingerprinTV: Fingerprinting Smart TV Apps”, Proceedings of the Privacy Enhancing Technologies Symposium (PoPETs) 2022, Issue 3 (referred to as “the FingerprinTV paper” throughout this document). The dataset comprises a total of 30,000 packet captures (PCAP files). This page summarizes how the dataset was collected, explains how the dataset is organized, and provides instructions for how to access the dataset. By accessing the dataset, you agree to the terms of the dataset license and acceptable use agreement below.

Dataset Collection

The diagram in Figure 1 below depicts the testbed that was used for collecting the FingerprinTV dataset. As shown in the diagram, the dataset collection was automated using FingerprinTV code (which will be made publicly available at a later date). A summary of how this automation operates is provided below. See the FingerprinTV paper for additional details.

The first step in any app traffic data collection effort is to decide which apps to test. To cover the most influential smart TV apps in the FingerprinTV dataset, the web interfaces of the three smart TV platforms’ app stores were crawled to determine all available apps, and the top-1000 apps of each platform, in terms of the number of user reviews submitted, were then selected for inclusion in the dataset. Note that some apps in the top-1000 could not be tested, e.g., discontinued apps and apps that would crash on launch. These apps were replaced by their runners-up to include a total of 1000 functional apps for each platform.

Each of these 3 × 1000 apps was then subjected to the same instrumentation. The instrumentation code was executed on a “Controller” device (a MacBook for Apple TV, and a Raspberry Pi for Fire TV and Roku). The Controller also acted as a wireless access point that the smart TV device associated with. As such, all traffic of the smart TV flowed through the Controller. For each app, the instrumentation code would (1) install the smart TV app; (2) perform three “warm-up” launches of the app to dismiss terms of service and configuration screens; (3) launch the app 10 times while recording traffic (producing one PCAP file per launch containing the traffic that occurred within approximately 45 seconds after launching the app); (4) uninstall the app.

Figure 1: Overview of the testbed used for collecting the FingerprinTV dataset.

Dataset Organization

The FingerprinTV dataset is made available as three sets of PCAP files, one for each smart TV platform. These sets are packaged as ZIP archives, using the following filenames:

  • appletv_launch_samples.zip for Apple TV,
  • firetv_launch_samples.zip for Fire TV,
  • and roku_launch_samples.zip for Roku.

Each set (ZIP archive) contains 10,000 PCAP files. Each individual PCAP file is referred to as a launch sample. A launch sample contains the traffic that occurred when the respective app (as identified by the filename of the launch sample, see below) was launched. The dataset contains 10 launch samples for every app on all three platforms.
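After downloading an archive, a quick sanity check is to count the PCAP entries it contains and compare the result against the expected 10,000 (1000 apps × 10 launch samples). The following is a minimal sketch using Python's standard zipfile module. The function name is hypothetical, and the sketch assumes that launch samples can be recognized by a .pcap extension regardless of any directory structure inside the archive.

```python
import zipfile


def count_launch_samples(archive) -> int:
    """Count the entries ending in .pcap in a ZIP archive.

    `archive` may be a filesystem path or a binary file-like object.
    Assumption: launch samples are identified by their .pcap extension.
    """
    with zipfile.ZipFile(archive) as zf:
        return sum(1 for name in zf.namelist() if name.lower().endswith(".pcap"))
```

For example, count_launch_samples("roku_launch_samples.zip") would be expected to return 10000 for a complete download.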

Note: Each PCAP file in the FingerprinTV dataset is a filtered version of a corresponding original “raw” PCAP file. This filtered file includes all IPv4 traffic from the raw file where the communicating entities are the smart TV device (see Figure 1) and some remote endpoint (i.e., any IPv4 address that is not in any of IPv4’s private address spaces). It also includes the smart TV device’s (IPv4) DNS exchanges, irrespective of whether the DNS server’s address is local or remote.
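The filtering rule described in the note above can be expressed as a small predicate over a packet's addresses and ports. The sketch below is illustrative only and is not the code that was used to produce the dataset. It assumes that "private address spaces" refers to the three RFC 1918 blocks and that DNS exchanges are identified by port 53; both are assumptions, as are all function and constant names.

```python
import ipaddress

# Assumption: the private IPv4 address spaces are the three RFC 1918 blocks.
PRIVATE_NETS = [
    ipaddress.ip_network(net)
    for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]
DNS_PORT = 53  # Assumption: DNS exchanges are recognized by port 53.


def is_private(addr: str) -> bool:
    """Return True if addr lies in one of the RFC 1918 blocks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in PRIVATE_NETS)


def keep_packet(src: str, dst: str, sport: int, dport: int, tv_ip: str) -> bool:
    """Decide whether a packet would survive the filter described above."""
    # The smart TV must be one of the two communicating entities.
    if tv_ip not in (src, dst):
        return False
    # DNS exchanges of the smart TV are kept even if the server is local.
    if DNS_PORT in (sport, dport):
        return True
    # Otherwise the other entity must be a remote (non-private) endpoint.
    peer = dst if src == tv_ip else src
    return not is_private(peer)
```

Under these assumptions, traffic between the smart TV and its Controller is dropped unless it is a DNS exchange, while traffic between the smart TV and any public address is kept.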

File Naming Convention

Each launch sample (PCAP file) follows a naming convention that embeds the ID of the app that the launch sample pertains to in the filename. The naming convention also embeds the launch sample number in the filename so that different launch samples for the same app have distinct filenames.

The naming convention used is app-<ID>-<SAMPLE>.pcap where <ID> is the app’s ID and <SAMPLE> is a 2-digit launch sample number (01 through 10). App IDs differ across smart TV platforms. For Apple TV, it is the string “id” followed by a series of digits; for Fire TV, it is a 10-character alphanumeric string (an Amazon Standard Identification Number (ASIN)); and for Roku, it is a simple integer value.
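A filename that follows this convention can be split into its app ID and launch sample number with a regular expression. The sketch below is one possible way to do so; the names LAUNCH_SAMPLE_RE, LaunchSample, and parse_launch_sample are hypothetical. The pattern deliberately does not validate the platform-specific shape of the app ID.

```python
import re
from typing import NamedTuple

# app-<ID>-<SAMPLE>.pcap, where <SAMPLE> is exactly two digits.
LAUNCH_SAMPLE_RE = re.compile(r"^app-(?P<app_id>.+)-(?P<sample>\d{2})\.pcap$")


class LaunchSample(NamedTuple):
    app_id: str
    sample: int


def parse_launch_sample(filename: str) -> LaunchSample:
    """Extract the app ID and launch sample number from a filename."""
    match = LAUNCH_SAMPLE_RE.match(filename)
    if match is None:
        raise ValueError(f"not a launch sample filename: {filename!r}")
    return LaunchSample(match["app_id"], int(match["sample"]))
```

For instance, a filename such as app-74519-01.pcap (constructed here for illustration from a Roku app ID) would yield the app ID "74519" and sample number 1.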

Device Models and Network Identifiers

The models and network identifiers of the devices in the FingerprinTV dataset are listed in the table below. The “Platform” column specifies what set of launch samples the device appears in. The “Role” column refers to what role the device assumes in Figure 1 above.

Platform | Device | Role | IPv4 Address | MAC Address
Apple TV | Apple TV 4K (1st generation) | Smart TV | 192.168.2.2 | 6c:4a:85:36:c0:14
Apple TV | MacBook Pro Retina (15 inch, Mid 2012) | Controller | 192.168.2.1 | 16:7d:da:ec:70:64
Fire TV | Fire TV Cube (2nd generation) | Smart TV | 192.168.4.2 | b8:5f:98:83:03:95
Fire TV | Raspberry Pi 4 (8GB RAM) | Controller | 192.168.4.1 | dc:a6:32:c5:89:13
Roku | Roku Ultra | Smart TV | 192.168.4.18 | 84:ea:ed:1d:42:b5
Roku | Raspberry Pi 4 (8GB RAM) | Controller | 192.168.4.1 | 98:48:27:bc:fb:f4
Table 1: Network identifiers of the Smart TVs and Controllers (see Figure 1) in the FingerprinTV dataset.
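When analyzing a launch sample, the smart TV's IPv4 address from Table 1 can be used to tell outgoing packets from incoming ones. The following is a minimal sketch with the Smart TV rows of Table 1 transcribed into a dictionary; the dictionary name, the function name, and the direction labels are hypothetical.

```python
# IPv4 addresses of the "Smart TV" rows in Table 1, keyed by platform.
SMART_TV_IPV4 = {
    "Apple TV": "192.168.2.2",
    "Fire TV": "192.168.4.2",
    "Roku": "192.168.4.18",
}


def packet_direction(platform: str, src: str, dst: str) -> str:
    """Classify a packet relative to the platform's smart TV device."""
    tv_ip = SMART_TV_IPV4[platform]
    if src == tv_ip:
        return "outgoing"  # sent by the smart TV
    if dst == tv_ip:
        return "incoming"  # received by the smart TV
    return "other"  # the smart TV is not an endpoint
```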

App Metadata

App metadata, such as an app’s name and the developer, is made available in CSV format alongside the launch samples. There is one CSV file per platform. Each CSV file contains a superset of the following columns (there are additional platform-specific columns):

app_id, app_name, reviews_count, developer

The CSV files’ filenames indicate the smart TV platform the respective CSV file relates to:

  • appletv_apps_metadata.csv for Apple TV,
  • firetv_apps_metadata.csv for Fire TV,
  • roku_apps_metadata.csv for Roku.

The CSV files are sorted by the number of reviews submitted for each app (the reviews_count column). Please note that the Apple TV and Roku CSV files also contain entries for paid apps, so you will want to disregard those if you want to determine the top-1000 free apps. Also note that if you construct such a list, it will not exactly match the apps included in the FingerprinTV dataset, as some broken apps had to be discarded and replaced by their runners-up outside this top-1000, as explained in the Dataset Collection section above.

Remarks:

  • A few entries in the Roku CSV file have no data for the reviews_count column (no data was returned for this field when the Roku Channel Store was crawled). It is suggested that you treat such empty values as 0 reviews.
  • The CSV files for Apple TV and Roku contain entries for all apps discovered from crawling the respective app stores. The CSV file for Fire TV only contains entries for the top-1050 most reviewed apps. This is because some Fire TV app metadata (e.g., the developer) can only be obtained by visiting an app’s dedicated webpage, whereas basic metadata (i.e., app ID, app name, and number of reviews) can be crawled relatively fast by traversing a paginated list of available apps. To save time, additional app metadata was therefore only crawled for the apps that were included in the FingerprinTV dataset. For completeness, the list of all available Fire TV apps at the time of the crawl is also included in the FingerprinTV dataset (see the firetv_apps_all_available.csv file).
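The suggestion to treat empty reviews_count values as 0 can be applied while loading a metadata file. The sketch below uses Python's standard csv module; the function name is hypothetical, and the code assumes that non-empty reviews_count values are plain integer strings, which is not stated above and may need adjusting for the actual files.

```python
import csv


def load_apps_by_reviews(csv_file) -> list:
    """Read app metadata rows and sort them by review count, descending.

    `csv_file` is an open text file (or any iterable of CSV lines).
    Assumption: non-empty reviews_count values are plain integers.
    """
    rows = list(csv.DictReader(csv_file))
    for row in rows:
        raw = (row.get("reviews_count") or "").strip()
        # Empty values (seen in the Roku file) are treated as 0 reviews.
        row["reviews_count"] = int(raw) if raw else 0
    rows.sort(key=lambda row: row["reviews_count"], reverse=True)
    return rows
```

A file would typically be opened with open("roku_apps_metadata.csv", newline="", encoding="utf-8") before being passed to this function; the encoding is likewise an assumption.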

Browsing the App Stores for App Metadata

Alternatively, an app ID can be used to look up the web listing of an app on the three smart TV platforms’ app stores. As of this writing (March 2022), the URL patterns are (using <appID> to denote the part that should be replaced with an app’s ID):

  • Apple TV: https://apps.apple.com/us/app/<appID>?platform=appleTV. For example, the Pluto TV app’s ID is id751712884 and its web listing is thus https://apps.apple.com/us/app/id751712884?platform=appleTV
  • Fire TV: https://www.amazon.com/dp/<appID>. For example, the Pluto TV app’s ID (ASIN) is B00KDSGIPK and its web listing is thus https://www.amazon.com/dp/B00KDSGIPK
  • Roku: https://channelstore.roku.com/details/<appID>. For example, the Pluto TV app’s ID is 74519 and its web listing is thus https://channelstore.roku.com/details/74519 (You will likely be redirected to a different URL. This is because Roku has recently started using UUID-like identifiers for apps, but the old integer-based identifiers still work.)
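The URL patterns above can be captured in a small lookup table. The following sketch builds a web listing URL from a platform key and an app ID; the platform keys ("appletv", "firetv", "roku") and the function name are hypothetical choices, and the patterns are only known to be valid as of the date given above.

```python
# URL patterns as listed above; {app_id} marks the part to be replaced.
LISTING_URL_PATTERNS = {
    "appletv": "https://apps.apple.com/us/app/{app_id}?platform=appleTV",
    "firetv": "https://www.amazon.com/dp/{app_id}",
    "roku": "https://channelstore.roku.com/details/{app_id}",
}


def listing_url(platform: str, app_id: str) -> str:
    """Return the app store web listing URL for an app ID."""
    return LISTING_URL_PATTERNS[platform].format(app_id=app_id)
```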

Developer Parent Organizations

As part of its analysis, the FingerprinTV paper examines whether apps that share the same fingerprint stem from the same developer. This analysis considers two developers (as listed in the CSV files discussed earlier) to be the same if it can be determined that they pertain to the same parent organization. For example, the Fire TV developers “Scripps Networks, LLC”, “Discovery Communications”, and “OWN, LLC” are considered identical, since Discovery, Inc. owns a majority stake in these companies. For completeness, and to enable reproducibility, CSV files that map developer names to their parent organizations are included in the FingerprinTV dataset. There is one file per smart TV platform:

  • appletv_developer_parentorgs.csv for Apple TV,
  • firetv_developer_parentorgs.csv for Fire TV,
  • roku_developer_parentorgs.csv for Roku.

A fused version of the above three files, allplatforms_developer_parentorgs.csv, is also included. This file is used in the FingerprinTV paper when analyzing data from the three smart TV platforms as one large dataset.

These CSV files contain three columns:

developer, developer_parent_org, source

The developer column is the name of the developer for some app(s) in the app metadata CSVs discussed above (i.e., it maps directly to values in the developer columns of those files). The developer_parent_org column lists the name of the respective developer’s parent organization and should be identical for developers A and B if they are to be treated as the same developer. The source column is optional. It contains links to page(s) that were used to identify the relationship between a developer and their parent organization. Note that entries in this column may be multi-line.
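The comparison described above can be sketched as a dictionary lookup. In the code below, the function names are hypothetical, and developers that do not appear in the mapping are assumed to be their own parent organization; that fallback is an assumption made for this sketch, not something the files specify.

```python
import csv


def load_parent_orgs(csv_file) -> dict:
    """Map each developer name to its parent organization.

    `csv_file` is an open text file. Opening it with newline="" lets the
    csv module parse multi-line entries in the source column correctly.
    """
    return {
        row["developer"]: row["developer_parent_org"]
        for row in csv.DictReader(csv_file)
    }


def same_developer(dev_a: str, dev_b: str, parent_orgs: dict) -> bool:
    """Treat two developers as identical if their parent orgs match.

    Assumption: a developer without an entry is its own parent org.
    """
    return parent_orgs.get(dev_a, dev_a) == parent_orgs.get(dev_b, dev_b)
```

A mapping would typically be loaded with open("firetv_developer_parentorgs.csv", newline="", encoding="utf-8"), where the encoding is an assumption.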

Please note that these files were put together by manually looking up organizational relationships among developers, but only for apps that would share the same fingerprint. They should therefore not be considered exhaustive.

License and Acceptable Use Agreement (AUA)

The FingerprinTV dataset License and Acceptable Use Agreement is inspired by a similar one from CAIDA. This is a basic policy to which you must agree before we give you access to any part of the FingerprinTV dataset.

License

The UCI Networking Group’s authorization to access the FingerprinTV dataset grants You a limited, non-exclusive, non-transferable, non-assignable, and terminable license to copy, modify, and use the data only for non-profit research and education. No license is granted for any other purpose and there are no implied licenses in this Agreement. Nothing in this License is intended to limit any rights You may have arising from fair use or due to other limitations on the UCI Networking Group’s exclusive rights under copyright law or other applicable laws.

The UCI Networking Group has the authority and reserves the right, in its sole discretion, to discontinue further access and use to anyone who violates this AUA.

NON-DISCLOSURE. You will not disclose any part of the FingerprinTV dataset to any person other than those employed by your institute who are assisting or collaborating with you using the dataset. Other entities must request access to the dataset separately using the form below.

MANDATORY CITATION. If you create a publication (including web pages, papers published by a third party, teaching material, and publicly available presentations) using data from the FingerprinTV dataset, you must cite our paper as follows:

  • Title: FingerprinTV: Fingerprinting Smart TV Apps
  • Authors: Janus Varmarken, Jad Al Aaraj, Rahmadi Trimananda, Athina Markopoulou
  • Venue: Proceedings on Privacy Enhancing Technologies 2022

Or, in BibTeX format:

@article{varmarken2022fingerprintv,
  title={{FingerprinTV: Fingerprinting Smart TV Apps}},
  author={Janus Varmarken and Jad Al Aaraj and Rahmadi Trimananda and Athina Markopoulou},
  journal={Proceedings on Privacy Enhancing Technologies},
  volume={2022},
  number={3},
  year={2022},
  publisher={Sciendo}
}

ANONYMIZATION. For any publication or other disclosure, you will anonymize or de-identify any credentials, authentication tokens, unique identifiers, and any other personally identifiable information you find in the dataset.

NO ABUSE. You will not attempt to use any information that can be derived from the FingerprinTV dataset for purposes that are different from non-profit research and education. This includes disclosing any form of credentials or personally identifiable information found in the FingerprinTV dataset, or using them for the purpose of gaining unauthorized access to any third-party services or systems. We have done our best to ensure that the dataset contains no data that could be used to compromise our systems by collecting the data in a dedicated testbed; however, if you find any vulnerabilities or credentials in the dataset, you must responsibly disclose them to us or the manufacturers of systems affected by them.

Disclaimer of Warranties

THE UCI NETWORKING GROUP USES ITS BEST EFFORTS TO PROVIDE DATA IN ACCORDANCE WITH ETHICAL PRINCIPLES AND SCIENTIFIC INTEGRITY. HOWEVER, THE DATA PROVIDED HEREIN IS ON AN “AS IS” BASIS. NEITHER THE UCI NETWORKING GROUP, ITS RESEARCHERS, RESEARCH PARTNERS, LICENSORS, AND DATA PROVIDERS, NOR THE UNIVERSITY OF CALIFORNIA AND ITS TRUSTEES, OFFICERS, EMPLOYEES, AND AGENTS MAKE ANY WARRANTY, EITHER IMPLIED OR EXPRESS, OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, INCLUDING, BUT NOT LIMITED TO, THE ACCURACY, TIMELINESS, COMPLETENESS, RELIABILITY, OR AVAILABILITY OF DATA, APPLICATIONS, OR SERVICES ACCESSIBLE THROUGH OR MADE AVAILABLE BY THE UCI NETWORKING GROUP.

Limitation of Liability

TO THE EXTENT ALLOWED BY LAW, IN NO EVENT SHALL THE UCI NETWORKING GROUP AND THE UNIVERSITY OF CALIFORNIA BE LIABLE TO YOU OR ANY THIRD PARTY FOR ANY INDIRECT, CONSEQUENTIAL, INCIDENTAL, SPECIAL OR PUNITIVE DAMAGES, ARISING FROM YOUR USE OF THE DATA.

Access the Dataset

To access the FingerprinTV dataset, please fill out the form below. Note that by filling out the form, you agree to the UCI Networking Group’s Privacy Policy and the dataset license and acceptable use agreement above. The FingerprinTV dataset is hosted on Google Drive. Once you submit the form, you will be redirected to Google Drive from where you can download the dataset. Please do not share the Google Drive link with anyone else. Instead, please refer any other interested party to this access form. Keeping track of dataset accesses is important for the UCI Networking Group as it facilitates accurate reporting to its funding agencies.
