UCI Rokustic and Firetastic Dataset

The UCI Rokustic and Firetastic dataset is a set of network traces resulting from systematic, automated tests of the top-1000 apps on the Roku and Fire TV smart TV platforms. Each file in the dataset contains the network traffic of a single app. The dataset is used in the paper “The TV is Smart and Full of Trackers: Measuring Smart TV Advertising and Tracking,” Proceedings of the Privacy Enhancing Technologies Symposium (PoPETs) 2020, Issue 2. This page details the dataset’s format and provides a form to request access to the dataset (at the bottom of the page). By requesting access to the dataset, you agree to the terms of the dataset license below.

Dataset Structure and Format

The UCI Rokustic and Firetastic dataset is made available as two sets of JSON files, distributed as two zip archives (“roku_blocklistjson_top50.zip” and “firetv_blocklistjson_all.zip”).
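As a minimal sketch of working with the two archives, the snippet below lists the JSON files inside a zip archive without extracting it. The helper name list_json_members is hypothetical, and it assumes the archive has already been downloaded to a local path; the internal folder layout of the archives is not documented on this page, so the function simply returns every member whose name ends in ".json".

```python
import zipfile
from typing import List


def list_json_members(archive_path: str) -> List[str]:
    """Return the sorted names of all .json members in a zip archive.

    Hypothetical helper: the archive layout is an assumption, so any
    member ending in ".json" is returned regardless of its folder.
    """
    with zipfile.ZipFile(archive_path) as zf:
        return sorted(
            name for name in zf.namelist() if name.lower().endswith(".json")
        )
```

For example, list_json_members("firetv_blocklistjson_all.zip") would return the per-app Fire TV file names, assuming the archive sits in the current working directory.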

Each JSON file’s name identifies the app the traffic contained in the file pertains to:

  • For Roku, the naming convention is “app-<ID>.json”, where <ID> identifies the app.
  • For Fire TV, the naming convention is “<package name>__page<page number>-out.tshark.nomoads.json” (note: two underscores), where
    • <package name> is the app’s root package name, but using underscores instead of periods as separators (e.g., com_cbs_ott instead of com.cbs.ott).
    • <page number> is an integer identifying the page on the Amazon Web Store where the app appeared at the time the experiment was conducted (the Amazon Web Store uses pagination, displaying 50 apps per page). Because some developers/parent organizations reuse the same root package name across multiple apps, the page number helps distinguish between such apps.
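The naming conventions above can be sketched as a small filename parser. This is an illustration rather than tooling shipped with the dataset: the names AppFile and parse_filename are hypothetical, and the Roku ID is kept as an opaque string because its format is not specified here. The Fire TV package name is left with underscores, since converting them back to periods would be ambiguous for package names that themselves contain underscores.

```python
import re
from typing import NamedTuple, Optional


class AppFile(NamedTuple):
    platform: str         # "roku" or "firetv"
    app: str              # Roku <ID>, or Fire TV package name (underscored)
    page: Optional[int]   # Fire TV store page number; None for Roku


# Fire TV: <package name>__page<page number>-out.tshark.nomoads.json
FIRETV_RE = re.compile(
    r"^(?P<pkg>.+)__page(?P<page>\d+)-out\.tshark\.nomoads\.json$"
)
# Roku: app-<ID>.json
ROKU_RE = re.compile(r"^app-(?P<id>.+)\.json$")


def parse_filename(name: str) -> Optional[AppFile]:
    """Parse a dataset file name; return None if it matches neither pattern."""
    m = FIRETV_RE.match(name)  # checked first because it is more specific
    if m:
        return AppFile("firetv", m.group("pkg"), int(m.group("page")))
    m = ROKU_RE.match(name)
    if m:
        return AppFile("roku", m.group("id"), None)
    return None
```

Pass only the base file name (without any directory prefix), since both patterns are anchored at the start of the string.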

The JSON format is described in detail here.

Testbed Setup and Device IPs

Roku

Roku traffic was captured by setting up a Raspberry Pi as a wireless access point and router. The Pi’s wired network interface connects the Pi (and in turn the Roku) to the WAN. The Pi’s wireless network interface acts as a wireless access point to which the Roku is connected. The IP of the Roku device is 192.168.4.85, and the IP of the Raspberry Pi is 192.168.4.1. Traffic was recorded by running tcpdump on the Pi’s wireless network interface. The setup is depicted in the figure below.

Fire TV

Fire TV traffic was recorded using AntMonitor, which performs traffic interception by setting up a (local) layer-3 VPN. The IP of the virtual interface (TUN) established by AntMonitor is 192.168.0.2. As a result, even though app testing was performed in parallel on multiple Fire TV devices, the (virtual) IP of the Fire TV device is always 192.168.0.2 across all files (irrespective of what physical device was used for any particular app). The setup is depicted in the figure below.
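Because the device IPs are fixed across all files of each platform, the direction of a packet can be inferred from its source and destination addresses alone. The sketch below illustrates this; the constant and function names are hypothetical, and extracting the source and destination IPs from the JSON records is left out because the JSON format is documented separately.

```python
# Device IPs as stated in the testbed description above.
ROKU_IP = "192.168.4.85"
RASPBERRY_PI_IP = "192.168.4.1"
FIRETV_IP = "192.168.0.2"  # virtual (TUN) IP, identical across all Fire TV files


def direction(src_ip: str, dst_ip: str, device_ip: str) -> str:
    """Classify a packet relative to the smart TV device.

    Returns "outbound" if the device sent the packet, "inbound" if the
    device received it, and "other" if the device is neither endpoint.
    """
    if src_ip == device_ip:
        return "outbound"
    if dst_ip == device_ip:
        return "inbound"
    return "other"
```

For instance, direction("192.168.4.85", "8.8.8.8", ROKU_IP) classifies a packet sent by the Roku as "outbound"; the remote address here is only a placeholder.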

License

The UCI Rokustic and Firetastic data sharing agreement is inspired by a similar one from CAIDA. This is a basic policy to which you must agree before we give you access to any part of our dataset.

UCI ROKUSTIC AND FIRETASTIC DATASET ACCEPTABLE USE AGREEMENT for DATA COLLECTED BY UCI NETWORKING GROUP.

Usage of this dataset is subject to agreeing to the following terms.

LICENSE

UCI Networking Group authorization to access the data grants You a limited, non-exclusive, non-transferable, non-assignable, and terminable license to copy, modify, and use the data only for non-profit research and education. No license is granted for any other purpose and there are no implied licenses in this Agreement. Nothing in this License is intended to limit any rights You may have arising from fair use or due to other limitations on UCI Networking Group’s exclusive rights under copyright law or other applicable laws. UCI Networking Group has the authority and reserves the right, in its sole discretion, to discontinue further access and use to anyone who violates this AUA.

You will not disclose the dataset to anyone other than those employed by your institute who are collaborating with you using the dataset. Other entities must request access to the dataset separately using our form below.

You will make no attempts to reverse engineer, decrypt, or otherwise identify any personal information in the UCI Rokustic and Firetastic dataset. We have done our best to anonymize the dataset to protect our systems. However, if you find any remaining vulnerabilities or credentials in the dataset, you must responsibly disclose them to us.

If You create a publication (including web pages, papers published by a third party, teaching material, and publicly available presentations) using data from this dataset, You must cite the corresponding paper as follows:

@article{varmarken2020smarttv,
  title={{The TV is Smart and Full of Trackers: Measuring Smart TV Advertising and Tracking}},
  author={Varmarken, Janus and Le, Hieu and Shuba, Anastasia and Markopoulou, Athina and Shafiq, Zubair},
  journal={Proceedings on Privacy Enhancing Technologies},
  volume={2020},
  number={2},
  year={2020},
  publisher={De Gruyter Open}
}

We also encourage You to provide the UCI Networking Group with a link to your publication. We use this information in reports to our funding agencies.

DISCLAIMER OF WARRANTIES. UCI NETWORKING GROUP USES ITS BEST EFFORTS TO PROVIDE DATA IN ACCORDANCE WITH ETHICAL PRINCIPLES AND SCIENTIFIC INTEGRITY. HOWEVER, THE DATA PROVIDED HEREIN IS ON AN “AS IS” BASIS. NEITHER UCI ROKUSTIC AND FIRETASTIC DATASET, ITS RESEARCHERS, RESEARCH PARTNERS, LICENSORS, AND DATA PROVIDERS, NOR THE UNIVERSITY OF CALIFORNIA AND ITS TRUSTEES, OFFICERS, EMPLOYEES, AND AGENTS MAKE ANY WARRANTY, EITHER IMPLIED OR EXPRESS, OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, INCLUDING, BUT NOT LIMITED TO, THE ACCURACY, TIMELINESS, COMPLETENESS, RELIABILITY, OR AVAILABILITY OF UCI ROKUSTIC AND FIRETASTIC DATA, APPLICATIONS, OR SERVICES ACCESSIBLE THROUGH OR MADE AVAILABLE BY UCI NETWORKING GROUP.

LIMITATION OF LIABILITY. TO THE EXTENT ALLOWED BY LAW, IN NO EVENT SHALL UCI NETWORKING GROUP AND THE UNIVERSITY OF CALIFORNIA BE LIABLE TO YOU OR ANY THIRD PARTY FOR ANY INDIRECT, CONSEQUENTIAL, INCIDENTAL, SPECIAL OR PUNITIVE DAMAGES, ARISING FROM YOUR USE OF THE DATA.

If You have any questions about the data or about this Public Agreement, please email smarttv.uci@gmail.com.

Access the Data

To access the data, please fill out the form below. Note that by filling out the form, you agree to our Privacy Policy and the dataset license above. The dataset is hosted on Google Drive. Once you submit the form, you will be redirected to Google Drive from where you can download the dataset. Please do not share the Google Drive link with anyone else. Instead, please refer any other interested party to this access form. Keeping track of dataset accesses is important for us as it facilitates accurate reporting to our funding agencies.

Please use your university/business email address. Addresses from Gmail and similar public email providers will not be accepted.