OVRseen Datasets

The following datasets accompany the paper “OVRseen: Auditing Network Traffic and Privacy Policies in Oculus VR,” published in the Proc. of USENIX Security Symposium 2022. These datasets were collected in February/March 2021; thus, they may not reflect the current state of the ecosystem (e.g., there might be differences in app popularity/reviews, the number of apps that have a privacy policy, etc.). All of the datasets are bundled in one zipped file.

List of Apps

The first dataset consists of spreadsheets containing the raw lists of apps from the Oculus and SideQuest app stores (we extracted them using the crawler scripts) and the curated lists of top apps that we used during our collection period (these top apps were selected based on their reviews/popularity). These lists of top apps correspond to Section 2 “App corpus” in our paper.

Network Traffic Dataset

The second dataset is the network traffic dataset that corresponds to Section 3.2.1 in our paper. We manually interacted with the 140 most popular VR apps on a test Oculus Quest 2 device using test accounts (no human subjects involved). We captured packet traces with additional fields that contain extra information, such as:

  • “package_name” – the package name of the app responsible for the HTTP connection.
  • “package_version” – the version number of the app provided by the “package_name” field.
  • “pii_types” – indicates which types of personally identifiable information (PII) were found in the packet. Since we used a test device and test accounts, no data from actual users were collected (i.e., no human subjects were involved).
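
As a rough illustration of how these fields might be used, the Python sketch below groups PII types by app and version. It assumes each packet has already been loaded as a dictionary keyed by the three field names above; the on-disk format of the traces, the helper name pii_types_by_app, and the sample values are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict


def pii_types_by_app(records):
    """Map (package_name, package_version) to the sorted PII types seen."""
    seen = defaultdict(set)
    for rec in records:
        key = (rec.get("package_name"), rec.get("package_version"))
        pii = rec.get("pii_types") or []
        # Assumption: pii_types may be a list or a comma-separated string.
        if isinstance(pii, str):
            pii = [p.strip() for p in pii.split(",") if p.strip()]
        seen[key].update(pii)
    return {key: sorted(types) for key, types in seen.items()}


# Hypothetical sample records; real values come from the packet traces.
sample = [
    {"package_name": "com.example.vrapp", "package_version": "1.0",
     "pii_types": ["Device ID"]},
    {"package_name": "com.example.vrapp", "package_version": "1.0",
     "pii_types": "Email, Device ID"},
    {"package_name": "com.example.other", "package_version": "2.3",
     "pii_types": []},
]

summary = pii_types_by_app(sample)
```

Keying on both the package name and the version keeps separate any traces captured from different releases of the same app.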

Privacy Policies

The third dataset is a collection of privacy policy documents that were collected from 102 apps that have a privacy policy (during our collection period). These documents are in the form of HTML files. They correspond to Section 4 “Collecting privacy policies” in our paper.

Manual Validations

The fourth dataset consists of spreadsheets that contain the manual validation statistics for both PoliCheck and Polisis. These manual validation results correspond to Section 4.1 “Validation of PoliCheck results (network-to-policy consistency)” and Section 4.2 “Validation of Polisis results (purpose extraction)” in our paper.

Intermediate Outputs

The fifth dataset contains OVRseen’s intermediate outputs from running OVRseen’s entire pipeline on our network traffic and privacy policy datasets. We provide these intermediate outputs for convenience, in case one wants to quickly inspect our results without re-running OVRseen.

  • Post-processing: This stage’s intermediate outputs include a CSV file (i.e., all-merged-with-esld-engine-privacy-developer-party.csv) containing TCP flows extracted from the network traffic dataset (i.e., PCAP files). Among others, each TCP flow contains app ID (i.e., APK file name), PII types, and endpoints.
  • Network-to-policy consistency: This stage’s intermediate outputs include: (1) text files generated from the pre-processing of privacy policy HTML files (i.e., plaintext_policies.zip); and (2) a CSV file that contains data flows (i.e., policheck_flows.csv); both serve as inputs to PoliCheck. This stage also produces CSV files (i.e., policheck_result and policheck_result_with_referencing_oculus_and_unity_privacy_policies folders) that contain data flows classified into one of PoliCheck’s five disclosure types (both with and without referencing Oculus and Unity privacy policies by default).
  • Purpose extraction: This stage’s intermediate outputs include: (1) JSON files (i.e., polisis_output.zip) containing annotated text segments from Polisis (www.pribot.org), and (2) a CSV file that contains the mapping from PoliCheck data flows into the annotated text segments from Polisis (i.e., policheck_results_w_purposes_expanded.csv).
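
For readers who want to inspect the CSV outputs above without re-running the pipeline, a minimal Python sketch along the following lines may help. The column names (app_id, pii_type, endpoint), the helper name count_flows, and the sample rows are hypothetical stand-ins; check the header row of the actual CSV files before adapting it.

```python
import csv
import io
from collections import Counter


def count_flows(csv_file, columns):
    """Count CSV rows grouped by the values in the given columns."""
    reader = csv.DictReader(csv_file)
    counts = Counter()
    for row in reader:
        counts[tuple(row[c] for c in columns)] += 1
    return counts


# Hypothetical stand-in for a flows CSV; real column names may differ.
sample_csv = io.StringIO(
    "app_id,pii_type,endpoint\n"
    "com.example.vrapp,Device ID,api.example.com\n"
    "com.example.vrapp,Device ID,cdn.example.net\n"
    "com.example.other,Email,api.example.com\n"
)

counts = count_flows(sample_csv, ["app_id", "pii_type"])
```

To run it against a real file, pass an open file handle, e.g. open(path, newline="") as the csv module recommends, in place of the in-memory sample.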

License

The OVRseen data sharing agreement is inspired by a similar one from CAIDA. This is a basic policy to which you must agree before we give you access to any part of our dataset.

OVRSEEN ACCEPTABLE USE AGREEMENT for DATA COLLECTED BY OVRSEEN

Usage of this dataset is subject to agreeing to the following terms.

LICENSE

OVRseen authorization to access the data grants You a limited, non-exclusive, non-transferable, non-assignable, and terminable license to copy, modify, and use the data in accordance with this Public Agreement. No license is granted for any other purpose and there are no implied licenses in this Agreement. Nothing in this License is intended to limit any rights You may have arising from fair use or due to other limitations on OVRseen exclusive rights under copyright law or other applicable laws. OVRseen has the authority and reserves the right, in its sole discretion, to discontinue further access and use to anyone who violates this AUA. You will make no attempts to reverse engineer, decrypt, or otherwise identify any personal information in the OVRseen dataset.

If You create a publication (including web pages, papers published by a third party, and publicly available presentations) using data from this dataset, You should cite the corresponding paper as follows:

@inproceedings{trimananda2022ovrseen,
   title     = {{OVRseen: Auditing Network Traffic and Privacy Policies in Oculus VR}},
   author    = {Trimananda, Rahmadi and Le, Hieu and Cui, Hao and Tran Ho, Janice and 
                Shuba, Anastasia and Markopoulou, Athina},
   booktitle = {31st {USENIX} Security Symposium ({USENIX} Security 22)},
   year      = {2022}
 }

We also encourage You to provide the OVRseen Team with a link to your publication. We use this information in reports to our funding agencies.

DISCLAIMER OF WARRANTIES. OVRSEEN USES ITS BEST EFFORTS TO PROVIDE DATA IN ACCORDANCE WITH ETHICAL PRINCIPLES AND SCIENTIFIC INTEGRITY. HOWEVER, THE DATA PROVIDED HEREIN IS ON AN “AS IS” BASIS. NEITHER OVRSEEN, ITS RESEARCHERS, RESEARCH PARTNERS, LICENSORS, AND DATA PROVIDERS, NOR THE UNIVERSITY OF CALIFORNIA AND ITS TRUSTEES, OFFICERS, EMPLOYEES, AND AGENTS MAKE ANY WARRANTY, EITHER IMPLIED OR EXPRESS, OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, INCLUDING, BUT NOT LIMITED TO, THE ACCURACY, TIMELINESS, COMPLETENESS, RELIABILITY, OR AVAILABILITY OF OVRSEEN DATA, APPLICATIONS, OR SERVICES ACCESSIBLE THROUGH OR MADE AVAILABLE BY OVRSEEN.

LIMITATION OF LIABILITY. TO THE EXTENT ALLOWED BY LAW, IN NO EVENT SHALL OVRSEEN AND THE UNIVERSITY OF CALIFORNIA BE LIABLE TO YOU OR ANY THIRD PARTY FOR ANY INDIRECT, CONSEQUENTIAL, INCIDENTAL, SPECIAL OR PUNITIVE DAMAGES, ARISING FROM YOUR USE OF THE DATA.

If You have any questions about the data or about this Public Agreement, please email properdata@uci.edu.

Access the Data

To access the data, please fill out the form below and we will email you the data. Note that by filling out the form, you agree to our Privacy Policy.

OVRseen Dataset
Please use your (Gmail-based) university/business email, or a Gmail account.
