PingPong: Dataset

Dataset Description

The PingPong dataset is a set of network traces resulting from systematic, automated tests of 22 different smart home devices. The network traces (.pcap files) are accompanied by timestamp files that indicate when a specific type of functionality was triggered on the smart home device. This page details the dataset’s format and provides a form to request access to the dataset (at the bottom of the page). By requesting access to the dataset, you agree to the terms of the dataset license below.

Overview

The PingPong dataset is distributed as a single .tar.gz archive. The archive contains a directory tree that separates the .pcap files (and associated timestamp files) by evaluation experiment (please see our paper and technical report). At the root of the directory tree are two directories:

  • evaluation-datasets: contains the network traces that we produced ourselves using our smart home testbed.
  • negative-datasets: contains placeholder directories for the external datasets that we used for our “Negative Control Experiments” (i.e., experiments that validate the uniqueness of packet-level signatures).

The directory tree also contains empty output directories (such as result, analyses, and signatures). The purpose of these is to simplify reproducing our evaluation results: when the PingPong software is executed on the network traces in some directory, it will output its results to the corresponding output directory (i.e., a sibling directory of the directory that contains the network traces).
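As a minimal sketch of how one might inspect the archive and locate an output directory, the Python snippet below lists the top-level entries of a .tar.gz file and computes the sibling of a trace directory. The function names are hypothetical, the archive file name is not specified on this page, and the archive may wrap the tree in an additional root directory, so the listing may differ from the two directories named above.

```python
import tarfile
from pathlib import Path, PurePosixPath


def top_level_entries(archive_path):
    """Return the sorted first path components of all members of a .tar.gz archive."""
    names = set()
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            parts = PurePosixPath(member.name).parts
            if parts:  # skip a bare "." entry, which has no components
                names.add(parts[0])
    return sorted(names)


def sibling_output_dir(trace_dir, output_name):
    """Return the directory named output_name next to trace_dir (same parent)."""
    return Path(trace_dir).parent / output_name
```

For example, for a trace directory ending in wlan1, sibling_output_dir(trace_dir, "signatures") points at the signatures directory that shares its parent.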

The “evaluation-datasets” Directory

This directory contains the network traces that we produced ourselves using our smart home testbed:

  • evaluation-datasets/local-phone: contains network traces resulting from controlling the smart home devices using the respective device vendor’s official Android application, on a smartphone that is connected to the local wireless network.
  • evaluation-datasets/remote-phone: contains network traces resulting from controlling (a subset of) the smart home devices using the respective device vendor’s official Android application, on a smartphone that is not connected to the local wireless network.
  • evaluation-datasets/ifttt: contains the network traces for our home automation experiments (network traces resulting from controlling the IFTTT-compatible devices through IFTTT).
  • evaluation-datasets/same-vendor: contains network traces resulting from controlling 4 smart home devices from the same vendor (TP-Link) using the vendor’s official Android application, on a smartphone that is connected to the local wireless network.
  • evaluation-datasets/public-dataset: contains network traces from more recent (December 2019) “local-phone” experiments with the TP-Link plug, the WeMo Insight plug, and a new device not present in the original “local-phone” set: the Blink camera. These experiments were performed as part of our evaluation on a public dataset, the Mon(IoT)r dataset. We repeated the TP-Link and WeMo Insight experiments to enable a comparison of how signatures change over time and space (comparing the signatures extracted from experiments in our testbed with signatures extracted from the Mon(IoT)r dataset). Please see Section V.F of our paper for additional details.

All of the above directories (except same-vendor) contain a standalone directory and a smarthome directory:

  • standalone: contains network traces where only the smart home device in question is present on the network. These network traces make up the training set(s) for the smart home device.
  • smarthome: contains network traces where the smart home device is present alongside other (idle) smart home devices and (active) general purpose computing devices. These network traces make up the test/validation set(s) for the smart home device.
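The training/test split described above can be sketched in Python as follows. This is an illustrative helper rather than part of the PingPong software: the function names are hypothetical, and it assumes the trace files carry a .pcap extension.

```python
from pathlib import Path

# Directory name -> role of the traces it contains, as described above.
SPLIT_BY_DIR = {"standalone": "training", "smarthome": "test"}


def split_for(path):
    """Return 'training', 'test', or None, based on the path's components."""
    for part in Path(path).parts:
        if part in SPLIT_BY_DIR:
            return SPLIT_BY_DIR[part]
    return None


def collect_pcaps(experiment_dir):
    """Group all .pcap files below experiment_dir into training and test lists."""
    root = Path(experiment_dir)
    groups = {"training": [], "test": []}
    for pcap in sorted(root.rglob("*.pcap")):
        # Classify on the path relative to the root so that parent
        # directories outside the dataset cannot influence the result.
        split = split_for(pcap.relative_to(root))
        if split is not None:
            groups[split].append(pcap)
    return groups
```

Calling collect_pcaps on, say, the extracted evaluation-datasets/local-phone directory would return the standalone traces under "training" and the smarthome traces under "test".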

The standalone and smarthome directories contain a directory for each smart home device. The device directory in turn contains a directory for each tested functionality, e.g., the Sengled light bulb has one directory for “ON/OFF” and another directory for “light intensity” [note: this functionality-demarcating directory is omitted for devices where only a single functionality was tested]. Finally, the functionality directory (or the device directory, if the former is omitted) contains (a subset of) the following directories:

  • wlan1 and/or eth0: contains network traces captured on the router’s wireless interface and/or its WAN interface, respectively.
  • timestamps: contains a text file indicating when the functionality was triggered. The timestamp format is “MM/dd/yyyy hh:mm:ss (A|P)M” (e.g., “02/17/2020 12:09:42 AM”). There is one timestamp/trigger per line. For binary functionality (e.g., ON and OFF), the trigger timestamps alternate between each value, starting with the “positive” value (e.g., for ON and OFF, the timestamps alternate between ON and OFF triggers, starting with ON).
  • signatures: empty directory where the PingPong software will output the signatures it extracts from the training sets.
  • analyses: empty directory where the PingPong software will output log files that give insight into the training process.
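The timestamp files described above can be read with a few lines of Python. The sketch below is not taken from the PingPong software: the function name and the default labels are hypothetical, the pattern “MM/dd/yyyy hh:mm:ss (A|P)M” is assumed to map to the strptime format "%m/%d/%Y %I:%M:%S %p" (12-hour clock), and the parsed values are naive datetimes because this page does not state a time zone. Note that %p depends on the locale, so an English or C locale is assumed.

```python
from datetime import datetime

# Assumed strptime equivalent of "MM/dd/yyyy hh:mm:ss (A|P)M".
TIMESTAMP_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def parse_triggers(lines, values=("ON", "OFF")):
    """Parse one timestamp per line and label triggers by cycling through values.

    The first trigger gets values[0] (the "positive" value), the next one
    values[1], and so on. Blank lines are skipped.
    """
    triggers = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        when = datetime.strptime(line, TIMESTAMP_FORMAT)
        label = values[len(triggers) % len(values)]
        triggers.append((when, label))
    return triggers
```

For a functionality with a single value, pass a one-element tuple as values so that every trigger receives the same label.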

The “negative-datasets” Directory

This directory contains placeholder directories for the external datasets that we used for our “Negative Control Experiments” (i.e., experiments that validate the uniqueness of packet-level signatures). As we do not have licenses to redistribute these datasets, we ask that you obtain them directly from the sources. The placeholder directories contain READMEs describing how to obtain these external datasets.

License

The PingPong data sharing agreement is inspired by a similar one from CAIDA. This is a basic policy to which you must agree before we give you access to any part of our dataset.

PINGPONG DATASET ACCEPTABLE USE AGREEMENT for DATA COLLECTED BY UCI PROGRAMMING LANGUAGES RESEARCH GROUP AND UCI NETWORKING GROUP, JOINTLY “UCI PINGPONG TEAM”.

Usage of this dataset is subject to agreeing to the following terms.

LICENSE

UCI PingPong Team authorization to access the data grants You a limited, non-exclusive, non-transferable, non-assignable, and terminable license to copy, modify, and use the data only for non-profit research and education. No license is granted for any other purpose and there are no implied licenses in this Agreement. Nothing in this License is intended to limit any rights You may have arising from fair use or due to other limitations on UCI PingPong Team’s exclusive rights under copyright law or other applicable laws. UCI PingPong Team has the authority and reserves the right, in its sole discretion, to discontinue further access and use to anyone who violates this AUA.

You will not disclose the dataset to any person other than those employed by your institute who are collaborating with you using the dataset. Other entities must request access to the dataset separately using our form below.

You will make no attempts to reverse engineer, decrypt, or otherwise identify any personal information in the PingPong dataset. We have done our best to anonymize the dataset to protect our systems. However, if you find any remaining vulnerabilities or credentials in the dataset, you must responsibly disclose them to us.

If You create a publication (including web pages, papers published by a third party, teaching material, and publicly available presentations) using data from this dataset, You must cite the corresponding paper as follows:

@article{trimananda2020packetlevelsignatures,
    title={{Packet-Level Signatures for Smart Home Devices}},
    author={Trimananda, Rahmadi and Varmarken, Janus and Markopoulou, Athina and Demsky, Brian},
    journal={Proceedings of the 2020 Network and Distributed System Security (NDSS) Symposium},
    year={2020},
    month={February}
}

We also encourage You to provide the UCI PingPong Team with a link to your publication. We use this information in reports to our funding agencies.

DISCLAIMER OF WARRANTIES. UCI PINGPONG TEAM USES ITS BEST EFFORTS TO PROVIDE DATA IN ACCORDANCE WITH ETHICAL PRINCIPLES AND SCIENTIFIC INTEGRITY. HOWEVER, THE DATA PROVIDED HEREIN IS ON AN “AS IS” BASIS. NEITHER PINGPONG DATASET, ITS RESEARCHERS, RESEARCH PARTNERS, LICENSORS, AND DATA PROVIDERS, NOR THE UNIVERSITY OF CALIFORNIA AND ITS TRUSTEES, OFFICERS, EMPLOYEES, AND AGENTS MAKE ANY WARRANTY, EITHER IMPLIED OR EXPRESS, OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, INCLUDING, BUT NOT LIMITED TO, THE ACCURACY, TIMELINESS, COMPLETENESS, RELIABILITY, OR AVAILABILITY OF PINGPONG DATA, APPLICATIONS, OR SERVICES ACCESSIBLE THROUGH OR MADE AVAILABLE BY UCI PINGPONG TEAM.

LIMITATION OF LIABILITY. TO THE EXTENT ALLOWED BY LAW, IN NO EVENT SHALL UCI PINGPONG TEAM AND THE UNIVERSITY OF CALIFORNIA BE LIABLE TO YOU OR ANY THIRD PARTY FOR ANY INDIRECT, CONSEQUENTIAL, INCIDENTAL, SPECIAL OR PUNITIVE DAMAGES, ARISING FROM YOUR USE OF THE DATA.

If You have any questions about the data or about this Public Agreement, please send an email to ics-pingpong@uci.edu or directly to the main author.

Access the Data

To access the data, please fill out the form below and we will email you a link to download the data. Note that by filling out the form, you agree to our Privacy Policy.

Please use your (Gmail-based) university/business email, or a Gmail account.
