NoMoATS Dataset

Data Format

This dataset is the one used in the paper “NoMoATS: Towards Automatic Detection of Mobile Tracking,” in the Proc. of PETS 2020. We used the Droidbot automation tool to interact automatically with 307 popular Android apps on a test phone with test accounts (no human subjects were involved). We captured packet traces, then extracted and saved fields from each packet (primarily from HTTP/S and IP headers) together with additional fields, such as a label indicating whether the packet was sent by an advertising or tracking library and the name of the app that generated the packet.

The NoMoATS dataset provides the aforementioned information in JSON format, building on and extending the JSON format used by ReCon. An example of a packet in our JSON format is shown below (note that for easier viewing certain parts of the packet have been redacted, as indicated by ellipses).

    "207365b5-3c21-4b44-8600-115f4ec1aa07": {
        "ats": 1, 
        "ats_pkg": "com.google.android.gms.internal.ads", 
        "dst_ip": "18.233.123.55", 
        "dst_port": 443, 
        "headers": {
            "accept-encoding": "gzip", 
            "connection": "Keep-Alive", 
            "host": "impression-east.liftoff.io", 
            "user-agent": "Mozilla/5.0 (Linux; Android 7.1.1; Nexus 6 Build/N6F27M; wv)..."
        }, 
        "host": "impression-east.liftoff.io", 
        "method": "GET", 
        "package_name": "codematics.universal.tv.remote.control", 
        "pii_types": [
            "Advertising ID"
        ], 
        "trace": "...com.android.org.conscrypt.NativeCrypto.SSL_write(Native Method)...\ncom.google.android.gms.internal.ads...", 
        "ts": "1550807053.447762000", 
        "type": "libssl.so:SSL_write", 
        "uri": "/doubleclick/win_notice?ad_group_id=63161&channel_id=16&creative_id=74694&device_id_sha1=REDACTED_ADVERTISING_ID..."
    }

In addition to the information extracted from HTTP/S and IP headers, the JSON for each packet contains the following extra information:

  • “ats” – the label indicating whether or not this packet was sent by an advertising or tracking (A&T) library. Please see the paper for details of how these labels are obtained.
  • “ats_pkg” – the package name of the A&T library responsible for sending the packet.
  • “package_name” – the package name of the app responsible for sending the packet.
  • “pii_types” – the types of personally identifiable information (PII) found in the packet. Since we used a test phone and test accounts, no data from actual users was collected (i.e., no human subjects were involved). In addition, we redacted every PII value to the best of our ability, keeping only the PII type. For instance, in the above example the Advertising ID value was replaced with “REDACTED_ADVERTISING_ID,” and the only information retained is that this packet contained PII of type “Advertising ID.”
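As a rough illustration of how these extra fields might be used, the Python sketch below counts A&T and non-A&T packets and tallies the PII types seen per A&T library. It rests on several assumptions not stated above: that a dataset file is a single JSON object mapping packet IDs to packet objects (as the example entry suggests), that an “ats” value of 1 marks an A&T packet and any other value does not, and that “ats_pkg” may be absent. The names load_packets, summarize_packets, and SAMPLE_JSON, as well as the second sample packet, are hypothetical and exist only for this example.

```python
import json
from collections import Counter


def load_packets(path):
    """Load a file assumed to hold one JSON object: packet ID -> packet."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize_packets(packets):
    """Return (label counts, PII-type counts per A&T library)."""
    label_counts = Counter()
    pii_by_library = {}
    for packet in packets.values():
        # Assumption: "ats" == 1 means the packet came from an A&T library.
        is_ats = packet.get("ats") == 1
        label_counts["ats" if is_ats else "non_ats"] += 1
        if is_ats:
            # Assumption: "ats_pkg" may be missing, so fall back to "unknown".
            library = packet.get("ats_pkg", "unknown")
            counts = pii_by_library.setdefault(library, Counter())
            counts.update(packet.get("pii_types", []))
    return label_counts, pii_by_library


# Hypothetical two-packet sample; the first mirrors the example entry above,
# the second is invented to show a non-A&T packet.
SAMPLE_JSON = """
{
  "packet-1": {
    "ats": 1,
    "ats_pkg": "com.google.android.gms.internal.ads",
    "package_name": "codematics.universal.tv.remote.control",
    "pii_types": ["Advertising ID"]
  },
  "packet-2": {
    "ats": 0,
    "package_name": "com.example.app",
    "pii_types": []
  }
}
"""

labels, pii = summarize_packets(json.loads(SAMPLE_JSON))
print(dict(labels))
for library, counts in pii.items():
    print(library, dict(counts))
```

To run the same summary on a downloaded file, pass the result of load_packets to summarize_packets, using whatever file name the download provides.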

License

The NoMoAds data sharing agreement is inspired by a similar one from CAIDA. This is a basic policy to which you must agree before we give you access to any part of our dataset.

NOMOADS ACCEPTABLE USE AGREEMENT for DATA COLLECTED BY NOMOADS

Usage of this dataset is subject to agreeing to the following terms.

LICENSE

NoMoAds authorization to access the data grants You a limited, non-exclusive, non-transferable, non-assignable, and terminable license to copy, modify, and use the data in accordance with this Public Agreement. No license is granted for any other purpose and there are no implied licenses in this Agreement. Nothing in this License is intended to limit any rights You may have arising from fair use or due to other limitations on NoMoAds’s exclusive rights under copyright law or other applicable laws. NoMoAds has the authority and reserves the right, in its sole discretion, to discontinue further access and use to anyone who violates this AUA. You will make no attempts to reverse engineer, decrypt, or otherwise identify any personal information in the NoMoATS dataset.

If You create a publication (including web pages, papers published by a third party, and publicly available presentations) using data from this dataset, You should cite the corresponding paper as follows:

@article{shuba2020nomoats,
  title={{NoMoATS: Towards Automatic Detection of Mobile Tracking}},
  author={Shuba, Anastasia and Markopoulou, Athina},
  journal={Proceedings on Privacy Enhancing Technologies},
  volume={2020},
  number={2},
  year={2020},
  publisher={De Gruyter Open}
}

We also encourage You to provide the NoMoAds Team with a link to your publication. We use this information in reports to our funding agencies.

DISCLAIMER OF WARRANTIES. NOMOADS USES ITS BEST EFFORTS TO PROVIDE DATA IN ACCORDANCE WITH ETHICAL PRINCIPLES AND SCIENTIFIC INTEGRITY. HOWEVER, THE DATA PROVIDED HEREIN IS ON AN “AS IS” BASIS. NEITHER NOMOADS, ITS RESEARCHERS, RESEARCH PARTNERS, LICENSORS, AND DATA PROVIDERS, NOR THE UNIVERSITY OF CALIFORNIA AND ITS TRUSTEES, OFFICERS, EMPLOYEES, AND AGENTS MAKE ANY WARRANTY, EITHER IMPLIED OR EXPRESS, OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, INCLUDING, BUT NOT LIMITED TO, THE ACCURACY, TIMELINESS, COMPLETENESS, RELIABILITY, OR AVAILABILITY OF NOMOADS DATA, APPLICATIONS, OR SERVICES ACCESSIBLE THROUGH OR MADE AVAILABLE BY NOMOADS.

LIMITATION OF LIABILITY. TO THE EXTENT ALLOWED BY LAW, IN NO EVENT SHALL NOMOADS AND THE UNIVERSITY OF CALIFORNIA BE LIABLE TO YOU OR ANY THIRD PARTY FOR ANY INDIRECT, CONSEQUENTIAL, INCIDENTAL, SPECIAL OR PUNITIVE DAMAGES, ARISING FROM YOUR USE OF THE DATA.

If You have any questions about the data or about this Public Agreement, please email nomoads.uci@gmail.com.

Access the Data

To access the data, please fill out the form below. Note that by filling out the form, you agree to our Privacy Policy. Once you submit the form, you will be redirected to Google Drive, where you can download the dataset. Please do not share the Google Drive link with anyone else; instead, refer any other interested party to this access form. Keeping track of dataset accesses is important to us because it lets us report accurately to our funding agencies.