NoMoAds Dataset

Data Format

This dataset is the one used in the paper “NoMoAds: Effective and Efficient Cross-App Mobile Ad-Blocking,” published in the Proc. of PETS 2018. We manually interacted with 50 popular Android apps on a test phone, using test accounts (no human subjects were involved). We captured packet traces and then extracted and saved fields from each packet (primarily from HTTP/S and IP headers), together with additional fields (such as a label indicating whether or not the packet contains an ad request, which app generated the packet, etc.).

The NoMoAds dataset provides this information in JSON format, building on and extending the JSON format used by ReCon. An example of a packet in our JSON format is shown below.

    "12f889b4-146f-49f2-ae58-226ee2abdb94": {
        "ad": 1, 
        "domain": "mopub.com", 
        "dst_ip": "192.44.68.3", 
        "dst_port": 80, 
        "headers": {
            "accept-encoding": "gzip", 
            "accept-language": "en", 
            "connection": "Keep-Alive", 
            "host": "ads.mopub.com", 
            "user-agent": "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 6 Build/MMB29X; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/44.0.2403.117 Mobile Safari/537.36"
        }, 
        "host": "ads.mopub.com", 
        "method": "GET", 
        "package_name": "com.BeresnevGames.Knife", 
        "package_responsible": "com.BeresnevGames.Knife", 
        "package_version": "1.5", 
        "pii_types": [
            "Advertiser ID"
        ], 
        "uri": "/m/ad?v=6&id=25dd9ac3ec824016b8b553acb9cbe35a&nv=4.15.0&dn=motorola%2CNexus%206%2Cshamu&bundle=com.BeresnevGames.Knife&q=m_gender%3Ao&z=-0700&o=p&w=1440&h=2560&sc_a=3.5&ct=2&av=1.5&udid=ifa%3Axxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx&dnt=0&mr=1&android_perms_ext_storage=0"
    }
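As a minimal sketch of how such a record might be read, the Python snippet below parses a trimmed copy of the example packet and splits its URI query string into parameters. It assumes that a dataset file is a single JSON object keyed by packet ID, which the example suggests but does not confirm; the helper names `iter_packets` and `query_params` are hypothetical and not part of the dataset.

```python
import json
from urllib.parse import urlsplit, parse_qs

# Trimmed copy of the example packet, wrapped in braces to form a complete
# JSON document. Assumption: dataset files are JSON objects keyed by packet ID.
SAMPLE = """
{
    "12f889b4-146f-49f2-ae58-226ee2abdb94": {
        "ad": 1,
        "domain": "mopub.com",
        "dst_ip": "192.44.68.3",
        "dst_port": 80,
        "headers": {"host": "ads.mopub.com"},
        "host": "ads.mopub.com",
        "method": "GET",
        "package_name": "com.BeresnevGames.Knife",
        "uri": "/m/ad?v=6&nv=4.15.0&bundle=com.BeresnevGames.Knife&dnt=0"
    }
}
"""


def iter_packets(text):
    """Yield (packet_id, packet) pairs from a JSON document keyed by packet ID."""
    for packet_id, packet in json.loads(text).items():
        yield packet_id, packet


def query_params(packet):
    """Return the packet's URI query string as a dict (first value per key)."""
    query = urlsplit(packet["uri"]).query
    return {key: values[0] for key, values in parse_qs(query).items()}


for packet_id, packet in iter_packets(SAMPLE):
    params = query_params(packet)
    print(packet_id, packet["host"], packet["ad"], params["bundle"])
```

For a real file, the same functions could be applied to the text read from disk; the file name and layout would need to be checked against the data you receive.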

In addition to the information extracted from HTTP/S and IP headers, the JSON for each packet contains the following extra information:

  • “ad” – this is the label indicating whether or not this packet contains an ad request. Please see the paper for details of how these labels are obtained.
  • “pii_types” – indicates which types of personally identifiable information (PII) were found in the packet. Since we used a test phone and test accounts, no data from actual users have been collected (i.e., no human subjects were involved). In addition, we also redacted any PII value to the best of our ability, maintaining only the PII type. For instance, in the above example the Advertiser ID value was replaced with “xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx,” and the only information retained is that this packet contained some PII of type “Advertiser ID.”
  • “package_responsible” – the package name of the app responsible for fetching the ad.
  • “package_name” – the package name of the app responsible for the HTTP connection that fetched the ad. Note that this is not always the same as the “package_responsible.” Sometimes apps use Google apps to fetch ads for themselves.
  • “package_version” – the version number of the app provided by the “package_name” field.
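To illustrate how the extra fields above could be used together, here is a small Python sketch that tallies ad labels and PII types and counts packets whose “package_name” differs from “package_responsible.” The second sample packet, its package names, and the function name `summarize` are hypothetical. The sketch also assumes that non-ad packets carry "ad": 0 and that "pii_types" may be empty or absent; neither is confirmed by the description above.

```python
from collections import Counter

# The first packet mirrors the example above; the second is hypothetical and
# exists only to exercise the non-ad and differing-package cases.
SAMPLE_PACKETS = [
    {
        "ad": 1,
        "package_name": "com.BeresnevGames.Knife",
        "package_responsible": "com.BeresnevGames.Knife",
        "pii_types": ["Advertiser ID"],
    },
    {
        "ad": 0,  # assumption: 0 marks a packet without an ad request
        "package_name": "com.example.helper",       # hypothetical
        "package_responsible": "com.example.game",  # hypothetical
        "pii_types": [],
    },
]


def summarize(packets):
    """Return (ad label counts, PII type counts, number of packets whose
    package_name differs from package_responsible)."""
    ad_labels = Counter()
    pii_types = Counter()
    differing = 0
    for packet in packets:
        ad_labels[packet["ad"]] += 1
        # Tolerate a missing pii_types field (assumption, see lead-in).
        pii_types.update(packet.get("pii_types", []))
        if packet["package_name"] != packet["package_responsible"]:
            differing += 1
    return ad_labels, pii_types, differing


ad_labels, pii_types, differing = summarize(SAMPLE_PACKETS)
print("ad labels:", dict(ad_labels))
print("PII types:", dict(pii_types))
print("packets with differing package fields:", differing)
```

The check only tests the two package fields for inequality, so it makes no claim about which field names the Google app in a given packet.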

License

The NoMoAds data sharing agreement is inspired by a similar one from CAIDA. This is a basic policy to which you must agree before we give you access to any part of our dataset.

NOMOADS ACCEPTABLE USE AGREEMENT for DATA COLLECTED BY NOMOADS

Usage of this dataset is subject to agreeing to the following terms.

LICENSE

NoMoAds authorization to access the data grants You a limited, non-exclusive, non-transferable, non-assignable, and terminable license to copy, modify, and use the data in accordance with this Public Agreement. No license is granted for any other purpose and there are no implied licenses in this Agreement. Nothing in this License is intended to limit any rights You may have arising from fair use or due to other limitations on NoMoAds’s exclusive rights under copyright law or other applicable laws. NoMoAds has the authority and reserves the right, in its sole discretion, to discontinue further access and use to anyone who violates this AUA. You will make no attempts to reverse engineer, decrypt, or otherwise identify any personal information in the NoMoAds dataset.

If You create a publication (including web pages, papers published by a third party, and publicly available presentations) using data from this dataset, You should cite the corresponding paper as follows:

@article{shuba2018nomoads,
  title={{NoMoAds: Effective and Efficient Cross-App Mobile Ad-Blocking}},
  author={Shuba, Anastasia and Markopoulou, Athina and Shafiq, Zubair},
  journal={Proceedings on Privacy Enhancing Technologies},
  volume={2018},
  number={4},
  year={2018},
  publisher={De Gruyter Open}
}

We also encourage You to provide the NoMoAds Team with a link to your publication. We use this information in reports to our funding agencies.

DISCLAIMER OF WARRANTIES. NOMOADS USES ITS BEST EFFORTS TO PROVIDE DATA IN ACCORDANCE WITH ETHICAL PRINCIPLES AND SCIENTIFIC INTEGRITY. HOWEVER, THE DATA PROVIDED HEREIN IS ON AN “AS IS” BASIS. NEITHER NOMOADS, ITS RESEARCHERS, RESEARCH PARTNERS, LICENSORS, AND DATA PROVIDERS, NOR THE UNIVERSITY OF CALIFORNIA AND ITS TRUSTEES, OFFICERS, EMPLOYEES, AND AGENTS MAKE ANY WARRANTY, EITHER IMPLIED OR EXPRESS, OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, INCLUDING, BUT NOT LIMITED TO, THE ACCURACY, TIMELINESS, COMPLETENESS, RELIABILITY, OR AVAILABILITY OF NOMOADS DATA, APPLICATIONS, OR SERVICES ACCESSIBLE THROUGH OR MADE AVAILABLE BY NOMOADS.

LIMITATION OF LIABILITY. TO THE EXTENT ALLOWED BY LAW, IN NO EVENT SHALL NOMOADS AND THE UNIVERSITY OF CALIFORNIA BE LIABLE TO YOU OR ANY THIRD PARTY FOR ANY INDIRECT, CONSEQUENTIAL, INCIDENTAL, SPECIAL OR PUNITIVE DAMAGES, ARISING FROM YOUR USE OF THE DATA.

If You have any questions about the data or about this Public Agreement, please email nomoads.uci@gmail.com.

Access the Data

To access the data, please fill out the form below and we will email you the data. Note that by filling out the form, you agree to our Privacy Policy.

Please use your university or business email address. Gmail and other free email providers will not be accepted.