UCI Rokustic and Firetastic Dataset: JSON Subset

The UCI Rokustic and Firetastic Dataset is made available in JSON format. The JSON files contain a subset of the packets in the complete network traces (pcap files):

  • Only includes outbound TCP traffic.
  • Only includes packets for which the domain name of the destination can be identified (HTTP requests, or TLS Client Hello messages with the SNI extension).

We adopt and augment the JSON format used by NoMoAds. Each packet is represented as a single JSON object, nested within a root JSON object. The packet JSON object contains basic network and transport layer information, the destination’s domain name, HTTP header data (if the packet is an HTTP request), and the additional fields described in detail below.

An example of a packet in our JSON format is shown below for each platform.

Roku

    "085a9711-e31c-4e66-9803-9e64672772a4": {
        "dst_ip": "172.217.11.174", 
        "dst_port": 80, 
        "firebog_advertising_tracking_abp": 1, 
        "headers": {
            "accept": "*/*", 
            "content-length": "0", 
            "content-type": "application/x-www-form-urlencoded", 
            "host": "www.google-analytics.com", 
            "user-agent": "Roku/DVP-9.0 (REDACTED_OS_VERSION)"
        }, 
        "host": "www.google-analytics.com", 
        "method": "POST", 
        "moaab_abp": 1, 
        "moz_disconnectme_abp": 1, 
        "piholeblocklist_default_smarttv_abp": 1, 
        "pii_types": "['Serial Number', 'Device OS Version']", 
        "protocol": "http", 
        "src_ip": "192.168.4.85", 
        "stopad_abp": 1, 
        "tcp.stream": 63, 
        "ts": "1555441644.779616000", 
        "uri": "/collect?v=1&sr=1280x720&ul=en_US&je=0&fl=-&dt=text_twist&cid=REDACTED_SERIALNUMBER&z=1313334912&tid=UA-43336621-16&t=event&ec=game&ea=menu"
    }

Fire TV

    "0147bb18-bdd4-47f4-996b-68f475705289": {
        "ad": "0", 
        "dst_ip": "205.180.87.146", 
        "dst_port": 80, 
        "firebog_advertising_tracking_abp": 1, 
        "headers": {
            "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8", 
            "accept-encoding": "gzip, deflate", 
            "accept-language": "en-US", 
            "connection": "keep-alive", 
            "host": "adsx.greystripe.com", 
            "upgrade-insecure-requests": "1", 
            "user-agent": "Mozilla/5.0 (Linux; Android 5.1.1; AFTT Build/LVY48F; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/59.0.3071.125 Mobile Safari/537.36", 
            "x-requested-with": "com.odesys.solitaire.ads"
        }, 
        "host": "adsx.greystripe.com", 
        "is_foreground": "False", 
        "method": "GET", 
        "moaab_abp": 1, 
        "moz_disconnectme_abp": 1, 
        "package_name": "com.odesys.solitaire.ads", 
        "package_version": "4.10.2", 
        "piholeblocklist_default_smarttv_abp": 1, 
        "pii_types": "['Device ID']", 
        "platform": "android", 
        "protocol": "http", 
        "rule": "no match", 
        "src_ip": "192.168.0.2", 
        "stopad_abp": 1, 
        "tcp.stream": 43, 
        "ts": "1556814158.774000000", 
        "uri": "/openx/www/delivery/ia.php?z=1556814158546&mhid=REDACTED_DEVICEID&s=fs&source=port%20tablet%20wifi%20Android-5.1.1%20asdk-2.4.2%20land&hid=REDACTED_DEVICEID&guid=713363fe-5906-436f-a6a2-11f72774049f&screen_size=1920x1080&lang=en&res=2.0"
    }
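
Because each packet is an object nested within a root JSON object and keyed by an identifier, a parsed file can be walked like an ordinary dictionary. The following is a minimal Python sketch, assuming the root object maps packet identifiers directly to packet objects, as the examples above suggest. The helper name `iter_packets` and the inline sample are illustrative and not part of the dataset tooling.

```python
import json

def iter_packets(root):
    """Yield (packet_id, packet) pairs from a parsed root JSON object."""
    # Assumption: the root object maps packet identifiers to packet objects.
    for packet_id, packet in root.items():
        yield packet_id, packet

# Inline sample shaped like the Roku example above (most fields omitted).
sample_text = '''
{
    "085a9711-e31c-4e66-9803-9e64672772a4": {
        "dst_ip": "172.217.11.174",
        "dst_port": 80,
        "host": "www.google-analytics.com",
        "method": "POST",
        "protocol": "http",
        "tcp.stream": 63
    }
}
'''

root = json.loads(sample_text)
hosts = [packet["host"] for _, packet in iter_packets(root)]
print(hosts)
```

For a file on disk, `json.load` on an open file handle would take the place of `json.loads`.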

In addition to the information extracted from the IP, TCP, and HTTP headers, the JSON for each packet contains the following extra information:

  • “firebog_advertising_tracking_abp” – this label indicates whether the Firebog blocklist blocks the request. 1 = blocked, 0 = not blocked.
  • “moaab_abp” – this label indicates whether the Mother of All Adblocking blocklist blocks the request. 1 = blocked, 0 = not blocked.
  • “piholeblocklist_default_smarttv_abp” – this label indicates whether the default Pi-hole blocklist blocks the request. 1 = blocked, 0 = not blocked.
  • “stopad_abp” – this label indicates whether the StopAd blocklist blocks the request. 1 = blocked, 0 = not blocked.
  • “pii_types” – indicates which types of personally identifiable information (PII) were found in the packet. Since we used a test device and test accounts, no data from actual users have been collected (i.e., no human subjects were involved). In addition, we redacted all PII values to the best of our ability, retaining only the PII type. For instance, in the Roku example above, the Serial Number value was replaced with “REDACTED_SERIALNUMBER,” and the only information retained is that this packet contained some PII of type “Serial Number.”
  • “package_name” (Fire TV only) – the package name of the app/process responsible for the connection that the packet pertains to. Note that this is not always the same as the app that is actually running (e.g., due to system activity).
  • “package_version” (Fire TV only) – the version number of the app identified by the “package_name” field.
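
The fields above can be read as in the following Python sketch. In the examples, “pii_types” is a string holding a Python-style list literal rather than a JSON array, so the sketch parses it with `ast.literal_eval`. It also assumes that every key ending in “_abp” is a blocklist label and that a missing “pii_types” field means no PII was found. The helper names are hypothetical.

```python
import ast

def parse_pii_types(packet):
    """Return the PII types of a packet as a list of strings."""
    # Assumption: a missing "pii_types" field is treated as an empty list.
    raw = packet.get("pii_types", "[]")
    # The value is a string such as "['Device ID']", so it is parsed as a
    # Python literal instead of as JSON (single quotes are not valid JSON).
    return list(ast.literal_eval(raw))

def blocking_lists(packet):
    """Return the names of the *_abp labels set to 1 for this packet."""
    # Assumption: every key ending in "_abp" is a blocklist label.
    return sorted(k for k, v in packet.items() if k.endswith("_abp") and v == 1)

# Subset of the Roku example packet shown earlier.
packet = {
    "firebog_advertising_tracking_abp": 1,
    "moaab_abp": 1,
    "piholeblocklist_default_smarttv_abp": 1,
    "stopad_abp": 1,
    "pii_types": "['Serial Number', 'Device OS Version']",
}

print(parse_pii_types(packet))
print(blocking_lists(packet))
```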

Please note that some fields in the JSON are a legacy of the format and are unused:

  • “ad”, “rule”: IGNORE
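
One way to keep these legacy fields out of downstream analysis is to drop them when a packet is read. A small Python sketch, where the helper name `strip_legacy` is hypothetical:

```python
# Legacy fields named above; they carry no meaning in this dataset.
LEGACY_FIELDS = ("ad", "rule")

def strip_legacy(packet):
    """Return a copy of the packet without the unused legacy fields."""
    return {k: v for k, v in packet.items() if k not in LEGACY_FIELDS}

# Subset of the Fire TV example packet shown earlier.
packet = {"ad": "0", "rule": "no match", "host": "adsx.greystripe.com"}
cleaned = strip_legacy(packet)
print(cleaned)
```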