Privacy-Preserving Mobile Data Crowdsourcing

Description

Motivation: A large number of services (such as wireless connectivity, location-based services, recommendations, etc.)  rely on data and measurements collected from mobile devices and shared with a centralized entity. Users enjoy these services at the expense of sharing data from their mobile devices, which increases their privacy risk. For example, users may want to receive a location-based service, but they may be concerned about sending their exact location to a server or third parties.

Intellectual Merit: In this project, we consider data collected from mobile devices, including information about the wireless network, as well as personal/user data. Our goal is to develop privacy-preserving techniques to obfuscate reported data, while still providing guarantees for the quality of the provided service. We seek to develop both the theoretical frameworks (privacy-utility formulation, optimization of obfuscation techniques, principled data minimization), as well as its applications (to wireless spectrum sharing, cellular signal maps, and personal information tracking). The project will produce models, algorithms, datasets, mobile system implementations and programming interfaces, which trade off utility vs. privacy in a principled and transparent way. Here is the NSF Abstract.

Broader Impacts: The project involves collaborations with major players in the industries of wireless networks and crowdsourced data and has potential for technology transfer. The plan for Broadening Participation in Computing  includes reaching out to women and undergraduates.

Team

Current members:

Past Members:

  • Rahmadi Trimanada, postdoc, currently with Comcast Research.
  • Evita Bakopoulou, PhD in Networked Systems, 2021; PhD Thesis on “Federated Learning for Mobile Data Privacy”; currently with Privacy Engineering at Google, Mountain View, CA.
  • Emmanouil Alimpertis, PhD Thesis on “Mobile Coverage Signal Maps”, Networked Systems, UCI 2020; currently with Apple, Cupertino, CA.
  • Anastasia Shuba, PhD Thesis on “Mobile Data Transparency and Control”, EECS, UCI 2019; currently with Broadcom, Los Angeles.
  • Yanqi Gu,  PhD student, Networked Systems.
  • Justin Ley, undergrad in CS, MS in Math.

Publications (by topic)

PriPrune: Quantifying and Preserving Privacy in Pruned Federated Learning

Federated learning (FL) is a paradigm that allows several client devices and a server to collaboratively train a global model, by exchanging only model updates, without the devices sharing their local training data. These devices are often constrained in terms of communication and computation resources, and can further benefit from model pruning – a paradigm that is widely used to reduce the size and complexity of models. We perform the first investigation of privacy guarantees for model pruning in FL and derive information- theoretic upper bounds on the amount of information leaked by pruned FL models. Based on our evaluations, we introduce PriPrune– a privacy-aware algorithm for local model pruning, which uses a personalized per-client defense mask and adapts the defense pruning rate so as to jointly optimize privacy and model performance. PriPrune is universal in that can be applied after any pruned FL scheme on the client, without modification, and protects against any inversion attack by the server.

Privacy by Projection: Federated Population Density Estimation by Projecting on Random Features.

We consider the problem of population density estimation based on location data crowdsourced from mobile devices, using kernel density estimation (KDE). In a conventional, centralized setting, KDE requires mobile users to upload their location data to a server, thus raising privacy concerns. Here, we propose a Federated KDE framework for estimating the user population density, which not only keeps location data on the devices but also provides probabilistic privacy guarantees against a malicious server that tries to infer users’ location. Our approach Federated random Fourier feature (RFF) KDE leverages a random feature representation of the KDE solution, in which each user’s information is irreversibly projected onto a small number of spatially delocalized basis functions, making precise localization impossible while still allowing population density estimation. We evaluate our method on both synthetic and real-world datasets, and we show that it achieves a better utility (estimation performance)-vs-privacy (distance between inferred and true locations) tradeoff, compared to state-of-the-art baselines (e.g., GeoInd). We also vary the number of basis functions per user, to further improve the privacy-utility trade-off, and we provide analytical bounds on localization as a function of areal unit size and kernel bandwidth.

Mobile Signal Maps. 

Example of Signal Map (LTE RSRP) from NYC, near JFK.

The first type of crowdsourced data we consider in this project are related to measurements of the quality of the cellular network itself. Mobile signal maps (for signal strength, coverage and other key performance indicators) are of great importance to cellular operators albeit expensive to obtain. Today, they are obtained primarily through crowdsourcing and third-party apps. Our earlier work focused on developing (i) tools for collecting and crowdsourcing network measurements from mobile devices and (ii) machine learning models, based on random forests and applied in a centralized way, to predict missing values. Our current work focuses on privacy-preserving techniques for crowdsourced mobile measurements, including federated learning (which is a natural fit for the decentralized crowdsourcing setting), secure aggregation, and data minimization.

Federated Mobile Packet Classification. 

The second type of mobile data we study in this project are mobile packet traces collected from mobile devices. We develop techniques to predict whether these packets contain personally identifiable information (personal information “exposure”) or ad requests (which is then used for ad blocking), based on fields of the HTTP packet. Prior state-of-the-art classifiers were trained in a centralized way, while we applied for the first time federated learning to this problem.

Extensions to Other End-Devices, Beyond Mobile

We also considered extensions and applications that consider data collection at a range of end-devices beyond just mobile devices. When possible, we observe activity at different levels: sensor data, network traffic, and commands exchanged between end-devices and the server. We also build classifiers, often across multiple network packts or sensor data. We report data collection practices and discuss their privacy implications

  • Smarthome/IoT : PingPong: R. Trimananda, J. Varmarken, A. Markopoulou, B. Demsky,  “Packet-Level Signatures for Smart Home Devices“, Proceedings of Network and Distributed System Security Symposium (NDSS) 2020. Feb. 2020, San Diego, CA. (acceptance rate ~18%) [slides] [dataset] [software]
  • VR Devices: BehaVR: Ismat Jarin, Yu Duan, Rahmadi Trimananda, Hao Cui, Salma Elmalaki, Athina Markopoulou BehaVR: User Identification Based on VR Sensor Data

Acknowledgements:

This work has been supported by the National Science  Foundation under Grant No. 1900654: “CNS Core: Medium: Collaborative Research: Privacy-Preserving Mobile Crowdsourced Data”, Oct. 1st 2019 — Sept. 31st, 2023. 

Disclaimer: Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.)

Point of Contact: please contact the PI, A. Markopoulou.

Last Updated: Mar. 26th, 2024