Project: Network Sampling and Construction Methods for Inference and Anonymization


Intellectual Merit: The goal of this project is to study network data that are generated in the context of mobile and/or online social networks. The project develops methods for (i) network sampling to facilitate inference for network structure and/or attributes and, conversely, for (ii) construction of networks with target characteristics. The methods aim at improving the state-of-the-art in network inference and network data anonymization, with target application domains primarily mobile and social network data. Here is the NSF Abstract.

Broader Impacts: These methodologies and their applications have implications for user privacy, specifically for mobile and online social networks and datasets. The project trains Ph.D students and interacts with relevant companies. PI Markopoulou created a related advanced graduate class on network science and data privacy, inspired by and synergetic to this project. Software has been made publicly available (e.g., in NetworkX).


Past Members

  • Yue Ye, “Time-varying Networks: Measurement, Modeling, and Computation”, PhD’19 in Computer Science, UCI.
  • Balint Tillman: “PhD Thesis: Graph Construction using the dK-Series Framework,” PhD’19 in Networked Systems, UC Irvine,  (currently with Google)
  • Luca Baldesi: visiting graph student (currently with the University of Trento)
  • Minas Gjoka: research scientist (currently with Google)
  • Francis, Lee, Phd student Sociology Dept.
  • Xuhong Zhang, PhD in EEECS, (now a postdoc at Univ. of Colorado)
  • Emily Smith, PhD in Sociology Dept. (currently a postdoc at UNC)
  • MS Theses: Sathyamurthy Nivedhita (now at Facebook), Sushruth Gopal, Milad Mehradabi.
  • Undergraduate students: Stephanie Harrington, Justin Ley

Publications (by topic)

Graph Construction Methods:

Goal and Main results: (i) 2K+ graph construction and (ii) Spectral Graph Forge. Both families of methods can create synthetic graphs that resemble real (primarily online social) graphs. The dk-series framework fixes local properties (2K  fixes the joint degree matrix, 2K+ also fixes the number of triangles, 3K was shown to be NP-Complete), while the spectral approach preserves global properties (in our papers: the modularity).

Methods for Sampling,  Inference, Incomplete Network Data

Goal and Main Results: This part of the project deals with the intertwined problems of modeling networks from sampled or incomplete data and of developing sampling or simulation algorithms with known statistical properties.  Major results from this project have included novel techniques for scalable estimation of dynamic network models from sampled or incomplete data, adaptation of missing-data techniques to held-out evaluation for network models, new methods for high-quality simulation from exponential family models, and new approaches for inferring network properties from minimal data sources.

  • Yin, Fan;, Phillips, Nolan E; and Butts, Carter T. (2019). “Selection of Exponential-Family Random Graph Models via Held-Out Predictive Evaluation (HOPE)”.  arXiv:1908.05873.
  • Yu, Yue; Smith, Emma J.; and Butts, Carter T.  (2019).  “Retrospective Network Imputation from Life History Data: The Impact of Designs.” Sociological Methodology, forthcoming.
  • Yu, Yue (2019). Time-varying Networks: Measurement, Modeling, and Computation. PhD Thesis, University of California, Irvine.
  • Butts, Carter T., “A Perfect Sampling Method for Exponential Family Random Graph Models”, in Journal of Mathematical Sociology, 42 (1), 2018.
  • Almquist, Zack W. and Butts, Carter T., “Dynamic Network Analysis with Missing Data: Theory and Methods”, Statistica Sinica, 2018.
  • Fitzhugh, Sean M. and Butts, Carter T., “Patterns of Comembership: Techniques for Identifying Subgraph Composition”, in Social Networks, 55(1), 2018.
  • Zhang, Xuhong and Butts, Carter T., “Activity Correlation Spectroscopy: A Novel Method for Inferring Social Relationships from Activity Data”,  in Social Network Analysis and Mining, 2017.
  • Minas Gjoka, Emily Smith, Carter T. Butts, “Estimating Subgraph Frequencies with or without Attributes from Egocentrically Sampled Data”,  arXiv report, 2015 [dataset]

Applications to Mobile Datasets

Goal and Main Results: In this part of the project, we collect and analyze mobile activity from urban environments (e.g., time series of traffic volume per area) and we infer information about not only the traffic itself, but also about human activity and characteristics of the city.

  • Evita Bakopoulou, Balint Tillman, Athina Markopoulou, “A Federated Learning Approach for Mobile Packet Classification”, in , July 2019.
  • E. Alimpertis, A. Markopoulou, K. Psounis, C.T. Butts, “City-Wide Signal Strength Maps: Prediction with Random Forests”, in the Proc. of the Web Conference (WWW),2019, pp. 2536 – 2542, May 2019, San Fransisco, USA ( 20% acceptance rate for short papers and poster presentation). preprint.
  • Milad Mehrabadi, “A Recommendation System for Predicting Privacy Leaks in Mobile Traffic”, MS Thesis, March 2019
  • E. Alimpertis, A. Markopoulou, “Using AntMonitor For Crowdsourcing Passive Mobile Network Measurements”, in NSDI Poster Session, Boston, Feb. 2017.
  • B.Cici, E. Alimpertis, A. Ihler, A. Markopoulou, ““Cell-to-cell Activity Prediction for Smart Cities”, Infocom  Workshop on Smart Cities and Urban Computing (SmartCity 2016), San Francisco, CA, April 2016.
  • Zhang, Xuhong; Mallepudi, Venkata; and Butts, Carter T., “An Integrated Platform for Collecting Mobile Phone Data and Learning Demographic Features”, IEEE International Conference on Pervasive Computing and Communications (PerCom), WiP Paper, 2017.
  • Zhang, Xuhong and Butts, Carter T, “A Novel Multivariate Spectral Regression Model for Learning Relationships Between Communication Activity and Urban Ecology,” in IEEE International Conference on Pervasive Computing and Communications (PerCom), 2016.

Applications to Sociology

  • Boessen, Adam; Hipp, John R.; Butts, Carter T.; Nagle, Nicholas N.; and Smith, Emily J. (2017). “Social Fabric and Fear of Crime: Considering Spatial Location and Time of Day”, in Social Networks, 2018.
  • Boessen, Adam; Hipp, John R.; Butts, Carter T.; Nagle, Nicholas N.; and Smith, Emily J., “The Built Environment, Spatial Scale, and Social Networks: Do Land Uses Matter for Personal Network Structure?”, Environment and Planning, B: Planning and Design, 2017.
  • Reed, Philip J.; Spiro, Emma S.; and Butts, Carter T., “Thumbs up for privacy?: Differences in Online Self-disclosure Behavior Across National Cultures”, in Social Science Research, 59 (155). 2016.


Acknowledgements: This material is based upon work supported by the National Science  Foundation under Grant No. 1526736: “III: Small: Network Sampling and Construction Methods for Inference and Anonymization”, Sept.1, 2015-Aug. 31, 2019.

Disclaimer: Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Point of Contact: please contact the PI, A. Markopoulou.

Last Updated: Dec. 18th, 2019