Research

My research interests span security & privacy, privacy-preserving machine learning, and intrusion detection systems. I am particularly passionate about building practical data-driven systems that account for both data privacy and security without impairing the system's intended functionality.


This page has not been updated since 2021; please see the publications page for recent work.

Interests

  • Security and Privacy of Machine Learning
  • Privacy-preserving Machine Learning
  • Intrusion Detection Systems

Research Projects

  • Novel Model Inversion Attribute Inference Attacks on Classification Models

    Increasing use of machine learning (ML) technologies in privacy-sensitive domains such as medical diagnoses, lifestyle predictions, and business decisions highlights the need to better understand whether these technologies leak sensitive and proprietary training data. In this project, we focus on model inversion attacks in which the adversary knows the non-sensitive attributes of records in the training data and aims to infer the value of a sensitive attribute unknown to the adversary, using only black-box access to the target classification model. We first devise a novel confidence score-based model inversion attribute inference attack that significantly outperforms the state of the art. We then introduce a label-only model inversion attack that relies only on the model's predicted labels yet matches our confidence score-based attack in attack effectiveness. We also extend our attacks to the scenario where some of the other (non-sensitive) attributes of a target record are unknown to the adversary. We evaluate our attacks on two types of machine learning models, decision trees and deep neural networks, trained on three real datasets. Moreover, we empirically demonstrate the disparate vulnerability of model inversion attacks, i.e., specific groups in the training dataset (grouped by gender, race, etc.) can be more vulnerable to these attacks.
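    At a high level, a confidence score-based attribute inference attack enumerates candidate values of the sensitive attribute, queries the target model with each completed record, and picks the value for which the model is most confident in the record's known label. The sketch below illustrates that general idea only; the `predict_proba` interface and attribute names are hypothetical, and this is not the project's actual attack implementation.

```python
# Illustrative sketch of a confidence score-based attribute inference
# attack. The predict_proba interface is a hypothetical stand-in for
# black-box query access to a target classification model.

def infer_sensitive_attribute(predict_proba, known_attrs, true_label, candidates):
    """Guess the sensitive attribute of a target record.

    predict_proba: black-box function mapping a feature dict to a
                   {label: confidence} dict.
    known_attrs:   the non-sensitive attributes known to the adversary.
    true_label:    the record's known class label.
    candidates:    the possible values of the sensitive attribute.
    """
    best_value, best_conf = None, -1.0
    for value in candidates:
        record = dict(known_attrs, sensitive=value)
        conf = predict_proba(record)[true_label]  # one black-box query
        if conf > best_conf:
            best_value, best_conf = value, conf
    return best_value
```

    The label-only variant described above would replace the confidence lookup with repeated queries over perturbed records, counting how often the predicted label matches.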

  • Privacy-preserving Real-time Anomaly Detection Using Edge Computing

    Anomaly detection on data collected by devices, such as sensors and IoT objects, is essential for many critical systems; e.g., an anomaly in the data of a patient's health monitoring device may indicate a medical emergency. Because of the resource-constrained nature of these devices, the data they collect are usually off-loaded to the cloud/edge for storage and/or further analysis. However, to ensure data privacy it is critical that the data be transferred to and managed by the cloud/edge in encrypted form, which necessitates efficient processing of such encrypted data for real-time anomaly detection. Motivated by the simultaneous demands for data privacy and real-time data processing, in this project we investigate the problem of a privacy-preserving real-time anomaly detection service on sensitive, time-series, streaming data. We propose a privacy-preserving framework that enables efficient anomaly detection on encrypted data by leveraging a lightweight, aggregation-optimized encryption scheme to encrypt the data before off-loading it to the edge. We demonstrate our solution for a widely used anomaly detection algorithm, the windowed Gaussian anomaly detector, and evaluate the performance of the solution in terms of the obtained model privacy, accuracy, latency, and communication cost.
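    As a plaintext illustration of the detection logic (the encryption layer of the project is omitted entirely), a windowed Gaussian detector keeps a sliding window of recent values and flags a new value whose z-score against the window's mean and standard deviation exceeds a threshold. The window size and threshold below are illustrative choices:

```python
from collections import deque
import math

class WindowedGaussianDetector:
    """Minimal plaintext sketch of a windowed Gaussian anomaly detector.
    Window size and threshold are illustrative, not the project's values."""

    def __init__(self, window_size=64, threshold=3.0):
        self.window = deque(maxlen=window_size)
        self.threshold = threshold

    def score(self, x):
        """Return the z-score of x w.r.t. the current window."""
        if len(self.window) < 2:
            return 0.0
        mean = sum(self.window) / len(self.window)
        var = sum((v - mean) ** 2 for v in self.window) / len(self.window)
        std = math.sqrt(var) or 1e-9  # guard against a zero-variance window
        return abs(x - mean) / std

    def update(self, x):
        """Score x against the window, then add it; True means anomalous."""
        is_anomaly = self.score(x) > self.threshold
        self.window.append(x)
        return is_anomaly
```

    In the encrypted setting, the mean and variance over the window would instead be computed on ciphertexts via the aggregation-optimized scheme.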

  • Privacy-preserving Multi-party Analytics

    Many enterprises are transitioning towards data-driven business processes. There are numerous situations where multiple parties would like to share data towards a common goal, if it were possible to simultaneously protect the privacy and security of the individuals and organizations described in the data. Existing solutions for multi-party analytics that follow the so-called Data Lake paradigm have parties transfer their raw data to a trusted mediator, who then performs the desired analysis on the global data and shares the results with the parties. However, such a solution does not fit many applications, such as healthcare, finance, and the Internet of Things, where privacy is a strong concern. Motivated by the increasing demands for data privacy, we study the problem of privacy-preserving multi-party data analytics, where the goal is to enable analytics on multi-party data without compromising the data privacy of any individual party. We first propose a secure sum protocol with strong security guarantees. The proposed secure sum protocol is resistant to collusion attacks even with up to N-2 parties colluding, where N denotes the total number of collaborating parties. We then use this protocol to propose two secure gradient descent algorithms, one for horizontally partitioned data and the other for vertically partitioned data. The proposed framework is generic and applies to a wide class of machine learning problems. We demonstrate our solution for two popular use cases, regression and classification, and evaluate the performance of the proposed secure solution in terms of the obtained model accuracy, latency, and communication cost. In addition, we perform a scalability analysis to evaluate the performance of the proposed solution as the data size and the number of parties increase.
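    One classic building block for a collusion-resistant secure sum is additive secret sharing: each party splits its private value into random shares that sum to the value modulo a public modulus, so no coalition missing even one share learns anything about that value, yet summing everyone's shares reveals exactly the global total. The sketch below illustrates this general idea only; it is not the project's exact protocol, and the modulus is an arbitrary illustrative choice.

```python
import secrets

MODULUS = 2 ** 61 - 1  # illustrative public modulus

def make_shares(value, n_parties):
    """Split value into n additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def secure_sum(private_values):
    """Simulate an additive secret-sharing secure sum among N parties.
    Any single share is uniformly random, so it reveals nothing about
    the input it came from; only the global sum is reconstructed."""
    n = len(private_values)
    # share_matrix[i][j]: share j of party i's value, sent to party j
    share_matrix = [make_shares(v, n) for v in private_values]
    # each party j locally sums the shares it received
    partial_sums = [sum(row[j] for row in share_matrix) % MODULUS
                    for j in range(n)]
    # combining the partial sums reveals only the global sum
    return sum(partial_sums) % MODULUS
```

    For example, `secure_sum([5, 7, 11])` reconstructs 23 without any party ever sending its raw value. A secure gradient descent can then be built on top by computing each aggregate gradient with such a secure sum.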

  • A Fine-grained Approach for Anomaly Detection in File System Accesses

    Protecting sensitive data against malicious or compromised insiders is a challenging problem. Access control mechanisms are not always able to prevent authorized users from misusing or stealing sensitive data, as insiders often have access permissions to the data. Also, security vulnerabilities and phishing attacks make it possible for external malicious parties to compromise the identity credentials of users who have access to the data. Therefore, solutions for protection from insider threats require combining access control mechanisms and other security techniques, such as encryption, with techniques for detecting anomalies in data accesses. In this project, we propose a novel approach to create fine-grained profiles of users' normal file access behaviors. Our approach is based on the key observation that even if a user's access to a file seems legitimate, only a fine-grained analysis of the access (size of access, timestamp, etc.) can help uncover the original intention of the user. We exploit users' file access information at the block level and develop a feature-extraction method to model users' normal file access patterns (user profiles). Such profiles are then used in the detection phase to identify anomalous file system accesses.
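    As a toy illustration of profile-based detection, the features below (access size and hour of day) are simplified stand-ins for the block-level features used in the project, and the threshold is an arbitrary illustrative choice:

```python
from statistics import mean, stdev

def build_profile(accesses):
    """Summarize a user's normal file accesses into a simple profile.
    accesses: list of (size_in_blocks, hour_of_day) tuples.
    The feature choice here is illustrative, not the project's."""
    sizes = [s for s, _ in accesses]
    return {
        "size_mean": mean(sizes),
        "size_std": stdev(sizes) if len(sizes) > 1 else 1.0,
        "usual_hours": {h for _, h in accesses},
    }

def is_anomalous(profile, size, hour, z_threshold=3.0):
    """Flag an access whose size or timing deviates from the profile."""
    z = abs(size - profile["size_mean"]) / (profile["size_std"] or 1.0)
    return z > z_threshold or hour not in profile["usual_hours"]
```

    A real profile would be built from block-level traces and many more features, but the detection phase follows the same shape: score each new access against the stored profile and flag outliers.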

  • RWGuard: A Real-Time Detection System Against Cryptographic Ransomware

    Ransomware has recently (re)emerged as a popular malware that targets a wide range of victims, from individual users to corporations, for monetary gain. Our key observation on existing ransomware detection mechanisms is that they fail to provide an early warning in real time, which results in irreversible encryption of a significant number of files, while post-encryption techniques (e.g., key extraction, file restoration) suffer from several limitations. Also, existing detection mechanisms produce high false positives because they are unable to determine the original intent of file changes, i.e., they fail to distinguish whether a significant change in a file is due to ransomware encryption or due to a file operation by the user herself (e.g., benign encryption or compression). To address these challenges, in this project, we introduce a ransomware detection mechanism, RWGuard, which is able to detect crypto-ransomware in real time on a user's machine by (1) deploying decoy techniques, (2) carefully monitoring both the running processes and the file system for malicious activities, and (3) preventing benign file changes from being flagged through the learning of users' encryption behavior. We evaluate our system against samples from the 14 most prevalent ransomware families to date. Our experiments show that RWGuard is effective in real-time detection of ransomware with zero false negative and negligible false positive rates while incurring an overhead of only 1.9%.
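    Of the three components, the decoy technique is the simplest to illustrate: decoy files that no legitimate user should ever touch are planted and fingerprinted, and any later change to them signals likely ransomware activity. The sketch below is a minimal illustration of that idea, not RWGuard's implementation (file locations and contents are caller-chosen placeholders):

```python
import hashlib
from pathlib import Path

def fingerprint(path):
    """SHA-256 digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def plant_decoys(paths, content=b"decoy"):
    """Write decoy files that no legitimate user should modify, and
    record a baseline fingerprint for each (paths are caller-chosen)."""
    baseline = {}
    for p in paths:
        Path(p).write_bytes(content)
        baseline[p] = fingerprint(p)
    return baseline

def decoys_tampered(baseline):
    """True if any decoy was modified or deleted. In a real deployment
    this check would run continuously and raise an immediate alert."""
    for p, digest in baseline.items():
        try:
            if fingerprint(p) != digest:
                return True
        except FileNotFoundError:
            return True
    return False
```

    The other two components, process/file-system monitoring and filtering of the user's own benign encryption or compression, are what keep such an alert from firing on legitimate activity.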