FLShield: A Validation Based Federated Learning Framework to Defend Against Poisoning Attacks

Conference paper
Ehsanul Kabir, Zeyu Song, Md Rafi Ur Rashid, Shagufta Mehnaz
45th IEEE Symposium on Security & Privacy (S&P)
Publication year: 2024

Federated learning (FL) is revolutionizing how we learn from data. With its growing popularity, it is now being used in many safety-critical domains such as autonomous vehicles and healthcare. However, since thousands of participants can contribute in this collaborative setting, it is challenging to ensure the security and reliability of such systems. This highlights the need to design FL systems that are secure and robust against malicious participants’ actions while also ensuring high utility, privacy of local data, and efficiency. In this paper, we propose a novel FL framework dubbed FLShield that utilizes benign data from FL participants to validate the local models before taking them into account for generating the global model. This is in stark contrast with existing defenses relying on the server’s access to clean datasets — an assumption often impractical in real-life scenarios and conflicting with the fundamentals of FL. We conduct extensive experiments to evaluate our FLShield framework in different settings and demonstrate its effectiveness in thwarting various types of poisoning and backdoor attacks including a defense-aware one. FLShield also preserves the privacy of local data against gradient inversion attacks.

Towards Sentence Level Inference Attack Against Pre-trained Language Models

Conference paper
Kang Gu, Ehsanul Kabir, Neha Ramsurrun, Soroush Vosoughi, Shagufta Mehnaz
23rd Privacy Enhancing Technologies Symposium (PETS)
Publication year: 2023

In recent years, pre-trained language models (e.g., BERT and GPT) have shown superior capability in textual representation learning, benefiting from their large architectures and massive training corpora. The industry has also quickly embraced language models to develop various downstream NLP applications. For example, Google has already used BERT to improve its search system. The utility of the language embeddings also brings about potential privacy risks. Prior works have revealed that an adversary can either identify whether a keyword exists or gather a set of possible candidates for each word in a sentence embedding. However, these attacks cannot recover coherent sentences, which leak high-level semantic information from the original text. To demonstrate that the adversary can go beyond the word-level attack, we present a novel decoder-based attack, which can reconstruct coherent and fluent text from private embeddings after being pre-trained on a public dataset of the same domain. This attack is more challenging than a word-level attack due to the complexity of sentence structures. We comprehensively evaluate our attack in two domains and with different settings to show its superiority over the baseline attacks. Quantitative experimental results show that our attack can identify up to 3.5× as many keywords as the baseline attacks. Furthermore, qualitative results are also provided to help readers better understand the attack.

SecureImgStego: A Keyed Shuffling-based Deep Learning Model for Secure Image Steganography

Conference paper
Trishna Chakraborty, Imranur Rahman, Hasan Murad, Shagufta Mehnaz
IEEE Conference on Communications and Network Security (CNS)
Publication year: 2023

Steganography ensures secure transmission of digital messages, including image steganography where a secret image is hidden within a non-secret cover image. Deep learning-based methods in image steganography have recently gained popularity but are vulnerable to various attacks. An adversary with varying levels of access to the vanilla deep steganography model can train a surrogate model using another dataset and retrieve hidden images. Moreover, even when uncertain about the presence of hidden information, the adversary with access to the surrogate model can distinguish the carrier image from the unperturbed one. Our paper includes such attack demonstrations that confirm the inherent vulnerabilities present in deep learning-based steganography. Deep learning-based steganography lacks lossless transmission assurance, rendering sophisticated image encryption techniques unsuitable. Furthermore, key concatenation-based techniques for text data steganography fall short in the case of image data. In this paper, we introduce a simple yet effective keyed shuffling approach for encrypting secret images. We employ keyed pixel shuffling, multi-level block shuffling, and a combination of key concatenation and block shuffling, embedded within the model architecture. Our findings demonstrate that the block shuffling-based deep image steganography has negligible error overhead compared to conventional methods while providing effective security against adversaries with different levels of access to the model. We extensively evaluate our approach and compare it with existing methods in terms of human perceptibility, key sensitivity, adaptivity, cover image availability, keyspace, and robustness against steganalysis.
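The keyed shuffling idea can be illustrated with a minimal sketch, assuming a flat list of pixel values and a simple integer key (this is not the paper's model-embedded implementation, and the function names are hypothetical): the key seeds a pseudorandom permutation of pixel positions, and the receiver inverts that permutation with the same key.

```python
import random

def keyed_shuffle(pixels, key):
    """Permute pixel positions using a permutation derived from the key."""
    perm = list(range(len(pixels)))
    random.Random(key).shuffle(perm)  # key deterministically seeds the permutation
    return [pixels[i] for i in perm]

def keyed_unshuffle(pixels, key):
    """Invert the permutation by regenerating it from the same key."""
    perm = list(range(len(pixels)))
    random.Random(key).shuffle(perm)
    out = [0] * len(pixels)
    for dst, src in enumerate(perm):
        out[src] = pixels[dst]  # send each shuffled pixel back to its origin
    return out
```

A wrong key regenerates a different permutation, so the secret image does not reassemble; the multi-level block shuffling described above applies the same principle to blocks of pixels rather than individual ones.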

Model Inversion Attack with Least Information and an In-depth Analysis of its Disparate Vulnerability

Conference paper
Sayanton V. Dibbo, Dae Lim Chung, Shagufta Mehnaz
First IEEE Conference on Secure and Trustworthy Machine Learning (SaTML)
Publication year: 2023

In this paper, we study model inversion attribute inference (MIAI), a machine learning (ML) privacy attack that aims to infer sensitive information about the training data given access to the target ML model. We design a novel black-box MIAI attack that assumes the least adversary knowledge/capabilities to date while still performing similarly to the state-of-the-art attacks. Further, we extensively analyze the disparate vulnerability property of our proposed MIAI attack, i.e., elevated vulnerabilities of specific groups in the training dataset (grouped by gender, race, etc.) to model inversion attacks. First, we investigate existing ML privacy defense techniques: (1) mutual information regularization, and (2) fairness constraints, and show that none of these techniques can mitigate MIAI disparity. Second, we empirically identify possible disparity factors and discuss potential ways to mitigate disparity in MIAI attacks. Finally, we demonstrate our findings by extensively evaluating our attack in estimating binary and multi-class sensitive attributes on three different target models trained on three real datasets.

FLTrojan: Privacy Leakage Attacks against Federated Language Models Through Selective Weight Tampering

arXiv pre-print
Md Rafi Ur Rashid, Vishnu Asutosh Dasu, Kang Gu, Najrin Sultana, Shagufta Mehnaz
Publication year: 2023

Federated learning (FL) is becoming a key component in many technology-based applications including language modeling — where individual FL participants often have privacy-sensitive text data in their local datasets. However, realizing the extent of privacy leakage in federated language models is not straightforward and the existing attacks only intend to extract data regardless of how sensitive or naive it is. To fill this gap, in this paper, we introduce two novel findings concerning leaking privacy-sensitive user data from federated language models. Firstly, we make a key observation that model snapshots from the intermediate rounds in FL can cause greater privacy leakage than the final trained model. Secondly, we identify that privacy leakage can be aggravated by tampering with a model’s selective weights that are specifically responsible for memorizing the sensitive training data. We show how a malicious client can leak the privacy-sensitive data of some other user in FL even without any cooperation from the server. Our best-performing method improves the membership inference recall by 29% and achieves up to 70% private data reconstruction, evidently outperforming existing attacks with stronger assumptions of adversary capabilities.

Are Your Sensitive Attributes Private? Novel Model Inversion Attribute Inference Attacks on Classification Models

Conference paper
Shagufta Mehnaz, Sayanton Dibbo, Ehsanul Kabir, Ninghui Li, Elisa Bertino
31st USENIX Security Symposium
Publication year: 2022

Increasing use of machine learning (ML) technologies in privacy-sensitive domains such as medical diagnoses, lifestyle predictions, and business decisions highlights the need to better understand if these ML technologies are introducing leakage of sensitive and proprietary training data. In this paper, we focus on model inversion attacks where the adversary knows non-sensitive attributes about records in the training data and aims to infer the value of a sensitive attribute unknown to the adversary, using only black-box access to the target classification model. We first devise a novel confidence score-based model inversion attribute inference attack that significantly outperforms the state-of-the-art. We then introduce a label-only model inversion attack that relies only on the model’s predicted labels but still matches our confidence score-based attack in terms of attack effectiveness. We also extend our attacks to the scenario where some of the other (non-sensitive) attributes of a target record are unknown to the adversary. We evaluate our attacks on two types of machine learning models, decision tree and deep neural network, trained on three real datasets. Moreover, we empirically demonstrate the disparate vulnerability of model inversion attacks, i.e., specific groups in the training dataset (grouped by gender, race, etc.) could be more vulnerable to model inversion attacks.
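The core intuition behind a confidence score-based model inversion attribute inference attack can be sketched as follows (an illustrative simplification, not the paper's exact scoring procedure; `predict_proba`, `sensitive_key`, and the toy model are all hypothetical names): the adversary enumerates candidate values for the sensitive attribute and keeps the one that makes the black-box model most confident in the record's known true label.

```python
def invert_sensitive_attribute(predict_proba, known_attrs, true_label,
                               candidates, sensitive_key):
    """Return the candidate sensitive value that maximizes the model's
    confidence in the record's known true label (black-box access only)."""
    best_value, best_conf = None, -1.0
    for value in candidates:
        record = dict(known_attrs, **{sensitive_key: value})  # fill in the guess
        conf = predict_proba(record)[true_label]              # one black-box query
        if conf > best_conf:
            best_value, best_conf = value, conf
    return best_value
```

The label-only variant described above replaces the confidence lookup with queries on perturbed records, since only predicted labels are available.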

A Fine-grained Approach for Anomaly Detection in File System Accesses with Enhanced Temporal User Profiles

Journal paper
Shagufta Mehnaz, Elisa Bertino
IEEE Transactions on Dependable and Secure Computing (TDSC)
Publication year: 2021

Protecting sensitive data from theft, exfiltration, and other kinds of abuses by malicious insiders is a challenging problem. While access control mechanisms cannot always prevent the insiders from misusing sensitive data (since, in most cases, authorized users within organizations are granted access permissions), malicious outsiders also pose severe threats due to different security vulnerabilities in the systems, e.g., phishing attacks, memory corruptions, etc., which enable them to steal the credentials of the authorized users who have access to the data. To protect sensitive data from such attackers, anomaly detection techniques are often combined with other existing security measures, e.g., access control and encryption. An anomaly detection technique for identifying anomalies in file system accesses is based on the key idea that there should be significant differences between the file access behaviors of a benign user and an attacker. In this paper, we propose an approach to create fine-grained profiles of the users’ regular file access activities while extensively analyzing the timestamp information of the file accesses. According to our observation, even if a user’s access to a file seems benign, only a fine-grained analysis of the access (such as the size and timestamp of the access) can determine the original intention of the user. We exploit the users’ file access information at the block level to model their regular file access behaviors (user profiles), which are then securely stored and used for identifying anomalous file system accesses in the detection phase. We are also able to automatically profile new files and new users added to the system dynamically. Finally, our performance evaluations demonstrate that our proposed approach has an accuracy of 98.7% in detecting anomalies while incurring an overhead of only 2%.

Privacy-preserving Real-time Anomaly Detection Using Edge Computing

Conference paper
Shagufta Mehnaz, Elisa Bertino
IEEE International Conference on Data Engineering (ICDE)
Publication year: 2020

Anomaly detection on data collected by devices, such as sensors and IoT objects, is indispensable for many critical systems, e.g., an anomaly in the data of a patient’s health monitoring device may indicate a medical emergency situation. Because of the resource-constrained nature of these devices, data collected by such devices are usually off-loaded to the cloud/edge for storage and/or further analysis. However, to ensure data privacy it is critical that the data be transferred to and managed by the cloud/edge in an encrypted form, which necessitates efficient processing of such encrypted data for real-time anomaly detection. Motivated by the simultaneous demands for data privacy and real-time data processing, in this paper, we investigate the problem of a privacy-preserving real-time anomaly detection service on sensitive, time series, streaming data. We propose a privacy-preserving framework that enables efficient anomaly detection on encrypted data by leveraging a lightweight and aggregation-optimized encryption scheme to encrypt the data before off-loading the data to the edge. We demonstrate our solution for a widely used anomaly detection algorithm, the windowed Gaussian anomaly detector, and evaluate the performance of the solution in terms of the obtained model privacy, accuracy, latency, and communication cost.
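For reference, the plaintext logic of a windowed Gaussian anomaly detector can be sketched in a few lines (the paper's contribution is evaluating this detector over encrypted data, which this sketch does not attempt; the class and parameter names are illustrative): a point is flagged when its z-score against a sliding window's mean and standard deviation exceeds a threshold.

```python
from collections import deque
import math

class WindowedGaussianDetector:
    """Sliding-window Gaussian detector: a point is anomalous when it lies
    more than `threshold` standard deviations from the window's mean."""

    def __init__(self, window_size=50, threshold=3.0):
        self.window = deque(maxlen=window_size)  # oldest point evicted automatically
        self.threshold = threshold

    def score(self, x):
        """z-score of x against the current window; also slides the window."""
        if len(self.window) < 2:
            self.window.append(x)
            return 0.0  # not enough history to score yet
        mean = sum(self.window) / len(self.window)
        var = sum((v - mean) ** 2 for v in self.window) / len(self.window)
        std = math.sqrt(var)
        z = abs(x - mean) / std if std > 0 else (0.0 if x == mean else float("inf"))
        self.window.append(x)
        return z

    def is_anomaly(self, x):
        return self.score(x) > self.threshold
```

Running this over encrypted streams requires that the mean and variance be computable under the encryption scheme, which is where the aggregation-optimized encryption mentioned above comes in.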

Black-box Model Inversion Attribute Inference Attacks on Classification Models

arXiv pre-print
Shagufta Mehnaz, Ninghui Li, Elisa Bertino
arXiv:2012.03404
Publication year: 2020

Increasing use of ML technologies in privacy-sensitive domains such as medical diagnoses, lifestyle predictions, and business decisions highlights the need to better understand if these ML technologies are introducing leakage of sensitive and proprietary training data. In this paper, we focus on one kind of model inversion attack, where the adversary knows non-sensitive attributes about instances in the training data and aims to infer the value of a sensitive attribute unknown to the adversary, using oracle access to the target classification model. We devise two novel model inversion attribute inference attacks — confidence modeling-based attack and confidence score-based attack, and also extend our attack to the case where some of the other (non-sensitive) attributes are unknown to the adversary. Furthermore, while previous work uses accuracy as the metric to evaluate the effectiveness of attribute inference attacks, we find that accuracy is not informative when the sensitive attribute distribution is unbalanced. We identify two metrics that are better for evaluating attribute inference attacks, namely G-mean and Matthews correlation coefficient (MCC). We evaluate our attacks on two types of machine learning models, decision tree and deep neural network, trained with two real datasets. Experimental results show that our newly proposed attacks significantly outperform the state-of-the-art attacks. Moreover, we empirically show that specific groups in the training dataset (grouped by attributes, e.g., gender, race) could be more vulnerable to model inversion attacks. We also demonstrate that our attacks’ performances are not impacted significantly when some of the other (non-sensitive) attributes are also unknown to the adversary.

Secure Seamless Bluetooth Low Energy Connection Migration for Unmodified IoT Devices

Journal paper
Syed Rafiul Hussain, Shagufta Mehnaz, Shahriar Nirjon, Elisa Bertino
IEEE Transactions on Mobile Computing (TMC), vol 17(4), pages 927-944
Publication year: 2018

At present, Bluetooth Low Energy (BLE) is dominantly used in commercially available Internet of Things (IoT) devices, such as smart watches, fitness trackers, and smart appliances. Compared to classic Bluetooth, BLE has been simplified in many ways that include its connection establishment, data exchange, and encryption processes. Unfortunately, this simplification comes at a cost. For example, only a star topology is supported in BLE environments and a peripheral (an IoT device) can communicate with only one gateway (e.g., a smartphone, or a BLE hub) at any given time. When a peripheral goes out of range and thus loses connectivity to a gateway, it cannot connect and seamlessly communicate with another gateway without user interventions. In other words, BLE connections are not automatically migrated or handed-off to another gateway. In this paper, we propose SeamBlue, which brings secure seamless connectivity to BLE-capable mobile IoT devices in an environment that consists of a network of gateways. Our framework ensures that unmodified, commercial off-the-shelf BLE devices seamlessly and securely connect to a nearby gateway without any user intervention.

RWGuard: A Real-time Detection System Against Cryptographic Ransomware

Conference paper
Shagufta Mehnaz, Anand Mudgerikar, Elisa Bertino
International Symposium on Research in Attacks, Intrusions, and Defenses (RAID), pages 114-136 [Acceptance rate: 22.8%]
Publication year: 2018

Ransomware has recently (re)emerged as a popular malware that targets a wide range of victims – from individual users to corporate ones for monetary gain. Our key observation on the existing ransomware detection mechanisms is that they fail to provide an early warning in real-time, which results in irreversible encryption of a significant number of files, while the post-encryption techniques (e.g., key extraction, file restoration) suffer from several limitations. Also, the existing detection mechanisms result in high false positive rates because they are unable to determine the original intent of file changes, i.e., they fail to distinguish whether a significant change in a file is due to a ransomware encryption or due to a file operation by the user herself (e.g., benign encryption or compression). To address these challenges, in this paper, we introduce a ransomware detection mechanism, RWGuard, which is able to detect crypto-ransomware in real-time on a user’s machine by (1) deploying decoy techniques, (2) carefully monitoring both the running processes and the file system for malicious activities, and (3) omitting benign file changes from being flagged through the learning of users’ encryption behavior. We evaluate our system against samples from the 14 most prevalent ransomware families to date. Our experiments show that RWGuard is effective in real-time detection of ransomware with zero false negative and negligible false positive (0.1%) rates while incurring an overhead of only 1.9%.

LTEInspector: A Systematic Approach for Adversarial Testing of 4G LTE

Conference paper
Syed Rafiul Hussain, Omar Haider Chowdhury, Shagufta Mehnaz, Elisa Bertino
Network and Distributed System Security (NDSS) Symposium [Acceptance rate: 21.4%]
Publication year: 2018

In this paper, we investigate the security and privacy of the three critical procedures of the 4G LTE protocol (i.e., attach, detach, and paging), and in the process, uncover potential design flaws of the protocol and unsafe practices employed by the stakeholders. For exposing vulnerabilities, we propose a model-based testing approach, LTEInspector, which lazily combines a symbolic model checker and a cryptographic protocol verifier in the symbolic attacker model. Using LTEInspector, we have uncovered 10 new attacks along with 9 prior attacks, categorized into three abstract classes (i.e., security, user privacy, and disruption of service), in the three procedures of 4G LTE. Notable among our findings is the authentication relay attack that enables an adversary to spoof the location of a legitimate user to the core network without possessing appropriate credentials. To ensure that the exposed attacks pose real threats and are indeed realizable in practice, we have validated 8 of the 10 new attacks and their accompanying adversarial assumptions through experimentation in a real testbed.

SeamBlue: Seamless Bluetooth Low Energy Connection Migration for Unmodified IoT Devices

Best Paper Award Nomination
Conference paper
Syed Rafiul Hussain, Shagufta Mehnaz, Shahriar Nirjon, Elisa Bertino
International Conference on Embedded Wireless Systems and Networks (EWSN), pages 132-143
Publication year: 2017

At present, Bluetooth Low Energy (BLE) is dominantly used in commercially available Internet of Things (IoT) devices – such as smart watches, fitness trackers, and smart appliances. Compared to classic Bluetooth, BLE has been simplified in many ways that include its connection establishment, data exchange, and encryption processes. Unfortunately, this simplification comes at a cost. For example, only a star topology is supported in BLE environments and a peripheral (an IoT device) can communicate with only one gateway (e.g., a smartphone, or a BLE hub) at any given time. When a peripheral goes out of range, it loses connectivity to a gateway, and cannot connect and seamlessly communicate with another gateway without user interventions. In other words, BLE connections are not automatically migrated or handed-off to another gateway. In this paper, we propose SeamBlue, which brings seamless connectivity to BLE-capable mobile IoT devices in an environment that consists of a network of gateways. Our framework ensures that unmodified, commercial off-the-shelf BLE devices seamlessly and securely connect to a nearby gateway without any user intervention.

Privacy-preserving Multi-party Analytics over Arbitrarily Partitioned Data

Conference paper
Shagufta Mehnaz, Elisa Bertino
IEEE International Conference on Cloud Computing (IEEE CLOUD), pages 342-349
Publication year: 2017

Data-driven business processes are gaining popularity among enterprises nowadays. In many situations, multiple parties would share data towards a common goal if it were possible to simultaneously protect the privacy of the individuals and organizations described in the data. Existing solutions for multi-party analytics require parties to transfer their raw data to a trusted mediator, who then performs the desired analysis on the global data, and shares the results with the parties. Unfortunately, such a solution does not fit many applications where privacy is a strong concern, such as healthcare, finance, and the Internet of Things. Motivated by the increasing demands for data privacy, in this paper, we study the problem of privacy-preserving multi-party analytics, where the goal is to enable analytics on multi-party data without compromising the data privacy of each individual party. We propose a secure gradient descent algorithm that enables analytics on data that is arbitrarily partitioned among multiple parties. The proposed algorithm is generic and applies to a wide class of machine learning problems. We demonstrate our solution for a popular use-case (i.e., regression), and evaluate the performance of the proposed secure solution in terms of accuracy, latency and communication cost. We also perform a scalability analysis to evaluate the performance of the proposed solution as the data size and the number of parties increase.

Ghostbuster: A Fine-grained Approach for Anomaly Detection in File System Accesses

Best Paper Award
Conference paper
Shagufta Mehnaz, Elisa Bertino
ACM Conference on Data and Applications Security and Privacy (CODASPY), pages 3-14 [Acceptance rate: 16%]
Publication year: 2017

Protecting sensitive data against malicious or compromised insiders is a challenging problem. Access control mechanisms are not always able to prevent authorized users from misusing or stealing sensitive data as insiders often have access permissions to the data. Also, security vulnerabilities and phishing attacks make it possible for external malicious parties to compromise identity credentials of users who have access to the data. Therefore, solutions for protection from insider threat require combining access control mechanisms and other security techniques, such as encryption, with techniques for detecting anomalies in data accesses. In this paper, we propose a novel approach to create fine-grained profiles of the users’ normal file access behaviors. Our approach is based on the key observation that even if a user’s access to a file seems legitimate, only a fine-grained analysis of the access (size of access, timestamp, etc.) can help understand the original intention of the user. We exploit the users’ file access information at the block level and develop a feature-extraction method to model the users’ normal file access patterns (user profiles). Such profiles are then used in the detection phase for identifying anomalous file system accesses. Finally, through performance evaluations we demonstrate that our approach has an accuracy of 98.64% in detecting anomalies and incurs an overhead of only 2%.

A Secure Sum Protocol and Its Application to Privacy-preserving Multi-party Analytics

Conference paper
Shagufta Mehnaz, Gowtham Bellala, Elisa Bertino
ACM Symposium on Access Control Models and Technologies (SACMAT), pages 219-230 [Acceptance rate: 28%]
Publication year: 2017

Many enterprises are transitioning towards data-driven business processes. There are numerous situations where multiple parties would like to share data towards a common goal if it were possible to simultaneously protect the privacy and security of the individuals and organizations described in the data. Existing solutions for multi-party analytics that follow the so-called Data Lake paradigm have parties transfer their raw data to a trusted third-party (i.e., mediator), which then performs the desired analysis on the global data, and shares the results with the parties. However, such a solution does not fit many applications such as healthcare, finance, and the Internet of Things, where privacy is a strong concern. Motivated by the increasing demands for data privacy, we study the problem of privacy-preserving multi-party data analytics, where the goal is to enable analytics on multi-party data without compromising the data privacy of each individual party. In this paper, we first propose a secure sum protocol with strong security guarantees. The proposed secure sum protocol is resistant to collusion attacks even with N-2 parties colluding, where N denotes the total number of collaborating parties. We then use this protocol to propose two secure gradient descent algorithms, one for horizontally partitioned data, and the other for vertically partitioned data. The proposed framework is generic and applies to a wide class of machine learning problems. We demonstrate our solution for two popular use-cases, regression and classification, and evaluate the performance of the proposed solution in terms of the obtained model accuracy, latency and communication cost. In addition, we perform a scalability analysis to evaluate the performance of the proposed solution as the data size and the number of parties increase.
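The collusion-resistance property can be illustrated with the textbook additive secret-sharing construction (a simplified sketch, not necessarily the paper's exact protocol; the modulus and function names are illustrative assumptions): each party splits its value into N random shares that sum to the value modulo a public modulus, so up to N-2 colluding parties observe only uniformly random shares of the honest parties' values.

```python
import random

MODULUS = 2 ** 61 - 1  # public ring modulus; keeps individual shares uniform

def make_shares(value, n_parties):
    """Split a private value into n_parties additive shares modulo MODULUS."""
    shares = [random.randrange(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)  # last share makes the sum work out
    return shares

def secure_sum(private_values):
    """Simulate the protocol: each party distributes shares, each party sums
    the shares it received, and the partial sums reveal only the total."""
    n = len(private_values)
    all_shares = [make_shares(v, n) for v in private_values]
    # party j publishes the sum of the j-th share from every party
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % MODULUS
                    for j in range(n)]
    return sum(partial_sums) % MODULUS
```

Since the secure gradient descent algorithms above only need aggregate quantities (sums of local gradients), this primitive is enough to run each descent step without exposing any party's raw data.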

Building Robust Temporal User Profiles for Anomaly Detection in File System Accesses

Conference paper
Shagufta Mehnaz, Elisa Bertino
IEEE International Conference on Privacy, Security and Trust (PST), pages 207-210
Publication year: 2016

Protecting sensitive data against malicious or compromised insiders is a major concern. In most cases, insiders have authorized access to file systems containing such data, which they misuse or exfiltrate for financial profit. Moreover, external parties can compromise identity credentials of valid file system users by means of exploiting security vulnerabilities, phishing attacks, etc. Therefore, in order to protect sensitive information from such attackers, security measures, e.g., access control and encryption, are often combined with anomaly detection. Anomaly detection is based on the key observation that the access behavior of an attacker is significantly different from the regular access pattern of a benign user. However, due to the complexity of users’ interactions with a file system, the modeling of user profiles is a challenging problem. As a result, most of the existing anomaly detection techniques suffer from poor user profiles that contribute to high false positive and high false negative rates. In this paper, we propose an approach that as a first step discovers the users’ tasks (sets of file accesses that represent distinct file system activities) by applying frequent sequence mining on the access log. In the next step, our approach builds robust temporal user profiles by extensively analyzing the timestamp information of users’ file system accesses and thus precisely models the relation between the users’ tasks and their temporal properties using a multilevel temporal data structure. Finally, we evaluate the performance of our approach on a real dataset.