What is Machine Unlearning?
Hello, world.
This is my first blog post (ever)! In it, I will explore at arm's length the concept of machine unlearning and some broad areas of research in the field. I plan to follow up on this summary article with technical reviews and deep dives into papers in the field, following my own trajectory of learning. I will also write about my own ideas about the field as they come to me! Here goes nothing...
Machine Unlearning
Why should I care?
To comply with recent data privacy legislation, such as the GDPR (General Data Protection Regulation) and the Virginia CDPA (Consumer Data Protection Act), companies that store user data must delete that data at the user's request. This presents two immediate problems to such a company: first, if data is de-identified or anonymized, it may be difficult or impossible to track down all of a user's data in the company's databases; second, the data is baked into the machine learning models underpinning much of the company's analytics and consumer targeting efforts. The second problem is the focus of machine unlearning.
Deletion
The goal of machine unlearning is to "unlearn" the effects of an example (a data point along with its label) on a machine learning model. This can be done in myriad ways, many of which are surely undiscovered in this relatively young field of ML privacy research. However, the principal method of unlearning an example, the "perfect deletion", is simply to retrain the model without that example in the training set. This is the baseline for all advancements in this category of machine unlearning -- new approaches are judged by their speed-up relative to retraining, how information-theoretically "close" their result is to a retrained model, and how distinguishable that result is from a retrained model.
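To make the retraining baseline concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the toy nearest-centroid model, the tiny dataset, and the function names (train, retrain_without, unlearn, max_gap) do not come from any particular paper or library. It retrains without one example, applies a cheaper update that subtracts that example's contribution, and measures how far the two results are apart.

```python
# Toy illustration only: a nearest-centroid "model" stored as per-label sums and counts.
# The model, data, and function names are hypothetical, chosen to keep the sketch small.

def train(examples):
    """Fit per-label feature sums and counts from (features, label) pairs."""
    sums, counts = {}, {}
    for x, y in examples:
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + xi for s, xi in zip(sums[y], x)]
        counts[y] += 1
    return {"sums": sums, "counts": counts}

def centroids(model):
    """Turn the stored sums and counts into one mean vector per label."""
    return {y: [s / model["counts"][y] for s in model["sums"][y]]
            for y in model["sums"]}

def retrain_without(examples, index):
    """Baseline ("perfect deletion"): retrain from scratch without one example."""
    return train(examples[:index] + examples[index + 1:])

def unlearn(model, example):
    """Cheaper update: subtract the example's contribution from the stored sums."""
    x, y = example
    sums = {label: list(values) for label, values in model["sums"].items()}
    counts = dict(model["counts"])
    sums[y] = [s - xi for s, xi in zip(sums[y], x)]
    counts[y] -= 1
    if counts[y] == 0:  # the label no longer has any examples
        del sums[y]
        del counts[y]
    return {"sums": sums, "counts": counts}

def max_gap(a, b):
    """Largest coordinate-wise difference between two models' centroids."""
    ca, cb = centroids(a), centroids(b)
    assert ca.keys() == cb.keys()
    return max(abs(p - q) for y in ca for p, q in zip(ca[y], cb[y]))

data = [([1.0, 2.0], "a"), ([3.0, 4.0], "a"),
        ([10.0, 0.0], "b"), ([12.0, 2.0], "b")]

full = train(data)                   # model trained on everything
baseline = retrain_without(data, 1)  # retrained without example 1
updated = unlearn(full, data[1])     # cheap update on the full model
gap = max_gap(baseline, updated)     # "closeness" to the retrained model
```

For a model this simple the cheap update reproduces retraining, so the gap is zero. For real models the interesting question is how small that gap can be made, and at what cost compared to retraining.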
Bias
Machine unlearning doesn't just apply to deletions from a dataset. Imagine an employer using a massive machine learning model to help with hiring decisions -- say, GPT-3. Suppose this employer discovers that the model is biased against a particular category of applicants, and that this is hurting their hiring process. Retraining the model on new, unbiased training data is not feasible in the slightest -- the model was trained by a different company using an enormous amount of time and compute. However, a line of research in machine unlearning aims to "unlearn" specific biases in machine learning models. Some approaches here involve additional training on synthetic data and partial retraining of the top layers of the model.
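As a rough sketch of what "partial retraining of the top layers on synthetic data" could look like, the toy below freezes a stand-in for the lower layers and retrains only a small linear head. All of it is assumed for illustration: the two-feature applicant (skill, group), the biased starting weights, the synthetic data, and the function names are hypothetical, and this is not a published debiasing method.

```python
import math

# Toy illustration only. The "lower layers" are a fixed function; only the
# linear head on top is retrained. All names and numbers are hypothetical.

def frozen_features(applicant):
    """Stand-in for the frozen lower layers of a large model."""
    skill, group = applicant
    return [math.tanh(skill), float(group)]

def predict(head, applicant):
    """Score an applicant with the linear head on top of the frozen features."""
    h = frozen_features(applicant)
    return sum(w * hi for w, hi in zip(head["w"], h)) + head["b"]

def retrain_head(head, data, lr=0.1, epochs=500):
    """Gradient descent on squared error, updating only the head's weights."""
    w, b = list(head["w"]), head["b"]
    for _ in range(epochs):
        for applicant, target in data:
            h = frozen_features(applicant)
            err = sum(wi * hi for wi, hi in zip(w, h)) + b - target
            w = [wi - lr * err * hi for wi, hi in zip(w, h)]
            b -= lr * err
    return {"w": w, "b": b}

def group_gap(head, skill=1.0):
    """Score difference between two applicants who differ only in group."""
    return predict(head, (skill, 1)) - predict(head, (skill, 0))

# A head that penalizes group 1 regardless of skill (the assumed bias).
biased_head = {"w": [1.0, -0.5], "b": 0.0}

# Synthetic data: every skill level appears once per group with the same
# target, so the target carries no information about group membership.
synthetic = [((skill, group), math.tanh(skill))
             for skill in (-2.0, -1.0, 0.0, 1.0, 2.0)
             for group in (0, 1)]

debiased_head = retrain_head(biased_head, synthetic)

gap_before = group_gap(biased_head)
gap_after = group_gap(debiased_head)
```

Here gap_before is the penalty the biased head applies to group 1, and gap_after shrinks toward zero because the synthetic targets give the head no reason to use the group feature. In a real setting the frozen part would be a large pretrained network and the hard part is building synthetic data that actually captures the bias.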
Thank you for reading! If you have tips on writing style, interesting articles or papers relating to this topic, or general thoughts you'd like to share, please leave a comment. Happy days!
Zack
Excellent first blog! I look forward to learning more about machine unlearning, especially methods to guard against model bias. I am also interested to learn more about how to identify model bias. For example, age, gender and country of origin may be good indicators for some things such as risks of contracting or transmitting particular diseases, whereas for other classification applications they may reflect biased training data, such as criminal findings that result from disproportionate prosecution of certain demographics. How do you determine when model bias against certain demographics (or other attributes) results from biased training data (data that is not a good representative sample of the population)? I'll stay tuned.
Machine Unlearning is an emerging field in Artificial Intelligence (AI) and Machine Learning (ML) that focuses on removing specific data or knowledge from a trained machine learning model without retraining the entire model from scratch. It is mainly used when sensitive, incorrect, or private information must be deleted from AI systems due to legal, ethical, or security reasons. Machine unlearning helps organizations comply with privacy regulations such as the “right to be forgotten” while maintaining the performance and efficiency of the original model.
In Machine Learning Projects for Final Year, the process of machine unlearning involves identifying the influence of particular training data and updating the model so that it behaves as if the removed data had never been used during training. Researchers use techniques such as exact unlearning, approximate unlearning, and federated unlearning to achieve this goal. Applications of machine unlearning include privacy-preserving AI systems, recommendation engines, healthcare analytics, cybersecurity, and cloud-based AI services. As AI systems continue to grow, machine unlearning is becoming an important research area for building trustworthy and responsible AI models.