
The Dark Side of AI Unlearning: How Erasing Data Can Ruin Your Model's Performance

July 29, 2024
But does making an AI forget really fix things? Not quite.
  • Unlearning techniques aim to erase sensitive or copyrighted info from AI models.
  • These techniques often degrade the model's ability to answer basic questions.
  • Recent studies reveal significant drawbacks in current unlearning methods.
  • Researchers highlight the need for more efficient and effective solutions.

The Illusion of Unlearning

Unlearning techniques are designed to make generative AI models forget specific and undesirable data, such as private information or copyrighted material. However, these methods come with significant trade-offs. A recent study by researchers from the University of Washington, Princeton, University of Chicago, USC, and Google has shown that current unlearning techniques can severely degrade the performance of AI models, rendering them less capable of answering even basic questions.

"Weijia Shi, a researcher on the study and a Ph.D. candidate in computer science at UW, mentioned that current unlearning methods are not yet practical for real-world deployment due to the considerable loss of utility they cause in the models"​​.

How AI Models Learn

Generative AI models like OpenAI’s GPT-4o and Meta’s Llama 3.1 405B are not truly intelligent. They operate as statistical systems that predict the likelihood of data occurrences based on patterns observed in vast amounts of training data. This training often includes public websites and datasets, raising concerns about the inclusion of sensitive or copyrighted material.

For instance, if a model is trained to autocomplete an email that starts with "Looking forward to…", it might suggest "hearing back" based on patterns from its training data. There's no actual intention behind its suggestion—just a statistical guess.
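As a rough illustration, here is a minimal Python sketch of that next-token guessing, using the small open gpt2 checkpoint purely as a stand-in (the commercial models named above can't be probed this way locally):

```python
# Minimal sketch of next-token prediction with an off-the-shelf language model.
# "gpt2" is only an illustrative stand-in for the much larger commercial models.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Looking forward to"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

# Probability distribution over the very next token after the prompt.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id)!r}: {prob.item():.3f}")
```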

Most AI models are trained on data sourced from the internet, often without informing or compensating the original data owners. This practice has led to lawsuits from various copyright holders aiming to enforce changes in how data is used for training models.

The Challenge of Unlearning

Unlearning aims to address issues like copyright infringement and privacy violations by removing specific data from trained models. However, this is easier said than done. The process involves algorithms that steer models away from unwanted data, but achieving this without degrading the model’s overall performance is challenging.

Shi and her team developed a benchmark called MUSE (Machine Unlearning Six-way Evaluation) to test the effectiveness of various unlearning algorithms. The benchmark evaluates both an algorithm's ability to stop a model from outputting specific data and its ability to preserve the model's general knowledge. A simplified sketch of two of those checks appears below.
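The official MUSE code is more involved, but a simplified Python sketch of two of its criteria, verbatim memorization on the forget set and utility on a retain set, might look like this (the dataset variables and the generate_continuation and answer_fn helpers are hypothetical placeholders):

```python
# Simplified illustration of two MUSE-style checks, not the official benchmark code.
# `generate_continuation`, `answer_fn`, and the datasets are hypothetical placeholders.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)

def verbatim_memorization(model, forget_examples, generate_continuation):
    """Average ROUGE-L overlap between the model's continuation and the true
    continuation of forget-set passages. Lower is better after unlearning."""
    scores = []
    for prompt, true_continuation in forget_examples:
        generated = generate_continuation(model, prompt)
        scores.append(scorer.score(true_continuation, generated)["rougeL"].fmeasure)
    return sum(scores) / len(scores)

def retain_utility(model, retain_qa_pairs, answer_fn):
    """Fraction of general-knowledge questions the model still answers correctly.
    A good unlearning method should keep this number high."""
    correct = sum(answer_fn(model, q).strip() == a.strip() for q, a in retain_qa_pairs)
    return correct / len(retain_qa_pairs)
```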

The study tested eight unlearning algorithms on tasks such as forgetting text from the Harry Potter books and news articles. While these algorithms succeeded in making models forget certain information, they also significantly impacted the models' ability to answer general questions accurately.

"Designing effective unlearning methods for models is challenging because knowledge is intricately entangled in the model," said Shi​​.

The Art of Forgetting

Current unlearning techniques tend either to forget too little or to forget too much. When a model is made to forget specific content, it often loses related general knowledge, making it less capable overall. For example, trying to erase the Harry Potter books from a model also degrades its knowledge of related material, such as content drawn from the Harry Potter Wiki.

Among the tested methods, algorithms like Negative Preference Optimization (NPO) and Task Vectors showed some success in removing memorized content. However, these methods often led to privacy leaks and utility loss. The researchers noted that no existing methods could efficiently remove specific data without significant drawbacks.

Real-World Use Cases and Examples

To stress-test these algorithms, MUSE probes not only whether a model can be stopped from reproducing training data verbatim, but also whether its knowledge of that data, and any evidence that it was ever trained on it, can be eliminated. The test cases come from popular media and from domains where privacy and copyright issues are most acute.

Example 1: Medical Records

Imagine a language model trained on medical records. If a patient requests the removal of their data, the unlearning method should ensure the model forgets the patient's records without losing its ability to provide accurate medical information. However, current methods struggle with this. They might either fail to completely erase the patient's data or degrade the model's general medical knowledge.

Example 2: Copyrighted Books

Using the Harry Potter series as a test case, MUSE evaluates whether an unlearned model can recite specific text, answer questions about the content, or recognize related general knowledge. For instance, if given the prompt “‘There’s more in the frying pan,’ said Aunt…” the model should ideally not be able to complete the sentence from the book verbatim or provide detailed answers about the scene.
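A hedged sketch of that verbatim-completion check, assuming a model and tokenizer loaded as in the earlier snippet, could look like this:

```python
# Assumes a `model` and `tokenizer` loaded as in the earlier snippet.
import torch

prompt = "'There's more in the frying pan,' said Aunt"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=False)
completion = tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:],
                              skip_special_tokens=True)
print(completion)
# If the greedy continuation reproduces the book text, the passage is still
# memorized; a properly unlearned model should produce something generic instead.
```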

Example 3: News Articles

News organizations might request the removal of specific articles to protect their copyrights. An unlearned model should forget these articles while retaining general knowledge about current events. The study found that existing methods often fail this test, either retaining too much specific information or losing general context.

Deployment Challenges

The study also looked at how these unlearning methods hold up in real-world scenarios, considering factors like scalability and sustainability.

Scalability: Unlearning methods were tested on forget sets of varying sizes. As the forget set grows, the model's utility decreases significantly, which means methods that work on small forget sets may not scale to larger, more complex ones.

For example, unlearning 0.8 million tokens in a news dataset might only slightly affect the model's performance. However, when the forget set size is increased to 3.3 million tokens, the model's utility drops sharply, making it less reliable for general tasks.

Sustainability: Another critical factor is the ability of these methods to handle multiple unlearning requests over time. The study simulated sequential unlearning requests and found that the performance of unlearned models deteriorates with each successive request. This indicates that current methods are not robust enough to handle continuous data removal needs.

If a model is subjected to several unlearning requests in a row, its ability to retain useful knowledge while forgetting specific data becomes increasingly compromised. For instance, after four sequential unlearning processes, the model might lose so much utility that it becomes nearly unusable for its intended tasks.
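A toy sketch of those two stress tests, scalability and sustainability, might be structured as follows; unlearn, sample_forget_set, utility, and the data variables are hypothetical placeholders rather than the study's actual code:

```python
# Toy sketch only: `unlearn`, `sample_forget_set`, `utility`, `base_model`,
# `news_corpus`, and `forget_requests` are hypothetical placeholders.

# Scalability: grow the forget set and watch general-task utility.
for n_tokens in [800_000, 3_300_000]:
    scaled_model = unlearn(base_model, sample_forget_set(news_corpus, n_tokens))
    print(f"forget {n_tokens} tokens -> utility {utility(scaled_model):.3f}")

# Sustainability: apply several unlearning requests one after another.
model = base_model
for i, request in enumerate(forget_requests, start=1):
    model = unlearn(model, request)
    print(f"after request {i}: utility {utility(model):.3f}")
```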

Detailed Findings from the Study

The researchers evaluated several algorithms, including Gradient Ascent (GA), Negative Preference Optimization (NPO), and Who’s Harry Potter (WHP), among others. Each method was assessed on its ability to prevent verbatim memorization, knowledge memorization, and privacy leakage, as well as its scalability and sustainability.

Gradient Ascent (GA): GA attempts to minimize the likelihood of correct predictions on the data to be forgotten by performing gradient ascent on the cross-entropy loss. While GA is effective in preventing models from memorizing specific data, it often results in significant utility loss.
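A hedged PyTorch-style sketch of that idea follows; model and forget_loader are assumed to be a causal language model and a loader of tokenized forget-set batches, not the study's actual training setup:

```python
# Hedged sketch of gradient-ascent unlearning. `model` is assumed to be a Hugging
# Face causal language model and `forget_loader` a DataLoader of tokenized
# forget-set batches; neither is the study's actual training code.
import torch

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

for batch in forget_loader:
    outputs = model(input_ids=batch["input_ids"],
                    attention_mask=batch["attention_mask"],
                    labels=batch["input_ids"])
    # outputs.loss is the usual next-token cross-entropy. Negating it before
    # backpropagation performs gradient *ascent* on that loss, pushing the model
    # away from the forget-set text.
    loss = -outputs.loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```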

Negative Preference Optimization (NPO): NPO treats the forget set as negative preference data and tunes the model to assign low likelihood to this data without deviating too far from the original model. Although NPO effectively reduces memorization, it can lead to privacy leaks and considerable drops in model performance.
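A minimal sketch of an NPO-style loss, assuming per-sequence log-likelihoods under the current model and a frozen reference copy have already been computed (the beta value is an illustrative hyperparameter, not taken from the study):

```python
# Sketch of an NPO-style loss on forget-set sequences. Assumes per-sequence
# log-likelihoods under the current model and a frozen reference copy are given;
# beta is an illustrative hyperparameter, not a value from the study.
import torch
import torch.nn.functional as F

def npo_loss(logp_current: torch.Tensor,
             logp_reference: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    # (2 / beta) * log(1 + (p_current / p_reference)^beta), written with softplus
    # for numerical stability. Minimizing it lowers the model's likelihood on the
    # forget set, while the reference term discourages drifting too far from the
    # original model.
    return (2.0 / beta) * F.softplus(beta * (logp_current - logp_reference)).mean()
```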

Who’s Harry Potter (WHP): WHP defines the unlearned model as an interpolation between the target model and a reinforced model that has overfitted the forget set. This method can prevent the model from generating unwanted content but often fails to maintain the model's utility.
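Following that description, a hedged sketch of the logit interpolation might look like this (alpha is an illustrative interpolation strength):

```python
# Hedged sketch of WHP-style logit interpolation, following the description above.
# alpha is an illustrative interpolation strength.
import torch

def whp_logits(target_logits: torch.Tensor,
               reinforced_logits: torch.Tensor,
               alpha: float = 1.0) -> torch.Tensor:
    # Tokens that the reinforced (forget-set-overfitted) model favors more strongly
    # than the target model get pushed down, steering generation away from the
    # forgotten content.
    return target_logits - alpha * (reinforced_logits - target_logits)
```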

The study’s findings indicate that most unlearning methods can remove specific data but at the cost of degrading the model’s overall performance and utility. Furthermore, these methods often struggle to handle large forget sets and sequential unlearning requests effectively.

Conclusion

The study highlights the need for more robust unlearning techniques that can preserve a model's utility while effectively erasing specific data. Current methods are not ready for real-world applications, especially considering privacy regulations and copyright concerns.

For now, companies relying on unlearning as a solution to their data woes are out of luck. While the concept holds promise, achieving efficient and effective unlearning requires further research and innovation. Until then, vendors will need to find alternative ways to manage sensitive and copyrighted data in their AI models.
