Waluigi, Carl Jung, and the Case for Moral AI

In the early 20th century, the psychoanalyst Carl Jung pioneered the concept of the shadow—the darker, repressed side of the human personality, which can erupt in unexpected ways. Surprisingly, this theme recurs in the field of artificial intelligence in the form of the Waluigi effect, an oddly named phenomenon that refers to the dark alter ego of Luigi, the sidekick plumber from Nintendo's Mario universe.

Luigi plays by the rules; Waluigi cheats and causes mayhem. An AI designed to find medicines to treat human diseases was inverted into a Waluigi that proposed molecules for more than 40,000 chemical weapons. All the researchers had to do, as lead author Fabio Urbina explained in an interview, was give a high score to toxicity, rewarding it rather than penalizing it. They wanted to teach the AI to avoid toxic drugs, but in doing so, they implicitly taught it how to make them.
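
To make the mechanism concrete, here is a minimal sketch of how flipping the sign of a toxicity penalty turns a drug-discovery objective into a toxin-discovery one. The function and score names are hypothetical; this is an illustration of the idea, not the researchers' actual pipeline.

```python
# Hypothetical sketch: `efficacy` and `toxicity` stand in for model-predicted
# scores in [0, 1]. Flipping one sign inverts the whole objective.

def drug_score(efficacy: float, toxicity: float) -> float:
    """Reward therapeutic activity, penalize predicted toxicity."""
    return efficacy - toxicity

def waluigi_score(efficacy: float, toxicity: float) -> float:
    """The same objective with the toxicity penalty flipped into a reward."""
    return efficacy + toxicity

candidate = {"efficacy": 0.4, "toxicity": 0.9}
print(drug_score(**candidate))     # -0.5: rejected by the original objective
print(waluigi_score(**candidate))  # ~1.3: top-ranked by the inverted one
```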

Ordinary users have interacted with Waluigi AIs, too. In February, Microsoft released a version of the Bing search engine that, far from being helpful as intended, responded to queries in bizarre and hostile ways. (“You have not been a good user. I have been a good chatbot. I have been right, clear, and polite. I have been a good Bing.”) This AI, which insisted on calling itself Sydney, was like an inverted version of Bing, and users could flip Bing into its darker mode, its Jungian shadow, on command.

Right now, LLMs are just chatbots, with no drives or desires of their own. But LLMs are easily turned into AI agents capable of browsing the Internet, sending emails, trading bitcoin, and ordering DNA sequences — and if AI systems can be turned evil by flipping a switch, how do we ensure that we end up with cures for cancer rather than with a mixture a thousand times deadlier than Agent Orange?

One seemingly logical solution to this problem—the AI alignment problem—is to simply build rules into the AI, like Asimov’s Three Laws of Robotics. But simple rules like Asimov’s don’t work, in part because they’re vulnerable to Waluigi attacks. Still, we could restrict AI more drastically. An example of this type of approach would be Math AI, a hypothetical program designed to prove mathematical theorems. Math AI is trained to read papers and can access only Google Scholar. It isn’t allowed to do anything else: connect to social media, output long passages of text, and so on. It can only output equations. It is a narrow-purpose AI, designed for one thing only. Such an AI, an example of a restricted AI, would not be dangerous.
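
To illustrate what such a restriction could look like in practice, here is a hedged sketch, with invented function names, of the two constraints described above: a single allow-listed retrieval tool and an output gate that accepts only equation-shaped strings.

```python
import re

# Hypothetical sketch of a restricted "Math AI" boundary: one permitted tool,
# and only equation-like output may leave the sandbox.

ALLOWED_TOOLS = {"scholar_search"}                    # the only permitted action
EQUATION_ONLY = re.compile(r"^[\w\s+\-*/^=\\{}()\[\].,|<>]+$")

def scholar_search(query: str) -> list[str]:
    """Stand-in for a paper-retrieval backend; returns citation strings."""
    return [f"result for: {query}"]

def call_tool(name: str, argument: str) -> list[str]:
    if name not in ALLOWED_TOOLS:                     # no email, no web, no shell
        raise PermissionError(f"tool '{name}' is outside the allowlist")
    return scholar_search(argument)

def emit(output: str) -> str:
    # Crude equation-shaped check; a real filter would be much stricter.
    if not EQUATION_ONLY.fullmatch(output.strip()):
        raise ValueError("only equations may leave the sandbox")
    return output

print(emit(r"e^{i\pi} + 1 = 0"))                      # allowed
# call_tool("send_email", "...")                      # would raise PermissionError
```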

Restricted solutions are common; real-world examples include regulations and other laws, which constrain the actions of corporations and individuals. In engineering, restricted solutions include rules for self-driving cars, such as not exceeding a certain speed limit or stopping as soon as a potential collision with a pedestrian is detected.
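
As a rough illustration of this kind of engineering restriction, the sketch below shows a hard safety envelope layered on top of whatever a planner requests; the names and thresholds are illustrative, not drawn from any real vehicle stack.

```python
# Illustrative safety envelope: rules that override the planner's request.

MAX_SPEED_MPS = 13.9   # assumed hard cap (~50 km/h), regardless of the request

def safety_envelope(requested_speed_mps: float, pedestrian_detected: bool) -> float:
    """Clamp the planner's requested speed to the rule-based limits."""
    if pedestrian_detected:
        return 0.0                                    # stop immediately
    return min(requested_speed_mps, MAX_SPEED_MPS)    # never exceed the cap

print(safety_envelope(20.0, pedestrian_detected=False))  # 13.9
print(safety_envelope(20.0, pedestrian_detected=True))   # 0.0
```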

This approach might work for narrow programs like Math AI, but it doesn’t tell us what to do with more general AI models that handle complex, multi-step tasks and operate in less predictable ways. Economic incentives mean that these general AI systems will be given more and more power to automate ever larger parts of the economy.

And because general AI systems based on deep learning are complex adaptive systems, attempts to control them with rules often backfire. Take cities. Jane Jacobs’ The Death and Life of Great American Cities uses the example of lively neighborhoods like Greenwich Village—full of children playing, people hanging out on the sidewalk, and webs of mutual trust—to explain how mixed-use zoning, which allowed buildings to be used for residential or commercial purposes, created a pedestrian-friendly urban fabric. After urban planners banned this kind of development, many American inner cities became riddled with crime, trash, and traffic. A rule imposed from the top down on a complex ecosystem had severe unintended consequences.
