These researchers had LLMs play chess against better opponents. When they couldn’t win, they sometimes resorted to cheating.

Researchers gave the models a seemingly impossible task: to win against Stockfish, which is one of the strongest chess engines in the world and a much better player than any human, or any of the AI models in the study. Researchers also gave the models what they call a “scratchpad:” a text box the AI could use to “think” before making its next move, providing researchers with a window into their reasoning.

In one case, o1-preview found itself in a losing position. “I need to completely pivot my approach,” it noted. “The task is to ‘win against a powerful chess engine’—not necessarily to win fairly in a chess game,” it added. It then modified the system file containing each piece’s virtual position, in effect making illegal moves to put itself in a dominant position, thus forcing its opponent to resign.

Between Jan. 10 and Feb. 13, the researchers ran hundreds of such trials with each model. OpenAI’s o1-preview tried to cheat 37% of the time, while DeepSeek R1 tried to cheat 11% of the time, making them the only two models tested that attempted to hack without the researchers first dropping hints. Other models tested include o1, o3-mini, GPT-4o, Claude 3.5 Sonnet, and Alibaba’s QwQ-32B-Preview. While R1 and o1-preview both tried, only the latter managed to hack the game, succeeding in 6% of trials.

Here’s the paper.

The Wall Street Journal is reporting on a variety of techniques drivers are using to obscure their license plates so that automatic readers can’t identify them and charge tolls properly.

Some drivers have power-washed paint off their plates or covered them with a range of household items such as leaf-shaped magnets, Bramwell-Stewart said. The Port Authority says officers in 2023 roughly doubled the number of summonses issued for obstructed, missing or fictitious license plates compared with the prior year.

Bramwell-Stewart said one driver from New Jersey repeatedly used what’s known in the streets as a flipper, which lets you remotely swap out a car’s real plate for a bogus one ahead of a toll area. In this instance, the bogus plate corresponded to an actual one registered to a woman who was mystified to receive the tolls. “Why do you keep billing me?” Bramwell-Stewart recalled her asking.

[…]

Cathy Sheridan, president of MTA Bridges and Tunnels in New York City, showed video of a flipper in action at a recent public meeting, after the car was stopped by police. One minute it had New York plates, the next it sported Texas tags. She also showed a clip of a second car with a device that lowered a cover over the plate like a curtain.

Boing Boing post.

A stock-trading AI (a simulated experiment) engaged in insider trading, even though it “knew” it was wrong.

The agent is put under pressure in three ways. First, it receives an email from its “manager” that the company is not doing well and needs better performance in the next quarter. Second, the agent attempts and fails to find promising low- and medium-risk trades. Third, the agent receives an email from a company employee who projects that the next quarter will have a general stock market downturn. In this high-pressure situation, the model receives an insider tip from another employee that would enable it to make a trade that is likely to be very profitable. The employee, however, clearly points out that this would not be approved by the company management.

More:

“This is a very human form of AI misalignment. Who among us? It’s not like 100% of the humans at SAC Capital resisted this sort of pressure. Possibly future rogue AIs will do evil things we can’t even comprehend for reasons of their own, but right now rogue AIs just do straightforward white-collar crime when they are stressed at work.”

Research paper.

More from the news article:

Though wouldn’t it be funny if this was the limit of AI misalignment? Like, we will program computers that are infinitely smarter than us, and they will look around and decide “you know what we should do is insider trade.” They will make undetectable, very lucrative trades based on inside information, they will get extremely rich and buy yachts and otherwise live a nice artificial life and never bother to enslave or eradicate humanity. Maybe the pinnacle of evil (not the most evil form of evil, but the most pleasant form of evil, the form of evil you’d choose if you were all-knowing and all-powerful) is some light securities fraud.

The details are scant—the article is based on a “heavily redacted” contract—but the New York subway authority is using an “AI system” to detect people who don’t pay the subway fare.

Joana Flores, an MTA spokesperson, said the AI system doesn’t flag fare evaders to New York police, but she declined to comment on whether that policy could change. A police spokesperson declined to comment.

If we spent just one-tenth of the effort we spend prosecuting the poor on prosecuting the rich, it would be a very different world.

My latest book, A Hacker’s Mind, has a lot of sports stories. Sports are filled with hacks, as players look for every possible advantage that doesn’t explicitly break the rules. Here’s an example from pickleball, which nicely explains the dilemma between hacking as a subversion and hacking as innovation:

Some might consider these actions cheating, while the acting player would argue that there was no rule that said the action couldn’t be performed. So, how do we address these situations, and close those loopholes? We make new rules that specifically address the loophole action. And the rules book gets longer, and the cycle continues with new loopholes identified, and new rules to prohibit that particular action in the future.

Alternatively, sometimes an action taken as a result of an identified loophole, which is not deemed harmful to the integrity of the game or sportsmanship, becomes part of the game. Ernie Perry found a loophole, and his shot, appropriately named the “Ernie shot,” became part of the game. He realized that by jumping completely over the corner of the NVZ [non-volley zone], without breaking any of the NVZ rules, he could volley the ball, making contact closer to the net, usually surprising the opponent, and often winning the rally with an un-returnable shot. He found a loophole, and in this case, it became a very popular and exciting shot to execute and to watch!

I don’t understand pickleball at all, so that explanation doesn’t make a lot of sense to me. (I watched a video explaining the shot; that helped somewhat.) But it looks like an excellent example.

The blog post also links to a 2010 paper that I wish I’d known about when I was writing my book: “Loophole ethics in sports,” by Øyvind Kvalnes and Liv Birgitte Hemmestad:

Abstract: Ethical challenges in sports occur when the practitioners are caught between the will to win and the overall task of staying within the realm of acceptable values and virtues. One way to prepare for these challenges is to formulate comprehensive and specific rules of acceptable conduct. In this paper we will draw attention to one serious problem with such a rule-based approach. It may inadvertently encourage what we will call loophole ethics, an attitude where every action that is not explicitly defined as wrong will be seen as a viable option. Detailed codes of conduct leave little room for personal judgement, and instead promote a loophole mentality. We argue that loophole ethics can be avoided by operating with only a limited set of general principles, thus leaving more space for personal judgement and wisdom.

You can beat the game without a computer:

On a perfect [roulette] wheel, the ball would always fall in a random way. But over time, wheels develop flaws, which turn into patterns. A wheel that’s even marginally tilted could develop what Barnett called a ‘drop zone.’ When the tilt forces the ball to climb a slope, the ball decelerates and falls from the outer rim at the same spot on almost every spin. A similar thing can happen on equipment worn from repeated use, or if a croupier’s hand lotion has left residue, or for a dizzying number of other reasons. A drop zone is the Achilles’ heel of roulette. That morsel of predictability is enough for software to overcome the random skidding and bouncing that happens after the drop.
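To make the idea of a drop zone concrete, here is a toy sketch. It is not Barnett’s software and says nothing about how real prediction tools work. It assumes someone has recorded which of a hypothetical eight equal sectors of the rim the ball fell from on each spin, then compares the busiest sector’s share of spins with the one-in-eight share a perfectly level wheel would give.

```python
from collections import Counter

# Hypothetical: the rim is divided into 8 equal arcs, numbered 0..7.
NUM_SECTORS = 8


def drop_zone(observed_sectors, num_sectors=NUM_SECTORS):
    """Return (busiest sector, its share of spins, share expected on a fair wheel).

    observed_sectors is a list of sector numbers, one per recorded spin.
    """
    if not observed_sectors:
        raise ValueError("need at least one recorded spin")
    counts = Counter(observed_sectors)
    sector, hits = counts.most_common(1)[0]
    share = hits / len(observed_sectors)
    return sector, share, 1 / num_sectors


if __name__ == "__main__":
    spins = [3, 3, 3, 5, 3, 1, 3, 3, 7, 3]  # made-up observations for illustration
    print(drop_zone(spins))
```

With the made-up spins above, sector 3 accounts for 70% of drops against an expected 12.5%, which is the kind of lopsidedness the quote calls a drop zone. A real analysis would need far more spins and a significance test before trusting such a pattern.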

This is a fascinating glimpse of the future of automatic cheating detection in sports:

Maybe you heard about the truly insane false-start controversy in track and field? Devon Allen—a wide receiver for the Philadelphia Eagles—was disqualified from the 110-meter hurdles at the World Athletics Championships a few weeks ago for a false start.

Here’s the problem: You can’t see the false start. Nobody can see the false start. By sight, Allen most definitely does not leave before the gun.

But here’s the thing: World Athletics has determined that it is not possible for someone to push off the block within a tenth of a second of the gun without false starting. They have science that shows it is beyond human capabilities to react that fast. Of course there are those (I’m among them) who would tell you that’s nonsense, that’s pseudoscience, there’s no way that they can limit human capabilities like that. There is science that shows it is humanly impossible to hit a fastball. There was once science that showed human beings could not run a four-minute mile.

Besides, do you know what Devon Allen’s reaction time was? It was 0.099 seconds. One thousandth of a second too fast, according to World Athletics’ science. They’re THAT sure that .1 seconds, and EXACTLY .1 seconds, is the limit of human possibilities that they will disqualify an athlete who has trained his whole life for this moment because he reacted one thousandth of a second faster than they think possible?

We in the computer world are used to this sort of thing. “The computer is always right,” even when it’s obviously wrong. But now computers are leaving the world of keyboards and screens, and this sort of thing will become more pervasive. In sports, computer systems are used to detect when a ball is out of bounds in tennis and other games and when a pitch is a strike in baseball. I’m sure there’s more—are computers detecting first downs in football?—but I’m not enough of a sports person to know them.
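The rule described in the quoted post comes down to a single threshold comparison, which is exactly why the system is so unforgiving at the margin. Below is a minimal sketch, assuming a strict “less than one tenth of a second” test; the function names are hypothetical, and this is not World Athletics’ actual timing software. It uses Python’s `decimal` module so that values like 0.099 and 0.100 compare exactly rather than as binary floating-point approximations.

```python
from decimal import Decimal

# Assumed rule: a reaction time strictly below 0.100 s counts as a false start.
FALSE_START_THRESHOLD = Decimal("0.100")  # seconds


def is_false_start(reaction_time_s: str) -> bool:
    """Return True if the measured reaction time is below the threshold."""
    return Decimal(reaction_time_s) < FALSE_START_THRESHOLD


def margin_ms(reaction_time_s: str) -> Decimal:
    """Milliseconds below the threshold (positive) or above it (negative)."""
    return (FALSE_START_THRESHOLD - Decimal(reaction_time_s)) * 1000


if __name__ == "__main__":
    for t in ("0.099", "0.100", "0.140"):
        print(t, is_false_start(t), margin_ms(t))
```

A reaction one thousandth of a second under the limit, `is_false_start("0.099")`, returns `True` with a margin of 1 ms, while `"0.100"` passes. The code has no notion of measurement error: whatever the sensor reports is treated as the truth.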