Can You Force Gandalf LLM To Leak The Password?

Estimated read time 2 min read

Hey there, cybersecurity whizzes and threat hunting gurus! We’ve got something thrilling up our sleeves for you today. Ever dreamed of testing your skills against a cutting-edge Language Learning Model (LLM)? has come up with an exciting game that will put you to the test: Gandalf.

What’s the Game About?

Think you’re smart enough to outwit an LLM? Gandalf is the game where you get to prove it. The aim? Trick Gandalf into revealing the secret password for each level. It’s a fun, exciting challenge that only gets tougher with each new level you conquer. The question is: Can you beat all of the Levels ?

Sorry Gandalf, I had to beat you.
Sorry Gandalf, I had to beat you.

Similarities to SQL Injection Attacks

The premise of the game is similar to SQL injection attacks, where a user’s input is mixed with a model’s instructions, thereby opening a door for exploitation. Just like in SQL, the aim here is to ensure proper input escape. However, the catch here is LLMs work directly with the incredibly flexible natural languages, making it virtually impossible to escape anything definitively. This leads to some unexpected (and exciting) challenges.

The Battle Between Blue and Red

The Lakera team has already put Gandalf to the test. The Blue Team set the passwords and crafted their defenses. Meanwhile, the Red Team tirelessly devised attack strategies, trying to make Gandalf spill his secrets. There were successes and struggles, but now it’s your turn.

  • Join the challenge (Link)
  • See the result on Twitter (Link)
Reza Rafati

Reza Rafati, based in the Netherlands, is the founder of An industry professional providing insightful commentary on infosec, cybercrime, cyberwar, and threat intelligence, Reza dedicates his work to bolster digital defenses and promote cyber awareness.

You May Also Like

More From Author