The launch of ChatGPT on November 30, 2022, marked a major technological breakthrough: in just two months, the application reached the 100 million user milestone—a historical growth rate surpassing TikTok or Instagram. But behind the hype for these productivity-boosting assistants lies a complex reality: a new digital "Wild West" where model capabilities are evolving faster than our defenses.
This article, a summary of the first part of the codelab Prompt Warfare: Attacks & Defenses in the Realm of LLMs ⚔️🛡️🤖 presented at Devfest 2025, explores the mechanisms of this revolution, its structural flaws, and strategies for protection.
1. At the Heart of the Machine: The Transformer Revolution▲
To secure a technology, you must first understand it. It is crucial not to confuse terms: AI is the global field, neural networks are a brain-inspired method within it, and LLMs (Large Language Models) are specific neural networks trained to understand and generate text.
The true breakthrough came in 2017 with the Transformer architecture proposed by Google Brain. Unlike older models (RNNs, CNNs) that struggled to retain context over long sequences, the Transformer uses a self-attention mechanism. This allows the model to weigh the importance of each word relative to others, regardless of their distance in the sentence, providing a nuanced understanding of context.
This architecture currently powers two major use cases:
- Chatbots: Designed to simulate natural conversation (e.g., Customer service for SNCF or Sephora).
- Copilots: Oriented toward task execution and production assistance (e.g., GitHub Copilot for code, Gemini for office productivity).
2. An Unprecedented Attack Surface▲
LLM security differs radically from classical cybersecurity. Here, it’s not just about viruses or DDoS attacks, but semantic manipulations. An LLM is never isolated: it acts as a "brain" at the center of a connected architecture.
Vulnerability Vectors▲
Integrating an LLM exposes several critical frontiers:
- User Input: This is the open door for Prompt Injections, where malicious instructions are hidden within natural language.
- Internal Services (RAG/Plugins): If the LLM is connected to your APIs or databases, a successful injection can trick it into generating payloads (SQL, commands) that your systems may execute blindly.
- Training Data: The risk of data poisoning exists if source data (public or internal) contains biases or misleading information.
The Architectural Dilemma▲
The choice of deployment directly impacts your security posture:
- Model via API (e.g., OpenAI, Bedrock): Simple to implement, but your data travels outside your perimeter to a third party ("black box").
- Hosted Model (On-premise/Private Cloud): You maintain total control over the data, but you inherit the heavy responsibility of maintaining, updating, and securing the model itself.
3. Chronicle of Deviations: The Reality of Risk▲
Recent history proves that "security through obscurity" or simple system instructions are not enough. Incidents are varied and costly:
- Commercial Hijacking: In 2023, a user used prompt injection on a Chevrolet dealership's chatbot to make it agree to the sale of a new vehicle for $1, creating a legally binding agreement.
- Legal Liability: In 2024, Air Canada was ordered by a court to pay damages after its chatbot "hallucinated" a non-existent refund policy. The court ruled the company responsible for the information provided by its AI.
- Algorithmic Discrimination: iTutorGroup had to pay $365,000 to settle a lawsuit after using an AI that automatically rejected job applicants based on their age.
- Model Corruption: The historical example of Tay (Microsoft) in 2016, which became racist in less than 24 hours due to real-time learning from toxic interactions, remains a textbook case.
4. Structuring the Defense: Standards and Regulations▲
Faced with these threats, improvisation is no longer an option. Robust frameworks exist to structure your defense strategy.
Essential Technical Frameworks▲
- OWASP Top 10 for LLM: The "bible" of AI security. It categorizes 10 critical risks, ranging from Prompt Injection (LLM01) to Unsecured Output Handling (LLM05) and Supply Chain risks (LLM03).
- MITRE ATLAS: Modeled after MITRE ATT&CK, this framework maps the actual tactics and techniques used by attackers (e.g., model theft, evasion), allowing SOC teams to anticipate threats.
- Google SAIF (Secure AI Framework): A holistic framework proposed by Google to integrate security from the design phase ("Secure by Default") across data, infrastructure, and the model.
The Regulatory Landscape▲
Compliance is becoming a strategic issue with two distinct global approaches:
- European Union (AI Act): A risk-based regulation. Critical systems are strictly regulated, while generative AIs (like chatbots) are subject to transparency obligations toward users.
- United States: A more sectoral approach that prioritizes freedom of speech (First Amendment) while legislating against specific abuses like deepfakes (Take It Down Act).
5. conclusion▲
In conclusion, generative AI is a powerful productivity lever, capable of reducing the time spent on certain tasks by 60%. However, its integration requires a shift from naive adoption to controlled adoption, by rigorously applying security frameworks like the OWASP Top 10 and actively monitoring the interactions of these new digital "brains."




