LLM Security Research and Resources

Vaibhav Bhandari Fri Nov 24 05:06:00 2023 llm-security, security

LLM Security Research and Resources

Jailbreaking

Defending ChatGPT against Jailbreak Attack via Self-Reminder: This paper introduces methods to defend ChatGPT from attacks that elicit undesired behavior.
Jailbroken: How Does LLM Safety Training Fail?: The study examines the vulnerabilities of safety-trained LLMs to adversarial misuse and jailbreak attacks.

Prompt Injection

Prompt Injection attack against LLM-integrated Applications: An exploration of the vulnerabilities of LLM-integrated applications to prompt injection attacks.

Backdoors & Data Poisoning

Anti-Backdoor Learning: Training Clean Models on Poisoned Data: The paper discusses methods to train machine learning models on poisoned datasets without introducing backdoors.

Adversarial Inputs

Certifying LLM Safety against Adversarial Prompting: This research outlines strategies for defending LLMs against various adversarial prompting techniques.

Insecure Output Handling

Secure GenAI adoption: Threats and risk of large language models: The article describes the risks and threats posed by insecure output handling in LLMs and suggests adaptation of security controls.

Data Extraction and Privacy

Extracting Training Data from Large Language Models: Demonstrates how adversaries might extract individual training examples from LLMs, posing privacy risks.

Data Reconstruction

Deconstructing Classifiers: Towards A Data Reconstruction Attack Against Text Classification Models: Proposes a novel data reconstruction attack that can exploit text classification models based on LLMs.

Model Denial of Service (DoS)

OWASP Top 10 for LLM 2023: Understanding the risks of Large Language Models: Discusses the top security risks for LLMs, including Model Denial of Service, and how to understand and mitigate them.

Privilege Escalation

Evaluating LLMs for Privilege-Escalation Scenarios: This paper introduces a benchmarking tool to assess how different LLMs handle privilege-escalation attacks.

Watermarking and Evasion

Unbiased Watermark for Large Language Models: Studies how watermarking can be used in LLMs to track outputs without significantly impacting model performance.

Related posts you may enjoy:

Starting with AI Security

fwd-cloudsec notes

Merritt Secure AI Hackathon Recap