1 entry tagged Safety · 1 term.
A crafted input sequence that bypasses a model's safety guardrails and elicits outputs the model was trained to refuse or that its filters were meant to block.