Abstract: This paper proposes a novel algorithm JailPoisoning for generating poisoning datasets based on jailbreak attacks, targeting the security vulnerabilities of large language models (LLMs).
Some results have been hidden because they may be inaccessible to you
Show inaccessible results