Microsoft develops a lightweight scanner that detects backdoors in open-weight LLMs using three behavioral signals, improving ...
Learn how Microsoft research uncovers backdoor risks in language models and introduces a practical scanner to detect ...
In its research, Microsoft detailed three major signs of a poisoned model. Microsoft's research found that the presence of a backdoor changed depending on where a model puts its attention. "Poisoned ...