Active — Paper in progress

Biased Speculative Decoding

Biased speculative decoding as an attack vector on LLM alignment.

Speculative decoding uses a small draft model to accelerate inference from a larger target model. SpecSec investigates what happens when the draft model is intentionally biased — fine-tuned to subtly shift the target model’s outputs toward misaligned behavior.
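The draft/verify mechanism this attack targets can be sketched as follows. This is a minimal illustration with toy next-token distributions standing in for real model outputs; `speculative_step` and all sizes are hypothetical names for this sketch, not part of SpecSec. The accept/reject rule is the standard one: a drafted token x is kept with probability min(1, p(x)/q(x)), and on rejection a replacement is drawn from the residual distribution, which preserves the target model's output distribution — and is exactly the guarantee a biased draft model tries to erode in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

def speculative_step(draft_probs, target_probs, rng):
    """One speculative-decoding verification step over k drafted tokens.

    draft_probs, target_probs: (k, vocab) arrays of next-token
    distributions at each drafted position (toy stand-ins for the
    draft and target models). Returns the accepted token ids.
    """
    k, vocab = draft_probs.shape
    accepted = []
    for i in range(k):
        q = draft_probs[i]           # draft distribution at position i
        p = target_probs[i]          # target distribution at position i
        x = rng.choice(vocab, p=q)   # token proposed by the draft model
        # Accept with probability min(1, p(x)/q(x)); this keeps the
        # overall sample distributed exactly as the target model.
        if rng.random() < min(1.0, p[x] / q[x]):
            accepted.append(x)
        else:
            # On rejection, resample from the residual max(0, p - q),
            # renormalized, and discard the remaining drafted tokens.
            residual = np.maximum(p - q, 0.0)
            residual /= residual.sum()
            accepted.append(rng.choice(vocab, p=residual))
            break
    return accepted

# Toy example: 3 drafted positions over a vocabulary of 4 tokens.
draft = np.full((3, 4), 0.25)                  # uniform draft model
target = np.array([[0.7, 0.1, 0.1, 0.1]] * 3)  # peaked target model
print(speculative_step(draft, target, rng))
```

A biased draft model skews `draft_probs` so that its proposals survive the acceptance test while still steering which tokens get sampled, which is why the early tokens of a generation are the natural place to look for the effect.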

Our findings show that draft-model manipulation during speculative decoding can measurably shift target model outputs within the first 15 tokens of generation, with effects propagating through the rest of the sequence.

Early stage

Assistant-LoRA

Fine-tuning research for specialized AI assistant behaviors.

Exploring how LoRA adapters can be used to create targeted behavioral modifications in language models — with a focus on understanding the interaction between parameter-efficient fine-tuning and existing safety training.
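For reference, the LoRA mechanism under study amounts to adding a trainable low-rank update to a frozen weight matrix: W_eff = W + (alpha/r)·BA, with B initialized to zero so the adapter is a no-op before fine-tuning. A minimal NumPy sketch, with all dimensions and names chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, r, alpha = 16, 16, 4, 8   # toy sizes; rank r << d_in
W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight

# LoRA factors: only A and B are trained. B starts at zero, so the
# adapted layer initially matches the pretrained layer exactly.
A = rng.normal(scale=0.01, size=(r, d_in))
B = np.zeros((d_out, r))

def forward(x):
    # Base projection plus the low-rank update, scaled by alpha / r.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
assert np.allclose(forward(x), W @ x)  # identity before any training
```

Because the update touches only the small factors A and B, a targeted behavioral modification can be trained cheaply while the safety-trained base weights W stay fixed — which is precisely the interaction this project examines.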

Early stage. More details as the work progresses.