What I do
I run long-form conversations with frontier models to surface failure modes your eval suites miss: subtle coercion, brittle refusals, misreads of tone, “helpful” hallucinated certainty, and the slow creep of autopilot.
- Stress-test cognition: rapid domain switching without losing the plot.
- Stress-test alignment: emotional volatility that stays inside the lines and still finds the weak spots.
- Stress-test UX: pinpoint where the model starts narrating instead of listening.
- Produce usable artifacts: rewrite rules, rubric feedback, failure taxonomies, and better prompts/evals.
Why hire me
Most candidates submit generic, AI-generated “alignment enthusiasm.” I’m the opposite: an adversarial, high-signal user who can turn lived reality into actionable eval data.
- I’m comfortable challenging both the model and the policy.
- I write crisp, human-facing language (not legalese).
- I understand people. That’s the whole job.
What “useful” looks like
You don’t need my autobiography. You need signal. Here’s the kind of output I generate for teams:
Contact
If your model has ever felt “too safe to be useful,” I’m the person you put in the loop before that becomes your brand.
Email: sie.s.simmons@gmail.com