# Anthropic Fellows Program — Reinforcement Learning

[Anthropic](https://www.jorb.ai/firms/anthropic.md) · London · United Kingdom · [Research / Applied Science](https://www.jorb.ai/jobs/research-applied-science.md)

Anthropic is hiring for its Fellows Program (Reinforcement Learning workstream) in London. Posted 2026-04-10; applications close 2026-06-09.

**Apply**: https://job-boards.greenhouse.io/anthropic/jobs/5183052008

## Role details

## About Anthropic

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a growing group of researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

The next cohort of Anthropic fellows starts on July 20, 2026. **Apply by April 26, 2026** to be considered for this cohort. We will continue accepting applications for later cohorts on a rolling basis. In exceptional circumstances, we may be able to accommodate fellows starting outside of usual cohort timelines.

This page is specific to one of the Anthropic Fellows Workstreams; see also the main **Anthropic Fellows posting**.

# Anthropic Fellows Program overview

The Anthropic Fellows Program is designed to foster AI research and engineering talent. We provide funding and mentorship to promising technical talent—regardless of previous experience.

Fellows will primarily use external infrastructure (e.g., open-source models, public APIs) to work on an empirical project aligned with our research priorities, with the goal of producing a **public output** (e.g., a paper submission). In one of our earlier cohorts, over 80% of fellows produced papers.

We run multiple cohorts of Fellows each year and review applications on a rolling basis. This application is for cohorts starting in July 2026 and beyond.

## What to expect

  
- 4 months of full-time research
- Direct mentorship from Anthropic researchers
- Access to a shared workspace (in either Berkeley, California or London, UK)
- Connection to the broader AI safety and security research community
- Weekly stipend of 3,850 USD / 2,310 GBP / 4,300 CAD + benefits (these vary by country)
- Funding for compute (~$15k/month) and other research expenses

## Interview process

The interview process will include an initial application and reference check, technical assessments and interviews, and a research discussion.

**We encourage you to apply even if you do not believe you meet every single qualification.** Not all strong candidates will meet every qualification as listed. Research shows that people from underrepresented groups are more prone to imposter syndrome and doubting their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you're interested in this work. We believe AI systems like the ones we're building have enormous social and ethical implications, making diverse perspectives on our team important.

## Compensation

The expected base stipend for this role is 3,850 USD / 2,310 GBP / 4,300 CAD per week, with an expectation of 40 hours per week for 4 months (with possible extension).

## Fellows workstreams

Due to the success of the Anthropic Fellows for AI Safety Research program, we are expanding it across teams at Anthropic. We expect significant overlap in skills and responsibilities across the roles and will by default consider candidates for all workstreams.

Some workstreams may include unique assessment steps; we therefore ask you for workstream preferences in the application. An overview of the current workstreams:

  
- AI Safety Fellows
- AI Security Fellows
- ML Systems & Performance Fellows
- Reinforcement Learning Fellows
- Economics & Societal Impacts Fellows

## Across the workstreams, you may be a good fit if you:

  
- Are motivated by making AI safe and beneficial for society as a whole
- Are excited to transition into empirical AI research and would be interested in a full-time role at Anthropic
- Have a strong technical background in computer science, mathematics, or physics
- Thrive in fast-paced, collaborative environments
- Can implement ideas quickly and communicate clearly

## Strong candidates may also have:

  
- Strong background in a discipline relevant to a specific Fellows workstream (e.g., economics, social sciences, or cybersecurity)
- Experience in research or engineering related to their workstream

## Candidates must be:

  
- Fluent in Python programming
- Available to work full-time on the Fellows program

# Reinforcement Learning Fellows

## Mentors, research areas, & past projects

Fellows will undergo a project selection and mentor matching process. Potential research areas and mentors include:

  
- Ruhua Jiang
- Kaidi Cao
- Sunny Duan
- David Brandfonbrener
- Colt Steele
- Dino Distefano
- Will Williams

Projects in this workstream may include:

  
- Building model-based tools to better understand AI training data and improve training data quality
- Running a research project to better understand generalization
- Creating RL environments to improve Claude models' capabilities within your domain of expertise
- Building RL environments for safety-related tasks
- Conducting research and implementing solutions in areas such as RL algorithms

## Unique candidate criteria

You might be a particularly great fit for this workstream if you:

  
- Have strong software engineering skills with experience building complex ML systems
- Can balance research exploration with engineering rigor and operational reliability
- Enjoy collaborating across research and engineering disciplines
- Are comfortable working with large-scale distributed systems and high-performance computing
- Have experience with training, fine-tuning, or evaluating large language models
- Are adept at analyzing and debugging model training processes

# Logistics

**Logistics Requirements:** To participate in the Fellows program, you must have work authorization in the US, UK, or Canada and be located in that country during the program.

**Workspace Locations:** We have designated shared workspaces in London and Berkeley where fellows will work and mentors will visit. **We are also open to remote fellows in the UK, US, or Canada**. We will ask you about your availability to work from Berkeley or London (full- or part-time) during the program.

**Visa Sponsorship:** We are **not** currently able to sponsor visas for fellows. To participate in the Fellows program, you need to have or independently obtain full-time work authorization in the UK, the US, or Canada.

**Program Duration:** The program runs for 4 months, full-time. If you can't commit to the full duration, please still apply and note your constraints in the application. We review these requests on a case-by-case basis.

**Please note:** We do not guarantee that we will make any full-time offers to fellows. However, strong performance during the program may indicate that a Fellow would be a good fit for full-time roles at Anthropic. In previous cohorts, 25–50% of fellows received a full-time offer, and we’ve supported many more to go on to do great work on AI safety and security at other organizations.

Applications and interviews are managed by Constellation, our official recruiting partner for this program. Constellation also runs the Berkeley workspace that hosts fellows. Clicking "Apply here" will redirect you to Constellation's application portal. You can expect to receive emails from Constellation with application updates.

**Apply here**

The following are Anthropic's policies for full-time roles. They do NOT apply to the Fellows Program.

## Logistics

**Minimum education:** Bachelor's degree or an equivalent combination of education, training, and/or experience

**Required field of study:** A field relevant to the role as demonstrated through coursework, training, or professional experience

**Minimum years of experience:** Years of experience required will correlate with the internal job level requirements for the position

**Location-based hybrid policy:** Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.

**Visa sponsorship:** We do sponsor visas. If we make you an offer, we will make every reasonable effort to obtain a visa and we retain an immigration lawyer to help with this.

**We encourage you to apply even if you do not believe you meet every single qualification.** Not all strong candidates will meet every qualification as listed. We urge you to submit an application if you're interested in this work. We strive to include a range of diverse perspectives on our team.

**Your safety matters to us.** To protect yourself from potential scams, remember that Anthropic recruiters only contact you from @anthropic.com email addresses. Legitimate recruiters will never ask for money, fees, or banking information before your first day. If unsure, visit anthropic.com/careers for confirmed position openings.

## How we're different

We believe the highest-impact AI research is big science. At Anthropic we work as a single cohesive team on a few large-scale research efforts. We value impact—advancing our long-term goals of steerable, trustworthy AI—over smaller, more specific puzzles. We view AI research as an empirical science, akin to physics and biology as much as computer science. We are highly collaborative and host frequent research discussions to pursue the highest-impact work. We greatly value communication skills.

The easiest way to understand our research directions is to read our recent work, which continues many directions our team pursued prior to Anthropic, including GPT-3, Circuit-Based Interpretability, Multimodal Neurons, Scaling Laws, AI & Compute, Concrete Problems in AI Safety, and Learning from Human Preferences.

## Come work with us!

Anthropic is a public benefit corporation headquartered in San Francisco. We offer competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a welcoming office space for collaboration.

**Guidance on Candidates' AI Usage:** Learn about our policy for using AI in our application process.

---

Updated: 2026-04-22
Canonical: https://www.jorb.ai/jobs/69d8499d193675065559e14e
