# Agent Evaluation Intern

[Tencent](https://www.jorb.ai/firms/tencent.md) · London · United Kingdom · [Research / Applied Science](https://www.jorb.ai/jobs/research-applied-science.md)

Tencent is hiring an Agent Evaluation Intern in London. Posted 2026-05-20; applications close 2026-07-19.

**Apply**: https://tencent.wd1.myworkdayjobs.com/Tencent_Careers/job/UK-London/Agent-Evaluation-Intern_R107491-1

## Role details

## About the Hiring Team

Level Infinite is Tencent’s global gaming brand. It is a global game publisher offering a comprehensive network of services for games, development teams, and studios around the world. We are dedicated to delivering engaging and original gaming experiences to a worldwide audience, whenever and wherever they choose to play, while building a community that fosters inclusivity, connection, and accessibility. Level Infinite also provides a wide range of services and resources to our network of developers and partner studios to help them unlock the true potential of their games.

## What the Role Entails

We are hiring an intern to work on evaluation and reliability infrastructure for a real-world LLM agent system in user acquisition (UA) performance marketing. The agent performs multi-step reasoning, retrieves context, selects tools, executes actions, handles user confirmations, and interacts with external services.

The goal of this internship is to build transferable expertise in agent evaluation engineering: evaluating tool use, measuring trajectory quality, designing benchmarks, analyzing traces, comparing model and prompt variants, and improving the reliability of agentic AI systems.

This role is ideal for someone interested in future opportunities in LLM agent evaluation, AI safety evaluation, research engineering, LLMOps, or applied AI infrastructure.

  
- Research state-of-the-art evaluation frameworks for agentic workflows, both in industry and in the research literature.
- Apply theory to build automated evaluation pipelines that can run agent scenarios, capture execution artifacts, score results, and detect regressions.
- Evaluate tool-use behavior, including whether the agent selects the right tool, passes correct arguments, avoids unnecessary calls, and handles tool errors appropriately.
- Analyze agent trajectories using traces, logs, intermediate steps, and final outputs to identify reasoning failures, context misuse, hallucinated assumptions, and brittle workflow patterns.
- Design metrics for agent reliability, including success rate, tool-call precision, argument accuracy, recovery rate, retry count, latency, cost, and safety-related failure rates.
- Create reusable evaluation datasets from synthetic cases, golden workflows, and real anonymized executions.
- Support experiments comparing prompts, model providers, tool descriptions, memory strategies, context construction methods, and execution modes.
- Help build human evaluation workflows and rubrics for judging agent correctness, faithfulness, usefulness, and risk awareness.
- Work with engineers to translate evaluation findings into better tests, monitoring signals, tool interfaces, prompts, and guardrails.
- Potentially write research papers and publish them at scientific conferences.

## Who We Look For

  
- Currently pursuing, or recently graduated with, a Master’s or PhD degree in Computer Science, Artificial Intelligence, Machine Learning, Software Engineering, Data Science, or a related field.
- Strong Python fundamentals and interest in AI systems.
- Curious about how LLM agents work, fail, and improve.
- Interested in evaluation methodology, not just application building.
- Comfortable reading logs, traces, test cases, and structured data.
- Detail-oriented and able to define clear, measurable criteria for ambiguous agent behavior.
- Prior experience with LLMs, LangChain-like agents, tool calling, pytest, data analysis, or observability tools is helpful but not required.

## Equal Employment Opportunity at Tencent

As an equal opportunity employer, we firmly believe that diverse voices fuel our innovation and allow us to better serve our users and the community. We foster an environment where every employee of Tencent feels supported and inspired to achieve individual and common goals.

## Applying to this role

This Agent Evaluation Intern role at Tencent runs through the firm's own careers portal and expects a CV and cover letter written specifically for the posting, not a portable submission carried across firms. Jorb AI's application agent tailors a CV and cover letter from your background to this posting and tracks the role alongside the rest of your applications.

[Tailor this application](https://www.jorb.ai/signup?ref=job-atom&firm=tencent&job=6a0d9445638107003bb3190e)

## More open roles at Tencent

- [Agent Development Intern](https://www.jorb.ai/jobs/6a0d9445638107003bb31913.md) — London, posted 21h ago
- [Internal Control Intern](https://www.jorb.ai/jobs/6a0c17033dccfb759a17c0fd.md) — Singapore, posted 1d ago
- [Associate Internal Control Analyst](https://www.jorb.ai/jobs/6a0c17033dccfb759a17c0ff.md) — Singapore, posted 1d ago
- [AI Agent Research & Application Intern](https://www.jorb.ai/jobs/6a0c8533d4412dbadb717bac.md) — London, posted 1d ago
- [WXG - Data Engineering Intern](https://www.jorb.ai/jobs/6a06c915f71a2fe1e46b0538.md) — Singapore, posted 5d ago

---

Updated: 2026-05-20
Canonical: https://www.jorb.ai/jobs/6a0d9445638107003bb3190e
