AI Benchmark & Evaluation Engineer
📋 Job Description
- Break down goals into verifiable terminal operations.
- Define objective evaluation methods and anticipate edge cases.
- Develop reproducible benchmark tasks in domains such as software development, data science, system administration, and security.
- Document task requirements and evaluation standards.
- Assess AI agents using metrics and human-evaluation rubrics.
- Collaborate on refining tasks and creating realistic scenarios.
💼 Ready to Apply?
View the full job details and apply directly on LinkedIn.
