Key Takeaways
- Mercor pays top experts, like poets at $150/hour, to create rubrics and evaluations that teach frontier AI models, leveraging their expertise across billions of users.
- Frontier models are improving on economically valuable tasks at a profound rate of roughly 25-30% per year, though long-horizon tasks and tool integration remain key challenges.
- The future of knowledge work involves a shift where professionals will spend less time on repetitive analysis and more time training agents and building Reinforcement Learning (RL) environments to automate those tasks.
Segments
Mercor’s Expert Hiring Model
(00:00:30)
- Key Takeaway: Mercor pays top experts, like poets at $150/hour, to create rubrics that teach frontier AI models, scaling their specialized knowledge.
- Summary: Mercor hires experts to train leading AI models by creating evaluation rubrics, similar to how a professor grades essays. Paying high rates attracts the best talent whose knowledge is then applied across billions of users. Experts must sometimes disagree to capture edge cases in subjective domains like poetry.
Measuring Real-World AI Progress
(00:05:29)
- Key Takeaway: The AI Productivity Index (Apex) measures model progress against economically valuable tasks, revealing a 25-30% annual improvement rate for frontier models.
- Summary: Apex bridges the gap between academic AI evaluations and real-world customer outcomes by surveying experts on how they spend their time. This methodology uses expert time as a proxy for economic value, revealing profound performance deltas between models like GPT-4o and GPT-5.
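The "expert time as a proxy for economic value" idea can be made concrete with a small sketch: weight each task's pass rate by the expert hours it represents, so that solving a 50-hour task counts far more than solving a 2-hour one. The function name and the task numbers below are illustrative assumptions, not figures from the Apex index itself.

```python
# Hypothetical sketch of a time-weighted productivity score: each task's
# pass rate is weighted by the expert hours that task normally consumes,
# so the aggregate tracks "expert time saved" rather than raw task count.

def time_weighted_score(results):
    """results: list of (expert_hours, pass_rate) pairs, one per task."""
    total_hours = sum(hours for hours, _ in results)
    weighted = sum(hours * pass_rate for hours, pass_rate in results)
    return weighted / total_hours

# Illustrative task mix (invented numbers):
tasks = [
    (2.0, 0.90),   # short drafting task: mostly solved
    (10.0, 0.40),  # multi-hour analysis: partially solved
    (50.0, 0.05),  # long-horizon project: largely unsolved
]
score = time_weighted_score(tasks)
```

Under this weighting, a model that aces only short tasks still scores low overall, which matches the episode's point that long-horizon work dominates economic value.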
Model Limitations and Future Capabilities
(00:08:09)
- Key Takeaway: Current models struggle with long-horizon tasks and integrating multiple tools, but these capabilities are expected to be mastered within six to twelve months.
- Summary: Models are currently superhuman at single-turn tasks but struggle with projects requiring 50 to 100 hours of integrated work. Once researchers can measure these long-horizon and tool-use capabilities, they can rapidly optimize them. The ability to stump experts like Cass Sunstein in law might take two to three years due to the high degree of taste involved.
Data Needs for AI Advancement
(00:12:05)
- Key Takeaway: The most valuable data for AI advancement is success measurement data, specifically rubrics and test sets, rather than just pre-training curriculum.
- Summary: Deep domain experts can most impactfully contribute by defining rigorous evaluations (evals) for model capabilities across various fields. While raw data is helpful, iterative scoring of model attempts against a rubric (like RLHF) is crucial for learning preferences and improving performance. Tests mapping to meaningful economic value are the highest priority data requests.
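Rubric-based scoring, as described above, amounts to a weighted checklist that a grader (human or model) applies to each attempt. Here is a minimal sketch of that idea, assuming a simple weight-per-criterion scheme; the class names and the example law-essay criteria are hypothetical.

```python
# Minimal sketch of rubric-based evaluation: a domain expert writes weighted
# criteria, a grader judges each one, and the attempt gets a normalized score.
from dataclasses import dataclass

@dataclass
class Criterion:
    description: str
    weight: float

def score_attempt(rubric, judgments):
    """judgments[i] is True if the attempt satisfies rubric[i]."""
    total = sum(c.weight for c in rubric)
    earned = sum(c.weight for c, ok in zip(rubric, judgments) if ok)
    return earned / total

# Hypothetical rubric for a legal-analysis task:
rubric = [
    Criterion("Cites the controlling statute", 3.0),
    Criterion("Addresses the strongest counterargument", 2.0),
    Criterion("Clear, well-organized prose", 1.0),
]
score = score_attempt(rubric, [True, False, True])  # satisfies 1st and 3rd
```

Iterating this loop (model attempts, rubric scores, model updates) is the "iterative scoring" the summary refers to; the rubric itself is the scarce, expert-generated asset.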
Enshrining Taste in LLMs
(00:18:11)
- Key Takeaway: Future models will likely personalize taste by learning from and blending preferences across different historical eras, rather than enshrining only current standards.
- Summary: The debate over whether to enshrine current taste versus historical standards in models like those generating poetry is complex. If taste cannot be captured in a rubric, preference selection methods like RLHF can be used to teach models user desires. The long-term goal is personalization, allowing models to pull from various eras of taste based on individual user preferences.
The Future of Knowledge Work Roles
(00:22:30)
- Key Takeaway: A majority of high-end knowledge workers may transition within five years to training agents and building RL environments rather than performing repetitive analysis.
- Summary: Economically valuable tasks will shift toward fixed-cost investments where workers teach an agent a workflow once, allowing agents to perform it repeatedly. This transition means knowledge workers will primarily focus on finding and defining mistakes in agent performance to create new RL environments. This new job category requires domain knowledge, not deep technical AI expertise.
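What "building an RL environment" for a workflow might look like in practice: the domain expert supplies the task prompt and a success checker, and the environment grades the agent's output. This is a heavily simplified, single-turn sketch under stated assumptions; the class name, the expense-classification task, and the checker are all invented for illustration.

```python
# Hedged sketch of a domain-expert-built RL environment: the expert encodes
# the task and a success test; the environment turns the test into a reward.

class WorkflowEnv:
    """Minimal episodic environment: agent submits one answer, gets graded."""

    def __init__(self, task_prompt, checker):
        self.task_prompt = task_prompt
        self.checker = checker  # expert-written success test for the output

    def reset(self):
        # Return the task observation the agent must act on.
        return self.task_prompt

    def step(self, agent_output):
        # Reward 1.0 on success, 0.0 on failure; real tasks are multi-step.
        reward = 1.0 if self.checker(agent_output) else 0.0
        done = True
        return reward, done

# Hypothetical expense-classification task:
env = WorkflowEnv(
    "Classify this expense as travel, meals, or software: 'AWS monthly bill'",
    checker=lambda out: out.strip().lower() == "software",
)
obs = env.reset()
reward, done = env.step("software")
```

Note that writing the checker requires domain knowledge (what counts as a correct classification), not RL expertise, which is the episode's point about this new job category.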
Labor Market Efficiency and Hiring
(00:39:20)
- Key Takeaway: Labor markets are inefficient due to disaggregation, and while AI can improve matching, the rise of optimized AI-generated applications may revive nepotism as a selection signal.
- Summary: The difficulty in matching candidates to jobs stems from the matching problem, not just distribution, as evidenced by LinkedIn’s limitations in predicting performance. While AI can analyze interview performance superhumanly, confounding variables and the homogenization of candidate profiles might lead companies to rely on personal recommendations again. The key to efficiency is using AI to assess actual performance while drawing on every available signal.
Scaling Thiel Fellowship and Entrepreneurship
(00:44:18)
- Key Takeaway: The Thiel Fellowship’s selection process could be scaled 100,000x by using AI to interview and assess unconventional thinkers globally, increasing economic mobility.
- Summary: The fellowship’s current strength lies in its highly curated, local market selection, but AI could enable them to evaluate a fraction of the world’s population for unconventional thinking. Scaling requires technology to better assess candidates beyond the immediate network, though the best interviewers remain domain-specific. Dyslexic individuals often excel as entrepreneurs because they learn delegation early and approach problems unconventionally.
Personal Life and Company Naming
(00:47:46)
- Key Takeaway: Brendan Foody values gaining a global perspective through travel, particularly to Japan, to better understand varied viewpoints on AI and human interaction.
- Summary: If given a free year, Foody would prioritize travel to gain global perspective, similar to how Sam Altman toured the world after ChatGPT’s launch. He enjoys good food, recommending El Metate and Cotogna in San Francisco, and uses the Beli app for local food ratings. The company name Mercor derives from the Latin word for marketplace, mirroring the Mercatus Center’s name.
Mercor’s Next Steps and Learning Focus
(00:59:04)
- Key Takeaway: Mercor’s immediate goal is scaling realistic evaluations for models using complex, multi-day tool trajectories, shifting focus from pure intelligence to enterprise utility.
- Summary: The next major focus is measuring model capabilities across long, multi-day tasks involving various tools, which is critical for enterprise adoption. The company is fascinated by how human labor can be most efficiently applied at the frontier of AI research to drive model improvement. This involves learning which specific rubrics and data types yield the greatest advancements in model training.