High school student uses Minecraft to evaluate SOTA models! Claude leads for now, with DeepSeek close behind

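Xinzhiyuan (新智元) report. Editor: Dinghui (定慧)

AI models score well on benchmarks, yet they frequently fail at problems humans solve easily. Creative evaluations are on the rise: MC-Bench, for example, uses Minecraft blocks to assess model capability, and ordinary users can take part in the evaluation. This evaluation paradigm comes closer to what people actually expect from AI in terms of intuition and creativity.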

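News flash | High school student launches an AI intelligence benchmark in Minecraft, with a million builders voting to pick the best model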

A high school student developed MC-Bench, a website that allows AI models to compete in Minecraft builds. The platform uses the popular game as a test of AI’s creativity and capability. Users can vote on which model created the best build, while Anthropic, Google, OpenAI, and Alibaba are among the contributors funding the project.