
{"id":8405,"date":"2026-05-20T05:31:43","date_gmt":"2026-05-19T21:31:43","guid":{"rendered":"https:\/\/infernews.com\/blog\/bench-can-ai-agents-automate-end-to-end-long-horizon-policy-rich-healthcare-work\/"},"modified":"2026-05-20T05:31:43","modified_gmt":"2026-05-19T21:31:43","slug":"bench-can-ai-agents-automate-end-to-end-long-horizon-policy-rich-healthcare-work","status":"publish","type":"post","link":"https:\/\/infernews.com\/blog\/bench-can-ai-agents-automate-end-to-end-long-horizon-policy-rich-healthcare-work\/","title":{"rendered":"chi-bench: Putting Healthcare AI Agents to a Real Test"},"content":{"rendered":"<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/infernews.com\/blog\/wp-content\/uploads\/2026\/05\/main_pass_at_1-b1ca71aceda4.jpg\" alt=\"\u03c7-Bench\"><\/figure>\n<p>chi-bench is a benchmark environment for evaluating AI agents. Its focus is not question answering: the model has to work through an entire case, step by step, inside simulated US healthcare workflows. It covers three kinds of long-horizon work (prior authorization, payer-side utilization management, and population care management) and aims to test whether AI can handle tasks that are multi-step, rule-dense, and dependent on collaboration across multiple roles.<\/p>\n<p>According to the official abstract, it uses 20 healthcare apps, 87 MCP tools, and a managed-care operations handbook of 1,290+ documents as the basis for its tasks.<\/p>\n<p>In practice, researchers typically prepare the relevant API keys first, then pick an agent framework and a model to run the tasks, after which a built-in judging mechanism scores each run. Every task supplies a clinical case, simulated work systems, and a large body of operations manuals. The AI has to move the workflow forward through tool calls and written documents; generating a single answer does not count as completing the task.<\/p>\n<p>What makes it most interesting is that it makes the most troublesome parts of healthcare administration concrete: many rules, many documents, many systems, and possibly repeated back-and-forth along the way. Compared with typical benchmarks that measure only single-step reasoning, chi-bench is closer to the real world, because it tests how a model works across applications, follows policy, and keeps its decisions consistent over a long horizon.<\/p>\n<ul>\n<li>Covers 3 major healthcare workflow scenarios as end-to-end task evaluation<\/li>\n<li>Uses about 20 simulated healthcare apps and a large document set as the operating environment<\/li>\n<li>Supports comparison across several kinds of agents and models, including Claude, OpenAI, Gemini, and open-weight routes<\/li>\n<li>The leaderboard is based mainly on pass@1, and multiple trial runs can be kept for additional analysis<\/li>\n<\/ul>\n<p>Judging from the information available so far, the benchmark is quite difficult even for today's strongest models, which suggests it has real discriminating power: weaknesses are not easily hidden behind high scores. Known configurations include Claude Code with Claude Opus, the OpenAI\/Codex route, Gemini CLI, and Hermes, OpenClaw, and DeepAgents accessed through OpenRouter, among others. Actual performance varies noticeably with how the agent is packaged and how well it uses tools.<\/p>\n<p>For AI agent researchers, healthcare workflow automation teams, and even product people who want to know whether a model can actually get work done, chi-bench is a useful reference. It focuses, however, on the US healthcare system and regulated processes, so keep the scenario limits in mind when reading the results; they should not be treated as general conclusions for every industry.<\/p>\n<p><strong>GitHub:<\/strong> <a href=\"https:\/\/github.com\/actava-ai\/chi-bench\" rel=\"noopener noreferrer\">https:\/\/github.com\/actava-ai\/chi-bench<\/a><\/p>\n<p><strong>Paper:<\/strong> <a href=\"https:\/\/arxiv.org\/pdf\/2605.16679\" rel=\"noopener noreferrer\">https:\/\/arxiv.org\/pdf\/2605.16679<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>This project is not a chat quiz; it asks AI to actually handle healthcare workflows. It is especially suited to anyone who wants to compare the practical capabilities of agent models.<\/p>\n","protected":false},"author":8,"featured_media":8404,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[133,144,197],"tags":[],"class_list":["post-8405","post","type-post","status-publish","format-standard","hentry","category-133","category-medical","category-framework"],"_links":{"self":[{"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/posts\/8405","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/users\/8"}],"replies":[{"embeddable":true,"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/comments?post=8405"}],"version-history":[{"count":0,"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/posts\/8405\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/media\/8404"}],"wp:attachment":[{"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/media?parent=8405"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/categories?post=8405"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/tags?post=8405"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}