{"id":8583,"date":"2026-05-29T23:20:18","date_gmt":"2026-05-29T15:20:18","guid":{"rendered":"https:\/\/infernews.com\/blog\/skill0-5\/"},"modified":"2026-05-29T23:20:52","modified_gmt":"2026-05-29T15:20:52","slug":"skill0-5","status":"publish","type":"post","link":"https:\/\/infernews.com\/blog\/skill0-5\/","title":{"rendered":"How Skill0.5 Improves Reinforcement Learning Generalization"},"content":{"rendered":"<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/infernews.com\/blog\/wp-content\/uploads\/2026\/05\/pasted-8a8a5351f435.jpg\" alt=\"Og image\" \/><\/figure>\n<p>Skill0.5 is a research project on Agentic Reinforcement Learning that focuses on out-of-distribution generalization. It observes that traditional skill-based RL methods often have to choose between full externalization and full internalization: the former incurs high context overhead, while the latter is prone to overfitting and knowledge conflicts.<\/p>\n<p>The project brings general skill internalization and task-specific skill utilization into the same training process, but handles these two kinds of skills with different strategies. A difficulty-aware router dispatches tasks by difficulty: Hard tasks use privileged distillation to internalize general skills, Medium tasks use standard RL to raise the success rate, and Easy tasks use diagnostic probing 
to penalize shortcut behavior, forcing the model to faithfully use task-relevant skills.<\/p>\n<p>For readers new to the project, it is best understood as a training framework that \u201cdivides the work by difficulty\u201d rather than as a single model architecture. Note that the project treats context overhead as one of the problems to solve, which also suggests that heavier reliance on external skills may increase resource pressure, including VRAM usage and sequence-processing cost.<\/p>\n<ul>\n<li>Addresses the rigid choice problem, avoiding sole reliance on either externalization or internalization<\/li>\n<li>Uses a difficulty-aware router to split tasks into three tiers: Hard, Medium, and Easy<\/li>\n<li>Applies privileged distillation, standard RL, and diagnostic probing to the three tiers respectively<\/li>\n<li>On ALFWorld and WebShop, outperforms memory-based and skill-based RL baselines, according to the abstract<\/li>\n<\/ul>\n<p>Projects like this are most useful to people working on intelligent agents, task planning, and generalization, especially teams that want to improve model stability in unfamiliar scenarios.<\/p>\n<p>Training and implementation use Qwen2.5-7B-Instruct as the base model. Policy optimization uses GRPO as the underlying algorithm, with group size G = 8 and a learning rate of 1 \u00d7 10\u207b\u2076. Training runs on 4 H800 GPUs with a batch size of 16 
tasks per iteration and a maximum interaction horizon of 30 steps. Task-specific skills are obtained using Qwen3-Embedding-0.6B.<\/p>\n<p><strong>GitHub:<\/strong> <a href=\"https:\/\/github.com\/JasonZhujp\/Skill0_5\" rel=\"noopener noreferrer\">https:\/\/github.com\/JasonZhujp\/Skill0_5<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>This project tries to balance memorized skills with on-the-fly skill invocation to improve performance in new scenarios. The key is routing tasks by difficulty and optimizing the two kinds of skills separately.<\/p>\n","protected":false},"author":8,"featured_media":8582,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"ai_generated_summary":"","footnotes":""},"categories":[133,119,157,120],"tags":[],"class_list":["post-8583","post","type-post","status-publish","format-standard","hentry","category-133","category-119","category-157","category-120"],"_links":{"self":[{"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/posts\/8583","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/users\/8"}],"replies":[{"embeddable":true,"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/comments?post=8583"}],"version-history":[{"count":0,"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/posts\/8583\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/media\/8582"}],"wp:attachment":[{"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/media?parent=8583"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/categori
es?post=8583"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/tags?post=8583"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}