
{"id":8581,"date":"2026-05-29T23:05:00","date_gmt":"2026-05-29T15:05:00","guid":{"rendered":"https:\/\/infernews.com\/blog\/lara-rl\/"},"modified":"2026-05-29T23:05:00","modified_gmt":"2026-05-29T15:05:00","slug":"lara-rl","status":"publish","type":"post","link":"https:\/\/infernews.com\/blog\/lara-rl\/","title":{"rendered":"LaRA Uses Layer-wise Representations to Detect RL Training Contamination"},"content":{"rendered":"<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/infernews.com\/blog\/wp-content\/uploads\/2026\/05\/pasted-d9607d4feb84.jpg\" alt=\"Hero image preview\"><\/figure>\n<p>This paper introduces LaRA (Layer-wise Representation Analysis), a method for detecting data contamination in the reinforcement learning (RL) post-training stage. Contamination here means that evaluation questions or benchmark data leak into the training data, so that Large Language Models (LLMs) appear to perform well when they may simply have memorized the answers, which undermines both generalization and the credibility of evaluation.<\/p>\n<p>The authors point out that most existing methods look only at output-level signals such as likelihood, entropy, or differences in generation behavior, and that these methods are not necessarily stable on RL-trained models. The reason is that RL focuses on the reward of an entire reasoning trajectory rather than on per-token probabilities, so methods that rely on the output distribution alone are prone to miscalibration and may not accurately reflect whether the model has memorized evaluation data.<\/p>\n<p>LaRA 
instead analyzes the internal representations at each layer of the model and observes how their geometry changes before and after controlled perturbations. The paper proposes three complementary metrics, perturbation sensitivity, directional collapse, and local representation rigidity, to measure how abnormally contaminated samples respond at different layers. The authors find that contaminated data progressively shows higher sensitivity, stronger directional contraction, and higher local rigidity across multiple layers of representation.<\/p>\n<p>The point of this work is not to speed up inference but to serve as a detection workflow that helps researchers assess the trustworthiness of RL-trained models. The paper also proposes a detection protocol that aggregates deviations across layers and metrics; in experiments on RL-trained reasoning models, the method outperforms existing output-level baselines.<\/p>\n<ul>\n<li>Addresses the difficulty of identifying data contamination in RL post-training<\/li>\n<li>Uses representation-level signals instead of relying on output probabilities alone<\/li>\n<li>Combines three metrics to analyze traces of contamination across multiple layers<\/li>\n<li>Suited to reasoning-model evaluation, training audits, and research comparisons<\/li>\n<li>The paper's abstract gives no VRAM requirements; these most likely depend on model size, the extracted-layer 
count, and the batch analysis settings<\/li>\n<\/ul>\n<p>If VRAM usage matters to you, note that the material covered here lists no explicit GPU memory requirements and no deployment specifications. Judging from the nature of the method, however, LaRA has to read hidden representations from multiple layers, so VRAM would mainly go to loading the model, storing intermediate-layer representations, and batch analysis of multiple perturbed versions; the larger the model and the more layers analyzed, the higher the VRAM requirement generally is.<\/p>\n<p><strong>Paper:<\/strong> <a href=\"https:\/\/arxiv.org\/pdf\/2605.29888\" rel=\"noopener noreferrer\">https:\/\/arxiv.org\/pdf\/2605.29888<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>This paper proposes LaRA, which uses internal model representations to detect RL 
post-training data contamination, an approach it reports as more reliable than looking only at outputs.<\/p>\n","protected":false},"author":8,"featured_media":8580,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[133,119,157,120],"tags":[],"class_list":["post-8581","post","type-post","status-publish","format-standard","hentry","category-133","category-119","category-157","category-120"],"_links":{"self":[{"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/posts\/8581","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/users\/8"}],"replies":[{"embeddable":true,"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/comments?post=8581"}],"version-history":[{"count":0,"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/posts\/8581\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/media\/8580"}],"wp:attachment":[{"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/media?parent=8581"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/categories?post=8581"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/tags?post=8581"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}