
{"id":9846,"date":"2026-07-03T22:19:05","date_gmt":"2026-07-03T14:19:05","guid":{"rendered":"https:\/\/infernews.com\/blog\/training-and-evaluation-code-for-paper-quot-learning-to-move-before-learning-to\/"},"modified":"2026-07-03T22:19:05","modified_gmt":"2026-07-03T14:19:05","slug":"training-and-evaluation-code-for-paper-quot-learning-to-move-before-learning-to","status":"publish","type":"post","link":"https:\/\/infernews.com\/blog\/training-and-evaluation-code-for-paper-quot-learning-to-move-before-learning-to\/","title":{"rendered":"TAP: A VLA Approach That Learns to Move Before Learning Instructions"},"content":{"rendered":"\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/infernews.com\/blog\/wp-content\/uploads\/2026\/07\/pasted-9cdaf75813c0.jpg\" alt=\"TAP Framework Overview\"><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">TAP (Task-Agnostic-Pretrain) is a training framework for Vision-Language-Action (VLA) models, part research prototype and part training method. The core problem it addresses is that VLA models have long depended on large numbers of expert demonstrations, which makes robotic manipulation capability hard to scale at low cost.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Most existing approaches learn \"how to move\" and \"what to do\" jointly, which usually requires complete demonstration data of observation, instruction, and action. The authors argue that this fixed paradigm conflates two objectives, physical competence and semantic alignment, so language annotation ends up being spent on motor skills that could have been learned through self-supervision. Task-Agnostic 
Pretraining (TAP) therefore splits training into two stages: it first learns transferable motor priors from unlabeled interaction data via self-supervised Inverse Dynamics, then uses a small number of expert demonstrations for task-specific alignment.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This differs from standard behavior cloning and from VLA approaches built on massive amounts of web data or expert trajectories. TAP's trade-off is explicit: rather than trying to learn semantics and motion all at once, it first separates out the transferable \"how to move\" component, which lowers annotation cost and improves robustness to background and viewpoint changes. The cost is that the method still relies on second-stage demonstrations to align language instructions with specific tasks.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The project documents how to test it: this is not a plug-and-play application. Users follow the paper's setup, load the models provided on HuggingFace, reproduce the two-stage training, and then validate on the SIMPLER benchmark and in real-world WidowX-250s scenes. In numbers, TAP-20k reaches an Avg-All of 33.32% on SIMPLER, above Standard BC's 23.15%. In the real world, with only 200 expert demos, it still achieves 45% success under background texture shift and 20% under viewpoint variation, where some baselines drop to 0%.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Uses 
self-supervised Inverse Dynamics to learn motor priors first, reducing reliance on language annotation<\/li>\n<li>Combines about 30 hours of autonomous play with a small number of expert demonstrations, making it more data-efficient than approaches that rely on 1M+ expert trajectories<\/li>\n<li>Outperforms Standard BC on the SIMPLER benchmark and approaches or exceeds some existing VLA models<\/li>\n<li>More robust to visual distractors, background texture shift, and viewpoint variation<\/li>\n<li>Related models include RT-1-X, OpenVLA, Nora, and Octo, as well as the TAP-20k model mentioned in the README<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">The project is best suited as a reference for teams researching Embodied AI, robot learning, and VLA training pipelines, especially those who want to validate a new training approach with academic-scale compute. At this stage it is better seen as a methodology worth following than as a finished tool for general users.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/sjh0354.github.io\/task_agnostic_pretrain\/\" rel=\"noopener noreferrer\" target=\"_blank\"><strong>Project page<\/strong><\/a> \u00b7 <a href=\"https:\/\/github.com\/sjh0354\/Task-Agnostic-Pretrain\" rel=\"noopener noreferrer\" target=\"_blank\"><strong>GitHub<\/strong><\/a> \u00b7 <a href=\"https:\/\/arxiv.org\/pdf\/2607.02466\" rel=\"noopener noreferrer\" target=\"_blank\"><strong>Paper<\/strong><\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>A new framework aimed at the training bottleneck of Vision-Language-Action 
models. It uses fewer expert demonstrations to achieve manipulation capability close to that of large-scale data training.<\/p>\n","protected":false},"author":8,"featured_media":9845,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"ai_generated_summary":"","footnotes":""},"categories":[133,114,119,76,27,127,149,184,199,204],"tags":[],"class_list":["post-9846","post","type-post","status-publish","format-standard","hentry","category-133","category-clone","category-119","category-76","category-paper","category-127","category-149","category-robotic","category-dataset-","category-visionlanguageaction"],"_links":{"self":[{"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/posts\/9846","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/users\/8"}],"replies":[{"embeddable":true,"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/comments?post=9846"}],"version-history":[{"count":0,"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/posts\/9846\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/media\/9845"}],"wp:attachment":[{"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/media?parent=9846"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/categories?post=9846"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/tags?post=9846"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}