
{"id":9456,"date":"2026-06-21T05:36:17","date_gmt":"2026-06-20T21:36:17","guid":{"rendered":"https:\/\/infernews.com\/blog\/spatial-tool-use-elicits-reasoning-for-spatial-intelligence\/"},"modified":"2026-06-21T05:39:53","modified_gmt":"2026-06-20T21:39:53","slug":"spatial-tool-use-elicits-reasoning-for-spatial-intelligence","status":"publish","type":"post","link":"https:\/\/infernews.com\/blog\/spatial-tool-use-elicits-reasoning-for-spatial-intelligence\/","title":{"rendered":"S-Agent brings visual reasoning into 3D scene memory"},"content":{"rendered":"\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/infernews.com\/blog\/wp-content\/uploads\/2026\/06\/pasted-0c2beeae9bb6.jpg\" alt=\"Watch the S-Agent demo video on YouTube\"\/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Many Vision-Language Models today still make spatial judgments from a single image in a single answering step; even when an agent is added, it usually runs as stateless inference and cannot keep a running record of how the scene changes. S-Agent instead recasts spatial reasoning as spatio-temporal evidence accumulation: rather than guessing an answer on the spot, it gathers 2D, 3D, and temporal evidence step by step.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This is a research project built around an <strong>agentic visual reasoning framework<\/strong>, aimed at 3D spatial understanding in multi-view images and video. It uses a Vision-Language Model as a semantic planner and pairs it with hierarchical spatial tools, Scene Memory, and Agent Memory to handle 
counting, measurement, orientation, and relative position, tasks that single-frame methods tend to get wrong.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Most comparable approaches stop at frame-level prediction. S-Agent takes a clearly different route: it first grounds the objects, then performs 2D-to-3D lifting, and finally integrates the geometric cues into a scene-centric understanding that can be reasoned over. The cost of this design is a system more complex than single-shot question answering, and more dependent on the tool chain, memory state, and a multi-step reasoning pipeline; it is not a lightweight project.<\/p>\n\n\n<figure class=\"wp-block-embed-youtube wp-block-embed is-type-video is-provider-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"lyte-wrapper\" title=\"S-Agent: Spatial tool-use elicits reasoning for spatial intelligence.sagent demo video\" style=\"width:853px;max-width:100%;margin:5px auto;\"><div class=\"lyMe\" id=\"WYL_f89D0ZCJWKo\" itemprop=\"video\" itemscope itemtype=\"https:\/\/schema.org\/VideoObject\"><div><meta itemprop=\"thumbnailUrl\" content=\"https:\/\/infernews.com\/blog\/wp-content\/plugins\/wp-youtube-lyte\/lyteCache.php?origThumbUrl=https%3A%2F%2Fi.ytimg.com%2Fvi%2Ff89D0ZCJWKo%2Fhqdefault.jpg\" \/><meta itemprop=\"embedURL\" content=\"https:\/\/www.youtube.com\/embed\/f89D0ZCJWKo\" \/><meta itemprop=\"duration\" content=\"PT56S\" \/><meta itemprop=\"uploadDate\" content=\"2026-06-18T18:56:26Z\" \/><\/div><div id=\"lyte_f89D0ZCJWKo\" data-src=\"https:\/\/infernews.com\/blog\/wp-content\/plugins\/wp-youtube-lyte\/lyteCache.php?origThumbUrl=https%3A%2F%2Fi.ytimg.com%2Fvi%2Ff89D0ZCJWKo%2Fhqdefault.jpg\" class=\"pL\"><div class=\"tC\"><div class=\"tT\" itemprop=\"name\">S-Agent: Spatial tool-use elicits reasoning for spatial 
intelligence.sagent demo video<\/div><\/div><div class=\"play\"><\/div><div class=\"ctrl\"><div class=\"Lctrl\"><\/div><div class=\"Rctrl\"><\/div><\/div><\/div><noscript><a href=\"https:\/\/youtu.be\/f89D0ZCJWKo\" rel=\"nofollow noopener\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/infernews.com\/blog\/wp-content\/plugins\/wp-youtube-lyte\/lyteCache.php?origThumbUrl=https%3A%2F%2Fi.ytimg.com%2Fvi%2Ff89D0ZCJWKo%2F0.jpg\" alt=\"S-Agent: Spatial tool-use elicits reasoning for spatial intelligence.sagent demo video\" width=\"853\" height=\"460\" \/><br \/>Watch this video on YouTube<\/a><\/noscript><meta itemprop=\"description\" content=\"Arxiv&#039;26\"><\/div><\/div><div class=\"lL\" style=\"max-width:100%;width:853px;margin:5px auto;\"><\/div><figcaption><\/figcaption><\/figure>\n\n\n<p class=\"wp-block-paragraph\">The GitHub repository currently provides only the paper and demo information; code, data, and checkpoints are still marked coming soon. For now the project is better read as a research direction than as a ready-to-run tool. To assess its value, a reasonable approach is to watch for the inference \/ evaluation code once it is released and compare results on multi-view and video spatial reasoning benchmarks such as MMSI-Bench.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The core claim is to replace isolated frame-level prediction with <strong>spatio-temporal evidence accumulation<\/strong><\/li>\n\n\n\n<li>The system consists of a VLM semantic planner, a hierarchy of spatial tools, Scene Memory, and Agent Memory<\/li>\n\n\n\n<li>The paper reports that it improves Gemini-3-Pro in the zero-shot setting, and that after SFT, S-Agent-8B also comes close to high-end closed-source 
models<\/li>\n\n\n\n<li>Worth watching for teams researching spatial intelligence, multi-view reasoning, and video understanding<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">As for related models, the text explicitly mentions Gemini-3-Pro, Qwen-VL-8B, and the distilled S-Agent-8B. If you care about how AI, beyond computer-use agents (CUAs), can truly understand a continuous 3D world, this project offers more research value than ordinary image question answering.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>GitHub:<\/strong> <a href=\"https:\/\/github.com\/Ropedia\/S-Agent\" rel=\"noopener noreferrer\">https:\/\/github.com\/Ropedia\/S-Agent<\/a><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Project:<\/strong> <a href=\"https:\/\/ropedia.github.io\/S-Agent\/\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/ropedia.github.io\/S-Agent\/<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>S-Agent does not judge from a single image; it gradually accumulates cues from multiple views and video. It focuses on spatial reasoning, and its direction, compared with a typical Vision-Language Model, 
is more ambitious.<\/p>\n","protected":false},"author":8,"featured_media":9455,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"ai_generated_summary":"","footnotes":""},"categories":[133,171,185,140,116,38,132,119,76,149,197],"tags":[],"class_list":["post-9456","post","type-post","status-publish","format-standard","hentry","category-133","category-171","category-qwen","category-gemini","category-agentic","category-38","category-3d","category-119","category-76","category-149","category-framework"],"_links":{"self":[{"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/posts\/9456","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/users\/8"}],"replies":[{"embeddable":true,"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/comments?post=9456"}],"version-history":[{"count":3,"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/posts\/9456\/revisions"}],"predecessor-version":[{"id":9461,"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/posts\/9456\/revisions\/9461"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/media\/9455"}],"wp:attachment":[{"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/media?parent=9456"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/categories?post=9456"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/tags?post=9456"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}