
{"id":8628,"date":"2026-05-30T16:30:45","date_gmt":"2026-05-30T08:30:45","guid":{"rendered":"https:\/\/infernews.com\/blog\/how-far-is-video-generation-from-world-model-a-causality-perspective\/"},"modified":"2026-05-30T16:31:28","modified_gmt":"2026-05-30T08:31:28","slug":"how-far-is-video-generation-from-world-model-a-causality-perspective","status":"publish","type":"post","link":"https:\/\/infernews.com\/blog\/how-far-is-video-generation-from-world-model-a-causality-perspective\/","title":{"rendered":"YoCausal Tests a Model's Sense of Causality with Reversed Video"},"content":{"rendered":"\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/infernews.com\/blog\/wp-content\/uploads\/2026\/05\/pasted-2afb58e2e3b1.jpg\" alt=\"YoCausal Logo\"\/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">YoCausal is a project for evaluating Video Diffusion Models (VDMs). Its core question is direct: when a model sees a video, does it actually understand the causal structure of the events, or has it only memorized temporal patterns that commonly appear on screen? The project compares the denoising loss on forward and reversed versions of the same video; if the model scores the forward version as more plausible, it is better at telling natural cause and effect apart from its reversal.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">It proposes two key metrics: the Reverse Surprise Index (RSI) and the Causality Cognition Index (CCI). RSI mainly measures how often the model prefers the forward flow of time. CCI goes a step further and separates knowing the direction of time from genuinely understanding causality, so that a model relying only on temporal cues is not mistaken for one that understands cause and effect.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Using the project is not about training a new model. Instead, you write an evaluator for an existing model and run the evaluation on the specified dataset. The project also provides a leaderboard submission format, which asks for the model name, the version or checkpoint, the model size, and the evaluation result JSON file; if you changed the default settings or the preprocessing protocol, you must describe those changes as well.<\/p>\n\n\n<figure class=\"wp-block-embed-youtube wp-block-embed is-type-video is-provider-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"lyte-wrapper\" title=\"YoCausal: How Far is Video Generation from World Model? 
A Causality Perspective\" style=\"width:853px;max-width:100%;margin:5px auto;\"><div class=\"lyMe\" id=\"WYL_dCaAfWTUCFA\" itemprop=\"video\" itemscope itemtype=\"https:\/\/schema.org\/VideoObject\"><div><meta itemprop=\"thumbnailUrl\" content=\"https:\/\/infernews.com\/blog\/wp-content\/plugins\/wp-youtube-lyte\/lyteCache.php?origThumbUrl=https%3A%2F%2Fi.ytimg.com%2Fvi%2FdCaAfWTUCFA%2Fhqdefault.jpg\" \/><meta itemprop=\"embedURL\" content=\"https:\/\/www.youtube.com\/embed\/dCaAfWTUCFA\" \/><meta itemprop=\"duration\" content=\"PT1M53S\" \/><meta itemprop=\"uploadDate\" content=\"2026-05-28T12:06:09Z\" \/><\/div><div id=\"lyte_dCaAfWTUCFA\" data-src=\"https:\/\/infernews.com\/blog\/wp-content\/plugins\/wp-youtube-lyte\/lyteCache.php?origThumbUrl=https%3A%2F%2Fi.ytimg.com%2Fvi%2FdCaAfWTUCFA%2Fhqdefault.jpg\" class=\"pL\"><div class=\"tC\"><div class=\"tT\" itemprop=\"name\">YoCausal: How Far is Video Generation from World Model? A Causality Perspective<\/div><\/div><div class=\"play\"><\/div><div class=\"ctrl\"><div class=\"Lctrl\"><\/div><div class=\"Rctrl\"><\/div><\/div><\/div><noscript><a href=\"https:\/\/youtu.be\/dCaAfWTUCFA\" rel=\"nofollow\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/infernews.com\/blog\/wp-content\/plugins\/wp-youtube-lyte\/lyteCache.php?origThumbUrl=https%3A%2F%2Fi.ytimg.com%2Fvi%2FdCaAfWTUCFA%2F0.jpg\" alt=\"YoCausal: How Far is Video Generation from World Model? A Causality Perspective\" width=\"853\" height=\"460\" \/><br \/>Watch this video on YouTube<\/a><\/noscript><meta itemprop=\"description\" content=\"TL;DR: YoCausal is the first benchmark evaluating causal cognition in video generation models, inspired by cognitive science experiments that test whether infants perceive causality using reversed videos. Our benchmark can incorporate any real-world video at zero cost, making it arbitrarily extensible to easily assess video generation models&#039; understanding of diverse types of causality. 
Project Page: https:\/\/www.youzhexie.me\/papers\/YoCausal\/index.html As video diffusion models (VDMs) advance toward world models, a key question arises: do they truly understand causality, or merely overfit to statistical temporal patterns? Existing benchmarks mostly rely on synthetic data, limiting real-world generalization due to the sim-to-real gap. We present YoCausal, a two-level benchmark inspired by the Violation of Expectation (VoE) paradigm from cognitive science. By temporally reversing real-world videos at zero cost as natural counterfactual samples, YoCausal establishes an arbitrarily extensible evaluation protocol. Level 1 introduces the Reverse Surprise Index (RSI), quantifying arrow-of-time perception via denoising loss. Level 2 introduces the Causality Cognition Index (CCI), which leverages a VLM to stratify datasets into causal and non-causal subsets, disentangling genuine causal reasoning from temporal bias. Evaluation of 13 state-of-the-art VDMs reveals that perceiving the arrow of time does not imply understanding causality, and a significant gap persists relative to human-level causal cognition.\"><\/div><\/div><div class=\"lL\" style=\"max-width:100%;width:853px;margin:5px auto;\"><\/div><figcaption><\/figcaption><\/figure>\n\n\n<ul class=\"wp-block-list\">\n<li>Uses reversed real-world videos as counterfactuals, which is closer to everyday scenes than purely synthetic data<\/li>\n\n\n\n<li>Compares forward and reversed playback by denoising loss, a measurement that is clear and extensible<\/li>\n\n\n\n<li>RSI measures perception of the direction of time; CCI tries to isolate the part that is closer to causal understanding<\/li>\n\n\n\n<li>13 state-of-the-art VDMs have been evaluated, and the results show that perceiving the direction of time is not the same as understanding causality<\/li>\n\n\n\n<li>The documentation mentions Wan Model Evaluation (DiffSynth-Studio) and also supports a leaderboard submission workflow<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">Judging from the paper, YoCausal's greatest value is in pointing out an often-overlooked gap: better-looking generated video does not mean a model is closer to a world model. The evaluation results show that even stronger models, such as Wan2.2-A14B, still appear to fall clearly short of the human baseline, while mid- and lower-ranked models such as CogVideoX1.5-5B and AnimateDiff-SDXL are more prone to producing frame changes that violate causality.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This project suits people who work on Video Diffusion Models (VDMs), world models, and the evaluation of video understanding and generation, as well as teams that want to compare the causal abilities of different models. For general developers, its most useful contribution is a more interpretable check that helps you tell whether a model loses points because it does not understand causality or only because it responds too weakly to the direction of time.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>GitHub:<\/strong> <a href=\"https:\/\/github.com\/youzhe0305\/YoCausal\" rel=\"noopener noreferrer\">https:\/\/github.com\/youzhe0305\/YoCausal<\/a><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Project:<\/strong> <a href=\"https:\/\/www.youzhexie.me\/papers\/YoCausal\/index.html\" rel=\"noopener 
noreferrer\">https:\/\/www.youzhexie.me\/papers\/YoCausal\/index.html<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>YoCausal does not teach you how to generate video; it checks whether a model genuinely understands the order of cause and effect. It uses reversed real-world video as its benchmark, and the design is quite clear.<\/p>\n","protected":false},"author":8,"featured_media":8627,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"ai_generated_summary":"","wpai_meta_description":"","footnotes":""},"categories":[133,132,149,186,197],"tags":[],"class_list":["post-8628","post","type-post","status-publish","format-standard","hentry","category-133","category-3d","category-149","category-186","category-framework"],"_links":{"self":[{"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/posts\/8628","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/users\/8"}],"replies":[{"embeddable":true,"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/comments?post=8628"}],"version-history":[{"count":1,"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/posts\/8628\/revisions"}],"predecessor-version":[{"id":8630,"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/posts\/8628\/revisions\/8630"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/media\/8627"}],"wp:attachment":[{"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/media?parent=8628"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/infernews.com\/blog\/wp-json\/wp\/v2\/categories?post=8628"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/infernews.com\/blog\/wp-json\/
wp\/v2\/tags?post=8628"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}