LongCat-Video 1.5: Generating More Practical Long Videos

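LongCat-Video is a 13.6B-parameter video generation project whose main pitch is putting text-to-video, image-to-video, and video continuation into a single architecture. For most users, the clearest benefit is that there is no need to find a separate model for each video task, so the workflow can stay in one place.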

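It targets problems that are common in long-video generation: distorted frames, color drift, and quality that gets worse the longer generation runs. The project notes that it natively pretrains the video continuation capability, which it says gives it an edge on long-duration content; the goal is to stay stable even when generating minute-long videos.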

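Start by deciding on the input: if you have a text concept, use Text-to-Video; if you have a single image, use Image-to-Video; if you want to extend an existing clip, use Video-Continuation. The project provides related models and extended versions, including LongCat-Video and LongCat-Video-Avatar 1.5, with model pages on Hugging Face and ModelScope.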
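
The input-based choice described above can be sketched as a small dispatcher. This is only an illustration: `choose_task`, its parameters, and the returned task names are hypothetical and are not part of the LongCat-Video codebase. It assumes that the most specific input wins (a clip over an image, an image over text alone).

```python
from typing import Optional


def choose_task(
    prompt: Optional[str] = None,
    image_path: Optional[str] = None,
    video_path: Optional[str] = None,
) -> str:
    """Pick a task name from whichever inputs are available.

    Hypothetical helper for illustration; the names returned here
    are placeholders, not identifiers from the project.
    """
    if video_path is not None:
        # An existing clip to extend takes priority.
        return "video_continuation"
    if image_path is not None:
        # A single reference image drives the generation.
        return "image_to_video"
    if prompt:
        # Only a text concept is available.
        return "text_to_video"
    raise ValueError("Provide a prompt, an image, or a video clip.")
```

A prompt can still accompany an image or a clip; in this sketch it simply does not change which task is selected.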

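It emphasizes both speed and visual quality. The project states that a coarse-to-fine generation strategy along the temporal and spatial axes, combined with Block Sparse Attention, can produce 720p, 30fps video within minutes. This kind of design matters most for high-resolution generation, because the most common bottlenecks for video models are compute and waiting time.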
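
To give a feel for why compute is the bottleneck, here is a rough back-of-the-envelope sketch. It assumes 720p means 1280×720, and the block counts passed to the sparse-attention helper are made-up illustrative values, not figures from the project. Both helper functions are hypothetical and say nothing about how LongCat-Video actually implements Block Sparse Attention.

```python
def raw_pixel_count(width: int, height: int, fps: int, seconds: int) -> int:
    """Number of pixels in an uncompressed clip (before any latent compression)."""
    return width * height * fps * seconds


def block_sparse_fraction(num_key_blocks: int, kept_per_query_block: int) -> float:
    """Fraction of attention block pairs computed when each query block
    attends to only `kept_per_query_block` of the `num_key_blocks` key blocks."""
    if not 0 < kept_per_query_block <= num_key_blocks:
        raise ValueError("kept_per_query_block must be in 1..num_key_blocks")
    return kept_per_query_block / num_key_blocks


# One minute of 720p video at 30fps, assuming 1280x720 frames.
one_minute_pixels = raw_pixel_count(1280, 720, 30, 60)

# Illustrative only: keeping 8 of 100 key blocks per query block.
fraction = block_sparse_fraction(num_key_blocks=100, kept_per_query_block=8)
```

Under these assumptions one minute of video is about 1.66 billion raw pixels, and a block-sparse pattern that keeps 8 of 100 key blocks computes 8% of the block pairs that dense attention would.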

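A single model supports Text-to-Video, Image-to-Video, and Video-Continuation
Focuses on long-video generation, aiming to reduce color shift and quality degradation
Uses coarse-to-fine generation to speed up inference, balancing efficiency and resolution
Mentions using GRPO reinforcement learning with multiple rewards to improve overall performance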

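This project is best suited to people who follow open-source video generation, long-form clips, and character or scene continuity, and to developers who want to study unified video model design. According to the project, its performance is comparable to leading open-source models and recent commercial solutions, but for the finer scores and comparison details it is safer to read the technical report alongside the tables below.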

Evaluation Results

Text-to-Video

The Text-to-Video MOS evaluation results on our internal benchmark.

MOS score	Veo3	PixVerse-V5	Wan 2.2-T2V-A14B	LongCat-Video
Accessibility	Proprietary	Proprietary	Open Source	Open Source
Architecture	–	–	MoE	Dense
# Total Params	–	–	28B	13.6B
# Activated Params	–	–	14B	13.6B
Text-Alignment↑	3.99	3.81	3.70	3.76
Visual Quality↑	3.23	3.13	3.26	3.25
Motion Quality↑	3.86	3.81	3.78	3.74
Overall Quality↑	3.48	3.36	3.35	3.38
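
As a reading aid for the table above, the following sketch ranks LongCat-Video against the other three models on each metric. The scores are copied from the Text-to-Video table; the `rank_of` helper is a hypothetical convenience, and ranking by raw MOS ignores any statistical uncertainty in the scores.

```python
MODELS = ["Veo3", "PixVerse-V5", "Wan 2.2-T2V-A14B", "LongCat-Video"]

# Scores copied from the Text-to-Video MOS table, in MODELS order.
T2V_MOS = {
    "Text-Alignment": [3.99, 3.81, 3.70, 3.76],
    "Visual Quality": [3.23, 3.13, 3.26, 3.25],
    "Motion Quality": [3.86, 3.81, 3.78, 3.74],
    "Overall Quality": [3.48, 3.36, 3.35, 3.38],
}


def rank_of(model: str, scores: list) -> int:
    """1-based rank of `model` for one metric (higher score is better)."""
    own = scores[MODELS.index(model)]
    return 1 + sum(1 for s in scores if s > own)


ranks = {metric: rank_of("LongCat-Video", scores) for metric, scores in T2V_MOS.items()}
for metric, rank in ranks.items():
    print(f"{metric}: rank {rank} of {len(MODELS)}")
```

On these numbers LongCat-Video places second on Visual Quality and Overall Quality, third on Text-Alignment, and fourth on Motion Quality.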

Image-to-Video

The Image-to-Video MOS evaluation results on our internal benchmark.

MOS score	Seedance 1.0	Hailuo-02	Wan 2.2-I2V-A14B	LongCat-Video
Accessibility	Proprietary	Proprietary	Open Source	Open Source
Architecture	–	–	MoE	Dense
# Total Params	–	–	28B	13.6B
# Activated Params	–	–	14B	13.6B
Image-Alignment↑	4.12	4.18	4.18	4.04
Text-Alignment↑	3.70	3.85	3.33	3.49
Visual Quality↑	3.22	3.18	3.23	3.27
Motion Quality↑	3.77	3.80	3.79	3.59
Overall Quality↑	3.35	3.27	3.26	3.17

GitHub: https://github.com/meituan-longcat/LongCat-Video