無縫 M4T
Meta 三個月前公佈的 SeamlessM4T (Massively Multilingual and Multimodal Machine Translation model) ,目前已更新到 v2,於 GitHub 開放下載最新的源碼。SeamlessM4T v2 採用 UnitY2 架構的更新版本。與 SeamlessM4T v1 相比,此新模型在品質以及語音生成任務中的推理延遲方面有所改進。
M4T 是一體式大規模多語言和多模式的機器翻譯模型,可為近 100 種語言的語音和文字提供高品質翻譯。
SeamlessM4T 模型支援以下任務:
- 語音轉語音翻譯 (S2ST)
- 語音轉文字翻譯 (S2TT)
- 文字轉語音翻譯 (T2ST)
- 文本到文本翻譯 (T2TT)
- 自動語音辨識 (ASR)
下面列出了 SeamlessM4T-large (v1/v2) 支援的語言。 來源列指定是否支援某種語言作為來源語音 (Sp) 和/或來源文字 (Tx)。 目標列指定是否支援某語言作為目標語音 (Sp) 和/或目標文字 (Tx)。可惜暫時未見有廣東話 tts!
| 編碼 | 語言 | script | 來源 | 目標 |
|---|---|---|---|---|
| afr | Afrikaans | Latn | Sp, Tx | Tx |
| amh | Amharic | Ethi | Sp, Tx | Tx |
| arb | Modern Standard Arabic | Arab | Sp, Tx | Sp, Tx |
| ary | Moroccan Arabic | Arab | Sp, Tx | Tx |
| arz | Egyptian Arabic | Arab | Sp, Tx | Tx |
| asm | Assamese | Beng | Sp, Tx | Tx |
| ast | Asturian | Latn | Sp | — |
| azj | North Azerbaijani | Latn | Sp, Tx | Tx |
| bel | Belarusian | Cyrl | Sp, Tx | Tx |
| ben | Bengali | Beng | Sp, Tx | Sp, Tx |
| bos | Bosnian | Latn | Sp, Tx | Tx |
| bul | Bulgarian | Cyrl | Sp, Tx | Tx |
| cat | Catalan | Latn | Sp, Tx | Sp, Tx |
| ceb | Cebuano | Latn | Sp, Tx | Tx |
| ces | Czech | Latn | Sp, Tx | Sp, Tx |
| ckb | Central Kurdish | Arab | Sp, Tx | Tx |
| cmn | Mandarin Chinese | Hans | Sp, Tx | Sp, Tx |
| cmn_Hant | Mandarin Chinese | Hant | Sp, Tx | Sp, Tx |
| cym | Welsh | Latn | Sp, Tx | Sp, Tx |
| dan | Danish | Latn | Sp, Tx | Sp, Tx |
| deu | German | Latn | Sp, Tx | Sp, Tx |
| ell | Greek | Grek | Sp, Tx | Tx |
| eng | English | Latn | Sp, Tx | Sp, Tx |
| est | Estonian | Latn | Sp, Tx | Sp, Tx |
| eus | Basque | Latn | Sp, Tx | Tx |
| fin | Finnish | Latn | Sp, Tx | Sp, Tx |
| fra | French | Latn | Sp, Tx | Sp, Tx |
| fuv | Nigerian Fulfulde | Latn | Sp, Tx | Tx |
| gaz | West Central Oromo | Latn | Sp, Tx | Tx |
| gle | Irish | Latn | Sp, Tx | Tx |
| glg | Galician | Latn | Sp, Tx | Tx |
| guj | Gujarati | Gujr | Sp, Tx | Tx |
| heb | Hebrew | Hebr | Sp, Tx | Tx |
| hin | Hindi | Deva | Sp, Tx | Sp, Tx |
| hrv | Croatian | Latn | Sp, Tx | Tx |
| hun | Hungarian | Latn | Sp, Tx | Tx |
| hye | Armenian | Armn | Sp, Tx | Tx |
| ibo | Igbo | Latn | Sp, Tx | Tx |
| ind | Indonesian | Latn | Sp, Tx | Sp, Tx |
| isl | Icelandic | Latn | Sp, Tx | Tx |
| ita | Italian | Latn | Sp, Tx | Sp, Tx |
| jav | Javanese | Latn | Sp, Tx | Tx |
| jpn | Japanese | Jpan | Sp, Tx | Sp, Tx |
| kam | Kamba | Latn | Sp | — |
| kan | Kannada | Knda | Sp, Tx | Tx |
| kat | Georgian | Geor | Sp, Tx | Tx |
| kaz | Kazakh | Cyrl | Sp, Tx | Tx |
| kea | Kabuverdianu | Latn | Sp | — |
| khk | Halh Mongolian | Cyrl | Sp, Tx | Tx |
| khm | Khmer | Khmr | Sp, Tx | Tx |
| kir | Kyrgyz | Cyrl | Sp, Tx | Tx |
| kor | Korean | Kore | Sp, Tx | Sp, Tx |
| lao | Lao | Laoo | Sp, Tx | Tx |
| lit | Lithuanian | Latn | Sp, Tx | Tx |
| ltz | Luxembourgish | Latn | Sp | — |
| lug | Ganda | Latn | Sp, Tx | Tx |
| luo | Luo | Latn | Sp, Tx | Tx |
| lvs | Standard Latvian | Latn | Sp, Tx | Tx |
| mai | Maithili | Deva | Sp, Tx | Tx |
| mal | Malayalam | Mlym | Sp, Tx | Tx |
| mar | Marathi | Deva | Sp, Tx | Tx |
| mkd | Macedonian | Cyrl | Sp, Tx | Tx |
| mlt | Maltese | Latn | Sp, Tx | Sp, Tx |
| mni | Meitei | Beng | Sp, Tx | Tx |
| mya | Burmese | Mymr | Sp, Tx | Tx |
| nld | Dutch | Latn | Sp, Tx | Sp, Tx |
| nno | Norwegian Nynorsk | Latn | Sp, Tx | Tx |
| nob | Norwegian Bokmål | Latn | Sp, Tx | Tx |
| npi | Nepali | Deva | Sp, Tx | Tx |
| nya | Nyanja | Latn | Sp, Tx | Tx |
| oci | Occitan | Latn | Sp | — |
| ory | Odia | Orya | Sp, Tx | Tx |
| pan | Punjabi | Guru | Sp, Tx | Tx |
| pbt | Southern Pashto | Arab | Sp, Tx | Tx |
| pes | Western Persian | Arab | Sp, Tx | Sp, Tx |
| pol | Polish | Latn | Sp, Tx | Sp, Tx |
| por | Portuguese | Latn | Sp, Tx | Sp, Tx |
| ron | Romanian | Latn | Sp, Tx | Sp, Tx |
| rus | Russian | Cyrl | Sp, Tx | Sp, Tx |
| slk | Slovak | Latn | Sp, Tx | Sp, Tx |
| slv | Slovenian | Latn | Sp, Tx | Tx |
| sna | Shona | Latn | Sp, Tx | Tx |
| snd | Sindhi | Arab | Sp, Tx | Tx |
| som | Somali | Latn | Sp, Tx | Tx |
| spa | Spanish | Latn | Sp, Tx | Sp, Tx |
| srp | Serbian | Cyrl | Sp, Tx | Tx |
| swe | Swedish | Latn | Sp, Tx | Sp, Tx |
| swh | Swahili | Latn | Sp, Tx | Sp, Tx |
| tam | Tamil | Taml | Sp, Tx | Tx |
| tel | Telugu | Telu | Sp, Tx | Sp, Tx |
| tgk | Tajik | Cyrl | Sp, Tx | Tx |
| tgl | Tagalog | Latn | Sp, Tx | Sp, Tx |
| tha | Thai | Thai | Sp, Tx | Sp, Tx |
| tur | Turkish | Latn | Sp, Tx | Sp, Tx |
| ukr | Ukrainian | Cyrl | Sp, Tx | Sp, Tx |
| urd | Urdu | Arab | Sp, Tx | Sp, Tx |
| uzn | Northern Uzbek | Latn | Sp, Tx | Sp, Tx |
| vie | Vietnamese | Latn | Sp, Tx | Sp, Tx |
| xho | Xhosa | Latn | Sp | — |
| yor | Yoruba | Latn | Sp, Tx | Tx |
| yue | Cantonese | Hant | Sp, Tx | Tx |
| zlm | Colloquial Malay | Latn | Sp | — |
| zsm | Standard Malay | Latn | Tx | Tx |
| zul | Zulu | Latn | Sp, Tx | Tx |
請注意,seamlessM4T-medium 在文字模式中支援 200 種語言,並且基於 NLLB-200(請參閱資產卡中的完整清單)
Hugging Face Demo (A100 GPU)