ByteDance and Zhejiang University jointly launched Vista-LLaMA, a multimodal large language model that can interpret video content

ByteDance has partnered with Zhejiang University to launch Vista-LLaMA, a multimodal large language model designed to understand video content and generate high-quality video descriptions. By processing visual and text tokens in a new way, Vista-LLaMA addresses the problem of “hallucinations” in video descriptions, where the model generates text that does not match what actually appears in the video.

Vista-LLaMA performs strongly on several open video question-answering benchmarks, most notably NExT-QA and MSRVTT-QA. It achieved 60.7% accuracy on the zero-shot NExT-QA test and 60.5% on the MSRVTT-QA test, surpassing the current state-of-the-art (SOTA) methods. These results demonstrate Vista-LLaMA's effectiveness and accuracy in video content understanding and description generation.