- Nanjing University
- Nanjing
- https://czczup.github.io/
Stars
Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
This is the official implementation of the paper "Needle In A Multimodal Haystack"
✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
An efficient, flexible and full-featured toolkit for fine-tuning LLMs (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
PsyDI: An MBTI agent that helps you understand your personality type through a relaxed multi-modal interaction.
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, B…
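MuLan: Adapting Multilingual Diffusion Models for 110+ Languages (adds multilingual support to any diffusion model without additional training)
[ICLR'24 spotlight] Chinese and English Multimodal Large Model Series (Chat and Paint) | A Chinese-English bilingual multimodal large model series built on the CPM foundation models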
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
ms-swift: Use PEFT or full-parameter training to fine-tune 250+ LLMs or 35+ MLLMs. (Qwen2, GLM4, Internlm2, Yi, Llama3, Llava, MiniCPM-V, Deepseek, Baichuan2, Phi3-Vision, ...)
An OpenAI API-compatible API for chatting with image input and asking questions about the images (aka multimodal).
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.
ICML'2024 | MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. A commercially usable open-source multimodal chat model approaching GPT-4V performance
Accelerating the development of large multimodal models (LMMs) with lmms-eval
[ICLR 2024 Spotlight] Bounding Box Stability against Feature Dropout Reflects Detector Generalization across Environments
Data and benchmark code for the EgoExoLearn dataset
Open source implementation and models of One-step Diffusion with Distribution Matching Distillation
Official repository of NEFTune: Noisy Embeddings Improve Instruction Finetuning
The suite of modeling video with Mamba