BAAI's BGE open-sources a family of multimodal embedding models. Let's go!

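The Beijing Academy of Artificial Intelligence (BAAI) has open-sourced BGE-VL, a family of multimodal retrieval models that includes BGE-VL-CLIP (base and large versions) and BGE-VL-MLLM.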

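The BGE-VL-CLIP models are easy to use. They accept image, text, or combined image-and-text queries and retrieve images and text, which makes them suitable for RAG and agentic applications.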

import torch
from transformers import AutoModel

MODEL_NAME = "BAAI/BGE-VL-base"  # or "BAAI/BGE-VL-large"

model = AutoModel.from_pretrained(MODEL_NAME, trust_remote_code=True)  # You must set trust_remote_code=True
model.set_processor(MODEL_NAME)
model.eval()

with torch.no_grad():
    # Composed query: a reference image plus a text instruction
    query = model.encode(
        images="./assets/cir_query.png",
        text="Make the background dark, as if the camera has taken the photo at night"
    )

    # Candidate images to rank against the query
    candidates = model.encode(
        images=["./assets/cir_candi_1.png", "./assets/cir_candi_2.png"]
    )

    # One similarity score per (query, candidate) pair
    scores = query @ candidates.T
print(scores)
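The `scores` matrix above holds one similarity value per query-candidate pair, so retrieval comes down to sorting candidates by score. The sketch below shows that ranking step in plain Python on made-up 3-dimensional vectors. The helper names `normalize`, `dot`, and `rank_candidates` are hypothetical, and the vectors are stand-ins for real `model.encode` outputs. It normalizes explicitly so that the dot product is a cosine similarity, without assuming anything about what `model.encode` returns.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank_candidates(query_vec, candidate_vecs, top_k=None):
    """Return (index, score) pairs sorted from most to least similar."""
    q = normalize(query_vec)
    scored = [(i, dot(q, normalize(c))) for i, c in enumerate(candidate_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored if top_k is None else scored[:top_k]


# Toy stand-ins for embeddings (hypothetical values, not model output).
query = [1.0, 0.0, 1.0]
candidates = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [2.0, 0.0, 2.0],  # same direction as the query
    [1.0, 1.0, 0.0],  # partially aligned with the query
]

ranking = rank_candidates(query, candidates, top_k=2)
for index, score in ranking:
    print(f"candidate {index}: cosine similarity {score:.3f}")
```

In a RAG or agentic pipeline, the top-ranked indices would then be mapped back to the stored images or documents before being passed to the generator.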

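BGE-VL is trained on MegaPairs, a dataset built with a novel data synthesis method that uses open-domain images to create heterogeneous KNN triplets for universal multimodal retrieval. It contains more than 26 million triplets.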
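To make the idea of KNN-mined triplets more concrete, here is a minimal sketch of how nearest-neighbour search over image embeddings could yield (query image, instruction, target image) triplets. It is an illustration only, not the MegaPairs pipeline: the image names, the 2-dimensional embeddings, the `mine_triplets` helper, and the placeholder instruction string are all hypothetical, and this article does not describe how the real instruction text is produced.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mine_triplets(embeddings, k=1):
    """For each image, pair it with its k most similar other images.

    Returns (query_id, instruction, target_id) tuples. The instruction
    is a placeholder; generating real instructions is out of scope here.
    """
    triplets = []
    for query_id, query_vec in embeddings.items():
        neighbours = sorted(
            (other for other in embeddings if other != query_id),
            key=lambda other: cosine(query_vec, embeddings[other]),
            reverse=True,
        )
        for target_id in neighbours[:k]:
            triplets.append((query_id, "<instruction placeholder>", target_id))
    return triplets


# Hypothetical image embeddings (made-up values for illustration).
embeddings = {
    "cat_day.png": [1.0, 0.1],
    "cat_night.png": [0.9, 0.3],
    "car.png": [0.0, 1.0],
}

triplets = mine_triplets(embeddings, k=1)
for query_id, instruction, target_id in triplets:
    print(query_id, "->", target_id)
```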

Zero-shot composed image retrieval

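BGE-VL sets a new state of the art on zero-shot composed image retrieval. On the CIRCO benchmark, the BGE-VL-base model, with only 149 million parameters, outperforms all previous models, including ones with 50 times as many parameters. In addition, BGE-VL-MLLM improves on the previous state-of-the-art model by 8.1%.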

Zero-shot performance on MMEB

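Although it was trained only under the image-text-to-image paradigm, BGE-VL-MLLM achieves state-of-the-art zero-shot performance on the Massive Multimodal Embedding Benchmark (MMEB). This indicates that MegaPairs generalizes well to multimodal embedding tasks.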


https://github.com/VectorSpaceLab/MegaPairs
https://hf-mirror.com/BAAI/BGE-VL-MLLM-S2
https://arxiv.org/pdf/2412.14475

(By PaperAgent)
