Mixture of Agents (MoA) is a framework that leverages the collective strengths of multiple LLMs. Each layer contains multiple agents that refine responses using the outputs from the preceding layer.
Together MoA achieves a score of 65.1% on AlpacaEval 2.0. It uses six open-source models as proposers and Qwen1.5-110B-Chat as the final aggregator, with three layers.
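For readers curious how the layered refinement plays out in practice, here is a minimal Python sketch of the flow described above. The query_model helper, the proposer placeholders, and the aggregation prompt wording are illustrative assumptions, not the reference implementation (see the code link below for that).

```python
# Minimal sketch of a Mixture-of-Agents loop: each layer's proposers see the
# previous layer's responses, and a final aggregator synthesizes one answer.
# Model names other than the aggregator and the query_model helper are placeholders.

PROPOSERS = [f"proposer-model-{i}" for i in range(1, 7)]  # six open-source proposers
AGGREGATOR = "Qwen/Qwen1.5-110B-Chat"
NUM_LAYERS = 3

AGGREGATE_PROMPT = (
    "You have been provided with responses from several models to the user query. "
    "Synthesize them into a single, high-quality response.\n\nResponses:\n{responses}"
)

def query_model(model: str, system: str, user: str) -> str:
    """Hypothetical helper: call a chat-completion endpoint and return the text."""
    raise NotImplementedError  # e.g., wire this to your inference provider of choice

def mixture_of_agents(user_query: str) -> str:
    previous: list[str] = []
    for _ in range(NUM_LAYERS):
        # Proposers in this layer refine using the preceding layer's outputs.
        system = AGGREGATE_PROMPT.format(responses="\n\n".join(previous)) if previous else ""
        previous = [query_model(m, system, user_query) for m in PROPOSERS]
    # The final aggregator combines the last layer's proposals into one answer.
    final_system = AGGREGATE_PROMPT.format(responses="\n\n".join(previous))
    return query_model(AGGREGATOR, final_system, user_query)
```

Because every layer must finish before the next one starts, latency grows with the number of layers, which is also why the demo's time to first token is slower than a single-model call.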
We also evaluate on FLASK, which offers more fine-grained evaluation; Together MoA outperforms the original models on most dimensions. Both Together MoA and Together MoA-Lite lie on the Pareto front, indicated by the dashed curve, in the performance vs. cost plot.
Try Together MoA through our interactive demo. Please note that the time to first token (TTFT) is currently slow due to MoA's iterative refinement process, but we are actively working on optimizations.
📣 Blog: https://lnkd.in/g26Ya_98
📄 Paper: https://lnkd.in/g_vc9Hac
⌨️ Code: https://lnkd.in/gF2xEEeb
🖥️ CLI Demo: https://lnkd.in/gifKtUQv
This work was made possible through the collaborative efforts of several open-source projects. We appreciate Meta, Mistral AI, Microsoft, Alibaba Cloud, and Databricks for developing the Llama, Mixtral, WizardLM, Qwen, and DBRX models. We also thank Tatsu Labs, lmsys.org, and KAIST AI for the AlpacaEval, MT-Bench, and FLASK evaluation benchmarks.