Arthur

Software Development

New York, New York 6,427 followers

The AI Performance Company

About us

The AI Performance Company. We work with enterprise teams to monitor, measure, and improve machine learning models for better results across accuracy, explainability, and fairness. We are deeply passionate about building technology to make AI work for everyone. Arthur is an equal opportunity employer and we believe strongly in "front-end ethics": building a sustainable company and industry where strong performance and a positive human impact are inextricably linked. We're hiring! Take a look at our open roles at arthur.ai/careers.

Website
https://arthur.ai/
Industry
Software Development
Company size
11-50 employees
Headquarters
New York, New York
Type
Privately Held
Founded
2018

Updates

6️⃣ Tools for Getting Started with LLM Experimentation & Development 🛠️🧰

With the field of AI changing at such a rapid pace, it can feel nearly impossible to stay up to date with the latest tools and techniques. Here are a few that our ML Research Scientist Max Cembalest thinks are productive, innovative, and easy to use! 🧑‍🔬

For Experimentation:
- LiteLLM (YC W23): A simple client API that makes it easy to test major LLM providers. It maintains enough of a common format for your LLM inputs for painless swapping between providers (see the sketch after this list).
- Ollama: A tool for experimenting with open-source models, with a git-like CLI to fetch the latest models (at various levels of quantization, so you can run them quickly from a laptop) and prompt them from the terminal.
- MLX: Built specifically for Apple hardware, MLX brings massive improvements to the speed and memory efficiency of running and training standard and state-of-the-art AI models on Apple devices.
- DSPy: Designed to be analogous to PyTorch: every time the LLM, retriever, evaluation criteria, or anything else is modified, DSPy can re-optimize a new set of prompts and examples that max out your evaluation criteria.

📊 For Evaluation:
- Elo: Traditionally used to rank chess players, the Elo rating system has been employed to compare the relative strengths of AI language models based on votes from human evaluators. It has become a popular and cost-effective general-purpose metric for quantitatively ranking LLMs from head-to-head blind A/B preference tests (a minimal update rule is sketched below).
- Arthur Bench: Last but not least, Bench is our open-source evaluation product for comparing LLMs, prompts, and hyperparameters for generative text models. It enables businesses to evaluate how different LLMs will perform in real-world scenarios so they can make informed, data-driven decisions when integrating the latest AI technologies into their operations.
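To make the LiteLLM idea concrete, here is a minimal sketch of swapping providers behind a single call shape. It assumes litellm is installed and the relevant API keys (e.g. OPENAI_API_KEY, ANTHROPIC_API_KEY) are set in the environment; the model names are illustrative and may differ by provider version:

    import litellm

    # One call shape for every provider; only the model string changes.
    for model in ["gpt-4o-mini", "claude-3-haiku-20240307"]:
        response = litellm.completion(
            model=model,
            messages=[{"role": "user", "content": "Name one LLM evaluation metric."}],
        )
        # LiteLLM normalizes outputs to an OpenAI-style response object.
        print(model, "->", response.choices[0].message.content)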
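And since the Elo update itself is just a two-line formula, here is a minimal sketch of how a single head-to-head vote moves two model ratings. The K-factor of 32 and the starting rating of 1000 are conventional choices, not values from any particular leaderboard:

    def elo_update(rating_a, rating_b, score_a, k=32):
        # score_a: 1.0 if A is preferred, 0.0 if B is, 0.5 for a tie.
        expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
        delta = k * (score_a - expected_a)
        return rating_a + delta, rating_b - delta

    # Two models start even; model A wins one blind A/B comparison.
    a, b = elo_update(1000.0, 1000.0, score_a=1.0)
    print(round(a), round(b))  # 1016 984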

2024 is the year of multimodal AI. 💬 🖼️ 🎥 🎤 AI systems are unlocking new applications and seeing improved performance by combining data types like text, images, video, and audio. In our latest blog post, learn about multimodal AI techniques, business use cases, and why it’s poised to revolutionize the way we interact with technology: https://bit.ly/4bXKy0f

Let’s talk LLM experimentation. 🧑‍🔬

One day, there may be a principled, scientific, and repeatable way to pick the right LLM and the right tools for any job. But until we have that, a degree of flexibility and ad-hoc artistry is necessary to decide which patchwork of features best serves an application’s needs. So, to keep experimenting and get the most value out of LLMs, it’s important to stay up to date on the latest tools and techniques.

In this comprehensive guide, we highlight a number of projects in three categories:
🤳 Touchpoints: quick, minimal LLM experimentation interfaces
⚖️ Evaluation: metrics and relevant benchmark datasets
🪄 Enhancing Prompts: RAG, APIs, and well-chosen examples for your LLM to see how it’s done (a small sketch follows)

Check it out: https://bit.ly/4e5PPEr
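For that last category, here is a minimal sketch of giving your LLM "well-chosen examples to see how it's done" (few-shot prompting). The message list uses the standard OpenAI-style chat format; the final client call is left as a comment because the client object and model name depend on your setup:

    # Few-shot prompting: show the model worked examples before the real query.
    examples = [
        ("The food was cold and the staff ignored us.", "negative"),
        ("Absolutely delightful, we will be back!", "positive"),
    ]

    messages = [{"role": "system",
                 "content": "Classify each review's sentiment as positive or negative."}]
    for review, label in examples:
        messages.append({"role": "user", "content": review})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": "Great value, but the wait was long."})

    # Any OpenAI-compatible client accepts this list, e.g.:
    # response = client.chat.completions.create(model="<your-model>", messages=messages)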

Our CEO Adam Wenchel caught up with the folks at Bloomberg Technology yesterday to talk about the latest in enterprise adoption of generative AI. He also discussed Arthur’s recent Generative Assessment Project, a study in which we evaluated the industry’s top LLMs (from providers like OpenAI, Anthropic, Meta, and more) on answering questions and staying grounded in the provided context. 👉 Check out the full study here: https://bit.ly/3V3O4Pl

Large language models, small language models, closed-source models, open-source models: how do you know which of these to use and whether they’ve contributed to positive ROI? Register for our webinar next Thursday to learn how to more easily run language models, compare them, evaluate them, and understand their performance: https://bit.ly/3QDGjhF

