Arthur

Software Development

New York, New York 6,427 followers

The AI Performance Company

About us

The AI Performance Company. We work with enterprise teams to monitor, measure, and improve machine learning models for better results across accuracy, explainability, and fairness. We are deeply passionate about building technology to make AI work for everyone. Arthur is an equal opportunity employer and we believe strongly in "front-end ethics": building a sustainable company and industry where strong performance and a positive human impact are inextricably linked. We're hiring! Take a look at our open roles at arthur.ai/careers.

Website: https://arthur.ai/
Industry: Software Development
Company size: 11-50 employees
Headquarters: New York, New York
Type: Privately Held
Founded: 2018

Locations

Primary

140 Crosby St

6th Floor

New York, New York 10012, US

1140 3rd St NE

Washington, District of Columbia, US

Updates

Arthur

20h
6️⃣ Tools for Getting Started with LLM Experimentation & Development 🛠️🧰

With the field of AI changing at such a rapid pace, it can feel nearly impossible to stay up to date with the latest tools and techniques. Here are a few that our ML Research Scientist Max Cembalest thinks are productive, innovative, and easy to use!

🧑‍🔬 For Experimentation:
- LiteLLM (YC W23): A simple client API that makes it easy to test major LLM providers. It maintains enough of a common format for your LLM inputs for painless swapping between providers.
- Ollama: A tool for experimenting with open-source models, with a git-like CLI to fetch all the latest models (at various levels of quantization so you can run quickly from a laptop) and prompt from the terminal.
- MLX: Built specifically for Apple hardware, MLX brings massive improvements to the speed and memory-efficiency of running and training all the standard and state-of-the-art AI models on Apple devices.
- DSPy: Designed to be analogous to PyTorch: every time the LLM, retriever, evaluation criteria, or anything else is modified, DSPy can re-optimize a new set of prompts and examples that max out your evaluation criteria.

📊 For Evaluation:
- Elo: Traditionally used to rank chess players, the Elo rating system has been employed to compare the relative strengths of various AI language models based on votes from human evaluators. It has become a very popular and cost-effective general-purpose metric to quantitatively rank LLMs from head-to-head blind A/B preference tests.
- Arthur Bench: Last but not least, Bench is our open-source evaluation product for comparing LLMs, prompts, and hyperparameters for generative text models. It enables businesses to evaluate how different LLMs will perform in real-world scenarios so they can make informed, data-driven decisions when integrating the latest AI technologies into their operations.
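To make the Elo idea concrete, here is a minimal sketch of how ratings could be updated from head-to-head preference votes. It assumes the standard chess-style Elo update with a K-factor of 32 and a starting rating of 1000; the function names and model names are hypothetical, and this is not necessarily how any particular leaderboard or Arthur Bench computes its scores.

```python
from typing import Dict, Iterable, List, Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float,
               k: float = 32.0) -> Tuple[float, float]:
    """Return new ratings after one comparison.

    score_a is 1.0 if A was preferred, 0.0 if B was, 0.5 for a tie.
    k=32 is an assumed K-factor, not a value taken from the post.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


def rank_models(votes: Iterable[Tuple[str, str]], initial: float = 1000.0,
                k: float = 32.0) -> List[Tuple[str, float]]:
    """Process (winner, loser) votes in order and return models sorted by rating."""
    ratings: Dict[str, float] = {}
    for winner, loser in votes:
        r_w = ratings.setdefault(winner, initial)
        r_l = ratings.setdefault(loser, initial)
        ratings[winner], ratings[loser] = update_elo(r_w, r_l, 1.0, k)
    return sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical blind A/B votes: each tuple is (preferred model, other model).
    votes = [("model_a", "model_b"), ("model_a", "model_c"), ("model_b", "model_c")]
    for name, rating in rank_models(votes):
        print(f"{name}: {rating:.1f}")
```

Because each update moves the two ratings by equal and opposite amounts, the total rating across models stays constant. Results depend on the order in which votes are processed, so real evaluations often average over shuffled orderings.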

Arthur

1d
2024 is the year of multimodal AI. 💬 🖼️ 🎥 🎤 AI systems are unlocking new applications and seeing improved performance by combining data types like text, image, video, and audio. In our latest blog post, learn about multimodal AI techniques, business use cases, and why it’s poised to revolutionize the way we interact with technology: https://bit.ly/4bXKy0f
Arthur

2d
Which of the industry’s top LLMs are best at answering questions using provided context (or “staying grounded”)? Check out the latest iteration of our Generative Assessment Project where our team compared LLMs from providers like OpenAI, Anthropic, Meta, and more: https://bit.ly/3V3O4Pl
Arthur

3d Edited
Attending AI for Finance by Artefact this Wednesday in NYC? Check out our CEO Adam Wenchel’s talk about high-performance LLM deployment! More information: https://lnkd.in/d7nhWbsf
Arthur

1w
Let’s talk LLM experimentation. 🧑‍🔬

One day, there may be a principled, scientific, and repeatable way to pick the right LLM and the right tools for any job. But until we have that, a level of flexibility and ad-hoc artistry is necessary to decide which patchwork of features is best suited to serve an application’s needs. So, to keep experimenting and get the most value out of LLMs, it’s important to stay up to date on the latest tools and techniques.

In this comprehensive guide, we highlighted a number of projects in three categories:
- 🤳 Touchpoints: Quick, minimal LLM experimentation interfaces
- ⚖️ Evaluation: Metrics and relevant benchmark datasets
- 🪄 Enhancing Prompts: RAG, APIs, and well-chosen examples for your LLM to see how it’s done

Check it out: https://bit.ly/4e5PPEr
Arthur

1w
Our CEO Adam Wenchel caught up with the folks at Bloomberg Technology yesterday to talk about the latest in enterprise adoption of generative AI. He also discussed Arthur’s recent Generative Assessment Project study where we evaluated the industry’s top LLMs (from providers like OpenAI, Anthropic, Meta, and more) at answering questions and remaining grounded to context. 👉 Check out the full study here: https://bit.ly/3V3O4Pl

Adam Wenchel, Arthur CEO | Bloomberg Technology

Arthur

2w
Earlier this month, our co-founder and CEO Adam Wenchel was featured on NVIDIA AI’s podcast for an insightful discussion about:
- 🗂️ Enterprise use cases of generative AI
- 🔒 Challenges like hallucinations, prompt injections, and toxicity
- 🛡️ Arthur Shield, the first LLM firewall
- 📈 And more!

Check it out: https://bit.ly/456jFV5
Arthur

3w
Large language models, small language models, closed-source models, open-source models—how do you know which of these to use and whether they’ve contributed to positive ROI? Register for our webinar next Thursday to learn how to more easily run language models, compare them, evaluate them, and understand their performance: https://bit.ly/3QDGjhF
Arthur

3w
Are you signed up for our webinar on May 30th? Max Cembalest, ML Engineer at Arthur, will be discussing the changes we’ve seen across our enterprise customer base with regard to LLM app development, and what developing a new LLM application looks like in 2024. 👉 Register to attend live or to receive the recording afterwards: https://bit.ly/3QDGjhF
Arthur

1mo
Yesterday, OpenAI debuted its new flagship model, GPT-4o (“o” for “omni”). GPT-4o is natively multimodal, faster and less expensive than GPT-4 Turbo, and boasts improved capabilities in text, video, and audio. Learn more and watch it in action: https://lnkd.in/gm-eFHU4

Funding

Total rounds: 3
Last round: Series B, Jul 30, 2022
Amount: US$ 42.0M
Investors: Greycroft, Acrew Capital, and 4 other investors

Employees at Arthur

Noriaki (Nori) Tatsumi

Head of Platform Engineering at Arthur

Greg Munves

Max James

Vice President of Sales & Partnerships @ Arthur

Roshan Subudhi

DevOps dude at Arthur

Similar pages

Fiddler AI

Credo AI

Arize AI

WhyLabs

Hebbia AI

Snorkel AI

Anthropic

Hugging Face

Holistic AI

Cohere
