Improving Search with rigorous testing
Our goal is always to provide you with the most useful and relevant information. Every change we make to Search is intended to improve the usefulness of the results you see. That's why we never accept payment from anyone to be included in Search results.
Testing for usefulness

Search has changed over the years to meet the evolving needs and expectations of the people who use Google. From innovations like the Knowledge Graph to updates that ensure we continue to highlight relevant content, our goal is always to improve the usefulness of your results.

We put every proposed change to Search through a rigorous evaluation process, analysing the resulting metrics before deciding whether to implement it.

Data from these evaluations and experiments go through a thorough review by experienced engineers and search analysts, as well as legal and privacy experts, who then determine whether the change is approved to launch. In 2022, we ran over 800,000 experiments that resulted in more than 4,000 improvements to Search.

We evaluate in multiple ways. In 2022, we ran:
4,725 launches

Every single proposed change to Search goes through a review by our most experienced engineers and data scientists, who carefully examine the data from all the different experiments to decide whether the change should launch. Many of the proposed changes this past year never went live, because unless we can show that a change actually makes things better for people, we don't launch it.

13,280 live traffic experiments

We conduct live traffic experiments to see how real people interact with a feature before launching it to everyone. We enable the feature in question for just a small percentage of people, usually starting at 0.1%, and compare that experiment group with a control group that did not have the feature enabled. We look at a very long list of metrics, such as what people click on, how many queries were issued, whether queries were abandoned, how long it took for people to click on a result and so on. We use these results to measure whether engagement with the new feature is positive, and to ensure that the changes we make increase the relevance and usefulness of our results for everyone.
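
To make the mechanics concrete, here is a minimal sketch of how such an experiment could work, assuming a deterministic hash for group assignment and click-through rate as the metric. It is illustrative only, not Google's actual implementation; the function names, the metric and the significance test are all assumptions for the example.

```python
import hashlib
import math
import random

EXPERIMENT_FRACTION = 0.001  # 0.1% of traffic, as in the example above


def assign_group(user_id: str, experiment_name: str) -> str:
    """Deterministically bucket a user into 'experiment' or 'control'.

    Hashing (experiment_name, user_id) keeps each person's assignment
    stable across visits and independent across experiments.
    """
    digest = hashlib.sha256(f"{experiment_name}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # uniform in [0, 1)
    return "experiment" if bucket < EXPERIMENT_FRACTION else "control"


def two_proportion_z(clicks_a: int, n_a: int, clicks_b: int, n_b: int) -> float:
    """z-statistic comparing the click-through rates of two groups;
    a larger |z| means stronger evidence the feature changed behaviour."""
    p_pool = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (clicks_a / n_a - clicks_b / n_b) / se


if __name__ == "__main__":
    # Simulated traffic: suppose the feature nudges click-through
    # rate from 30% to 32% for the users who receive it.
    random.seed(42)
    counts = {"experiment": [0, 0], "control": [0, 0]}  # [clicks, impressions]
    for i in range(1_000_000):
        group = assign_group(f"user-{i}", "new-snippet-layout")
        ctr = 0.32 if group == "experiment" else 0.30
        counts[group][0] += random.random() < ctr
        counts[group][1] += 1
    z = two_proportion_z(*counts["experiment"], *counts["control"])
    print(counts, f"z = {z:.2f}")
```

With only 0.1% of a million simulated users in the experiment arm, the z-statistic is usually too small to be conclusive, which illustrates why experiments at such a small holdout rate need very large amounts of live traffic before a decision can be made.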

894,660 search quality tests

We work with external Search Quality Raters to measure the quality of search results on an ongoing basis. Raters assess how well content fulfils a search request, and evaluate the quality of results based on the expertise, authoritativeness and trustworthiness of the content. These ratings do not directly impact ranking, but they do help us benchmark the quality of our results and make sure they meet a high bar all around the world.

To ensure a consistent approach, we publish Search quality rater guidelines to give these raters guidance and examples for appropriate ratings. While evaluating the quality of results might sound simple, there are many tricky cases to think through, so this feedback is critical to ensuring that we maintain high-quality results for users.
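
Since the ratings benchmark quality rather than feed directly into ranking, one simple way to picture their use is as an aggregate score tracked per market over time. The sketch below is a hypothetical illustration only; the rating scale, fields and numbers are invented, and real rater tasks use richer, guideline-specific rating dimensions.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rater records: (locale, query, rating on an invented 1-5 scale).
ratings = [
    ("en-GB", "how to fix a leaky tap", 4.5),
    ("en-GB", "best walking boots", 3.0),
    ("de-DE", "wetter morgen", 4.0),
    ("de-DE", "steuererklärung frist", 3.5),
    ("ja-JP", "天気 明日", 4.5),
]

# Benchmark: mean rating per locale, so result quality can be tracked
# around the world rather than only in aggregate.
by_locale = defaultdict(list)
for locale, _query, score in ratings:
    by_locale[locale].append(score)

for locale, scores in sorted(by_locale.items()):
    print(f"{locale}: mean rating {mean(scores):.2f} over {len(scores)} queries")
```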

148,038 side-by-side experiments

Search isn’t static. We’re constantly improving our systems to return better results, and search quality raters play an important role in the launch process. In a side-by-side experiment, we show raters two different sets of Search results: one with the proposed change already implemented and one without. We ask them which results they prefer and why.
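
As an illustration of how such preferences might be tallied, here is a small sketch that aggregates rater votes with a two-sided sign test; the counts and the choice of test are invented for the example and are not Google's actual tooling.

```python
from math import comb


def sign_test_p_value(prefer_new: int, prefer_old: int) -> float:
    """Two-sided sign test: how likely a split at least this lopsided
    would be if raters actually had no preference (ties excluded)."""
    n = prefer_new + prefer_old
    k = max(prefer_new, prefer_old)
    # Tail probability for X ~ Binomial(n, 0.5), doubled for two sides.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
    return min(1.0, 2 * tail)


# Hypothetical tallies from one side-by-side experiment: each rater saw
# the current results next to the proposed results (sides randomised)
# and picked the set they preferred, or recorded no preference.
prefer_new, prefer_old, no_preference = 273, 214, 63

p = sign_test_p_value(prefer_new, prefer_old)
print(f"{prefer_new} vs {prefer_old} preferences, p = {p:.4f}")
```

The "why" that raters give alongside each vote matters as much as the tally itself; the sketch above covers only the counting step.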
