Scale Effects in Web Search

Privacy and Security; Innovation and Economic Growth

Article Snapshot


Di He, Aadharsh Kannan, Tie-Yan Liu, R. Preston McAfee, Tao Qin and Justin Rao


Proceedings of the 10th International Conference on Web and Internet Economics (WINE), December 2017


This study considers how learning affects competition between search engines. As learning proceeds, it tends to slow down. Could a new search engine with a better algorithm overcome a large search engine with more data?

Policy Relevance

A new search engine with a better algorithm could overtake its rival. However, a long history of data collection gives an older, larger search engine a big advantage.

Main Points

  • Most statistical learning systems, such as search engines, improve as they accumulate more data. However, although search engines collect more data over time, they face an increasingly difficult task as more content is posted online.
  • Could a new search engine with a superior algorithm overtake a larger competitor, or is learning from a long history of search data too important?
    • Learning is faster at low levels of data, as every new data point matters more.
    • A new search engine might handle common queries well but fumble on rare queries.
  • This study uses browsing logs that show the rate at which users click on search results to discover whether search engines are learning more effectively, or less effectively, as the scale of web searches increases.
    • Search results improve significantly as data accumulates.
    • Learning slows down after about 1,000 data points are gathered.
  • Data from searches that were once rare but became common over time show that search engines improve as more data flows in; search engines are “data starved,” benefiting from more data on about half of queries for specific pages and on 15% of searches.
  • The effectiveness of learning for single searches is hard to measure, because the search engines also learn from related searches, a “knowledge spillover.”
    • Constructing a graph of related searches allows researchers to adjust for this effect.
    • 10% of queries for a specific web site have fewer than 1,000 relevant observations.
    • 18% of queries have fewer than 10,000 relevant observations.
  • Search engines must match billions of web queries to billions of pages of data; eventually, this task will become so overwhelming that additional data will help less and less. In North America, the learning gains from additional data are shrinking but are still significant.
  • A smaller search engine could overcome the advantages of a larger search engine by developing a better search algorithm; however, scale does give larger engines an advantage.
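
The point that learning is faster at low levels of data can be illustrated with a standard statistical fact: the standard error of an observed click-through rate shrinks in proportion to one over the square root of the number of observations. The sketch below is only an illustration of that general fact, not the method used in the study; the function names and the assumed click-through rate of 0.3 are hypothetical.

```python
import math


def ctr_standard_error(p: float, n: int) -> float:
    """Standard error of an observed click-through rate after n impressions.

    Uses the binomial formula sqrt(p * (1 - p) / n). The true rate p is an
    assumption supplied by the caller, not a figure from the study.
    """
    return math.sqrt(p * (1.0 - p) / n)


def marginal_gain(p: float, n: int) -> float:
    """Reduction in standard error from adding one more observation."""
    return ctr_standard_error(p, n) - ctr_standard_error(p, n + 1)


ASSUMED_CTR = 0.3  # hypothetical click-through rate, for illustration only

for n in (10, 100, 1_000, 10_000, 100_000):
    se = ctr_standard_error(ASSUMED_CTR, n)
    gain = marginal_gain(ASSUMED_CTR, n)
    print(f"n={n:>7}  standard error={se:.5f}  gain from one more point={gain:.2e}")
```

Under these assumptions, each additional observation reduces the error far more at 10 observations than at 10,000. This matches the qualitative pattern described above: every new data point matters more when data is scarce, and a rare query stays noisy much longer than a common one.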
