Machine Learning in an Auction Environment

Innovation and Economic Growth, Search and Advertising, Internet and Networks, the Internet, and Cloud Computing

Article Snapshot


Patrick Hummel and R. Preston McAfee


Journal of Machine Learning Research, Vol. 17, pp. 1-37, 2016


In online auctions to determine which ads will be displayed, ads that have received many clicks compete against new ads of unknown appeal. This paper describes the best strategy for the auctioneer to use in ranking competing ads, when the appeal of some ads is unknown.

Policy Relevance

The best strategy for auctioneers is to consider the value of learning about a particular ad as well as the expected click-through rate.

Main Points

  • Many economists have explored strategies for selecting among several options when the payoff of each is unknown, known as a “multi-armed bandit” problem; the value of experimental learning about each option can increase the gains from choosing that option.
  • In online auctions for placing ads, bidders bid the amount they are willing to pay per click, and the auctioneer ranks ads by multiplying the bid by the ad’s click-through rate to derive its expected cost-per-1000-impressions (eCPM).
  • This paper describes the optimal ranking method for choosing ads in online auctions, when an ad with uncertain eCPM competes with many other ads; online ad auctions differ from standard multi-armed bandit problems.
    • The competing ads vary tremendously in quality at random.
    • The auctioneer values a dollar in future less than one received in the present.
  • Using the optimal ranking method, the auctioneer should rank the value of the ads on the basis of the expected sum of their eCPM, plus a term representing the value of learning about each ad; however, the value of learning in online auctions is much smaller than the value of learning in most other multi-armed bandit problems.
  • Choosing the optimal ranking method results in a tiny change in the probability that an ad will be shown.
    • The additional value of learning for ads that have received many clicks in the past is tiny.
    • The value of learning for every ad that is not expected to receive many clicks is tiny.
  • “Dynamic programming, the standard approach to programming ad auctions, involves calculating how each choice affects future decisions; a simpler method known as “knowledge gradients” considers only one step ahead, but achieves roughly the same result.
  • Compared to the purely greedy approach of ranking ads solely based on their eCPMs, the benefits of the optimal ranking method are tiny.
    • Typical click-through rates are known for both search and display ads.
    • Error rates in estimating click-through rates are usually less than 30% even for new ads.
    • Using realistic assumptions, the optimal ranking method yields only tiny gains.
  • Using the optimal ranking method, auctioneers would gain little in efficiency, but might make some gains in revenue.


Get The Article

Find the full article online

Search for Full Article