Median or Average


One of the issues I struggled with on MTG Card Prices was how to price a card, based on historical sales.

My initial solution was to take the average price of the prices that had occurred in the last 4 weeks, rolling. Prices for individual cards always fluctuated a little, so smoothing out bumps worked well. Also, it was easy (MySQL has an avg() function).

While it worked for most of the cards, it would occasionally lead to strange results. An example of this is the Beta Sol Ring, which had sales like this:

The $1,725.98 price is a huge outlier, and while the auction itself included other beta cards, the auction title only mentioned the Sol Ring, so of course the matcher matched it to a Beta Sol Ring

So, with the above auctions, using average, the price is $253.59, which no one will pay for.

Knowing that any matching system will incorrectly match an auction like this, the question becomes, "How can I get a reasonable price given the occasional outlier?".

Fortunately, there is a solution to this problem, and I came across it recently while reading Data Analysis with Open Source Tools. Philip mentions that when you have a set that in not evenly distributed, you should use median, not average.

Using our set of prices above, the median comes to $73.77, which is close to what someone would actually pay for a Beta Sol Ring.

This is a much better solution, and it's a shame that MySQL doesn't have a built-in median function. I ended up using a substring_index solution that seemed like the least painful, and didn't require any self-joins.

So, the rule of thumb is: "If you know your data is evenly distributed, use average, otherwise you probably want median".