In distributed systems a process often needs to wait for more than one of its peers before completing some action. For example, we may send a request to 5 of our peers and require that we hear back from 3 of them before considering the action completed. Even when the distribution of round trip times for a single request is pretty simple, it’s not entirely clear what the distribution of the aggregate request will be. (To me, at least; maybe if you’re smart or know statistics well it’s simpler.) In any case, we can use a histogram to get an idea of the shape of the density function.
Order statistics describe this situation exactly. We have a random sample \(X_1, X_2,\dots, X_n\) drawn from some distribution and we want to find the \(k\)-th smallest element. In the situation above the process has to wait for 3 out of 5 responses, and the time it waits is the time it takes for the 3rd response to arrive. This density function is shown (approximately, using a histogram) in the picture above.
What do the PDFs of these distributions look like for various underlying distributions and parameters? I’m sure it’s possible to find an analytical solution for some underlying distributions but I’m lazy and would like an approximation for any given underlying distribution. So I threw together some charts with D3.
The notation \(U(n, m)\) means the \(m\)-th order statistic from a sample of size \(n\) drawn from a uniform distribution. It’s pretty clear that \(U(2, 2)\) has a similar distribution to \(U(2, 1)\) (reflected about the line \(x=1/2\)), \(U(3, 3)\) is similar to \(U(3, 1)\), and so on, so I’ve left out distributions with a symmetric partner.
The distributions that arise in this way are very interesting! Something as simple as taking the 2nd smallest number of a set of random numbers can create a beautiful PDF. I’ve only shown plots with a uniform underlying distribution, but I suspect there are cool phenomena that happen with other distributions.
The code is embedded in this page. Please play with it! Change the underlying distribution, make prettier charts by increasing the number of bins and random samples, change the colors. There’s lots to do here if you’re looking for a project.
Approx. 696 words, plus code