Software! Math! Data! The blog of R. Sean Bowman
March 04 2016

The C++ standard says that vector::push_back should run in amortized constant time. Consider a simple implementation in which the vector is stored as an array that doubles in size whenever it fills up, copying every element at each doubling. Does this satisfy the time complexity requirement?

I was talking with a colleague of mine about this problem. We managed to confuse ourselves mightily (which could have had something to do with, ahem, libations we were enjoying at the time). Here is what I figured out later after a bit of thought.

The question, phrased a bit more carefully

What is the cost of a vector append, assuming that the vector starts with size zero, doubles in size whenever it needs to grow, and memory is copied each time the vector grows? Let’s further assume that we append \(n\) elements to the vector where \(n=2^m\) is a power of two. (This assumption, of course, does not affect the end outcome, but simplifies the analysis.)
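To make the model concrete, here is a minimal sketch of such a vector in C++. The name ToyVec and the copy_count bookkeeping are my own for illustration, not part of std::vector, and I'm ignoring everything a real implementation must handle (exceptions, move semantics, allocators):

```cpp
#include <cstddef>

// A bare-bones doubling vector, just enough to study the copy count.
// For this sketch, T must be default-constructible and copy-assignable.
template <typename T>
struct ToyVec {
    T*          data = nullptr;
    std::size_t size = 0;
    std::size_t cap  = 0;
    std::size_t copy_count = 0;  // existing elements copied during growth

    void push_back(const T& x) {
        if (size == cap) {                        // array is full: double it
            std::size_t new_cap = (cap == 0) ? 1 : 2 * cap;
            T* bigger = new T[new_cap];
            for (std::size_t i = 0; i < size; ++i) {
                bigger[i] = data[i];              // copy every existing element
                ++copy_count;
            }
            delete[] data;
            data = bigger;
            cap  = new_cap;
        }
        data[size++] = x;                         // append the new element
    }

    ~ToyVec() { delete[] data; }
};
```

Pushing \(n\) elements into this and inspecting copy_count is one way to check the arithmetic that follows.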

We need to know how often the array grows and how many elements are copied each time it does. The first push allocates an array of size one and needs no copy, since the vector is empty. The second push copies that single element into a new array of size two and then appends. The third push copies two elements into an array of size four and appends. Now we can push one more for free; the push after that copies four elements into an array of size eight and appends. So on and so on… the total number of elements copied is

\[ 1 + 2 + 4 + \dots + n/2, \]

where there are \(m = \log_2 n\) terms in the sum. Summing this geometric series gives

\[ 1 + 2 + 4 + \dots + 2^{m-1} = 2^m-1,\]

and so the total number of elements copied is

\[ \begin{align*} 1 + 2 + 4 +\dots + n/2 &= 2^m-1\\ &= n-1. \end{align*}\]

Since pushing \(n\) elements causes only \(n-1\) element copies in total, the amortized cost per push is constant: on average, less than one copy per push on top of the append itself.
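As a sanity check, we don't even need a real vector: a resize away from a full array of \(k\) elements always copies exactly \(k\) of them, so a few counters suffice. Here is a small simulation (the value of n is arbitrary):

```cpp
#include <cstdio>

int main() {
    const long n = 1L << 20;          // push n = 2^20 elements
    long cap = 0, size = 0, copies = 0;
    for (long i = 0; i < n; ++i) {
        if (size == cap) {            // full: double, copying all elements
            copies += size;
            cap = (cap == 0) ? 1 : 2 * cap;
        }
        ++size;                       // the push itself
    }
    std::printf("pushes: %ld, copies: %ld, n - 1 = %ld\n", n, copies, n - 1);
}
```

This prints a copy count equal to \(n-1\), matching the sum above.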

Another colleague pointed out that in this scenario we only resize when we absolutely must. What happens if we resize slightly earlier, say by doubling as soon as the array fills rather than waiting for the next push to force it? Then the full array of \(n\) elements gets copied once more at the end, and the total number of elements copied is

\[ \begin{align*} 1 + 2 + 4 + \dots + n/2 + n &= 2^{m+1}-1\\ & = 2\cdot 2^m -1\\ & = 2n-1, \end{align*} \]

which is still a constant amortized cost per push.
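Under my reading of "slightly earlier," the doubling moves from the push that overflows the array to the moment the array fills, so the simulation changes only a little:

```cpp
#include <cstdio>

int main() {
    const long n = 1L << 20;      // push n = 2^20 elements
    long cap = 1, size = 0, copies = 0;
    for (long i = 0; i < n; ++i) {
        ++size;                   // the push (a free slot always exists)
        if (size == cap) {        // full now: double before the next push
            copies += size;
            cap *= 2;
        }
    }
    std::printf("pushes: %ld, copies: %ld, 2n - 1 = %ld\n",
                n, copies, 2 * n - 1);
}
```

The final doubling copies all \(n\) elements, which is where the extra term in the sum comes from.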

Reflection

Amortized analysis, like most things algorithmic, is covered in the excellent and slightly encyclopedic CLRS. I must admit that, although I learned a lot from that book, the first time I really understood amortized analysis was while reading Chris Okasaki's awesome, seminal book Purely Functional Data Structures.

Amortized analysis is usually associated with complicated data structures that might otherwise seem to have poor performance characteristics. However, the example above is interesting because it’s easy enough to simply do the calculation without resorting to more complicated methods (the accounting method or the potential method come to mind as the most common; I’m sure there are others). In that sense, this is an "easy" view of what amortized analysis can buy us.
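For comparison, here is a sketch of how the potential method would reach the same conclusion for the doubling vector, measuring cost in element writes and using the standard textbook potential \(\Phi = 2\cdot\text{size} - \text{capacity}\):

\[ \begin{align*} \text{ordinary push: } \hat c &= 1 + \Delta\Phi = 1 + 2 = 3;\\ \text{doubling push (size} = \text{capacity} = k\text{): } \hat c &= \underbrace{(k+1)}_{\text{actual cost}} + \underbrace{\big(2(k+1) - 2k\big)}_{\Phi_{\text{after}}} - \underbrace{(2k - k)}_{\Phi_{\text{before}}} = 3. \end{align*} \]

Every push has amortized cost 3, so again the cost per push is constant.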

So, the verdict: The C++ standard says that vector::push_back should run in amortized constant time, and with a reasonable (and fairly simple) implementation, it does! Cool!
