Wednesday, July 29, 2009

How many errors does this book contain?


That's a question every author anxiously asks him/her/itself, as soon as the manuscript goes to press. Well, here is what happened regarding the Little Purple Book. By the way, the current list of errata appears on the Book's Page.

As errors and typos were being communicated by readers and carefully recorded, it appeared that it took till the thirtieth (30th) notification before a typo was reported twice. What does this tell us about the total number N of typos in the book?

Naturally, N is unknown. Assume that errors are reported uniformly at random. You'll recognize now a case of the well-known "birthday surprise" (also known as "birthday paradox"). Namely, on Earth (where N=365), it takes about 23 persons (on average, or with probability larger than 1/2) to obtain a sample where two people will have the same birthday. Now, our problem with estimating the number of errors in the book is of the same nature, but with a year that would have N days, instead of 365. The answer is as follows: the number of persons it takes to first obtain a colision of birthdates is on average approximately sqrt{pi N/2}. A discussion can be found on pages 114-115 of AC; this problem is a nice illustration of the power of exponential generating functions.

Let us say we observe that the first collision occurs on the rth trial. Then, we equate sqrt{\pi N/2} with r, and cleverly "deduce"(!) that N should be approximately 2r^2/pi. Fair enough, eh? With r=30, this predicts a number of errors of the order of 600 (the exact solution is 572.95...).

If we actually simulate the experiment, fixing the number N of errors, we find that this formula does predict the rough order of magnitude of N, but with a systematic multiplicative bias. What is going wrong here?

The problem is that the expectation of a square and the square of an expectation are not the same thing. Here, analysis teaches us that the expectation of the square of the time it takes to reach the first collision is asymptotic to 2N. Thus the "right" (i.e., asymptotically unbiased) estimator is in fact r^2/2. See AC, Note VI.25, p. 417, for a discussion. This leads us to infer that we should expect the actual number of errors in the Little Purple Book to be closer to 450, rather than the infamous 600. Soothing, isn't it?

Of course, there are limitations. First, the statistical procedure is not very accurate: its accuracy can be gauged by estimating the expectation of the fourth power of r, in itself a less trivial math exercise. Worse, we expect the figures, 450 and even 600, to be gross understimates since errors are not picked up uniformly at random --the ones closer to the beginning are more likely to be found than the ones in arcane corners of the book. So, a mistake per page sounds more like a reasonable possibility, and the book has 824 pages in total... Depressing!

Where does the idea of such an estimator come from? I first heard it more than twenty years ago from Gilles Brassard, while he was preparing his book with P. Bratley on algorithms. In order to determine how many balls an urn has, repeatedly pick up a ball at random (shake the urn!), mark the ball just picked up (say, with a touch of paint), and repeat until an already marked ball surfaces. Return half the square of the number of steps. Et voila!


To be honest, till to-day, I couldn't think of a real case of application of Brassard's algorithm and I was viewing it mostly as a thought experiment, a gedanken experiment, as a physicist would say. Perhaps we have here the first actual use of Brassard's algorithm. At any rate, his sampling-based algorithm has pedagogical value: it demonstrates that you can really achieve something with sublinear resources. Refer to the many probabilisic stream algorithms that are now known, the current research on property testing, and so on.

Wednesday, June 24, 2009

Hi! After nearly a year of final adjustments, copy-editing, and production work at the Press, the book Analytic Combinatorics eventually became available in January 2009. Cambridge University Press has done a great job and our editor, David Tranah, was a great help, throughout the whole process.

I'd like to remind everybody that, thanks to a special arrangement with CUP, we, authors, retain the non-exclusive copyright of the electronic edition. Thus, the manuscript continues to be freely available to individual users from the authors' home pages.

I plan to maintain a page with some goodies at the book's URL, from which the entire text (about 13MB in PDF) can be downloaded.

In this blog, I hope to report on new results relative to analytic combinatorics, possibly discuss the choice of topics in the book and/or stories. Comments are also most welcome regarding the contents of the book and possible mistakes or oversights--we hope to have a new edition out some time in the future. It'd be nice also, if people who teach from the book were kind enough to drop a line here, possibly with a pointer to their home/course page.