Book Reviews

Programming Collective Intelligence: Building Smart Web 2.0 Applications by Toby Segaran

ISBN-13: 978-0596529321
Publisher: O'Reilly Media
Pages: 362

I bought Programming Collective Intelligence years ago. My idea at that time was to use it as a basis for practical applications as I learned Python. I picked a topic that looked interesting, say genetic programming, and experimented with the example code to test my Python fluency. In the process I picked up the basics of the most common families of machine learning algorithms. Since then I've returned over and over again to Programming Collective Intelligence. But now I use the book as a refresher or introduction to different machine learning domains.

This book was released at the height of the Web 2.0 craze back in 2007. Social media sites built their businesses around user content; you, as a user, weren't really a customer but more like the product itself. The data you generated in your interactions with the site was mined and used both to improve your experience (e.g. by suggesting relevant content) and, of course, to generate revenue by targeting ads or recommendations. And it's right here that Programming Collective Intelligence takes off.

The book starts with a gentle introduction to recommendation engines. In a few pages you learn to build a collaborative filtering algorithm that searches potentially large datasets for users who are similar to you. We get to learn about simple similarity scores like Euclidean distance and how we can use them to create product recommendations. From there the book continues to cover k-means clustering, Bayesian classification, a neural network to rank search results, support-vector machines (SVMs) for sophisticated classifications and much more.
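To give a flavor of that first chapter, here is a minimal sketch of the idea in modern Python. It is my own illustration, not the book's code: the ratings data and the names `euclidean_similarity` and `recommend` are made up for this example. It assumes a similarity score of 1 / (1 + distance) over the items two users have both rated, and it predicts a score for each unseen item as the similarity-weighted average of other users' ratings.

```python
from math import sqrt

# Hypothetical ratings, invented for this illustration.
ratings = {
    "Ann": {"Alien": 5.0, "Brazil": 3.0, "Casablanca": 4.0},
    "Bob": {"Alien": 4.5, "Brazil": 3.5, "Casablanca": 4.0, "Dune": 5.0},
    "Cid": {"Alien": 1.0, "Brazil": 5.0, "Dune": 2.0},
}


def euclidean_similarity(prefs, a, b):
    """Return a score in (0, 1]; 1.0 means identical ratings on shared items."""
    shared = [item for item in prefs[a] if item in prefs[b]]
    if not shared:
        return 0.0  # nothing in common, so no basis for comparison
    distance = sqrt(sum((prefs[a][i] - prefs[b][i]) ** 2 for i in shared))
    return 1.0 / (1.0 + distance)


def recommend(prefs, user):
    """Rank items the user has not rated by similarity-weighted average rating."""
    totals = {}    # item -> sum of similarity * rating
    sim_sums = {}  # item -> sum of similarities, used to normalize
    for other in prefs:
        if other == user:
            continue
        sim = euclidean_similarity(prefs, user, other)
        if sim <= 0:
            continue
        for item, rating in prefs[other].items():
            if item not in prefs[user]:
                totals[item] = totals.get(item, 0.0) + sim * rating
                sim_sums[item] = sim_sums.get(item, 0.0) + sim
    ranked = [(totals[item] / sim_sums[item], item) for item in totals]
    return sorted(ranked, reverse=True)


if __name__ == "__main__":
    print(euclidean_similarity(ratings, "Ann", "Bob"))
    print(recommend(ratings, "Ann"))
```

With this toy data, Ann's tastes are closer to Bob's than to Cid's, so Bob's high rating of the one film Ann hasn't seen counts for more in her predicted score than Cid's low one.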

The book does a great job of introducing common machine learning techniques and I keep using it as a refresher when I start to experiment with new applications. The book takes a pragmatic approach, which means that each topic is introduced through a mixture of theory and a Python implementation. This makes it easy to follow along and iteratively build an understanding of the more complex algorithms. Sure, the Python code is a bit dated by now, but it's still as close to executable pseudocode as it gets. Python also comes with lots of useful libraries. That practical orientation combined with its readability makes Python an excellent choice for the topic. That said, you do need to have a basic understanding of Python's object model and dynamic nature to make sense of the book.

The book is more about breadth than depth. I found that a strength; Programming Collective Intelligence gives me enough background on each topic to let me make sense of more advanced material in other books. The focus is on the algorithms rather than on the computation itself. That means you shouldn't expect the sample implementations to scale well on large datasets or under heavy user load. Again, it's a good trade-off since it allows us to focus on the more timeless math dimension rather than on volatile technology platforms. After all these years I have yet to find a better introduction to machine learning than Programming Collective Intelligence. Highly recommended.

Reviewed August 2015