We Built a Search Engine – Presenting OutOfMemoryError

Cast: Vamsee Krishna, Nirtyanath Jaganathan, Leo Thumma, and Sam Stern

Our final project for CIS 455 (Internet and Web Systems) was to build a search engine. Not a search algorithm, not a frontend for a search engine, but an entire search engine from start to finish. That included a crawler, an indexer, a PageRank-er, and a web frontend. All of these components had to be distributed over a 10-machine, 50-core Amazon EC2 cluster. We also threw in some extra features, like intelligent spellchecking, DuckDuckGo contextual integration, and eBay results, to make it more fun.

We used FreePastry to handle most of the distribution, assigning work to machines according to a partition of the URL namespace.
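Here is a minimal sketch of how a URL-namespace partition can work. It relies on one fact about Pastry: each node gets an identifier in a large circular ID space, and each key is routed to the live node whose identifier is numerically closest to it. The standalone Java below only imitates that ownership rule. It does not use FreePastry's actual API. The class name `UrlPartitioner`, the node names, the 160-bit SHA-1 IDs, and the choice to hash only the hostname are all illustrative assumptions, not details of our implementation.

```java
import java.math.BigInteger;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: maps each URL to the node whose ID is closest on a
// circular 160-bit ring, imitating Pastry-style key ownership.
final class UrlPartitioner {
    private static final BigInteger RING_SIZE = BigInteger.ONE.shiftLeft(160);
    private final TreeMap<BigInteger, String> ring = new TreeMap<>();

    UrlPartitioner(List<String> nodeNames) {
        if (nodeNames.isEmpty()) throw new IllegalArgumentException("need at least one node");
        for (String name : nodeNames) {
            ring.put(sha1(name), name); // node ID = SHA-1 of its (hypothetical) name
        }
    }

    static BigInteger sha1(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest); // non-negative 160-bit number
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 unavailable", e);
        }
    }

    // Distance around the ring, taking the shorter direction.
    private static BigInteger ringDistance(BigInteger a, BigInteger b) {
        BigInteger d = a.subtract(b).abs();
        return d.min(RING_SIZE.subtract(d));
    }

    // Returns the node responsible for this URL (assumption: keyed by hostname).
    String ownerOf(String url) {
        String host = URI.create(url).getHost();
        BigInteger key = sha1(host == null ? url : host.toLowerCase(Locale.ROOT));
        // The closest node is the key's successor or predecessor, wrapping around the ring.
        Map.Entry<BigInteger, String> succ = ring.ceilingEntry(key);
        Map.Entry<BigInteger, String> pred = ring.floorEntry(key);
        if (succ == null) succ = ring.firstEntry();
        if (pred == null) pred = ring.lastEntry();
        return ringDistance(key, succ.getKey()).compareTo(ringDistance(key, pred.getKey())) <= 0
                ? succ.getValue()
                : pred.getValue();
    }

    public static void main(String[] args) {
        UrlPartitioner p = new UrlPartitioner(List.of("node0", "node1", "node2", "node3"));
        for (String url : List.of("http://www.upenn.edu/", "http://www.upenn.edu/about",
                                  "http://en.wikipedia.org/wiki/PageRank")) {
            System.out.println(url + " -> " + p.ownerOf(url));
        }
    }
}
```

Both upenn.edu URLs map to the same node because only the hostname is hashed. Keying on the host is one plausible way to keep per-site crawl state on a single machine. Treat it as a design option, not a description of how OutOfMemoryError split its work.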

The name of our product? OutOfMemoryError, named for the error that plagued our development from day one. Eventually, to eradicate it, we moved every single data structure out of memory and replaced it with our own combination of transactional Berkeley DB stores, S3 storage, and other persistent data stores. The result was an extremely robust system that could recover seamlessly from any fault (you could even pull the plug on the computer and lose nothing).
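The snippet below illustrates the general idea of replacing an in-memory structure with one that lives on disk and survives a restart. We actually used Berkeley DB and S3. This sketch uses only the Java standard library and is not Berkeley DB. The example structure, a crawler URL frontier called `DiskQueue` with files `queue.log` and `queue.head`, is hypothetical. Both the queued URLs and the read position live in files opened in `"rwd"` mode, so a new process picks up where the old one stopped.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical example: a FIFO queue of URLs whose contents and read position
// live on disk rather than in memory, so a restarted process resumes where it left off.
final class DiskQueue implements Closeable {
    private final RandomAccessFile data; // append-only log of queued URLs
    private final RandomAccessFile head; // 8-byte offset of the next unread record

    DiskQueue(Path dir) throws IOException {
        Files.createDirectories(dir);
        // "rwd": every update to file content is written synchronously to the storage device.
        data = new RandomAccessFile(dir.resolve("queue.log").toFile(), "rwd");
        head = new RandomAccessFile(dir.resolve("queue.head").toFile(), "rwd");
        if (head.length() < Long.BYTES) {
            head.seek(0);
            head.writeLong(0L); // fresh queue: start reading at the beginning
        }
    }

    void enqueue(String url) throws IOException {
        data.seek(data.length());
        data.writeUTF(url); // 2-byte length prefix + modified UTF-8 bytes
    }

    // Returns the next URL, or null if the queue is empty.
    String dequeue() throws IOException {
        head.seek(0);
        long pos = head.readLong();
        if (pos >= data.length()) return null;
        data.seek(pos);
        String url = data.readUTF();
        head.seek(0);
        head.writeLong(data.getFilePointer()); // persist the new read position
        return url;
    }

    @Override
    public void close() throws IOException {
        try { data.close(); } finally { head.close(); }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("frontier");
        try (DiskQueue q = new DiskQueue(dir)) {
            q.enqueue("http://www.upenn.edu/");
            q.enqueue("http://www.cis.upenn.edu/");
            System.out.println(q.dequeue()); // http://www.upenn.edu/
        }
        // Simulate a restart: a fresh instance reads the remaining URL from disk.
        try (DiskQueue q = new DiskQueue(dir)) {
            System.out.println(q.dequeue()); // http://www.cis.upenn.edu/
            System.out.println(q.dequeue()); // null
        }
    }
}
```

If the process dies after reading a URL but before the new offset is saved, that URL is handed out again. This is at-least-once delivery, which a crawler can tolerate. The sketch also ignores torn writes and records larger than `writeUTF`'s 65,535-byte limit. A transactional store such as Berkeley DB is what handles those cases properly.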

This is the coolest thing I have ever coded. Below are some pictures of the final product, along with a link to our documentation paper, which explains the program architecture in more detail.
