Ever searched for a message from a Facebook contact? Chances are that your search was carried out against a huge cluster of distributed indexes, stored on more than 600 servers in a way that resembles the architecture of P2P networks. Facebook is using its own distributed storage system called Cassandra for these searches, and Cassandra's co-developer Avinash Lakshman shared a few observations about the system on Facebook this week. From his article:

"Reliability at massive scale is a very big challenge. Outages in the service can have significant negative impact. Hence Cassandra aims to run on top of an infrastructure of hundreds of nodes (possibly spread across different datacenters)."

Cassandra is an open source project, with the source being available on Google Code. To be fair, Facebook isn't the only one utilizing distributed architectures for its data storage. Google has been running similar systems internally for a long time, and Yahoo has developed an open source framework for distributed computing and data storage called Hadoop that's quickly gaining influence.

Tags: , , ,