July 30, 2012

Building the open source distributed cloud

I’ve been thinking a lot about cloud computing and storage lately. Moving to the cloud scares me. At the same time, the convenience of having my files everywhere and always in sync is too good to pass up. I’ve used a multitude of cloud services, but I’m still searching for the perfect solution: one where I keep control of my data.

I recently received my Raspberry Pi in the mail, which I was really excited about. I stayed up for hours the night preorders opened, bashing my F5 key in hopes of getting hold of one. Yes, I eventually got my preorder in, but not before pulling close to an all-nighter and feeling like a zombie the next day. Worth it.

Of course, one of the first things a lot of people do (myself included) is load up some Quake 3 or connect the Pi to a TV and play around with it as a media center. Cool stuff for sure, but I kept thinking there has to be more I can get out of this thing. The mere fact that you have a full *nix-based system that fits in the palm of your hand for $25 could be (and has proven to be) earth-shattering.

Now, most geeks and sysadmins will tell you they’ve been running their own servers at home for ages. Home servers are awesome for mass file storage, media transcoding/streaming, and backups. There are also downsides: you’ve dedicated a whole PC to a single task, that PC draws power 24 hours a day, and you need a good place to keep it. Devices such as the Raspberry Pi change all of that. We now have a low-power, silent, and small PC that can easily handle the job of a home server. Hmm… maybe we can use this to our advantage to solve the cloud issue.

So what am I proposing? An open source distributed cloud. A cloud that everyone contributes storage space to, that everyone can store data on, and that keeps data private unless explicitly shared. Sounds like a tall order, huh? I totally agree, but I think it may be feasible.

A few projects have already attempted to tackle this problem. One of the most prominent is (or was) the OFF System. Although it was originally designed to get around copyright law, the same concept could be used to build a shared cloud. OFF works by breaking files into small chunks, encrypting them, and storing those chunks across multiple nodes. This “encryption” essentially builds chunks from other chunks, which allows chunks to be shared between different files. No one owns these chunks; instead, a retrieval URL acts as the “recipe” for combining the seemingly random blocks back into the original file.
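To make the “chunks built from other chunks” idea concrete, here is a minimal sketch in the spirit of OFF, not OFF’s actual block format or URL scheme. It assumes XOR as the mixing step, one randomizer per block, a tiny illustrative block size, and an in-memory dict standing in for the network; the names `store_file`, `retrieve_file`, and `put` are hypothetical. Every stored block looks like noise on its own, and the returned recipe (a list of block-hash pairs plus the file length) plays the role of the retrieval URL.

```python
import hashlib
import os
import random

BLOCK_SIZE = 16  # tiny, illustrative block size (an assumption, not OFF's real value)


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def put(store: dict, block: bytes) -> str:
    """Store a block under the SHA-256 of its contents and return that key."""
    key = hashlib.sha256(block).hexdigest()
    store[key] = block
    return key


def store_file(data: bytes, store: dict) -> tuple[list[tuple[str, str]], int]:
    """Split data into blocks, mix each with a randomizer, and return a recipe."""
    recipe = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE].ljust(BLOCK_SIZE, b"\0")  # pad the last block
        # Reuse an existing stored block as the randomizer when one exists, so
        # blocks end up shared between files; otherwise start with fresh noise.
        if store:
            randomizer = random.choice(list(store.values()))
        else:
            randomizer = os.urandom(BLOCK_SIZE)
        mixed = xor_bytes(block, randomizer)  # meaningless without the randomizer
        recipe.append((put(store, randomizer), put(store, mixed)))
    return recipe, len(data)


def retrieve_file(recipe: list[tuple[str, str]], length: int, store: dict) -> bytes:
    """Follow the recipe: XOR each pair of stored blocks, then trim the padding."""
    out = bytearray()
    for key_a, key_b in recipe:
        out += xor_bytes(store[key_a], store[key_b])
    return bytes(out[:length])
```

Because later files draw their randomizers from blocks already in the store, a single stored block can end up as part of several files’ recipes, which is the sharing property described above.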

Unfortunately, development has ended, and the network lacks the number of nodes required to make it much more than an experiment. Because the network’s intent was essentially to skirt copyright, it may have been doomed from the start. Repurposed as something like a distributed storage cloud, OFF really could have taken off.

So how does the Raspberry Pi fit in? They’re cheap, they’re low power, and most owners already using them as servers/media centers keep them on 24 hours a day. Perfect.

There is still an issue, though: we aren’t quite there with storage yet. With the Raspberry Pi, everything currently revolves around SD cards. Sure, you can connect USB drives, but then we’ve turned that perfect server into a bulkier package. In the next few years we should see PCs like the Raspberry Pi ship with an SSD, allowing for mass data storage out of the box. That will be important for this open source cloud idea to succeed.

If we could build something like this, we would truly have a massive open cloud storage system. Backups would be a thing of the past. All data would be replicated across several nodes, and you could even control how redundant your data is. I’m not sure about the technical implementation, but the system would need some sort of limits, something along the lines of only allowing users to store as much data as the storage space they have contributed.
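One way the “store only as much as you contribute” limit could interact with per-file redundancy is sketched below. The key assumption, which is mine and not a settled design, is that every replica counts against your quota, so choosing more redundancy costs more of your contribution. `Account`, `can_store`, and `store` are hypothetical names, and the sketch simply trusts the reported contribution rather than verifying that the space exists.

```python
from dataclasses import dataclass


@dataclass
class Account:
    contributed_bytes: int  # space this user's node offers to the network
    used_bytes: int = 0     # network space consumed, counting every replica

    def can_store(self, file_size: int, replicas: int) -> bool:
        """True if a file at this redundancy level fits within the contribution."""
        if replicas < 1:
            raise ValueError("replicas must be at least 1")
        return self.used_bytes + file_size * replicas <= self.contributed_bytes

    def store(self, file_size: int, replicas: int) -> None:
        """Charge the file's full replicated footprint against the quota."""
        if not self.can_store(file_size, replicas):
            raise PermissionError("quota exceeded: contribute more space or lower redundancy")
        self.used_bytes += file_size * replicas
```

Under this rule, someone contributing 10 GB could keep 5 GB of files at two copies each, or about 3.3 GB at three copies.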

Another problem this could potentially solve: bandwidth. When you host all of your data from a single cable/DSL connection, it can take forever to grab files or stream media from outside your network. With a distributed storage system like this, you could pull your media from multiple sources at once. Technically, I’m not sure how well this would scale. Perhaps it wouldn’t scale at all. I definitely think it’s worth looking into, though.
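Here is a rough sketch of what “pulling from multiple sources at once” might look like, assuming files are split into chunks and each chunk is replicated on several nodes. Each chunk is assigned to the least-loaded node that holds a copy, and the chunks are fetched concurrently. The `peers` dict is a hypothetical in-memory stand-in for real network requests, and `plan_fetches` and `download` are illustrative names; the sketch says nothing about how this would behave at scale.

```python
from concurrent.futures import ThreadPoolExecutor


def plan_fetches(chunk_ids: list[str], peers: dict[str, dict[str, bytes]]) -> dict[str, str]:
    """Assign each chunk to the least-loaded peer that holds a copy of it."""
    load = {name: 0 for name in peers}
    plan = {}
    for cid in chunk_ids:
        holders = [name for name, chunks in peers.items() if cid in chunks]
        if not holders:
            raise LookupError(f"no peer holds chunk {cid}")
        peer = min(holders, key=lambda name: load[name])  # ties go to the first holder
        load[peer] += 1
        plan[cid] = peer
    return plan


def download(chunk_ids: list[str], peers: dict[str, dict[str, bytes]], max_workers: int = 4):
    """Fetch all chunks concurrently, each from its assigned peer, and reassemble."""
    plan = plan_fetches(chunk_ids, peers)

    def fetch(cid: str) -> bytes:
        # Stand-in for a network request to plan[cid]; here it is a dict lookup.
        return peers[plan[cid]][cid]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        parts = list(pool.map(fetch, chunk_ids))  # map yields results in input order
    return b"".join(parts), plan
```

For example, with three Pis each holding some of four chunks, the planner spreads the requests so that all three upload links are used instead of one home connection doing all the work.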

Imagine if the official Raspberry Pi image eventually shipped with something like this. By default, each Pi brought online could contribute to the network and keep growing the available storage space. Just as the web has come to embrace open standards, we need open data. Storing data locally has become a chore. Problems like retrieving data, out-of-sync files, and running backups should be a thing of the past.

I’d love to keep this an open dialogue. Leave a comment and let me know your thoughts! If you’re a dev, would you be interested in working on a project like this?

© Eric Barch 2023