On May 13th, 2002, a new filesharing client called eMule entered our world of sharing. Ten years later, we’d like to take this anniversary as an opportunity to look back at some major technical achievements of filesharing applications since then, and at what the coming years might bring. With further innovation, even the mighty BitTorrent could be improved to the point where it is impossible to shut down.
The first mainstream filesharing applications, such as Napster (launched in 1999), were completely centralized.
Napster relied on a single central server that kept an index of the files every user shared, provided a central file search, and even initiated file transfers between users. Because of this single point of failure, Napster collapsed once its server was shut down following legal action by the RIAA.
Fortunately, the next generation of less centralized filesharing networks was already on the horizon. On the one hand, there were completely decentralized networks like Gnutella. They used query flooding to find files on other clients, i.e. each request was passed on from client to client until either enough results were found or the search timed out.
Yet the advantage of a completely server-independent network topology came at the cost of scalability. Simply put, you can’t search the whole network efficiently.
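To make the scalability problem concrete, here is a minimal query-flooding simulation. It is a simplified model, not Gnutella’s actual protocol: the overlay is a random graph built by the hypothetical helper `build_network`, every client forwards a query to all of its neighbors until a TTL (time to live) runs out, and duplicates are dropped by remembering which clients have already seen the query. Searching for a file nobody has forces the query to reach every client, so the message count grows with the size of the network.

```python
import random
from collections import deque
from typing import Callable, Dict, Set, Tuple

def build_network(n: int, degree: int, seed: int = 0) -> Dict[int, Set[int]]:
    """Hypothetical random undirected overlay where every client has >= `degree` neighbors."""
    rng = random.Random(seed)
    neighbors: Dict[int, Set[int]] = {i: set() for i in range(n)}
    for i in range(n):
        while len(neighbors[i]) < degree:
            j = rng.randrange(n)
            if j != i:
                neighbors[i].add(j)
                neighbors[j].add(i)
    return neighbors

def flood_query(neighbors: Dict[int, Set[int]], start: int,
                has_file: Callable[[int], bool], ttl: int) -> Tuple[Set[int], int]:
    """Pass a query from client to client (breadth-first) until the TTL runs out.
    Returns the clients that have the file and the number of messages sent."""
    seen = {start}
    queue = deque([(start, ttl)])
    hits: Set[int] = set()
    messages = 0
    while queue:
        node, remaining = queue.popleft()
        if has_file(node):
            hits.add(node)
        if remaining == 0:
            continue  # TTL exhausted: this client stops forwarding
        for peer in neighbors[node]:
            messages += 1  # every forwarded copy costs bandwidth, even a duplicate
            if peer not in seen:
                seen.add(peer)
                queue.append((peer, remaining - 1))
    return hits, messages

if __name__ == "__main__":
    for n in (1_000, 2_000, 4_000):
        net = build_network(n, degree=4)
        _, messages = flood_query(net, 0, lambda node: False, ttl=n)
        print(f"{n} clients: {messages} messages to learn that nobody has the file")
```

Doubling the number of clients roughly doubles the number of messages, which is exactly the O(n) behavior discussed below.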
On the other hand, there was eDonkey2000 with its server-based network (first released on September 6th, 2000). Unlike with Napster, anyone could run a server. While having multiple servers meant that the network couldn’t be shut down by closing a single central point, it also meant that users could only search for and share files with users on the same server.
This system resembled BitTorrent at a time when the tracker was the sole mechanism for finding other peers. With BitTorrent (launched in 2001), however, this dependence on the tracker was intentional, because it allowed the tracker to control who could join the swarm, how many peers each client received, and so on.
The eDonkey2000 network had a different design goal: a fully decentralized and yet scalable network. In this spirit, eDonkey2000 started a new project called ‘Flock’ in May 2002. After beta testing it was renamed ‘Overnet’ and eventually merged into the original eDonkey2000 client in August 2004.
In 2002, a new and rapidly growing client entered the ed2k network, a term that refers only to the server-based part of the eDonkey2000 network: our birthday client eMule, an open source client for the ed2k network founded on May 13th, 2002, exactly 10 years ago today.
In June 2004, ed2k had about 2 million users, while eDonkey2000’s Overnet network had only about 800,000. eMule, the leading client in the ed2k network, went on to dominate the following years of filesharing together with BitTorrent.
Both BitTorrent and eMule gradually moved towards a more decentralized structure. To make files from all servers available to every user, eMule added keyword search via UDP, which queries all servers, and source exchange between clients via TCP, which collects all available sources for a specific file. BitTorrent adopted the latter in the form of peer exchange (PEX).
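The idea behind source exchange can be sketched as a simple gossip process. This is a simplified model, not eMule’s wire protocol: each client knows some other clients sharing the same file, and in every round it asks each known source for that source’s own list and merges the answers. The function name and data layout are hypothetical.

```python
from typing import Dict, Set

def exchange_round(known: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """One round of source exchange for a single file: every client asks each
    source it already knows for that source's list and merges the answers.
    All clients answer from the previous round's state (a synchronous round)."""
    updated: Dict[str, Set[str]] = {}
    for client, sources in known.items():
        merged = set(sources)
        for src in sources:
            merged |= known.get(src, set())
        merged.discard(client)  # a client never lists itself as a source
        updated[client] = merged
    return updated

# Five clients in a chain, each initially knowing only its direct neighbors.
known = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "e"}, "e": {"d"}}
for _ in range(2):
    known = exchange_round(known)
print(known["a"])
```

Because each round merges lists that already span the previous round’s reach, the set of known sources grows quickly without any server being involved.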
In early 2004, eMule implemented Kademlia, a distributed hash table (DHT), i.e. a decentralized key-value store, capable of finding sources as well as performing keyword searches, thus making ed2k servers completely obsolete. Once again, BitTorrent headed in the same direction and implemented a DHT in 2005.
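The core of Kademlia is its XOR distance metric: the distance between two IDs is their bitwise XOR, read as an integer. The sketch below shows that metric and how a node sorts contacts into k-buckets by the highest differing bit. Deriving IDs from SHA-1 hashes of names via `node_id` is an illustrative assumption (160 bits as in BitTorrent’s DHT; eMule’s Kad uses 128-bit IDs), and all helper names are hypothetical.

```python
import hashlib
from typing import Iterable, List

def node_id(name: str) -> int:
    """Hypothetical helper: derive a 160-bit ID by hashing a name with SHA-1."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")

def xor_distance(a: int, b: int) -> int:
    """Kademlia's distance: bitwise XOR of the two IDs, read as an integer."""
    return a ^ b

def bucket_index(own_id: int, other_id: int) -> int:
    """k-bucket that `other_id` belongs to: the position of the highest bit in which
    the IDs differ. Bucket i covers XOR distances in [2**i, 2**(i + 1))."""
    distance = own_id ^ other_id
    if distance == 0:
        raise ValueError("a node does not keep itself in its routing table")
    return distance.bit_length() - 1

def k_closest(target: int, candidates: Iterable[int], k: int) -> List[int]:
    """The k candidate IDs closest to `target`; a lookup queries these next."""
    return sorted(candidates, key=lambda c: c ^ target)[:k]

print(bucket_index(0b1000, 0b0001))                     # highest differing bit
print(k_closest(0b0000, [0b1111, 0b0001, 0b0100], 2))   # two nearest by XOR
```

Since each bucket covers a range of distances twice as wide as the previous one, a node only needs a handful of contacts per bucket, and each lookup step can roughly halve the remaining distance to the target.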
DHT marked a revolutionary step in filesharing, not just because you can download a file with only its hash (and a few nodes to bootstrap the network), but because it finally made a network possible that is both decentralized and scalable. While decentralized networks like Gnutella found information using query flooding in O(n), a DHT finds information in O(log₂ n). So if the size of the network doubles, only about one additional request is needed, regardless of the actual size of the network.
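The “one extra request per doubling” claim follows directly from the logarithm, where n is the number of nodes:

```latex
\log_2(2n) = \log_2 2 + \log_2 n = \log_2 n + 1
```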
The following example illustrates this advantage. Say you have a network with 2 million users and you want to find information about a specific file that unfortunately doesn’t exist in the network (i.e. no user shares it). Using query flooding, every client in the network has to be asked before you can be sure that the file isn’t available. In practice, the search usually times out first, assuming (but not knowing) that the file isn’t available.
Thanks to DHT, you only have to ask about 21 nodes (log₂ of 2 million ≈ 20.9) before you can be sure that the file isn’t available anywhere in the network. Even better, this is the worst case. Usually far fewer requests are needed, because along your search path you’ll likely reach the node closest to the requested file after only 3-4 requests (based on empirical observations of eMule’s current Kademlia network).
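The arithmetic behind the 21-node figure can be checked in a few lines. This sketch simply applies the ⌈log₂ n⌉ estimate from the text and ignores real-world details such as parallel queries and stale routing entries; the function name is hypothetical.

```python
import math

def worst_case_lookups(n_nodes: int) -> int:
    """Upper estimate of DHT lookup steps, using the log2(n) bound from the text."""
    return math.ceil(math.log2(n_nodes))

for n in (1_000_000, 2_000_000, 4_000_000):
    print(f"{n:>9,} nodes: flooding may ask up to {n - 1:,} clients, "
          f"a DHT lookup about {worst_case_lookups(n)} nodes")
```

Going from 2 million to 4 million nodes adds a single lookup step, while the flooding cost doubles.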
The next feature we think torrent clients should adopt is a real DHT-based keyword search. Tribler has already taken a step in that direction. However, it broadcasts torrents to other known clients, which makes its search scale poorly.
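One common way to build keyword search on top of a DHT, and roughly how eMule’s Kad approaches it, is to hash each keyword to a DHT key and store every torrent entry under the keys of all words in its title. The sketch below is hypothetical: a local dict stands in for the DHT, SHA-1 is an assumed choice of hash (Kad itself uses MD4), and the magnet links are placeholders. A search needs only one lookup, for its first keyword, and filters the rest locally.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Entry = Tuple[str, str]  # (torrent title, magnet link)

def keyword_key(word: str) -> bytes:
    # Assumption: the SHA-1 hash of the lower-cased keyword serves as its DHT key.
    return hashlib.sha1(word.lower().encode("utf-8")).digest()

class KeywordIndex:
    """Hypothetical keyword index. The dict stands in for the DHT: in a real network,
    each key's entries would live on the nodes whose IDs are XOR-closest to it."""

    def __init__(self) -> None:
        self._store: Dict[bytes, Set[Entry]] = defaultdict(set)

    def publish(self, title: str, magnet: str) -> None:
        # One DHT store per distinct keyword in the title.
        for word in set(title.lower().split()):
            self._store[keyword_key(word)].add((title, magnet))

    def search(self, query: str) -> List[Entry]:
        words = query.lower().split()
        if not words:
            return []
        # A single DHT lookup for the first keyword...
        candidates = self._store.get(keyword_key(words[0]), set())
        # ...then local filtering by the remaining keywords.
        return sorted(
            entry for entry in candidates
            if all(w in entry[0].lower().split() for w in words[1:])
        )

index = KeywordIndex()
index.publish("Big Buck Bunny 1080p", "magnet:?xt=urn:btih:PLACEHOLDER1")
index.publish("Big Buck Bunny 480p", "magnet:?xt=urn:btih:PLACEHOLDER2")
print(index.search("bunny 1080p"))
```

Unlike broadcasting, each search touches only the few nodes responsible for one keyword, so its cost stays logarithmic in the network size.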
We already know that after switching to magnet links only, The Pirate Bay’s entire torrent index is about 90 MB. Now imagine those 90 MB stored in a decentralized way. In a network with millions of nodes, each storing a few hundred kilobytes, every torrent entry would have thousands of replicas.
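A quick back-of-the-envelope check of the replica estimate follows. The 90 MB figure comes from the text; the node count (1 million) and per-node storage (300 KB) are assumed values chosen to match “millions of nodes” and “a few hundred kilobytes”, and the calculation assumes the total storage is spread evenly across all entries.

```python
MB = 1_000_000  # decimal units keep the arithmetic simple
KB = 1_000

def average_replicas(index_bytes: int, nodes: int, bytes_per_node: int) -> float:
    """Average number of copies of each entry if all node storage is spread evenly."""
    return nodes * bytes_per_node / index_bytes

# Assumed figures: 1 million nodes storing 300 KB each; 90 MB index from the text.
print(round(average_replicas(90 * MB, 1_000_000, 300 * KB)))  # 3333
```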
This ensures that each entry can still be found even if many nodes leave the network at the same time. Unfortunately, all previous decentralized search implementations suffered from huge amounts of spam in their search results. This is where we can learn from the torrent community: sites like The Pirate Bay provide trusted search results.
To make a completely decentralized search spam-free, these sites could simply continue to provide this service by signing torrents with public-key cryptography. A user who relies on a favorite torrent site’s search results would add the site’s public key to their torrent client, allowing the client to check the signature of each search result and filter out all fakes.
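The sign-and-filter flow can be illustrated with a deliberately toy, textbook-RSA sketch that uses only Python’s standard library. It is insecure by design: the primes are well-known Mersenne primes, the modulus is tiny, and there is no padding. A real client would use a vetted signature scheme such as Ed25519 from an established cryptography library. All function names are hypothetical; the point is only that the site signs once with its private key and every client verifies with the published public key.

```python
import hashlib
from typing import List, Sequence, Tuple

PublicKey = Tuple[int, int]   # (modulus n, public exponent e)
PrivateKey = Tuple[int, int]  # (modulus n, private exponent d)

def toy_keypair() -> Tuple[PublicKey, PrivateKey]:
    # INSECURE toy key: 2**61 - 1 and 2**89 - 1 are public Mersenne primes.
    p, q = 2**61 - 1, 2**89 - 1
    n, e = p * q, 65537
    d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse, Python 3.8+
    return (n, e), (n, d)

def _digest(entry: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha256(entry).digest(), "big") % n

def sign(entry: bytes, private_key: PrivateKey) -> int:
    """Done by the torrent site, offline, for each new torrent entry."""
    n, d = private_key
    return pow(_digest(entry, n), d, n)

def verify(entry: bytes, signature: int, public_key: PublicKey) -> bool:
    """Done by every client, using only the site's published public key."""
    n, e = public_key
    return pow(signature, e, n) == _digest(entry, n)

def filter_trusted(results: Sequence[Tuple[bytes, int]],
                   trusted_keys: Sequence[PublicKey]) -> List[bytes]:
    """Keep only search results signed by one of the user's trusted sites."""
    return [entry for entry, sig in results
            if any(verify(entry, sig, key) for key in trusted_keys)]

site_public, site_private = toy_keypair()
entry = b"Big Buck Bunny 1080p|magnet:?xt=urn:btih:PLACEHOLDER1"
signature = sign(entry, site_private)
print(filter_trusted([(entry, signature), (b"fake entry", signature)], [site_public]))
```

Spam injected into the DHT simply fails verification and is dropped, without the client ever having to contact the signing site.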
In this completely decentralized future, a torrent site such as The Pirate Bay would simply be a laptop with average computing power that connects to the internet every few hours to sign new torrents with its private key. Think about how hard it would be just to trace such a “torrent site”. Shutting it down would be practically impossible.
We are currently working on a client that will offer the torrent search described above. It is in a closed alpha testing phase and will soon enter public beta testing.
via TorrentFreak