by Girish Venkatachalam
Girish Venkatachalam is a UNIX hacker with more than a decade of
networking and crypto programming experience. His hobbies include
yoga, cycling, and cooking, and he runs his own business.
The traditional UNIX client-server model is passé. The in thing now
is peer-to-peer communication. With instant messaging, voice over IP,
and other person-to-person communications gaining popularity, this is
a natural consequence.
Today's desktop computers come with powerful processors, ample disk
space, and plenty of RAM, and bandwidth availability is increasing
worldwide; they are equal in capability, if not superior, to the
server-grade hardware of yore. This lets desktop computers double as
both servers and clients on a p2p network, something unimaginable a
few decades ago.
How different, really, is peer-to-peer networking from the proven,
age-old UNIX model of a daemon process acting as a server and forking
processes to handle multiple simultaneous clients?
It is the same thing with a twist, that is all. Moreover, peer-to-peer
networks are not totally decentralised; every p2p architecture also
has certain powerful nodes acting as mediators. So p2p builds on the
existing, time-tested client-server model, making it more robust and
feature-rich. It is not a departure.
p2p as disruptive technology
The Internet has in recent times helped people solve several thorny
problems in an unbelievably elegant and efficient manner. Most people
think of the Internet as the web, browsing, chatting, or email, but in
reality it is much more than that. The rich end-user applications
are built on a solid foundation of robust underlying protocols that
adapt to change seamlessly.
It is indeed the software protocols and the hardware interconnectivity
that make the Internet do all the wonders we see today. IP packet
switching has proven to be a tremendously capable and scalable
architecture. But that alone will not do: we need a protocol as
sophisticated and complex as TCP on top of it to ensure reliability
and guard against network congestion.
p2p has only started showing itself in recent years, and it adds
another layer of robustness and fault tolerance on top of TCP and IP.
p2p is not only about TCP, however: VoIP is delay sensitive rather
than loss sensitive, and hence VoIP payloads are always carried over UDP.
Decentralisation is a win-win situation for all. It reduces the load
on the server and, most importantly, it amortises rising load across
the multiple clients interested in connecting to the server, which
act as servers to each other.
There is perhaps no protocol that gets everything right quite like
bittorrent, and we shall be talking about it in graphic detail.
Why is p2p the wave of the future?
Peer-to-peer technology is going to take over from the client-server
model the same way data networks took over traditional voice networks.
Just as voice is now carried over data networks, in the coming years
we are going to see client-server traffic carried over peer-to-peer
architectures.
The full power of the Internet is waiting to be discovered. The
consumer electronics and entertainment industries have yet to fully
wake up to the possibilities afforded by the completely decentralised,
robust, and above all economically competitive Internet. Direct-to-home
satellite dishes have not taken off everywhere yet, but satellite is
already picking up as the premier delivery mechanism for
high-definition and digital television. Incidentally, it can also
provide extremely high-bandwidth Internet access, though the latency
of electromagnetic waves travelling all the way to a geosynchronous
satellite and back (36,000 kilometres each way) tells on the user
experience.
Once satellite communications become prevalent, a whole bouquet of
services will be delivered over the Internet through satellite. It
is not hard to see that IPTV and interactive television will be
among them.
Let us get back to p2p after that short digression. Lower-layer
infrastructures facilitate higher-level applications, which in turn
fuel the economic and business models built around them. And talking
of economics, there is a deep relationship between p2p and economics.
It is really interesting to see an economic and social concept become
technically relevant: bittorrent is designed around the Pareto
efficiency concept pioneered by the Italian economist Vilfredo Pareto.
p2p networks break the traditional monopoly of one provider and
multiple consumers, in which the consumers are at the mercy of the
provider, who is free to charge money or restrict distribution in
various ways. p2p cannot work in such a social model. A certain lack
of accountability is built automatically into the p2p model.
This has been a curse rather than a boon for media companies and
copyrighted material. Perhaps future technology can solve some of
these problems, but p2p certainly influences the Internet economy
more than we can imagine.
Introduction to bittorrent
Bittorrent is the brainchild of Bram Cohen, created to solve the
decades-old problem of file distribution. Servers had been crumbling
under the load brought by links from high-popularity websites like
slashdot, experiencing mysterious failures or crashes during releases
or at times of heavy load. Companies had been shelling out several
thousand dollars on upgrading server hardware and bandwidth to cater
to ever-growing demand.
Bittorrent is an elegant protocol that has given a lot of relief to
companies dishing out large files on the Internet. It is interesting
to note that large video files are usually the business of the p0rn
industry, and those folks have done the most research in this
technology. Meanwhile, ISPs and several other parties are very worried
about the way these protocols work.
Let us take a purely technical view from this point onwards. First,
the bittorrent architecture.
Traditional and bittorrent models of file download
Bittorrent also uses the traditional client-server model to obtain
the first few bits and get started. Once the game starts, however,
the rules are completely different.
Instead of continuing to stress the server to dish out more data,
in bittorrent the data is obtained from the other clients downloading
at the same time. Obviously those clients are themselves still
downloading and don't have all the bits, but they share whatever they
have. This model would fail if downloads happened sequentially; the
key is that each client downloads a different piece, and thus any new
client can fetch different pieces from different clients (peers). The
most interesting thing here is that as more and more clients connect
to download at the same time, there are more and more peers, and the
load gets distributed amongst them. Instead of loading the server,
the load is spread across the swarm.
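To see why the random piece order matters, here is a toy Python sketch
(invented for this article, not real client code) comparing how many
distinct pieces a swarm holds when everyone downloads sequentially
versus in random order:

```python
import random

NUM_PIECES = 20
NUM_PEERS = 5
PROGRESS = 10  # each peer has fetched 10 of the 20 pieces so far

# Sequential download: every peer holds the identical first 10 pieces.
sequential = [set(range(PROGRESS)) for _ in range(NUM_PEERS)]
# Random piece order: every peer holds a different subset.
randomised = [set(random.sample(range(NUM_PIECES), PROGRESS))
              for _ in range(NUM_PEERS)]

def coverage(swarm):
    """Count how many distinct pieces exist somewhere in the swarm."""
    return len(set().union(*swarm))

print("sequential swarm covers", coverage(sequential), "distinct pieces")
print("randomised swarm covers", coverage(randomised), "distinct pieces")
```

The sequential swarm always covers exactly 10 pieces and the peers have
nothing to trade with each other; the randomised swarm usually covers
nearly all 20.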
This simultaneous uploading and downloading calls for very
sophisticated logic and processing. It works because the upload links
of clients are typically idle most of the time. Bear in mind that
there are several additional problems to solve: how do you upload from
behind NAT? What do you do when the upload link is only a fraction of
the download speed, as with ADSL?
Diving deeper into the bittorrent protocol
As always, what makes a product successful is not just the brilliance
of the design or concept but also the care taken in implementation.
Real-world implementation is a completely different ball game from
designing things in a laboratory.
While researchers were arguing amongst themselves about why today's
Internet cannot deliver acceptable quality for VoIP, Skype just went
ahead and implemented a workable solution, with tremendous ease of use
to boot. Its engineers worked hard to make things easy for users so
that users don't have to work hard. Skype grapples well with several
non-standard NAT implementations and real-life gotchas, and that is
what accounts for its popularity and success.
Coming to bittorrent, what makes it special is its "tit for tat" model
of file sharing, or "Pareto efficiency" to be precise. This draws on
a beautiful concept from game theory called the prisoner's dilemma.
What is Pareto efficiency? Simply stated, it means that together we
prosper by sharing what we have with one another. It aims at a win-win
situation for everybody. This approach can work only up to a point,
however. Bittorrent tackles the real-life problem of people always
wanting to take without giving anything in return by penalising users
who don't give enough.
Let us take a look at the various participants in the bittorrent
protocol.
Bittorrent protocol flow sequence
First, the torrent file is exported using a simple HTTP server.
Then a tracker and a seed enter the picture; the tracker and the seed
could be the same machine.
The seed is the machine that actually dishes out the bits, so it
should have access to good bandwidth. The seed has a complete copy of
the file to be distributed.
The torrent file contains the SHA1 hashes of the different "pieces"
of the file, in addition to the URL of the tracker. Hence integrity
checking is built into the protocol itself.
Each file is divided into multiple pieces, typically 256 kilobytes in
size. Each piece is further subdivided into blocks (sub-pieces) of 16
KB, but pieces form the fundamental unit in bittorrent, since
integrity can be checked only at piece-level granularity.
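A minimal sketch of what piece verification and block subdivision
could look like, assuming the 20-byte SHA1 digests have already been
extracted from the torrent file (the helper names are hypothetical):

```python
import hashlib

PIECE_LENGTH = 256 * 1024   # typical piece size
BLOCK_LENGTH = 16 * 1024    # blocks (sub-pieces) within a piece

def verify_piece(piece_data: bytes, expected_sha1: bytes) -> bool:
    # A piece can only be checked as a whole: hash it and compare
    # against the 20-byte digest stored in the torrent file.
    return hashlib.sha1(piece_data).digest() == expected_sha1

def blocks(piece_index: int, piece_size: int):
    # Block requests on the wire are (piece, offset, length) triples.
    for offset in range(0, piece_size, BLOCK_LENGTH):
        yield piece_index, offset, min(BLOCK_LENGTH, piece_size - offset)
```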
The tracker is central to the bittorrent design, as all connecting
clients query the tracker to figure out whom to connect to.
This approach of dividing a file into pieces and downloading a
different piece from a different peer at random is shown in the
diagram below.
How pieces are used for file downloading
To begin with, space for the entire file is allocated on disk. This
is why the file size on disk apparently never increases during
bittorrent downloads. As and when a piece arrives, it is placed in
its appropriate slot in the file.
This is very different from the normal way of downloading a file from
beginning to end, with the file growing on disk as the bits come in.
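In Python, the preallocate-then-fill approach might look like this
sketch (the function names are invented for illustration):

```python
def preallocate(path: str, total_size: int) -> None:
    # Create the file at its final size up front; on most filesystems
    # this produces a sparse file that fills in as pieces arrive.
    with open(path, "wb") as f:
        f.truncate(total_size)

def write_piece(path: str, piece_index: int, piece_length: int,
                data: bytes) -> None:
    # Drop a verified piece into its fixed slot within the file.
    with open(path, "r+b") as f:
        f.seek(piece_index * piece_length)
        f.write(data)
```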
The algorithms involved and modus operandi
The tracker assigns each downloading client peers that hold pieces
the downloader does not already have. This is possible because each
peer constantly updates its status to the tracker. After a while in
operation the tracker knows which pieces are present on which peers,
and newly connecting clients are asked to download the rarest piece
first. This guards against the departure of the peers holding the
missing pieces. The protocol works differently at different stages of
this distributed state machine.
This is done because churn rates are typically very high in real-life
p2p networks: end-user machines enter and leave the network as they
are switched on and off, unlike web servers, which stay online all
the time.
To begin with, the rarest-piece-first algorithm is used. Then peers
are chosen at random. After this, a tit-for-tat scheme is employed in
which peers that upload to us are allowed to download from us.
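A sketch of rarest-first piece selection, assuming we track each
peer's bitfield as a set of piece indices (these data structures are
assumptions for illustration, not the reference client's):

```python
from collections import Counter

def rarest_first(have, peer_bitfields):
    """Pick the piece we lack that the fewest peers in the swarm hold.
    `have` is our set of piece indices; `peer_bitfields` maps each
    peer to the set of piece indices it advertises."""
    counts = Counter()
    for pieces in peer_bitfields.values():
        counts.update(pieces - have)   # only count pieces we still need
    if not counts:
        return None
    # Ties break on piece index here; a real client breaks them randomly.
    return min(counts.items(), key=lambda kv: kv[1])[0]
```

For example, with `have = {0}` and `peer_bitfields = {"a": {0, 1},
"b": {1, 2}}`, piece 2 is held by only one peer and gets picked first.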
Every 10 seconds or so, "rechoking" happens, and every three rechoking
intervals an optimistic unchoke happens. Choking is the scheme
employed to punish bad uploaders and "leeches". However, connection
speeds are also taken into account: to figure out connection speeds a
20-second rolling average is used, since TCP is a dynamic protocol
that adjusts to different conditions in different ways when assigning
bandwidth. Bandwidth is not a constant in packet-switched networks!
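Putting those numbers together, one rechoke round might be sketched
like this (the peer attributes here are hypothetical stand-ins; a real
client tracks far more state):

```python
import random

RECHOKE_INTERVAL = 10   # seconds between rechoke rounds
UNCHOKE_SLOTS = 4       # peers unchoked on merit

def rechoke(peers, round_number):
    # Tit for tat: unchoke the peers uploading to us fastest, judged
    # by a 20-second rolling average of their rates.
    by_rate = sorted(peers, key=lambda p: p.rolling_rate, reverse=True)
    unchoked = set(by_rate[:UNCHOKE_SLOTS])
    # Every third round, optimistically unchoke one random extra peer
    # so that newcomers get a chance to prove themselves.
    if round_number % 3 == 0:
        rest = [p for p in peers if p not in unchoked]
        if rest:
            unchoked.add(random.choice(rest))
    for p in peers:
        p.choked = p not in unchoked
```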
Finally, when the download is about to complete, the protocol goes
into "endgame mode", in which the remaining blocks are requested from
multiple peers at once; whichever copies arrive first trigger cancel
requests to the other peers. This makes sure the download finishes
quickly instead of stalling on one slow peer.
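A sketch of the endgame idea, with `peer.request` and `peer.cancel`
standing in for the wire protocol's request and cancel messages
(hypothetical method names):

```python
def enter_endgame(missing_blocks, peers):
    # Ask every peer that has the piece for each still-missing block.
    for block in missing_blocks:
        for peer in peers:
            if block.piece in peer.pieces:
                peer.request(block)

def on_block_received(block, peers):
    # The first copy to arrive wins; tell everyone else to stop.
    for peer in peers:
        if block.piece in peer.pieces:
            peer.cancel(block)
```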
One more thing is done to ensure TCP efficiency: requests for multiple
blocks of the same piece are pipelined. This avoids the TCP connection
sitting idle for a full round trip between blocks.
The bittorrent tracker protocol is very simple: plain HTTP requests
and responses carrying data in something called "bencoding".
Bencoding is nothing but byte strings prefixed with a length
parameter, plus integers, lists, and dictionaries, which map naturally
onto Python data structures. (The peer-to-peer wire protocol itself is
a simple binary message format over TCP.) But the encoding is the
simplest part; what matters is the design and the algorithms we spoke
of earlier.
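Bencoding is simple enough that a minimal decoder fits in a few lines.
This sketch handles all four types (integers `i42e`, byte strings
`4:spam`, lists `l...e`, and dictionaries `d...e`):

```python
def bdecode(data: bytes, i: int = 0):
    """Decode one bencoded value starting at offset i; return
    the value and the offset just past it."""
    c = data[i:i + 1]
    if c == b"i":                           # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if c == b"l":                           # list: l<items>e
        i, items = i + 1, []
        while data[i:i + 1] != b"e":
            item, i = bdecode(data, i)
            items.append(item)
        return items, i + 1
    if c == b"d":                           # dictionary: d<key value ...>e
        i, d = i + 1, {}
        while data[i:i + 1] != b"e":
            key, i = bdecode(data, i)
            d[key], i = bdecode(data, i)
        return d, i + 1
    colon = data.index(b":", i)             # byte string: <length>:<bytes>
    length = int(data[i:colon])
    return data[colon + 1:colon + 1 + length], colon + 1 + length

value, _ = bdecode(b"d4:spaml1:a1:bee")
print(value)    # {b'spam': [b'a', b'b']}
```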
NAT traversal and other issues
No treatment of p2p protocols would be complete without discussing the
issues involved in working across NAT devices. Bittorrent is no
different. It uses TCP ports 6881-6889, and firewalls typically allow
outgoing TCP traffic. All of bittorrent's connections to its peers are
full duplex, and since it is all TCP there are no big problems. Since
the listening ports are communicated through the tracker using the
bittorrent protocol, and since every node talks to the tracker, things
are simplified.
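A sketch of how a client might claim one of those listening ports,
binding the first free port in the range so it can be reported to the
tracker:

```python
import socket

def open_listen_socket():
    # Try the classic bittorrent ports in order and keep the first
    # one that is free; this is the port the client reports to the
    # tracker so that peers know where to connect back.
    for port in range(6881, 6890):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.bind(("", port))
            s.listen(5)
            return s, port
        except OSError:
            s.close()
    raise RuntimeError("no free port in 6881-6889")
```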
It is extremely hard to connect to ports behind a NAT device but very
easy for machines behind NAT to connect outward. So incoming
connections can be mimicked by outgoing connections that carry data
inward.
Although the protocol per se is a distributed one with good robustness
and redundancy, the tracker continues to be a single point of failure.
But the fact that the tracker is not involved in the actual downloads
helps us here: it only acts as an intermediary to set up transfers
between peers. That way the load on the tracker is minimised.
Several enhancements to the base protocol are being made, but the core
remains the same.
© 2012-07-01 Girish Venkatachalam