Federico Mengozzi

Application Layer

Peer-to-Peer

One of the main use for a Peer-to-Peer architecture is usually file distribution. The P2P architecture, in contrast to a client-server architecture, is inherently self-scalable.

To understand this possible to consider a scenario in which a file of size $S$ bits has to be distributed from a single source node to $N$ other nodes. The upload rate of the source node is $u_s$, the upload rate of the $i$-th node is $u_i$ and the download rate of the $i$-th node is $d_i$. Denote $D_{cs}$ as the distribution time in a client-server architecture and $D_{P2P}$ the distribution time in a P2P architecture.

  • In a client-server architecture the source node (server) must transmit one copy of the file to each of the $N$ nodes (clients). The server must upload at least the file $N$ time, hence at least $N\cdot S$ bits and since it’s upload rate is $u_s$ the $D_{cs}$ would be at least $\dfrac{N\cdot S}{u_s}$. In addition, denote $d_{min} = \min({d_1, d_2, \cdots, d_N})$ the lowest download rate among all peers, such peer will require at least $\dfrac{S}{d_{min}}$. The $D_{cs}$ would be at least $\dfrac{S}{d_{min}}$. To sum in best case $D_{cs} = \max(\dfrac{N\cdot S}{u_s}, \dfrac{S}{d_{min}})$

For $N$ large enough the $D_{cs}$ increase linearly unbounded with $N$.

  • In a P2P architect at the beginning only the uploading node has the file and to get the file on the network would require at least $\dfrac{S}{u_s}$. Similarly to the client-server architecture the node with lowest download rate can’t obtain the file faster than $\dfrac{S}{d_{min}}$. But with P2P the total upload rate of the system became $u_{total} = u_s + \sum\limits_{i = 1}^N u_i$. The system will have to upload $N\cdot S$ bits and this would take at least $\dfrac{N\cdot S}{u_{total}}$. Finally $D_{P2P}$ would be, in the best case, at least $D_{P2P} = \max(\dfrac{S}{u_s}, \dfrac{S}{d_{min}}, \dfrac{N\cdot S}{u_{total}})$.

If peers have a reasonable upload rate $D_{P2P}$ will probably have an upper bound. It means that the time to distribute the file to a network of peer doesn’t depend by the number of those peers.

BitTorrent Protocol

In the BitTorrent protocol a set of peers participating in the distribution of a file is called torrent. Files are distributed in equal-size chuncks of data, usually of size 256KBytes. Each torrent is provided with a node that keeps track of all peer participating in the torrent, such node is called tracker.

When a new peer join the torrent, the tracker select about $50$ peers and send their IPs to the new peer to allow communications. Each peers has a subset of chuncks of the file and it also know what chuncks its neighbor peers have. In this way is possible for peer to request the chucks is missing; to select which chuncks to request the peer use the rarest first approach (by doing that the rarest chuncks get more quickly redistributed).

To pair peers together trading algorithm are used. A peer usually select the peers with the highest upload rate, usually $4$, that became the unchocked peers. Every $10$ second the upload upload rate from the peer is recalculated and the set of unchocked peers might get updated. In addition, every $30$ second a random peer is selected; if this optimistically unchocked peer turn out to be better that one of the unchocked peer it will replace a peer on such set. Peer that don’t belong to the unchocked set of a peer are chocked by that peer (they don’t receive any data from it).

With this algorithm peers with similar upload rates tend to find each other. Other mechanism are used in the protocol such as minichuncks, pipelining, random first selection, endgame mode and anti-snubbing.

Distributed Hash Tables

Distributed Hash Tables (DHTs) are used to implement a distributed database. In general it’s possible to assume that the database will contains pair of elements, an identifier (key) and a related data (value).

The very first approach of making every peer to store the entire database is obviously inconceivable. It would be possible, on the other hand, to randomly distribute the database among all peers (now each peer should store just a fraction of the overall data).

In this case other peers would be responsible for locating the peers that store a particular piece of data. A naive approach would require that every peer maintains a list of all participating peers on the network. That’s unscalable since a querying peer should have to contact all the other peers (it’s not know what kind of data each peer store).

It would be better to assign an identifier to each peer (represented by an $N$ bits integer in the range $0..2^N -1$ where $N$ is the number of peers on th network). Then each peer could store the (key, value) pairs whose key is closest to its identifier. To do that some requirements must be fulfilled

  • Each key should be comparable to the keys - It’s possible to do that by mapping each key to and integer using an hash function
  • Define what “closest” mean - Let’s say that the closest identifier is the smallest identifier that is greater or equal to the key.

Each peer could also maintain a reference to its predecessor and successor peer. In this way each query would require, on average, $\frac{N}{2}$ peers to be involved.

Circular DHTs

Another way to optimize the performance could be to imagine the network organized in a circular topology and add intermediate link between peers (a peer also keeps track of other peers in the network). The smaller the number of peer tracked, the more messages should be sent on the network to find the desire peer. The more peers each peer track, the fewer step must be done to find the peer.

circular distributed hash table

A good trade-off require each peer to store $\log N$ peers and, on average, this configuration require only $\log N$ messages to be sent.

In a network were peers came and go randomly some mechanism to keep the peer connected must be applied. It’s enough that every peer keeps track of the second successor too. Each peer should periodically ping its successor peer and update all reference to the other peers.

  • If the first successor it’s no more available, then the second successor takes its place
  • If the the second successor it’s missing, then the second successor of the peer’s first successor takes its place

When a peer want to join the network is sufficient that one peer of the network is somehow globally tracked. Then the joining peer contact such peer that in turn send message to its successor until the future successor of the joining peer is located.

Go to top