Lose Systems
This section does not actually address software repositories but rather content repositories that are mostly known for distributing copyrighted content. However, due to their proven resilience, a decentralized repository should adopt some of their features. We also ignore group sites and the release system in this description.
BitTorrent has existed
for almost two decades and is still widely used, even in corporate settings. It
is a peer-to-peer file-sharing network that allows users to share arbitrary
files identified using
Content Addressing.
Applying a hashing function to the content of a file creates a rather unique
value used to identify files. This content address is subsequently used in
.torrent
files and
magnet links
to share pointers to files.
Another popular distribution system is the
Usenet, which holds various
newsgroups. It was initially used to post text content and is the spiritual
predecessor of online forums or, for example,
Reddit. Arbitrary files are published in the
form of encoded text, which is similar to the mechanism used for email file
attachments. A file is usually split into multiple text messages identified by
server-generated ids. Then, similar to .torrent
files, .nzb
files address
these individual messages and describe the reconstruction of the original files.
The Usenet itself is hosted using various service providers holding a copy of the entire system and synchronizing the data among each other.1 While the BitTorrent network is publicly and free-of-charge accessible, accessing the Usenet usually requires a paid account with a Usenet provider.
Trackers and indexers are used to find and share content stored in these
unstructured systems. They are separate, usually centralized systems that are
often represented as websites. They allow users to search for content and obtain
the respective magnet links, .torrent
, or .nzb
files.
Even though the community is somewhat organized, no central authority declares an indexer as the standard or a piece of content as the valid one. This means that multiple indexers and trackers trace and reference several versions of the same content in, for example, different formats or qualities. Consequently, there is a many-to-many relationship between indexers and content. Indexers might reference the same files or different ones, still representing the same content.2
Resilience
BitTorrent
Due to copyright laws, law enforcement is interested in taking down both the content and the indexers, even though the latter does not hold any copyrighted content.
As trackers and indexers are, for the most part, centralized services, they are reasonably easy to identify. Consequently, as long as legal grounds and measures exist, taking them down is straightforward. However, this measure only removes a single tracker instance. Its data is mostly publicly available and, more importantly, separated from the actual content. Another instance can easily replace it without disturbing the overall system.
Files in the BitTorrent network can essentially never be removed as there is no central, addressable point of failure to exploit.3 Thus, peers that share files containing copyrighted material are being targeted and taken down in an attempt to remove the files from the network. Since the p2p network communicates openly, one can identify the IP addresses of peers sharing specific data. From here, law enforcement can find the real identities of participating peers.
Another possibility is to block the BitTorrent network traffic altogether through ISPs. Such a measure, however, is extensive and would prevent using the system for other legal purposes. Additionally, circumventing this blockade is relatively trivial.
VPNs and other privacy-preserving technologies can not only evade the restrictions above but also hide the real identity of peers within the open network. This makes it impossible to track down all peers sharing a specific file.
Usenet
Take-downs of copyrighted material are handled differently on the Usenet. Take-down notices are issued to the Usenet providers legally bound to comply. They consequently delete the messages in question from their servers.
To fight this removal, the community takes multiple measures to keep content available. As the files or messages in question are often not completely removed from all the existing providers but only in part, chances are that a large chunk of the files is still available. Combining the files from different providers might result in a complete file. Additionally, since the synchronization between providers is sometimes imperfect or the upload process partially fails, providing parity data for files is common. This allows the recovery of incomplete files, which might also do the trick in this case.
If a file is unrecoverable, it is often simply re-uploaded, making the content
available again. Nevertheless, the content is actually uploaded to a new
location. Subsequently, it is necessary to create new references in the form of
.nzb
files. They, again, must be shared across indexers, so users can
eventually find the new content.
The third measure is the obfuscation of uploaded content. This includes using
random names for files and possibly uploading across various newsgroups. Thus,
it is harder to identify the content and all of its parts. An indexer or a
.nzb
file is necessary to find the desired content without analyzing the data
on the Usenet. As this scheme requires indexers to find content, still valid
content might get ’lost’ if an indexer becomes unavailable and the content is
not referenced elsewhere.
-
Not every provider holds the entire Usenet. Depending on their service, only specific parts are synchronized and served. ↩︎
-
A movie might be available as a Blu-ray file or a rip of a digital release. They are very different files that represent the same content. ↩︎
-
The only way to permanently remove a file from a content addressable p2p network is to delete all instances of the file globally and permanently. This would entail that the file is deleted on any computer that ever shared the file and that the file cannot be reproduced, an almost impossible task. ↩︎