Common Properties

The Goal

The original author of the master’s thesis, Fan Xu, analyzed a wide range of existing software repositories. These included language-specific repositories such as Java’s Maven Central, the NPM Registry, and Python’s PyPI, as well as the Debian repositories and the Arch User Repository. However, only repositories that host open-source software were inspected.

She aimed to find a minimal set of properties that define a software repository. Properties compatible with the goals of decentralization and enhanced security could serve as building blocks of a new system. For the others, alternatives had to be found to replace or improve the given function.

Identified Properties

Traditional software repositories share a common set of properties. The following are the essential traits that make up a software repository:

  • Ease of access: There is a single, central¹, and commonly known access point to the repository for its users.

  • Coordinates: Software packages within a repository are addressed via a coordinate system, often a combination of Group - Package - Version.

  • Build Artifacts: Depending on the repository type and its ecosystem, the systems primarily hold built or compiled artifacts ready to use.

  • Source Code and Documentation: If not the primary artifacts, a software release package’s corresponding source code and documentation are typically published along with built artifacts.

  • Metadata: Licenses, dependencies, author’s names, links to the project website, and development repository are usually added to each release package.

  • Verification Data: Checksums and digital signatures can be found next to the built artifacts to verify the integrity and authenticity of downloaded files.

  • Historizing: Most analyzed repositories store artifacts and old software versions in perpetuity, even if newer versions are available.

  • Compatibility: Most repositories are accessed via HTTP and other standard web technologies, making them available even in restrictive corporate environments.

  • Content Moderation: Administrators and platform owners have ultimate control over all content published within a given repository and can remove unwanted artifacts.

  • Reliability: The analyzed software repositories achieve high availability by leveraging content delivery networks, globally distributed mirrors, and similar measures.
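To make the Coordinates, Metadata, and Verification Data properties more concrete, the following is a minimal sketch of how a single release might be modeled. It is an illustration only and not taken from the thesis: the class names `Coordinate` and `Release`, the textual form `group:package:version`, and the example package `org.example:demo-lib` are all hypothetical, and SHA-256 is assumed as the checksum algorithm.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Coordinate:
    """Address of a release: Group - Package - Version."""
    group: str
    package: str
    version: str

    @classmethod
    def parse(cls, text: str) -> "Coordinate":
        # Assumed textual form "group:package:version" (hypothetical convention).
        parts = text.split(":")
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"expected group:package:version, got {text!r}")
        return cls(*parts)


@dataclass
class Release:
    coordinate: Coordinate
    artifact: bytes   # built artifact, ready to use
    sha256: str       # published checksum (verification data)
    metadata: dict = field(default_factory=dict)  # license, dependencies, authors, ...

    def checksum_matches(self) -> bool:
        # A checksum only detects a corrupted or altered file; establishing
        # who published it would additionally require a digital signature.
        return hashlib.sha256(self.artifact).hexdigest() == self.sha256


artifact = b"example artifact bytes"
release = Release(
    coordinate=Coordinate.parse("org.example:demo-lib:1.2.0"),
    artifact=artifact,
    sha256=hashlib.sha256(artifact).hexdigest(),
    metadata={"license": "MIT", "dependencies": []},
)
print(release.coordinate.version)  # 1.2.0
print(release.checksum_matches())  # True
```

In this sketch, a client would recompute the checksum after download and reject the file if `checksum_matches()` returns `False`. Signature verification is deliberately left out, since it depends on the key distribution scheme of the repository in question.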

  1. Central does not necessarily mean centralized but rather points to a well-known entry point.
