Reference Projects: GitHub Data Standards

Several categories of open-source projects on GitHub serve as useful reference examples, particularly for common data schemas, multi-language support, and provenance tracking.

1. Interoperability & Event Specifications (Cross-Platform)

These standards focus on defining a common data format to ensure different systems and languages can communicate seamlessly. They typically define schemas and transport rules.

CloudEvents Specification

This specification describes event data in a common way. It is designed to simplify event declaration and delivery across services, platforms, and languages (e.g., Go, Java, Python, C#).

It's a good example of a cross-platform specification managed openly on GitHub. It defines a small set of required context attributes (id, source, specversion, and type) that must be present in every event.

Link: https://github.com/cloudevents/spec
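
To make the idea of required attributes concrete, here is a minimal sketch of a CloudEvents-style envelope built as a plain Python dict. It does not use any CloudEvents SDK; the helper names (make_event, missing_attributes), the source path, and the event type string are assumptions chosen for illustration.

```python
import json
import uuid

# Required context attributes in CloudEvents 1.0.
REQUIRED_ATTRIBUTES = ("specversion", "id", "source", "type")


def make_event(source, event_type, data):
    """Build a CloudEvents-style envelope as a plain dict (hypothetical helper)."""
    return {
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": source,
        "type": event_type,
        "datacontenttype": "application/json",  # optional attribute
        "data": data,
    }


def missing_attributes(event):
    """Return the required attributes that are absent or empty."""
    return [name for name in REQUIRED_ATTRIBUTES if not event.get(name)]


# Illustrative values only; the source and type are made up for this sketch.
event = make_event("/example/orders", "com.example.order.created", {"orderId": 17})
print(json.dumps(event, indent=2))
print(missing_attributes(event))          # []
print(missing_attributes({"id": "abc"}))  # ['specversion', 'source', 'type']
```

Because the envelope is just a JSON object with a fixed set of attribute names, a consumer written in any language can run the same check before looking at the payload.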

CDEvents Specification

This is a common specification for Continuous Delivery events. It extends CloudEvents by adding domain-specific purpose and semantics to the event data.

It shows how a standard is built on top of another standard (CloudEvents), specializing the common metadata for a specific domain (CI/CD provenance and flow).

Link: https://github.com/cdevents/spec
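
The layering can be sketched roughly as follows: a generic event envelope stays unchanged, while the domain standard fixes the naming of the type and the shape of the payload. This is a simplified illustration, not the CDEvents schema; the function make_cd_event, the payload field names, the source path, and the version string "0.1.0" are all assumptions.

```python
import uuid
from datetime import datetime, timezone


def make_cd_event(subject, predicate, version, source, subject_id, content):
    """Wrap a CI/CD subject in a generic envelope with a domain-specific type.

    Hypothetical helper: the payload layout below is illustrative and is not
    copied from the CDEvents specification.
    """
    event_type = f"dev.cdevents.{subject}.{predicate}.{version}"
    return {
        # Generic envelope attributes (the CloudEvents 1.0 required set).
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": source,
        "type": event_type,
        # Domain payload: what the event is about and when it happened.
        "data": {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "subject": {"id": subject_id, "type": subject, "content": content},
        },
    }


event = make_cd_event(
    "pipelinerun", "finished", "0.1.0",  # assumed subject, predicate, version
    "/ci/example", "run-42", {"outcome": "success"},
)
print(event["type"])  # dev.cdevents.pipelinerun.finished.0.1.0
```

A consumer that only understands the generic envelope can still route the event, while a CI/CD-aware consumer can interpret the subject and predicate.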

2. Provenance and Data Tracking Standards

These projects relate to provenance aspects, focusing on tracking the history, inputs, and derivation of data.

PROV-CPL (Core Provenance Library)

This is the Core Provenance Library for collecting data provenance with multiple language bindings (C/C++, Java, Python, R). It uses the W3C PROV standard as its foundation.

It demonstrates a multi-language implementation of a provenance standard, providing APIs to record who created a piece of data, what it was derived from, and when and where it was created, which is the core of provenance.

Link: https://github.com/ProvTools/prov-cpl
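
As a rough illustration of the underlying model, the sketch below records W3C PROV-style nodes (entity, activity, agent) and relations (used, wasGeneratedBy, wasAssociatedWith) in memory and answers a simple lineage question. It does not use the prov-cpl API; the ProvenanceLog class, its methods, and the file and agent names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ProvenanceLog:
    """In-memory store of PROV-style nodes and relations (illustrative only)."""
    nodes: dict = field(default_factory=dict)      # node id -> kind
    relations: list = field(default_factory=list)  # (relation, from, to, time)

    def add(self, kind, node_id):
        if kind not in ("entity", "activity", "agent"):
            raise ValueError(f"unknown PROV node kind: {kind}")
        self.nodes[node_id] = kind
        return node_id

    def relate(self, relation, from_id, to_id):
        for node_id in (from_id, to_id):
            if node_id not in self.nodes:
                raise KeyError(node_id)
        recorded_at = datetime.now(timezone.utc)
        self.relations.append((relation, from_id, to_id, recorded_at))

    def lineage(self, entity_id):
        """Entities used by the activities that generated the given entity."""
        activities = [dst for rel, src, dst, _ in self.relations
                      if rel == "wasGeneratedBy" and src == entity_id]
        return [dst for rel, src, dst, _ in self.relations
                if rel == "used" and src in activities]


log = ProvenanceLog()
raw = log.add("entity", "raw.csv")
clean = log.add("entity", "clean.csv")
job = log.add("activity", "cleaning-run-1")
alice = log.add("agent", "alice")

log.relate("used", job, raw)                 # activity used entity
log.relate("wasGeneratedBy", clean, job)     # entity wasGeneratedBy activity
log.relate("wasAssociatedWith", job, alice)  # activity wasAssociatedWith agent

print(log.lineage("clean.csv"))  # ['raw.csv']
```

A real provenance library would persist this graph and expose the same operations through each language binding; the sketch only shows the shape of the data being recorded.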

3. General Data Schemas and Monorepo Structure

These projects focus on using JSON Schema to define strict data structures and on managing those schemas in a versioned repository.

JSON Schema Specification

This is the official specification for JSON Schema, a declarative language used to annotate and validate JSON documents.

This foundational tool is used by many standards (including CloudEvents) to define specific fields and types. The repository illustrates how a core schema standard is defined and versioned.

Link: https://github.com/json-schema-org/json-schema-spec
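
To show what "declarative validation" means in practice, here is a small sketch that checks a document against a schema written with JSON Schema keywords. The validator is hand-rolled and handles only type, required, properties, and items, so it is not a conforming implementation; ORDER_SCHEMA, validate, and the sample documents are assumptions made for illustration.

```python
# Schema written in JSON Schema vocabulary; only the keywords below are handled.
ORDER_SCHEMA = {
    "type": "object",
    "required": ["id", "amount"],
    "properties": {
        "id": {"type": "string"},
        "amount": {"type": "number"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
}

TYPE_CHECKS = {
    "object": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so exclude it explicitly.
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
}


def validate(instance, schema, path="$"):
    """Return a list of error strings; an empty list means the instance passed."""
    errors = []
    expected = schema.get("type")
    if expected and not TYPE_CHECKS[expected](instance):
        return [f"{path}: expected {expected}"]
    if isinstance(instance, dict):
        for name in schema.get("required", []):
            if name not in instance:
                errors.append(f"{path}: missing required property '{name}'")
        for name, subschema in schema.get("properties", {}).items():
            if name in instance:
                errors += validate(instance[name], subschema, f"{path}.{name}")
    if isinstance(instance, list) and "items" in schema:
        for index, item in enumerate(instance):
            errors += validate(item, schema["items"], f"{path}[{index}]")
    return errors


print(validate({"id": "A1", "amount": 9.5, "tags": ["new"]}, ORDER_SCHEMA))  # []
for error in validate({"id": 7, "tags": ["new", 3]}, ORDER_SCHEMA):
    print(error)
```

The schema itself is plain data, which is why it can be versioned in a repository and shared across languages; in production a full validator library would replace the hand-written function.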

Consumer Data Standards (Australian DSB Schemas)

This repository holds a collection of JSON schema files derived from the Australian Consumer Data Standards, used for robust schema validation in banking and energy sectors.

This offers a practical example of a large-scale data standard implementation in a monorepo (single repository), organized by release version, providing strict, enforceable JSON schemas for real-world data exchange.

Link: https://github.com/ConsumerDataStandardsAustralia/dsb-schema-tools