Graph Normalization Specification (GNS v1)

Version: 1.0.0
Status: Draft
Applies to: CEP entity graphs, provenance graphs, and merged graphs
Purpose: Define a deterministic normal form for CEP graphs so that equivalent graphs produce identical canonical representations and hashes.

1. Overview

CEP represents civic data as graphs:

Entity graphs: entities, relationships, and their links.
Provenance graphs: entities, activities, and agents (PROV-style).
Merged graphs: combined views over multiple sources and versions.

The Graph Normalization Specification (GNS) defines how to:

Represent graphs in a uniform structural model.
Canonically label nodes that lack global identifiers.
Order nodes and edges deterministically.
Serialize graphs via CEC v1 for hashing and verification.
Guarantee that semantically equivalent graphs normalize to the same form.

GNS is versioned independently from CEC, schemas, and vocabularies.

2. Graph Model

2.1 Nodes

Each node has:

nodeId: a string identifier (global or canonical local)
nodeType: one of entity, relationship, activity, agent, or envelope (optional, for packaging)
payload: a CEC-compatible JSON object containing the node's attributes

Entity nodes MUST carry their verifiableId (from EFS) in the payload; see Section 4.1.

2.2 Edges

Each edge has:

sourceNodeId
targetNodeId
edgeType: a label (e.g., used, wasGeneratedBy, wasAttributedTo, hasRelationship, participatesIn, or a domain-specific relationship type)
edgePayload: optional attributes (e.g., role labels, qualifiers)

2.3 Graph

A graph G is:

{
  "graphId": "<optional, see hashing>",
  "nodes": [ ... ],
  "edges": [ ... ],
  "graphMetadata": {
    "gnsVersion": "1.0.0",
    "cecVersion": "1.0.0",
    "schemaVersions": { "...": "..." },
    "vocabularyVersions": { "...": "..." }
  }
}

The normalization process produces a canonical graph object of this shape.
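As a minimal sketch, the model in Sections 2.1 to 2.3 could be mirrored in code roughly as follows. The class names, snake_case attribute names, and the to_obj helpers are illustrative assumptions, not part of GNS; only the JSON field names come from the specification.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# Node types listed in Section 2.1.
NODE_TYPES = {"entity", "relationship", "activity", "agent", "envelope"}


@dataclass
class Node:
    node_id: str
    node_type: str
    payload: dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.node_type not in NODE_TYPES:
            raise ValueError(f"unknown nodeType: {self.node_type!r}")

    def to_obj(self) -> dict[str, Any]:
        return {"nodeId": self.node_id, "nodeType": self.node_type,
                "payload": self.payload}


@dataclass
class Edge:
    source_node_id: str
    target_node_id: str
    edge_type: str
    edge_payload: Optional[dict[str, Any]] = None  # optional per Section 2.2

    def to_obj(self) -> dict[str, Any]:
        obj: dict[str, Any] = {"sourceNodeId": self.source_node_id,
                               "targetNodeId": self.target_node_id,
                               "edgeType": self.edge_type}
        if self.edge_payload is not None:
            obj["edgePayload"] = self.edge_payload
        return obj


@dataclass
class Graph:
    nodes: list[Node]
    edges: list[Edge]
    graph_metadata: dict[str, Any]
    graph_id: Optional[str] = None  # optional per Section 2.3

    def to_obj(self) -> dict[str, Any]:
        obj: dict[str, Any] = {"nodes": [n.to_obj() for n in self.nodes],
                               "edges": [e.to_obj() for e in self.edges],
                               "graphMetadata": self.graph_metadata}
        if self.graph_id is not None:
            obj["graphId"] = self.graph_id
        return obj


# Hypothetical example values, for illustration only.
example = Graph(
    nodes=[Node("ent-A", "entity", {"verifiableId": "ent-A"}),
           Node("tmp0", "activity")],
    edges=[Edge("tmp0", "ent-A", "used")],
    graph_metadata={"gnsVersion": "1.0.0", "cecVersion": "1.0.0"},
)
```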

3. Goals of Graph Normalization

GNS aims to ensure that:

Determinism
For fixed input graph semantics and a fixed version tuple, every implementation produces the same normalized graph.

Idempotence
Normalizing an already normalized graph yields the identical graph.

Equivalence
Graphs that differ only in node labeling or edge ordering, but represent the same structure, normalize to the same canonical form.

Hash stability
The normalized graph can be CEC-serialized and hashed to yield a stable graph-level CTag.

4. Node Identity and Labeling

4.1 Entity Nodes

Entity nodes have stable global identifiers.

The payload MUST include:

verifiableId (from the Entity Fingerprint Specification)
entityTypeUri

The nodeId for entity nodes MUST be set to verifiableId.

No relabeling is required for entity nodes.

4.2 Non-Entity Nodes (Activities, Agents, Relationships, Envelopes)

Non-entity nodes might lack globally stable IDs. For such nodes, GNS defines a canonical local ID computed from:

The node's type
A deterministic summary of its payload
The multiset of incident edges (types + endpoint identifiers)
Optional timestamps where present

Canonical local ID:

canonicalLocalId = "gns:" || base64url( H( summary(node) ) )

Where summary(node) is a CEC-serialized JSON object:

nodeType
payload (CEC-normalized)
incidentEdges: a sorted list of { edgeType, direction, otherNodeId }

Hash function H is typically SHA-256. Implementations can only produce matching IDs and hashes if they use the same H.

base64url is the URL-safe Base64 encoding defined in RFC 4648, Section 5.

incidentEdges are sorted by the following keys, in order of precedence:

edgeType
direction ("in" or "out")
otherNodeId
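The computation above can be sketched in a few lines. This is an illustration under stated assumptions: cec() is a stand-in for CEC v1 (sorted keys, compact JSON), H is taken to be SHA-256, and trailing "=" padding is stripped from the base64url output, which the text above does not specify either way.

```python
import base64
import hashlib
import json


def cec(obj) -> str:
    # Stand-in for CEC v1 (assumption): sorted keys, compact separators.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))


def canonical_local_id(node_type: str, payload: dict, incident_edges: list) -> str:
    # Sort incident edges by edgeType, then direction, then otherNodeId.
    edges = sorted(
        incident_edges,
        key=lambda e: (e["edgeType"], e["direction"], e["otherNodeId"]),
    )
    summary = {"nodeType": node_type, "payload": payload, "incidentEdges": edges}
    digest = hashlib.sha256(cec(summary).encode("utf-8")).digest()
    # Padding removal is an assumption of this sketch.
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return "gns:" + encoded


# Hypothetical activity node with two incident edges.
incident = [
    {"edgeType": "used", "direction": "out", "otherNodeId": "ent-B"},
    {"edgeType": "used", "direction": "out", "otherNodeId": "ent-A"},
]
local_id = canonical_local_id("activity", {"label": "filing"}, incident)
```

Because the incident edges are sorted before hashing, the order in which they are supplied does not affect the resulting ID.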

4.3 Labeling Procedure and Fixed Point

Because non-entity node labels depend on incident edges and incident edges depend on node labels, GNS defines a fixed-point procedure:

Initial labeling

Entity nodes: nodeId = verifiableId
Non-entity nodes: temporary IDs (internal indices)

Procedure

Compute summaries for all non-entity nodes using current IDs.
Compute canonicalLocalId for each non-entity node.
Replace temporary IDs with canonicalLocalId.
Rebuild edges with updated sourceNodeId / targetNodeId.

This converges in one iteration because:

entity IDs are stable
each non-entity ID depends only on its node type, its payload, and its incident edges, whose other endpoints carry stable entity IDs or previously computed local IDs

Implementations MUST perform at least one full pass.
A second pass MUST produce identical results (idempotence).
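One labeling pass might look like the sketch below. It rests on assumptions: cec() is a stand-in for CEC v1, H is SHA-256 with padding stripped, and nodes and edges are plain dicts. The example graph deliberately connects its only non-entity node to an entity node, the simplest case in which a second pass reproduces the first; it does not exercise edges between two non-entity nodes.

```python
import base64
import hashlib
import json


def cec(obj) -> str:
    # Stand-in for CEC v1 (assumption): sorted keys, compact separators.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))


def canonical_local_id(node_type, payload, incident_edges) -> str:
    summary = {
        "nodeType": node_type,
        "payload": payload,
        "incidentEdges": sorted(
            incident_edges,
            key=lambda e: (e["edgeType"], e["direction"], e["otherNodeId"]),
        ),
    }
    digest = hashlib.sha256(cec(summary).encode("utf-8")).digest()
    # Stripping "=" padding is an assumption of this sketch.
    return "gns:" + base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def label_pass(nodes, edges):
    """Run one labeling pass and return (nodes, edges) with updated IDs."""
    # Collect incident edges per node using the current IDs.
    incident = {n["nodeId"]: [] for n in nodes}
    for e in edges:
        s, t, ty = e["sourceNodeId"], e["targetNodeId"], e["edgeType"]
        incident[s].append({"edgeType": ty, "direction": "out", "otherNodeId": t})
        incident[t].append({"edgeType": ty, "direction": "in", "otherNodeId": s})

    # Entity nodes keep verifiableId; other nodes get a canonical local ID.
    new_id = {}
    for n in nodes:
        if n["nodeType"] == "entity":
            new_id[n["nodeId"]] = n["payload"]["verifiableId"]
        else:
            new_id[n["nodeId"]] = canonical_local_id(
                n["nodeType"], n["payload"], incident[n["nodeId"]]
            )

    # Replace IDs on nodes and rebuild edge endpoints.
    new_nodes = [{**n, "nodeId": new_id[n["nodeId"]]} for n in nodes]
    new_edges = [
        {**e,
         "sourceNodeId": new_id[e["sourceNodeId"]],
         "targetNodeId": new_id[e["targetNodeId"]]}
        for e in edges
    ]
    return new_nodes, new_edges


# Hypothetical input: one entity node and one activity with a temporary ID.
nodes = [
    {"nodeId": "ent-A", "nodeType": "entity",
     "payload": {"verifiableId": "ent-A", "entityTypeUri": "urn:example:type"}},
    {"nodeId": "tmp0", "nodeType": "activity", "payload": {"label": "filing"}},
]
edges = [{"sourceNodeId": "tmp0", "targetNodeId": "ent-A", "edgeType": "used"}]

first = label_pass(nodes, edges)
second = label_pass(*first)  # second pass, expected to equal the first
```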

5. Edge Normalization

After node IDs are canonical:

For each edge:

Ensure sourceNodeId and targetNodeId use canonical IDs.
Canonicalize edgePayload via CEP normalization + CEC.

Sort edges by the following keys, in order of precedence:

edgeType
sourceNodeId
targetNodeId
CEC(edgePayload)

If edgePayload is absent, treat it as {} for CEC.
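The edge ordering can be illustrated with a compound sort key. In this sketch cec() is a stand-in for CEC v1, and strings are compared by Unicode code point (Python's default), which is an assumption since the comparison rule is not stated here.

```python
import json


def cec(obj) -> str:
    # Stand-in for CEC v1 (assumption): sorted keys, compact separators.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))


def normalize_edges(edges: list) -> list:
    def sort_key(e: dict):
        # A missing (or null) edgePayload is treated as {} for CEC.
        payload = e.get("edgePayload") or {}
        return (e["edgeType"], e["sourceNodeId"], e["targetNodeId"], cec(payload))

    return sorted(edges, key=sort_key)


# Hypothetical edges whose endpoints already carry canonical IDs.
edges = [
    {"sourceNodeId": "gns:a", "targetNodeId": "ent-A", "edgeType": "used"},
    {"sourceNodeId": "ent-A", "targetNodeId": "gns:a",
     "edgeType": "participatesIn", "edgePayload": {"role": "filer"}},
    {"sourceNodeId": "ent-A", "targetNodeId": "gns:a",
     "edgeType": "participatesIn"},
]
ordered = normalize_edges(edges)
```

The payload serialization acts only as a tie-breaker for parallel edges that share type, source, and target.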

6. Node List Normalization

After node IDs are canonical:

Canonicalize each payload via CEP normalization + CEC.
Sort nodes by:
nodeType
nodeId
CEC(payload)
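Node ordering follows the same pattern as edge ordering. A brief sketch, again assuming a cec() stand-in and code-point string comparison:

```python
import json


def cec(obj) -> str:
    # Stand-in for CEC v1 (assumption): sorted keys, compact separators.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))


def normalize_nodes(nodes: list) -> list:
    # Sort by nodeType, then nodeId, then the serialized payload.
    return sorted(
        nodes,
        key=lambda n: (n["nodeType"], n["nodeId"], cec(n.get("payload") or {})),
    )


# Hypothetical nodes with canonical IDs already assigned.
nodes = [
    {"nodeId": "ent-B", "nodeType": "entity", "payload": {"verifiableId": "ent-B"}},
    {"nodeId": "gns:xyz", "nodeType": "activity", "payload": {}},
    {"nodeId": "ent-A", "nodeType": "entity", "payload": {"verifiableId": "ent-A"}},
]
ordered_ids = [n["nodeId"] for n in normalize_nodes(nodes)]
```

With this key, "activity" nodes sort before "entity" nodes because nodeType is compared first.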

7. Canonical Graph Serialization

Construct the graph object:

{
  "graphMetadata": {
    "gnsVersion": "1.0.0",
    "cecVersion": "<cecVersion used>",
    "schemaVersions": { ... },
    "vocabularyVersions": { ... }
  },
  "nodes": [ ... ],
  "edges": [ ... ]
}

Serialize via CEC v1:

lexicographic key ordering
omit null values
normalize numbers, strings, lists per CEC

The result is the canonical graph JSON.
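A rough approximation of this serialization step is shown below. It is not CEC v1 itself: it only covers lexicographic key ordering and null omission in objects, and it leaves number and string normalization to a real CEC implementation. Keeping null entries inside lists is an assumption of this sketch.

```python
import json


def drop_nulls(value):
    # Remove null-valued keys from objects, recursively.
    if isinstance(value, dict):
        return {k: drop_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        # Assumption: list elements are kept as-is, including nulls.
        return [drop_nulls(v) for v in value]
    return value


def serialize_graph(graph_obj: dict) -> str:
    # sort_keys gives lexicographic key ordering; separators remove whitespace.
    return json.dumps(drop_nulls(graph_obj), sort_keys=True, separators=(",", ":"))


# Hypothetical, empty graph used only to show the output shape.
graph_obj = {
    "graphMetadata": {
        "gnsVersion": "1.0.0",
        "cecVersion": "1.0.0",
        "schemaVersions": None,
    },
    "nodes": [],
    "edges": [],
}
canonical_json = serialize_graph(graph_obj)
# canonical_json ==
# '{"edges":[],"graphMetadata":{"cecVersion":"1.0.0","gnsVersion":"1.0.0"},"nodes":[]}'
```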

8. Graph Hash and Graph-Level CTag

graphHash = H( CEC(graphObject) )
graphTag  = "cep-graph:" || base64url(graphHash)

The graphTag can be used:

as a graphId
as a provenance reference
for caching and deduplication
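The two formulas in this section could be computed as in the following sketch, assuming H is SHA-256, cec() is a stand-in for CEC v1, and base64url padding is stripped (none of which is fixed by the formulas themselves).

```python
import base64
import hashlib
import json


def cec(obj) -> str:
    # Stand-in for CEC v1 (assumption): sorted keys, compact separators.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))


def graph_tag(graph_obj: dict) -> str:
    # graphHash = H( CEC(graphObject) )
    graph_hash = hashlib.sha256(cec(graph_obj).encode("utf-8")).digest()
    # graphTag = "cep-graph:" || base64url(graphHash), padding stripped (assumption).
    encoded = base64.urlsafe_b64encode(graph_hash).rstrip(b"=").decode("ascii")
    return "cep-graph:" + encoded


# Two hypothetical objects with the same content but different key order.
g1 = {"graphMetadata": {"gnsVersion": "1.0.0"}, "nodes": [], "edges": []}
g2 = {"edges": [], "nodes": [], "graphMetadata": {"gnsVersion": "1.0.0"}}
tag1 = graph_tag(g1)
tag2 = graph_tag(g2)
```

Both objects yield the same tag, which is what makes the tag usable as a cache or deduplication key.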

9. Recursion Guards and Idempotence

Implementations MUST ensure:

Idempotence: normalizing a normalized graph yields identical output.
No recursive expansion: graph normalization must not re-run entity canonicalization or adapters.
Finite behavior: normalization must terminate for all finite graphs.
Order of operations: all canonicalization must occur before graph normalization begins.
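The idempotence requirement lends itself to a simple conformance check. The helper below is a hypothetical test utility, not part of GNS; it accepts any normalizer callable, and toy_normalize is a deliberately simplified stand-in that only sorts edges.

```python
from typing import Any, Callable


def check_idempotent(normalize: Callable[[Any], Any], graph: Any) -> bool:
    # Normalizing the output again must change nothing.
    once = normalize(graph)
    twice = normalize(once)
    return once == twice


def toy_normalize(graph: dict) -> dict:
    # Simplified stand-in normalizer (assumption): sorts edges only.
    key = lambda e: (e["edgeType"], e["sourceNodeId"], e["targetNodeId"])
    return {**graph, "edges": sorted(graph["edges"], key=key)}


def broken_normalize(graph: dict) -> dict:
    # Counter-example: grows the edge list on every call, so it is not idempotent.
    return {**graph, "edges": graph["edges"] + graph["edges"][:1]}


# Hypothetical input graph.
graph = {
    "nodes": [],
    "edges": [
        {"edgeType": "used", "sourceNodeId": "b", "targetNodeId": "a"},
        {"edgeType": "used", "sourceNodeId": "a", "targetNodeId": "b"},
    ],
}
ok = check_idempotent(toy_normalize, graph)
bad = check_idempotent(broken_normalize, graph)
```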

10. Relationship to Other CEP Specifications

GNS builds upon:

Canonical Encoding (CEC v1)
Entity Fingerprint Specification (EFS v1)
Adapter Algebra Specification (AAS v1)

GNS ensures graph-level determinism and hashing; it does not redefine canonicalization or adapter semantics.

11. Versioning

GNS follows semantic versioning:

MAJOR: breaking changes
MINOR: backward-compatible additions
PATCH: clarifications, editorial fixes

Every normalized graph MUST include:

"graphMetadata": {
  "gnsVersion": "1.0.0",
  ...
}
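A consumer might use gnsVersion to decide whether it can process a graph. The policy in this sketch, accepting any version that shares the supported MAJOR number, is an assumption derived from the semantic versioning rules above and is not mandated by GNS. Pre-release or build suffixes are not handled.

```python
def parse_semver(version: str) -> tuple[int, int, int]:
    # Expects plain "MAJOR.MINOR.PATCH"; raises ValueError otherwise.
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_supported(graph_metadata: dict, supported: str = "1.0.0") -> bool:
    version = graph_metadata.get("gnsVersion")
    if version is None:
        # gnsVersion is mandatory in every normalized graph.
        return False
    # Assumed policy: same MAJOR version means compatible.
    return parse_semver(version)[0] == parse_semver(supported)[0]


# Hypothetical metadata objects.
accepts_minor_bump = is_supported({"gnsVersion": "1.2.3"})
rejects_major_bump = is_supported({"gnsVersion": "2.0.0"})
rejects_missing = is_supported({})
```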

12. Summary

GNS v1 defines:

a uniform CEP graph model
canonical labeling for non-entity nodes
deterministic node and edge ordering
CEC-based canonical serialization
a graph-level CTag mechanism

These together provide a stable foundation for provenance, merges, cross-source entity graphs, and higher-level CEP reasoning.