11 Implementation guidelines
11.0 Overview
This chapter provides practical guidance for implementing OSIRIS producers (parsers) and consumers (tools that read and process OSIRIS documents). These guidelines are intended to improve interoperability, stability and long-term maintainability across implementations.
OSIRIS is designed for real-world infrastructure exports that may be incomplete or partially discoverable. Implementations SHOULD prioritize correctness, determinism and forward compatibility.
11.1 Parser development
11.1.1 Core responsibilities
A parser (also called a producer) is any component that generates OSIRIS documents from a source system (API, inventory database, CLI outputs, telemetry snapshots etc.).
A producer MUST:
- Emit documents that pass JSON Schema validation (Chapter 9).
- Populate required fields (`version`, `metadata.timestamp` and required resource fields).
- Produce stable, well-formed `id` values and valid references.
A producer SHOULD:
- Provide `metadata.generator` with a stable tool name and version.
- Describe export boundaries in `metadata.scope` (accounts/projects/subscriptions, regions, environments, sites).
- Produce deterministic outputs for the same input (see 11.1.3).
11.1.2 Mapping strategy
Type mapping
- Producers SHOULD map to standard resource types from Chapter 7 whenever possible.
- When a native object has no suitable standard mapping, producers MAY:
- Use a custom type following the rules in Chapter 7.
- Represent additional semantics via dedicated `extensions` (Chapter 8).
Provider attribution
- `provider.name` MUST identify the originating platform/vendor in lowercase.
- `provider.native_id` SHOULD capture the primary native identifier used by the provider to easily locate the resource.
- Additional provider context (e.g. `region`, `account`, `subscription`, `project`, `site`) SHOULD be included when applicable and stable.
Properties vs extensions
- Generic, cross-vendor attributes SHOULD go in `properties`.
- Vendor-specific or organization-specific details SHOULD go in `extensions` using a namespaced key (Chapter 8).
- Producers SHOULD NOT duplicate the same data in both `properties` and `extensions` unless required for interoperability.
11.1.3 Identity and ID stability
Stable identity is critical for topology merging, diffing and downstream automation.
Producers MUST ensure:
- Resource `id` values are unique within the document.
- Connection/group references resolve to valid resource IDs (Chapter 9 semantic rules).
- IDs remain stable across exports when the underlying entity is the same.
Recommended ID patterns (examples)
- Cloud and hyperscaler: `provider::native-id`. Examples: `aws::i-0abc123`, `azure::/subscriptions/.../virtualMachines/vm01`
- On-prem: `site::identifier`. Examples: `mxp::sw-core-01`, `mxp::srv-r770-001`
- OT: `site::identifier`. Example: `mxp-plant-01::sensor-temp-01`
Determinism
- If the source provides a stable unique identifier, producers SHOULD build `id` from it.
- Producers SHOULD NOT generate random IDs for real resources.
- If a stable native identifier is not available, producers MAY derive a deterministic ID from a stable tuple (e.g. `{site, name, serial}`) and SHOULD document the strategy.
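As a non-normative sketch of the derived-ID strategy above (the `{site, name, serial}` tuple and the `site::identifier` pattern come from this chapter; the hashing scheme, truncation length and function name are illustrative assumptions, not part of OSIRIS):

```python
import hashlib

def derive_resource_id(site: str, name: str, serial: str) -> str:
    """Derive a deterministic OSIRIS id from a stable tuple.

    Illustrative only: the separator and 12-character hash truncation
    are implementation choices that a producer SHOULD document.
    """
    # Normalize the tuple so case or whitespace differences in the
    # source export do not change the resulting id across runs.
    raw = "|".join(part.strip().lower() for part in (site, name, serial))
    digest = hashlib.sha256(raw.encode("utf-8")).hexdigest()[:12]
    return f"{site}::{digest}"
```

Because the function is pure and the inputs are stable, re-running the export yields the same `id`, which is exactly what snapshot correlation and diffing require.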
11.1.4 Relationship extraction for connections and groups
Producers SHOULD emit explicit relationships whenever they are known:
- Network connectivity, dependency, flow, containment, attachment, routing and similar (Chapter 5).
- Logical boundaries (VPC, subnet, security zone, cluster, rack, availability zone, etc.) as groups (Chapter 6).
Guidance:
- Use connections when the relationship must be traversed as a graph edge (e.g. network path, dependency chain, flow).
- Use groups for classification, organization and boundaries (e.g. ownership, cost center, environment, zone).
- Producers SHOULD avoid encoding relationships implicitly inside `properties` when they can be expressed as `connections` or `groups`.
11.1.5 Partial data and unknowns
OSIRIS supports incomplete inventories.
Producers SHOULD:
- Omit optional fields when unknown rather than emitting incorrect values.
- Prefer conservative modeling: it is better to omit a connection than to invent one.
- Use `tags` or namespaced metadata to document limitations (e.g. “no routing table available”, “LLDP disabled”).
Producers MUST NOT:
- Emit placeholders that look real in production exports (e.g. fake serials, fake hostnames, fake IPs) unless explicitly flagged as redacted/anonymized.
11.1.6 Validation workflow for producers
A producer MUST validate the output before publishing:
- Level 1 (structural): JSON Schema validation
- Level 2 (semantic): ID uniqueness, reference integrity, type format rules
- Level 3 (domain): optional type recognition and best-practice checks
Producers SHOULD fail the export pipeline on Level 1 errors.
Producers SHOULD treat Level 2 errors as export failures unless explicitly configured otherwise.
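A minimal, non-normative sketch of the Level 2 checks listed above (ID uniqueness and reference integrity), assuming documents are parsed into plain dictionaries and that connections reference resources via `source` and `target` fields as described in 11.2.3:

```python
def semantic_errors(doc: dict) -> list:
    """Return Level 2 findings: duplicate ids and dangling references."""
    errors = []
    seen = set()
    # ID uniqueness: every resource id must appear exactly once.
    for res in doc.get("resources", []):
        rid = res.get("id")
        if rid in seen:
            errors.append(f"duplicate resource id: {rid}")
        seen.add(rid)
    # Reference integrity: connection endpoints must resolve to resources.
    for conn in doc.get("connections", []):
        for endpoint in ("source", "target"):
            ref = conn.get(endpoint)
            if ref not in seen:
                errors.append(f"connection {endpoint} does not resolve: {ref}")
    return errors
```

A producer pipeline would fail the export when this list is non-empty, unless explicitly configured to downgrade selected findings to warnings.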
11.1.7 Document splitting and export scope
Large infrastructures may require multiple split documents.
Producers MAY split by:
- Provider hierarchy (account/subscription/project, or subscription/resource group/resources)
- Region and availability zone
- Environment (prod/stage/dev)
- Physical site (data center, plant)
- Domain boundary (IT vs OT)
When splitting, producers SHOULD:
- Ensure scope is clearly described in `metadata.scope`.
- Keep IDs consistent across documents (so consumers can merge reliably).
11.1.8 Logging and telemetry
Producers are often executed in automated pipelines (CI/CD, scheduled exports, inventory collectors). Consistent logging and basic telemetry greatly improve troubleshooting, reliability and performance tuning.
11.1.8.1 What to log during parsing
Producers SHOULD emit structured logs (JSON logs recommended) with a stable set of fields.
Recommended log events:
- Run start/end
- Export scope summary (provider, account/subscription/project, region/site, environment)
- Input source (API/CLI/file) and collector version
- Discovery summary
  - Counts: discovered resources, emitted resources, emitted connections, emitted groups
  - Skipped items and reasons (unsupported type, missing permissions, filtered by scope)
- Normalization decisions
  - Type mapping used (native type > OSIRIS type) when non-obvious
  - ID strategy (native ID used vs derived tuple)
- Validation results
  - Schema validation pass/fail
  - Semantic validation pass/fail
  - Rule identifiers and JSONPath for failures (when applicable)
Logs SHOULD include:
- `run_id` (unique per export execution)
- `generator.name` and `generator.version`
- scope identifiers from `metadata.scope`
- `severity` (`debug`, `info`, `warn`, `error`)
- `event` (stable event name)
- Optional `resource_id` (OSIRIS `id`) and/or `provider.native_id` when a log line is resource-specific
Producers MUST NOT log secrets or sensitive values (see Chapter 13).
Example of a structured log output:
```json
{
  "timestamp": "2026-01-16T22:23:45Z",
  "severity": "info",
  "event": "discovery_complete",
  "run_id": "20260115-102340-aws-prod",
  "generator": {
    "name": "osiris-aws-parser",
    "version": "1.0.0"
  },
  "scope": {
    "provider": "aws",
    "account": "123456789012",
    "regions": ["eu-west-1"]
  },
  "counts": {
    "resources_discovered": 847,
    "resources_emitted": 842,
    "connections_emitted": 1203,
    "groups_emitted": 45,
    "skipped": 5
  },
  "skipped_reasons": {
    "unsupported_type": 3,
    "missing_permissions": 2
  }
}
```
11.1.8.2 Performance metrics to track
Producers SHOULD track basic metrics to detect regressions and capacity issues:
- Timing
  - Total runtime
  - Time spent per phase: discovery, normalization, relationship inference, validation, serialization
- Volume
  - Resources emitted, connections emitted, groups emitted
  - Input objects fetched/parsed (if different from emitted)
- API/IO
  - API request count (by endpoint if possible)
  - API error count and retry count
  - Rate-limit/backoff occurrences
- Quality
  - Validation errors and warnings count (by rule ID if available)
  - Skipped/filtered count (by reason)
- Resource usage (optional)
  - Peak memory usage
  - Output document size (bytes)
Metrics SHOULD be emitted in a machine-readable form suitable for pipeline dashboards.
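One possible machine-readable shape for these metrics is a single JSON line per run. This is an illustrative sketch only (field names, rounding and the phase-callable convention are assumptions, not part of OSIRIS):

```python
import json
import time

def run_with_metrics(phases: dict) -> str:
    """Time each export phase and emit a metrics record as one JSON line.

    `phases` maps a phase name to a callable; callables returning an int
    are treated as item counts for that phase (an assumed convention).
    """
    metrics = {"phase_seconds": {}, "counts": {}}
    start = time.monotonic()
    for name, fn in phases.items():
        t0 = time.monotonic()
        result = fn()
        metrics["phase_seconds"][name] = round(time.monotonic() - t0, 3)
        if isinstance(result, int):
            metrics["counts"][name] = result
    metrics["total_seconds"] = round(time.monotonic() - start, 3)
    return json.dumps(metrics)
```

The resulting line can be shipped to whatever dashboard or log pipeline the deployment already uses.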
11.1.8.3 Error reporting best practices
Producers SHOULD classify failures into clear categories:
- Source access errors
  - Authentication/authorization failures, missing permissions, rate limits
- Parsing / normalization errors
  - Unexpected native formats, unsupported types, missing required source fields
- Validation errors
  - Level 1 (schema) and Level 2 (semantic) failures
- Operational errors
  - IO failures, serialization issues, timeouts
When reporting an error, producers SHOULD include:
- A stable error code or rule identifier (when applicable)
- Human-readable message
- JSONPath to the failing OSIRIS element (if the error is in emitted data)
- The smallest useful context (e.g. provider name, native id, OSIRIS id)
- A remediation hint (e.g. required permission, missing API scope, mapping fix)
Producers SHOULD:
- Exit with non-zero status on Level 1 errors by default.
- Provide a configurable mode to downgrade selected semantic issues to warnings only when explicitly requested.
- Avoid cascading failures: continue collecting other resources when safe, but fail the run if the resulting output would be invalid.
Producers MUST:
- Never include credentials, tokens or secrets in logs, error messages or stack traces (see Chapter 13).
- Avoid logging raw payloads unless explicitly enabled in debug mode and appropriately redacted.
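The error context recommended above can be captured in a small record type. This is a non-normative sketch: the class, its field names and the example rule code `OSIRIS-SEM-001` are hypothetical illustrations mirroring the bullets, not identifiers defined by OSIRIS:

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class Diagnostic:
    """One producer error/warning with the context recommended above."""
    severity: str                      # "error" | "warning" | "info"
    code: str                          # stable error code or rule identifier
    message: str                       # human-readable description
    json_path: Optional[str] = None    # path to the failing OSIRIS element
    provider_name: Optional[str] = None
    native_id: Optional[str] = None
    remediation: Optional[str] = None  # hint: permission, API scope, mapping fix

# Example record (rule code is invented for illustration).
d = Diagnostic(
    severity="error",
    code="OSIRIS-SEM-001",
    message="connection target does not resolve",
    json_path="$.connections[3].target",
    remediation="check that the referenced resource was not filtered by scope",
)
```

Serializing such records (e.g. via `asdict`) keeps diagnostics machine-readable for pipelines while remaining human-readable in logs.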
11.1.8.4 Observability platforms integration
> [!NOTE]
> OSIRIS remains a static snapshot format.
This subsection describes optional integration patterns for shipping OSIRIS snapshots into observability platforms.
The transport, storage, indexing and retention model are implementation concerns and outside the OSIRIS core scope.
Producers deployed in production environments MAY integrate with observability platforms in two ways:
Parser operational telemetry (monitoring the parser itself):
- Metrics systems: Zabbix, Prometheus, Datadog, CloudWatch, Azure Monitor
  - Track parser performance, API usage, validation results
- Log aggregation: ELK, Splunk, Loki, CloudWatch Logs
  - Centralize parser logs for troubleshooting
- Tracing: OpenTelemetry, Jaeger, Zipkin
  - For parsers with complex multi-step flows
Infrastructure topology snapshots (OSIRIS documents as observability artifacts):
- Observability platforms MAY ingest OSIRIS documents as a snapshot series to enable:
  - Topology change tracking and visualization
  - Configuration drift detection
  - Incident correlation with infrastructure changes
  - Compliance monitoring across snapshots
- Platforms with topology/service map capabilities may support this natively or via plugins
- Producers emitting documents for this purpose SHOULD run at consistent intervals and maintain stable IDs across snapshots
When integrating with observability systems, producers SHOULD:
- Use consistent metric names with appropriate prefixes (e.g. `osiris.parser.aws.`)
- Include standard labels/tags:
  - `parser_name`, `parser_version`
  - `scope_provider`, `scope_region`
  - `snapshot_timestamp`, `document_size_bytes`
  - `resource_count`, `connection_count`, `group_count`
- Support sampling/filtering to manage high-volume telemetry
Producers MUST ensure that telemetry integration does not expose sensitive data (Chapter 13).
11.2 Consumer implementation
11.2.1 Version handling and negotiation
Consumers MUST read `version` and apply compatibility rules (Chapter 12).
Consumers MUST ignore unknown fields as required for forward compatibility.
Consumers SHOULD:
- Support all `1.x.y` documents within the same major version when feasible.
- Provide clear diagnostics when a document uses an unsupported major version.
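A sketch of this major-version gate, assuming the semantic-versioning rules from Chapter 12 (the function name and error wording are illustrative):

```python
def check_version(version: str, supported_major: int = 1) -> None:
    """Raise with a clear diagnostic when the major version is unsupported."""
    try:
        major = int(str(version).split(".")[0])
    except ValueError:
        raise ValueError(f"malformed version field: {version!r}")
    if major != supported_major:
        raise ValueError(
            f"unsupported OSIRIS major version {major} "
            f"(this consumer supports {supported_major}.x.y)"
        )
```

Raising early, before any graph construction, keeps the diagnostic close to the cause and avoids confusing downstream failures.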
11.2.2 Consumer validation policy
Consumers MUST perform Level 1 validation (or equivalent structural checks) before processing. Consumers SHOULD perform Level 2 validation before building graph structures.
Consumers MAY implement configurable strictness:
- `basic`: Level 1 only
- `default`: Level 1 + Level 2
- `strict`: Level 1 + Level 2 + selected Level 3 rules
11.2.3 Graph construction and traversal
Consumers SHOULD treat:
- `resources` as nodes
- `connections` as edges, where:
  - `bidirectional`: treat as undirected for traversal
  - `forward`: source to target
  - `reverse`: target to source
- `groups` as membership relations (one-to-many references)
Consumers SHOULD build efficient indexes:
- `resourceById`
- `connectionsBySource`, `connectionsByTarget`
- `groupsById`, `membershipsByResource`
Consumers MUST NOT assume ordering in arrays is meaningful.
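The index-building step above can be sketched as follows, assuming `connections` carry `source`, `target` and `direction` fields as just described. The default direction and the `members`/`id` field names on groups are assumptions for illustration, not fields this chapter defines:

```python
from collections import defaultdict

def build_indexes(doc: dict):
    """Build lookup structures; neighbor sets honor connection direction."""
    resource_by_id = {r["id"]: r for r in doc.get("resources", [])}
    neighbors = defaultdict(set)  # traversable edges per the direction rules
    for conn in doc.get("connections", []):
        src, dst = conn["source"], conn["target"]
        direction = conn.get("direction", "bidirectional")  # assumed default
        if direction in ("forward", "bidirectional"):
            neighbors[src].add(dst)
        if direction in ("reverse", "bidirectional"):
            neighbors[dst].add(src)
    memberships = defaultdict(list)  # resource id -> group ids
    for group in doc.get("groups", []):
        for member in group.get("members", []):  # "members" is assumed
            memberships[member].append(group["id"])
    return resource_by_id, neighbors, memberships
```

Building these maps once keeps traversal O(1) per hop and, importantly, never depends on array ordering.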
11.2.4 Unknown types and extensions
Consumers MUST accept unknown:
- Resource types
- Group types
- Connection types (if structurally valid)
- Extension namespaces
Consumers encountering unknown types SHOULD:
- Preserve them when re-exporting/translating
- Display type strings verbatim for debugging
- Continue processing known core fields
11.2.5 Merging, diffing and snapshot correlation
Consumers often process multiple OSIRIS documents over time.
Recommended approach:
- Treat `metadata.timestamp` as the snapshot point in time.
- Use stable `resource.id` as the primary key for correlation.
- If merging multiple documents, ID collisions MUST be prevented (IDs must remain unique in the merged graph).
Consumers SHOULD support “soft merge” strategies when duplicates occur (e.g. prefer newest timestamp, or prefer a configured source).
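One possible “prefer newest timestamp” soft merge, sketched under the assumption that `metadata.timestamp` is an ISO 8601 UTC string (which sorts correctly as plain text):

```python
def soft_merge(docs: list) -> dict:
    """Merge resources by id, letting the newest snapshot win on collisions."""
    # ISO 8601 UTC timestamps compare correctly as strings, so plain
    # lexicographic sorting orders the snapshots oldest to newest.
    ordered = sorted(docs, key=lambda d: d["metadata"]["timestamp"])
    merged = {}
    for doc in ordered:
        for res in doc.get("resources", []):
            merged[res["id"]] = res  # later (newer) exports overwrite older ones
    return merged
```

A “prefer a configured source” strategy would sort on a source-priority key instead of the timestamp; the overwrite-in-order pattern stays the same.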
11.3 Best practices
This section provides recommended practices for producers (parsers/exporters) and consumers (readers/ingestion tools) to maximize interoperability, stability and long-term maintainability.
11.3.1 Best practices for producers
- **Prefer standard types first**
  Map native objects to the standard resource types from Chapter 7 whenever possible. Use custom types only when no suitable standard type exists.
- **Use `properties` for generic data and `extensions` for vendor/org specifics**
  Put broadly applicable attributes in `properties`. Put vendor-specific or organization-specific details in `extensions` using a namespaced key (Chapter 8). Avoid duplicating the same data in both locations unless required for interoperability.
- **Generate stable, deterministic `id` values**
  IDs should remain stable across exports for the same underlying entity. Prefer building `id` from stable provider/native identifiers. Avoid random IDs for real resources.
- **Always include provider traceability**
  Populate `provider.name` and strongly prefer `provider.native_id`. Include stable scope context (account/subscription/project, region/site) when available.
- **Model relationships explicitly**
  Use `connections` for graph edges that must be traversed (paths, dependencies, flows). Use `groups` to represent boundaries and classification (zones, clusters, environments, ownership). Avoid encoding relationships only inside `properties`.
- **Be conservative when data is incomplete**
  Omit unknown optional fields rather than guessing. Prefer missing relationships over invented ones. Document known collection limitations via `tags` or namespaced metadata/extension fields.
- **Validate before publishing**
  Integrate validation in the export pipeline: Level 1 (schema) and Level 2 (semantic) errors should be treated as failures by default. Use Level 3 checks as an optional strict mode.
- **Keep exports reproducible**
  For the same input snapshot, outputs should be deterministic. Avoid non-deterministic ordering or unstable derived values.
- **Split documents by stable boundaries**
  For large infrastructures, split by account/subscription/project, region, environment, site or IT/OT domain boundary. Ensure `metadata.scope` clearly describes what the document contains.
Recommended minimum metadata
```json
"metadata": {
  "timestamp": "2026-01-01T10:30:00Z",
  "generator": {
    "name": "osiris-aws-parser",
    "version": "1.0.0"
  },
  "scope": {
    "providers": ["aws"],
    "regions": ["eu-west-1"],
    "accounts": ["123456789012"],
    "environment": "prod"
  }
}
```
11.3.2 Best practices for consumers
- **Validate early, fail safely**
  - Consumers MUST perform Level 1 validation (or equivalent structural checks) before processing.
  - Consumers SHOULD perform Level 2 validation before building graph structures (IDs, references, type format rules).
  - Consumers MAY offer configurable strictness (e.g. `basic`, `default`, `strict`) to match different use cases.
- **Implement forward compatibility by default**
  - Consumers MUST ignore unknown fields to support newer documents and extensions.
  - Consumers MUST accept unknown resource, connection and group types if the objects are structurally valid.
  - Consumers MUST accept unknown `extensions` namespaces and treat them as opaque data.
- **Do not rely on array ordering**
  - Consumers MUST NOT assume ordering of `resources`, `connections` or `groups` is meaningful.
  - Consumers SHOULD operate using IDs and indexes rather than positions.
- **Build indexes before complex processing**
  Consumers SHOULD build efficient lookup structures such as:
  - `resourceById`
  - `connectionsBySource`, `connectionsByTarget`
  - `groupsById`
  - `membershipsByResource` (reverse membership index)
- **Preserve fidelity during transformation**
  - When filtering, translating or re-exporting documents, consumers SHOULD preserve:
    - unknown fields
    - unknown types
    - unknown extension namespaces
  - Consumers SHOULD NOT drop or rename data unless explicitly configured.
- **Treat the topology as a graph**
  - Consumers SHOULD interpret:
    - `resources` as nodes
    - `connections` as edges, where `bidirectional` is treated as undirected for traversal, `forward` as source to target and `reverse` as target to source
    - `groups` as membership relations (classification/boundaries)
  - Consumers SHOULD keep a clear separation between graph edges (`connections`) and classification (`groups`).
- **Support snapshot correlation and diffing**
  - Consumers SHOULD treat `metadata.timestamp` as the snapshot time.
  - Consumers SHOULD use stable `resource.id` values as primary keys for correlation across snapshots.
  - When merging documents, consumers MUST prevent ID collisions (IDs must be unique in the merged graph).
- **Provide actionable diagnostics**
  Consumers SHOULD emit diagnostics that include:
  - severity (`error`/`warning`/`info`)
  - rule identifier when available (from Chapter 9)
  - JSONPath (or equivalent) to the failing element
  - a human-readable message
  - a suggested fix or remediation hint
- **Handle partial data gracefully**
  - Consumers SHOULD continue processing when optional fields are missing.
  - Consumers SHOULD avoid inferring relationships unless explicitly required and clearly marked as derived.
  - Consumers MAY expose confidence or provenance for derived insights.
- **Security-aware ingestion**
  - Consumers SHOULD treat all string fields (including `extensions`) as untrusted input.
  - Consumers SHOULD sanitize or escape data before rendering in UIs or exporting to other formats.
  - Consumers SHOULD support redaction and filtering policies before storing or sharing documents (see Chapter 13).
11.3.3 Common pitfalls
This section highlights recurring implementation mistakes and provides practical tips to improve interoperability across exporters, validators, pipelines and visualization tools.
Common pitfalls to avoid
- **Unstable resource IDs**
  - Using random UUIDs for real resources breaks correlation, diffing and merging across snapshots.
  - Suggestion: derive `id` deterministically from stable provider identifiers (`provider.name` + `provider.native_id`) or a stable tuple when native IDs are not available.
- **Duplicate or conflicting identity fields**
  - Storing the same identifier in multiple places (e.g. `id`, `provider.native_id` and a custom extension) without a clear rule creates ambiguity.
  - Suggestion: keep `id` as the document identifier and `provider.native_id` as the authoritative provider locator; use extensions only for additional native identifiers.
- **Encoding relationships implicitly**
  - Hiding dependencies or topology relationships in `properties` (strings like `connected_to`, `peer`, `uplink`) prevents graph traversal and consistent tooling.
  - Suggestion: use `connections` for traversable edges and `groups` for classification/boundaries.
- **Misusing groups as topology edges**
  - Treating group membership as a network link or dependency leads to incorrect graph semantics.
  - Suggestion: groups organize and classify; connections express relationships that can be traversed.
- **Rejecting unknown types or namespaces**
  - Consumers that fail on unknown resource types or unknown extension namespaces are not forward-compatible.
  - Suggestion: accept unknown types/namespaces if structurally valid; preserve them as opaque data.
- **Assuming array ordering is meaningful**
  - Depending on the order of `resources`, `connections` or `groups` causes non-deterministic behavior across exporters and serializers.
  - Suggestion: always index by `id` and traverse via references.
- **Dropping unknown fields during transformation**
  - Normalizers and converters that remove fields they do not recognize can silently destroy information.
  - Suggestion: preserve unknown fields by default; only drop data when explicitly configured.
- **Inventing values for unknowns**
  - Filling missing fields with “fake but plausible” values (serials, IPs, hostnames) contaminates datasets and downstream automation.
  - Suggestion: omit unknown optional fields; if anonymization is required, mark it clearly and use consistent redaction patterns.
- **Merging documents without collision control**
  - Combining OSIRIS documents can introduce duplicate IDs and broken references.
  - Suggestion: define a merge strategy and enforce uniqueness (e.g. stable global IDs, or deterministic prefixing rules at merge time).
- **Overloading `extensions` with core semantics**
  - Putting essential cross-tool data only in extensions reduces interoperability.
  - Suggestion: place broadly relevant attributes in core fields (`name`, `provider`, `properties`); use extensions for vendor/org-specific details.
11.3.4 Interoperability tips and checklist
Tips
- **Normalize identity early**
  - In ingestion pipelines, build a consistent internal key using `{resource.id}` first and `{provider.name + provider.native_id}` as a secondary correlation hint.
- **Emit and consume explicit scope**
  - Producers should populate `metadata.scope`; consumers should surface it for users and use it to avoid accidental cross-scope merges.
- **Prefer conservative inference**
  - If you infer connections or groups (e.g. from naming, subnets, LLDP), mark them as derived (e.g. via `tags` or namespaced metadata) and avoid overwriting explicit data.
- **Use validation levels consistently**
  - Producers should fail exports on Level 1 and usually Level 2 errors.
  - Consumers should offer strictness modes and clearly distinguish errors vs warnings.
- **Preserve unknown namespaces end-to-end**
  - A good ecosystem behavior is: parse > validate > enrich > re-export without losing unknown namespaces or future fields.
Practical checklist
- Resource IDs are deterministic and stable across exports for the same entity.
- All `connections` and `groups` reference valid `resource.id` values.
- Provider traceability is present (`provider.name` and preferably `provider.native_id`, plus stable scope context).
- Consumers ignore unknown fields, unknown types and unknown extension namespaces without failing.
- No implementation assumes array ordering for correctness.
- Transformation tools preserve unknown fields unless explicitly configured otherwise.
- Merge operations prevent ID collisions and do not break references.
- No invented “real-looking” values are emitted for unknown data unless explicitly flagged as redacted/anonymized.