11 Implementation guidelines
11.0 Overview
This chapter provides practical guidance for implementing OSIRIS producers (parsers) and consumers (tools that read and process OSIRIS documents). These guidelines are intended to improve interoperability, stability and long-term maintainability across implementations.
OSIRIS is designed for real-world infrastructure exports that may be incomplete or partially discoverable. Implementations SHOULD prioritize correctness, determinism and forward compatibility.
11.1 Parser development
11.1.1 Core responsibilities
A parser (also called a producer) is any component that generates OSIRIS documents from a source system (API, inventory database, CLI outputs, telemetry snapshots etc.).
A producer MUST:
- Emit documents that pass JSON Schema validation (Chapter 9).
- Populate required fields (`version`, `metadata.timestamp` and required resource fields).
- Produce stable, well-formed `id` values and valid references.
A producer SHOULD:
- Provide `metadata.generator` with a stable tool name and version.
- Describe export boundaries in `metadata.scope` (accounts/projects/subscriptions, regions, environments, sites).
- Produce deterministic outputs for the same input (see 11.1.3).
11.1.2 Mapping strategy
Type mapping
- Producers SHOULD map to standard resource types from Chapter 7 whenever possible.
- When a native object has no suitable standard mapping, producers MAY:
- Use a custom type following the rules in Chapter 7.
- Represent additional semantics via dedicated `extensions` (Chapter 8).
Provider attribution
- `provider.name` MUST identify the originating platform/vendor in lowercase.
- `provider.native_id` SHOULD capture the primary native identifier used by the provider to easily locate the resource.
- Additional provider context (e.g. `region`, `account`, `subscription`, `project`, `site`) SHOULD be included when applicable and stable.
Properties vs extensions
- Generic, cross-vendor attributes SHOULD go in `properties`.
- Vendor-specific or organization-specific details SHOULD go in `extensions` using a namespaced key (Chapter 8).
- Producers SHOULD NOT duplicate the same data in both `properties` and `extensions` unless required for interoperability.
11.1.3 Identity and ID stability
Stable identity is critical for topology merging, diffing and downstream automation.
Producers MUST ensure:
- Resource `id` values are unique within the document.
- Connection/group references resolve to valid resource IDs (Chapter 9 semantic rules).
- IDs remain stable across exports when the underlying entity is the same.
Recommended ID patterns (examples)
- Cloud and hyperscaler: `provider::native-id`. Examples: `aws::i-0abc123`, `azure::/subscriptions/.../virtualMachines/vm01`
- On-prem: `site::identifier`. Examples: `mxp::sw-core-01`, `mxp::srv-r770-001`
- OT: `site::identifier`. Example: `mxp-plant-01::sensor-temp-01`
Determinism
- If the source provides a stable unique identifier, producers SHOULD build `id` from it.
- Producers SHOULD NOT generate random IDs for real resources.
- If a stable native identifier is not available, producers MAY derive a deterministic ID from a stable tuple (e.g. `{site, name, serial}`) and SHOULD document the strategy.
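As a non-normative sketch of the derived-ID strategy above (the `{site, name, serial}` tuple and the `site::identifier` pattern come from this chapter; the hashing scheme, truncation length and function name are illustrative assumptions, not part of OSIRIS):

```python
import hashlib

def derive_resource_id(site: str, name: str, serial: str) -> str:
    """Derive a deterministic OSIRIS id from a stable tuple.

    Illustrative only: the separator and 12-character hash truncation
    are implementation choices that a producer SHOULD document.
    """
    # Normalize the tuple so case or whitespace differences in the
    # source export do not change the resulting id across runs.
    raw = "|".join(part.strip().lower() for part in (site, name, serial))
    digest = hashlib.sha256(raw.encode("utf-8")).hexdigest()[:12]
    return f"{site}::{digest}"
```

Because the function is pure and the inputs are stable, re-running the export yields the same `id`, which is exactly what snapshot correlation and diffing require.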
11.1.4 Relationship extraction for connections and groups
Producers SHOULD emit explicit relationships whenever they are known:
- Network connectivity, dependency, flow, containment, attachment, routing and similar (Chapter 5).
- Logical boundaries (VPC, subnet, security zone, cluster, rack, availability zone, etc.) as groups (Chapter 6).
Guidance:
- Use connections when the relationship must be traversed as a graph edge (e.g. network path, dependency chain, flow).
- Use groups for classification, organization and boundaries (e.g. ownership, cost center, environment, zone).
- Producers SHOULD avoid encoding relationships implicitly inside `properties` when they can be expressed as `connections` or `groups`.
11.1.5 Partial data and unknowns
OSIRIS supports incomplete inventories.
Producers SHOULD:
- Omit optional fields when unknown rather than emitting incorrect values.
- Prefer conservative modeling: it is better to omit a connection than to invent one.
- Use `tags` or namespaced metadata to document limitations (e.g. “no routing table available”, “LLDP disabled”).
Producers MUST NOT:
- Emit placeholders that look real in production exports (e.g. fake serials, fake hostnames, fake IPs) unless explicitly flagged as redacted/anonymized.
11.1.6 Validation workflow for producers
A producer MUST validate the output before publishing:
- Level 1 (structural): JSON Schema validation
- Level 2 (semantic): ID uniqueness, reference integrity, type format rules
- Level 3 (domain): optional type recognition and best-practice checks
Producers SHOULD fail the export pipeline on Level 1 errors.
Producers SHOULD treat Level 2 errors as export failures unless explicitly configured otherwise.
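A minimal, non-normative sketch of the Level 2 checks listed above (ID uniqueness and reference integrity), assuming documents are parsed into plain dictionaries and that connections reference resources via `source` and `target` fields as described in 11.2.3:

```python
def semantic_errors(doc: dict) -> list:
    """Return Level 2 findings: duplicate ids and dangling references."""
    errors = []
    seen = set()
    # ID uniqueness: every resource id must appear exactly once.
    for res in doc.get("resources", []):
        rid = res.get("id")
        if rid in seen:
            errors.append(f"duplicate resource id: {rid}")
        seen.add(rid)
    # Reference integrity: connection endpoints must resolve to resources.
    for conn in doc.get("connections", []):
        for endpoint in ("source", "target"):
            ref = conn.get(endpoint)
            if ref not in seen:
                errors.append(f"connection {endpoint} does not resolve: {ref}")
    return errors
```

A producer pipeline would fail the export when this list is non-empty, unless explicitly configured to downgrade selected findings to warnings.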
11.1.7 Document splitting and export scope
Large infrastructures may require multiple split documents.
Producers MAY split by:
- Provider hierarchy (account/subscription/project, or subscription/resource group/resources)
- Region and availability zone
- Environment (prod/stage/dev)
- Physical site (data center, plant)
- Domain boundary (IT vs OT)
When splitting, producers SHOULD:
- Ensure scope is clearly described in `metadata.scope`.
- Keep IDs consistent across documents (so consumers can merge reliably).
11.1.8 Logging and telemetry
Producers are often executed in automated pipelines (CI/CD, scheduled exports, inventory collectors). Consistent logging and basic telemetry greatly improve troubleshooting, reliability and performance tuning.
11.1.8.1 What to log during parsing
Producers SHOULD emit structured logs (JSON logs recommended) with a stable set of fields.
Recommended log events:
- Run start/end
- Export scope summary (provider, account/subscription/project, region/site, environment)
- Input source (API/CLI/file) and collector version
- Discovery summary
  - Counts: discovered resources, emitted resources, emitted connections, emitted groups
  - Skipped items and reasons (unsupported type, missing permissions, filtered by scope)
- Normalization decisions
  - Type mapping used (native type > OSIRIS type) when non-obvious
  - ID strategy (native ID used vs derived tuple)
- Validation results
  - Schema validation pass/fail
  - Semantic validation pass/fail
  - Rule identifiers and JSONPath for failures (when applicable)
Logs SHOULD include:
- `run_id` (unique per export execution)
- `generator.name` and `generator.version`
- scope identifiers from `metadata.scope`
- `severity` (`debug`, `info`, `warn`, `error`)
- `event` (stable event name)
- Optional `resource_id` (OSIRIS `id`) and/or `provider.native_id` when a log line is resource-specific
Producers MUST NOT log secrets or sensitive values (see Chapter 13).
Example of a structured log output:
```json
{
  "timestamp": "2026-01-16T22:23:45Z",
  "severity": "info",
  "event": "discovery_complete",
  "run_id": "20260115-102340-aws-prod",
  "generator": {
    "name": "osiris-aws-parser",
    "version": "1.0.0"
  },
  "scope": {
    "provider": "aws",
    "account": "123456789012",
    "regions": ["eu-west-1"]
  },
  "counts": {
    "resources_discovered": 847,
    "resources_emitted": 842,
    "connections_emitted": 1203,
    "groups_emitted": 45,
    "skipped": 5
  },
  "skipped_reasons": {
    "unsupported_type": 3,
    "missing_permissions": 2
  }
}
```
11.1.8.2 Performance metrics to track
Producers SHOULD track basic metrics to detect regressions and capacity issues:
- Timing
  - Total runtime
  - Time spent per phase: discovery, normalization, relationship inference, validation, serialization
- Volume
  - Resources emitted, connections emitted, groups emitted
  - Input objects fetched/parsed (if different from emitted)
- API/IO
  - API request count (by endpoint if possible)
  - API error count and retry count
  - Rate-limit/backoff occurrences
- Quality
  - Validation errors and warnings count (by rule ID if available)
  - Skipped/filtered count (by reason)
- Resource usage (optional)
  - Peak memory usage
  - Output document size (bytes)
Metrics SHOULD be emitted in a machine-readable form suitable for pipeline dashboards.
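One possible machine-readable shape for these metrics is a single JSON line per run. This is an illustrative sketch only (field names, rounding and the phase-callable convention are assumptions, not part of OSIRIS):

```python
import json
import time

def run_with_metrics(phases: dict) -> str:
    """Time each export phase and emit a metrics record as one JSON line.

    `phases` maps a phase name to a callable; callables returning an int
    are treated as item counts for that phase (an assumed convention).
    """
    metrics = {"phase_seconds": {}, "counts": {}}
    start = time.monotonic()
    for name, fn in phases.items():
        t0 = time.monotonic()
        result = fn()
        metrics["phase_seconds"][name] = round(time.monotonic() - t0, 3)
        if isinstance(result, int):
            metrics["counts"][name] = result
    metrics["total_seconds"] = round(time.monotonic() - start, 3)
    return json.dumps(metrics)
```

The resulting line can be shipped to whatever dashboard or log pipeline the deployment already uses.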
11.1.8.3 Error reporting best practices
Producers SHOULD classify failures into clear categories:
- Source access errors
  - Authentication/authorization failures, missing permissions, rate limits
- Parsing / normalization errors
  - Unexpected native formats, unsupported types, missing required source fields
- Validation errors
  - Level 1 (schema) and Level 2 (semantic) failures
- Operational errors
  - IO failures, serialization issues, timeouts
When reporting an error, producers SHOULD include:
- A stable error code or rule identifier (when applicable)
- Human-readable message
- JSONPath to the failing OSIRIS element (if the error is in emitted data)
- The smallest useful context (e.g. provider name, native id, OSIRIS id)
- A remediation hint (e.g. required permission, missing API scope, mapping fix)
Producers SHOULD:
- Exit with non-zero status on Level 1 errors by default.
- Provide a configurable mode to downgrade selected semantic issues to warnings only when explicitly requested.
- Avoid cascading failures: continue collecting other resources when safe, but fail the run if the resulting output would be invalid.
Producers MUST:
- Never include credentials, tokens or secrets in logs, error messages or stack traces (see Chapter 13).
- Avoid logging raw payloads unless explicitly enabled in debug mode and appropriately redacted.
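The error context recommended above can be captured in a small record type. This is a non-normative sketch: the class, its field names and the example rule code `OSIRIS-SEM-001` are hypothetical illustrations mirroring the bullets, not identifiers defined by OSIRIS:

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class Diagnostic:
    """One producer error/warning with the context recommended above."""
    severity: str                      # "error" | "warning" | "info"
    code: str                          # stable error code or rule identifier
    message: str                       # human-readable description
    json_path: Optional[str] = None    # path to the failing OSIRIS element
    provider_name: Optional[str] = None
    native_id: Optional[str] = None
    remediation: Optional[str] = None  # hint: permission, API scope, mapping fix

# Example record (rule code is invented for illustration).
d = Diagnostic(
    severity="error",
    code="OSIRIS-SEM-001",
    message="connection target does not resolve",
    json_path="$.connections[3].target",
    remediation="check that the referenced resource was not filtered by scope",
)
```

Serializing such records (e.g. via `asdict`) keeps diagnostics machine-readable for pipelines while remaining human-readable in logs.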
11.1.8.4 Observability platforms integration
> [!NOTE]
> OSIRIS remains a static snapshot format.
This subsection describes optional integration patterns for shipping OSIRIS snapshots into observability platforms.
The transport, storage, indexing and retention model are implementation concerns and outside the OSIRIS core scope.
Producers deployed in production environments MAY integrate with observability platforms in two ways:
Parser operational telemetry (monitoring the parser itself):
- Metrics systems: Zabbix, Prometheus, Datadog, CloudWatch, Azure Monitor
  - Track parser performance, API usage, validation results
- Log aggregation: ELK, Splunk, Loki, CloudWatch Logs
  - Centralize parser logs for troubleshooting
- Tracing: OpenTelemetry, Jaeger, Zipkin
  - For parsers with complex multi-step flows
Infrastructure topology snapshots (OSIRIS documents as observability artifacts):
- Observability platforms MAY ingest OSIRIS documents as a snapshot series to enable:
  - Topology change tracking and visualization
  - Configuration drift detection
  - Incident correlation with infrastructure changes
  - Compliance monitoring across snapshots
- Platforms with topology/service map capabilities may support this natively or via plugins
- Producers emitting documents for this purpose SHOULD run at consistent intervals and maintain stable IDs across snapshots
When integrating with observability systems, producers SHOULD:
- Use consistent metric names with appropriate prefixes (e.g. `osiris.parser.aws.`)
- Include standard labels/tags:
  - `parser_name`, `parser_version`
  - `scope_provider`, `scope_region`
  - `snapshot_timestamp`, `document_size_bytes`
  - `resource_count`, `connection_count`, `group_count`
- Support sampling/filtering to manage high-volume telemetry
Producers MUST ensure that telemetry integration does not expose sensitive data (Chapter 13).
11.2 Consumer implementation
11.2.1 Version handling and negotiation
Consumers MUST read `version` and apply compatibility rules (Chapter 12).
Consumers MUST ignore unknown fields as required for forward compatibility.
Consumers SHOULD:
- Support all `1.x.y` documents within the same major version when feasible.
- Provide clear diagnostics when a document uses an unsupported major version.
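A sketch of this major-version gate, assuming the semantic-versioning rules from Chapter 12 (the function name and error wording are illustrative):

```python
def check_version(version: str, supported_major: int = 1) -> None:
    """Raise with a clear diagnostic when the major version is unsupported."""
    try:
        major = int(str(version).split(".")[0])
    except ValueError:
        raise ValueError(f"malformed version field: {version!r}")
    if major != supported_major:
        raise ValueError(
            f"unsupported OSIRIS major version {major} "
            f"(this consumer supports {supported_major}.x.y)"
        )
```

Raising early, before any graph construction, keeps the diagnostic close to the cause and avoids confusing downstream failures.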
11.2.2 Consumer validation policy
Consumers MUST perform Level 1 validation (or equivalent structural checks) before processing. Consumers SHOULD perform Level 2 validation before building graph structures.
Consumers MAY implement configurable strictness:
- `basic`: Level 1 only
- `default`: Level 1 + Level 2
- `strict`: Level 1 + Level 2 + selected Level 3 rules
11.2.3 Graph construction and traversal
Consumers SHOULD treat:
- `resources` as nodes
- `connections` as edges, where:
  - `bidirectional`: treat as undirected for traversal
  - `forward`: source to target
  - `reverse`: target to source
- `groups` as membership relations (one-to-many references)
Consumers SHOULD build efficient indexes:
- `resourceById`
- `connectionsBySource`, `connectionsByTarget`
- `groupsById`, `membershipsByResource`
Consumers MUST NOT assume ordering in arrays is meaningful.
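The index-building step above can be sketched as follows, assuming `connections` carry `source`, `target` and `direction` fields as just described. The default direction and the `members`/`id` field names on groups are assumptions for illustration, not fields this chapter defines:

```python
from collections import defaultdict

def build_indexes(doc: dict):
    """Build lookup structures; neighbor sets honor connection direction."""
    resource_by_id = {r["id"]: r for r in doc.get("resources", [])}
    neighbors = defaultdict(set)  # traversable edges per the direction rules
    for conn in doc.get("connections", []):
        src, dst = conn["source"], conn["target"]
        direction = conn.get("direction", "bidirectional")  # assumed default
        if direction in ("forward", "bidirectional"):
            neighbors[src].add(dst)
        if direction in ("reverse", "bidirectional"):
            neighbors[dst].add(src)
    memberships = defaultdict(list)  # resource id -> group ids
    for group in doc.get("groups", []):
        for member in group.get("members", []):  # "members" is assumed
            memberships[member].append(group["id"])
    return resource_by_id, neighbors, memberships
```

Building these maps once keeps traversal O(1) per hop and, importantly, never depends on array ordering.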
11.2.4 Unknown types and extensions
Consumers MUST accept unknown:
- Resource types
- Group types
- Connection types (if structurally valid)
- Extension namespaces
Consumers encountering unknown types SHOULD:
- Preserve them when re-exporting/translating
- Display type strings verbatim for debugging
- Continue processing known core fields
11.2.5 Merging, diffing and snapshot correlation
Consumers often process multiple OSIRIS documents over time.
Recommended approach:
- Treat `metadata.timestamp` as the snapshot point in time.
- Use stable `resource.id` as the primary key for correlation.
- If merging multiple documents, ID collisions MUST be prevented (IDs must remain unique in the merged graph).
Consumers SHOULD support “soft merge” strategies when duplicates occur (e.g. prefer newest timestamp, or prefer a configured source).
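One possible “prefer newest timestamp” soft merge, sketched under the assumption that `metadata.timestamp` is an ISO 8601 UTC string (which sorts correctly as plain text):

```python
def soft_merge(docs: list) -> dict:
    """Merge resources by id, letting the newest snapshot win on collisions."""
    # ISO 8601 UTC timestamps compare correctly as strings, so plain
    # lexicographic sorting orders the snapshots oldest to newest.
    ordered = sorted(docs, key=lambda d: d["metadata"]["timestamp"])
    merged = {}
    for doc in ordered:
        for res in doc.get("resources", []):
            merged[res["id"]] = res  # later (newer) exports overwrite older ones
    return merged
```

A “prefer a configured source” strategy would sort on a source-priority key instead of the timestamp; the overwrite-in-order pattern stays the same.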
11.3 Best practices
This section provides recommended practices for producers (parsers/exporters) and consumers (readers/ingestion tools) to maximize interoperability, stability and long-term maintainability.
11.3.1 Best practices for producers
- **Prefer standard types first**
  Map native objects to the standard resource types from Chapter 7 whenever possible. Use custom types only when no suitable standard type exists.
- **Use `properties` for generic data and `extensions` for vendor/org specifics**
  Put broadly applicable attributes in `properties`. Put vendor-specific or organization-specific details in `extensions` using a namespaced key (Chapter 8). Avoid duplicating the same data in both locations unless required for interoperability.
- **Generate stable, deterministic `id` values**
  IDs should remain stable across exports for the same underlying entity. Prefer building `id` from stable provider/native identifiers. Avoid random IDs for real resources.
- **Always include provider traceability**
  Populate `provider.name` and strongly prefer `provider.native_id`. Include stable scope context (account/subscription/project, region/site) when available.
- **Model relationships explicitly**
  Use `connections` for graph edges that must be traversed (paths, dependencies, flows). Use `groups` to represent boundaries and classification (zones, clusters, environments, ownership). Avoid encoding relationships only inside `properties`.
- **Be conservative when data is incomplete**
  Omit unknown optional fields rather than guessing. Prefer missing relationships over invented ones. Document known collection limitations via `tags` or namespaced metadata/extension fields.
- **Validate before publishing**
  Integrate validation in the export pipeline: Level 1 (schema) and Level 2 (semantic) errors should be treated as failures by default. Use Level 3 checks as an optional strict mode.
- **Keep exports reproducible**
  For the same input snapshot, outputs should be deterministic. Avoid non-deterministic ordering or unstable derived values.
- **Split documents by stable boundaries**
  For large infrastructures, split by account/subscription/project, region, environment, site or IT/OT domain boundary. Ensure `metadata.scope` clearly describes what the document contains.
Recommended minimum metadata
```json
"metadata": {
  "timestamp": "2026-01-01T10:30:00Z",
  "generator": {
    "name": "osiris-aws-parser",
    "version": "1.0.0"
  },
  "scope": {
    "providers": ["aws"],
    "regions": ["eu-west-1"],
    "accounts": ["123456789012"],
    "environment": "prod"
  }
}
```
11.3.2 Best practices for consumers
- **Validate early, fail safely**
  - Consumers MUST perform Level 1 validation (or equivalent structural checks) before processing.
  - Consumers SHOULD perform Level 2 validation before building graph structures (IDs, references, type format rules).
  - Consumers MAY offer configurable strictness (e.g. `basic`, `default`, `strict`) to match different use cases.
- **Implement forward compatibility by default**
  - Consumers MUST ignore unknown fields to support newer documents and extensions.
  - Consumers MUST accept unknown resource, connection and group types if the objects are structurally valid.
  - Consumers MUST accept unknown `extensions` namespaces and treat them as opaque data.
- **Do not rely on array ordering**
  - Consumers MUST NOT assume ordering of `resources`, `connections` or `groups` is meaningful.
  - Consumers SHOULD operate using IDs and indexes rather than positions.
- **Build indexes before complex processing**
  Consumers SHOULD build efficient lookup structures such as:
  - `resourceById`
  - `connectionsBySource`, `connectionsByTarget`
  - `groupsById`
  - `membershipsByResource` (reverse membership index)
- **Preserve fidelity during transformation**
  - When filtering, translating or re-exporting documents, consumers SHOULD preserve:
    - unknown fields
    - unknown types
    - unknown extension namespaces
  - Consumers SHOULD NOT drop or rename data unless explicitly configured.
- **Treat the topology as a graph**
  - Consumers SHOULD interpret:
    - `resources` as nodes
    - `connections` as edges, where `bidirectional` is treated as undirected for traversal, `forward` as source to target and `reverse` as target to source
    - `groups` as membership relations (classification/boundaries)
  - Consumers SHOULD keep a clear separation between graph edges (`connections`) and classification (`groups`).
- **Support snapshot correlation and diffing**
  - Consumers SHOULD treat `metadata.timestamp` as the snapshot time.
  - Consumers SHOULD use stable `resource.id` values as primary keys for correlation across snapshots.
  - When merging documents, consumers MUST prevent ID collisions (IDs must be unique in the merged graph).
- **Provide actionable diagnostics**
  Consumers SHOULD emit diagnostics that include:
  - severity (`error`/`warning`/`info`)
  - rule identifier when available (from Chapter 9)
  - JSONPath (or equivalent) to the failing element
  - a human-readable message
  - a suggested fix or remediation hint
- **Handle partial data gracefully**
  - Consumers SHOULD continue processing when optional fields are missing.
  - Consumers SHOULD avoid inferring relationships unless explicitly required and clearly marked as derived.
  - Consumers MAY expose confidence or provenance for derived insights.
- **Security-aware ingestion**
  - Consumers SHOULD treat all string fields (including `extensions`) as untrusted input.
  - Consumers SHOULD sanitize or escape data before rendering in UIs or exporting to other formats.
  - Consumers SHOULD support redaction and filtering policies before storing or sharing documents (see Chapter 13).
11.3.3 Common pitfalls
This section highlights recurring implementation mistakes and provides practical tips to improve interoperability across exporters, validators, pipelines and visualization tools.
Common pitfalls to avoid
- **Unstable resource IDs**
  - Using random UUIDs for real resources breaks correlation, diffing and merging across snapshots.
  - Suggestion: derive `id` deterministically from stable provider identifiers (`provider.name` + `provider.native_id`) or a stable tuple when native IDs are not available.
- **Duplicate or conflicting identity fields**
  - Storing the same identifier in multiple places (e.g. `id`, `provider.native_id` and a custom extension) without a clear rule creates ambiguity.
  - Suggestion: keep `id` as the document identifier and `provider.native_id` as the authoritative provider locator; use extensions only for additional native identifiers.
- **Encoding relationships implicitly**
  - Hiding dependencies or topology relationships in `properties` (strings like `connected_to`, `peer`, `uplink`) prevents graph traversal and consistent tooling.
  - Suggestion: use `connections` for traversable edges and `groups` for classification/boundaries.
- **Misusing groups as topology edges**
  - Treating group membership as a network link or dependency leads to incorrect graph semantics.
  - Suggestion: groups organize and classify; connections express relationships that can be traversed.
- **Rejecting unknown types or namespaces**
  - Consumers that fail on unknown resource types or unknown extension namespaces are not forward-compatible.
  - Suggestion: accept unknown types/namespaces if structurally valid; preserve them as opaque data.
- **Assuming array ordering is meaningful**
  - Depending on the order of `resources`, `connections` or `groups` causes non-deterministic behavior across exporters and serializers.
  - Suggestion: always index by `id` and traverse via references.
- **Dropping unknown fields during transformation**
  - Normalizers and converters that remove fields they do not recognize can silently destroy information.
  - Suggestion: preserve unknown fields by default; only drop data when explicitly configured.
- **Inventing values for unknowns**
  - Filling missing fields with “fake but plausible” values (serials, IPs, hostnames) contaminates datasets and downstream automation.
  - Suggestion: omit unknown optional fields; if anonymization is required, mark it clearly and use consistent redaction patterns.
- **Merging documents without collision control**
  - Combining OSIRIS documents can introduce duplicate IDs and broken references.
  - Suggestion: define a merge strategy and enforce uniqueness (e.g. stable global IDs, or deterministic prefixing rules at merge time).
- **Overloading `extensions` with core semantics**
  - Putting essential cross-tool data only in extensions reduces interoperability.
  - Suggestion: place broadly relevant attributes in core fields (`name`, `provider`, `properties`); use extensions for vendor/org-specific details.
11.3.4 Interoperability tips and checklist
Tips
- **Normalize identity early**
  - In ingestion pipelines, build a consistent internal key using `{resource.id}` first and `{provider.name + provider.native_id}` as a secondary correlation hint.
- **Emit and consume explicit scope**
  - Producers should populate `metadata.scope`; consumers should surface it for users and use it to avoid accidental cross-scope merges.
- **Prefer conservative inference**
  - If you infer connections or groups (e.g. from naming, subnets, LLDP), mark them as derived (e.g. via `tags` or namespaced metadata) and avoid overwriting explicit data.
- **Use validation levels consistently**
  - Producers should fail exports on Level 1 and usually Level 2 errors.
  - Consumers should offer strictness modes and clearly distinguish errors vs warnings.
- **Preserve unknown namespaces end-to-end**
  - A good ecosystem behavior is: parse > validate > enrich > re-export without losing unknown namespaces or future fields.
Practical checklist
- Resource IDs are deterministic and stable across exports for the same entity.
- All `connections` and `groups` reference valid `resource.id` values.
- Provider traceability is present (`provider.name` and preferably `provider.native_id`, plus stable scope context).
- Consumers ignore unknown fields, unknown types and unknown extension namespaces without failing.
- No implementation assumes array ordering for correctness.
- Transformation tools preserve unknown fields unless explicitly configured otherwise.
- Merge operations prevent ID collisions and do not break references.
- No invented “real-looking” values are emitted for unknown data unless explicitly flagged as redacted/anonymized.