OSIRIS JSON Toolbox Core - Validation Engine Internals

Field	Value
Authors	Tia Zanella skhell
Revision	1.0.0-DRAFT
Creation date	14 February 2026
Last revision date	15 February 2026
Status	Draft
Document ID	OSIRIS-ADG-TLB-CORE-1.0
Document URI	OSIRIS-ADG-TLB-CORE-1.0
Document Name	OSIRIS JSON Toolbox Core - Validation Engine Internals
Specification ID	OSIRIS-1.0
Specification URI	OSIRIS-1.0
Schema URI	OSIRIS-1.0
License	CC BY 4.0
Repository	github.com/osirisjson/osiris-toolbox

Table of Contents
1 Engine internals
2 The diagnostic model
- 2.1 Internal representation of diagnostic objects
- 2.2 JSON pointer and range resolution engine
3 Performance & scaling
- 3.1 Concurrency model
- 3.2 Memory management for massive graphs
4 Extensibility (experimental/future)
- 4.1 Rule plugin architecture (planned)
- 4.2 Custom rule packaging conventions

1 Engine internals

@osirisjson/core is the canonical validation engine for the OSIRIS ecosystem. It is implemented in TypeScript, distributed as an NPM package and consumed by the CLI, editor integrations and third-party tools. This chapter defines its internal architecture: how the schema is loaded and cached, how the three-stage pipeline executes and when stages are skipped.

[!NOTE] Back-reference: The validation levels model (codes, severities, profiles) is defined in OSIRIS-ADG-VL-1.0. The ecosystem boundaries, dependency rules and canonical truth rule are defined in OSIRIS-ADG-1.0. For standard JSON Schema mechanics, refer to json-schema.org. All V-* codes referenced in this guide are illustrative unless present in the published Diagnostic Code Registry for the targeted OSIRIS specification version. This guide MUST NOT be treated as inventing codes; the registry is the canonical source.

Public API surface (frozen within MAJOR)

The engine exposes a single entry point. This is the public contract of @osirisjson/core:

declare function validate(
  document: object,
  options?: {
    profile?: 'basic' | 'default' | 'strict';
    sourceText?: string;
    maxDiagnosticsPerCode?: number;
    maxTotalDiagnostics?: number;
    plugins?: RuleDefinition[]; // experimental, stage 3 only
  }
): Diagnostic[];

Inputs: a parsed JSON object (required) and, optionally, the raw source text, which enables range resolution for editor integrations. All other options have defaults; for example, the diagnostic caps default to 50 diagnostics per code and 500 in total (section 3.2.2). The return value is an immutable Diagnostic[] array ordered by pipeline stage.

1.1 Schema loading and memory caching

1.1.1 Version routing (SemVer + forward compatibility)

OSIRIS document.version MUST be treated as SemVer.

MAJOR mismatch (unsupported): emit V-DOC-004 (error) and stop validation (no Stage 2/3).
MAJOR supported, MINOR newer than supported: select the highest supported MAJOR.MINOR schema as a fallback, emit V-DOC-005 (warning/info), and continue validation.
PATCH differences: MUST be considered compatible within the same MAJOR.MINOR and MUST NOT influence schema selection.

This behavior preserves the OSIRIS forward-compatibility guarantee: validators accept newer PATCH and MINOR versions within a supported MAJOR and continue validating what they understand.

Schema selection algorithm (normative):

Parse version as SemVer > (major, minor, patch).
If major not in supportedMajors: ERROR V-DOC-004 and stop.
Let best = maxSupportedMinor(major).
If minor > best: use schema (major.best) and emit V-DOC-005 (fallback notice).
Else: use schema (major.minor).

[!NOTE] Schema compilation MUST be cached per resolved (MAJOR.MINOR) key (not per patch).
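The selection algorithm above can be sketched in TypeScript as follows. This is an illustrative sketch, not the @osirisjson/core source: `parseSemVer`, `selectSchema` and the `maxSupportedMinor` map (MAJOR to highest supported MINOR) are hypothetical names, and treating an unparseable version like an unsupported MAJOR is an assumption the algorithm above does not state. The returned `schemaKey` is the MAJOR.MINOR string that doubles as the compilation cache key.

```typescript
// Illustrative sketch of the schema selection algorithm (not the engine source).
// Hypothetical: the function names and the MAJOR -> highest-MINOR map input.
// Assumption: an unparseable version is treated like an unsupported MAJOR.

type SemVer = { major: number; minor: number; patch: number };

type SchemaSelection =
  | { ok: true; schemaKey: string; notice?: "V-DOC-005" }
  | { ok: false; error: "V-DOC-004" };

function parseSemVer(version: string): SemVer | undefined {
  // MAJOR.MINOR.PATCH, optionally followed by a pre-release or build suffix.
  const m = /^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:[-+].*)?$/.exec(version);
  if (m === null) return undefined;
  return { major: Number(m[1]), minor: Number(m[2]), patch: Number(m[3]) };
}

function selectSchema(
  version: string,
  maxSupportedMinor: ReadonlyMap<number, number>,
): SchemaSelection {
  const v = parseSemVer(version);
  const best = v === undefined ? undefined : maxSupportedMinor.get(v.major);
  // Unsupported MAJOR: error, and the caller stops before Stage 2/3.
  if (v === undefined || best === undefined) return { ok: false, error: "V-DOC-004" };
  // Newer MINOR: fall back to the highest supported MINOR and emit a notice.
  if (v.minor > best) return { ok: true, schemaKey: `${v.major}.${best}`, notice: "V-DOC-005" };
  // PATCH never reaches the key, so compiled schemas are cached per MAJOR.MINOR.
  return { ok: true, schemaKey: `${v.major}.${v.minor}` };
}
```

For example, with MAJOR 1 supported up to MINOR 2, a document declaring 1.3.7 resolves to key 1.2 with a V-DOC-005 notice, 1.1.9 resolves to key 1.1, and 2.0.0 yields V-DOC-004.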

1.1.2 Schema engine configuration (implementation guidance: Ajv)

@osirisjson/core currently uses Ajv as its JSON Schema engine. This subsection is implementation guidance (not a spec requirement), provided to keep validator behavior deterministic and editor-friendly:

Set allErrors: true to collect a complete structural error set for Stage 1.
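Set strict: false so that Ajv's strict-mode checks on the schema itself (which by default throw on, for example, unknown keywords) do not abort schema compilation as schemas evolve; this does not reduce validation rigor for documents, it only relaxes Ajv's checks of the schema.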
Cache compiled validators per resolved (MAJOR.MINOR) schema key; compile once, reuse across validations (never recompile per call).
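A compile-once cache keyed by the resolved MAJOR.MINOR string might look like the sketch below. To keep it self-contained, the schema loader and compiler are injected; with Ajv, `compile` would typically wrap `ajv.compile(schema)` on a single `Ajv` instance created with `{ allErrors: true, strict: false }`. `SchemaCache` and `compileCount` are hypothetical names used only for illustration.

```typescript
// Sketch of a compile-once validator cache keyed by MAJOR.MINOR (hypothetical
// names). The loader and compiler are injected; with Ajv, `compile` would wrap
// ajv.compile(schema) on one Ajv instance configured as described above.

type CompiledValidator = (data: unknown) => boolean;

class SchemaCache {
  private readonly compiled = new Map<string, CompiledValidator>();
  private readonly loadSchema: (schemaKey: string) => object;
  private readonly compile: (schema: object) => CompiledValidator;
  compileCount = 0; // only here to make the compile-once behaviour observable

  constructor(
    loadSchema: (schemaKey: string) => object,
    compile: (schema: object) => CompiledValidator,
  ) {
    this.loadSchema = loadSchema;
    this.compile = compile;
  }

  get(schemaKey: string): CompiledValidator {
    let validator = this.compiled.get(schemaKey);
    if (validator === undefined) {
      // First request for this MAJOR.MINOR: compile once and memoize.
      validator = this.compile(this.loadSchema(schemaKey));
      this.compiled.set(schemaKey, validator);
      this.compileCount++;
    }
    return validator; // later calls reuse the compiled validator
  }
}
```

Because the key is the resolved MAJOR.MINOR rather than the document's full version string, documents at 1.2.0 and 1.2.9 share one compiled validator.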

1.1.3 Schema resolution precedence (bundled > local cache > update step)

Schema resolution MUST be deterministic and offline-first:

Bundled schemas (preferred): the engine MUST first resolve schemas from the versions shipped with @osirisjson/core (for all supported MAJOR.MINOR targets).
Local cache (optional): if enabled by the consumer, the engine MAY load a schema from a local on-disk cache (e.g. previously downloaded or enterprise-mirrored), but only if it matches the selected (MAJOR.MINOR) key.
No network during validation: the engine MUST NOT fetch schemas from the network as part of validate(). Any remote retrieval MUST occur in an explicit, separate “update/warm-cache” step outside the validation pipeline.

This guarantees validation remains reproducible, fast, and safe in editors and CI environments.
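The precedence rules can be expressed as a small pure function. In this sketch `LocalSchemaCache` is a hypothetical consumer-supplied interface and `resolveSchema` is an illustrative name; what matters is the order of lookups and the absence of any network access.

```typescript
// Sketch of offline-first schema resolution (hypothetical names). Order:
// bundled schemas, then an optional consumer-enabled local cache, never network.

interface LocalSchemaCache {
  // Returns a cached schema only for an exact MAJOR.MINOR key match.
  lookup(schemaKey: string): object | undefined;
}

type ResolvedSchema = { source: "bundled" | "local-cache"; schema: object };

function resolveSchema(
  schemaKey: string,
  bundled: ReadonlyMap<string, object>,
  localCache?: LocalSchemaCache,
): ResolvedSchema | undefined {
  const shipped = bundled.get(schemaKey);
  if (shipped !== undefined) return { source: "bundled", schema: shipped };

  const cached = localCache?.lookup(schemaKey); // skipped when not enabled
  if (cached !== undefined) return { source: "local-cache", schema: cached };

  // No fetch here: remote retrieval belongs to a separate update/warm-cache step.
  return undefined;
}
```

Returning `undefined` instead of fetching leaves the decision to the caller, which can report the missing schema or run the separate warm-cache step before validating again.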

1.2 Sequential pipeline execution (Level 1 > 2 > 3)

The engine executes validation as a strictly ordered, three-stage pipeline. Each stage assumes the guarantees established by the stages before it.

flowchart LR
  INPUT["OSIRIS document
  (parsed JSON)"] --> L1

  subgraph PIPELINE["@osirisjson/core pipeline"]
    direction LR

    L1["Stage 1
    Structural"] --> GATEP{profile == basic?}

    GATEP -- "yes" --> EMIT1["Emit diagnostics
    (Stage 1 only)"]

    GATEP -- "no" --> GATE1{L1 errors?}

    GATE1 -- "any error" --> SHORT1["Short-circuit:
    skip L2 + L3"]

    GATE1 -- "none" --> L2["Stage 2
    Semantic"]

    L2 --> GATE3{profile == strict?}

    GATE3 -- "yes" --> L3["Stage 3
    Domain"]

    GATE3 -- "no" --> SKIP3["Skip L3"]
  end

  SHORT1 --> EMIT["Emit diagnostics"]
  SKIP3 --> EMIT
  L3 --> EMIT
  EMIT1 --> EMIT

1.2.1 Stage 1 mapping and emission policy

Stage 1 structural validation produces a 1:1 mapping from the underlying JSON Schema engine findings to OSIRIS Diagnostics before any global emission policies are applied.

The structural validator output MUST be representable as an equivalent set of OSIRIS Diagnostics (same meaning, same locations, same severity classification for schema violations).
Diagnostic caps / throttling are a post-processing emission policy applied after Stage 1/2/3 collection. Caps MAY suppress diagnostics from any stage, including Stage 1, but MUST:
- be deterministic,
- add a single summary diagnostic (e.g. V-DIAG-001) indicating suppression occurred,
- never change the meaning of diagnostics that are emitted.

This guarantees Stage 1 correctness while keeping UI/CLI output bounded.
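A minimal sketch of the 1:1 Stage 1 mapping follows. `SchemaFinding` mirrors a subset of the fields in Ajv's `ErrorObject` (`instancePath`, `keyword`, `message`), and `codeForKeyword` is a hypothetical registry lookup, since the actual structural codes come from the Diagnostic Code Registry. Caps are deliberately absent here: they run later as a separate emission policy.

```typescript
// Sketch of the 1:1 Stage 1 mapping. `SchemaFinding` mirrors a subset of Ajv's
// ErrorObject fields; `codeForKeyword` is a hypothetical registry lookup.

interface SchemaFinding {
  instancePath: string; // JSON Pointer to the failing value ("" = document root)
  keyword: string; // failing JSON Schema keyword, e.g. "type" or "required"
  message?: string;
}

interface Stage1Diagnostic {
  code: string;
  severity: "error"; // structural violations are errors in every profile
  message: string;
  path: string;
}

function mapStage1(
  findings: readonly SchemaFinding[],
  codeForKeyword: (keyword: string) => string,
): Stage1Diagnostic[] {
  // Exactly one diagnostic per finding: nothing is merged, dropped or reordered.
  return findings.map((f): Stage1Diagnostic => ({
    code: codeForKeyword(f.keyword),
    severity: "error",
    message: f.message ?? `schema violation (${f.keyword})`,
    path: f.instancePath,
  }));
}
```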

1.2.2 Stage 2: Semantic (integrity)

Builds in-memory indexes from the parsed document and executes referential integrity checks, uniqueness checks and hierarchy safety checks.

Indexing phase (single pass): The engine walks topology.resources, topology.connections and topology.groups exactly once to build:

Index	Key	Value	Purpose
`resourceIndex`	`resource.id`	array position (number)	O(1) lookup for connection source/target and group member resolution
`connectionIndex`	`connection.id`	array position (number)	O(1) duplicate detection
`groupIndex`	`group.id`	array position (number)	O(1) lookup for children resolution and duplicate detection

During index construction, duplicate IDs are detected immediately (codes V-ID-001, V-ID-002, V-ID-003).
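The single-pass indexing phase can be sketched as below. The document types cover only the `id` fields this section uses, and assigning V-ID-001, V-ID-002 and V-ID-003 to resources, connections and groups respectively is an assumption of the sketch; the registry is authoritative for the actual mapping. Keeping the first occurrence of a duplicated ID in the index is likewise a design choice made here for illustration.

```typescript
// Sketch of the single-pass indexing phase (not the engine source). Types cover
// only the `id` fields used here. Assumption: V-ID-001/002/003 correspond to
// resources/connections/groups; the registry defines the real mapping.

interface Entity { id: string }
interface Topology { resources: Entity[]; connections: Entity[]; groups: Entity[] }
interface Finding { code: string; path: string }

interface DocumentIndexes {
  resourceIndex: Map<string, number>;
  connectionIndex: Map<string, number>;
  groupIndex: Map<string, number>;
}

function indexArray(
  items: readonly Entity[],
  basePath: string,
  duplicateCode: string,
  findings: Finding[],
): Map<string, number> {
  const index = new Map<string, number>();
  items.forEach((item, i) => {
    if (index.has(item.id)) {
      // Duplicate found during construction; the first occurrence stays indexed.
      findings.push({ code: duplicateCode, path: `${basePath}/${i}/id` });
    } else {
      index.set(item.id, i); // array position only, never a copy of the object
    }
  });
  return index;
}

function buildIndexes(topology: Topology): { indexes: DocumentIndexes; findings: Finding[] } {
  const findings: Finding[] = [];
  const indexes: DocumentIndexes = {
    resourceIndex: indexArray(topology.resources, "/topology/resources", "V-ID-001", findings),
    connectionIndex: indexArray(topology.connections, "/topology/connections", "V-ID-002", findings),
    groupIndex: indexArray(topology.groups, "/topology/groups", "V-ID-003", findings),
  };
  return { indexes, findings };
}
```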

Checking phase: After indexes are built, checks execute against the indexed data:

Referential integrity - for each connection, verify source and target exist in resourceIndex (V-REF-001, V-REF-002). For each group, verify every members[] entry exists in resourceIndex (V-REF-003) and every children[] entry exists in groupIndex (V-REF-004).
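Cycle detection - walk group children[] edges with a depth-first traversal that tracks each group's state (unvisited, in progress, finished); reaching a group that is still in progress means the current path has closed a cycle (V-REF-005). A plain visited-set is not sufficient on its own, because it cannot distinguish a cycle from two paths that reach the same group. Each group and edge is processed once, so the algorithm is O(V+E) where V = groups, E = children edges.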
Self-reference guard - reject groups that list themselves in their own children[] array (trivial cycle, same code V-REF-005).
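The checking phase, including the cycle detection described above, can be sketched as follows. Types are minimal stand-ins, and mapping V-REF-001 to `source` and V-REF-002 to `target` follows the order in which the codes are listed above (an assumption). Recursion keeps the sketch short; very deep group hierarchies may call for an explicit stack instead.

```typescript
// Sketch of the Stage 2 checking phase (not the engine source). Types are
// minimal stand-ins. Assumption: V-REF-001 = source, V-REF-002 = target, as
// listed above; the registry is authoritative.

interface Conn { id: string; source: string; target: string }
interface Group { id: string; members?: string[]; children?: string[] }
interface RefFinding { code: string; path: string }

function checkReferences(
  connections: readonly Conn[],
  groups: readonly Group[],
  resourceIndex: ReadonlyMap<string, number>,
  groupIndex: ReadonlyMap<string, number>,
): RefFinding[] {
  const out: RefFinding[] = [];
  const base = "/topology";

  // Referential integrity: every referenced ID must exist in the right index.
  connections.forEach((c, j) => {
    if (!resourceIndex.has(c.source)) out.push({ code: "V-REF-001", path: `${base}/connections/${j}/source` });
    if (!resourceIndex.has(c.target)) out.push({ code: "V-REF-002", path: `${base}/connections/${j}/target` });
  });
  groups.forEach((g, k) => {
    (g.members ?? []).forEach((m, i) => {
      if (!resourceIndex.has(m)) out.push({ code: "V-REF-003", path: `${base}/groups/${k}/members/${i}` });
    });
    (g.children ?? []).forEach((ch, i) => {
      if (!groupIndex.has(ch)) out.push({ code: "V-REF-004", path: `${base}/groups/${k}/children/${i}` });
    });
  });

  // Cycle detection: 0 = unvisited, 1 = on the current DFS path, 2 = finished.
  // An edge into a group that is still on the path closes a cycle; a group that
  // lists itself is the trivial case. Each group and edge is handled once: O(V+E).
  const state: number[] = new Array<number>(groups.length).fill(0);
  const visit = (k: number): void => {
    state[k] = 1;
    (groups[k].children ?? []).forEach((ch, i) => {
      const next = groupIndex.get(ch);
      if (next === undefined) return; // dangling child, already reported as V-REF-004
      if (state[next] === 1) out.push({ code: "V-REF-005", path: `${base}/groups/${k}/children/${i}` });
      else if (state[next] === 0) visit(next);
    });
    state[k] = 2;
  };
  groups.forEach((_, k) => {
    if (state[k] === 0) visit(k);
  });
  return out;
}
```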

1.2.3 Stage 3: Domain (best practices)

Executes optional, opinionated rules against the indexed document. Stage 3 rules are organized as an iterable collection. The active validation profile (basic, default, strict) determines which rules run and at what severity.

Domain rules access the same indexes built during Stage 2. They MUST NOT make network calls, inspect file-system state or perform any non-deterministic operation.

When Stage 2 has emitted errors (e.g. broken references, duplicate IDs), Stage 3 rules SHOULD fail soft: if a rule’s prerequisite data is missing or unreliable due to upstream semantic breaks, the rule SHOULD skip the affected entity silently or emit at info/warning severity rather than producing misleading error diagnostics. Stage 3 rules MUST NOT assume Stage 2 passed cleanly.

1.3 Short-circuit behavior (L1 structural failure skipping L2/L3)

If Stage 1 emits any diagnostic with severity error, the pipeline MUST skip Stages 2 and 3 entirely. This is the engine’s short-circuit contract.

Rationale: Stage 2 assumes the document satisfies the structural schema. Building indexes from a structurally invalid document (e.g. topology.resources is a string instead of an array) would produce unpredictable failures and misleading diagnostics. Skipping downstream stages keeps error output focused and trustworthy.

Short-circuit rules:

Stage 1 errors > skip Stage 2 and Stage 3.
Stage 2 errors > Stage 3 still runs (if enabled by profile). Semantic errors do not block domain checks because Stage 3 rules are designed to tolerate broken references gracefully.
The engine MUST include all Stage 1 diagnostics in the output even when short-circuiting. Consumers rely on the full error set for debugging.
Schema (Level 1) violations MUST always be emitted as error in all profiles. No profile MAY downgrade a structural violation below error.

Profile interaction:

The basic profile runs Stage 1 only regardless of outcome.
The default profile runs Stage 1 + Stage 2 (with short-circuit from Stage 1).
The strict profile runs all three stages (with short-circuit from Stage 1).
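The short-circuit rules and profile interaction reduce to a few lines of control flow. In this sketch the three stages are injected as callbacks through a hypothetical `Stages` shape, so only the gating logic is shown; diagnostic caps and range resolution are left out.

```typescript
// Sketch of stage gating and the short-circuit contract. The stages are
// injected through a hypothetical `Stages` shape so only the control flow is
// shown; caps and range resolution are omitted.

type Profile = "basic" | "default" | "strict";
type Severity = "error" | "warning" | "info";
interface Diag { code: string; severity: Severity; path: string }

interface Stages {
  structural: () => Diag[]; // Stage 1
  semantic: () => Diag[]; // Stage 2
  domain: () => Diag[]; // Stage 3
}

function runPipeline(profile: Profile, stages: Stages): readonly Diag[] {
  // Stage 1 always runs, and all of its diagnostics are kept even on failure.
  const diagnostics: Diag[] = [...stages.structural()];
  const stage1Failed = diagnostics.some((d) => d.severity === "error");

  if (profile !== "basic" && !stage1Failed) {
    diagnostics.push(...stages.semantic());
    // Stage 2 errors do not block Stage 3; domain rules are expected to fail soft.
    if (profile === "strict") diagnostics.push(...stages.domain());
  }
  // Appended in stage order, then frozen: never mutated after the pipeline ends.
  return Object.freeze(diagnostics);
}
```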

2 The diagnostic model

Diagnostics are the engine’s sole output. Every validation finding is expressed as a Diagnostic object that can be consumed uniformly by CLI, editors and third-party tools.

[!NOTE] Back-reference: The diagnostic code registry, severity semantics and profile mapping are defined in OSIRIS-ADG-VL-1.0 chapter 2. The diagnostic model contract (minimum and optional fields) is defined in OSIRIS-ADG-1.0 section 4.1.1.

2.1 Internal representation of diagnostic objects

The engine defines Diagnostic as a TypeScript interface. This interface is the primary public contract of @osirisjson/core; its shape MUST NOT change within a MAJOR version.

2.1.1 Code registry and examples

Any diagnostic codes referenced in this guide are normative only if present in the published Diagnostic Code Registry for the targeted OSIRIS spec version. If a code is not present in the registry, treat it as illustrative and do not rely on it in production.

interface Diagnostic {
  /** Stable code from the OSIRIS diagnostic registry (e.g. "V-REF-002") */
  code: string;

  /** Severity assigned by the active profile */
  severity: 'error' | 'warning' | 'info';

  /** Human-readable explanation (not normative, may evolve) */
  message: string;

  /** JSON Pointer (RFC 6901) to the relevant value */
  path: string;

  /** Source range when original text is available (0-based, LSP-style) */
  range?: DiagnosticRange;

  /** Related locations for cross-reference findings */
  related?: RelatedLocation[];

  /** Structured quick-fix metadata (deterministic and non-destructive only) */
  fix?: DiagnosticFix;
}

interface DiagnosticRange {
  start: Position;
  end: Position;
}

interface Position {
  line: number; // 0-based
  character: number; // 0-based
}

interface RelatedLocation {
  path: string;
  range?: DiagnosticRange;
  message: string;
}

interface DiagnosticFix {
  description: string;
  edits: FixEdit[];
}

interface FixEdit {
  path: string;
  range: DiagnosticRange;
  newText: string;
}

2.1.2 Diagnostic construction guidelines

code MUST come from the machine-readable diagnostic code registry. Rules MUST NOT invent ad-hoc codes at runtime.
severity is resolved from the registry’s defaultSeverity (or strictSeverity) field based on the active profile. Rules emit a code; the engine maps it to a severity.
message is built from a message template in the registry, interpolated with context values (e.g. the offending ID, the expected type). Message strings are for humans and MAY evolve.
path is always a JSON Pointer. When the finding relates to a nested value, the pointer MUST be fully qualified (e.g. /topology/connections/0/target, not connections[0].target).
range is only populated when the engine has access to the source text and can map JSON Pointer positions to line/character offsets.
fix is only emitted when the engine can determine a safe, deterministic and non-destructive correction.

2.1.3 Diagnostic accumulation

The engine maintains a Diagnostic[] array that grows across all pipeline stages. Diagnostics are appended in emission order (Stage 1 first, then Stage 2, then Stage 3). The array is the engine’s return value and is never mutated after the pipeline completes.

2.2 JSON pointer and range resolution engine

The engine MUST produce accurate location information for every diagnostic. This requires two cooperating subsystems: a JSON Pointer builder and an optional source-text range resolver.

2.2.1 JSON pointer builder (always available)

During validation, the engine knows the structural path to every value being checked. Ajv provides error paths natively (as JSON Pointer strings). For Stage 2 and Stage 3 rules, the engine constructs pointers from the indexes:

Context	Pointer pattern
Resource at index `i`	`/topology/resources/{i}`
Connection at index `j`, field `target`	`/topology/connections/{j}/target`
Group at index `k`, member at index `m`	`/topology/groups/{k}/members/{m}`
Metadata timestamp	`/metadata/timestamp`

The pointer is always derived from the parsed JSON structure (array indices, object keys). It MUST NOT depend on document formatting or whitespace.
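Pointer segments that come from object keys need RFC 6901 escaping before they are joined: `~` becomes `~0` and `/` becomes `~1`. A minimal builder and parser, with hypothetical function names, could look like this:

```typescript
// Sketch of an RFC 6901 JSON Pointer builder and parser (hypothetical names).

function escapeToken(token: string): string {
  // "~" must be escaped before "/", or "/" -> "~1" would be re-escaped to "~01".
  return token.replace(/~/g, "~0").replace(/\//g, "~1");
}

function unescapeToken(token: string): string {
  // Decode in the reverse order: "~1" first, then "~0".
  return token.replace(/~1/g, "/").replace(/~0/g, "~");
}

function toPointer(segments: ReadonlyArray<string | number>): string {
  // The whole document is the empty pointer ""; each segment appends "/token".
  return segments.map((s) => "/" + escapeToken(String(s))).join("");
}

function parsePointer(pointer: string): string[] {
  // Assumes a well-formed pointer: "" or a string starting with "/".
  return pointer === "" ? [] : pointer.slice(1).split("/").map((t) => unescapeToken(t));
}
```

The escaping order matters: escaping `/` first would turn a key such as `a/b` into `a~1b` and then mangle it to `a~01b`.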

2.2.2 Range offsets (editor correctness)

The position map MUST be defined in terms of UTF-16 code-unit offsets (string index) to match LSP expectations.

The resolver MAY additionally store UTF-8 byte offsets for interoperability, but OSIRIS range values MUST be computed in UTF-16 offsets when targeting VS Code/LSP.
range MUST be treated as optional and computed lazily; if raw text is not available, omit range.
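Because JavaScript strings are sequences of UTF-16 code units, a string index is already the offset LSP expects; only the line/character split needs computing. The sketch below (hypothetical names) precomputes line starts once and uses a binary search per lookup. It treats only `\n` as a line terminator, which is a simplifying assumption: `\r\n` is handled implicitly because the `\r` stays at the end of the preceding line, but a lone `\r`, which LSP also treats as a line break, is not.

```typescript
// Sketch: UTF-16 offset -> 0-based LSP position (hypothetical names).
// JavaScript string indices are already UTF-16 code-unit offsets.
// Only "\n" starts a new line here; "\r\n" works because "\r" stays on the
// previous line, but a lone "\r" (also a line break in LSP) is not handled.

interface LspPosition { line: number; character: number }

function lineStarts(text: string): number[] {
  const starts = [0];
  for (let i = 0; i < text.length; i++) {
    if (text.charCodeAt(i) === 10) starts.push(i + 1); // 10 = "\n"
  }
  return starts;
}

function offsetToPosition(starts: readonly number[], offset: number): LspPosition {
  // Binary search for the last line start that is <= offset.
  let lo = 0;
  let hi = starts.length - 1;
  while (lo < hi) {
    const mid = (lo + hi + 1) >> 1;
    if (starts[mid] <= offset) lo = mid;
    else hi = mid - 1;
  }
  return { line: lo, character: offset - starts[lo] };
}
```

For example, in the text `"ab\n😀x"` the `x` sits at line 1, character 2, because the emoji occupies two UTF-16 code units.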

2.2.3 Source-text range resolver (optional)

When the engine is invoked with the raw document text (in addition to the parsed JSON object), it MAY resolve JSON Pointers to DiagnosticRange values. This is the path that editor integrations use to render squiggles and inline diagnostics.

Resolution strategy:

Parse the source text into a lightweight token/position map that records the UTF-16 code-unit offset (string index) and line/character position of every JSON key and value.
When a diagnostic is emitted with a JSON Pointer, look up the pointer segments in the position map to find the corresponding {start, end} range.
If the source text is not provided (e.g. CLI validating a parsed object from stdin), the range field is omitted and consumers fall back to path only.

Implementation guidance:

The position map is built lazily on the first range resolution request, not on every validation run. CLI workflows that only need JSON Pointer paths SHOULD NOT incur the cost of source-text parsing.
The position map SHOULD be computed in a single pass over the source text. Avoid re-parsing the entire document for each diagnostic.
Line/character values follow LSP conventions: 0-based line numbers, 0-based UTF-16 character offsets.
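A lazily built, single-pass position map can be sketched as a small scanner over the raw text that records the `[start, end)` UTF-16 offsets of every value under its JSON Pointer. It assumes the text is valid JSON, which holds whenever the engine already has the parsed object for that text; `buildPositionMap` and `LazyPositionMap` are hypothetical names, and converting offsets to line/character positions is a separate step (for example, with a table of line-start offsets).

```typescript
// Sketch of a single-pass position map over raw JSON text (hypothetical names).
// Records [start, end) UTF-16 offsets for every value, keyed by JSON Pointer.
// Assumes valid JSON; the `i < text.length` guards only stop endless loops.

type Span = { start: number; end: number };

function buildPositionMap(text: string): Map<string, Span> {
  const map = new Map<string, Span>();
  let i = 0;
  const skipWs = (): void => {
    while (i < text.length && " \t\r\n".includes(text[i])) i++;
  };
  const esc = (key: string): string => key.replace(/~/g, "~0").replace(/\//g, "~1");
  const scanString = (): string => {
    const start = i++; // opening quote
    while (i < text.length && text[i] !== '"') i += text[i] === "\\" ? 2 : 1;
    i++; // closing quote
    return JSON.parse(text.slice(start, i)) as string; // decodes escapes in keys
  };
  const scanValue = (pointer: string): void => {
    skipWs();
    const start = i;
    if (text[i] === "{") {
      i++;
      skipWs();
      while (i < text.length && text[i] !== "}") {
        const key = scanString();
        skipWs();
        i++; // ':'
        scanValue(`${pointer}/${esc(key)}`);
        skipWs();
        if (text[i] === ",") i++;
        skipWs();
      }
      i++; // '}'
    } else if (text[i] === "[") {
      i++;
      skipWs();
      for (let n = 0; i < text.length && text[i] !== "]"; n++) {
        scanValue(`${pointer}/${n}`);
        skipWs();
        if (text[i] === ",") i++;
        skipWs();
      }
      i++; // ']'
    } else if (text[i] === '"') {
      scanString();
    } else {
      // number, true, false or null: read up to the next delimiter
      while (i < text.length && !",]} \t\r\n".includes(text[i])) i++;
    }
    map.set(pointer, { start, end: i });
  };
  scanValue("");
  return map;
}

class LazyPositionMap {
  private readonly text: string;
  private map: Map<string, Span> | undefined;
  constructor(text: string) {
    this.text = text;
  }
  // The map is built on the first lookup only, so pointer-only runs never pay for it.
  lookup(pointer: string): Span | undefined {
    if (this.map === undefined) this.map = buildPositionMap(this.text);
    return this.map.get(pointer);
  }
}
```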

3 Performance & scaling

OSIRIS documents range from minimal topology snapshots (<=1 KB) to large-scale inventories with thousands of resources, connections and groups. The engine MUST remain responsive; interactive consumers SHOULD show progress indicators when running validation on large documents.

[!NOTE] Back-reference: Ecosystem performance constraints are defined in OSIRIS-ADG-1.0 section 5.1.

3.1 Concurrency model

@osirisjson/core is designed as a synchronous, single-threaded library. This is a deliberate architectural choice driven by ecosystem constraints.

Rationale:

The engine runs inside the VS Code extension host (a single-threaded Node.js process), inside CLI processes and inside browser-based validators. A synchronous, reentrant design is the lowest common denominator that works everywhere.
Concurrency belongs to the consumer, not the engine. The CLI may validate multiple files in parallel using worker threads; because validate() is synchronous and CPU-bound, wrapping calls in Promise.all on a single thread would run them one after another, not in parallel. The editor may run validation in a background worker. In both cases, each engine invocation is a self-contained synchronous call.

Contract:

validate(document, options) is a synchronous function that returns Diagnostic[].
The engine MUST NOT spawn threads, workers or async operations internally.
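The engine MUST be safe to call concurrently from multiple workers within the same process, provided each invocation receives its own document input. Each worker (Node.js worker_threads or browser Web Workers) loads its own instance of the module, so the compiled schema cache (section 1.1) is per worker; within a worker it is written only when a schema key is first compiled and is read-only thereafter.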

Consumer-side parallelism patterns:

Consumer	Strategy
CLI (batch mode)	Validate files concurrently using Node.js `worker_threads`. Each worker imports `@osirisjson/core` and validates independently.
Editor (VS Code)	Run validation in a Language Server Protocol (LSP) worker. Debounce keystrokes; validate on idle.
CI pipeline	Invoke `@osirisjson/cli` per file or directory; the CLI handles parallelism.

3.2 Memory management for massive graphs

The engine’s memory footprint is dominated by three structures: the parsed document, the Stage 2 indexes and the diagnostics array.

3.2.1 Index budget

Indexes are lightweight maps from string keys to integer positions. For a document with R resources, C connections and G groups, the total index size is approximately:

O(R + C + G) entries × (key string + integer)

For a 10,000-resource document with 20,000 connections and 500 groups (30,500 index entries in total), the index memory overhead is typically < 5 MB. The engine MUST NOT copy resource/connection/group objects into indexes; it stores array positions and resolves objects lazily from the original parsed document.

3.2.2 Diagnostic capping

Large documents with systemic errors (e.g. every connection references a non-existent resource) can produce thousands of diagnostics. Unbounded diagnostic lists degrade consumer performance (CLI output, editor rendering).

Capping rules:

The engine SHOULD enforce a configurable per-code cap (default: 50 diagnostics per code). After the cap is reached, the engine emits a single summary diagnostic: "N additional occurrences of V-REF-002 suppressed." with severity matching the original.
The total diagnostic count across all codes SHOULD be capped at a configurable limit (default: 500). After the global cap, the engine emits a summary and halts diagnostic accumulation.
Consumers MAY override caps via engine options (e.g. maxDiagnosticsPerCode, maxTotalDiagnostics).
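The capping rules can be sketched as a post-processing pass over the collected diagnostics. Several details are assumptions of this sketch rather than statements of the rules above: summaries use the illustrative code V-DIAG-001, their path is the root pointer `""`, and the global summary is emitted at `info` severity, which the rules do not specify. Iterating in input order (and over a `Map`, which preserves insertion order) keeps the output deterministic.

```typescript
// Sketch of the deterministic capping pass, run after all stages have collected
// diagnostics. Assumptions (hypothetical): summaries use the illustrative code
// V-DIAG-001 with the root pointer "" as path, and the global summary is "info".

type Sev = "error" | "warning" | "info";
interface CappedDiag { code: string; severity: Sev; message: string; path: string }

function applyCaps(all: readonly CappedDiag[], maxPerCode = 50, maxTotal = 500): CappedDiag[] {
  const out: CappedDiag[] = [];
  const perCode = new Map<string, number>();
  const suppressed = new Map<string, { count: number; severity: Sev }>();
  let globalSuppressed = 0;

  // Input order (Stage 1, then 2, then 3) makes the result deterministic.
  for (const d of all) {
    const n = (perCode.get(d.code) ?? 0) + 1;
    perCode.set(d.code, n);
    if (n > maxPerCode) {
      const s = suppressed.get(d.code) ?? { count: 0, severity: d.severity };
      s.count++;
      suppressed.set(d.code, s);
    } else if (out.length >= maxTotal) {
      globalSuppressed++; // global cap reached: stop accumulating, only count
    } else {
      out.push(d); // emitted diagnostics are passed through unchanged
    }
  }

  // One summary per capped code, with the severity of the suppressed code.
  suppressed.forEach((s, code) => {
    out.push({
      code: "V-DIAG-001",
      severity: s.severity,
      message: `${s.count} additional occurrences of ${code} suppressed.`,
      path: "",
    });
  });
  if (globalSuppressed > 0) {
    out.push({
      code: "V-DIAG-001",
      severity: "info",
      message: `${globalSuppressed} diagnostic(s) suppressed after reaching the total cap of ${maxTotal}.`,
      path: "",
    });
  }
  return out;
}
```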

3.2.3 Streaming-friendly design

The engine receives a fully parsed JSON object (not a stream) because JSON Schema validation requires the complete document. However, the engine SHOULD avoid creating deep copies of the input. Indexes reference positions into the original arrays; rules read values directly from the input object. This keeps peak memory close to sizeof(input) + sizeof(indexes) + sizeof(diagnostics).

4 Extensibility (experimental/future)

This chapter describes planned extension points for the validation engine. These APIs are experimental and MUST NOT be treated as stable contracts until promoted in a future MINOR release.

[!NOTE] This section documents architectural intent. Implementation details will be finalized based on community feedback and ecosystem needs.

4.1 Rule plugin architecture (planned)

The engine is designed to support third-party rules that run alongside built-in Stage 3 domain checks. This enables organizations to enforce custom policies (e.g. naming conventions, required tags, mandatory provider fields) without forking @osirisjson/core.

4.1.1 Plugin contract (DRAFT)

A rule plugin is a module that exports a RuleDefinition conforming to the following shape:

interface RuleDefinition {
  /** Code MUST follow the V-<FAMILY>-<NNN> format. Custom rules MUST use a family prefix outside the spec-reserved set (e.g. V-CUSTOM-001, V-ACME-001) */
  code: string;

  /** Stage at which the rule executes. Plugins are restricted to Stage 3 */
  stage: 3;

  /** Default severity for diagnostic emission */
  defaultSeverity: 'error' | 'warning' | 'info';

  /** Human-readable short title */
  title: string;

  /** The check function receives a read-only view of the document and the Stage 2 indexes. It returns zero or more diagnostics */
  check: (context: RuleContext) => Diagnostic[];
}

interface RuleContext {
  /** The parsed OSIRIS document (read-only) */
  readonly document: OsirisDocument;

  /** Stage 2 indexes for O(1) lookups (read-only) */
  readonly indexes: DocumentIndexes;

  /** Active profile name. */
  readonly profile: 'basic' | 'default' | 'strict';

  /** Helper to build a Diagnostic with correct code and severity */
  createDiagnostic: (
    path: string,
    message: string,
    overrides?: Partial<Pick<Diagnostic, 'severity' | 'related' | 'fix'>>
  ) => Diagnostic;
}
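To make the DRAFT contract concrete, the following is a hypothetical plugin enforcing an example policy (code `V-ACME-001`: a connection should not link a resource to itself). Because `OsirisDocument` and `DocumentIndexes` are not defined in this guide, minimal stand-ins are declared, and `createDiagnostic` is simplified to omit the `overrides` parameter. The rule also illustrates the fail-soft behavior expected of Stage 3: connections whose endpoints Stage 2 could not resolve are skipped silently.

```typescript
// Hypothetical example plugin (code V-ACME-001) written against the DRAFT
// contract above. OsirisDocument and DocumentIndexes are minimal stand-ins,
// and createDiagnostic is simplified (no `overrides` parameter).

type Severity = "error" | "warning" | "info";
interface Diagnostic { code: string; severity: Severity; message: string; path: string }

interface OsirisDocument {
  topology: { connections: ReadonlyArray<{ id: string; source: string; target: string }> };
}
interface DocumentIndexes { resourceIndex: ReadonlyMap<string, number> }

interface RuleContext {
  readonly document: OsirisDocument;
  readonly indexes: DocumentIndexes;
  readonly profile: "basic" | "default" | "strict";
  createDiagnostic: (path: string, message: string) => Diagnostic;
}

interface RuleDefinition {
  code: string;
  stage: 3;
  defaultSeverity: Severity;
  title: string;
  check: (context: RuleContext) => Diagnostic[];
}

const NoSelfConnectionRule: RuleDefinition = {
  code: "V-ACME-001",
  stage: 3,
  defaultSeverity: "warning",
  title: "Connections should not link a resource to itself",
  check: ({ document, indexes, createDiagnostic }) => {
    const out: Diagnostic[] = [];
    document.topology.connections.forEach((c, j) => {
      // Fail soft: endpoints Stage 2 could not resolve are skipped silently.
      if (!indexes.resourceIndex.has(c.source) || !indexes.resourceIndex.has(c.target)) return;
      if (c.source === c.target) {
        out.push(createDiagnostic(`/topology/connections/${j}`, `Connection "${c.id}" links "${c.source}" to itself.`));
      }
    });
    return out; // synchronous and deterministic; the context is never mutated
  },
};
```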

4.1.2 Plugin constraints

Plugins MUST be restricted to Stage 3 (domain). They MUST NOT override or interfere with Stage 1 (structural) or Stage 2 (semantic) behavior.
Plugin codes MUST NOT collide with spec-defined V-* code families (DOC, META, TPGY, RES, CONN, GRP, PROV, EXT, TYPE, REF, DOM, ID). Plugins SHOULD use organization-prefixed families (e.g. V-ACME-001).
Plugin check functions MUST be synchronous and deterministic. They MUST NOT perform I/O, network calls or non-deterministic operations.
Plugin check functions receive read-only context. Mutating the document or indexes is a contract violation.
Plugins execute with full JavaScript privileges within the host process. Consumers SHOULD treat third-party rule packages as code execution and only load plugins from trusted sources.

4.1.3 Plugin registration

The engine will accept plugins through the options object at validation time:

const diagnostics = validate(document, {
  profile: 'strict',
  plugins: [NamingRule, TagRequirementRule]
});

Plugins are appended to the Stage 3 rule collection and execute after all built-in domain rules.

4.2 Custom rule packaging conventions

To support discoverability and consistent distribution, custom rule packages SHOULD follow these conventions:

4.2.1 NPM package naming

Pattern: osirisjson-rule-<name> (e.g. osirisjson-rule-acme-policies)
Scoped alternative: @<org>/osirisjson-rule-<name>

4.2.2 Package exports

The package SHOULD export an array of RuleDefinition objects as its default export. The package.json SHOULD include keywords: ["osirisjson", "osirisjson-rule"] for discoverability and peerDependencies: { "@osirisjson/core": "^1.0.0" } to align with the engine version.

4.2.3 Documentation requirements

Each custom rule SHOULD include a README entry documenting: the rule code and default severity; what the rule checks and why; a minimal invalid example with the expected diagnostic; and remediation guidance for producers.