# Data Flow
This page follows the main paths through DataLinq. It is intentionally higher-level than the source code, but it names the real subsystems so you can map the diagrams back to the implementation.
## One-Page Mental Model

```mermaid
flowchart LR
    Models["Source models"] --> Generator["Source generator"]
    DatabaseSchema["Database schema"] --> CLI["CLI / tools"]
    CLI --> Models
    Generator --> Generated["Generated database and model types"]
    Generated --> Metadata["Generated metadata draft"]
    Metadata --> Factory["MetadataDefinitionFactory"]
    Factory --> Frozen["Frozen runtime metadata"]
    Generated --> Runtime["Runtime API"]
    Frozen --> Runtime
    Runtime --> Query["Query pipeline"]
    Runtime --> Mutation["Mutation pipeline"]
    Runtime --> Cache["Cache and invalidation"]
    Query --> Provider["Provider"]
    Mutation --> Provider
    Provider --> Db[("Database")]
    Cache --> Runtime
```
The core loop (see the sketch after this list) is:
- generate a strong model surface
- build finalized metadata
- execute reads and writes through that metadata
- keep caches coherent around provider-key identity
- report behavior through diagnostics
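A minimal sketch of one pass through that loop, assuming a generated `EmployeesDb` database type and `Employee` model; both names are illustrative stand-ins for your own generated types:

```csharp
// Illustrative names throughout: EmployeesDb and Employee are stand-ins,
// and the DataLinq.MySql namespace is an assumption.
using DataLinq.MySql;

var db = new MySqlDatabase<EmployeesDb>(connectionString); // metadata is frozen once here

// Read: goes through the query pipeline and the row cache.
var employee = db.Query().Employees.Single(e => e.Id == 42);

// Writes go through Mutate() and a transaction (see the mutation flow below),
// and diagnostics accumulate in DataLinqMetrics as a side effect of all of it.
```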
## Model Generation Flow

```mermaid
sequenceDiagram
    participant Dev as Developer
    participant CLI as datalinq create-models
    participant Provider as Provider metadata reader
    participant Models as Source model files
    participant Generator as Source generator
    participant Output as Generated model files
    Dev->>CLI: Run create-models
    CLI->>Provider: Read live schema
    Provider-->>CLI: DatabaseDefinition boundary
    CLI->>Models: Create or refresh abstract models
    Models->>Generator: Compile project
    Generator->>Generator: Parse models and attributes
    Generator->>Generator: Build typed metadata draft
    Generator->>Output: Emit immutable, mutable, metadata, keys, relations
```
Generation is not only a convenience step. The generated output carries runtime hooks that providers now require during normal startup.
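For orientation, an abstract source model might look roughly like the sketch below. The attribute names are illustrative assumptions, not the exact DataLinq attribute surface; the files generated from such a model carry those runtime hooks:

```csharp
// Sketch of an abstract source model. [Table], [Column], [PrimaryKey], and
// [Relation] are assumed attribute names; check existing generated models
// for the real surface.
[Table("departments")]
public abstract partial class Department
{
    [PrimaryKey]
    [Column("dept_no")]
    public abstract string DeptNo { get; }

    [Column("dept_name")]
    public abstract string Name { get; }

    // Relation properties are resolved lazily through the relation index.
    [Relation("employees", "dept_no")]
    public abstract IEnumerable<Employee> Managers { get; }
}
```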
## Provider Startup Flow

```mermaid
sequenceDiagram
    participant App as Application
    participant Db as Database<T>
    participant Generated as Generated TDatabase
    participant Factory as MetadataDefinitionFactory
    participant Provider as Provider
    participant Cache as DatabaseCache
    App->>Db: new MySqlDatabase<T>(connectionString)
    Db->>Generated: GetDataLinqGeneratedMetadata()
    Generated-->>Db: MetadataDatabaseDraft
    Db->>Factory: Build(draft)
    Factory-->>Db: Frozen DatabaseDefinition
    Db->>Generated: SetDataLinqGeneratedMetadata(metadata)
    Db->>Provider: Initialize provider with metadata
    Db->>Cache: Create table cache state
```
If the generated metadata hook is missing or invalid, startup should fail loudly. That failure is better than silently running with stale model assumptions.
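A fail-fast startup sketch under the same illustrative names; the concrete exception type for a missing or invalid metadata hook is provider-specific, so the catch below is deliberately broad:

```csharp
MySqlDatabase<EmployeesDb> db;
try
{
    // Constructor path: GetDataLinqGeneratedMetadata() -> MetadataDefinitionFactory
    // -> frozen DatabaseDefinition -> provider initialization -> table cache state.
    db = new MySqlDatabase<EmployeesDb>(connectionString);
}
catch (Exception ex) // the concrete exception type depends on the provider
{
    // A failure here usually means the source generator output is stale or
    // absent; surface it immediately instead of running on bad assumptions.
    Console.Error.WriteLine($"DataLinq startup failed: {ex.Message}. Regenerate models.");
    throw;
}
```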
## Query Execution Flow

```mermaid
flowchart TD
    A["db.Query().Employees<br/>.Where(...).OrderBy(...)"] --> B["Remotion parses expression tree"]
    B --> C["DataLinq validates supported shape"]
    C --> D["Build SQL for supported predicates, ordering, paging, scalar operators"]
    D --> E["Execute key/select SQL through provider"]
    E --> F{"Rows in cache?"}
    F -- "hit" --> G["Reuse immutable instance"]
    F -- "miss" --> H["Fetch missing row data"]
    H --> I["Materialize immutable instance"]
    I --> J["Store by provider key"]
    G --> K["Apply supported projection"]
    J --> K
    K --> L["Return result"]
```
The query pipeline is intentionally bounded. It supports documented predicates, ordering, paging, projections, scalar aggregates, one explicit inner join shape, and relation-backed existence predicates. Unsupported expression shapes are rejected rather than guessed.
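Within those bounds, queries compose as ordinary LINQ. A sketch against the illustrative Employee model; the operators map onto the supported predicate, ordering, paging, projection, and aggregate forms:

```csharp
// Predicate + ordering + paging + projection: all inside the supported shape.
var page = db.Query().Employees
    .Where(e => e.Title == "Engineer")
    .OrderBy(e => e.Id)
    .Skip(20)
    .Take(10)
    .Select(e => new { e.Id, e.Title })
    .ToList();

// Scalar aggregates over a supported predicate are also in bounds.
var total = db.Query().Employees.Count(e => e.Title == "Engineer");

// A predicate the pipeline cannot translate (for example, a call to an
// arbitrary local method) is rejected with an exception, not guessed at.
```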
## Direct Primary-Key Lookup

```mermaid
flowchart LR
    A["Generated Get(...)"] --> B["Normalize to provider key"]
    B --> C["TableCache.GetRow<TKey>"]
    C --> D{"RowStore<TKey> hit?"}
    D -- "yes" --> E["Return cached immutable"]
    D -- "no" --> F["Provider fetch by primary key"]
    F --> G["Materialize immutable"]
    G --> H["Store in RowStore<TKey>"]
    H --> E
```
Generated scalar keys use provider CLR values directly, while composite keys use generated DataLinqPrimaryKey structs. The dynamic DataLinqKey type is a bridge for metadata-driven paths, not the preferred row-cache key for generated code.
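In code, with illustrative names (`SalaryKey` is a stand-in for a generated composite-key struct, and exactly where `Get(...)` hangs on the generated surface may differ):

```csharp
// Scalar primary key: the provider CLR value doubles as the row-cache key.
var employee = db.Query().Employees.Get(42);

// Composite primary key: a generated DataLinqPrimaryKey struct keeps the
// lookup strongly typed. SalaryKey and its fields are hypothetical.
var salary = db.Query().Salaries.Get(new SalaryKey(42, new DateOnly(2024, 1, 1)));
```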
## Relation Traversal Flow

```mermaid
flowchart TD
    A["department.Managers"] --> B["Generated relation property"]
    B --> C["Read relation handle and provider foreign key"]
    C --> D{"Relation index cached?"}
    D -- "yes" --> E["Load related primary keys from index"]
    D -- "no" --> F["Query provider for relation keys"]
    F --> G["Populate relation index"]
    G --> E
    E --> H["Resolve target rows through table cache"]
    H --> I["Return immutable relation collection"]
```
Relation traversal is lazy and cache-aware. That is why relation/index invalidation is part of the cache design, not an afterthought.
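In code, traversal is plain property access. The first touch may query the provider; later touches resolve from the relation index plus the table cache:

```csharp
var department = db.Query().Departments.Single(d => d.Name == "Research");

// First access: relation index miss -> provider query for related keys ->
// index populated -> target rows resolved through the table cache.
foreach (var manager in department.Managers)
    Console.WriteLine(manager.Id);

// Second access on the same instance resolves entirely from cache;
// no provider round trip unless the index was invalidated in between.
var count = department.Managers.Count();
```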
## Mutation And Transaction Flow

```mermaid
sequenceDiagram
    participant App as Application
    participant Immutable as Immutable model
    participant Mutable as Mutable wrapper
    participant Tx as Transaction
    participant Provider as Provider SQL
    participant Cache as Cache state
    App->>Immutable: Mutate(...)
    Immutable-->>Mutable: Mutable copy
    App->>Mutable: Change properties
    Mutable->>Tx: Save / Insert / Update
    Tx->>Provider: Execute write command
    Provider-->>Tx: Persisted row/defaults
    Tx->>Cache: Apply state changes
    Cache-->>Tx: Invalidated or refreshed rows/indexes
    Tx-->>App: Fresh immutable instance
```
DataLinq does not rely on invisible dirty tracking. The mutation object is the write surface, and the transaction owns when changes become durable.
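The same flow in code. `Transaction()`, `Update(...)`, and `Commit()` are assumptions about the write surface; the Mutate-then-save shape is the documented flow:

```csharp
using var tx = db.Transaction(); // assumed accessor for a new transaction

// Mutate() hands back a mutable copy; the immutable original is untouched.
var mutable = employee.Mutate();
mutable.Title = "Staff Engineer";

// The transaction owns durability: the write executes through the provider,
// cache state is applied, and a fresh immutable instance comes back.
var fresh = tx.Update(mutable);
tx.Commit();
```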
## Cache Invalidation Flow

```mermaid
flowchart TD
    A["Mutation, manual clear, or external event"] --> B["DatabaseCache facade"]
    B --> C{"Scope"}
    C -- "database" --> D["Clear database caches"]
    C -- "table" --> E["Clear table rows/indexes"]
    C -- "row / rows" --> F["Convert key components to provider keys"]
    F --> G["Remove typed row-store entries"]
    G --> H["Invalidate affected relation/index buckets"]
    D --> I["Record metrics"]
    E --> I
    H --> I
```
Precise invalidation uses provider-key values. When a signal cannot provide enough detail, DataLinq falls back to a conservative table/database clear.
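A hedged sketch of the three scopes; the facade method names below are hypothetical, and only the scoping hierarchy is taken from the flow above:

```csharp
var cache = db.Cache; // assumed accessor for the DatabaseCache facade

cache.Clear();                        // database scope: conservative full clear
cache.ClearTable("employees");        // table scope: rows and indexes for one table
cache.ClearRow("employees", key: 42); // row scope: provider-key-precise removal,
                                      // including affected relation/index buckets
```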
## Schema Validation And Diff Flow

```mermaid
sequenceDiagram
    participant User as User
    participant CLI as datalinq validate / diff
    participant Source as Generated/source metadata
    participant Provider as Live provider metadata
    participant Compare as SchemaComparer
    participant Diff as SchemaDiffScriptGenerator
    User->>CLI: validate or diff
    CLI->>Source: Load model metadata
    CLI->>Provider: Read live schema metadata
    Source-->>Compare: Model DatabaseDefinition
    Provider-->>Compare: Database DatabaseDefinition
    Compare-->>CLI: Supported-boundary differences
    alt diff command
        CLI->>Diff: Generate conservative SQL suggestions
        Diff-->>User: SQL plus manual-review comments
    else validate command
        CLI-->>User: Text or JSON drift report
    end
```
Validation and diffing are schema trust tools. They depend on the provider metadata support matrix and intentionally avoid pretending to be full migration execution.
## Diagnostics Flow

```mermaid
flowchart LR
    Runtime["Runtime activity"] --> Metrics["DataLinqMetrics"]
    Query["Provider commands"] --> Metrics
    Cache["Row/cache/index activity"] --> Metrics
    Invalidation["Cache invalidation"] --> Metrics
    Metrics --> Snapshot["In-process snapshot"]
    Metrics --> Telemetry["System.Diagnostics.Metrics"]
```
The metrics model is hierarchical:
- runtime totals
- provider-instance metrics
- table-level cache and relation metrics
That shape avoids flattening different provider instances or table caches into one misleading number.
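Because the telemetry side publishes through System.Diagnostics.Metrics, a standard MeterListener can observe it without DataLinq-specific plumbing. The meter-name prefix below is an assumption; check the actual meter names in the diagnostics source:

```csharp
using System.Diagnostics.Metrics;

// Subscribe to DataLinq meters with the standard .NET metrics API.
var listener = new MeterListener();
listener.InstrumentPublished = (instrument, l) =>
{
    if (instrument.Meter.Name.StartsWith("DataLinq")) // assumed prefix
        l.EnableMeasurementEvents(instrument);
};
listener.SetMeasurementEventCallback<long>((instrument, value, tags, state) =>
    Console.WriteLine($"{instrument.Name}: {value}"));
listener.Start();
```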