My little blog

Definition

Blog

Motivation

Use cases and requirements

Cheat sheets

Quick reference

Screw long documentation, I’ll learn by example

Objects and abstractions

Schema

A schema defines a data structure. It describes the state of an object (typically at the business level) that is valid across multiple applications and services. Data described by a schema may be scattered across several databases. Schemas may be nested when a child belongs to only one parent.

User: schema (
    [version: 5]
    [maintainer: test@gmail.com]
    
    user_id: int [key]
    name: str
    age: int
    purchases: List[Purchase]
    Purchase: schema (
        order_id: int [key]
        items: List[Product]
    )
)

Product: schema (
    product_id: int [key]
    name: str
    ts: timestamp
)

We may reference a schema’s fields as data types to add semantics. For example, writing oid: Purchase.order_id says that the variable oid has type int, but that it is actually a reference to Purchase’s order_id.
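As a rough illustration of what the schemas above might look like in application code, here is a minimal Python sketch. The dataclasses mirror the User, Purchase and Product definitions; OrderId, ProductId and find_purchase are hypothetical names introduced only for this example, and mapping timestamp to datetime is an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, NewType

# Hypothetical aliases: each is a plain int at runtime, but the name records
# which schema field the value refers to.
OrderId = NewType("OrderId", int)      # stands for Purchase.order_id
ProductId = NewType("ProductId", int)  # stands for Product.product_id


@dataclass
class Product:
    product_id: ProductId
    name: str
    ts: datetime  # assumption: "timestamp" maps to datetime


@dataclass
class Purchase:
    order_id: OrderId
    items: List[Product] = field(default_factory=list)


@dataclass
class User:
    user_id: int
    name: str
    age: int
    purchases: List[Purchase] = field(default_factory=list)


def find_purchase(user: User, oid: OrderId) -> Purchase:
    """Look up a purchase by a value typed as a reference to Purchase.order_id."""
    for purchase in user.purchases:
        if purchase.order_id == oid:
            return purchase
    raise KeyError(oid)
```

A static type checker treats OrderId and a bare int as different types, which is roughly the extra semantics that oid: Purchase.order_id is meant to carry.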

Definitions can be split across multiple files. Deployment (~) and variant (!) modifiers change existing values. To extend a definition outside of its original file, prefix its name with _.

This adds two fields to the Product definition:

_Product (
    image_url: str
    page_url: str
)
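One way a tool could process such an extension is sketched below in Python. The registry layout and the define function are hypothetical, and rejecting fields that already exist is an assumption made for this example, not a rule stated above.

```python
from typing import Dict

# Hypothetical in-memory registry: schema name -> {field name: type name}.
Registry = Dict[str, Dict[str, str]]


def define(registry: Registry, name: str, fields: Dict[str, str]) -> None:
    """Register a definition, or extend one when the name starts with '_'."""
    if name.startswith("_"):
        target = name[1:]
        if target not in registry:
            raise KeyError(f"cannot extend unknown definition {target!r}")
        # Assumption: an extension may only add fields, not redefine them.
        duplicates = set(fields) & set(registry[target])
        if duplicates:
            raise ValueError(f"fields already defined: {sorted(duplicates)}")
        registry[target].update(fields)
    else:
        if name in registry:
            raise ValueError(f"{name!r} is already defined")
        registry[name] = dict(fields)


registry: Registry = {}
define(registry, "Product", {"product_id": "int", "name": "str", "ts": "timestamp"})
define(registry, "_Product", {"image_url": "str", "page_url": "str"})
```

After both calls, registry["Product"] holds the three original fields followed by image_url and page_url.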

schema.config

A subtype of schema that describes a configuration file format.

For this subtype, the modifier FORMAT[filepath] can be set on the schema itself to specify the file’s format and location. In addition, each field can take its value from [env: ENVVAR_NAME] or [config: <config file path and option name>], or from a hardcoded value such as hostname: str = sample.

FORMAT is one of: json, yaml, ini, xml, toml

Example:

UserMongoCreds: config json[env: "MONGO_CRED_PATH"] (
    username: str
    password: str [env: SECRET_MONGO_PASSWORD]
    hostname: str
    port: int
    
    hostname~prod = "prod-user-mongo"
    hostname~dev = "dev-user-mongo"
)
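To make the lookup rules more concrete, here is a minimal Python sketch of how a loader for UserMongoCreds might resolve its values. The function name, the hostname_overrides parameter and the resolution order (file first, then per-field env variables, then deployment overrides) are all assumptions for this example.

```python
import json
from typing import Any, Dict, Mapping, Optional


def load_user_mongo_creds(
    environ: Mapping[str, str],
    hostname_overrides: Mapping[str, str],
    deployment: Optional[str] = None,
) -> Dict[str, Any]:
    """Resolve UserMongoCreds from a JSON file, env variables and overrides."""
    # json[env: "MONGO_CRED_PATH"]: the file path comes from an env variable.
    with open(environ["MONGO_CRED_PATH"], encoding="utf-8") as fh:
        creds = json.load(fh)

    # password: str [env: SECRET_MONGO_PASSWORD] takes its value from the env.
    if "SECRET_MONGO_PASSWORD" in environ:
        creds["password"] = environ["SECRET_MONGO_PASSWORD"]

    # hostname~prod / hostname~dev: per-deployment values win over the file.
    if deployment in hostname_overrides:
        creds["hostname"] = hostname_overrides[deployment]

    creds["port"] = int(creds["port"])  # port: int
    return creds
```

In a real process, environ would typically be os.environ, and hostname_overrides would be built from the hostname~... lines of the definition.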

schema.enum

An advanced enum. It assigns non-trivial values to enum members, which is useful when different components define different enums that must be mapped to each other.

Example:

GenderInDB: enum(
    male = "male"
    female = "female"
)

GenderOnSite: enum(
    adult
    adult.male = "m" same[GenderInDB.male]
    adult.female = "f" same[.female]
    adult.unisex = "u" similar[.female, .male]
    kids = "k"
    kids.girls = "g" similar[.female]
    kids.boys = "b" similar[.male]
)
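The mapping between the two enums could be represented in Python roughly as follows. This is only a sketch: the SAME and SIMILAR tables and the to_db function are hypothetical, and the adult group is left out of the Python enum because it has no value of its own in the example.

```python
from enum import Enum
from typing import Dict, FrozenSet


class GenderInDB(Enum):
    MALE = "male"
    FEMALE = "female"


class GenderOnSite(Enum):
    ADULT_MALE = "m"
    ADULT_FEMALE = "f"
    ADULT_UNISEX = "u"
    KIDS = "k"
    KIDS_GIRLS = "g"
    KIDS_BOYS = "b"


# same[...]: exact equivalents. similar[...]: looser matches.
SAME: Dict[GenderOnSite, GenderInDB] = {
    GenderOnSite.ADULT_MALE: GenderInDB.MALE,
    GenderOnSite.ADULT_FEMALE: GenderInDB.FEMALE,
}
SIMILAR: Dict[GenderOnSite, FrozenSet[GenderInDB]] = {
    GenderOnSite.ADULT_UNISEX: frozenset({GenderInDB.FEMALE, GenderInDB.MALE}),
    GenderOnSite.KIDS_GIRLS: frozenset({GenderInDB.FEMALE}),
    GenderOnSite.KIDS_BOYS: frozenset({GenderInDB.MALE}),
}


def to_db(value: GenderOnSite, allow_similar: bool = False) -> FrozenSet[GenderInDB]:
    """Return the database values that a site value maps to."""
    if value in SAME:
        return frozenset({SAME[value]})
    if allow_similar:
        return SIMILAR.get(value, frozenset())
    return frozenset()
```

Keeping same and similar in separate tables lets a caller decide whether a loose match is acceptable, for example when filtering a catalogue versus writing a value back to the database.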

Nodes, applications, clusters and projects

A project is a complete collection of repos, machines and services that an external user can interact with. For instance, your online shop is a “project”.

A cluster is a set of machines, and a node is a single machine (virtual or dedicated).

An application is a piece of software that runs on a node. There are two subtypes of applications:

services (app.service), which are expected to start accepting requests at start-up and then run indefinitely
pipelines (app.pipeline), which are launched by a trigger or a schedule, transform data, and halt

Project: projectname
MyCluster: cluster(
    autoscale: ...
    nodenames: ...
    kubernetes_config_path: ...
    MyNode: node(
        hostname: ...
        region: ...
        bandwidth: ...
        ram: ...
        hdd: ...
        ssd: ...
        gpu: ...
        cpu: ...
        docker_image: ...
        dockerfile: ...
        MyApp: app(
            triggered: <schedule>
            # 
            >-- inName : mode[type] -->
            |>-- inName : mode[type] -->
            <-- outName: mode[type] --<
            |<-- outName: mode[type] --<
            |< apiCallName:   argmode[argtype] 
            |> apiCallName:   argmode[argtype]
            - - - > apiDefinition: argmode[argtype] < retval: argmode[argtype]
        )
    )
)

A > B, A |> B, B <| A: A sends/writes some data to B.

We may specify the sent data and the stream type:

A batch[data] > B, A |batch[data]> B, B <batch[data]| A: A sends/writes a batch of data to B.

B >| A, A |< B: A reads data from B.

A|request>--<response B

Storages, components and pipelines

Apps are made of components and stateful storages.

A component is logic that processes data. It is defined by its input data streams, its output data streams, the APIs it exposes, and the APIs it imports from other components.

We do not care what goes on inside a component. If we have to describe its internals, we break it down into components and storages as well.

If a component contains no storages, including storages that exist but are invisible because we have not described the component’s internals, then it must be stateless.
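The distinction can be sketched in Python as follows. Both components and their names are hypothetical: the first has no storage and so depends only on its input stream, while the second is stateful only because a storage is explicitly placed inside it.

```python
from typing import Dict, Iterable, Iterator


def uppercase_names(names: Iterable[str]) -> Iterator[str]:
    """Stateless component: output depends only on the input stream."""
    for name in names:
        yield name.upper()


class SeenCounter:
    """Component that is stateful only because it contains a storage."""

    def __init__(self, storage: Dict[str, int]) -> None:
        self.storage = storage  # the storage is declared explicitly

    def process(self, names: Iterable[str]) -> Iterator[int]:
        """Emit how many times each incoming name has been seen so far."""
        for name in names:
            self.storage[name] = self.storage.get(name, 0) + 1
            yield self.storage[name]
```

Running uppercase_names twice on the same input always gives the same output, whereas SeenCounter gives different results on a second run because its storage has changed.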

Storages hold data. One storage may hold multiple schemas.

storageA: storage (
    sometable: schema@t (
        replication: ...
        size ~~ 1Mb
        records ~~ 1M
        retention: 30d
        retention: 1M records
        retention: 10Mb
        transaction_id: int [key]
        order_id: Purchase.order_id
        somedata: str
        Purchase.state@t' = "done"
    ) 
)

storageA/sometable filter[transaction_id = 123]

storageA/sometable filter[somedata = "qwe"]
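As a rough picture of what these expressions could mean at runtime, here is a small in-memory Python sketch. The Table class is hypothetical and models only two things from the definition above: a count-based retention rule and equality filters.

```python
from collections import deque
from typing import Any, Deque, Dict, List

Record = Dict[str, Any]


class Table:
    """In-memory stand-in for a table such as storageA/sometable."""

    def __init__(self, max_records: int) -> None:
        # deque(maxlen=...) drops the oldest record when full, which models
        # a count-based rule like "retention: 1M records".
        self._records: Deque[Record] = deque(maxlen=max_records)

    def write(self, record: Record) -> None:
        self._records.append(record)

    def filter(self, **conditions: Any) -> List[Record]:
        """Return records whose fields equal every given condition."""
        return [
            r for r in self._records
            if all(r.get(k) == v for k, v in conditions.items())
        ]
```

With this sketch, storageA/sometable filter[transaction_id = 123] corresponds to table.filter(transaction_id=123). Time-based and size-based retention are not modelled here.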

Pages