Cheat sheets
Screw long documentation, I’ll learn by example
Objects and abstractions
Schema
A schema defines a data structure. It describes the state of an object (typically at the business level) and is valid across multiple applications and services. Data belonging to one schema may be scattered across several databases. Schemas may be nested when each child belongs to exactly one parent.
User: schema (
    [version: 5]
    [maintainer: test@gmail.com]
    user_id: int [key]
    name: str
    age: int
    purchases: List[Purchase]
    Purchase: schema (
        order_id: int [key]
        items: List[Product]
    )
)
Product: schema (
    product_id: int [key]
    name: str
    ts: timestamp
)
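As a rough analogy only, the definitions above can be read as Python dataclasses. The mapping is an assumption made for illustration and is not part of the language: `timestamp` is taken to mean `datetime`, the nested Purchase is written as a top-level class, and the `[version]` / `[maintainer]` modifiers are left out.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class Product:
    product_id: int  # [key]
    name: str
    ts: datetime  # assumption: `timestamp` maps to datetime


@dataclass
class Purchase:
    # Nested inside User in the definition; flattened here for readability.
    order_id: int  # [key]
    items: List[Product] = field(default_factory=list)


@dataclass
class User:
    user_id: int  # [key]
    name: str
    age: int
    purchases: List[Purchase] = field(default_factory=list)


# Sample values are made up for the illustration.
user = User(user_id=1, name="Ann", age=30)
user.purchases.append(
    Purchase(order_id=10, items=[Product(1, "mug", datetime(2024, 1, 1))])
)
```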
We may reference a schema’s fields as data types to add semantics.
For example, by writing oid: Purchase.order_id we are saying that the variable oid has type int, but is actually a reference to Purchase’s order_id.
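The same idea exists in ordinary typed code. A minimal sketch, assuming Python's `typing.NewType` as a stand-in (the alias name `PurchaseOrderId` and the `describe` helper are hypothetical):

```python
from typing import NewType

# Hypothetical alias: plays the role of `Purchase.order_id` used as a type.
PurchaseOrderId = NewType("PurchaseOrderId", int)


def describe(oid: PurchaseOrderId) -> str:
    # At runtime the value is a plain int; the alias only carries meaning
    # for readers and static type checkers.
    return f"order #{oid}"


oid = PurchaseOrderId(42)
```

The value still behaves like an int everywhere, but the signature now says which int is expected.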
Definitions can be split into multiple files. Deployments (~) or variants (!) modify existing values.
To “extend” a definition outside of the original file, start its name with _.
This adds two fields to the Product definition:
_Product (
    image_url: str
    page_url: str
)
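One way to picture the effect of such an extension is a merge of field lists. The sketch below is an assumption about the semantics, not a description of a real tool: it treats a definition as a mapping from field name to type name and rejects name clashes, since the text only speaks of adding fields.

```python
from typing import Dict

# Hypothetical in-memory form of a definition: field name -> type name.
Definition = Dict[str, str]


def extend(base: Definition, extension: Definition) -> Definition:
    """Return a new definition with the extension's fields appended.

    Assumption: an extension may only add fields, so a name clash is an error.
    """
    clashes = set(base) & set(extension)
    if clashes:
        raise ValueError(f"fields already defined: {sorted(clashes)}")
    return {**base, **extension}


product = {"product_id": "int", "name": "str", "ts": "timestamp"}
product_ext = {"image_url": "str", "page_url": "str"}  # body of `_Product ( ... )`
merged = extend(product, product_ext)
```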
schema.config
A subtype of schema that describes the format of a configuration file.
For this subtype, the modifier FORMAT[filepath] can be set on the schema itself. Each field can additionally take [env: ENVVAR_NAME], [config: <config file path and option name>], or a hardcoded value such as hostname: str = sample.
FORMAT is one of: json, yaml, ini, xml, toml.
Example:
UserMongoCreds: config json[env: "MONGO_CRED_PATH"] (
    username: str
    password: str [env: SECRET_MONGO_PASSWORD]
    hostname: str
    port: int
    hostname~prod = "prod-user-mongo"
    hostname~dev = "dev-user-mongo"
)
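To make the example concrete, here is a minimal sketch of what a loader for this definition might do. It is an illustration under assumptions, not generated code: the function name, its parameters, and the rule that a deployment override wins over the file value are all hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class UserMongoCreds:
    username: str
    password: str
    hostname: str
    port: int


def load_user_mongo_creds(
    env: Mapping[str, str],
    hostname_overrides: Optional[Mapping[str, str]] = None,
    deployment: Optional[str] = None,
) -> UserMongoCreds:
    """Hypothetical loader for the UserMongoCreds definition."""
    # json[env: "MONGO_CRED_PATH"]: the JSON file path comes from the environment.
    with open(env["MONGO_CRED_PATH"], encoding="utf-8") as fh:
        raw = json.load(fh)
    # password [env: SECRET_MONGO_PASSWORD]: this field is read from the environment.
    raw["password"] = env["SECRET_MONGO_PASSWORD"]
    # hostname~<deployment>: assumed to replace the value found in the file.
    if hostname_overrides and deployment in hostname_overrides:
        raw["hostname"] = hostname_overrides[deployment]
    return UserMongoCreds(
        username=str(raw["username"]),
        password=str(raw["password"]),
        hostname=str(raw["hostname"]),
        port=int(raw["port"]),
    )
```

A caller would typically pass `os.environ` as `env`, for example `load_user_mongo_creds(os.environ, {"prod": "prod-user-mongo"}, "prod")`.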
schema.enum
Advanced enum. It can be used to assign non-trivial values to enums. It is useful when different components have different enums, and they must be mapped to each other.
Example:
GenderInDB: enum(
    male = "male"
    female = "female"
)
GenderOnSite: enum(
    adult
    adult.male = "m" same[GenderInDB.male]
    adult.female = "f" same[.female]
    adult.unisex = "u" similar[.female, .male]
    kids = "k"
    kids.girls = "g" similar[.female]
    kids.boys = "b" similar[.male]
)
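In plain code, the same mapping usually ends up as two enums plus lookup tables. A sketch in Python, assuming the dotted names are flattened (`adult.male` becomes `ADULT_MALE`), that the value-less `adult` group is omitted, and that `same` wins over `similar` when both exist; the `to_db` helper is hypothetical:

```python
from enum import Enum
from typing import Dict, FrozenSet


class GenderInDB(Enum):
    MALE = "male"
    FEMALE = "female"


class GenderOnSite(Enum):
    # `adult` has no value of its own in the definition, so it is omitted.
    ADULT_MALE = "m"
    ADULT_FEMALE = "f"
    ADULT_UNISEX = "u"
    KIDS = "k"
    KIDS_GIRLS = "g"
    KIDS_BOYS = "b"


# same[...]: exact one-to-one equivalents.
SAME: Dict[GenderOnSite, GenderInDB] = {
    GenderOnSite.ADULT_MALE: GenderInDB.MALE,
    GenderOnSite.ADULT_FEMALE: GenderInDB.FEMALE,
}

# similar[...]: looser matches, possibly several per value.
SIMILAR: Dict[GenderOnSite, FrozenSet[GenderInDB]] = {
    GenderOnSite.ADULT_UNISEX: frozenset({GenderInDB.FEMALE, GenderInDB.MALE}),
    GenderOnSite.KIDS_GIRLS: frozenset({GenderInDB.FEMALE}),
    GenderOnSite.KIDS_BOYS: frozenset({GenderInDB.MALE}),
}


def to_db(value: GenderOnSite) -> FrozenSet[GenderInDB]:
    """Return the database values a site value maps to (empty if none)."""
    if value in SAME:
        return frozenset({SAME[value]})
    return SIMILAR.get(value, frozenset())
```

Keeping `same` and `similar` in separate tables preserves the distinction the definition makes between exact and approximate matches.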
Nodes, applications, clusters and projects
A project is a complete collection of repos, machines and services that an external user can interact with. For instance, your online shop is a “project”.
A cluster is a set of machines, and a node is a single machine (virtual or dedicated).
An application is a piece of software that runs on a node. There are two subtypes of applications:
- services (app.service), which are expected to start accepting requests on start-up and run indefinitely
- pipelines (app.pipeline), which are launched by a trigger or schedule; they transform data and halt
Project: projectname
MyCluster: cluster(
    autoscale: ...
    nodenames: ...
    kubernetes_config_path: ...
    MyNode: node(
        hostname: ...
        region: ...
        bandwidth: ...
        ram: ...
        hdd: ...
        ssd: ...
        gpu: ...
        cpu: ...
        docker_image: ...
        dockerfile: ...
        MyApp: app(
            triggered: <schedule>
            #
            >-- inName : mode[type] -->
            |>-- inName : mode[type] -->
            <-- outName: mode[type] --<
            |<-- outName: mode[type] --<
            |< apiCallName: argmode[argtype]
            |> apiCallName: argmode[argtype]
            - - - > apiDefinition: argmode[argtype] < retval: argmode[argtype]
        )
    )
)
A > B, A |> B, B <| A
A sends/writes some data to B.
We may specify the sent data and stream type:
A batch[data] > B, A |batch[data]> B, B <batch[data]| A
A sends/writes a batch of data to B.
B >| A, A |< B
A reads data from B.
A|request>--<response B
Storages, components and pipelines
Apps are made of components and stateful storages.
A component is logic that processes data. It is defined by its input data streams, its output data streams, the APIs it exposes, and the APIs it imports from other components.
We do not care what is going on inside a component. If we have to describe its internals, we break it down into components and storages as well.
If a component contains no storages (including storages that exist but are hidden because we did not describe the component’s internals), then it must be stateless.
Storages hold data. One storage may hold multiple schemas.
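The split between stateless components and stateful storages can be sketched in ordinary code. This is an illustration only; `normalize_names` and `NameStorage` are hypothetical names. The component is a pure function of its input stream, and everything that must survive between calls lives in the storage.

```python
from typing import Dict, Iterable, List


def normalize_names(names: Iterable[str]) -> List[str]:
    """Stateless component: the output depends only on the input stream."""
    return [n.strip().lower() for n in names]


class NameStorage:
    """Storage: the only place where state is kept between calls."""

    def __init__(self) -> None:
        self._counts: Dict[str, int] = {}

    def write(self, names: Iterable[str]) -> None:
        # Count how many times each name has been written.
        for n in names:
            self._counts[n] = self._counts.get(n, 0) + 1

    def read(self, name: str) -> int:
        return self._counts.get(name, 0)


storage = NameStorage()
storage.write(normalize_names([" Ann", "ann ", "Bob"]))
```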
storageA: storage (
    sometable: schema@t (
        replication: ...
        size ~~ 1Mb
        records ~~ 1M
        retention: 30d
        retention: 1M records
        retention: 10Mb
        transaction_id: int [key]
        order_id: Purchase.order_id
        somedata: str
        Purchase.state@t' = "done"
    )
)
storageA/sometable filter[transaction_id = 123]
storageA/sometable filter[somedata = "qwe"]
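The two filter expressions above read like equality lookups on a table. A minimal sketch of that reading, assuming an in-memory list of records stands in for storageA/sometable (the sample rows and the `filter_records` helper are hypothetical):

```python
from typing import Any, Dict, Iterator, List

Record = Dict[str, Any]

# Hypothetical in-memory stand-in for storageA/sometable.
sometable: List[Record] = [
    {"transaction_id": 123, "order_id": 1, "somedata": "qwe"},
    {"transaction_id": 124, "order_id": 1, "somedata": "asd"},
    {"transaction_id": 125, "order_id": 2, "somedata": "qwe"},
]


def filter_records(table: List[Record], **equals: Any) -> Iterator[Record]:
    """Yield records whose fields equal all of the given values."""
    for record in table:
        if all(record.get(k) == v for k, v in equals.items()):
            yield record


# filter[transaction_id = 123]: lookup by key, at most one record expected.
by_key = list(filter_records(sometable, transaction_id=123))
# filter[somedata = "qwe"]: lookup by a non-key field, may match many records.
by_data = list(filter_records(sometable, somedata="qwe"))
```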