ProcessHub: GitHub, but for bioprocesses
Core premise: each Process is a repo-like object with its own history, forks, releases, CI pipeline, and machine-readable specification.
Platform stance: no inherent restriction on what’s stored. Attributes (like safety_class, hazard_rating, biosafety_level) are part of the data model, not a platform hard limit.
1) Core concepts
- Users / Orgs
  Just like GitHub, with profile, contribution graph, following, org repos.
- Processes (the "repos")
  A self-contained project describing what can be made, from what, and how.
  - Contains: /graph.yaml, /nodes, /unitops, /edges, /README, /provenance.
  - Has version history, releases, tags.
  - Metadata includes attributes like:
    - process_type: synthesis, purification, analysis
    - safety_class: unrestricted / controlled / hazardous
    - hazard_rating: numeric scale
    - biosafety_level: BSL-1, BSL-2, etc.
    - execution_ready: true/false
- UnitOp Registry
  Shared, versioned definitions of reusable Unit Operations (e.g., RP-HPLC abstract, solvent extraction).
  Processes can import UnitOps at a specific version.
- Forks, PRs, and Reviews
  Same mechanics as GitHub: propose edits, merge after review.
- Pipelines (CI)
  Validates schemas, checks graph connectivity, and enforces org-defined policies (for example, a public org could block certain hazard ratings).
2) Process structure
/
├─ README.md
├─ graph.yaml # DAG of nodes/edges
├─ nodes/ # materials, products, hosts, etc.
├─ edges/ # transformations & separations
├─ unitops.lock # pinned UnitOps
├─ provenance.yaml # references, contributors, lineage
├─ metadata.yaml # attributes like safety_class, hazard_rating
└─ LICENSE
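Since graph.yaml is described as a DAG of nodes and edges, a graph-integrity check could look roughly like the sketch below. The input shape (a list of node ids plus a list of (source, target) pairs) is an assumption made for illustration; the actual graph.yaml schema is not specified here. The cycle test uses Kahn's algorithm.

```python
from collections import defaultdict, deque


def check_graph(nodes, edges):
    """Return a list of problems found in a process graph (empty if none).

    nodes: list of node ids; edges: list of (source, target) pairs.
    Both shapes are assumptions, not the actual graph.yaml schema.
    """
    problems = []
    known = set(nodes)
    touched = set()
    successors = defaultdict(list)
    indegree = {n: 0 for n in nodes}

    for src, dst in edges:
        for end in (src, dst):
            if end not in known:
                problems.append(f"edge references unknown node: {end}")
        if src in known and dst in known:
            successors[src].append(dst)
            indegree[dst] += 1
            touched.update((src, dst))

    # Dangling nodes: declared but not attached to any valid edge.
    for n in nodes:
        if n not in touched:
            problems.append(f"dangling node: {n}")

    # Kahn's algorithm: if not every node can be ordered, there is a cycle.
    queue = deque(n for n, d in indegree.items() if d == 0)
    visited = 0
    while queue:
        n = queue.popleft()
        visited += 1
        for m in successors[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                queue.append(m)
    if visited != len(indegree):
        problems.append("graph contains a cycle")
    return problems
```

An empty result means the graph passed all three checks; a CI step could fail the build on any non-empty list.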
metadata.yaml example
id: process.c15_0.enrichment
label: Odd-chain fatty acid enrichment
process_type: purification
safety_class: controlled # controlled, unrestricted, hazardous
hazard_rating: 2 # scale 0–5
biosafety_level: BSL-1
execution_ready: false
tags: [lipid, fatty_acid, chromatography]
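A schema check for this file might be sketched as follows. The allowed values come from the attribute list and the comments in the example above; treating every listed field as required, and limiting biosafety_level to BSL-1 through BSL-4, are assumptions of this sketch. The metadata is passed in as an already-parsed dict so the example needs no YAML library.

```python
PROCESS_TYPES = {"synthesis", "purification", "analysis"}
SAFETY_CLASSES = {"unrestricted", "controlled", "hazardous"}
BIOSAFETY_LEVELS = {"BSL-1", "BSL-2", "BSL-3", "BSL-4"}  # assumed closed set
# Assumption: all of these fields are mandatory.
REQUIRED_FIELDS = ("id", "label", "process_type", "safety_class",
                   "hazard_rating", "biosafety_level", "execution_ready")


def validate_metadata(meta):
    """Return a list of error strings for a parsed metadata.yaml dict."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS
              if name not in meta]

    enum_checks = {
        "process_type": PROCESS_TYPES,
        "safety_class": SAFETY_CLASSES,
        "biosafety_level": BIOSAFETY_LEVELS,
    }
    for name, allowed in enum_checks.items():
        if name in meta and meta[name] not in allowed:
            errors.append(f"{name}: unexpected value {meta[name]!r}")

    if "hazard_rating" in meta:
        rating = meta["hazard_rating"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_int = isinstance(rating, int) and not isinstance(rating, bool)
        if not is_int or not 0 <= rating <= 5:
            errors.append("hazard_rating must be an integer from 0 to 5")

    if "execution_ready" in meta and not isinstance(meta["execution_ready"], bool):
        errors.append("execution_ready must be true or false")
    return errors


# The example file above, as a parsed dict.
EXAMPLE = {
    "id": "process.c15_0.enrichment",
    "label": "Odd-chain fatty acid enrichment",
    "process_type": "purification",
    "safety_class": "controlled",
    "hazard_rating": 2,
    "biosafety_level": "BSL-1",
    "execution_ready": False,
    "tags": ["lipid", "fatty_acid", "chromatography"],
}
```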
3) Example UnitOp (registry entry)
id: rp_hplc
version: "2.1.0"
label: Reverse-phase HPLC
inputs: [material:any_liquid_sample]
outputs: [material:fraction_collection]
parameters:
  - name: column_type
    type: enum
    values: ["C8", "C18", "polymer_reversed"]
  - name: detection_mode
    type: enum
    values: ["UV", "MS", "ELSD", "none"]
attributes:
  hazard_rating: 1
  safety_class: unrestricted
license: "Apache-2.0"
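One use of a registry entry like this is to check the parameter values a Process chooses against the UnitOp's declared enums. The sketch below assumes the entry has been parsed into a dict, and check_parameters is a hypothetical helper name, not part of any defined ProcessHub API.

```python
# The rp_hplc entry above, reduced to the fields this check needs.
RP_HPLC = {
    "id": "rp_hplc",
    "version": "2.1.0",
    "parameters": [
        {"name": "column_type", "type": "enum",
         "values": ["C8", "C18", "polymer_reversed"]},
        {"name": "detection_mode", "type": "enum",
         "values": ["UV", "MS", "ELSD", "none"]},
    ],
}


def check_parameters(unitop, chosen):
    """Compare a step's chosen parameter values with the UnitOp definition.

    chosen: dict mapping parameter name to value (hypothetical shape).
    Returns a list of error strings; empty means every value is allowed.
    """
    specs = {p["name"]: p for p in unitop["parameters"]}
    errors = []
    for name, value in chosen.items():
        spec = specs.get(name)
        if spec is None:
            errors.append(f"unknown parameter: {name}")
        elif spec["type"] == "enum" and value not in spec["values"]:
            errors.append(f"{name}: {value!r} not in {spec['values']}")
    return errors
```

For example, check_parameters(RP_HPLC, {"column_type": "C18"}) returns an empty list, while a column type outside the enum produces one error.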
4) CI / Policy enforcement
- Core validators (always on):
  - Schema checks (graph, node, edge, UnitOp)
  - Graph integrity (no dangling nodes)
  - Provenance completeness
- Org-defined policies (optional):
  - Allow/block certain safety_class values in repos
  - Auto-flag processes above certain hazard_rating
  - Require biosafety_level to be declared
This means the platform itself prohibits nothing; it gives communities and orgs the knobs to set their own thresholds.
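The three optional policies could be modelled as a small per-org config object evaluated against a Process's metadata. OrgPolicy, its field names, and evaluate are all hypothetical names chosen for this sketch; only the three policy kinds come from the list above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OrgPolicy:
    """Hypothetical per-org policy knobs; every field is opt-in."""
    blocked_safety_classes: set = field(default_factory=set)
    flag_above_hazard_rating: Optional[int] = None
    require_biosafety_level: bool = False


def evaluate(policy, meta):
    """Return (violations, flags) for one Process's metadata dict.

    Violations would fail the pipeline; flags only mark it for review.
    """
    violations = []
    flags = []

    safety_class = meta.get("safety_class")
    if safety_class in policy.blocked_safety_classes:
        violations.append(f"safety_class {safety_class!r} is blocked in this org")

    if policy.require_biosafety_level and not meta.get("biosafety_level"):
        violations.append("biosafety_level must be declared")

    rating = meta.get("hazard_rating")
    threshold = policy.flag_above_hazard_rating
    if threshold is not None and rating is not None and rating > threshold:
        flags.append(f"hazard_rating {rating} exceeds {threshold}")

    return violations, flags
```

With the default OrgPolicy() nothing is blocked or flagged, which matches the stance that restrictions are set by orgs rather than by the platform.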
5) Discovery & search
Filter by:
- Target product / intermediate
- UnitOps used
- Safety attributes (safety_class=unrestricted)
- Hazard rating range
- Host organism
- Graph patterns
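A few of these filters can be sketched as a function over an in-memory catalog. The catalog shape, the unitops field, and the second catalog entry are invented for illustration; a real index would presumably be backed by a search service, and graph-pattern matching is out of scope here.

```python
def search(processes, safety_class=None, hazard_range=None, unitop=None):
    """Return ids of processes matching every filter that was supplied.

    processes: list of metadata-like dicts (hypothetical shape).
    hazard_range: inclusive (low, high) tuple.
    """
    results = []
    for p in processes:
        if safety_class is not None and p.get("safety_class") != safety_class:
            continue
        if hazard_range is not None:
            low, high = hazard_range
            rating = p.get("hazard_rating")
            if rating is None or not low <= rating <= high:
                continue
        if unitop is not None and unitop not in p.get("unitops", []):
            continue
        results.append(p["id"])
    return results


# Sample catalog; the second entry is made up for this example.
CATALOG = [
    {"id": "process.c15_0.enrichment", "safety_class": "controlled",
     "hazard_rating": 2, "unitops": ["rp_hplc"]},
    {"id": "process.example.extraction", "safety_class": "unrestricted",
     "hazard_rating": 0, "unitops": ["solvent_extraction"]},
]
```

Filters combine with AND, so search(CATALOG, safety_class="unrestricted", hazard_range=(0, 1)) only returns processes that satisfy both conditions.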
6) Execution adapters
- Public Processes remain descriptive specs.
- Orgs can attach private adapters mapping specs to LIMS or lab robots.
- execution_ready: true marks processes that have been successfully run in at least one environment (pointer to private adapter).
7) Why this model works
- Keeps safety and hazard as data, not a global constraint.
- Makes the platform useful for research, industry, and education without hardcoding exclusions.
- Allows federated communities to run their own governance models.