docs: design for per-project Python dependencies (cached venv)
Captures the proposed requirements.txt -> cached virtualenv approach (keyed by hash, --system-site-packages, QUARTO_PYTHON), its guard rails (auth gating, egress restriction, resource caps) given anonymous write is enabled, lifecycle (eviction, failure UX), a phased rollout, and the open decisions.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@@ -0,0 +1,76 @@
# Design: per-project Python dependencies (cached virtualenv)

Status: **proposal** (not yet implemented). Captures the agreed plan for letting
Quarto `{python}` cells use libraries beyond the curated base set.

## Background

Quarto executes `` ```{python} `` cells through a Jupyter kernel. The base image
([`server-ce/Dockerfile-base`](../server-ce/Dockerfile-base)) bundles a curated
scientific stack (numpy, pandas, scipy, matplotlib, seaborn, scikit-learn,
sympy, plotly, tabulate). Importing anything outside that set currently fails
the render with `ModuleNotFoundError`.

As a first step, which has already shipped, the Quarto log parser
([`quarto-log-parser.ts`](../services/web/frontend/js/ide/log-parser/quarto-log-parser.ts))
turns a missing-package traceback into an actionable message. This document is
the *next* step: letting a project declare and install its own dependencies.

**Key constraint:** the instance runs with anonymous read+write enabled
(`OVERLEAF_ALLOW_ANONYMOUS_READ_AND_WRITE_SHARING=true`), so compiles can be
triggered by untrusted users. Installing arbitrary packages is therefore a
security decision, not just a convenience.

## Mechanism

1. **Declaration.** A standard `requirements.txt` at the project root opts the
   project in (familiar, Quarto-agnostic, supports version pinning).
2. **Keying.** CLSI computes `sha256(requirements.txt + python version)`. The hash
   names a venv directory on a **persistent volume**, e.g.
   `…/data/python-venvs/<hash>/`. Identical dependency sets share one venv across
   projects and compiles.
3. **Build-if-missing.** Run `python3 -m venv --system-site-packages <dir>`, so
   the bundled stack stays visible and only the *extra* deps are installed
   (smaller and faster), then `<dir>/bin/pip install -r requirements.txt`. Guard
   the build with a per-hash `flock` so concurrent compiles don't build the same
   venv twice.
4. **Point Quarto at it.** Set `QUARTO_PYTHON=<dir>/bin/python3` in the render
   environment (threaded from web to CLSI in the same way as `exportMode`). With
   `--system-site-packages`, `ipykernel` from the base is importable, so the
   kernel runs in that interpreter with base + project packages.

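The keying, build-if-missing and `QUARTO_PYTHON` steps above can be sketched roughly as follows. This is an illustrative Python sketch, not the CLSI implementation: the function names (`venv_key`, `build_venv`, `ensure_venv`, `render_env`), the `.ready` completion marker, the `<hash>.lock` file name and the NUL separator inside the hash are all assumptions that this design does not fix.

```python
import fcntl
import hashlib
import os
import subprocess
import sys
from pathlib import Path
from typing import Callable


def venv_key(requirements: bytes, python_version: str) -> str:
    """sha256 over the requirements.txt contents plus the Python version."""
    digest = hashlib.sha256()
    digest.update(requirements)
    digest.update(b"\0")  # assumed separator so the two inputs cannot run together
    digest.update(python_version.encode())
    return digest.hexdigest()


def build_venv(venv_dir: Path, requirements_file: Path) -> None:
    """Create the venv, then install only the extra dependencies into it."""
    subprocess.run(
        [sys.executable, "-m", "venv", "--system-site-packages", str(venv_dir)],
        check=True,
    )
    subprocess.run(
        [str(venv_dir / "bin" / "pip"), "install", "-r", str(requirements_file)],
        check=True,
    )


def ensure_venv(root: Path, key: str, build: Callable[[Path], None]) -> Path:
    """Return the venv for `key`, building it at most once under a per-hash lock."""
    root.mkdir(parents=True, exist_ok=True)
    venv_dir = root / key
    ready = venv_dir / ".ready"  # hypothetical marker written after a successful build
    with open(root / f"{key}.lock", "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # blocks while another compile is building
        if not ready.exists():  # re-checked under the lock
            build(venv_dir)
            ready.touch()
    # Closing the lock file releases the flock.
    return venv_dir


def render_env(venv_dir: Path) -> dict[str, str]:
    """Environment for the Quarto render, pointing it at the venv interpreter."""
    env = dict(os.environ)
    env["QUARTO_PYTHON"] = str(venv_dir / "bin" / "python3")
    return env
```

A caller would compute the key from the project's `requirements.txt`, call `ensure_venv(root, key, lambda d: build_venv(d, requirements_file))`, and pass `render_env(...)` to the Quarto process. Checking the marker only after taking the lock is what keeps two concurrent compiles from building the same venv twice.
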
## Guard rails

- **Auth gating.** Only run the install path for **logged-in owner/collaborator**
  compiles. Anonymous-link compiles use the plain base interpreter and never
  trigger installs. Web makes the decision and passes a boolean to CLSI; the
  default is deny.
- **Network egress.** The compile environment must reach PyPI to install.
  Restrict egress to PyPI or an internal mirror only (k8s NetworkPolicy + pip
  `--index-url`), not arbitrary hosts.
- **Resource caps.** Enforce an install timeout, a venv size cap, and a maximum
  package count; surface overruns as a clear log error.
- **Trust boundary.** Even when gated, a trusted user installing packages
  amounts to arbitrary code execution in the sandbox. Containment remains the
  CLSI container, resource limits, and the egress policy. This is
  owner-trust-level by design.

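As a rough illustration of the gating and cap checks listed above, here is a small Python sketch. Everything named here is hypothetical: the role strings, the `InstallLimits` defaults (placeholders, not agreed numbers), and the choice to reject option lines in `requirements.txt` (such lines can carry their own `--index-url`, which would sidestep index pinning). It complements the NetworkPolicy and does not replace it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InstallLimits:
    # All values are placeholders, not agreed numbers.
    timeout_seconds: int = 300
    max_packages: int = 50
    index_url: str = "https://pypi.org/simple"


def may_install(user_id: Optional[str], role: Optional[str]) -> bool:
    """Default-deny: only a logged-in owner or collaborator may trigger installs."""
    return user_id is not None and role in {"owner", "collaborator"}


def check_requirements(text: str, limits: InstallLimits) -> list[str]:
    """Return the requirement lines, or raise ValueError when a cap is exceeded.

    Counts direct requirements only; transitive dependencies are not visible here.
    """
    specs = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        if line.startswith("-"):
            # Assumed policy: option lines such as --index-url or -r are refused.
            raise ValueError(f"option lines are not allowed: {line}")
        specs.append(line)
    if len(specs) > limits.max_packages:
        raise ValueError(
            f"{len(specs)} packages requested, limit is {limits.max_packages}"
        )
    return specs


def pip_install_args(requirements_file: str, limits: InstallLimits) -> list[str]:
    """Arguments for pip, pinned to the configured index."""
    return [
        "install",
        "--no-input",
        "--index-url",
        limits.index_url,
        "-r",
        requirements_file,
    ]
```

The install timeout would then be applied where pip is launched, for example with `subprocess.run(..., timeout=limits.timeout_seconds)`, and a `ValueError` from `check_requirements` would be surfaced as the log error.
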
## Lifecycle

- **Eviction.** `touch` the venv on use; an LRU cleanup job prunes the
  least recently used venvs when the volume exceeds a size budget.
- **Failure UX.** pip errors flow into the log panel, showing pip's output and
  reusing the friendly-error pattern.

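The eviction rule could look something like the following Python sketch. It assumes each venv is a direct child directory of the volume root and that the directory's mtime records last use; `mark_used`, `dir_size` and `evict_lru` are illustrative names, not existing code.

```python
import os
import shutil
from pathlib import Path


def mark_used(venv_dir: Path) -> None:
    """Equivalent of `touch`: bump the venv directory's mtime to now."""
    os.utime(venv_dir)


def dir_size(path: Path) -> int:
    """Total size in bytes of the files under `path`, without following symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            total += os.lstat(os.path.join(dirpath, name)).st_size
    return total


def evict_lru(root: Path, budget_bytes: int) -> list[str]:
    """Delete least recently used venvs until the total fits the budget.

    Returns the names of the removed venv directories, oldest first.
    """
    venvs = sorted(
        (p for p in root.iterdir() if p.is_dir()),
        key=lambda p: p.stat().st_mtime,
    )
    sizes = {p: dir_size(p) for p in venvs}
    total = sum(sizes.values())
    removed = []
    for venv in venvs:  # oldest mtime first
        if total <= budget_bytes:
            break
        shutil.rmtree(venv)
        total -= sizes[venv]
        removed.append(venv.name)
    return removed
```

A real cleanup job would presumably also take the per-hash lock before deleting, so that a venv is not removed while a compile is building or using it.
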
## Rollout

- **Phase 1.** Detection + `flock` venv build + `QUARTO_PYTHON`, behind a
  settings flag (default **off**), gated to logged-in owner, dev volume.
- **Phase 2.** Egress NetworkPolicy + index pinning + eviction job.
- **Phase 3.** Nicer pip-error surfacing + a small project-settings UI
  affordance.

## Open decisions

- `requirements.txt` vs a frontmatter field vs both?
- Shared global venv volume vs per-user namespacing (sharing is cheaper;
  per-user is stricter isolation)?
- Allow packages that need a native/compiled build (broader support) vs
  prebuilt wheels only, with no build step (tighter security)?