c9727a26e4
Build and Deploy Verso / deploy (push) Successful in 9m46s
Option A: when a {python} cell fails with ModuleNotFoundError/ImportError, the
log now suggests the exact PyPI package to add (with a module->package map, e.g.
cv2 -> opencv-python, sklearn -> scikit-learn), names the Verso requirements
file, and notes it could instead be a local module — so the langmuirthermalstudy
case isn't mistaken for a PyPI package.
Switch the per-project requirements file from requirements.txt to a Verso-
specific requirements.vrf (so it won't be confused with arbitrary .txt files);
QuartoRunner now looks for requirements.vrf, and 'vrf' is registered as an
editable text extension. The dedicated in-UI editor (and hiding it from the
file tree) follows in a separate change.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
101 lines
4.9 KiB
Markdown
101 lines
4.9 KiB
Markdown
# Design: per-project Python dependencies (cached virtualenv)
|
||
|
||
Status: **Phase 1 implemented** (gated behind `OVERLEAF_ENABLE_PROJECT_PYTHON_VENV`,
|
||
on in the deployment). Network egress policy and venv eviction (Phases 2–3)
|
||
remain. Captures the plan for letting Quarto `{python}` cells use libraries
|
||
beyond the curated base set.
|
||
|
||
## What ships in Phase 1
|
||
|
||
- A project root `requirements.vrf` is installed into a venv cached by its
|
||
sha256, created with `python3 -m venv --system-site-packages`; `QuartoRunner`
|
||
points Quarto at it via `QUARTO_PYTHON`. A per-hash `flock` serialises
|
||
concurrent builds; pip output is merged into `output.log`; on failure the
|
||
render falls back to the base interpreter (and the missing-package message
|
||
surfaces). Venvs live under `PYTHON_VENVS_DIR`
|
||
(default `/var/lib/overleaf/data/python-venvs`).
|
||
- Gated by `userCanInstallPython` (`PythonVenvGate.mjs`) to the project owner +
|
||
invited collaborators (any role) — never anonymous / link-sharing users —
|
||
threaded to CLSI as `allowPythonInstall` on the editor compile, presentation
|
||
export, and publish paths.
|
||
|
||
### Known Phase-1 limitations
|
||
|
||
- The first build of a heavy `requirements.vrf` runs within the compile
|
||
timeout; a very large install can be killed and retried next compile (the
|
||
venv is only marked complete on success).
|
||
- No egress restriction yet (Phase 2) — installs reach PyPI directly.
|
||
- No eviction yet (Phase 3) — venvs accumulate under `PYTHON_VENVS_DIR`.
|
||
|
||
## Background
|
||
|
||
Quarto executes `` ```{python} `` cells through a Jupyter kernel. The base image
|
||
([`server-ce/Dockerfile-base`](../server-ce/Dockerfile-base)) bundles a curated
|
||
scientific stack (numpy, pandas, scipy, matplotlib, seaborn, scikit-learn,
|
||
sympy, plotly, tabulate). Anything outside that set currently fails the render
|
||
with `ModuleNotFoundError`.
|
||
|
||
As a first step that already shipped, the Quarto log parser
|
||
([`quarto-log-parser.ts`](../services/web/frontend/js/ide/log-parser/quarto-log-parser.ts))
|
||
turns a missing-package traceback into an actionable message. This document is
|
||
the *next* step: letting a project declare and install its own dependencies.
|
||
|
||
**Key constraint:** the instance runs with anonymous read+write enabled
|
||
(`OVERLEAF_ALLOW_ANONYMOUS_READ_AND_WRITE_SHARING=true`), so compiles can be
|
||
triggered by untrusted users. Installing arbitrary packages is therefore a
|
||
security decision, not just a convenience.
|
||
|
||
## Mechanism
|
||
|
||
1. **Declaration.** A standard `requirements.vrf` at the project root opts the
|
||
project in (familiar, Quarto-agnostic, supports version pinning).
|
||
2. **Keying.** CLSI hashes `sha256(requirements.vrf + python version)`. The hash
|
||
names a venv directory on a **persistent volume**, e.g.
|
||
`…/data/python-venvs/<hash>/`. Identical dependency sets share one venv across
|
||
projects and compiles.
|
||
3. **Build-if-missing.** `python3 -m venv --system-site-packages <dir>` (so the
|
||
bundled stack stays visible and only the *extra* deps are installed — smaller
|
||
and faster), then `<dir>/bin/pip install -r requirements.vrf`. Guard with a
|
||
per-hash `flock` so concurrent compiles don't build the same venv twice.
|
||
4. **Point Quarto at it.** Set `QUARTO_PYTHON=<dir>/bin/python3` in the render
|
||
environment (threaded web → CLSI exactly like `exportMode`). With
|
||
`--system-site-packages`, `ipykernel` from the base is importable, so the
|
||
kernel runs in that interpreter with base + project packages.
|
||
|
||
## Guard rails
|
||
|
||
- **Auth gating.** Only run the install path for **logged-in owner/collaborator**
|
||
compiles. Anonymous-link compiles use the plain base interpreter and never
|
||
trigger installs. Web decides and passes a boolean to CLSI; default-deny.
|
||
- **Network egress.** The compile environment must reach PyPI to install.
|
||
Restrict egress to PyPI / an internal mirror only (k8s NetworkPolicy + pip
|
||
`--index-url`), not arbitrary hosts.
|
||
- **Resource caps.** Install timeout, venv size cap, max package count; surface
|
||
overruns as a clear log error.
|
||
- **Trust boundary.** Even gated, a trusted user installing packages is
|
||
arbitrary code execution in the sandbox. Containment stays the CLSI container
|
||
+ resource limits + egress policy. This is owner-trust-level by design.
|
||
|
||
## Lifecycle
|
||
|
||
- **Eviction.** `touch` the venv on use; an LRU cleanup job prunes the oldest
|
||
venvs when the volume exceeds a size budget.
|
||
- **Failure UX.** pip errors flow into the log panel (reusing the friendly-error
|
||
pattern) showing pip's output.
|
||
|
||
## Rollout
|
||
|
||
- **Phase 1.** Detection + `flock` venv build + `QUARTO_PYTHON`, behind a
|
||
settings flag (default **off**), gated to logged-in owner, dev volume.
|
||
- **Phase 2.** Egress NetworkPolicy + index pinning + eviction job.
|
||
- **Phase 3.** Nicer pip-error surfacing + a small project-settings UI
|
||
affordance.
|
||
|
||
## Open decisions
|
||
|
||
- `requirements.vrf` vs a frontmatter field vs both?
|
||
- Shared global venv volume vs per-user namespacing (sharing is cheaper;
|
||
per-user is stricter isolation)?
|
||
- Allow native/compiled wheels (broader support) vs wheels-only/no-build
|
||
(tighter security)?
|