Option A: when a {python} cell fails with ModuleNotFoundError/ImportError, the
log now suggests the exact PyPI package to add (with a module->package map, e.g.
cv2 -> opencv-python, sklearn -> scikit-learn), names the Verso requirements
file, and notes it could instead be a local module — so the langmuirthermalstudy
case isn't mistaken for a PyPI package.
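A minimal sketch of the suggestion logic, in TypeScript to match quarto-log-parser.ts. Only the cv2 and sklearn entries come from this change. The PIL and yaml entries, the function names suggestPackage and missingModuleHint, and the message wording are illustrative assumptions, not the parser's actual code or log text.

```typescript
// Cases where the import name differs from the PyPI package name.
// cv2 and sklearn come from the commit message; PIL and yaml are illustrative additions.
const MODULE_TO_PACKAGE = new Map<string, string>([
  ["cv2", "opencv-python"],
  ["sklearn", "scikit-learn"],
  ["PIL", "Pillow"],
  ["yaml", "PyYAML"],
]);

// Returns the PyPI package to suggest for a missing module (hypothetical helper).
function suggestPackage(moduleName: string): string {
  // Only the top-level name is installable: "sklearn.svm" -> "sklearn".
  const topLevel = moduleName.split(".")[0];
  // Fall back to the module name itself, which is often also the package name.
  return MODULE_TO_PACKAGE.get(topLevel) ?? topLevel;
}

// Builds the log hint (hypothetical wording). It names requirements.vrf and
// notes that the module may instead be a local one.
function missingModuleHint(moduleName: string): string {
  const pkg = suggestPackage(moduleName);
  return [
    `No module named '${moduleName}'.`,
    `If it is a PyPI package, add "${pkg}" to requirements.vrf at the project root.`,
    `If '${moduleName}' is a local module, check that it is present in the project instead.`,
  ].join("\n");
}
```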
Switch the per-project requirements file from requirements.txt to a Verso-
specific requirements.vrf (so it won't be confused with arbitrary .txt files);
QuartoRunner now looks for requirements.vrf, and 'vrf' is registered as an
editable text extension. The dedicated in-UI editor (and hiding it from the
file tree) follows in a separate change.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Design: per-project Python dependencies (cached virtualenv)
Status: Phase 1 is implemented (gated behind OVERLEAF_ENABLE_PROJECT_PYTHON_VENV,
which is enabled in the deployment). The network egress policy and venv eviction
(Phases 2–3) remain. This document captures the plan for letting Quarto {python}
cells use libraries beyond the curated base set.
What ships in Phase 1
- A requirements.vrf at the project root is installed into a venv cached by its sha256, created with python3 -m venv --system-site-packages; QuartoRunner points Quarto at it via QUARTO_PYTHON. A per-hash flock serialises concurrent builds; pip output is merged into output.log; on failure, the render falls back to the base interpreter (and the missing-package message surfaces). Venvs live under PYTHON_VENVS_DIR (default /var/lib/overleaf/data/python-venvs).
- Gated by userCanInstallPython (PythonVenvGate.mjs) to the project owner and invited collaborators (any role), never anonymous or link-sharing users; the result is threaded to CLSI as allowPythonInstall on the editor compile, presentation export, and publish paths.
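A sketch of the default-deny gate, assuming the requester's access path is available as a simple tag. AccessSource, CompileRequester, and buildCompileOptions are hypothetical names, and "token" standing for link-sharing access is an assumption; the real PythonVenvGate.mjs may model roles differently. Folding the OVERLEAF_ENABLE_PROJECT_PYTHON_VENV flag into the same boolean is also an assumption.

```typescript
// Hypothetical access model; the real PythonVenvGate.mjs may differ.
type AccessSource = "owner" | "invite" | "token" | "anonymous";

interface CompileRequester {
  userId: string | null; // null when the requester is not logged in
  source: AccessSource;  // how the requester got access ("token" = link sharing, assumed)
}

// Default-deny: only the logged-in owner or an invited collaborator (any role) may install.
function userCanInstallPython(requester: CompileRequester): boolean {
  if (requester.userId === null) return false;
  return requester.source === "owner" || requester.source === "invite";
}

// Web decides; CLSI only ever sees the resulting boolean.
function buildCompileOptions(
  requester: CompileRequester,
  venvFeatureEnabled: boolean, // assumed to mirror OVERLEAF_ENABLE_PROJECT_PYTHON_VENV
): { allowPythonInstall: boolean } {
  return { allowPythonInstall: venvFeatureEnabled && userCanInstallPython(requester) };
}
```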
Known Phase-1 limitations
- The first build of a heavy requirements.vrf runs within the compile timeout; a very large install can be killed and is retried on the next compile (the venv is only marked complete on success).
- No egress restriction yet (Phase 2): installs reach PyPI directly.
- No eviction yet (Phase 3): venvs accumulate under PYTHON_VENVS_DIR.
Background
Quarto executes ```{python} cells through a Jupyter kernel. The base image
(server-ce/Dockerfile-base) bundles a curated
scientific stack (numpy, pandas, scipy, matplotlib, seaborn, scikit-learn,
sympy, plotly, tabulate). Anything outside that set currently fails the render
with ModuleNotFoundError.
As a first step that already shipped, the Quarto log parser
(quarto-log-parser.ts)
turns a missing-package traceback into an actionable message. This document is
the next step: letting a project declare and install its own dependencies.
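The first half of that actionable message is finding the missing module in the log. This sketch assumes the log contains Python's standard final traceback line (ModuleNotFoundError: No module named '...'), possibly wrapped in the ANSI colour codes that Jupyter tracebacks carry. findMissingModule is a hypothetical name, and other ImportError forms are not handled here.

```typescript
// ANSI colour escapes that IPython/Jupyter add to tracebacks.
const ANSI_ESCAPE_RE = /\x1b\[[0-9;]*m/g;
// Python's final traceback line for a failed import.
const MISSING_MODULE_RE = /ModuleNotFoundError: No module named '([^']+)'/;

// Returns the first missing module named in the log, or null if there is none.
function findMissingModule(log: string): string | null {
  const plain = log.replace(ANSI_ESCAPE_RE, "");
  const match = MISSING_MODULE_RE.exec(plain);
  return match ? match[1] : null;
}
```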
Key constraint: the instance runs with anonymous read+write enabled
(OVERLEAF_ALLOW_ANONYMOUS_READ_AND_WRITE_SHARING=true), so compiles can be
triggered by untrusted users. Installing arbitrary packages is therefore a
security decision, not just a convenience.
Mechanism
- Declaration. A requirements.vrf at the project root, in the standard pip requirements format, opts the project in (familiar, Quarto-agnostic, supports version pinning).
- Keying. CLSI hashes sha256(requirements.vrf + python version). The hash names a venv directory on a persistent volume, e.g. …/data/python-venvs/<hash>/. Identical dependency sets share one venv across projects and compiles.
- Build-if-missing. Run python3 -m venv --system-site-packages <dir> (so the bundled stack stays visible and only the extra deps are installed, which is smaller and faster), then <dir>/bin/pip install -r requirements.vrf. Guard with a per-hash flock so concurrent compiles don't build the same venv twice.
- Point Quarto at it. Set QUARTO_PYTHON=<dir>/bin/python3 in the render environment (threaded web → CLSI exactly like exportMode). With --system-site-packages, ipykernel from the base is importable, so the kernel runs in that interpreter with base + project packages.
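The keying, build, and render-environment steps can be sketched in TypeScript for Node, the runtime CLSI uses. The NUL separator in the hash, the .complete marker file name, and all function names are assumptions; the per-hash flock and the actual process spawning are omitted.

```typescript
import { createHash } from "node:crypto";
import { existsSync } from "node:fs";
import * as path from "node:path";

// Key = sha256 over the requirements text and the interpreter version.
function venvKey(requirements: string, pythonVersion: string): string {
  return createHash("sha256")
    .update(requirements)
    .update("\0") // assumed separator so the two inputs cannot run together
    .update(pythonVersion)
    .digest("hex");
}

function venvDir(venvsRoot: string, key: string): string {
  return path.join(venvsRoot, key);
}

// Hypothetical marker written only after pip succeeds, so a killed install is rebuilt.
const COMPLETE_MARKER = ".complete";

function isVenvReady(dir: string): boolean {
  return existsSync(path.join(dir, COMPLETE_MARKER));
}

// The two build steps as argv arrays, ready for a child_process spawn.
function buildCommands(dir: string, requirementsPath: string): string[][] {
  return [
    ["python3", "-m", "venv", "--system-site-packages", dir],
    [path.join(dir, "bin", "pip"), "install", "-r", requirementsPath],
  ];
}

// Point Quarto at the venv only when it is complete; otherwise leave
// QUARTO_PYTHON unset so the render falls back to the base interpreter.
function renderEnv(
  base: Record<string, string | undefined>,
  dir: string,
  ready: boolean,
): Record<string, string | undefined> {
  const env = { ...base };
  if (ready) env["QUARTO_PYTHON"] = path.join(dir, "bin", "python3");
  else delete env["QUARTO_PYTHON"];
  return env;
}
```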
Guard rails
- Auth gating. Only run the install path for logged-in owner/collaborator compiles. Anonymous-link compiles use the plain base interpreter and never trigger installs. Web decides and passes a boolean to CLSI; default-deny.
- Network egress. The compile environment must reach PyPI to install. Restrict egress to PyPI or an internal mirror only (k8s NetworkPolicy + pip --index-url), not arbitrary hosts.
- Resource caps. Install timeout, venv size cap, max package count; surface overruns as a clear log error.
- Trust boundary. Even when gated, letting a trusted user install packages amounts to arbitrary code execution in the sandbox. Containment remains the CLSI container plus resource limits and egress policy. This is owner-trust-level by design.
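The package-count cap and index pinning from these guard rails can be sketched as a check that runs before pip. MAX_PACKAGES = 50 is a hypothetical value, since no number is specified here. pip accepts options such as --index-url, --extra-index-url, and -r inside a requirements file, so this sketch rejects option lines instead of letting a project override the pinned index. The install timeout and venv size cap would be enforced by whatever runs the process and are not shown.

```typescript
// Hypothetical cap: the design names a max package count but not its value.
const MAX_PACKAGES = 50;

type RequirementsCheck =
  | { ok: true; count: number }
  | { ok: false; error: string };

// Validates requirements.vrf text before any pip call.
function checkRequirements(text: string): RequirementsCheck {
  const lines = text.split(/\r?\n/);
  let count = 0;
  for (let i = 0; i < lines.length; i++) {
    // pip treats '#' as a comment only at line start or after whitespace,
    // so URLs containing '#' are left intact.
    const line = lines[i].replace(/(^|\s)#.*$/, "").trim();
    if (line === "") continue;
    if (line.startsWith("-")) {
      // Option lines (--index-url, --extra-index-url, -r, -e, ...) could
      // point pip away from the pinned index, so reject them outright.
      return { ok: false, error: `requirements.vrf line ${i + 1}: pip options are not allowed` };
    }
    count++;
  }
  if (count > MAX_PACKAGES) {
    return { ok: false, error: `requirements.vrf lists ${count} packages; the limit is ${MAX_PACKAGES}` };
  }
  return { ok: true, count };
}

// pip argv with the index pinned to PyPI or an internal mirror.
function pipInstallArgv(pip: string, requirementsPath: string, indexUrl: string): string[] {
  return [pip, "install", "--no-input", "--index-url", indexUrl, "-r", requirementsPath];
}
```

A failed check would surface its error string in the log panel and skip the install, leaving the render on the base interpreter.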
Lifecycle
- Eviction. Each use runs touch on the venv directory, so its mtime records the last use; an LRU cleanup job prunes the least recently used venvs when the volume exceeds a size budget.
- Failure UX. pip errors flow into the log panel (reusing the friendly-error pattern), showing pip's output.
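The LRU pruning step reduces to a pure selection over venv directories. This sketch assumes each entry's last-use time is the directory mtime refreshed by touch and that sizes are measured elsewhere; VenvEntry and selectEvictions are hypothetical names.

```typescript
interface VenvEntry {
  key: string;        // the <hash> directory name
  lastUsedMs: number; // directory mtime, refreshed by touch on each use
  sizeBytes: number;  // measured elsewhere, e.g. by walking the directory
}

// Picks venvs to delete, least recently used first, until the total fits the budget.
function selectEvictions(entries: readonly VenvEntry[], budgetBytes: number): string[] {
  let total = entries.reduce((sum, e) => sum + e.sizeBytes, 0);
  const leastRecentFirst = entries.slice().sort((a, b) => a.lastUsedMs - b.lastUsedMs);
  const evicted: string[] = [];
  for (const entry of leastRecentFirst) {
    if (total <= budgetBytes) break;
    evicted.push(entry.key);
    total -= entry.sizeBytes;
  }
  return evicted;
}
```

A real job would also need to skip a venv that a compile is building or using at that moment, for example by taking the same per-hash flock before deleting it.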
Rollout
- Phase 1. Detection + flock venv build + QUARTO_PYTHON, behind a settings flag (default off), gated to logged-in owner, dev volume.
- Phase 2. Egress NetworkPolicy + index pinning + eviction job.
- Phase 3. Nicer pip-error surfacing + a small project-settings UI affordance.
Open decisions
- requirements.vrf vs a frontmatter field vs both?
- Shared global venv volume vs per-user namespacing (sharing is cheaper; per-user is stricter isolation)?
- Allow source builds, including native/compiled extensions (broader support), vs wheels-only/no-build (tighter security)?