# Design: per-project Python dependencies (cached virtualenv) Status: **Phase 1 implemented** (gated behind `OVERLEAF_ENABLE_PROJECT_PYTHON_VENV`, on in the deployment). Network egress policy and venv eviction (Phases 2–3) remain. Captures the plan for letting Quarto `{python}` cells use libraries beyond the curated base set. ## What ships in Phase 1 - A project root `requirements.vrf` is installed into a venv cached by its sha256, created with `python3 -m venv --system-site-packages`; `QuartoRunner` points Quarto at it via `QUARTO_PYTHON`. A per-hash `flock` serialises concurrent builds; pip output is merged into `output.log`; on failure the render falls back to the base interpreter (and the missing-package message surfaces). Venvs live under `PYTHON_VENVS_DIR` (default `/var/lib/overleaf/data/python-venvs`). - Gated by `userCanInstallPython` (`PythonVenvGate.mjs`) to the project owner + invited collaborators (any role) — never anonymous / link-sharing users — threaded to CLSI as `allowPythonInstall` on the editor compile, presentation export, and publish paths. ### Known Phase-1 limitations - The first build of a heavy `requirements.vrf` runs within the compile timeout; a very large install can be killed and retried next compile (the venv is only marked complete on success). - No egress restriction yet (Phase 2) — installs reach PyPI directly. - No eviction yet (Phase 3) — venvs accumulate under `PYTHON_VENVS_DIR`. ## Background Quarto executes `` ```{python} `` cells through a Jupyter kernel. The base image ([`server-ce/Dockerfile-base`](../server-ce/Dockerfile-base)) bundles a curated scientific stack (numpy, pandas, scipy, matplotlib, seaborn, scikit-learn, sympy, plotly, tabulate). Anything outside that set currently fails the render with `ModuleNotFoundError`. As a first step that already shipped, the Quarto log parser ([`quarto-log-parser.ts`](../services/web/frontend/js/ide/log-parser/quarto-log-parser.ts)) turns a missing-package traceback into an actionable message. This document is the *next* step: letting a project declare and install its own dependencies. **Key constraint:** the instance runs with anonymous read+write enabled (`OVERLEAF_ALLOW_ANONYMOUS_READ_AND_WRITE_SHARING=true`), so compiles can be triggered by untrusted users. Installing arbitrary packages is therefore a security decision, not just a convenience. ## Mechanism 1. **Declaration.** A standard `requirements.vrf` at the project root opts the project in (familiar, Quarto-agnostic, supports version pinning). 2. **Keying.** CLSI hashes `sha256(requirements.vrf + python version)`. The hash names a venv directory on a **persistent volume**, e.g. `…/data/python-venvs//`. Identical dependency sets share one venv across projects and compiles. 3. **Build-if-missing.** `python3 -m venv --system-site-packages ` (so the bundled stack stays visible and only the *extra* deps are installed — smaller and faster), then `/bin/pip install -r requirements.vrf`. Guard with a per-hash `flock` so concurrent compiles don't build the same venv twice. 4. **Point Quarto at it.** Set `QUARTO_PYTHON=/bin/python3` in the render environment (threaded web → CLSI exactly like `exportMode`). With `--system-site-packages`, `ipykernel` from the base is importable, so the kernel runs in that interpreter with base + project packages. ## Guard rails - **Auth gating.** Only run the install path for **logged-in owner/collaborator** compiles. Anonymous-link compiles use the plain base interpreter and never trigger installs. Web decides and passes a boolean to CLSI; default-deny. - **Network egress.** The compile environment must reach PyPI to install. Restrict egress to PyPI / an internal mirror only (k8s NetworkPolicy + pip `--index-url`), not arbitrary hosts. - **Resource caps.** Install timeout, venv size cap, max package count; surface overruns as a clear log error. - **Trust boundary.** Even gated, a trusted user installing packages is arbitrary code execution in the sandbox. Containment stays the CLSI container + resource limits + egress policy. This is owner-trust-level by design. ## Lifecycle - **Eviction.** `touch` the venv on use; an LRU cleanup job prunes the oldest venvs when the volume exceeds a size budget. - **Failure UX.** pip errors flow into the log panel (reusing the friendly-error pattern) showing pip's output. ## Rollout - **Phase 1.** Detection + `flock` venv build + `QUARTO_PYTHON`, behind a settings flag (default **off**), gated to logged-in owner, dev volume. - **Phase 2.** Egress NetworkPolicy + index pinning + eviction job. - **Phase 3.** Nicer pip-error surfacing + a small project-settings UI affordance. ## Open decisions - `requirements.vrf` vs a frontmatter field vs both? - Shared global venv volume vs per-user namespacing (sharing is cheaper; per-user is stricter isolation)? - Allow native/compiled wheels (broader support) vs wheels-only/no-build (tighter security)?