lightcone-cli on NERSC (Perlmutter)¶
A practical guide for running lightcone-cli on Perlmutter. The CLI itself behaves the same as on a laptop — the wrinkles are in the filesystem layout (DVS-mounted home, Lustre scratch), the container runtime (podman-hpc), and SLURM submission. This page covers all three.
Already familiar with the basics?
The generic Install and Running on a Cluster pages cover the cross-platform story. This page is the NERSC-specific overlay — read it first if Perlmutter is your home base.
0. Agentic CLI¶
lightcone-cli is the execution layer of the lightcone project — it harnesses an agent-based CLI to follow the astra standard while building and running an analysis. The choice of agent is open: anything that can drive a project shell works. This guide uses Claude Code as the running example — substitute your preferred agent CLI throughout if you use a different one.
Installing Claude Code:
Make sure ~/.local/bin is on your PATH, then verify and authenticate:
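A minimal sketch using the native installer (install URL and first-run login flow are current as of this writing — check the Claude Code installation docs if they have moved):

```bash
# Install the Claude Code binary under ~/.local/ (native installer)
curl -fsSL https://claude.ai/install.sh | bash

# Make sure the wrapper is reachable, then verify and authenticate
export PATH="$HOME/.local/bin:$PATH"
claude --version
claude        # first launch walks you through login/authentication
```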
Other install routes (npm, native package managers) are documented in the Claude Code installation docs.
Other agent CLIs
Other agentic CLIs work too — for example:
- OpenAI Codex — see the repo README for install options.
- opencode — install via `curl -fsSL https://opencode.ai/install | bash`.
Pick whichever you prefer; the rest of this guide writes claude in concrete commands, but the workflow is the same with any agent CLI.
1. Python¶
Like the generic Install page, we recommend uv for managing Python on Perlmutter — it's faster than pip and gives you a Python independent of NERSC's module system. NERSC doesn't ship it, but it installs into your home dir with a single curl:
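For example (the installer script is Astral's standard one; the explicit `uv python install` pins the interpreter version independently of NERSC's modules):

```bash
# Install uv into ~/.local/bin
curl -LsSf https://astral.sh/uv/install.sh | sh

# Grab a uv-managed CPython
uv python install 3.12
```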
This drops both uv and an isolated Python 3.12 under ~/.local/. Make sure ~/.local/bin is on your PATH.
Alternative: NERSC's python module
If you'd rather use NERSC's pre-built environment, module load python gives you a ready-to-use distribution with conda, pip, and many scientific packages already installed:
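A quick look at what the module gives you:

```bash
module load python
which python      # resolves into NERSC's shared, conda-based install
python --version
```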
Convenient, but the module is shared and read-only — you can't pin a different Python version or guarantee dependency isolation. For that, build a conda env on top:
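A sketch, assuming you want a dedicated env for this guide (the env name and Python version are placeholders):

```bash
module load python
conda create -n your-env-name python=3.12 -y
conda activate your-env-name
python -m pip install <your-packages>   # custom deps go into the env, not the shared module
```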
This is also NERSC's recommended path for pip install when you need custom packages.
Storage note: 40 GB home quota
Conda envs land under ~/.conda/envs/ by default. The Perlmutter home quota is 40 GB, which gets eaten quickly. NERSC recommends /global/common/software/<project>/ for larger envs. If you really want them on $SCRATCH (note: 12-week purge!), move and symlink:
conda deactivate
mv ~/.conda/envs/your-env-name $SCRATCH/conda-envs/
ln -s $SCRATCH/conda-envs/your-env-name ~/.conda/envs/your-env-name
See NERSC's Python guide for the full storage strategy.
2. Install lightcone-cli¶
The package on PyPI is lightcone-cli; the command it provides is lc. The recommended install uses uv tool, which isolates lc in its own venv under ~/.local/share/uv/tools/ and exposes a wrapper at ~/.local/bin/lc:
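Assuming `~/.local/bin` is already on your `PATH` from §1:

```bash
uv tool install lightcone-cli
which lc      # should print ~/.local/bin/lc
```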
astra-tools is a transitive dependency — pulled in automatically.
Alternative: pip
If you'd rather not use uv, install with pip. The exact command depends on which Python you're using:
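Two common cases, sketched — run whichever matches the environment that is actually active:

```bash
# Inside an activated conda env (the NERSC python-module path above)
python -m pip install lightcone-cli

# Or into an activated virtualenv, using uv's pip interface
uv pip install lightcone-cli
```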
From source (contributors only)¶
If you want to track the latest commits or contribute back, clone the repo and install editably. Most users should stick with PyPI.
cd ~/.lightcone # or wherever you keep clones
git clone https://github.com/LightconeResearch/lightcone-cli.git
uv pip install -e ./lightcone-cli # or: pip install -e ./lightcone-cli
To hack on astra-tools itself (PyPI name astra-tools, GitHub repo ASTRA):
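A sketch — the GitHub org here is assumed to match lightcone-cli's; adjust the URL if the ASTRA repo lives elsewhere:

```bash
cd ~/.lightcone                                              # or wherever you keep clones
git clone https://github.com/LightconeResearch/ASTRA.git    # org name assumed; verify the URL
uv pip install -e ./ASTRA                                    # or: pip install -e ./ASTRA
```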
For development tooling (pytest, ruff, mypy):
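Assuming the repo defines a dev extra (the extra name is an assumption — check `pyproject.toml`):

```bash
# `[dev]` is a hypothetical extra name — check pyproject.toml
uv pip install -e './lightcone-cli[dev]'
# Otherwise, add the tools to the same environment directly:
uv pip install pytest ruff mypy
```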
Sanity check¶
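A minimal check, assuming the `uv tool` install above — the point is that the wrapper resolves and imports cleanly:

```bash
which lc     # should print ~/.local/bin/lc
lc status    # lightweight subcommand; safe on a login node
```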
Global config is auto-created
The first lc invocation writes ~/.lightcone/config.yaml with runtime: auto — no manual setup step needed. You'll pin it to podman-hpc for compute nodes in §5.
3. Initialize a new project¶
Scaffold a project directory and drop into it with the agent:
lc init your-analysis # scaffolds a fresh project tree
cd your-analysis
claude # launch your agent CLI (Claude Code shown here)
4. Start your research¶
Once your agent CLI is open (Claude Code in this guide's examples), drive everything from there. The lc-* skills are how you tell the agent what to build:
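For example, inside the agent session (the analysis description is illustrative — describe whatever you actually want built):

```text
/lc-new
Add a first recipe that loads the raw catalog and writes a cleaned parquet table.
```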
After that, just keep talking to the agent in plain English about what you want to build next.
You're still on a login node
Everything from lc init through your first /lc-new runs on a Perlmutter login node. That's fine for scaffolding and small recipes, but anything heavyweight needs a compute node — see §5.
5. Running on compute nodes¶
Login nodes are shared and rate-limited — fine for lc init, lc status, and small lc build calls, but anything heavyweight belongs on a compute node.
Pre-flight: pin the container runtime and build images¶
Perlmutter compute nodes ship podman-hpc. Pin it once globally:
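One way to pin it is to edit the global config that your first `lc` invocation created (see §2); the file and key are the ones `lc` wrote — only the value changes:

```yaml
# ~/.lightcone/config.yaml
runtime: podman-hpc   # was `auto`; forces podman-hpc on login and compute nodes
```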
Then, on a login node, build and migrate your project's images:
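From the project root:

```bash
cd $SCRATCH/your-analysis      # or wherever the project lives
lc build                       # podman-hpc build + podman-hpc migrate under the hood
```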
lc build runs podman-hpc build followed by podman-hpc migrate, which copies the image into each compute node's local container cache. See Running on a Cluster → Pre-flight for the underlying mechanics.
Interactive runs (agent-driven)¶
The agent calls lc run for you whenever a recipe needs to materialize — you never call it directly. What you do control is where the agent is running: it inherits the shell environment you launched it from. To put the agent's recipes onto a compute node, simply launch it from inside a SLURM allocation:
salloc -A <your_project> -q interactive -C gpu --nodes=1 -t 00:30:00
# salloc drops you onto a compute node; from there:
cd /path/to/your-analysis
claude # or whichever agent CLI you use
Now everything the agent triggers (lc run, scripts, etc.) executes on the allocated node.
Picking a QoS
The interactive QoS on the GPU partition is right for development. For longer or larger sessions, see NERSC's queue policy reference.
Unattended batch runs (no agent in the loop)¶
For production sweeps where the recipes are already nailed down, you can submit lc run directly as a batch job — no agent CLI involved. See Running on a Cluster → A typical SLURM workflow for the generic template; on Perlmutter, the only addition is the -A / -q directives:
#!/bin/bash
#SBATCH -A <your_project>
#SBATCH -q regular
#SBATCH -C gpu
#SBATCH -N 4
#SBATCH -t 04:00:00
cd $SCRATCH/your-analysis
# Make `lc` available — pick the line that matches your install:
export PATH=$HOME/.local/bin:$PATH # uv tool install (default)
# module load python && conda activate your-env-name  # conda env
lc run -j 16
When to use this path
The agent-driven flow above is the right tool during development. Reach for batch submission when you've finished iterating and want a hands-off sweep.
Storage gotcha: Snakemake state must live on $SCRATCH¶
DVS silently ignores flock()
$HOME and /global/cfs/ are mounted on compute nodes via DVS, which silently ignores flock(). Snakemake's locking relies on flock, so the project's .snakemake/ directory (and Dask spill files) must live on Lustre ($SCRATCH), which honors it. Otherwise you get intermittent hangs or silent rule-rerun loops.
lc redirects state automatically when it detects Perlmutter, so this usually just works. To pin explicitly at project creation:
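A sketch — the flag name is hypothetical (check `lc init --help` for the real spelling); what matters is the `scratch_root` value it ends up setting:

```bash
# --scratch-root is a hypothetical flag name; verify with `lc init --help`
lc init your-analysis --scratch-root "$SCRATCH"
```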
Or, after the fact, edit <project>/.lightcone/lightcone.yaml:
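Setting the same key the troubleshooting table below refers to:

```yaml
# <project>/.lightcone/lightcone.yaml
scratch_root: $SCRATCH   # keep .snakemake/ and Dask spill on Lustre, not DVS
```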
12-week purge on $SCRATCH
Perlmutter purges $SCRATCH on a rolling 12-week window. For outputs you need to keep, copy or symlink to /global/cfs/cdirs/<project>/.
Further reading¶
- NERSC interactive jobs — `salloc` patterns and reservation queues
- Perlmutter system overview — node types and partitions
- NERSC Python guide — module, conda, and pip layering
6. Common troubleshooting¶
| Symptom | Likely cause | Fix |
|---|---|---|
| `lc: command not found` | Wrong env active, or `~/.local/bin` not on `PATH` | `which lc`; reinstall in the active env, or fix `PATH` |
| `lc` runs but uses unexpected code | Two installs across two envs shadowing each other on `PATH` | `which lc` and uninstall the stale one |
| `ModuleNotFoundError: lightcone.cli.__main__` | Tried `python -m lightcone.cli` (the package isn't directly executable) | Use the `lc` console script instead |
| Snakemake locking errors / silent rule-rerun loops | `.snakemake/` ended up on DVS-mounted storage | Set `scratch_root: $SCRATCH` in the project's `.lightcone/lightcone.yaml` |
| `ImportError: cannot import name 'resolve_analysis_tree' from 'astra.helpers'` | Stale `astra-tools` (pre-0.2.5) | `pip install -U astra-tools` |
| `PermissionError` reading another user's symlinked `results/` | Cross-user scratch path without group ACLs | Request access from the data owner, or copy the manifests into your own scratch |
| `pip install` hangs or times out on a compute node | Compute nodes have no public internet | Always install from a login node |
7. Updating¶
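For PyPI installs, upgrading is one command — pick the one matching how you installed in §2:

```bash
uv tool upgrade lightcone-cli               # uv tool install
# python -m pip install -U lightcone-cli    # pip install, inside the same env
```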
Editable installs auto-follow source edits — switching branches or pulling new commits is reflected immediately in lc. Re-install only when pyproject.toml adds a new dependency or changes the [project.scripts] table.
8. Uninstalling¶
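Mirror of the install path:

```bash
uv tool uninstall lightcone-cli             # uv tool install
# python -m pip uninstall lightcone-cli     # pip install, inside the same env
```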
Keep your config?
~/.lightcone/config.yaml survives the uninstall. Delete it too if you want to start fresh.