Project Stack Indexing Policy¶
This policy defines the indexing contract between SIMPLE project metadata,
particle rows, and physical particle stack files. It is intended to remove
ambiguity around merge_projects, prune_project, and downstream stack readers
such as 2D and 3D analysis.
Core Rule¶
fromp, top, and nptcls in the os_stk field refer to the project, not
to the physical stack file.
indstk refers to the physical stack file, not to the project.
These are separate index domains and must not be interchanged.
Index Domains¶
For projects with particle rows and stack rows:
ptcl row index row in os_ptcl2D or os_ptcl3D
stkind row in os_stk for the particle's stack
fromp/top/nptcls current project particle range/count for an os_stk row
indstk physical 1-based image index inside the stack file
nptcls_stk physical image count in the stack file
The stack path stored on an os_stk row identifies the physical stack file. A
particle image is read by resolving:
particle row -> stkind -> os_stk row -> stack path
particle row -> indstk -> physical image number in that stack
Field Semantics¶
os_stk%fromp, os_stk%top, and os_stk%nptcls¶
These fields describe the particle rows currently present in the project for a stack row.
They do not describe physical image positions in the backing stack file.
They do not inherently mean "active particles" or "state > 0 particles." They
count project particle rows assigned to that stack row. If a project still
contains rows with state = 0, those rows are part of the project and
therefore are included in fromp/top/nptcls.
If a pruning operation removes rows from the project, then removed rows are no
longer counted. In that case fromp/top/nptcls describe the post-prune project,
not the original project and not the physical stack file.
Required invariants when particle rows are present:
top - fromp + 1 == nptcls
stack ranges are contiguous in project-row space
stack ranges cover the particle rows in the project segment
particle rows in a stack range have the matching stkind
os_stk%nptcls_stk¶
nptcls_stk describes the number of physical images in the backing stack file.
It may be larger than nptcls after pruning or selection if the project
metadata points to the original stack file.
It should equal nptcls only when the backing stack file itself contains
exactly the particle rows present in the project, for example after writing a
new stack file from those rows.
os_ptcl2D/os_ptcl3D%stkind¶
stkind identifies the os_stk row associated with a particle. Because stack
rows are project metadata, stkind is a project-local index and may be remapped
when projects are merged or stack rows are renumbered.
os_ptcl2D/os_ptcl3D%indstk¶
indstk identifies the particle's physical image number inside the stack file
referenced by its stkind.
Modern project outputs should write positive indstk values for particle rows.
Missing or non-positive indstk values are legacy compatibility cases.
For old stream projects, even a positive indstk may be unreliable when the
stack row does not carry nptcls_stk.
When a legacy fallback is needed, or when a project-rewriting operation cannot
trust the input indstk, fromp/top define the project rows belonging to the
stack row, while the stack file itself is 1-based. Therefore the fallback
physical stack index is:
indstk = particle_project_row - fromp + 1
The fallback must never use the raw project particle row as the stack image
index unless fromp == 1. If state = 0 rows exist, they still occupy project
rows and therefore still occupy positions in this fallback mapping.
Required invariant:
1 <= indstk <= nptcls_stk
When nptcls_stk is absent in a legacy project and no physical stack inspection
has been done, the fallback physical stack count is:
nptcls_stk = top - fromp + 1
Selection State¶
Particle state is not part of the stack-index definition. It is a selection
or activity flag on a project particle row.
Use state > 0 when a workflow needs the number of active particles. Use
fromp/top/nptcls when a workflow needs the number of particle rows currently
present in the project for a stack.
Do not infer active-particle counts from nptcls unless the project operation
has explicitly removed all inactive rows.
Merge Policy¶
merge_projects concatenates project metadata. It must preserve physical stack
indices when they are credible and repair legacy old-stream indexing when they
are not.
When merging particle projects:
- Remap
stkind, because stack rows are renumbered in the merged project. - Remap
frompandtop, because project particle rows are concatenated. - Preserve
nptclsfor each stack row. - Treat input
indstkas a candidate, not as automatically correct. - If the source stack row has
nptcls_stk, preserve positiveindstkvalues that are within1..nptcls_stk. - If the source stack row lacks
nptcls_stk, treat the source as legacy and deriveindstkfromsource_particle_project_row - source_fromp + 1, even when anindstkvalue is present. - If
indstkis missing, non-positive, or out of range, derive it fromsource_particle_project_row - source_fromp + 1. - Preserve
nptcls_stkwhen present, because the physical stack files are not rewritten. - Preserve stack paths and row-level stack metadata, including CTF-model fields.
- Remap row-level
ogidvalues as project metadata. - Drop analysis products such as
cls2D,cls3D, andoutunless a future policy explicitly defines how to merge them.
The important merge invariant is:
project row indices may change
stack row indices may change
physical image indices must not change
The fallback formula is applied in the source project's project-row domain
before the output fromp/top remapping. This prevents the merged global
particle row from being used as a stack index.
Prune Policy¶
Pruning must distinguish between metadata-only pruning and materializing pruning.
Metadata-only prune¶
If pruning removes particle rows from the project but keeps the original stack files:
- Remove deleted rows from
os_ptcl2Dandos_ptcl3D. - Update
os_stk%fromp,os_stk%top, andos_stk%nptclsto describe the post-prune project. - Remap
stkindif stack rows are renumbered. - Treat input
indstkas a candidate, not as automatically correct. - If the source stack row has
nptcls_stk, preserve positiveindstkvalues that are within1..nptcls_stk, because the physical image positions in the original stack files have not changed. - If the source stack row lacks
nptcls_stk, treat the source as legacy and deriveindstkfromold_particle_project_row - old_fromp + 1, even when anindstkvalue is present. - If
indstkis missing, non-positive, or out of range, derive it from the legacy fallbackold_particle_project_row - old_fromp + 1. - Preserve
nptcls_stkwhen present, because the physical stack files have not changed. - Preserve stack paths.
This is the common case that protects downstream readers from using project row numbers as stack slice numbers.
Materializing prune¶
If pruning writes new physical stack files containing only the retained particles:
- Update stack paths to the new physical files.
- Rewrite
indstkto the physical image positions in the new stack files, normally1..nptcls. - Set
nptcls_stkto the physical image count in the new stack file. - Set
fromp/top/nptclsto the project row ranges. - The number of particle rows is the same as the physical image count:
nptcls = nptcls_stk
This is the only case where pruning should rewrite indstk based on the new
stack-file order.
Invalid prune state¶
The following state is invalid:
stack path still points to the original physical stack file
indstk has been recomputed from project-row order
That state corrupts the particle-to-image mapping and can make distributed workers request image indices that do not correspond to the intended particles.
Reader Policy¶
Any code that reads particle images from a stack file must use indstk as the
physical image index when the stack row has nptcls_stk and indstk is
present, positive, and within range.
If nptcls_stk is absent, or if indstk is missing, non-positive, or out of
range, readers may use the legacy fallback:
indstk = particle_project_row - fromp + 1
The reader must first resolve the particle's stkind, read fromp/top from
that stack row, and verify that the particle row lies inside that range. Readers
must not use the raw particle row index as the physical stack-file image index
except in the special case where fromp == 1.
Validation Policy¶
Project validation should check:
- Stack project ranges are contiguous and cover the particle rows.
top - fromp + 1 == nptcls.- Particle
stkindvalues identify valid stack rows. - New project-writing code should write
nptcls_stkon stack rows when particle rows are present. - When
nptcls_stkis missing in a legacy project, validation may inspect the physical stack file or usetop - fromp + 1as the fallback stack count. - New project-writing code should write positive particle
indstkvalues. - Missing, non-positive, or untrusted particle
indstkvalues may be accepted through theparticle_project_row - fromp + 1fallback. - Particle
indstkvalues are not larger thannptcls_stkwhennptcls_stkis present. - If
nptcls_stkis absent, validation must not assume existingindstkvalues are correct; merge/prune-style repair should derive them fromfromp/top.
Validation must not silently convert physical indstk values into project-row
indices.
The validate_projfile program applies this policy to an input project and
writes input_name_validated.simple. It reports warnings, errors, and repairs
encountered during validation, but writes a best-effort repaired project so that
legacy stream outputs can be normalized before downstream 2D or 3D analysis.
Test Policy¶
Tests for merge, prune, and stack readers should include pruned-style projects where:
nptcls <= nptcls_stk
fromp/top describe current project rows
indstk contains non-contiguous physical image indices
For example, a project stack row may have:
fromp = 1
top = 3
nptcls = 3
nptcls_stk = 6
particle indstk values = [1, 4, 6]
After merging this project with another, the second project's fromp/top and
stkind values may be remapped, but its indstk values must remain physical
indices into its original stack file unless new physical stack files are
written.
Tests should also include old-stream-style projects where nptcls_stk is
absent and indstk is missing, zero, or wrong. Merge and prune should repair
those rows with:
indstk = source_particle_project_row - source_fromp + 1