Abinitio2D Policy
1. Purpose and Scope
This document defines the current architectural policy for abinitio2D and the cluster2D workflow it drives.
It mirrors the recent refine3D cleanup where the same design pressure exists:
- keep stage policy separate from execution mechanics
- keep particle/class assignment work separate from class-average assembly
- preserve shared-memory and distributed parity
- make sampled-update and probabilistic handoffs explicit
- treat class-average files, assignment files, FRCs, and partial sums as workflow contracts rather than incidental scratch files
The 2D workflow is intentionally Cartesian. The old polar command-line branch
selector has been removed for abinitio2D and cluster2D.
2. Architectural Policy
abinitio2D is a staged 2D classification workflow:
- set run defaults and read project state
- determine stage geometry, low-pass limits, sampling policy, and cluster2D command lines
- initialize references when needed
- run staged cluster2D iterations
- optionally run probabilistic pre-alignment in later stages
- update particle class, in-plane, shift, sampled, and update-count state
- restore class averages through shared-memory or distributed class-average pathways
- run a final fill-in assignment pass for active particles that were never updated
- when sampled updates were active, run a terminal greedy all-particle pass
- generate final class averages, FRC metadata, and ranked outputs
The main policy boundary is:
- particle-domain work owns particle sampling, probabilistic assignment tables, search, class assignment, shift/in-plane updates, and partition-local outputs
- class-average assembly/restoration owns class-average sums, even/odd outputs, merged class averages, FRC/class documents, and project output metadata
3. Ownership Policy
simple_commanders_abinitio2D.f90 owns:
- the abinitio2D entry point
- top-level defaults
- run orchestration across stages
- initial reference handling
- final fill-in dispatch
- terminal greedy all-particle dispatch after sampled staged updates
- final class-average generation/ranking
This layer should stay thin enough that stage rules are readable elsewhere.
simple_abinitio2D_controller.f90 owns:
- stage counts and stage constants
- low-pass limit helpers
- stage-local cluster2D command construction
- search-mode policy by stage
- sampled-update policy, including NPTCLS2SAMPLE_2D and nsample override handling
- the rule that stage 1 may sample particles but does not fractionally restore previous class averages
simple_cluster2D_strategy.f90 owns:
- shared-memory versus distributed execution selection
- iteration control inside one cluster2D invocation
- scheduler interaction
- probabilistic pre-alignment dispatch
- distributed worker scheduling
- distributed class-average assembly dispatch
- convergence and run-finalization bookkeeping
simple_strategy2D_matcher.f90 owns:
- particle-domain alignment/search
- reproduction of the probabilistic sampled subset when prob_align2D is active
- strategy-object selection
- sigma updates during Euclidean search
- writing orientation updates
- writing distributed partial class-average sums when running as a worker
The 2D matcher must preserve a single particle-stack read per batch in the online alignment/restoration path. Batch construction should keep the already-read raw particle images for class-average restoration, and restoration should consume those in-memory images after assignment in the same batch.
Do not split online class-average restoration into a second full particle pass that re-reads image stacks to lower peak memory. Offline or terminal class-average assembly commands may have their own explicit reads, but that is separate from the matcher worker's online single-read contract.
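The single-read contract above can be sketched as follows. This is a minimal Python illustration, not the matcher's Fortran code: process_batch, CountingStack, and the scalar "images" are hypothetical stand-ins chosen so the read count is easy to check.

```python
class CountingStack:
    """Hypothetical particle stack that counts how often it is read."""

    def __init__(self, data):
        self.data = data
        self.reads = 0

    def read(self, ids):
        self.reads += 1
        return [self.data[i] for i in ids]


def process_batch(read_stack, particle_ids, assign, class_sums):
    """Align and restore one batch from a single stack read."""
    images = read_stack(particle_ids)            # the only stack read for this batch
    classes = [assign(img) for img in images]    # assignment on the in-memory images
    for img, cls in zip(images, classes):        # restoration reuses the same images
        class_sums[cls] = class_sums.get(cls, 0.0) + img
    return classes


stack = CountingStack([1.0, 2.0, 3.0, 4.0])
sums = {}
classes = process_batch(stack.read, [0, 1, 2, 3], lambda img: int(img) % 2, sums)
```

The point of the shape is that restoration takes the images as an argument and never receives the stack reader, so a second read cannot creep in unnoticed.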
simple_commanders_mkcavgs.f90 and the classaverager modules own:
- explicit class-average assembly from partial sums
- merged/even/odd class-average output
- class-document generation
- class FRC output and project output metadata
4. Sampling and Fractional-Update Policy
abinitio2D uses a fixed run-local target sample size:
- default: NPTCLS2SAMPLE_2D = 200000
- override: nsample=<integer>
The effective update fraction is:
update_frac_2D = min(1.0, real(min(nptcls_eff, nsample_target_2D)) / real(nptcls_eff))
where nptcls_eff is the number of active particles with state > 0.
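As a worked example of the formula above, here is a minimal Python sketch. The function name update_frac_2d and the guard against an empty active set are illustrative additions; the default of 200000 is the NPTCLS2SAMPLE_2D value given above.

```python
def update_frac_2d(nptcls_eff: int, nsample_target_2d: int = 200_000) -> float:
    """Fraction of active particles updated per iteration, capped at 1.0."""
    if nptcls_eff <= 0:
        raise ValueError("need at least one active particle (state > 0)")
    n_sampled = min(nptcls_eff, nsample_target_2d)
    return min(1.0, n_sampled / nptcls_eff)


frac_large = update_frac_2d(1_000_000)          # sample is smaller than the active set
frac_small = update_frac_2d(50_000)             # every active particle is updated
frac_override = update_frac_2d(400_000, 100_000)  # nsample=100000 override
```

With one million active particles and the default target, one fifth of the set is updated per iteration; with fewer active particles than the target, the fraction saturates at 1.0 and no sampling takes place.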
Stage policy:
- stage 1 uses a random sampled subset but disables fractional carry-over of previous class-average sums
- stages 2 and later use sampled update with fractional class-average restoration when the sample is smaller than the active set
- probabilistic stages preserve sample-once-and-reuse: prob_align2D chooses the subset, and prob_tab2D/cluster2D_exec reproduce that subset rather than resampling
- final fill-in is an assignment-only pass for active particles that still have updatecnt == 0
- if any staged update used update_frac, abinitio2D runs a terminal refine=greedy all-particle cluster2D pass with update_frac and fillin disabled, refreshing class, in-plane, and shift parameters before final class-average generation
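The stage rules above can be summarized in a small decision sketch. This is Python for illustration only; StageSampling, stage_sampling, and needs_terminal_greedy are hypothetical names, and the controller's actual tables may carry more state than these two flags.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class StageSampling:
    sampled: bool             # only a random subset is updated in this stage
    fractional_restore: bool  # previous class-average sums are carried over


def stage_sampling(stage: int, nptcls_eff: int, nsample_target: int) -> StageSampling:
    """Apply the stage policy: sample when the target is smaller than the active set."""
    sampled = nsample_target < nptcls_eff
    # Stage 1 may sample, but never carries over previous class-average sums.
    fractional_restore = sampled and stage >= 2
    return StageSampling(sampled, fractional_restore)


def needs_terminal_greedy(stages: Iterable[StageSampling]) -> bool:
    """A terminal greedy all-particle pass is due if any stage was sampled."""
    return any(s.sampled for s in stages)


s1 = stage_sampling(1, 500_000, 200_000)
s2 = stage_sampling(2, 500_000, 200_000)
s3 = stage_sampling(3, 100_000, 200_000)
```

Keeping the two flags separate reflects the stage 1 rule: sampling and fractional carry-over are independent decisions, even though later stages usually enable them together.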
The desired restoration model is class-local: each class average should carry forward previous sums according to the realized sampled fraction for that class. The current implementation has moved toward this policy; changes in this area should preserve class-local semantics where available and avoid reintroducing a single ambiguous global owner for sampled-update state.
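To make "class-local" concrete, the sketch below blends each class's previous sum with its newly accumulated sum using that class's own realized fraction. The weighting rule (previous sums scaled by one minus the realized fraction) is an assumption made for illustration and is not taken from the classaverager modules; restore_class_sum and its arguments are hypothetical.

```python
def restore_class_sum(prev_sum, new_sum, n_sampled, n_class):
    """Blend one class's previous sum with its freshly accumulated sum.

    Assumed rule: keep (1 - f_c) of the previous sum, where f_c is the
    fraction of this class's particles that were sampled this iteration.
    """
    if n_class <= 0:
        return list(new_sum)
    frac = min(1.0, n_sampled / n_class)  # realized sampled fraction for this class
    keep = 1.0 - frac
    return [keep * p + s for p, s in zip(prev_sum, new_sum)]


# Two classes sampled at different rates keep different shares of their history.
lightly_sampled = restore_class_sum([10.0, 20.0], [1.0, 2.0], n_sampled=25, n_class=100)
fully_sampled = restore_class_sum([10.0], [4.0], n_sampled=100, n_class=100)
```

A single global fraction would apply the same weight to both classes here, which is the ambiguity the policy asks refactors to avoid.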
5. Iteration Semantics
For cluster2D:
- startit is the stage/invocation start
- which_iter is the current iteration
- extr_iter tracks the 2D extrapolation/search schedule
- endit is written after an invocation finishes and is consumed by the next stage setup
Do not collapse these counters into one another. Child command lines, including probabilistic pre-alignment and fill-in, must preserve the distinction between stage start and current iteration.
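A minimal Python sketch of keeping the four counters distinct is shown below. IterState and run_invocation are hypothetical, and two details are assumptions rather than documented behavior: extr_iter advancing by one per iteration, and the next stage starting at endit + 1.

```python
from dataclasses import dataclass


@dataclass
class IterState:
    startit: int     # first iteration of this stage/invocation; never moves
    which_iter: int  # iteration currently being executed
    extr_iter: int   # position in the extrapolation/search schedule
    endit: int = 0   # written once the invocation has finished


def run_invocation(startit: int, niters: int, extr_iter: int) -> IterState:
    """Run one invocation and record where it ended."""
    state = IterState(startit=startit, which_iter=startit, extr_iter=extr_iter)
    for i in range(niters):
        state.which_iter = startit + i    # advances while startit stays fixed
        state.extr_iter = extr_iter + i   # assumed: one schedule step per iteration
    state.endit = state.which_iter
    return state


first = run_invocation(startit=1, niters=5, extr_iter=1)
# Assumed handoff: the next stage setup consumes endit to choose its start.
second = run_invocation(startit=first.endit + 1, niters=3, extr_iter=first.extr_iter + 1)
```

A child command line built mid-invocation would pass both startit and which_iter from such a state, instead of substituting one for the other.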
6. Artifact and Handoff Policy
Stable 2D workflow artifacts include:
- assignment_part*.dat and assignment.dat
- dist_part*.dat and dist.dat
- cavgs_even_part*.mrc, cavgs_odd_part*.mrc
- ctfsqsums_even_part*.mrc, ctfsqsums_odd_part*.mrc
- cavgs_iterNNN.mrc, cavgs_iterNNN_even.mrc, cavgs_iterNNN_odd.mrc
- FRCS_FILE
- sigma2 iteration files
- ptcl2D, cls2D, cls3D, and out project segments
Partition-local probabilistic assignment/dist files are per-iteration artifacts and should be removed before the next distributed iteration writes new ones. Class-average partial sums are different when fractional restoration is active: they are the carry-over input for the next iteration and must be preserved until the worker has read and updated them.
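The cleanup rule above can be sketched as a filter over file names. This is an illustrative Python sketch: stale_files is a hypothetical helper, and treating partial sums as removable when fractional restoration is inactive is an assumption, since the policy only states that they must be preserved when it is active.

```python
from fnmatch import fnmatchcase

# Per-iteration probabilistic handoffs: always stale before the next iteration.
PER_ITERATION = ("assignment_part*.dat", "dist_part*.dat")
# Partial sums: carry-over input when fractional restoration is active.
PARTIAL_SUMS = (
    "cavgs_even_part*.mrc",
    "cavgs_odd_part*.mrc",
    "ctfsqsums_even_part*.mrc",
    "ctfsqsums_odd_part*.mrc",
)


def stale_files(names, fractional_restore: bool):
    """Return the file names to remove before the next distributed iteration."""
    # Assumption: partial sums are only removable when nothing carries over.
    patterns = PER_ITERATION if fractional_restore else PER_ITERATION + PARTIAL_SUMS
    return sorted(n for n in names if any(fnmatchcase(n, p) for p in patterns))


names = [
    "assignment_part1.dat",
    "dist_part2.dat",
    "cavgs_even_part1.mrc",
    "assignment.dat",
    "cavgs_iter003.mrc",
]
to_remove_fractional = stale_files(names, fractional_restore=True)
to_remove_full = stale_files(names, fractional_restore=False)
```

Merged outputs such as assignment.dat and cavgs_iterNNN.mrc match neither pattern set, so the filter never selects them.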
7. Review Checklist
For any abinitio2D or cluster2D change, check:
- Does the command layer remain mostly orchestration?
- Is stage policy in the controller rather than scattered through matcher or strategy code?
- Does probabilistic 2D sample once and then reproduce the same subset?
- Do shared-memory and distributed paths use the same scientific workflow?
- Are class-average assembly/restoration responsibilities explicit?
- Does online class-average restoration reuse the matcher batch images instead of introducing a second particle-stack read?
- Are stale distributed handoffs removed without deleting fractional class-average carry-over inputs?
- Are startit, which_iter, extr_iter, and endit semantics preserved?
- Does fill-in remain assignment-only unless the policy is explicitly changed?
- When staged updates are sampled, does terminal greedy refresh all active particles before final class-average generation?
- Does the change preserve Cartesian-only abinitio2D?
8. Rules to Preserve During Refactors
- Do not reintroduce a polar branch selector into abinitio2D.
- Do not bury stage-policy tables in the matcher.
- Do not let probabilistic pre-alignment and matcher update sample different particle subsets.
- Do not make distributed-only class-average assembly semantics diverge from shared-memory scientific behavior.
- Do not treat final fill-in as a normal class-average restoration stage.
- Do not use final fill-in as a substitute for the terminal greedy all-particle refresh when sampled abinitio2D updates were active.
- Do not reuse stale assignment files as valid current-iteration inputs.
- Do not re-read particle stacks in the online matcher/restoration path when the raw batch images are already available.
- Do not delete class-average partial sums at the start of a fractional-update iteration; workers need them as previous-sum carry-over.