Importance Sampling and Fractional Update Policy

This document records durable workflow contracts for sampled particle updates, probabilistic candidate sampling, fractional class-average restoration, and trailing reconstruction in abinitio2D, cluster2D, abinitio3D, and refine3D. It is policy, not a line-by-line implementation map.

1. Core Model

SIMPLE has two sampling layers that must remain separate:

outer fractional-update sampling chooses which particles participate in the current iteration
inner importance sampling chooses which reference, orientation, or in-plane candidates are explored for those participating particles

The outer subset is recorded in the project through sampled and updatecnt. Downstream restoration and reconstruction consume that recorded state. They must not infer participation from the nominal command-line update_frac alone.

Probabilistic pre-alignment is a sample-once-and-reuse path: the pre-alignment commander chooses the outer subset, probability-table workers reuse it, and the matcher reuses it again for the hard particle update.

2. Ownership

simple_commanders_abinitio2D.f90 owns abinitio2D orchestration: defaults, stage execution, final fill-in, and final class-average generation.

simple_abinitio2D_controller.f90 owns the 2D stage policy: NSAMPLE_DEFAULT_2D, nsample override handling, stage-local update_frac, search-mode transitions, and the rule that stage 1 may sample particles without fractionally restoring previous class averages.

simple_commanders_abinitio.f90 and simple_abinitio_controller.f90 own 3D stage scheduling: dynamic update_frac, fillin, frac_best, balance, trail_rec, and transitions between early prob_neigh modes, prob, and late prob_neigh.

simple_matcher_smpl_and_lplims.f90 owns the shared outer subset-selection helpers for 2D and 3D. This is where full update, random sampling, update-count-biased sampling, class-balanced sampling, fill-in sampling, and subset reproduction are dispatched.

simple_oris_sampling.f90 and simple_oris_getters.f90 own the bookkeeping: sampled, updatecnt, exact subset reproduction, global realized update fraction, and class-local realized update fractions.

simple_commanders_prob.f90 owns probabilistic pre-alignment orchestration: sampling the outer subset once, writing it to the project, running table generation, aggregating probability-table outputs, and writing the assignment artifact.

The simple_eul_prob_tab*.f90 modules own inner candidate importance sampling. They may sample references, orientations, neighbors, or in-plane candidates inside the active particle subset, but they must not choose a new particle subset.

simple_strategy2D_matcher.f90 and simple_strategy3D_matcher.f90 own particle-domain search on the active subset, assignment consumption, pose or class updates, sigma updates during search, and writing partition-local reconstruction or class-average inputs.

The classaverager modules own 2D class-average restoration and assembly. commander_volassemble owns 3D volume assembly and trailing reconstruction. These layers consume sampled-update state; they do not own particle selection.

3. Bookkeeping Contracts

sampled marks the current sampling round. All particles with the latest sampled value belong to the current active subset.

updatecnt tracks cumulative update history. Count-biased and fill-in paths use it to prefer under-updated or never-updated active particles.
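A minimal Python sketch of this bookkeeping, assuming a simplified per-particle record. `Ptcl`, `sample_count_biased`, the 1 / (1 + updatecnt) weighting, and incrementing updatecnt at selection time are illustrative assumptions, not SIMPLE's Fortran API or its exact bias. It shows one sampling round writing a shared `sampled` value while `updatecnt` accumulates across rounds.

```python
import random
from dataclasses import dataclass


@dataclass
class Ptcl:
    state: int          # state > 0 marks an active particle
    sampled: int = 0    # sampling round in which the particle was last selected
    updatecnt: int = 0  # cumulative number of updates


def sample_count_biased(ptcls, nsample, rng):
    """Draw nsample active particles, favoring low updatecnt, and record the round."""
    active = [i for i, p in enumerate(ptcls) if p.state > 0]
    # Weighted sampling without replacement (Efraimidis-Spirakis): with weight
    # w = 1 / (1 + updatecnt), the key u ** (1 / w) becomes u ** (1 + updatecnt),
    # so never-updated particles tend to draw the largest keys.
    keys = {i: rng.random() ** (1 + ptcls[i].updatecnt) for i in active}
    chosen = sorted(active, key=keys.__getitem__, reverse=True)[:nsample]
    # Record the new round: every chosen particle gets the same, latest sampled value.
    new_round = max((p.sampled for p in ptcls), default=0) + 1
    for i in chosen:
        ptcls[i].sampled = new_round
        ptcls[i].updatecnt += 1  # assumption: counted at selection time
    return sorted(chosen)
```

Because the round marker is a single increasing integer, the current subset can always be recovered later from project state alone.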

sample4update_reprod is the only correct way to reuse a previously selected probabilistic subset. A probability-table worker or downstream matcher must not silently resample when a probabilistic pre-step has already sampled the subset.
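The reuse rule can be sketched in Python as follows. `reproduce_subset` and `subset_for_consumer` are hypothetical stand-ins for the role sample4update_reprod plays, assuming the subset is identified purely by the latest `sampled` value among active particles.

```python
def reproduce_subset(sampled, states):
    """Return the recorded subset: active particles carrying the latest sampled value."""
    latest = max(sampled, default=0)
    if latest == 0:
        raise ValueError("no sampling round has been recorded yet")
    return [i for i, (s, st) in enumerate(zip(sampled, states)) if st > 0 and s == latest]


def subset_for_consumer(sampled, states, prob_presampled, resample):
    """Pick the particle subset for a table worker or matcher."""
    if prob_presampled:
        # A probabilistic pre-step already chose the subset: reproduce it, never resample.
        return reproduce_subset(sampled, states)
    return resample()
```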

get_state_update_fracs returns state-local realized update fractions for 3D trailing, computed from the current sampled round, the state labels, and the particles with updatecnt > 0. get_update_frac remains available for callers that need the legacy global summary.

get_class_update_fracs returns per-class realized update fractions for 2D class-average carry-over. It uses active particles, current class assignments, the latest sampled round, and updatecnt > 0.
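A Python sketch of a per-class realized fraction, assuming the denominator is the number of active particles assigned to the class. That denominator, the 1-based class labels, and the name `class_update_fracs` are assumptions for illustration; the exact formula in get_class_update_fracs may differ.

```python
def class_update_fracs(classes, states, sampled, updatecnt, ncls):
    """Realized update fraction for classes 1..ncls (returned as a 0-based list)."""
    latest = max(sampled, default=0)
    pop = [0] * (ncls + 1)  # active particles per class
    upd = [0] * (ncls + 1)  # of those, updated in the latest round
    for c, st, s, u in zip(classes, states, sampled, updatecnt):
        if st <= 0 or not 1 <= c <= ncls:
            continue
        pop[c] += 1
        if latest > 0 and s == latest and u > 0:
            upd[c] += 1
    # Classes without active particles get a zero realized fraction.
    return [upd[c] / pop[c] if pop[c] else 0.0 for c in range(1, ncls + 1)]
```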

The nominal update_frac is only the target that sampling aims for. The realized fraction computed in simple_oris is the value that downstream restoration and trailing consume.

4. Abinitio2D and Cluster2D

abinitio2D uses a fixed run-local target sample size:

default: NSAMPLE_DEFAULT_2D = 200000
override: nsample=<integer>

The stage controller converts that target into:

update_frac_2D = min(1.0, real(min(nptcls_eff, nsample_target_2D)) / real(nptcls_eff))

where nptcls_eff is the number of active particles with state > 0. If the target covers almost all active particles, the stage command omits update_frac and the stage runs as a full update.
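The conversion can be sketched in Python as below. The document does not state what "almost all" means, so `full_update_threshold` (default 0.99) is a hypothetical parameter; returning `None` stands for "omit update_frac".

```python
NSAMPLE_DEFAULT_2D = 200000


def update_frac_2d(nptcls_eff, nsample=None, full_update_threshold=0.99):
    """Stage-local update fraction, or None when the stage should be a full update."""
    if nptcls_eff <= 0:
        raise ValueError("no active particles")
    target = NSAMPLE_DEFAULT_2D if nsample is None else nsample
    if target <= 0:
        raise ValueError("nsample must be positive")
    frac = min(1.0, min(nptcls_eff, target) / nptcls_eff)
    # Hypothetical cutoff: near-complete coverage is treated as a full update.
    return None if frac >= full_update_threshold else frac
```

For example, one million active particles with the default target give 0.2, while 150000 particles are covered entirely and the stage omits update_frac.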

Current stage policy:

stage 1 uses the sampled-update machinery when needed, but fractional class-average carry-over is disabled
while startit == 1, sample_ptcls4update2D keeps the initial subset sticky by reproducing it after the first random draw
later non-probabilistic iterations use sample4update_cnt, which is stochastic but biased toward particles with lower updatecnt
probabilistic stages use prob_align2D to sample once, then prob_tab2D and cluster2D_exec reproduce the same subset
staged fillin=yes currently acts as a full-assignment coverage guard. It requires active particles to have assignments before convergence, while particle selection still follows the normal sampled-update path
staged abinitio2D refinement uses sampled SNHC (refine=snhc_smpl) for stages 1-2. From stage 3 onward, refine=prob uses dense probabilistic assignment; refine=prob_snhc uses sparse probabilistic SNHC until the final staged invocation, which uses dense refine=prob
when staged updates were sampled, abinitio2D then runs a separate terminal dense probabilistic all-particle pass with update_frac and fillin disabled, refreshing class, in-plane, and shift parameters before final class-average generation
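The refinement schedule in this list can be sketched in Python as a small lookup. `refine_mode_2d`, `terminal_pass`, and the dictionary keys are hypothetical; only refine=prob and refine=prob_snhc are covered because those are the modes the policy describes.

```python
def refine_mode_2d(stage, nstages, refine="prob"):
    """Refinement mode for one staged abinitio2D invocation (1-based stage index)."""
    if not 1 <= stage <= nstages:
        raise ValueError("stage out of range")
    if refine not in ("prob", "prob_snhc"):
        raise ValueError("sketch only covers refine=prob and refine=prob_snhc")
    if stage <= 2:
        return "snhc_smpl"   # sampled SNHC for stages 1-2
    if refine == "prob_snhc" and stage < nstages:
        return "prob_snhc"   # sparse probabilistic SNHC before the final stage
    return "prob"            # dense probabilistic assignment


def terminal_pass(staged_updates_sampled):
    """Extra dense all-particle refresh, only when staged updates were sampled."""
    if not staged_updates_sampled:
        return None
    return {"refine": "prob", "update_frac": None, "fillin": "no"}
```

With a hypothetical six-stage run and refine=prob_snhc, stages 1 to 6 map to snhc_smpl, snhc_smpl, prob_snhc, prob_snhc, prob_snhc, prob.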

Fractional 2D restoration is class-local. cavger_init_online reads or centers previous partial sums when fractional update is active, obtains per-class realized fractions through get_class_update_fracs, and weights previous even/odd class sums and CTF-squared sums independently for each class. This is the 2D analogue of respecting independently updated objects in 3D.
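A Python sketch of class-local carry-over, assuming each class's sums are plain pixel lists and `rho` holds the per-class realized fractions. `SUM_KEYS`, `restore_class_sums`, and the `carry_over` switch (standing in for the stage 1 rule) are illustrative names, not the classaverager interface.

```python
SUM_KEYS = ("even", "odd", "ctf2_even", "ctf2_odd")


def restore_class_sums(prev, new, rho, carry_over=True):
    """Blend previous and new per-class partial sums with class-local weights.

    prev, new: one dict per class mapping each SUM_KEYS entry to a list of pixels.
    rho: realized update fraction for each class, in [0, 1].
    """
    if not carry_over:
        # e.g. abinitio2D stage 1: sampled, but no fractional carry-over
        return [{k: list(n[k]) for k in SUM_KEYS} for n in new]
    out = []
    for p, n, r in zip(prev, new, rho):
        if not 0.0 <= r <= 1.0:
            raise ValueError("realized update fraction must lie in [0, 1]")
        w = 1.0 - r  # weight of the carried-over previous sums for this class
        out.append({k: [w * a + b for a, b in zip(p[k], n[k])] for k in SUM_KEYS})
    return out
```

Each class, and each of its even/odd and CTF-squared sums, gets its own weight, so a rarely sampled class keeps most of its history while a fully sampled class is replaced.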

Distributed cleanup must preserve class-average partial sums while fractional restoration still needs them as carry-over input. Assignment and distance artifacts are per-iteration handoffs and may be removed before the next iteration writes replacements.

5. Abinitio3D and Refine3D

The 3D controller derives the abinitio3D outer update policy from nsample. The resulting update fraction is capped by UPDATE_FRAC_MAX.

Current high-level ab initio stage policy:

stage 1 uses prob_neigh with prob_neigh_mode=snhc
stage 2 uses prob_neigh with prob_neigh_mode=shc
stages 3-5 use prob
final neighborhood stages use prob_neigh
stages 1 and 2 use the same nspace=1000
stage 1 keeps its low-pass limit but reuses stage 2 box_crop and smpd_crop
final active stages may switch to fillin, except where the multi-state policy disables it

For abinitio3D multivol_mode=independent, the default policy is an inspection-first multi-state run: nstages=5 and lpstop=6.0 A unless the user overrides them. The run therefore stops after the prob phase, before prob_neigh, staged NU filtering, independent-mode trailing reconstruction, and staged automasking. It still runs the final reconstruction step at the configured last stage, so it writes inspectable final state volumes. To increase the chance that all active particles receive assignments before that exit, independent mode starts stochastic balanced sampling at stage 4: from stage 4 onward, the child refine3D stages use greedy_sampling=no with frac_best=1.0. This keeps the class-balanced quotas but draws from the whole class rather than from a top-ranked fraction. The outer particle target remains the fixed nsample-derived update fraction at every stage.

abinitio3D multivol_mode=docked has an explicit split/update epoch policy. Stages before the split run as a single state. The default split stage is 6, so the split occurs after stage 5. Requests to stop a docked run early, before the split, are rejected. At the split, the commander restores the requested state count, recomputes the post-split nsample-derived update target, clears ptcl3D%sampled and ptcl3D%updatecnt, randomizes active particles into balanced uniform state labels, validates that every split state is populated enough for probabilistic multi-state tables, and reconstructs the split state volumes. Pre-split stages continue to use the single-state update target.
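The label and counter reset at the split can be sketched in Python as follows. `docked_split` and the `min_pop` threshold are hypothetical: the document only says each state must be "populated enough", and round-robin over a shuffled list is one simple way to get balanced uniform labels.

```python
import random


def docked_split(states, nstates, min_pop, rng):
    """Start a new multi-state epoch: balanced random states, cleared counters."""
    active = [i for i, s in enumerate(states) if s > 0]
    rng.shuffle(active)
    labels = list(states)
    for rank, i in enumerate(active):
        labels[i] = rank % nstates + 1   # round-robin gives balanced uniform labels
    pops = [0] * nstates
    for i in active:
        pops[labels[i] - 1] += 1
    if min(pops) < min_pop:
        raise ValueError(f"split state populations {pops} are below {min_pop}")
    n = len(states)
    sampled, updatecnt = [0] * n, [0] * n  # post-split history starts from zero
    return labels, sampled, updatecnt
```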

The first post-split stage uses refine=prob_state, removes update_frac, nsample, and fillin, and therefore processes all active particles without fractional particle sampling. It also keeps trailing reconstruction off, preventing pre-split mixed-volume memory from being blended into the split-stage volumes. Later post-split stages restore the fixed nsample-derived fractional particle target and trailing reconstruction inside the new multi-state epoch.

Because the split clears the counters, updatecnt after the split is post-split multi-state update history, not single-state history. Final docked reconstruction requires every active particle to have a post-split update.

sample_ptcls4update3D applies the normal 3D subset policy:

if fractional update is off, select all active particles
if balance=yes, use class-balanced sampling. If sampling was set up with partition=yes, class balancing is based on the clustering of the underlying classes as materialized by cluster_cavgs
otherwise use update-count-biased sampling
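The three branches can be sketched in Python as a dispatcher. Everything here is a simplified assumption: the equal per-group quota, the 1 / (1 + updatecnt) weight, and the helper names are hypothetical. `labels` is whatever grouping drives balancing, for example class labels or, with partition=yes, the cluster labels produced by cluster_cavgs.

```python
import random


def _count_biased(active, updatecnt, nsample, rng):
    # Weighted draw without replacement; weight 1 / (1 + updatecnt) (assumed form).
    keys = {i: rng.random() ** (1 + updatecnt[i]) for i in active}
    return sorted(active, key=keys.__getitem__, reverse=True)[:nsample]


def _class_balanced(active, labels, nsample, rng):
    # Equal quota per label (assumed), capped by each group's population.
    groups = {}
    for i in active:
        groups.setdefault(labels[i], []).append(i)
    quota = -(-nsample // len(groups))  # ceiling division
    chosen = []
    for members in groups.values():
        chosen.extend(rng.sample(members, min(quota, len(members))))
    return chosen


def sample_ptcls_3d(states, labels, updatecnt, update_frac, balance, rng):
    """Outer 3D subset: full update, class-balanced, or update-count-biased."""
    active = [i for i, s in enumerate(states) if s > 0]
    if not active or update_frac is None or update_frac >= 1.0:
        return active  # fractional update off: every active particle
    nsample = max(1, round(update_frac * len(active)))
    if balance:
        return sorted(_class_balanced(active, labels, nsample, rng))
    return sorted(_count_biased(active, updatecnt, nsample, rng))
```

The ceiling quota can overshoot the nominal target slightly, which is one reason downstream code reads the realized fraction rather than update_frac.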

sample_ptcls4fillin is a separate late-stage coverage policy. Its purpose is to update particles with insufficient history, not to preserve the normal balanced or count-biased exploration distribution.

The 3D matcher writes partial reconstructions from the active subset. Volume assembly then restores volumes, calculates FSCs, postprocesses references, and applies trailing reconstruction when requested. Trailing uses an explicit ufrac_trec only when the parsed params%l_ufrac_trec_defined flag is true and the run is single-state; otherwise it consumes realized per-state fractions from get_state_update_fracs. The numeric params%ufrac_trec field has a default value and must not be interpreted as an active override by itself. Multi-state convergence reporting records those effective state-local fractions as TRAIL_REC_UPDATE_FRAC_STATE01, TRAIL_REC_UPDATE_FRAC_STATE02, and so on.
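The fraction-selection rule can be sketched in Python. The function names are hypothetical, and omitting per-state report keys for a single-state run is an assumption, since the document only describes multi-state reporting.

```python
def trailing_fracs(nstates, l_ufrac_trec_defined, ufrac_trec, realized_state_fracs):
    """Per-state fractions used by trailing reconstruction."""
    if l_ufrac_trec_defined and nstates == 1:
        return [ufrac_trec]  # explicit, parsed override (single-state only)
    if len(realized_state_fracs) != nstates:
        raise ValueError("need one realized fraction per state")
    # The numeric ufrac_trec default is ignored unless the parsed flag is set.
    return list(realized_state_fracs)


def trail_rec_report(fracs):
    """Multi-state convergence keys such as TRAIL_REC_UPDATE_FRAC_STATE01."""
    if len(fracs) < 2:
        return {}  # assumption: no per-state keys for one state
    return {f"TRAIL_REC_UPDATE_FRAC_STATE{s:02d}": f for s, f in enumerate(fracs, start=1)}
```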

6. Probabilistic Pre-Alignment

Probabilistic pre-alignment is not a second outer sampler.

The workflow is:

choose the outer subset through the normal 2D or 3D sampling helper
write the sampled project state
run probability-table generation only for that subset
aggregate table outputs into one assignment artifact
reproduce the same subset in the matcher
perform the hard particle update
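The six steps can be sketched end to end in Python. A uniform draw stands in for the real 2D or 3D sampling helper, and the table rows are placeholders for candidate-level importance sampling; all names are hypothetical. The point is that both consumers reproduce the subset from project state instead of drawing again.

```python
import random


def latest_subset(sampled, states):
    """Reproduce the recorded subset from project state (latest sampled value)."""
    latest = max(sampled)
    return [i for i, (s, st) in enumerate(zip(sampled, states)) if st > 0 and s == latest]


def prob_prealign_iteration(sampled, states, nsample, rng):
    # 1. choose the outer subset once (uniform draw stands in for the real helper)
    active = [i for i, st in enumerate(states) if st > 0]
    chosen = rng.sample(active, min(nsample, len(active)))
    if not chosen:
        raise ValueError("empty outer subset")
    # 2. write the sampled project state
    new_round = max(sampled) + 1
    for i in chosen:
        sampled[i] = new_round
    # 3. table generation runs only for the reproduced subset
    table_rows = {i: ("candidates-for", i) for i in latest_subset(sampled, states)}
    # 4. aggregate table outputs into one assignment artifact
    assignment = dict(sorted(table_rows.items()))
    # 5. the matcher reproduces the same subset instead of drawing a new one
    matcher_subset = latest_subset(sampled, states)
    if matcher_subset != list(assignment):
        raise RuntimeError("matcher subset differs from the pre-aligned subset")
    # 6. the hard particle update would consume `assignment` here
    return matcher_subset, assignment
```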

simple_eul_prob_tab.f90, simple_eul_prob_tab_neigh.f90, and simple_eul_prob_tab2D.f90 perform candidate-level importance sampling inside that subset. They may use score-derived candidate distributions, angle_sampling, greedy_sampling, or neighborhood sampling, but the selected particle set is already fixed before they run.

7. Restoration and Assembly

2D class-average restoration consumes the class-local realized update fractions rho(class) returned by get_class_update_fracs:

previous class contribution: 1 - rho(class)
current class contribution: the new partial sums for that class

Classes with no active updated particles have a zero realized fraction, so their previous sums are carried over with full weight. Classes with full sampled participation (rho = 1) replace their previous sums.

3D volume assembly consumes realized state-local update fractions for trailing:

previous state-volume contribution: 1 - update_frac_trail_rec(state)
current state-volume contribution: update_frac_trail_rec(state)
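A Python sketch of this per-state blend, with volumes flattened to voxel lists for illustration; `trail_blend` is a hypothetical name. Unlike the 2D rule, which adds the new partial sums as they are, the current state volume here is itself weighted by the state's fraction.

```python
def trail_blend(prev_vols, cur_vols, fracs):
    """prev/cur: one flat list of voxel values per state; fracs: per-state fraction."""
    if not len(prev_vols) == len(cur_vols) == len(fracs):
        raise ValueError("need one previous volume, current volume and fraction per state")
    out = []
    for prev, cur, f in zip(prev_vols, cur_vols, fracs):
        # (1 - f) * previous state volume + f * current state volume, voxel by voxel
        out.append([(1.0 - f) * p + f * c for p, c in zip(prev, cur)])
    return out
```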

Neither class-average restoration nor volume assembly should make new particle sampling decisions. If a restoration or assembly change requires a different subset policy, that policy belongs in the commander/controller/sampling-helper layer and must be reflected in sampled and updatecnt.

Online matcher restoration/reconstruction paths must read active particle images from disk once per batch and reuse those batch images for both matching and restoration/reconstruction. Do not introduce a trailing full reconstruction or class-average restoration pass that re-reads image stacks as a memory optimization unless the single-read performance contract is explicitly changed. Probabilistic table-generation programs and explicit offline assembly commands are separate workflow stages and may perform their own reads.
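The single-read contract can be illustrated with a Python sketch in which a counting stand-in replaces the on-disk stack. `CountingStack`, `run_online_batches`, and the `match`/`restore` callbacks are hypothetical; the sketch only shows that search and restoration share the same in-memory batch.

```python
class CountingStack:
    """Stand-in for an on-disk particle stack that counts image reads."""

    def __init__(self, images):
        self.images = images
        self.reads = 0

    def read(self, idx):
        self.reads += 1
        return self.images[idx]


def run_online_batches(stack, subset, batch_size, match, restore):
    for start in range(0, len(subset), batch_size):
        batch = subset[start:start + batch_size]
        imgs = [stack.read(i) for i in batch]  # the only read for this batch
        params = match(batch, imgs)            # search uses the batch images
        restore(batch, imgs, params)           # restoration reuses the same images
```

A correct online path therefore performs exactly one read per particle in the active subset, regardless of batch size.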

8. Invariants

Outer particle sampling happens before probabilistic table generation.
Probabilistic table workers and downstream matchers reproduce the same subset.
Candidate importance sampling never changes the particle subset.
sampled remains the current-round marker.
updatecnt remains cumulative update history.
Downstream restoration uses realized update state, not only nominal update_frac.
Stage 1 of abinitio2D may be sampled but must not fractionally carry over previous class-average sums.
The abinitio3D docked split starts a new multi-state sampled/updatecnt epoch.
Independent multi-state abinitio3D defaults to a five-stage, lpstop=6.0 A inspection run, starts stochastic balanced sampling at stage 4, and still writes final reconstruction outputs.
Docked split-stage refinement must not use trailing volume averaging; later post-split stages restore trailing inside the new multi-state epoch.
2D fractional class-average restoration remains class-local.
Staged abinitio2D fillin=yes remains a full-assignment coverage guard unless the implementation is deliberately changed to missing-only assignment.
Sampled abinitio2D runs a terminal dense probabilistic all-particle refresh before final class-average generation.
volassemble and the classaverager remain consumers of sampled-update state, not producers of particle-selection policy.
Online matcher restoration/reconstruction reuses the particle images already read for the current batch.

9. Review Checklist

For sampling, probabilistic alignment, class-average restoration, or volume assembly changes, check:

Does the outer subset get selected exactly once for a probabilistic pre-alignment iteration?
Do table workers and matchers reuse the recorded subset through sample4update_reprod?
Is candidate-level importance sampling kept separate from particle-level subset selection?
Are sampled and updatecnt updated consistently before downstream restoration or trailing consumes them?
Does 2D restoration use class-local realized fractions?
Does 3D trailing consume the realized or explicit trailing fraction?
Does the online matcher path preserve one image-stack read per particle batch?
Are shared-memory and distributed paths preserving the same scientific workflow and artifact contracts?