Microchunk Rejection and Cleanup Policy
Scope
This document describes the current class-average rejection and particle cleanup policy in simple_microchunked2D.
The live stream path is implemented by:
- src/main/stream/simple_microchunked2D.f90
- src/main/stream/simple_cluster2D_rejector.f90
model_cavgs_rejection and the shared src/main/cavg_quality backend are documented separately. They are not called by the current simple_microchunked2D rejection path in this tree.
Owners
collect_and_reject is the workflow owner. It detects finished jobs, calls rejection exactly once per eligible chunk, finalizes match chunks, updates runtime accepted/rejected counters, and marks completed lifecycle stages with sentinel files.
reject_cavgs owns per-chunk rejection and cleanup. It reads class averages, applies the rejection engine, writes selected/rejected class-average stacks, propagates class rejection to particle states, records selected-particle counts, and writes REJECTION_FINISHED.
cluster2D_rejector owns the scalar rejection criteria. It maintains the cumulative per-class rejection mask and exposes final rejected flags/states.
Generation routines mark rejection-complete chunks as consumed only after the project file for the next tier has been written successfully.
Public Policy
Class-average rejection runs after a chunk has finished ab initio 2D classification.
Policy invariants:
- state=1 means kept; state=0 means rejected.
- rejection criteria are cumulative within one pass.
- a rejected class maps to deselected particles.
- a deselected particle has os_ptcl2D%class=0 and os_ptcl2D%class_match=0.
- a chunk is not consumed by the next tier until rejection is complete.
- lifecycle flags only advance: abinitio2D_complete, rejection_complete, complete, or failed.
Lifecycle Sentinels
Sentinels are authoritative for restart:
- ABINITIO2D_FINISHED: chunk job completed.
- REJECTION_FINISHED: class-average rejection and particle cleanup completed.
- REJECTION_FAILED: rejection could not be performed safely.
- COMPLETE: chunk has been consumed or finalized and should not be reprocessed.
On import, simple_microchunked2D reconstructs chunk flags from these files:
- failed = REJECTION_FAILED exists
- rejection_complete = REJECTION_FINISHED exists or failed
- complete = COMPLETE exists or failed
Chunks that are neither complete nor failed have their command lines regenerated for restart.
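The reconstruction rules above can be sketched in a few lines. This is an illustrative Python sketch, not the Fortran implementation: the ChunkFlags container, the function names, and the assumption that the sentinels sit directly inside one chunk directory are all hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ChunkFlags:
    # Hypothetical container; the real chunk type lives in the Fortran source.
    rejection_complete: bool
    complete: bool
    failed: bool


def flags_from_sentinels(chunk_dir: Path) -> ChunkFlags:
    """Rebuild lifecycle flags from the sentinel files found in chunk_dir."""
    failed = (chunk_dir / "REJECTION_FAILED").exists()
    return ChunkFlags(
        # A failed chunk counts as rejection-complete and complete.
        rejection_complete=(chunk_dir / "REJECTION_FINISHED").exists() or failed,
        complete=(chunk_dir / "COMPLETE").exists() or failed,
        failed=failed,
    )


def needs_command_line_regeneration(flags: ChunkFlags) -> bool:
    # Chunks that are neither complete nor failed are restarted.
    return not (flags.complete or flags.failed)
```

Because failed implies both rejection_complete and complete, a chunk with only REJECTION_FAILED on disk is never restarted.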
Rejection Eligibility
collect_and_reject marks a running chunk as abinitio2D_complete when ABINITIO2D_FINISHED appears, then calls reject_cavgs.
reject_cavgs returns without changing the chunk when:
- abinitio2D_complete is false;
- failed is true;
- rejection_complete is true.
If the class-average stack is empty, or the number of class averages does not match the number of cls2D rows, reject_cavgs writes REJECTION_FAILED and COMPLETE, sets failed, rejection_complete, and complete, and exits.
Rejection Criteria
Criteria are applied in this fixed order:
- population
- FSC resolution
- mask geometry
- local variance
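A minimal sketch of how a cumulative rejection mask behaves when criteria run in a fixed order. It is written in Python for illustration only; the Criterion type and apply_criteria function are hypothetical names, and whether the real engine skips already-rejected classes in later criteria is an assumption of this sketch.

```python
from typing import Callable, List, Sequence

# A criterion returns True when class index i should be rejected.
Criterion = Callable[[int], bool]


def apply_criteria(n_classes: int, criteria: Sequence[Criterion]) -> List[int]:
    """Return per-class states (1 = kept, 0 = rejected) after all criteria."""
    rejected = [False] * n_classes
    for criterion in criteria:
        for i in range(n_classes):
            # Cumulative: a rejected class stays rejected. The short-circuit
            # "or" means later criteria are not evaluated for it (assumption).
            rejected[i] = rejected[i] or criterion(i)
    return [0 if r else 1 for r in rejected]
```

For example, with one criterion rejecting class 0 and a later one rejecting class 2, the final states for three classes are 0, 1, 0: no later criterion can restore an earlier rejection.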
Population
Reject class i when:
pop(i) < ceiling(sum(pop) * threshold_fraction)
Equality with the threshold is kept.
Effective threshold fractions:
- pass 1: engine default 0.005
- pass 2: DEFAULT_MICRO_P2_POP_THRESH = 0.0035
- refchunk: DEFAULT_REF_POP_THRESH = 0.0025
- match: DEFAULT_REF_POP_THRESH = 0.0025
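As a worked illustration of the population rule, the following Python sketch evaluates the inequality for a list of class populations. The function name is hypothetical, and the example populations are invented for the illustration.

```python
import math
from typing import List


def population_rejected(pops: List[int], threshold_fraction: float) -> List[bool]:
    """True for each class whose population is strictly below the threshold."""
    threshold = math.ceil(sum(pops) * threshold_fraction)
    # Strict "<": a class sitting exactly on the threshold is kept.
    return [p < threshold for p in pops]


# 1000 particles in total. At the pass-1 fraction 0.005 the threshold is 5,
# so the class with 5 particles is kept and the class with 4 is rejected.
pass1 = population_rejected([600, 391, 5, 4], 0.005)

# At the pass-2 fraction 0.0035 the threshold is ceil(3.5) = 4,
# so the class with 4 particles now sits on the threshold and is kept.
pass2 = population_rejected([600, 391, 5, 4], 0.0035)
```

The ceiling means the threshold is always a whole number of particles, which is why 3.5 rounds up to 4 in the second case.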
Resolution
Reject class i when:
res(i) > RES_THRESHOLD
RES_THRESHOLD = 40.0 Angstrom. Equality with the threshold is kept.
Mask Geometry
For each class average, the rejector:
- edge-normalizes the image;
- low-pass filters to 30 Angstrom;
- applies Otsu thresholding;
- finds connected components;
- removes connected components whose diameter spans the full image;
- rejects the class if no valid component remains;
- rejects the class if any component centroid lies outside the mask radius;
- rejects the class if the largest component has more than MASK_THRESHOLD pixels outside the mask disc.
MASK_THRESHOLD = 10.0 pixels.
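The last three decisions in the list above can be sketched as follows. This Python sketch starts from connected components that have already been extracted and filtered; it does not reproduce the normalization, filtering, Otsu, or labelling steps. The function names, the pixel-list representation, and the choice of box/2 as the disc centre are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int]  # (row, col)

MASK_THRESHOLD = 10.0  # pixels, as stated in the policy


def outside_disc(y: float, x: float, centre: float, radius: float) -> bool:
    return (y - centre) ** 2 + (x - centre) ** 2 > radius ** 2


def mask_rejected(components: Sequence[List[Pixel]], box: int, radius: float) -> bool:
    """Apply the three mask-geometry rejection rules to one class average."""
    centre = box / 2.0  # assumed centre convention
    if not components:
        return True  # no valid component remains
    for comp in components:
        cy = sum(p[0] for p in comp) / len(comp)
        cx = sum(p[1] for p in comp) / len(comp)
        if outside_disc(cy, cx, centre, radius):
            return True  # a component centroid lies outside the mask radius
    largest = max(components, key=len)
    n_outside = sum(1 for (y, x) in largest if outside_disc(y, x, centre, radius))
    # Strict ">": exactly MASK_THRESHOLD pixels outside the disc is still kept.
    return n_outside > MASK_THRESHOLD
```

In a 64-pixel box with a 20-pixel mask radius, a one-pixel-wide strip running from the centre to column 62 has exactly 10 pixels outside the disc and is kept; extending it by one more pixel triggers the rejection.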
Local Variance
For each class average, the rejector:
- edge-normalizes the image;
- low-pass filters to 10 Angstrom;
- applies Otsu thresholding;
- measures local variance inside and outside the foreground mask with a window of 10 pixels.
Classes with both local-variance scores near zero are rejected unconditionally:
abs(score_inside) <= ZERO_SCORE_EPS and abs(score_outside) <= ZERO_SCORE_EPS
ZERO_SCORE_EPS = 1.0e-6.
Remaining classes are robust-z-scored separately inside and outside the mask, excluding the zero-score classes. A class is rejected when one region is below the strong threshold and the other is below the weak threshold:
(z_inside < strong and z_outside < weak) or
(z_inside < weak and z_outside < strong)
Effective local-variance thresholds:
- pass 1: engine defaults strong=-0.5, weak=-0.1
- pass 2: strong=-1.0, weak=-1.0
- refchunk: strong=-2.0, weak=-2.0
- match: strong=-2.0, weak=-2.0
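The decision rule can be sketched as below, taking the inside and outside local-variance scores as given. This document does not define the robust z-score, so the sketch assumes the common median/MAD form with the 1.4826 consistency factor, and it returns zeros when the MAD is zero; both choices are assumptions of this illustration, as are the function names.

```python
import statistics
from typing import List

ZERO_SCORE_EPS = 1.0e-6


def robust_z(values: List[float]) -> List[float]:
    # Assumed definition: (x - median) / (1.4826 * MAD).
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    scale = 1.4826 * mad
    if scale == 0.0:
        return [0.0 for _ in values]  # degenerate case, handled arbitrarily here
    return [(v - med) / scale for v in values]


def local_variance_rejected(inside: List[float], outside: List[float],
                            strong: float, weak: float) -> List[bool]:
    zero = [abs(a) <= ZERO_SCORE_EPS and abs(b) <= ZERO_SCORE_EPS
            for a, b in zip(inside, outside)]
    rejected = list(zero)  # zero-score classes are rejected unconditionally
    live = [i for i, z in enumerate(zero) if not z]
    if live:
        # Z-scores are computed without the zero-score classes.
        z_in = robust_z([inside[i] for i in live])
        z_out = robust_z([outside[i] for i in live])
        for k, i in enumerate(live):
            rejected[i] = ((z_in[k] < strong and z_out[k] < weak) or
                           (z_in[k] < weak and z_out[k] < strong))
    return rejected
```

With thresholds that tighten from pass 1 to the reference and match tiers, the same class can be rejected at pass 1 yet kept later, since the later thresholds require a larger negative deviation in both regions.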
Per-Chunk Cleanup
After rejection, reject_cavgs:
- writes rejected and selected class-average stacks using _rejected.mrc and _selected.mrc suffixes;
- writes JPEG contact sheets for non-empty selected/rejected stacks;
- sets rejected os_cls2D states to zero;
- mirrors the class states to os_cls3D;
- calls map2ptcls_state;
- clears os_ptcl2D%class and os_ptcl2D%class_match for particles with state=0;
- writes the chunk project file;
- writes REJECTION_FINISHED;
- stores chunk%nptcls_selected from os_ptcl2D%count_state_gt_zero();
- sets chunk%rejection_complete=.true..
For the reference chunk, reject_cavgs also records the full class-average stack path and box size for match-chunk generation.
When DEBUG=.true., reject_cavgs also writes a _deselected project snapshot for inspection.
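The class-to-particle part of the cleanup can be illustrated with plain lists. This Python sketch is not the map2ptcls_state routine; it only models the documented outcome (particles of rejected classes become state 0, and state-0 particles lose their class and class_match values). The function name and the assumption of 1-based class indices with 0 meaning unassigned are hypothetical.

```python
from typing import List


def propagate_class_states(cls_state: List[int],
                           ptcl_class: List[int],
                           ptcl_class_match: List[int],
                           ptcl_state: List[int]) -> int:
    """Update particle fields in place; return the selected-particle count."""
    for i in range(len(ptcl_state)):
        k = ptcl_class[i]  # assumed 1-based class index, 0 = unassigned
        if k > 0 and cls_state[k - 1] == 0:
            ptcl_state[i] = 0  # a rejected class deselects its particles
        if ptcl_state[i] == 0:
            # Deselected particles carry no class assignment.
            ptcl_class[i] = 0
            ptcl_class_match[i] = 0
    # Mirrors the "state greater than zero" count stored as nptcls_selected.
    return sum(1 for s in ptcl_state if s > 0)
```

Particles that were already deselected before rejection are cleared as well, which keeps the invariant that state=0 always implies class=0 and class_match=0.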
Inter-Tier Consumption
Pass-1 chunks are eligible for pass-2 generation only when:
rejection_complete and not complete and not failed
After a pass-2 project is written successfully, consumed pass-1 chunks are marked complete and receive COMPLETE.
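The pass-1 to pass-2 step follows a write-then-mark ordering, which can be sketched as below. The Chunk class, the function name, and the write_project callback (returning True on success) are hypothetical stand-ins; the real routine also writes the COMPLETE sentinel when it marks a chunk.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Chunk:
    # Hypothetical stand-in for the Fortran chunk type.
    rejection_complete: bool
    complete: bool = False
    failed: bool = False


def consume_for_next_tier(chunks: List[Chunk],
                          write_project: Callable[[List[Chunk]], bool]) -> List[Chunk]:
    """Mark eligible chunks complete only if the next-tier project was written."""
    eligible = [c for c in chunks
                if c.rejection_complete and not c.complete and not c.failed]
    if not eligible:
        return []
    if not write_project(eligible):
        return []  # write failed: nothing is marked, so the chunks stay eligible
    for c in eligible:
        c.complete = True  # the real code also writes COMPLETE here
    return eligible
```

Marking only after a successful write means a failed or interrupted write leaves the source chunks eligible for the next attempt or restart.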
Pass-2 chunks are eligible for reference generation under the same gate. Reference generation consumes all eligible pass-2 chunks, but those pass-2 chunks are not marked complete until match chunks are generated from them.
Match chunks are generated only after the reference stack and box are available. Each eligible pass-2 chunk is copied into one match chunk, then the source pass-2 chunk is marked complete and receives COMPLETE.
After LAST_IMPORT_TIMEOUT, if all pass-2 chunks are complete or failed and the reference chunk is complete, remaining rejection-complete pass-1 chunks can be merged into one final match chunk. Those pass-1 chunks are then marked complete and receive COMPLETE.
Match Finalization
For each match chunk with rejection_complete=.true. and complete=.false., collect_and_reject:
- counts selected particles as state > 0;
- increments runtime accepted/rejected particle counters;
- refreshes latest-match JPEG metadata;
- copies the project file to completedir;
- writes COMPLETE;
- sets chunk%complete=.true..
Runtime counters are updated when match chunks are finalized in the current process. Sentinel files and completed project copies are the durable restart record.
Combined Output
combine_completed_match_chunks merges complete match chunk projects into microchunks_match_combined.simple or the caller-supplied combined project path.
The combined output:
- includes only complete match chunks;
- keeps particle class assignment but strips other 2D clustering parameters;
- replaces stale cls2D, cls3D, and output metadata with reference chunk class metadata;
- attaches the reference class-average stack;
- recomputes class populations from merged particle assignments;
- calls map2ptcls_state;
- no-ops when there are no match chunks, no complete match chunks, or the combined file already exists.
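Recomputing class populations from merged assignments amounts to a tally over the particle class field. The Python sketch below shows one way to do it, assuming 1-based class indices and that deselected particles carry class=0 as the policy invariants state; the function name is hypothetical.

```python
from collections import Counter
from typing import List


def class_populations(ptcl_class: List[int], ncls: int) -> List[int]:
    """Population of classes 1..ncls from merged particle class assignments."""
    # Class 0 marks deselected or unassigned particles and is not counted.
    counts = Counter(k for k in ptcl_class if k > 0)
    return [counts.get(k, 0) for k in range(1, ncls + 1)]
```

Classes that received no particles from any match chunk end up with population 0 rather than being dropped, so the result stays aligned with the reference class metadata.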
Finished State
The workflow is finished only when:
- at least one pass-1 chunk exists and all pass-1 chunks are complete or failed;
- at least one pass-2 chunk exists and all pass-2 chunks are complete or failed;
- the reference chunk is complete and not failed;
- at least one match chunk exists and all match chunks are complete or failed.
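The four conditions above combine into a single predicate. The Python sketch below uses a hypothetical Chunk class and function names, and represents a missing reference chunk as None.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Chunk:
    # Hypothetical stand-in for the Fortran chunk type.
    complete: bool = False
    failed: bool = False


def tier_done(chunks: List[Chunk]) -> bool:
    # At least one chunk exists and every chunk is complete or failed.
    return len(chunks) > 0 and all(c.complete or c.failed for c in chunks)


def workflow_finished(pass1: List[Chunk], pass2: List[Chunk],
                      ref: Optional[Chunk], match: List[Chunk]) -> bool:
    ref_ok = ref is not None and ref.complete and not ref.failed
    return tier_done(pass1) and tier_done(pass2) and ref_ok and tier_done(match)
```

The explicit non-empty check matters because all() over an empty list is true, which would otherwise report an empty tier as done. The reference chunk is stricter than the other tiers: a failed reference never counts as finished.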
Verification
Engine-level unit coverage lives in:
src/main/stream/simple_cluster2D_rejector_tester.f90
Policy-sensitive checks:
- threshold boundary behavior for population and resolution;
- cumulative rejection semantics;
- mask rejection with no valid connected component;
- local-variance zero-score rejection and robust-z-score thresholds;
- rejection failure on empty or mismatched class-average stacks;
- sentinel reconstruction on restart;
- class-to-particle state propagation;
- match finalization counters and COMPLETE writes.