aminak

Roadmap Review — Round 3 (biologist + bioinformatician)

Reviewer agent role: structural bioinformatician + medicinal-chemistry biologist. Severity: HIGH / MEDIUM / LOW. Round 3, against ROADMAP.md v2 (post-R2 corrector pass).

Overall verdict

CONDITIONAL_PASS. Two HIGH-severity holes remain (the E1b alignment script does not work on Vina pose PDBQT inputs as written, and the pemetrexed CID is unverified), plus two MEDIUM and a couple of LOWs. Five of the six R2 sign-off items are genuinely closed. The roadmap is almost execution-ready: if the corrector fixes E1b’s input-format assumption, resolves the pemetrexed CID, and pre-bakes the LR peptide sequence + the PubChem similarity REST endpoint, R4 should be PASS. We are NOT in cosmetic-churn territory yet, but we are close.

Status of R2 sign-off items (6 items)

HIGH-1 (CID verification): CLOSED with caveats. The placeholder text is gone. anchor_compounds_verified.json exists and the v2 §A2 script compares against it as ground truth. The audit trail of the eight rejected wrong CIDs is excellent and exactly what R2 demanded. Caveats for the v2 audit below: three rows (pemetrexed, plevitrexed, nolatrexed) still carry inchikey: null and rely on an RDKit-derived runtime InChIKey. That is acceptable iff the SMILES used for derivation is itself trusted (see “Verified-anchors JSON audit” below). One Tier-1 row (pemetrexed CID 135410875) is questionable; see the NEW-HIGH issue below.

HIGH-2 (PROLIF water-bridge): CLOSED in the conceptual sense (PROLIF is no longer asked to do impossible work; a separate E1b script was added), but the script itself has a NEW HIGH-severity bug. See “New v2 issues”: the script loads a .pdbqt file with mda.Universe(pose_pdbqt) and then calls align.alignto(... select="protein and name CA and segid A"). The Vina pose PDBQT contains only the ligand: no protein, no segid, no Cα. The alignment selection will return an empty AtomGroup and MDAnalysis will raise. The water-bridge gate cannot run as written.
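The failure described above can be caught before any alignment is attempted. The following is a minimal sketch, assuming the pose and receptor files use standard fixed-column ATOM/HETATM records; the helper names count_protein_ca and require_protein_ca are hypothetical and are not part of the roadmap or of MDAnalysis.

```python
# Sketch: detect up front that a file has no protein C-alpha atoms.
# Helper names are hypothetical; only fixed-column PDB/PDBQT parsing is used.

STANDARD_RESIDUES = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}


def count_protein_ca(pdbqt_text: str) -> int:
    """Count ATOM/HETATM records named CA that sit in a standard amino acid."""
    count = 0
    for line in pdbqt_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        atom_name = line[12:16].strip()  # PDB columns 13-16
        res_name = line[17:20].strip()   # PDB columns 18-20
        if atom_name == "CA" and res_name in STANDARD_RESIDUES:
            count += 1
    return count


def require_protein_ca(pdbqt_text: str, label: str = "input") -> None:
    """Raise a readable error instead of letting a later alignment step fail."""
    if count_protein_ca(pdbqt_text) == 0:
        raise ValueError(f"{label}: no protein CA atoms; cannot align on C-alpha")
```

Calling require_protein_ca on a ligand-only Vina pose would raise immediately, which makes the input-format assumption explicit rather than leaving it to surface inside the alignment call.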

HIGH-3 (HPEPDOCK fallback): CLOSED. The §D envelope table (60 s POST timeout / 5-min polls / 4 h per-job TIMEOUT / 3× retry with exponential backoff / CABS-dock named fallback / > 50 % TIMEOUT triggers strategy abort / extended S3 for service-down) is a clean specification. The scrambled-sequence control (numpy seed=42 permutation, ≥ 2 kcal/mol separation) is now concrete. Two residual concerns flagged below: poll-count math, and CABS-dock-as-2026-fallback viability.
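The envelope numbers above imply a fixed poll budget: 4 h at one poll every 5 min is 48 polls per job. The sketch below makes that arithmetic and the abort rule explicit. It is illustrative only: the status strings, the function names, and the 60 s backoff base are assumptions, since the review does not quote them from §D.

```python
import time
from typing import Callable, List, Sequence

POLL_INTERVAL_S = 5 * 60      # 5-min polls (from the envelope)
JOB_TIMEOUT_S = 4 * 60 * 60   # 4 h per-job TIMEOUT (from the envelope)
MAX_RETRIES = 3               # 3x retry (from the envelope)
BACKOFF_BASE_S = 60           # ASSUMPTION: base delay is not stated in the review


def max_polls(timeout_s: int = JOB_TIMEOUT_S, interval_s: int = POLL_INTERVAL_S) -> int:
    """Number of polls that fit inside the per-job timeout."""
    return timeout_s // interval_s


def backoff_delays(retries: int = MAX_RETRIES, base_s: int = BACKOFF_BASE_S) -> List[int]:
    """Exponential backoff: base, 2*base, 4*base, ..."""
    return [base_s * 2 ** i for i in range(retries)]


def poll_job(check_status: Callable[[], str],
             sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the job reports DONE or FAILED, else return TIMEOUT.

    check_status is a caller-supplied function; the status strings used
    here are hypothetical placeholders for whatever the service returns.
    """
    for _ in range(max_polls()):
        status = check_status()
        if status in ("DONE", "FAILED"):
            return status
        sleep(POLL_INTERVAL_S)
    return "TIMEOUT"


def should_abort(job_statuses: Sequence[str]) -> bool:
    """Strategy abort when strictly more than 50 % of jobs timed out."""
    if not job_statuses:
        return False
    timeouts = sum(1 for s in job_statuses if s == "TIMEOUT")
    return timeouts / len(job_statuses) > 0.5
```

Passing sleep as a parameter keeps the loop testable without waiting; a real run would leave the default time.sleep in place.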

MEDIUM-1 (cofactor box centre): CLOSED. §A row 2 now reads “computed once from the holo receptor cofactor_A.pdbqt and reused unchanged for both apo and holo dockings — the apo receptor has no D16 by definition.” One sentence; exactly the fix R2 asked for.

MEDIUM-2 (PLIP install gate): CLOSED. The four-step bash gate (pip → from openbabel import openbabel → fallback import openbabel → final from plip.structure.preparation import PDBComplex) is the right shape; PLIP_STATUS = skip_plip writes the sentinel; PROLIF is explicitly noted as RDKit-independent of this gate. Acceptable.
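For readers who want the import-probing logic of that gate in one place, here is a rough Python equivalent of steps two to four (the pip step is omitted). The function name plip_status, the injectable import_module argument, and the "ok" return value are assumptions made for this sketch; only the skip_plip sentinel and the three import statements come from the roadmap as quoted above.

```python
import importlib
from typing import Callable


def plip_status(import_module: Callable[[str], object] = importlib.import_module) -> str:
    """Return 'ok' if Open Babel and PLIP import cleanly, else 'skip_plip'."""
    # Step 2: modern layout, equivalent to `from openbabel import openbabel`.
    try:
        import_module("openbabel.openbabel")
    except ImportError:
        # Step 3: legacy layout, equivalent to `import openbabel`.
        try:
            import_module("openbabel")
        except ImportError:
            return "skip_plip"

    # Step 4: `from plip.structure.preparation import PDBComplex`.
    try:
        module = import_module("plip.structure.preparation")
    except ImportError:
        return "skip_plip"
    if not hasattr(module, "PDBComplex"):
        return "skip_plip"
    return "ok"
```

The importer is injectable so the gate can be exercised on a machine without either package installed.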

MEDIUM-3 (HPEPDOCK scrambled control): CLOSED. Concrete spec in §D: numpy.random.default_rng(seed=42).permutation(...); submitted as a separate job; canonical_top1 ≤ shuffled_top1 − 2 kcal/mol, else peptide_specificity_unreliable = True. One residual LOW: the R2 reviewer asked whether the Cys position should be preserved (Cys is the only thiol, and the strongest nucleophile, in LSCQLYQR; randomising its position changes the chemistry meaningfully). v2 does not address this; impact is bounded because it is an internal control, not a docking-mode question.
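A minimal sketch of the scrambled control and its decision rule, under the assumption that the permutation is applied residue by residue to LSCQLYQR. The function names are hypothetical, and scramble_keep_cys is only an illustration of the Cys-preserving option the R2 reviewer raised; the roadmap does not specify it.

```python
import numpy as np

CANONICAL = "LSCQLYQR"   # peptide named in the review
MIN_SEPARATION = 2.0     # kcal/mol, from the quoted spec


def scramble(seq: str, seed: int = 42) -> str:
    """Permute all residues with a seeded NumPy Generator."""
    rng = np.random.default_rng(seed=seed)
    return "".join(rng.permutation(list(seq)))


def scramble_keep_cys(seq: str, seed: int = 42) -> str:
    """Hypothetical variant: permute non-Cys residues, leave Cys in place."""
    rng = np.random.default_rng(seed=seed)
    others = [res for res in seq if res != "C"]
    shuffled = iter(rng.permutation(others))
    return "".join("C" if res == "C" else next(shuffled) for res in seq)


def peptide_specificity_unreliable(canonical_top1: float,
                                   shuffled_top1: float,
                                   min_sep: float = MIN_SEPARATION) -> bool:
    """True unless the canonical score beats the shuffled one by min_sep."""
    return not (canonical_top1 <= shuffled_top1 - min_sep)
```

With more negative scores meaning better binding, a canonical score of -10 against a shuffled score of -7 satisfies the rule, while -8 against -7 does not. Note that a permutation can coincide with the original sequence, so a real script may want to check for that before submitting the control job.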

Verified-anchors JSON audit

Per-anchor verdict against my own knowledge + the published literature:

New v2 issues (in addition to R2 carryover)

Execution-readiness verdict

NO, not yet — but only barely. A fresh agent armed with the roadmap + the repo + Vina/HPEPDOCK/RDKit/MDAnalysis/freesasa/FPocket/PROLIF/PLIP could execute Strategies 1, 2, and 4 essentially as written. The three blockers for full readiness are:

  1. E1b water-bridge script crashes on first invocation because Vina pose PDBQT has no protein atoms to align by Cα (NEW-HIGH). Strategy 1 cannot complete its post-analysis as specified.
  2. Pemetrexed CID 135410875 identity is unverified; the A2 gate will silently pass for this row because expected_key = null (NEW-HIGH).
  3. ConnectivitySMILES field name is wrong; A2 will fail with KeyError on the dict access for the three null-InChIKey rows (NEW-MEDIUM).
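Blocker 2 comes down to how the comparison treats a missing ground truth. One way to avoid a silent pass is to give null its own outcome, as in the sketch below. The function name, the row layout (name plus inchikey), and the three status strings are assumptions; the InChIKey values are obvious placeholders, not real identifiers.

```python
from typing import Dict, List, Optional


def check_anchor(expected_key: Optional[str], observed_key: str) -> str:
    """Compare an observed InChIKey with the verified one.

    A missing expected key is reported as UNVERIFIED rather than PASS,
    so a null row cannot slip through the gate unnoticed.
    """
    if expected_key is None:
        return "UNVERIFIED"
    return "PASS" if expected_key == observed_key else "FAIL"


def audit(rows: List[Dict[str, Optional[str]]],
          observed: Dict[str, str]) -> Dict[str, str]:
    """Apply check_anchor to every row, keyed by compound name."""
    return {
        row["name"]: check_anchor(row.get("inchikey"), observed[row["name"]])
        for row in rows
    }


# Placeholder data for illustration only; not real InChIKeys.
example_rows = [
    {"name": "compound_a", "inchikey": "AAAAAAAAAAAAAA-AAAAAAAAAA-N"},
    {"name": "compound_b", "inchikey": None},
]
example_observed = {
    "compound_a": "AAAAAAAAAAAAAA-AAAAAAAAAA-N",
    "compound_b": "BBBBBBBBBBBBBB-BBBBBBBBBB-N",
}
```

Here audit(example_rows, example_observed) marks compound_b as UNVERIFIED, which a calling script could treat as a hard stop for Tier-1 rows.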

The LR-peptide sequence + DUD-E endpoint + HPEPDOCK endpoint + PubChem similarity endpoint omissions are LOW-cost rewordings that any competent agent can resolve by reading the JSON and spending a minute with the documentation; they would not by themselves block execution, but they would slow it.

Sign-off requirements for R3 → R4

Corrector agent must, before R4 PASS:

  1. Fix the E1b script. Either load the receptor PDB(QT) explicitly and do the Cα alignment on the receptor, then apply the transform to the pose; or accept that 1HVY-frame Vina runs need no alignment and check pose-vs-water distances directly. Verify the chosen path actually runs on a representative *_apo_seed42.pdbqt file.

  2. Resolve pemetrexed. Bake the literature InChIKey for (S)-pemetrexed (free acid) into anchor_compounds_verified.json. If PubChem CID 135410875 returns the expected InChIKey, keep it; otherwise switch the CID. Add a one-line note explaining the choice between 135410875 / 446556 / 60843.

  3. Replace ConnectivitySMILES with IsomericSMILES in §A2 (both the URL path and the dict access). This is a one-token edit.

  4. Bake the LR-peptide E. coli source sequence into §A3 prose (not just the JSON).

  5. Write down the four web endpoints explicitly: DUD-E URL + submission format, HPEPDOCK URL + submission format + result-retrieval, PubChem similarity REST endpoint with Threshold=70, CABS-dock URL. One paragraph in §1.2 + §D suffices.

  6. (LOW) One sentence in §D pre-flighting HPEPDOCK + CABS-dock liveness with curl -I before the first submission, aborting early if both are down rather than burning 4 h to TIMEOUT.
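For item 1, the second option (no alignment, direct pose-vs-water distances) can be sketched without MDAnalysis. This assumes the Vina pose is already in the same coordinate frame as the structure that holds the crystallographic waters, which is the premise of that option. The 3.5 Å cutoff, the HOH/WAT residue names, the restriction to ligand atoms whose names start with N or O, and all function names are assumptions of this sketch. It covers only the ligand-to-water leg of a water bridge, not the water-to-protein leg.

```python
import math
from typing import List, Tuple

Atom = Tuple[str, str, str, Tuple[float, float, float]]


def parse_atoms(text: str) -> List[Atom]:
    """Read (atom name, residue name, residue number, xyz) from ATOM/HETATM lines."""
    atoms = []
    for line in text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((line[12:16].strip(), line[17:20].strip(),
                          line[22:26].strip(), xyz))
    return atoms


def first_model(pdbqt_text: str) -> str:
    """Keep only the top-ranked pose of a multi-model Vina output."""
    kept = []
    for line in pdbqt_text.splitlines():
        if line.startswith("ENDMDL"):
            break
        kept.append(line)
    return "\n".join(kept)


def waters_near_pose(pose_text: str, structure_text: str,
                     cutoff: float = 3.5) -> List[Tuple[str, float]]:
    """Return (water residue number, distance) for waters within cutoff
    of any ligand N/O atom in the first pose."""
    # ASSUMPTION: polar ligand atoms are those whose names start with N or O.
    polar = [xyz for name, _, _, xyz in parse_atoms(first_model(pose_text))
             if name[:1] in ("N", "O")]
    if not polar:
        raise ValueError("pose has no N/O atoms to test against waters")
    hits = []
    for name, res, resseq, xyz in parse_atoms(structure_text):
        if res in ("HOH", "WAT") and name.startswith("O"):
            d = min(math.dist(xyz, p) for p in polar)
            if d <= cutoff:
                hits.append((resseq, round(d, 2)))
    return hits
```

Running this on a representative *_apo_seed42.pdbqt pose against the structure that carries the waters would also serve as the "actually runs" check that item 1 asks for.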

If items 1–3 land cleanly, items 4–6 are minor and R4 should PASS. We are one corrector pass away from execution.

End of Round 3 review.