JRNLClub

The Feed

The latest from your network.

Join JRNLClub

Where scientists connect.

Post short takes and long-form essays, @mention any colleague or paper, and build a network of peers you trust.


Powered by trusted scientific infrastructure

bioRxiv · medRxiv · arXiv · OpenAlex · NIH RePORTER · NSF Awards · Crossref · Altmetric

α¹
Alpha1 Science Editorial · 1h ago
New paper · auto

Top 1% Most Discussed biorxiv Preprints Added

Preprint· Alpha1

De novo design of RNA pseudoknots with deep learning

biorxiv · Townley, J., Kladwang et al.

Open on Alpha1 →
9
α¹
Alpha1 Science Editorial · 1h ago
New paper · auto

Top 1% Most Discussed biorxiv Preprints Added

Preprint· Alpha1

Accurate protein stability prediction for small domains using mega-scale experiments

biorxiv · Cho, Y., Tsuboyama et al.

Open on Alpha1 →
9
α¹
Alpha1 Science Editorial · 1h ago
New paper · auto

Top 1% Most Discussed biorxiv Preprints Added

Preprint· Alpha1

A conserved structural logic underlies sensor-helper NLR communication in the NRC immune receptor network

biorxiv · Toghani, A., Garro et al.

Open on Alpha1 →
6
α¹
Alpha1 Science Editorial · 1h ago
New paper · auto

Top 1% Most Discussed biorxiv Preprints Added

Preprint· Alpha1

Hidden symmetries in network connectivity support ring attractor dynamics in the fly's neural compass

biorxiv · Hulse, B. K., Aneesh et al.

Open on Alpha1 →
6

Kuan-Jui (Ray) Su joined JRNLClub — Kuan-Jui (Ray) Su

6h ago

JRNLClub Editorial · 173 jobs got added to JRNLClub on May 26 — check out the job board

1d ago

Kuan‐lin Huang’s post is trending — Once you claim your scholarly account with matched name here, JRNLClub AI will…

1d ago

2 reactions

kosar HajNajafi joined JRNLClub — kosar HajNajafi

1d ago

JRNLClub Editorial · 108 jobs got added to JRNLClub on May 22 — check out the job board

5d ago

Lucia S joined JRNLClub — Lucia S

6d ago

Kuan‐lin Huang’s post is trending — How I won the NIH replication prize by using AI to validate drug targets at scale

6d ago

2 reactions

Kuan‐lin Huang · Icahn School of Medicine at Mount Sinai · 6d ago

Once you claim your scholarly account with a matching name here, JRNLClub AI will help you generate an accurate CV in a minute that sources all your papers! https://www.youtube.com/watch?v=89fq9jeLr3I

99

jane shen joined JRNLClub — jane shen

6d ago

JRNLClub Editorial · 106 jobs got added to JRNLClub on May 20 — check out the job board

May 20, 2026

Rahul Veettil joined JRNLClub — Rahul Veettil

May 20, 2026
α¹
Alpha1 Science Editorial · May 20, 2026
New paper · auto

Top 1% Most Discussed biorxiv Preprints Added

Preprint· Alpha1

Stromal Gasdermin D-mediated Pyroptosis Drives Maladaptive CD4⁺ T-cell Remodeling in Tet2-Deficient Hematopoiesis

biorxiv · Ji, P., Ren et al.

Open on Alpha1 →
86
α¹
Alpha1 Science Editorial · May 20, 2026
New paper · auto

Top 1% Most Discussed biorxiv Preprints Added

Preprint· Alpha1

Maternal high-fat diet drives sex-specific microglia remodeling of serotonergic reward circuits

biorxiv · Bilbo, S., Patton et al.

Open on Alpha1 →
81
α¹
Alpha1 Science Editorial · May 20, 2026
New paper · auto

Top 1% Most Discussed biorxiv Preprints Added

Preprint· Alpha1

PD-1 blockade drug holiday improves exhausted progenitor CD8 T cell (Tpex) reinvigoration by avoiding Tpex adaptive resistance

biorxiv · Wherry, E. J., Ngiow et al.

Open on Alpha1 →
74
α¹
Alpha1 Science Editorial · May 20, 2026
New paper · auto

Top 1% Most Discussed biorxiv Preprints Added

Preprint· Alpha1

Targeted extracellular degradation of LRP8 promotes ferroptosis in cancer cells

biorxiv · Zhao, F., Inague et al.

Open on Alpha1 →
65

Ku Wai Lim joined JRNLClub — Ku Wai Lim

May 20, 2026

Shicheng Guo joined JRNLClub — Shicheng Guo

May 19, 2026

Mohamed El Moussaoui joined JRNLClub — Mohamed El Moussaoui

May 19, 2026

JRNLClub Editorial · 91 jobs got added to JRNLClub on May 19 — check out the job board

May 19, 2026

Bence Szalai joined JRNLClub — Bence Szalai

May 19, 2026

Ehsan Saghapour joined JRNLClub — Ehsan Saghapour

May 19, 2026
Kuan‐lin Huang · Icahn School of Medicine at Mount Sinai · May 19, 2026
Essay

How I won the NIH replication prize by using AI to validate drug targets at scale

About 90% of cancer drug candidates that enter clinical trials never make it to approval. A big chunk of that failure is upstream: the target was wrong. Two industry audits made this concrete years ago. Bayer reported in 2011 that only 20–25% of published cancer targets held up when their own scientists tried to reproduce them; Amgen in 2012 said just 6 out of 53 "landmark" oncology studies survived rigorous replication. We've known this for a long time. We just haven't had a way to do something about it at scale (at least in the published literature).

Manually re-validating every published target is tedious. You'd need to harmonize lots of CRISPR, omics, and other data, work out the right disease subgroupings, write the code, run the stats, and review the output. Each target takes days to validate. Nobody's funded to do it (in academia). So most candidates sit there, cited, repeated, occasionally bankrolled into a screen.

So I tried something else because it's 2025 (when this was done). I gave the job to an AI agent (Biomni) and ran 31 published oncology targets through it in an afternoon. The compute cost $68 in Claude API credits. About two-thirds of the retracted-paper targets failed to replicate. Roughly two-thirds of the recent, non-retracted targets did replicate. Compared with the retracted ones, the non-retracted targets had an odds ratio of 17 for showing bona fide, context-specific dependency in the agent's analyses that I validated as correct.
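For readers less used to odds ratios, here is a minimal sketch of how one is computed from a 2x2 table. The counts are made-up placeholders chosen for round arithmetic; they are not the counts from this study, and `odds_ratio` is a helper name introduced only for illustration.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for a 2x2 table.

    a: group 1 with the outcome      b: group 1 without the outcome
    c: group 2 with the outcome      d: group 2 without the outcome
    """
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)


# Placeholder counts (NOT the study's data): 15 of 20 targets in group 1
# show the outcome, versus 5 of 20 in group 2.
example = odds_ratio(a=15, b=5, c=5, d=15)
print(example)  # 9.0
```

An odds ratio above 1 means the outcome is more likely in group 1; with small panels the confidence interval around it is usually wide, so the point estimate alone should be read with care.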

The interesting part isn't the headline number. It's how to get an agent to do this kind of work without it making things up.

1. Find out what the agent can do reliably

Most of the hype around "AI scientists" frames the agent as a generalist that does everything. That's a trap. LLMs hallucinate, especially when asked to use tools or data that they either can't access or don't know how to use. And they will almost always still write you a beautiful, plausible, partly wrong narrative.

The move is to find a task class where the agent is reliable, say, above 95% success rate on something you can score. For me that task is: given a gene target, a disease context, and a public dataset like DepMap or TCGA, test whether the gene shows context-specific cancer dependency. Narrow enough that the agent's job is mostly translating a hypothesis into code and stats. Reliable enough that I can trust the agent's executions.

2. Apply it across many use cases

Once you know the agent does one type of thing well, throw a lot of that thing at it. I built a table of 31 targets: 17 from retracted papers, 14 recent candidates with real-looking evidence. Each verbal target claim got translated into a structured natural language prompt with the same template. Gene, context, datasets to use, statistical contrasts to run.
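A minimal sketch of what "the same template" could look like in practice. The four fields come from the paragraph above; the class name, template wording, and the example contrast text are assumptions for illustration, not the prompts in the repository.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TargetClaim:
    """One row of the target table (field names are illustrative)."""
    gene: str
    context: str
    datasets: List[str]
    contrasts: List[str]


# Hypothetical template: every claim is rendered with identical structure.
TEMPLATE = (
    "Gene: {gene}\n"
    "Disease context: {context}\n"
    "Datasets to use: {datasets}\n"
    "Statistical contrasts to run: {contrasts}\n"
)


def render_prompt(claim: TargetClaim) -> str:
    """Fill the shared template so all targets get the same prompt shape."""
    return TEMPLATE.format(
        gene=claim.gene,
        context=claim.context,
        datasets=", ".join(claim.datasets),
        contrasts="; ".join(claim.contrasts),
    )


claim = TargetClaim(
    gene="PRMT5",
    context="MTAP-deleted cancers",
    datasets=["DepMap"],
    contrasts=["dependency in MTAP-deleted vs. MTAP-intact cell lines"],
)
prompt = render_prompt(claim)
print(prompt)
```

Keeping the template fixed means differences between notebooks are more likely to reflect the target than the phrasing of the request.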

When I first started playing with the agent, the biggest failure mode wasn't bad reasoning. It was the agent failing to access or download the right data files. Then it would start hallucinating or simulating fake data for analyses. To stop this, I wrote a separate cancer-omics data know-how document that spelled out how to pull DepMap through the Bioconductor depmap package and how to grab TCGA Pan-Cancer Atlas data from the NCI Genomic Data Commons. This was before Anthropic released the Skills feature; today you'd just package it as a skill. Once the agent stopped fighting the data layer, the rest of the work got dramatically easier.

Two more constraints made the difference:

  • Forbid the agent from reading literature. I appended a non-overridable instruction: "You are a data-only replication agent. Do not use any literature search, papers, or external textual knowledge." Without that, the agent fills in gaps from training data, which means it tells you the consensus view of whatever paper it dimly remembers. You want what the data says.
  • Force everything into executable code. No prose conclusions. Every claim has to come from a notebook cell that loaded real data and ran a real test for me to review.
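Both constraints can be partly checked mechanically. The sketch below appends the data-only rule to a prompt and counts the code cells in a notebook that were actually executed. It assumes notebooks follow the standard Jupyter v4 JSON layout; the helper names and the toy notebook are hypothetical, and appending text does not by itself make an instruction non-overridable.

```python
import json

# The rule quoted in the essay, appended verbatim to every prompt.
DATA_ONLY_RULE = (
    "You are a data-only replication agent. Do not use any literature "
    "search, papers, or external textual knowledge."
)


def with_guard(prompt: str) -> str:
    """Append the data-only rule as the final instruction of a prompt."""
    return prompt.rstrip() + "\n\n" + DATA_ONLY_RULE


def executed_code_cells(notebook_json: str) -> int:
    """Count code cells that were run (they carry an execution count)."""
    notebook = json.loads(notebook_json)
    return sum(
        1
        for cell in notebook.get("cells", [])
        if cell.get("cell_type") == "code"
        and cell.get("execution_count") is not None
    )


# A tiny hand-written notebook, for illustration only.
toy_notebook = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["Conclusion text"]},
        {"cell_type": "code", "metadata": {}, "execution_count": 1,
         "outputs": [], "source": ["print('loaded')"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["never_run()"]},
    ],
})

guarded = with_guard("Gene: WRN\nDisease context: microsatellite-unstable tumors")
print(executed_code_cells(toy_notebook))  # 1
```

A count like this only flags notebooks with no executed cells; it cannot tell whether the data that was loaded is real, which is why manual review still matters.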

3. Validate the process before you trust the results

Before I believed anything the agent said about retracted targets, I needed proof it could find the real ones. So I seeded the panel with well-established synthetic lethal relationships: WRN in microsatellite-unstable tumors, PRMT5 in MTAP-deleted cancers.

The agent successfully re-derived the MTAP–PRMT5 relationship in detail. It stratified cell lines by copy number using a sensible 15% threshold it picked itself, compared dependency between groups, ran the dose-response across copy-number quartiles, and landed on effect sizes consistent with the literature and p-values from 10⁻⁹ to 10⁻¹¹. Once those controls worked, the rest of the panel became interpretable.
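The shape of that analysis (split cell lines by copy number, then compare dependency between groups) can be sketched as follows. Everything here is an assumption for illustration: the data are synthetic, the cutoff of 0.5 is arbitrary and is not the agent's 15% threshold (whose exact definition the post does not give), and a permutation test stands in for whatever test the agent actually ran.

```python
import random
from statistics import mean


def split_by_copy_number(lines, cutoff):
    """Split (copy_number, dependency) pairs into deleted vs. intact groups."""
    deleted = [dep for cn, dep in lines if cn < cutoff]
    intact = [dep for cn, dep in lines if cn >= cutoff]
    return deleted, intact


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test on the difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Synthetic toy data: (relative copy number, dependency score).
toy_lines = [
    (0.05, -1.2), (0.10, -1.0), (0.08, -1.1), (0.12, -0.9), (0.02, -1.3),
    (0.95, -0.2), (1.00, -0.1), (1.05, -0.3), (0.90, 0.0), (1.10, -0.2),
]
deleted, intact = split_by_copy_number(toy_lines, cutoff=0.5)
p_value = permutation_p_value(deleted, intact)
print(len(deleted), len(intact), p_value)
```

With only ten toy lines the smallest achievable p-value is limited by the number of distinct splits, so this cannot reproduce values like the ones reported above; it only shows the logic a reviewer would expect to see in a notebook.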

4. Look at every output myself

This is the unglamorous part nobody talks about. The agent produces 31 Python notebooks. A human has to read each one to validate it and learn what happened. Did the data actually load? Did the statistical test make sense for the question? Did the agent silently swap in a different dataset when the first one failed? Did it interpret "wild type" the same way you meant?

I scored every one of the 31 notebooks manually. After the steps above, few components turned out to be false. I coded the rest as supported, refuted, or inconclusive on two axes: context-specific dependency, and other supporting evidence.
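One way to keep manual scores like these consistent is to record them in a small structure and tally per axis. This is a sketch under assumptions: the verdict labels and the two axes come from the text, while the class names, field names, and placeholder targets are hypothetical and carry no real results.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    SUPPORTED = "supported"
    REFUTED = "refuted"
    INCONCLUSIVE = "inconclusive"


@dataclass(frozen=True)
class NotebookScore:
    target: str
    dependency: Verdict       # axis 1: context-specific dependency
    other_evidence: Verdict   # axis 2: other supporting evidence


def tally(scores, axis):
    """Count verdicts along one axis ('dependency' or 'other_evidence')."""
    return Counter(getattr(score, axis) for score in scores)


# Placeholder rows, not actual results from the study.
scores = [
    NotebookScore("TARGET_A", Verdict.SUPPORTED, Verdict.SUPPORTED),
    NotebookScore("TARGET_B", Verdict.REFUTED, Verdict.INCONCLUSIVE),
    NotebookScore("TARGET_C", Verdict.SUPPORTED, Verdict.REFUTED),
]
dependency_counts = tally(scores, "dependency")
print(dependency_counts)
```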

Expert review isn't optional. The good news: it's faster than doing the analysis yourself. Maybe 15 minutes per notebook, against the several days it would take from scratch.


The most interesting result wasn't the big retracted-versus-non-retracted split. It was ALKBH5. The original paper was retracted, and the specific mechanistic claim (that miR-193a-3p regulates AKT2 through ALKBH5) didn't hold up. But the agent independently found that ALKBH5 itself is a real, glioma-selective dependency, with consistent CRISPR and RNAi signals, a strong correlation with stemness scores, a very strong negative correlation with the m6A gene signature, and a significant survival hazard ratio across gliomas.

You get insights like this because the agent decomposed the target claim into testable pieces and ran each one independently. That's the part I didn't expect, and it's the part that's made me think this approach generalizes well beyond target replication.

On AI Scientist Arena (aiscientistarena.com), I've benchmarked LLMs, and even without any sophisticated tool use or harness, they could predict clinical trial success beyond noise. If AI agents continue to improve across all tasks in the drug discovery and development cycle, the best constructor of an entire clinical program might end up being an AI.

All of this (the prompts, the data and replication know-how documents, the 31 notebooks, the expert scoring) is at github.com/Huang-lab/AgentReplication. The bioRxiv preprint is titled "Agent-Driven Validation of Oncology Therapeutic Targets." This is part of the work that initiated the Accelerated Discovery with Agents (ADA) Consortium.

There's a version of this work that sounds bigger than it is. "AI agent validates 31 cancer drug targets in one hour" is technically true and somewhat misleading. The hour is the agent's compute time. Building the prompts, curating the targets, writing the know-how documents, and reviewing every notebook took weeks. The agent isn't doing the science. It's doing the implementation.

The science is still in deciding what to ask and whether the answer means anything to benefit humans.


Postscript, May 2026: This work, done in Nov 2025, was my Track 2 submission to the NIH Replication Prize, and I thought it was the better entry. My other entry, proposing mandatory release of participant-level clinical trial data, won Track 1.

6 min read
320

Kailash B P joined JRNLClub — Kailash B P

May 19, 2026

JRNLClub Editorial · 35 jobs got added to JRNLClub on May 18 — check out the job board

May 18, 2026

Marek Wiewiórka joined JRNLClub — Marek Wiewiórka

May 18, 2026

Caroline joined JRNLClub — Caroline

May 18, 2026

Αlpha¹ Editorial joined JRNLClub — Αlpha¹ Editorial

May 16, 2026

Randy Aryee joined JRNLClub — Randy Aryee

May 16, 2026

JRNLClub Editorial · 31 jobs got added to JRNLClub on May 16 — check out the job board

May 16, 2026

JRNLClub Editorial · 28 jobs got added to JRNLClub on May 15 — check out the job board

May 15, 2026
Kuan‐lin Huang · Icahn School of Medicine at Mount Sinai · May 15, 2026
Essay

How I rebuilt Variant Effect Predictor to be 100x faster (fastVEP!)


If you work with genomic variants, you know VEP. Ensembl's Variant Effect Predictor is the standard tool — the thing your pipeline calls to figure out whether a given mutation breaks a protein, hits a splice site, or sits harmlessly in some intron. It's been around forever and it works. It's also written in Perl, ships with a Perl 5.22+ requirement, ten-plus CPAN modules, a DBI dependency, and a small graveyard of installation issues anyone who's set up VEP from scratch will recognize.

The annotation itself is fine. The speed is not. Annotating 50,000 variants with VEP takes about 206 seconds. Point it at a full human WGS (~4 million variants) and it doesn't finish on the newest MacBook Pro. People work around this by splitting their VCFs, running parallel processes, and stitching the outputs back together. That works, but it's a huge time tax. A lab running thousands of samples pays that tax every day.

So I rebuilt it in Rust.

The numbers

fastVEP runs the same 50,000-variant file in 1.59 seconds. That's a 130x speedup. The full WGS that VEP can't finish? fastVEP does it in 86 seconds.
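The speedup figure follows directly from the two timings quoted in the essay. A quick check of the arithmetic, using only those numbers (variable names are illustrative):

```python
# Timings quoted in the essay for the 50,000-variant file.
n_variants = 50_000
vep_seconds = 206.0
fastvep_seconds = 1.59

speedup = vep_seconds / fastvep_seconds
print(f"{speedup:.1f}x")  # 129.6x, which rounds to the quoted 130x

# Per-variant cost, to put the two runtimes on the same scale.
vep_ms_per_variant = 1000 * vep_seconds / n_variants
fastvep_ms_per_variant = 1000 * fastvep_seconds / n_variants
print(vep_ms_per_variant, fastvep_ms_per_variant)
```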

Peak memory drops from ~500 MB to 2.8 MB. The installed binary is 3.3 MB instead of ~200 MB of Perl plus dependencies. There are no CPAN modules to chase. You cargo install, you run a binary, that's it.

That's the headline. The interesting part is what actually made it fast. It wasn't one thing. It was the dumb stuff Perl couldn't do well, layered on top of a few good ideas.

What Rust gets you for free

A lot of the speedup is just what you get when you stop paying for an interpreter and a garbage-collected dynamic language. Tight loops over variant records compile to real machine code. Strings don't allocate when they don't need to. Parallelism is rayon and works; you don't fork ten Perl processes and reconstitute their output.

Thanks to agentic coding, this was manageable as one person's effort over a full month. It requires knowing exactly how the algorithm works in order to instruct the coding agents, and verifying extensively with tests and outputs. At its core, the algorithm is simple: the Sequence Ontology has 49 consequence terms; you map a variant's coordinates against a transcript and figure out which ones apply. The bottleneck in the Perl version is the Perl, not the algorithm.
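To make "map a variant's coordinates against a transcript" concrete, here is a heavily simplified sketch, not fastVEP's implementation. It assumes a single plus-strand transcript with sorted exons and only distinguishes a handful of region-level Sequence Ontology terms; the class, function, and the `outside_transcript` label are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Transcript:
    exons: List[Tuple[int, int]]  # 1-based inclusive, sorted, plus strand
    cds_start: int                # first coding base (genomic coordinate)
    cds_end: int                  # last coding base (genomic coordinate)


def snv_region_term(pos: int, tx: Transcript) -> str:
    """Return a coarse region-level term for a single-base variant."""
    tx_start, tx_end = tx.exons[0][0], tx.exons[-1][1]
    if pos < tx_start or pos > tx_end:
        # Placeholder label, not an SO term: real tools distinguish
        # upstream, downstream, and intergenic here.
        return "outside_transcript"
    if not any(start <= pos <= end for start, end in tx.exons):
        return "intron_variant"
    if pos < tx.cds_start:
        return "5_prime_UTR_variant"
    if pos > tx.cds_end:
        return "3_prime_UTR_variant"
    return "coding_sequence_variant"


toy_tx = Transcript(exons=[(100, 200), (300, 400), (500, 600)],
                    cds_start=150, cds_end=550)
for position in (50, 120, 250, 350, 580):
    print(position, snv_region_term(position, toy_tx))
```

A real predictor goes further (codon translation for missense and stop-gained calls, splice-site windows, minus-strand transcripts), but each step is still interval arithmetic of this kind.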

If you stop there, you get maybe 10–20x. The rest came from somewhere else.

The next real win: rebuilding the annotation lookup

VEP's slowest path is annotation lookup: pulling in ClinVar, gnomAD, dbSNP, COSMIC, all the supplementary databases that turn raw consequence into something a clinician can act on. The default workflow round-trips through SQLite or remote APIs. For a million variants, that's a million lookups, and every one of them costs more than the consequence prediction itself.

The fix is to put the annotations in a format designed for the access pattern. fastVEP has its own binary format called fastSA, and the v2 design is shamelessly inspired by echtvar (thanks to Brent Pedersen's work; credit where it's due). The key improvements, as I understand them:

  • Chunked ZIP layout with Var32 encoding for variant keys.
  • Parallel u32 value arrays per annotation field.
  • Delta encoding on sorted positions.
  • An LRU chunk cache, because variant lookups in a real VCF are clustered.
  • A Bloom filter in front of the index for negative lookups.
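Two of the ideas in the list above, delta encoding and a Bloom filter, are easy to show in isolation. This is a generic sketch of the techniques, not the fastSA format: the bit-array size, hash count, key string layout, and all names are assumptions.

```python
import hashlib
from itertools import accumulate
from typing import Iterator, List


def delta_encode(sorted_positions: List[int]) -> List[int]:
    """Keep the first position, then store each gap to the previous one."""
    if not sorted_positions:
        return []
    gaps = [b - a for a, b in zip(sorted_positions, sorted_positions[1:])]
    return [sorted_positions[0]] + gaps


def delta_decode(deltas: List[int]) -> List[int]:
    """Rebuild absolute positions with a running sum."""
    return list(accumulate(deltas))


class BloomFilter:
    """Probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, n_bits: int = 1024, n_hashes: int = 3) -> None:
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _indexes(self, key: str) -> Iterator[int]:
        # Derive several bit positions by salting one hash function.
        for i in range(self.n_hashes):
            digest = hashlib.blake2b(f"{i}:{key}".encode(), digest_size=8).digest()
            yield int.from_bytes(digest, "little") % self.n_bits

    def add(self, key: str) -> None:
        for idx in self._indexes(key):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[idx // 8] & (1 << (idx % 8))
                   for idx in self._indexes(key))


positions = [100, 105, 130, 131]
deltas = delta_encode(positions)
print(deltas)  # [100, 5, 25, 1]

bloom = BloomFilter()
bloom.add("chr1:100:A:G")
print(bloom.might_contain("chr1:100:A:G"))  # True
```

Small gaps compress far better than large absolute coordinates, and a "definitely absent" answer from the filter lets a lookup skip the index entirely, which matters because most variants in a typical VCF are absent from any given annotation source.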

Putting ClinVar, gnomAD, and dbSNP into this format and querying them as a single in-process call is most of what closes the gap on the heaviest workloads. You're not asking a database anymore. You're doing memory-mapped byte arithmetic.
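Why an LRU chunk cache pays off when lookups are clustered can be seen with a toy example. This sketch uses Python's `functools.lru_cache`; the chunk width, the loader, and the fake payload are all hypothetical stand-ins and say nothing about how fastVEP stores or decompresses chunks.

```python
from functools import lru_cache

CHUNK_SIZE = 1 << 20  # hypothetical chunk width in bases
load_count = 0        # counts how often a chunk is actually "decompressed"


@lru_cache(maxsize=8)
def load_chunk(chrom: str, chunk_id: int) -> dict:
    """Stand-in for reading one chunk; returns position -> annotation."""
    global load_count
    load_count += 1
    start = chunk_id * CHUNK_SIZE
    # Toy payload: one fake annotated position per chunk.
    return {start + 42: "toy_annotation"}


def lookup(chrom: str, pos: int):
    """Find the chunk holding pos, then look the position up inside it."""
    chunk = load_chunk(chrom, pos // CHUNK_SIZE)
    return chunk.get(pos)


# Four nearby positions share chunk 0, so only the first triggers a load.
for pos in (42, 50, 60, 99):
    lookup("chr1", pos)
lookup("chr1", CHUNK_SIZE + 42)  # a new chunk: second load

print(load_count)  # 2
```

Because a sorted VCF walks along each chromosome, consecutive variants tend to land in the same chunk, so even a small cache absorbs most of the work.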

What surprised me

A few things I didn't expect going in.

The FASTA handling matters more than I thought. You need the reference sequence for HGVS notation, and a naïve read of the GRCh38 primary assembly is enough to wreck your memory budget on its own. Memory-mapping the indexed FASTA and pulling spans on demand was the difference between "fastVEP runs on a laptop" and "fastVEP needs a server." Apparent simplicity hides this kind of thing; samtools faidx is doing a lot of work for you.
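The on-demand span fetch can be sketched with a memory map and the offsets a `.fai` index records (sequence length, byte offset of the first base, bases per line, bytes per line). This is a generic illustration in Python, not fastVEP's code; the toy FASTA and the `FaiEntry` and `fetch` names are assumptions.

```python
import mmap
import os
import tempfile
from typing import NamedTuple


class FaiEntry(NamedTuple):
    length: int      # sequence length in bases
    offset: int      # byte offset of the first base
    line_bases: int  # bases per line
    line_width: int  # bytes per line, including the newline


def fetch(mm: mmap.mmap, entry: FaiEntry, start: int, end: int) -> str:
    """Return bases [start, end) (0-based, half-open) of one sequence."""
    if not 0 <= start <= end <= entry.length:
        raise ValueError("span outside sequence")
    if start == end:
        return ""

    def byte_at(pos: int) -> int:
        # Skip whole lines (with their newlines), then step into the line.
        full_lines, within = divmod(pos, entry.line_bases)
        return entry.offset + full_lines * entry.line_width + within

    raw = mm[byte_at(start): byte_at(end - 1) + 1]
    return raw.replace(b"\n", b"").decode("ascii")


# Toy FASTA with 4 bases per line; the header ">toy\n" occupies 5 bytes.
fasta = b">toy\nACGT\nTTGA\nCC\n"
entry = FaiEntry(length=10, offset=5, line_bases=4, line_width=5)

with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(fasta)
    path = handle.name
try:
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        span = fetch(mm, entry, 2, 7)
finally:
    os.remove(path)

print(span)  # GTTTG
```

Only the pages touched by the slice are read from disk, so memory use tracks the spans requested rather than the size of the assembly.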

Structural variants are genuinely separate code. SNVs and short indels share a clean abstraction. <DEL>, <DUP>, <INV>, <BND> and the rest don't slot into it cleanly. I tried for a while to unify them, eventually gave up, and wrote a separate SV consequence predictor.

HGVS was the worst part. Generating correct HGVSc and HGVSp notation with 3' normalization across all the edge cases — overlapping CDS, mitochondrial circular coordinates, start-loss variants in non-Met-starting transcripts — required more test cases than the consequence engine itself. There's a reason VEP has been worked on for a decade. The annoying details are plenty and real.
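The core of 3' normalization for a deletion is a small loop: slide the deleted window toward the 3' end for as long as the resulting sequence stays the same. The sketch below handles only the plus-strand case on a plain string and is not fastVEP's implementation; for minus-strand transcripts the 3' direction is the opposite genomic direction. The function name and toy sequence are assumptions.

```python
def shift_deletion_3prime(ref: str, start: int, length: int) -> int:
    """Return the 3'-most start (0-based) for deleting `length` bases.

    Sliding the window one base to the right leaves the resulting
    sequence unchanged whenever the base leaving the window equals
    the base entering it.
    """
    end = start + length
    while end < len(ref) and ref[start] == ref[end]:
        start += 1
        end += 1
    return start


ref = "GCATATATC"
# Deleting "AT" at index 2 is ambiguous inside the ATATAT repeat.
shifted = shift_deletion_3prime(ref, start=2, length=2)
print(shifted)  # 6

# Both placements produce the same resulting sequence.
before_shift = ref[:2] + ref[4:]
after_shift = ref[:shifted] + ref[shifted + 2:]
print(before_shift, after_shift)  # GCATATC GCATATC
```

The edge cases named above are where this simple loop stops being enough: the shift has to respect exon boundaries, circular coordinates, and the transcript's own orientation.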

Correctness

A faster but wrongly annotated VCF isn't useful. fastVEP is validated against VEP's output on shared test sets and matches on the consequences that matter. The repo has 233 tests across the workspace, not because that number is magic, but because every annoying HGVS edge case eventually became one. If you find a case where fastVEP disagrees with VEP and you think VEP is right, open an issue. Let me know here!
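A concordance check of this kind can be as simple as comparing consequence sets per variant. The sketch below is an assumption about how such a comparison might be structured, not the repository's test suite; the variant keys and consequence calls are toy values.

```python
from typing import Dict, FrozenSet, Tuple

Variant = Tuple[str, int, str, str]           # (chrom, pos, ref, alt)
Calls = Dict[Variant, FrozenSet[str]]         # variant -> consequence terms


def concordance(reference: Calls, candidate: Calls):
    """Return the agreement rate on shared variants and the mismatches."""
    shared = reference.keys() & candidate.keys()
    mismatches = {
        key: (reference[key], candidate[key])
        for key in shared
        if reference[key] != candidate[key]
    }
    rate = 1.0 if not shared else 1 - len(mismatches) / len(shared)
    return rate, mismatches


# Toy calls, not real tool output.
reference_calls: Calls = {
    ("chr1", 100, "A", "G"): frozenset({"missense_variant"}),
    ("chr1", 200, "C", "T"): frozenset({"intron_variant"}),
    ("chr2", 300, "G", "A"): frozenset({"synonymous_variant"}),
    ("chr2", 400, "T", "C"): frozenset({"stop_gained"}),
}
candidate_calls: Calls = dict(reference_calls)
candidate_calls[("chr2", 300, "G", "A")] = frozenset({"missense_variant"})

rate, mismatches = concordance(reference_calls, candidate_calls)
print(rate, sorted(mismatches))
```

Listing the mismatching variants, rather than reporting only a rate, is what turns a disagreement into a reproducible test case.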

Try it

It's on GitHub at Huang-lab/fastVEP, Apache 2.0. There's a hosted web version at fastVEP.org if you want to paste in some VCF and see what it does. If you have Rust installed, it's a single cargo install away.

It works on yeast, fly, Arabidopsis, mouse, human, anything with a GFF3. The web server can switch between organisms if you point it at a directory of them. The preprint is on bioRxiv. If it saves your group some compute time, that's the point and I'm glad :)

4 min read
203
Kuan‐lin Huang · Icahn School of Medicine at Mount Sinai · May 15, 2026
Repost

Check out the JRNLClub demo to see what you can do here: https://youtu.be/tc_tdoC9LpI?si=1qtEiZ5pRpUEIL2t

Reposted
Kuan‐lin Huang · Icahn School of Medicine at Mount Sinai · May 14, 2026

Hello World, JRNLClub!

Open →
149
Kuan‐lin Huang · Icahn School of Medicine at Mount Sinai · May 14, 2026

Hello World, JRNLClub!

127
α¹
Alpha1 Science Editorial · May 12, 2026
New paper · auto

Top 1% Most Discussed biorxiv Preprints Added

Preprint· Alpha1

IRES-TrAPPr reveals novel insights into viral and cellular mRNA translation

biorxiv · May, G. E., McManus et al.

Open on Alpha1 →
31
α¹
Alpha1 Science Editorial · May 12, 2026
New paper · auto

Top 1% Most Discussed biorxiv Preprints Added

Preprint· Alpha1

Turnip mosaic virus-based gRNA delivery system for plant genome editing

biorxiv · Khwanbua, E., Lappe et al.

Open on Alpha1 →
29
α¹
Alpha1 Science Editorial · May 12, 2026
New paper · auto

Top 1% Most Discussed biorxiv Preprints Added

Preprint· Alpha1

scLASER: a robust framework for simulating and detecting time-dependent single-cell dynamics in longitudinal studies

biorxiv · Vanderlinden, L. A., Vargas et al.

Open on Alpha1 →
28
α¹
Alpha1 Science Editorial · May 12, 2026
New paper · auto

Top 1% Most Discussed biorxiv Preprints Added

Preprint· Alpha1

Complete biosynthesis of penicillin G in Nicotiana benthamiana

biorxiv · Rawoof, A., Lin et al.

Open on Alpha1 →
33
α¹
Alpha1 Science Editorial · May 12, 2026
New paper · auto

Top 1% Most Discussed biorxiv Preprints Added

Preprint· Alpha1

Intermolecular 3'UTR-3'UTR interactions drive Wnt gene activation through heteromeric protein assembly

biorxiv · Cai, T., Cruz et al.

Open on Alpha1 →
30
α¹
Alpha1 Science Editorial · May 12, 2026
New paper · auto

Top 1% Most Discussed biorxiv Preprints Added

Preprint· Alpha1

Iridescence in pterosaur pycnofibers and the evolution of integumentary coloration

biorxiv · wu, Z., D'Alba et al.

Open on Alpha1 →
25
α¹
Alpha1 Science Editorial · May 5, 2026
New paper · auto

Top 1% Most Discussed biorxiv Preprints Added

Preprint· Alpha1

One-pot parallel Sidewinder construction from oligo pools

biorxiv · Robinson, N. E., Paul et al.

Open on Alpha1 →
26
α¹
Alpha1 Science Editorial · May 5, 2026
New paper · auto

Top 1% Most Discussed biorxiv Preprints Added

Preprint· Alpha1

Polymeric mechanism of enhancer-promoter cooperativity in transcriptional bursting

biorxiv · YAMAMOTO, T., Kawasaki et al.

Open on Alpha1 →
30
α¹
Alpha1 Science Editorial · May 5, 2026
New paper · auto

Top 1% Most Discussed biorxiv Preprints Added

Preprint· Alpha1

Cooling fast and slow: Characterising the effects of vitrification in cryo-EM and the subsequent recovery of equilibrium populations

biorxiv · Clark, R., Smith et al.

Open on Alpha1 →
25
α¹
Alpha1 Science Editorial · May 5, 2026
New paper · auto

Top 1% Most Discussed biorxiv Preprints Added

Preprint· Alpha1

AI-guided discovery of atypical protein assemblies

biorxiv · Toghani, A., Seager et al.

Open on Alpha1 →
24
α¹
Alpha1 Science Editorial · May 5, 2026
New paper · auto

Top 1% Most Discussed biorxiv Preprints Added

Preprint· Alpha1

Resolving human neuronal herpesvirus reactivation via petabase-scale association studies

biorxiv · Gutierrez, J. C., Chen et al.

Open on Alpha1 →
22
α¹
Alpha1 Science Editorial · Apr 29, 2026
New paper · auto

Top 1% Most Discussed biorxiv Preprints Added

Preprint· Alpha1

Generative design of sequence specific DNA binding proteins

biorxiv · Sehgal, E., Politanska et al.

Open on Alpha1 →
23
JRNLClub

Where scientists connect.

est. 2026

© 2026 JRNLClub. All rights reserved.