Sep 25, 2025
Introduction
At BEIT, we specialize in pushing the boundaries of molecular modeling by blending high-performance classical computing with innovations inspired by quantum computing. Our interdisciplinary team in Kraków has achieved two exciting milestones: (1) running the first-ever molecular dynamics simulation of a protein on a wafer-scale engine and (2) developing a hybrid quantum-classical simulation framework for covalent drug-receptor interactions. These advances, achieved without resorting to hype or buzzwords, showcase how novel computing architectures and algorithms can accelerate drug discovery in very tangible ways.
In this blog post, we’ll dive into how we performed an ultrafast simulation of an antimicrobial peptide on the Cerebras Wafer-Scale Engine (WSE) and how our “Project Angelo” is marrying classical simulations with quantum computing techniques to model drug binding more accurately. Let’s explore how these breakthroughs came about and why they matter for the future of molecular modeling.
Breaking the Molecular Dynamics Timescale Barrier
Molecular dynamics (MD) simulations are a cornerstone of computational biology and drug design, allowing us to simulate the motion of atoms in a molecule (or collection of molecules) over time. However, MD is notoriously time-consuming: simulating even a few microseconds of real time can take days or weeks on a traditional supercomputer. This is because MD requires tiny time steps (on the order of femtoseconds) to capture atomic vibrations, meaning millions of steps are needed to reach biologically relevant timescales[1]. Even the world’s fastest supercomputers struggle with this - for instance, running a large biomolecular simulation on the exascale Frontier supercomputer might produce only microseconds of simulated time in a month[2].
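To make the arithmetic concrete, here is a quick back-of-the-envelope calculation, assuming a typical 2 fs time step (an assumption on our part; the text above says only "on the order of femtoseconds"):

```python
# Back-of-the-envelope: MD steps needed to reach a biological timescale,
# assuming a typical 2 fs time step.
timestep_fs = 2.0
target_us = 1.0                               # one microsecond of simulated time

steps_needed = target_us * 1e9 / timestep_fs  # 1 us = 1e9 fs
print(f"{steps_needed:,.0f} steps")           # 500,000,000 steps
```

Half a billion sequential steps for a single microsecond is why strong scaling, not just raw FLOPS, dominates MD performance.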
That’s where wafer-scale computing enters the scene. The Cerebras Wafer-Scale Engine (WSE) is a massively parallel processor built on an entire silicon wafer, integrating hundreds of thousands of cores on one chip. Our team leveraged the latest Cerebras WSE-3, which contains approximately 900,000 compute cores and 44 GB of on-chip memory, delivering an unprecedented 21 petabytes per second of memory bandwidth[3]. This unique architecture also provides ultra-fast core-to-core communication - essentially a 2D grid network across the wafer - with latencies of only a single clock cycle between neighboring cores[4][5]. In practical terms, a wafer-scale engine can exchange data between hundreds of thousands of parallel threads far more efficiently than a cluster of GPU nodes. In fact, a single wafer-scale processor has demonstrated MD performance hundreds of times faster than a top supercomputer for certain large systems[6]. However, the systems benchmarked on the WSE so far have been limited to quasi-two-dimensional slabs of identical atoms modeling pure tungsten metal, of limited use in the biochemical sciences and drug discovery. We took on the challenge of extending the WSE’s usefulness by simulating a genuine 3D biochemical system.
<small>(Videos: carbon nanotube - WSE simulation | 6A5J peptide - WSE simulation)</small>
First Peptide Simulation on a Wafer‑Scale Engine
We set out to harness this wafer-scale power for a biological molecule - something that, to our knowledge, had never been done before on this platform. The test case we chose was L-K6, a small antimicrobial peptide (13 amino acids) known for its potent activity. L-K6 is an analog of the frog peptide temporin-1CEb (isolated from the skin of the Chinese brown frog Rana chensinensis)[7]. In other words, it’s a real biologically active mini-protein, making it an interesting target for simulation. Using a custom MD engine we developed (internally dubbed “WaferMol”), we successfully ran an MD simulation of L-K6 on the Cerebras WSE-3 - marking the first time a protein’s dynamics have been simulated on a wafer-scale processor.
How did we map a 3D molecule onto a 2D wafer? This was a key innovation. We designed efficient algorithms to embed the 3D structure of the molecule onto the 2D grid of cores on the wafer. Essentially, each atom in the peptide can be assigned to a processor core, and we organize communication such that each core exchanges information with cores representing neighboring atoms (within the force interaction cutoff). By orchestrating communication in “neighborhood multicast” patterns (broadcasting atomic coordinates in waves across the grid)[5], we ensure that all the forces on each atom can be computed in parallel without network bottlenecks. This approach takes advantage of the WSE’s capabilities where every core can talk to its neighbors with minimal latency and without large message overhead[8][9].
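As a rough cartoon of this embedding idea (a toy sketch with hypothetical names, not the actual WaferMol algorithm), one can picture binning atoms into a 2D grid of cores and letting each core exchange coordinates with its 3×3 neighborhood, so that all cutoff-range pairs can be evaluated locally:

```python
import numpy as np

# Toy sketch of the 3D-to-2D embedding idea (hypothetical names, not the
# actual WaferMol algorithm): bin atoms into a 2D grid of "cores" by their
# x/y position, then let each core exchange coordinates with its 3x3
# neighborhood so cutoff-range pairs can be evaluated locally.

def assign_atoms_to_grid(coords, grid_shape, box):
    """Map 3D coordinates onto a 2D core grid by x/y position."""
    gx, gy = grid_shape
    ix = np.clip((coords[:, 0] / box * gx).astype(int), 0, gx - 1)
    iy = np.clip((coords[:, 1] / box * gy).astype(int), 0, gy - 1)
    return ix, iy

def neighbor_cells(ix, iy, grid_shape):
    """Cells in a core's 3x3 multicast neighborhood (periodic grid)."""
    gx, gy = grid_shape
    return [((ix + dx) % gx, (iy + dy) % gy)
            for dx in (-1, 0, 1) for dy in (-1, 0, 1)]

rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 10.0, size=(100, 3))    # 100 atoms in a 10 A box
ix, iy = assign_atoms_to_grid(coords, (8, 8), box=10.0)
print(len(neighbor_cells(ix[0], iy[0], (8, 8))))  # 9 cells per multicast
```

The real scheme must also handle the z-dimension and load balancing, but the key property survives even in the cartoon: every data exchange stays between physically adjacent cores.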
The result: our wafer-scale MD simulation achieved extremely high speed. In initial runs, we reached on the order of 10,000 simulation steps per second for the L-K6 peptide - a rate that would be extremely difficult on conventional hardware for such a system size. For perspective, that means we can simulate roughly 10 picoseconds of molecular motion every second of wall-clock time. Moreover, we see a lot of room for optimization; with further tuning, we project that we can improve this performance by as much as 50×. That would push the simulation rate into the hundreds of thousands of steps per second, moving us closer to the goal of millisecond-scale biomolecular simulations in hours or days instead of years[2].
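The throughput numbers above translate into simulated time as follows (this sketch assumes a 1 fs step, consistent with the "10 picoseconds per second" figure in the text):

```python
# Converting the quoted step rate into simulated time, assuming a 1 fs step.
steps_per_sec = 10_000
timestep_fs = 1.0

sim_ps_per_sec = steps_per_sec * timestep_fs / 1_000        # fs -> ps
sim_ns_per_day = sim_ps_per_sec * 86_400 / 1_000            # ps/s -> ns/day
print(f"{sim_ps_per_sec} ps/s, {sim_ns_per_day} ns/day")    # 10.0 ps/s, 864.0 ns/day

projected_ns_per_day = sim_ns_per_day * 50                  # the projected 50x tuning
print(projected_ns_per_day)                                 # 43200.0 ns ~ 43 us/day
```

At the projected rate, a millisecond of dynamics would take on the order of weeks rather than decades - hence the optimism about millisecond-scale simulations.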
It’s worth noting that as we scale to larger molecular systems, the absolute step rate will naturally decrease (because more atoms per step means more total work and communication). Even so, wafer-scale architecture fundamentally alters the strong-scaling curve for MD. To illustrate scalability, we also ran our code on a carbon nanotube model (~840 atoms) and even a massive graphene sheet with ∼840,000 atoms - the latter approaching the upper limits of the WSE’s peak performance capacity. Both could be simulated stably, and the 840k-atom graphene benchmark in particular highlighted how the WSE can handle system sizes that normally require large supercomputers. (In fact, 800k+ atom MD simulations on a WSE have been shown to run hundreds of times faster than on traditional HPC clusters[6].) These tests give us confidence that WaferMol can tackle not only small proteins but also large biomolecular assemblies or materials, all on a single wafer-scale chip.
<small>(We’ve included a short video clip of the carbon nanotube and graphene simulations as supplemental material in this post - demonstrating the kinds of systems and dynamics we can explore with wafer-scale MD.)</small>
Towards Faster Drug Discovery - Why This Matters
Beyond breaking speed records for the sake of it, why do we care about running MD this fast? The ultimate motivation is to accelerate drug discovery. One particularly challenging and crucial task in early-stage drug design is computing binding free energies - basically, how strongly a given drug molecule binds to its target protein. These free energy calculations (for example, via free energy perturbation (FEP) or related methods) are considered a gold standard for predicting drug potency, but they are computationally intensive. A single drug-protein binding free energy estimate can require days of MD sampling and extensive averaging to get a reliable result. This slowness has limited the use of FEP in practice, despite its high accuracy.
By dramatically increasing MD throughput with wafer-scale hardware, we open the door to doing many binding free energy calculations in parallel or in rapid succession. Our goal is to achieve “fast FEP” - turning what used to be a multi-day supercomputing task into something that might be done overnight or faster. This could enable, for instance, virtual screening of many drug candidates with rigorous physics-based scoring rather than resorting to cheaper, less accurate approximations. In the long run, being able to quickly compute binding free energies means drug designers can iterate faster and focus on the most promising compounds with confidence. In short, faster MD = more opportunities to find the needle in the haystack in drug discovery.
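For readers unfamiliar with FEP, the core estimator is the Zwanzig exponential-averaging formula. Here is a minimal sketch with synthetic energy differences (in a real FEP run, the dU samples come from MD frames of state A, usually across many intermediate lambda windows):

```python
import numpy as np

# Zwanzig free energy perturbation estimator:
#   dF = -kT * ln < exp(-(U_B - U_A) / kT) >_A
# The energy differences below are synthetic; a real FEP calculation
# averages dU over MD frames sampled in state A.
kT = 0.596                                          # kcal/mol near 300 K

rng = np.random.default_rng(1)
dU = rng.normal(loc=1.0, scale=0.5, size=100_000)   # synthetic U_B - U_A

dF = -kT * np.log(np.mean(np.exp(-dU / kT)))
print(round(float(dF), 2))                          # kcal/mol
```

Because the estimator is an exponential average, it is dominated by rare low-energy frames - which is exactly why FEP demands so much sampling, and why faster MD helps so directly.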
Project Angelo: Marrying Classical and Quantum Simulations
While wafer-scale engines tackle the speed problem for classical simulations, some challenges in molecular modeling require more than just speed - they require new levels of accuracy. Specifically, when chemical bonds are made or broken (as in many drug mechanisms), classical physics alone can falter. Covalent drugs (which form a covalent bond to their target) are a prime example: to model the bond formation properly, you need to use quantum mechanics explicitly for the electrons. This is where our second major initiative comes in: Project Angelo, a hybrid quantum-classical computational suite for accurately and efficiently modeling covalent drug-protein binding.
In Project Angelo, we combine the best of both worlds: classical molecular mechanics (fast and scalable) for most of the system, and quantum mechanics (accurate for chemical reactions) for the key region where a bond forms. In practice, this looks like a QM/MM setup: the protein and bulk solvent are treated with classical force fields, while the drug and the amino acid it binds to (and perhaps a few neighboring atoms) are treated with a quantum chemical calculation. The twist is that we’re pushing this concept further by incorporating actual quantum computing into the workflow for the quantum part, alongside classical quantum chemistry methods.
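One common way to assemble such a combined energy is the subtractive, ONIOM-style expression (the post does not state which coupling scheme Angelo uses, so treat this as a generic illustration with placeholder energy functions):

```python
# Subtractive (ONIOM-style) QM/MM energy, one common coupling scheme:
#   E_total = E_MM(full) + E_QM(region) - E_MM(region)
# The energy functions are placeholders standing in for real engines.

def e_mm(atoms):        # classical force-field energy (placeholder)
    return -0.5 * len(atoms)

def e_qm(atoms):        # quantum-chemical energy (placeholder)
    return -0.7 * len(atoms)

full_system = list(range(5000))   # protein + solvent + ligand atoms
reactive = full_system[:30]       # e.g. drug warhead + cysteine side chain

e_total = e_mm(full_system) + e_qm(reactive) - e_mm(reactive)
print(e_total)  # -2506.0
```

The subtraction removes the cheap classical description of the reactive region and replaces it with the expensive accurate one, while everything else stays classical.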
Covalent Binding Use-Case: A BTK Inhibitor
Our proof-of-concept target in Angelo is Bruton’s Tyrosine Kinase (BTK): specifically, a recent covalent inhibitor cancer drug (whose identity we keep confidential) binding to a cysteine residue in the BTK active site. BTK is a protein involved in cancer growth pathways, and the inhibitor forms a covalent bond with a cysteine in the binding pocket. Understanding the reaction pathway and energy landscape of that covalent bond formation is crucial: it tells us how favorable the reaction is and what the activation energy (barrier) is, and thereby how quickly the drug will bond (which relates to its efficacy). This is a perfect testbed for our hybrid approach - it has a clear quantum piece (formation of an S-C chemical bond) in a complex protein environment.
Early Achievements in Angelo
Even though full integration of the system is ongoing, we’ve already made significant progress on the quantum computing side of Angelo. Here are some key achievements so far in our quantum-classical journey:
Simulating a chemical reaction with VQE: We used the Variational Quantum Eigensolver (VQE) algorithm to calculate electronic energies along the reaction path of our model system (a Michael addition, representative of the cysteine-drug bonding). We employed a UCCSD ansatz - a quantum circuit based on the Unitary Coupled Cluster method with single and double excitations, a chemically informed way to approximate the molecular electronic wavefunction - alongside other proprietary ansätze of our own design. These VQE simulations were run on classical hardware using NVIDIA’s CUDA-Q library, essentially leveraging GPUs as a quantum circuit simulator. By stepping along the reaction coordinate and running the quantum solver at each point, we mapped out an energy profile (think of it as a quantum-derived potential energy surface) for the bond formation. This is like running a quantum chemistry calculation, but formulated as an algorithm that could eventually run on a quantum computer. We ensure the scalability of this approach via a novel methodology that splits an $n$-qubit unitary operation into multiple operations on fewer qubits, with some extra classical post-processing to tie the results together.
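To illustrate the VQE loop itself (a deliberately tiny toy model - a hypothetical one-qubit avoided-crossing Hamiltonian, not our UCCSD/CUDA-Q setup), one can minimize the energy expectation at each point of a reaction-coordinate scan:

```python
import numpy as np
from scipy.optimize import minimize

# Toy VQE: at each point r of a reaction coordinate, minimize
# <psi(theta)|H(r)|psi(theta)> over a one-parameter ansatz. H(r) is a
# hypothetical two-level avoided-crossing model, not our real system.

def hamiltonian(r):
    return np.array([[r - 1.0, 0.3],
                     [0.3, 1.0 - r]])

def energy(theta, H):
    psi = np.array([np.cos(theta[0] / 2), np.sin(theta[0] / 2)])  # RY-style ansatz
    return psi @ H @ psi

scan = np.linspace(0.0, 2.0, 5)
profile = [minimize(energy, x0=[0.1], args=(hamiltonian(r),)).fun for r in scan]

# Sanity check: the variational minimum matches the exact ground state,
# because the real RY ansatz can reach any real 2-vector.
exact = [np.linalg.eigvalsh(hamiltonian(r)).min() for r in scan]
print(np.allclose(profile, exact, atol=1e-5))  # True
```

The collected `profile` is the toy analogue of the quantum-derived potential energy surface described above - one variational minimization per scan point.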
A new quantum-classical system partitioning technique based on multiparticle entanglement: We devised a partitioning scheme that combines Density Matrix Embedding Theory (DMET) with quantum-information methods to find an optimal division between the classical and quantum calculations, and hence the most efficient use of available quantum resources. Our multiparticle-entanglement measures are interwoven with the DMET methodology and state-of-the-art QM/MM coupling methods, which we expect to yield a new state-of-the-art methodology.
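As a cartoon of entanglement-guided partitioning (a hypothetical illustration, not our DMET-based scheme), one can compute the bipartite entanglement entropy of a state vector across each possible cut and place the quantum/classical boundary where the entanglement is lowest:

```python
import numpy as np

# Hypothetical illustration of entanglement-guided partitioning: compute
# the von Neumann entropy across each bipartition of a pure n-qubit state
# and cut where entanglement is lowest, so the fragment is most decoupled.

def entanglement_entropy(state, n_left, n_qubits):
    """Entropy of the reduced state of the first n_left qubits."""
    psi = state.reshape(2 ** n_left, 2 ** (n_qubits - n_left))
    p = np.linalg.svd(psi, compute_uv=False) ** 2   # Schmidt weights
    p = p[p > 1e-12]
    return float(abs((p * np.log2(p)).sum()))

n = 4
# Product of a Bell pair on qubits (0,1) and a Bell pair on qubits (2,3):
bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
state = np.kron(bell, bell)

entropies = [entanglement_entropy(state, k, n) for k in (1, 2, 3)]
print([round(e, 3) for e in entropies])  # [1.0, 0.0, 1.0] -> cut after qubit 2
```

Cutting between the two Bell pairs (entropy 0) costs nothing, while cutting inside a pair (entropy 1) would force the classical side to describe genuine quantum correlations - the intuition behind entanglement-aware partitioning.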
High-performance quantum circuit simulation: To make the above possible, we benchmarked and optimized our state-vector simulations for circuits up to 24 qubits in size. While 24 qubits might sound small, simulating a full quantum state vector of that size is quite demanding (the state vector has $2^{24}$ amplitudes, i.e., ~16 million complex numbers). By using GPU acceleration and careful memory management (through CUDA-Q, access to the latest NVIDIA hardware, and custom code), we are well positioned to push our simulations further. This capability will let us test more complex ansätze or bigger active spaces in the near future. In fact, in earlier benchmarks we simulated certain circuits for systems of 100+ qubits using tensor-network techniques on GPUs (though those were special cases). The upshot: we have a robust classical simulation pipeline to support our quantum algorithm development.
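The memory arithmetic behind these numbers is simple but unforgiving - each extra qubit doubles the state vector:

```python
# Full state-vector memory: 2^n complex128 amplitudes, 16 bytes each;
# every additional qubit doubles the footprint.
for n in (24, 30, 40):
    n_amp = 2 ** n
    gib = n_amp * 16 / 2 ** 30
    print(f"{n} qubits: {n_amp:,} amplitudes, {gib:,.2f} GiB")
# 24 qubits: 16,777,216 amplitudes, 0.25 GiB
# 30 qubits: 1,073,741,824 amplitudes, 16.00 GiB
# 40 qubits: 1,099,511,627,776 amplitudes, 16,384.00 GiB (~16 TiB)
```

This exponential wall is why full state vectors stop near the low tens of qubits and why tensor-network techniques are needed to reach 100+ qubits for special-structure circuits.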
Preparing to cut large quantum circuits: We are now implementing a novel technique for circuit reduction. This approach, co-developed with our academic collaborators, is a form of quantum circuit cutting (or fragmentation) and promises a significant win for the near-term applicability of quantum computing in drug discovery.
Hardware-efficient ansätze: In parallel, we’re tailoring our quantum circuits to real hardware constraints. We’ve started implementing hardware-efficient ansätze, which are types of quantum circuits designed to respect the connectivity and gate limitations of actual quantum processors. These circuits often use repetitive layers of parameterized single-qubit rotations and entangling gates in a pattern that a given hardware (like a superconducting or trapped-ion device) can execute with high fidelity. By optimizing our ansätze for specific quantum hardware, we increase the chance that our VQE runs will succeed when we deploy them on a true quantum processing unit (QPU). After all, an algorithm is only as good as its execution on the real machine - so this step is crucial for moving from simulation to reality.
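The generic pattern looks like this (a self-contained sketch of the layered RY + CNOT structure on a simulated state vector, not our production circuits):

```python
import numpy as np

# Generic hardware-efficient ansatz sketch: alternating layers of
# parameterized RY rotations and a line of CNOTs matching nearest-neighbor
# connectivity, applied to an n-qubit state vector.

def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def apply_1q(state, gate, q, n):
    """Apply a 2x2 gate to qubit q of an n-qubit state vector."""
    psi = state.reshape([2] * n)
    psi = np.moveaxis(np.tensordot(gate, psi, axes=([1], [q])), 0, q)
    return psi.reshape(-1)

def apply_cnot(state, control, target, n):
    """CNOT: flip the target qubit wherever the control qubit is |1>."""
    psi = state.reshape([2] * n).copy()
    idx = [slice(None)] * n
    idx[control] = 1
    t = target - 1 if target > control else target  # axis index after slicing
    psi[tuple(idx)] = np.flip(psi[tuple(idx)], axis=t).copy()
    return psi.reshape(-1)

def ansatz(params, n_qubits, layers):
    state = np.zeros(2 ** n_qubits); state[0] = 1.0   # start from |0...0>
    p = iter(params)
    for _ in range(layers):
        for q in range(n_qubits):                     # rotation layer
            state = apply_1q(state, ry(next(p)), q, n_qubits)
        for q in range(n_qubits - 1):                 # entangling layer (line)
            state = apply_cnot(state, q, q + 1, n_qubits)
    return state

psi = ansatz(np.linspace(0.1, 1.0, 6), n_qubits=3, layers=2)
print(round(float(np.vdot(psi, psi).real), 6))  # 1.0 (unitary circuit keeps norm)
```

The linear CNOT chain mirrors the nearest-neighbor connectivity of many superconducting devices; on hardware with different topology, the entangling layer would be rearranged to match.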
It’s important to highlight that Angelo isn’t just about quantum algorithms in isolation - it’s about the integration of quantum and classical. For example, once we identify a reaction’s transition state and pathway via VQE (quantum part), we feed that information back into classical simulations (to, say, initialize MD runs or refine a force field around the transition state). Likewise, classical MD can sample various protein conformations which then need quantum evaluation for bond formation energetics in those states. A lot of our ongoing work is building the glue between these components, such as coupling terms between the QM and MM regions, efficient ways to update forces, and orchestrating computations across heterogeneous platforms. It’s a complex dance, but one that we believe will pay off by combining the strengths of each approach.
As part of our mission, we provide a unique roadmap, built on our IP-protected quantum algorithms designed for future fault-tolerant quantum computers, that will enable simulations of the biochemical processes modeled in Angelo and our other projects at unprecedented scale, speed, and accuracy. In doing so, we offer our partners and collaborators the best position on the classical computing end, the most efficient use of existing quantum devices, and a head start on future fault-tolerant machines. See our publications highlighting this roadmap:
M. Szczepanik and E. Zak; Utilizing redundancies in qubit Hilbert space to reduce entangling gate counts in the unitary vibrational coupled-cluster method. The Journal of Chemical Physics 163 (2) (2025)
S. Pliś and E. Zak; Quantum Discrete Variable Representations. arXiv:2504.15841 (2025)
E. Zak, J. Küpper and A. Yachmenev; Controlling rovibrational state populations of polar molecules in inhomogeneous electric fields of the Stark deceleration: molecular dynamics and quantum chemistry simulations. arXiv:2506.04798 (2025)
A New Chapter for Molecular Modeling
Taken together, these developments represent a new chapter in molecular modeling. On one hand, we have the raw speed of wafer-scale classical computing tackling problems that were previously intractable due to time constraints. On the other, we have the nuanced accuracy of quantum-enhanced methods tackling problems that were intractable due to electronic complexity. Bringing these two together - fast classical simulations and targeted quantum computations - is a powerful strategy for revolutionizing drug discovery workflows.
From a startup innovation perspective, it’s also a story about being agile and interdisciplinary. We’ve had to bring together expertise in computational chemistry, software engineering, high-performance computing, and quantum algorithms under one roof. It’s not every day that you find people discussing molecular force fields and qubit counts in the same meeting, but that’s exactly the kind of cross-pollination we encourage at BEIT. By avoiding the usual buzzword-laden approach and focusing on technical merit, we’ve managed to attract collaborators and partners who are genuinely excited about solving these hard problems with us.
What’s next? On the wafer-scale side, we’ll be scaling up simulations to larger proteins and perhaps designing new wafer-friendly algorithms for things like free energy perturbation (FEP) calculations directly. We’re also interested in exploring multi-wafer setups (imagine several WSE chips working in tandem) and hybrid AI+MD models where machine learning might steer simulations - the wafer can potentially handle AI workloads too. On the Angelo side, the next milestone is to run our quantum circuits on actual quantum hardware. We’re collaborating with quantum computing companies and plan to test our hybrid computing approach on a high-quality QPU (e.g. IonQ’s trapped-ion system or others) with a modest number of qubits, to see experimental results for our chemical system. This will be carried out alongside further QM/MM integration and eventually higher-level applications like screening libraries of covalent inhibitors (where we generate candidate molecules and evaluate them with our hybrid model).
In summary, we believe these innovations will open the way to faster and more accurate drug discovery - where simulations can truly guide decisions in real time. It’s an ambitious vision, but the pieces are coming together. If you’re as excited as we are about the future of molecular modeling - whether you’re a computational scientist, a drug hunter, or a technologist - we invite you to reach out and follow our journey. By combining cutting-edge hardware with novel algorithms, we’re aiming to transform how new medicines and materials are discovered. The era of wafer-scale molecular dynamics and quantum-enabled chemistry is just beginning, and we’re thrilled to be at the forefront of it.
Let’s shape the future of molecular modeling together. 🚀
Sources: Our technical claims and data are supported by internal R&D results and external benchmarks. For instance, the Cerebras WSE-3 specifications (900k cores, 21 PB/s bandwidth) can be found in Cerebras documentation[3], and its record-breaking MD performance (1.1 million steps/s on 800k atoms) was reported in collaboration with national labs[6]. The L-K6 peptide is described in the RCSB Protein Data Bank[7]. Details of our Project Angelo accomplishments are documented in our project reports. These sources underscore the transformative potential of the approaches discussed.
[1] [2] [4] [5] [6] [8] [9] Cerebras Breaks Exascale Record for Molecular Dynamics Simulations - Cerebras
https://www.cerebras.ai/blog/cerebras-breaks-exascale-record-for-molecular-dynamics-simulations
[7] 6a5j - Solution NMR structure of small peptide - Summary - Protein Data Bank Japan (PDBj)