The Power of AI in Early Cancer Detection
At Harbinger Health, we believe that artificial intelligence (AI) is one of the most powerful tools available to transform how we detect cancer, enabling earlier detection, with greater accuracy, and for more people. Today, most cancers are diagnosed after symptoms appear, often when the disease is already advanced. That delay can significantly impact survival. Early detection greatly improves outcomes, yet current screening tools cover only a handful of cancers and aren’t always accessible.
We’re working to change that. By pairing our proprietary advanced AI technologies with foundational biology, we are developing blood-based tests capable of detecting cancer in its earliest stages. These tests can be performed from a simple blood draw, making them easy to administer as part of routine care.
Importantly, by improving specificity, our AI-driven tests can help minimize false positives, reducing the likelihood of unnecessary follow-up tests, imaging, or invasive procedures. This has the potential to ease the emotional burden on patients and avoid interventions that may not be clinically necessary.
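To make the link between specificity and false positives concrete, the short calculation below estimates expected screening outcomes in a population. The population size, prevalence, sensitivity, and specificity values are hypothetical placeholders chosen for illustration, not Harbinger Health performance figures. The sketch shows that each percentage point of specificity can matter a great deal, because the cancer-free group is so much larger than the cancer group.

```python
def screening_outcomes(n_screened, prevalence, sensitivity, specificity):
    """Expected screening outcomes for a population (all inputs are hypothetical)."""
    n_cancer = n_screened * prevalence
    n_no_cancer = n_screened - n_cancer
    true_positives = n_cancer * sensitivity
    # Every cancer-free person has a (1 - specificity) chance of a false positive
    false_positives = n_no_cancer * (1.0 - specificity)
    # Positive predictive value: share of positive results that are true cancers
    ppv = true_positives / (true_positives + false_positives)
    return {
        "true_positives": true_positives,
        "false_positives": false_positives,
        "ppv": ppv,
    }


# Hypothetical inputs for illustration only
for spec in (0.95, 0.99, 0.995):
    r = screening_outcomes(100_000, prevalence=0.01, sensitivity=0.5, specificity=spec)
    print(f"specificity={spec}: ~{r['false_positives']:.0f} false positives, PPV={r['ppv']:.2f}")
```

With these toy inputs, moving from 99% to 99.5% specificity halves the number of cancer-free people sent for follow-up, while the number of true positives stays the same.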
This shift from reactive to proactive medicine has the potential to save lives, reduce the burden of treatment, and make cancer detection more equitable. Our AI enables us to detect signals that traditional tools may miss. That’s why it’s central to our mission of making early cancer detection accessible, affordable, and accurate for everyone.
Understanding AI in Cancer Detection
AI is transforming how we approach cancer detection by enabling us to analyze and interpret complex biological data in ways that were previously impossible. At Harbinger Health, we use a non-invasive approach known as liquid biopsy: we take a blood sample from an individual, extract cell-free DNA (cfDNA), and process it through our assay, generating vast amounts of genomic and epigenomic data. For example, we generate sequence information for more than 200 million DNA fragments per individual, and from these we extract information such as DNA methylation states and fragment structural characteristics. We then use proprietary AI models to uncover patterns in these data that indicate whether the individual has signals that may suggest the presence of cancer. Importantly, the AI can process millions of data points in a fraction of a second, enabling us to sift through all 200 million fragments to identify potentially tumor-derived signals.
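Before any model sees the data, fragment-level measurements have to be condensed into per-sample features. The sketch below shows one simple way to do this with NumPy, assuming each fragment has already been assigned to a genomic region and annotated with its methylated CpG count, total CpG count, and length. The two features it computes (per-region methylation fraction and fraction of short fragments below a hypothetical 150 bp cutoff) are illustrative stand-ins for "methylation states and fragment structural characteristics," not Harbinger's actual feature set; the function name is hypothetical.

```python
import numpy as np


def fragments_to_features(region_idx, meth_cpgs, total_cpgs, frag_len, n_regions, short_cutoff=150):
    """Collapse fragment-level records into per-region features for one sample.

    Returns an (n_regions, 2) array: [methylation fraction, short-fragment fraction].
    Regions with no fragments get NaN.
    """
    region_idx = np.asarray(region_idx, dtype=np.int64)
    meth_cpgs = np.asarray(meth_cpgs, dtype=float)
    total_cpgs = np.asarray(total_cpgs, dtype=float)
    frag_len = np.asarray(frag_len, dtype=float)

    # Sum fragment-level values into their regions in one vectorized pass
    meth = np.bincount(region_idx, weights=meth_cpgs, minlength=n_regions)
    cpgs = np.bincount(region_idx, weights=total_cpgs, minlength=n_regions)
    n_frag = np.bincount(region_idx, minlength=n_regions).astype(float)
    is_short = (frag_len < short_cutoff).astype(float)
    n_short = np.bincount(region_idx, weights=is_short, minlength=n_regions)

    # Divide only where the denominator is non-zero; leave NaN elsewhere
    nan = np.full(n_regions, np.nan)
    meth_frac = np.divide(meth, cpgs, out=nan.copy(), where=cpgs > 0)
    short_frac = np.divide(n_short, n_frag, out=nan.copy(), where=n_frag > 0)
    return np.column_stack([meth_frac, short_frac])


# Toy input: 100,000 simulated fragments over 50 regions (real samples have far more)
rng = np.random.default_rng(0)
n_fragments, n_regions = 100_000, 50
region_idx = rng.integers(0, n_regions, size=n_fragments)
total_cpgs = rng.integers(1, 10, size=n_fragments)
meth_cpgs = rng.binomial(total_cpgs, 0.3)
frag_len = rng.normal(167, 20, size=n_fragments)  # cfDNA lengths cluster near ~167 bp

features = fragments_to_features(region_idx, meth_cpgs, total_cpgs, frag_len, n_regions)
print(features.shape)
```

Using `np.bincount` keeps the aggregation linear in the number of fragments, which matters when a single sample contains hundreds of millions of them.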
Cancer is an inherently heterogeneous disease, with molecular patterns that can vary significantly across individuals. These differences may arise from factors such as cancer type, stage, and location, as well as inherited genetics and environmental exposures. Furthermore, a range of factors unrelated to cancer can also affect the molecular signals captured in the data, such as age, overall health status, or variability in how the sample was processed in the lab. These are known as biological or technical confounders. Our AI models are trained on large cohorts of diverse samples that encompass both biological heterogeneity and these confounders. AI can learn how these variables affect the signal, producing robust models that can discriminate cancer from non-cancer, determine a tumor’s tissue of origin, and highlight key biological features that may be linked to cancer progression or subtype.
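One common, simple way to reason about confounders is to model how much of each feature is explained by known covariates and remove that component. The sketch below does this with ordinary least squares, assuming numeric covariates (here age and a 0/1 processing-batch indicator). This is a swapped-in textbook technique for illustration, not a description of how Harbinger's models handle confounders; the function names and toy data are hypothetical. In practice such a model would typically be fit on non-cancer reference samples so that cancer signal is not absorbed into the confounder fit.

```python
import numpy as np


def fit_confounder_model(features, covariates):
    """Fit features ~ intercept + covariates, one linear model per feature column."""
    X = np.column_stack([np.ones(len(covariates)), covariates])
    coef, *_ = np.linalg.lstsq(X, features, rcond=None)
    return coef  # shape: (1 + n_covariates, n_features)


def remove_confounders(features, covariates, coef):
    """Subtract the covariate-explained component, keeping each feature's intercept."""
    X = np.column_stack([np.ones(len(covariates)), covariates])
    return features - X[:, 1:] @ coef[1:]


# Toy data: 3 features that drift with age and processing batch (no cancer signal)
rng = np.random.default_rng(0)
n = 500
age = rng.uniform(40, 80, size=n)
batch = rng.integers(0, 2, size=n).astype(float)
covariates = np.column_stack([age, batch])
features = (0.5 + 0.003 * age[:, None] + 0.05 * batch[:, None]
            + rng.normal(scale=0.01, size=(n, 3)))

coef = fit_confounder_model(features, covariates)
adjusted = remove_confounders(features, covariates, coef)
print("corr with age before:", np.corrcoef(features[:, 0], age)[0, 1])
print("corr with age after: ", np.corrcoef(adjusted[:, 0], age)[0, 1])
```

Linear adjustment only removes effects that are roughly linear in the covariates; flexible models trained on diverse cohorts, as described above, can in principle learn more complex interactions.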
To achieve this, we apply a suite of complementary machine learning approaches, each playing a distinct role in how we uncover and interpret cancer-related signals:
- Supervised learning enables our models to learn from labeled examples by recognizing patterns that consistently appear in known cases, such as distinguishing cancer from non-cancer or predicting tissue of origin.
- Unsupervised learning helps reveal hidden structure in the data, such as natural clusters in molecular profiles that reflect shared biological traits. It can also identify technical variation in the data, helping keep our models focused on true biological signals.
- Deep learning is especially effective at uncovering complex patterns in biological data, such as sequencing and methylation signals. We use a variety of advanced model architectures, including convolutional neural networks, transformers, autoencoders, and diffusion-based models. Each brings a unique strength: some are well-suited to detecting high-resolution, localized features; others excel at identifying broader patterns, capturing long-range dependencies, or modeling realistic biological variation.
- Transfer learning allows us to adapt models trained on rich, high-signal data sources (like tumor tissue) to perform well on more challenging inputs (like blood-based cfDNA), which are more applicable in clinical settings.
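The difference between the first two approaches can be sketched on toy data. Below, a logistic regression (a simple supervised model) learns to separate labeled "cancer" and "non-cancer" samples, while principal component analysis (a simple unsupervised method) is run without labels and picks up a simulated batch effect. Both are generic textbook stand-ins, far simpler than the proprietary and deep learning models described above; the data, effect sizes, and function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_logistic_regression(X, y, lr=0.1, n_iter=2000, l2=1e-3):
    """Supervised learning: fit P(cancer | features) by gradient descent on log loss."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)
        w -= lr * (X.T @ (p - y) / n + l2 * w)  # L2-regularized gradient step
        b -= lr * np.mean(p - y)
    return w, b


def top_principal_components(X, k=2):
    """Unsupervised learning: project samples onto their top-k principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


# Toy data (hypothetical): 200 samples x 20 features
rng = np.random.default_rng(0)
n, d = 200, 20
y = rng.integers(0, 2, size=n)        # 1 = cancer label
batch = rng.integers(0, 2, size=n)    # technical batch, unrelated to the label
X = rng.normal(size=(n, d))
X[:, :5] += 1.5 * y[:, None]          # cancer signal in 5 features
X += 1.0 * batch[:, None]             # batch effect shifts every feature

w, b = train_logistic_regression(X, y)
train_accuracy = np.mean((sigmoid(X @ w + b) > 0.5) == y)

pcs = top_principal_components(X)
pc1_batch_corr = np.corrcoef(pcs[:, 0], batch)[0, 1]
print(f"training accuracy: {train_accuracy:.2f}")
print(f"|corr(PC1, batch)|: {abs(pc1_batch_corr):.2f}")
```

The unsupervised step never sees the batch labels, yet its leading component tracks them, which is the kind of signal that can flag technical variation before it leaks into a supervised model.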
These approaches are inherently flexible and data-agnostic, allowing us to apply them across a wide range of biological inputs and to continuously evolve our models as new data types and scientific insights emerge.
Harbinger’s Biologically Informed Approach
At Harbinger Health, our approach to early cancer detection begins with biology. We design our AI models around the earliest molecular events that mark the transition from healthy to malignant cells, many of which occur before a tumor is clinically detectable. Research has shown that diverse cancer types often share early disruptions in gene regulation and DNA methylation. These shared biological programs serve as a foundation for our platform, restricting the attention of our AI models to loci with high biological importance, rather than scanning the genome indiscriminately.
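Restricting a model's attention to biologically prioritized loci can be as simple as filtering the feature matrix to a predefined panel of regions before training. The sketch below assumes features are stored as a samples-by-regions array with one region ID per column; the region IDs and the panel are hypothetical placeholders, and how such a panel is chosen is not shown here.

```python
import numpy as np


def restrict_to_panel(features, region_ids, panel):
    """Keep only the feature columns whose region ID appears in a predefined panel.

    features: (n_samples, n_regions) array; region_ids: one label per column.
    Column order is preserved.
    """
    panel = set(panel)
    keep = [i for i, rid in enumerate(region_ids) if rid in panel]
    return features[:, keep], [region_ids[i] for i in keep]


# Hypothetical region IDs and panel, for illustration only
region_ids = ["chr1:1000-1500", "chr2:5000-5400", "chr7:200-900", "chrX:300-800"]
panel = {"chr2:5000-5400", "chrX:300-800"}
features = np.arange(12, dtype=float).reshape(3, 4)  # 3 samples x 4 regions

panel_features, panel_ids = restrict_to_panel(features, region_ids, panel)
print(panel_ids)
print(panel_features.shape)
```

Returning the kept region IDs alongside the matrix keeps every downstream feature traceable to a named locus, which also helps when interpreting what a model has learned.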
This biology-first strategy is especially important in the context of liquid biopsy, where we analyze cfDNA circulating in the bloodstream. In early-stage disease, tumor-derived DNA is present at very low concentrations, making it difficult to detect. By rooting our models in biologically validated signals, we increase the likelihood of finding meaningful patterns even when the tumor signal is faint.
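A back-of-the-envelope calculation shows why faint tumor signal is hard to catch at any single position. The sketch assumes each cfDNA fragment is independently tumor-derived with probability equal to the tumor fraction, and uses a hypothetical tumor fraction of 0.1%, a hypothetical depth of 30 fragments per locus, and a hypothetical panel of 1,000 informative loci. It ignores whether a tumor-derived fragment is actually distinguishable from a healthy one, so it is an optimistic simplification.

```python
def expected_tumor_fragments(n_fragments, tumor_fraction):
    """Expected number of tumor-derived fragments among n_fragments."""
    return n_fragments * tumor_fraction


def prob_at_least_one(n_fragments, tumor_fraction):
    """Probability that at least one of n_fragments is tumor-derived (independence assumed)."""
    return 1.0 - (1.0 - tumor_fraction) ** n_fragments


tumor_fraction = 0.001   # hypothetical early-stage tumor fraction
per_locus_depth = 30     # hypothetical fragments covering one locus
n_loci = 1_000           # hypothetical number of informative loci

whole_sample = expected_tumor_fragments(200_000_000, tumor_fraction)
single_locus = expected_tumor_fragments(per_locus_depth, tumor_fraction)
p_single = prob_at_least_one(per_locus_depth, tumor_fraction)
p_panel = prob_at_least_one(per_locus_depth * n_loci, tumor_fraction)
print(whole_sample, single_locus, p_single, p_panel)
```

Under these assumptions a single locus yields a tumor-derived fragment only about 3% of the time, while evidence pooled across many informative loci almost certainly contains some. This is why concentrating on loci where tumor and healthy DNA are expected to differ can make a faint signal detectable.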
From Scarcity to Scale by Simulating Biology
Training highly sensitive AI models with advanced machine learning architectures requires large datasets of unbiased, representative samples. A core challenge in building early detection models is therefore the scarcity of representative real-world data, particularly for rare cancers, early-stage disease, and underserved populations. To address this, we’ve developed a proprietary framework for generating synthetic data: simulated DNA methylation and fragmentomic profiles that reflect both biological differences and technical variability. These datasets allow us to model a wide range of real-world conditions, including varying levels of circulating tumor DNA, different cancer types and stages, demographic factors such as age and sex, genetic backgrounds, and differences that can arise from how samples are collected or processed. Incorporating this diversity into model training improves not only our ability to detect cancer but also, importantly, our models’ ability to generalize to unseen future populations, helping ensure strong performance across the wide range of real-world conditions where early detection is most critical.
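Harbinger's synthetic-data framework is proprietary, but the general idea of simulating methylation profiles under controlled conditions can be illustrated with a common, simple technique: in silico mixing. The sketch below blends hypothetical tumor and healthy per-region methylation levels at a chosen tumor fraction, applies a hypothetical technical offset on the logit scale, and draws sequencing counts from Poisson (depth) and binomial (methylated reads) distributions. All parameter values and beta-distribution choices are toy assumptions.

```python
import numpy as np


def simulate_methylation_sample(healthy_beta, tumor_beta, tumor_fraction, mean_depth,
                                batch_shift=0.0, rng=None):
    """Simulate per-region methylated/total read counts for one synthetic cfDNA sample.

    healthy_beta, tumor_beta: per-region methylation levels in [0, 1].
    tumor_fraction: share of cfDNA assumed to come from the tumor.
    batch_shift: hypothetical technical offset applied on the logit scale.
    """
    rng = np.random.default_rng() if rng is None else rng
    # In silico mixing of tumor and healthy methylation profiles
    beta = tumor_fraction * tumor_beta + (1.0 - tumor_fraction) * healthy_beta
    beta = np.clip(beta, 1e-6, 1 - 1e-6)
    # Technical variability modeled as a shift on the logit scale
    logit = np.log(beta / (1.0 - beta)) + batch_shift
    p = 1.0 / (1.0 + np.exp(-logit))
    # Sampling noise: variable depth per region, binomial methylated counts
    depth = rng.poisson(mean_depth, size=p.shape)
    methylated = rng.binomial(depth, p)
    return methylated, depth


rng = np.random.default_rng(42)
n_regions = 1_000
healthy_beta = rng.beta(1, 9, size=n_regions)  # toy: mostly unmethylated in healthy cfDNA
tumor_beta = rng.beta(6, 4, size=n_regions)    # toy: hypermethylated in tumor

# Sweep tumor fraction and technical offset to cover a grid of conditions
cohort = []
for tf in (0.0, 0.001, 0.01, 0.05):
    for batch_shift in (-0.2, 0.0, 0.2):
        meth, depth = simulate_methylation_sample(healthy_beta, tumor_beta, tf, 30, batch_shift, rng)
        cohort.append({"tumor_fraction": tf, "batch_shift": batch_shift,
                       "methylated": meth, "depth": depth})
print(len(cohort))
```

Because every synthetic sample carries its known tumor fraction and technical offset, a model's performance can later be broken down by exactly those conditions, including very low tumor fractions that are rare in real cohorts.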
Ethical, Transparent, and Tuned for Equity
AI systems are only as robust and reliable as the data they learn from. To ensure our models generalize well, we evaluate them across a wide spectrum of real-world conditions, using both real and synthetic data to reflect the diversity of patient populations and technical variables encountered in practice.
Furthermore, we have built tools to systematically assess the integrity of our AI models. These tools allow us to compare different architectures and identify sources of bias and confounding, ensuring that our models are learning biologically meaningful patterns and producing reliable predictions.
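One basic building block for this kind of assessment is computing performance separately for each subgroup (for example, by demographic group or sample-processing site) and checking how far apart the subgroups are. The sketch below does this for sensitivity and specificity in plain Python; the group labels and toy predictions are hypothetical, and a real evaluation would also account for small subgroup sizes and statistical uncertainty.

```python
from collections import defaultdict


def metrics_by_group(y_true, y_pred, groups):
    """Sensitivity and specificity computed separately for each subgroup label."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        c = counts[group]
        if truth == 1:
            c["tp" if pred == 1 else "fn"] += 1
        else:
            c["fp" if pred == 1 else "tn"] += 1
    results = {}
    for group, c in counts.items():
        n_pos, n_neg = c["tp"] + c["fn"], c["tn"] + c["fp"]
        results[group] = {
            "sensitivity": c["tp"] / n_pos if n_pos else float("nan"),
            "specificity": c["tn"] / n_neg if n_neg else float("nan"),
            "n": n_pos + n_neg,
        }
    return results


def largest_gap(results, metric):
    """Spread between the best- and worst-performing subgroups for one metric."""
    values = [r[metric] for r in results.values() if r[metric] == r[metric]]  # skip NaN
    return max(values) - min(values) if values else float("nan")


# Toy labels and predictions for two hypothetical processing sites
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
groups = ["site_A", "site_A", "site_A", "site_B", "site_B", "site_B"]

results = metrics_by_group(y_true, y_pred, groups)
print(results)
print("sensitivity gap:", largest_gap(results, "sensitivity"))
```

A large gap between subgroups does not by itself identify the cause, but it points to where confounding or under-representation may be affecting the model.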
To address concerns around AI interpretability, we also extract learned features and evaluate their biological relevance. This step is essential for validating model behavior and enhancing transparency in clinical applications.
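A widely used, model-agnostic starting point for this kind of analysis is permutation importance: shuffle one feature at a time and measure how much performance drops. Features whose shuffling hurts the model most are candidates for biological review. The sketch below implements the idea in NumPy with a hypothetical stand-in predictor; it is not Harbinger's interpretability method, and scikit-learn provides an off-the-shelf version as `sklearn.inspection.permutation_importance`.

```python
import numpy as np


def permutation_importance(predict, X, y, metric, n_repeats=5, rng=None):
    """Mean drop in `metric` when each feature column is randomly shuffled."""
    rng = np.random.default_rng(0) if rng is None else rng
    baseline = metric(y, predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling column j breaks its link to y while keeping its distribution
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(baseline - metric(y, predict(X_perm)))
        importances[j] = np.mean(drops)
    return importances


def accuracy(y, y_hat):
    return float(np.mean(y == y_hat))


def predict(data):
    """Hypothetical stand-in for a trained model: uses feature 0 only."""
    return (data[:, 0] > 0).astype(int)


# Toy setup: only feature 0 carries the label
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = (X[:, 0] > 0).astype(int)

importances = permutation_importance(predict, X, y, accuracy)
print(importances)
```

In a real setting each feature column would map back to a genomic region or fragment property, so high-importance features can be checked against known biology.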
Combined, these strategies help us build models that are both scientifically rigorous and clinically generalizable across diverse demographic populations.
Equity is a core principle, embedded into every stage of development. By prioritizing diversity in data and fairness in evaluation, we aim to deliver early detection tools that work well for everyone.
Looking Ahead
We’re at the beginning of a new era in cancer care. AI is making it possible to move from late-stage diagnosis to early detection, potentially unlocking better outcomes, less invasive treatments, and more lives saved.
At Harbinger Health, we envision a future where multi-cancer detection is a routine part of preventive care. A future where a simple blood draw can screen for many cancers at once. A future where doctors can identify not just whether cancer is present, but where it started and how best to treat it.
To support these goals, we are actively building novel, proprietary, state-of-the-art discovery platforms and AI models that go beyond cancer detection to generate additional clinically actionable outputs, such as tumor subtyping, identification of tumor-expressed targets, and prediction of tumor burden or the likelihood of metastasis. We aim to deliver an informative package that aids clinical decision-making, and we have already demonstrated advances in these areas.
As we continue to refine our platform and gather more data through clinical studies, our AI models will become even more accurate, adaptive, and personalized. The technology continuously learns and evolves as each new sample and patient outcome contributes to a smarter and more effective system.
But innovation alone is not enough. Our mission is to ensure these tools are broadly accessible, affordable, and equitable, so they can make a meaningful impact for all.