What Is Small-Molecule Hit Identification?

Hit identification, also known as hit finding or hit discovery, is the process of discovering small drug-like molecules that can potentially bind to a specific biological target. This is a critical early step in the drug discovery process and is the bread and butter of computational and medicinal chemistry. By identifying hits, researchers can determine which small-molecule starting points are worth improving in lead optimization to fine-tune their design for the clinic.

What Methods Are Used for Hit Identification?

Several methodologies are used for hit identification, including:

  • High-throughput screening (HTS): This method involves testing large libraries of compounds against a target using automated techniques. It can be conducted in biochemical assays or cell-based assays to identify compounds that act as inhibitors or modulators of the target.
  • Virtual screening: This computational technique encompasses both target-based methods, like in silico molecular docking and simulations, and ligand-based methods, allowing a broader range of virtual screening strategies. It enables researchers to predict which compounds will likely interact with a target protein (such as an enzyme), as well as to identify candidates based on the properties of known active molecules. Methods such as shape screening (e.g., FastROCS from OpenEye or shape similarity in RDKit), 2D similarity (e.g., Tanimoto similarity), and pharmacophore screening are often used due to their speed, complementing the more computationally intensive docking and molecular dynamics simulations. This approach enables the screening of large numbers of compounds quickly and cost-effectively.
  • Fragment-based drug discovery (FBDD): In fragment screening, small libraries of low-molecular-weight compounds (fragments) are tested for their ability to interact with a specific protein. These fragments (often containing pharmacophores) are starting points for designing more potent compounds by expanding or linking them.
  • Target-directed screening: This approach focuses on identifying hit series based on knowledge of the target's structure and function, often using structure-based design techniques.
  • Phenotypic screening: This involves screening for desired changes in phenotype rather than targeting a specific protein. This approach can uncover novel mechanisms of action by observing the effects on cells or organisms.

Each method has its strengths and is often chosen based on the nature of the target and the resources available. These hit-discovery approaches can be used individually or in combination to maximize the chances of identifying promising hits for further development into drug candidates.
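As a rough illustration of the 2D similarity screening mentioned above, the sketch below ranks a tiny library against a known active using the Tanimoto coefficient. It assumes each molecule has already been reduced to a set of "on" fingerprint bit positions; the compound names and bit sets are invented for illustration, and a real workflow would generate fingerprints with a cheminformatics toolkit such as RDKit.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient: shared bits divided by total distinct bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_by_similarity(
    query: Set[int], library: Dict[str, Set[int]]
) -> List[Tuple[str, float]]:
    """Score every library compound against the query, most similar first."""
    scores = [(name, tanimoto(query, bits)) for name, bits in library.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical fingerprints (sets of "on" bit positions), not real molecules.
known_active = {1, 4, 7, 9, 12}
library = {
    "cmpd_1": {1, 4, 7, 9, 13},
    "cmpd_2": {2, 3, 5},
    "cmpd_3": {1, 4, 7, 9, 12},
}

ranked = rank_by_similarity(known_active, library)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

Compounds scoring above a chosen similarity threshold would then be passed on to slower methods such as docking; the threshold itself is a project-specific choice.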

How Does AI Contribute to Hit Identification?

Drug discovery is a long, costly process with high failure rates. Early preclinical stages, such as the hit-to-lead process, are critical in directing drug development. Recent AI advancements have shown the potential to streamline early drug discovery processes by improving efficiency, precision, and innovation. 

Here are some examples of how AI is being used in hit identification:

Data Processing and Analysis

AI can manage and analyze large biological datasets, like genomics or proteomics, enabling efficient data organization and retrieval and providing essential disease insights. 

For example, Transcripta Bio uses AI to analyze gene expression data across 200 million experiments, building their Drug-Gene Atlas to help discover new therapeutic candidates. By leveraging AI and machine learning models, they accelerate the analysis of transcriptomic data, which is crucial in understanding drug responses and discovering new drugs that modulate gene expression. Their AI modeling suite can virtually screen billions of compounds for therapeutic benefits, focusing on gene modulation with high speed and accuracy.

Molecule Generation and Biophysical Property Prediction

AI enhances drug discovery through de novo drug design and property prediction. Generative models like NVIDIA MolMIM create new molecules from scratch, while other, predictive models assess these molecules' biological activity, toxicity, metabolism, and target binding affinity. This dual approach enables the generation and rapid optimization of drug candidates, improving efficacy and safety while streamlining drug development.

Atomwise, a member of the NVIDIA Inception program for startups, is a pioneering company in AI-driven drug discovery. Their AtomNet platform uses deep learning to predict the binding of small molecules to protein targets, effectively performing bioactivity prediction and toxicity estimation. Atomwise has partnered with several major pharmaceutical companies, demonstrating the industry's confidence in their AI-driven approach to predictive modeling.

Virtual Screening and Docking

AI accelerates virtual screening by evaluating molecules for similarity to known actives (ligand-based) or by predicting how well compounds bind to targets (structure-based). Binding affinity prediction then ranks promising candidates by predicted interaction strength.

Accenture is tailoring the generative AI-powered NVIDIA Blueprints for drug discovery in collaboration with pharmaceutical partners. This approach enhances molecular generation steps by incorporating industry-specific requirements, optimizing binding affinity and pharmacokinetic properties like absorption, distribution, metabolism, and excretion (ADME). 

Target Identification, Target Validation, and Structure Prediction

AI helps identify potential drug targets through omics data analysis and predicts protein structures, aiding structure-based drug design. 

For example, AlphaFold2, one of the most transformative advancements in protein structure prediction, has revolutionized how proteins are understood, reaching atomic-level accuracy for many proteins. AlphaFold’s predictions are now used extensively to accelerate drug discovery by providing insights into protein interactions with other molecules. The AlphaFold Protein Structure Database, developed by Google DeepMind in collaboration with EMBL-EBI, offers free access to predicted structures of nearly all catalogued proteins, contributing to faster target identification for diseases such as cancer and Alzheimer’s.

Molecular Dynamics (MD) Simulations

AI in molecular dynamics simulations enhances the prediction of molecular movements and interactions by efficiently analyzing large datasets. Complementary to wet-lab synthesis and experimentation, MD can provide accurate insights into complex systems like protein folding, drug binding, and material behavior under various conditions.

Iambic Therapeutics, a startup dedicated to AI-driven drug discovery, incorporates quantum mechanics into MD simulations to enhance drug discovery efforts targeting cancer proteins. Their method is a hybrid of deep learning and physics-based simulations, which significantly reduces the data requirements while increasing the precision of predictions for small molecules and their binding interactions.

Image Analysis and Phenotypic Screening

AI processes imaging data in high-content screening to detect phenotypic changes caused by drug treatment. In histopathology analysis, it assesses disease progression or treatment responses from tissue samples.

Recursion, a clinical-stage biotechnology company, leverages AI-driven high-content imaging to explore phenotypic changes at a massive scale. Their platform integrates AI and machine learning to analyze millions of cellular images, identifying phenotypic changes in response to drug compounds. Their goal is to map all possible cellular behaviors, creating a detailed dataset that speeds up the discovery of new drugs.

Synthetic Route Prediction

AI is used to identify efficient pathways for creating molecules and predict reaction outcomes to forecast yields and the feasibility of chemical reactions.

The open-source tool AiZynthFinder is widely used for retrosynthetic planning. It uses Monte Carlo Tree Search (MCTS) and neural networks to break down molecules into purchasable precursors and predict efficient synthetic routes.
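The idea of breaking a molecule down into purchasable precursors can be sketched with a toy example. The code below is a deliberately simplified depth-first search, not Monte Carlo Tree Search and not the AiZynthFinder API: the molecule names, the disconnection table, and the purchasable catalogue are all hypothetical placeholders. In a real tool, a neural network would propose and prioritize the disconnections instead of a fixed dictionary.

```python
from typing import Dict, List, Optional, Set

# Hypothetical one-step disconnections: product -> alternative precursor sets.
DISCONNECTIONS: Dict[str, List[List[str]]] = {
    "target": [["intermediate_A", "reagent_B"], ["intermediate_C"]],
    "intermediate_A": [["reagent_D", "reagent_E"]],
    "intermediate_C": [["unavailable_F"]],
}

# Hypothetical catalogue of purchasable building blocks.
PURCHASABLE: Set[str] = {"reagent_B", "reagent_D", "reagent_E"}


def find_route(molecule: str, max_depth: int = 5) -> Optional[List[str]]:
    """Return purchasable precursors that make `molecule`, or None if no route."""
    if molecule in PURCHASABLE:
        return [molecule]
    if max_depth == 0:
        return None
    # Try each alternative disconnection in order (plain depth-first search).
    for precursors in DISCONNECTIONS.get(molecule, []):
        leaves: List[str] = []
        for precursor in precursors:
            sub_route = find_route(precursor, max_depth - 1)
            if sub_route is None:
                break  # this disconnection fails; try the next alternative
            leaves.extend(sub_route)
        else:
            return leaves  # every precursor resolved to purchasable leaves
    return None


route = find_route("target")
print(route)
```

A tree search such as MCTS explores the same kind of tree but uses learned scores and sampling to decide which branches to expand first, which matters when each molecule has many possible disconnections.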

How Accurate Are AI Predictions in Hit Identification?

AI models are becoming increasingly accurate at predicting small-molecule hits in virtual screening, often surpassing human speed and scalability, but they come with certain limitations.

Key Factors for AI's Accuracy

  1. Speed and scale: AI models can screen millions of compounds far faster than human experts. Models like Atomwise's AtomNet can process vast chemical spaces and find novel bioactive molecules at a scale unattainable for human researchers.
  2. Pattern recognition: AI excels at recognizing subtle patterns in chemical structures and biological data, enabling it to predict binding affinities and bioactivity more consistently than humans. Tools like DeepChem and NVIDIA BioNeMo™ help scientists leverage large datasets to make predictions across diverse targets, sometimes identifying hits that might not be intuitive to experts.

AI is increasingly viewed as a collaborative tool for researchers and drug discovery experts, particularly in hit identification. By automating repetitive and computationally intensive tasks, AI accelerates the early stages of drug discovery while enhancing human decision-making, rather than replacing it.

What Are the Limitations of AI in Hit Identification?

There are several limitations to consider, including:

  • Data quality and availability: AI models rely heavily on the data used for their training. The predictions can be unreliable if the training data is incomplete, biased, or of poor quality. 
  • False positives and negatives: AI models can sometimes generate false positives, predicting activity for compounds that ultimately fail during assay development and biological testing. Similarly, they may miss true hits (false negatives), particularly if a target's chemical or biological features are underrepresented in the training data.
  • Limited generalization for novel targets: Many AI models struggle to generalize to novel targets, especially when no similar proteins or chemical scaffolds exist in the training set. This issue is particularly prevalent in drug discovery, where innovation often targets previously unexplored biological systems.
  • Interpretability: Some AI models, especially deep learning approaches, may be considered black boxes. Their predictions can be difficult to interpret, which may reduce trust in AI-driven results when medicinal chemists need to understand the reasoning behind a suggested hit.
  • Integration with experimental validation: AI can significantly reduce the number of compounds that need experimental testing. However, wet-lab testing, such as X-ray crystallography or nuclear magnetic resonance (NMR)-based determination of target protein structures, remains crucial for target validation. AI predictions may guide hit identification but cannot replace experimental testing entirely, as biological systems are highly complex and not fully captured by computational models.
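To make the false-positive and false-negative limitation concrete, the sketch below compares a model's predicted hits with assay outcomes and reports precision and recall. The function name and the example labels are hypothetical and chosen only for illustration; real screening campaigns involve far more compounds and usually far fewer true hits.

```python
from typing import Dict, Sequence


def screening_metrics(
    predicted: Sequence[bool], confirmed: Sequence[bool]
) -> Dict[str, float]:
    """Count confusion-matrix cells and derive precision and recall."""
    if len(predicted) != len(confirmed):
        raise ValueError("predicted and confirmed must have the same length")
    pairs = list(zip(predicted, confirmed))
    tp = sum(1 for p, c in pairs if p and c)          # predicted hit, confirmed
    fp = sum(1 for p, c in pairs if p and not c)      # predicted hit, inactive
    fn = sum(1 for p, c in pairs if not p and c)      # missed true hit
    tn = sum(1 for p, c in pairs if not p and not c)  # correctly rejected
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "true_positives": tp,
        "false_positives": fp,
        "false_negatives": fn,
        "true_negatives": tn,
        "precision": precision,
        "recall": recall,
    }


# Hypothetical outcome for six compounds: model prediction vs. assay result.
model_says_hit = [True, True, True, False, False, False]
assay_confirms = [True, False, False, True, False, False]

metrics = screening_metrics(model_says_hit, assay_confirms)
print(metrics)
```

Low precision means wasted assay capacity on false positives, while low recall means true hits are being missed, so both are worth tracking when a model is used to prioritize compounds for testing.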
