AAV Genome Integrity: Why It Matters and How It Is Evaluated in Modern Gene Therapy
Adeno-associated virus, or AAV, remains one of the most important delivery platforms in gene therapy, but its success depends on more than vector titer and capsid identity. One increasingly important quality attribute is genome integrity: whether the packaged vector genomes are complete, correctly structured, and representative of the intended therapeutic cassette. Recent reviews now treat genome integrity as an emerging critical quality attribute (CQA) because heterogeneous or disrupted packaged genomes can affect potency, consistency, and safety.
What “AAV genome integrity” actually means
In the rAAV context, genome integrity refers to the extent to which encapsidated vector genomes match the intended construct from ITR to ITR, without unwanted truncations, rearrangements, chimeric sequences, or mis-packaged non-vector DNA. This matters because AAV preparations are rarely composed only of perfectly full particles carrying a single uniform genome species. Modern analytical and sequencing studies show that rAAV lots can contain a mixture of full-length genomes, partial genomes, rearranged genomes, and other heterogeneous packaged species.
It is important to separate genome integrity from related but distinct measurements. Vector genome titer asks how many genomes are present. Empty/full capsid analysis asks how many capsids contain nucleic acid. Genome integrity asks whether the nucleic acid inside the capsid is the right size and sequence architecture. A sample can have an acceptable titer and still contain a substantial fraction of incomplete or abnormal genomes.
Why genome integrity matters
Genome integrity matters first because it affects biological activity. If a meaningful fraction of capsids package truncated or rearranged genomes, the number of functionally useful particles may be lower than the nominal genome titer suggests. That creates a disconnect between measured dose and effective dose. Reviews of AAV analytical characterization now explicitly frame genome integrity as relevant to both efficacy and safety.
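The gap between measured dose and effective dose can be illustrated with simple arithmetic. The sketch below is a deliberate simplification, not a validated potency model: it assumes that only full-length genomes contribute to activity and that the titer assay counts every packaged genome equally. The function name and both input values are hypothetical.

```python
def effective_genome_dose(nominal_vg, full_length_fraction):
    """Estimate how many genomes in a dose are full length.

    Simplifying assumption: only full-length genomes are counted as useful.
    """
    if not 0.0 <= full_length_fraction <= 1.0:
        raise ValueError("full_length_fraction must be between 0 and 1")
    return nominal_vg * full_length_fraction


nominal_dose_vg = 1.0e13      # hypothetical nominal dose from a titer assay
full_length_fraction = 0.70   # hypothetical result from an integrity assay

estimate = effective_genome_dose(nominal_dose_vg, full_length_fraction)
print(f"Nominal dose:         {nominal_dose_vg:.2e} vg")
print(f"Full-length estimate: {estimate:.2e} vg")
```

With these illustrative inputs, roughly three in ten of the counted genomes would not be full length, even though the titer itself looks acceptable.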
Second, genome integrity matters because it is closely linked to manufacturing consistency. Packaged-genome heterogeneity can arise from vector design, ITR quality, sequence context, oversized cassettes, and production-process variables. PackGene’s recent CMC article correctly reflects this broader industry view: genome size, ITR integrity, regulatory elements, and cassette architecture can all influence packaging efficiency, heterogeneity, and consistency.
Third, genome integrity has safety relevance. Long-read sequencing studies have shown that rAAV lots may contain contaminating packaged DNA species, including plasmid-derived or host-related sequences, and recent in vivo work has renewed attention to the biological consequences of disrupted or contaminating vector genomes after administration. A 2026 Nature Medicine study reported contaminating plasmid sequences and disrupted vector genomes in liver following AAV gene therapy, underscoring that genome composition is not only a manufacturing concern but also a translational one.
Why AAV genomes become heterogeneous
AAV genome heterogeneity is not a niche artifact; it is a well-documented feature of the platform. One major driver is cargo size. The practical packaging limit of AAV is about 4.7 kb, and when the cassette exceeds that limit, packaged genomes become increasingly heterogeneous and are often truncated. This principle has been recognized for years and remains one of the clearest design rules in AAV vectorology.
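As a design-stage illustration of the size rule, the sketch below sums the lengths of the elements between the ITRs and compares the total with the ~4.7 kb figure cited above. The element lengths, the 200 bp warning margin, and the function names are all hypothetical choices made for this example; a real check would use the annotated construct sequence.

```python
PACKAGING_LIMIT_BP = 4700  # approximate practical limit cited in the text


def cassette_length(elements):
    """Sum the lengths of (name, length_bp) elements from ITR to ITR."""
    return sum(length for _, length in elements)


def packaging_check(elements, limit_bp=PACKAGING_LIMIT_BP, warn_margin_bp=200):
    """Return (total_bp, status); warn_margin_bp is an illustrative assumption."""
    total = cassette_length(elements)
    if total > limit_bp:
        status = "over limit"
    elif total > limit_bp - warn_margin_bp:
        status = "near limit"
    else:
        status = "within limit"
    return total, status


# Hypothetical construct; every length here is illustrative only.
example_construct = [
    ("5' ITR", 145),
    ("promoter", 600),
    ("transgene", 3300),
    ("polyA", 250),
    ("3' ITR", 145),
]

total_bp, status = packaging_check(example_construct)
print(f"{total_bp} bp: {status}")
```

A check like this only flags size risk. It says nothing about sequence-driven heterogeneity, which is discussed next.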
But oversized genomes are not the only cause. Primary sequencing studies have shown that cassette design itself can shape the population of packaged genomes. A widely cited SMRT-sequencing study demonstrated full-vector-genome resolution and revealed heterogeneous packaged species, including human-vector chimeras, while later AAV-genome population sequencing studies showed that vectors carrying CRISPR components can display design-influenced heterogeneity. High-quality reviews now cite these studies as evidence that heterogeneity can arise from sequence architecture, not only from overlength genomes.
More recent data also suggest that capsid and production system can influence packaged-genome composition. For example, a 2024 study reported that the AAV2.7m8 capsid packaged a higher degree of genome heterogeneity than AAV2 in the tested context, and other work cited in major reviews has described differences in genome heterogeneity between human-cell- and insect-cell-produced rAAV. These findings do not mean one platform is universally better, but they do reinforce that genome integrity must be measured rather than assumed.
How genome integrity is evaluated
No single assay captures every aspect of AAV genome integrity, which is why modern characterization strategies increasingly combine size-based analysis with sequence-level analysis.
A traditional approach has been denaturing agarose gel electrophoresis and, in some settings, Southern blotting. These methods helped establish the field’s early understanding of packaged-genome size distributions, but they are relatively labor-intensive and have limited resolution compared with newer platforms. Analytical reviews from recent years describe capillary-based methods and long-read sequencing as important advances over these legacy workflows.
Capillary gel electrophoresis and related CE methods
Capillary gel electrophoresis (CGE), often discussed more broadly under CE-based workflows, has emerged as a practical method for assessing released AAV nucleic acids by size. Reviews note that CGE can separate genome size variants and, in some settings, resolve full-length from truncated species with better automation and resolution than slab-gel methods. This makes CE particularly useful as a relatively fast, lot-comparison-friendly assay for monitoring heterogeneity.
Still, CE has limits. It is primarily a size-distribution method, not a complete sequence-identity method. It can show that a lot contains species of unexpected sizes, but by itself it usually cannot determine exactly what those abnormal species are. For that reason, CE is best viewed as a powerful screening and comparability tool that often benefits from orthogonal follow-up by sequencing.
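To make the idea of a size-distribution readout concrete, here is a minimal sketch that bins a CE peak table into shorter, full-length, and longer species. It assumes the peak table is available as (size, area) pairs, treats peak area as a simple proxy for relative abundance, and uses a ±5% size window. All three are assumptions made for illustration, not an established acceptance criterion or any instrument's export format.

```python
def summarize_ce_peaks(peaks, expected_nt, tolerance=0.05):
    """Bin (size_nt, area) peaks by size and return percent of total area.

    tolerance is a hypothetical window around the expected genome size.
    """
    low = expected_nt * (1 - tolerance)
    high = expected_nt * (1 + tolerance)
    totals = {"shorter": 0.0, "full_length": 0.0, "longer": 0.0}
    for size_nt, area in peaks:
        if size_nt < low:
            totals["shorter"] += area
        elif size_nt > high:
            totals["longer"] += area
        else:
            totals["full_length"] += area
    total_area = sum(totals.values())
    if total_area == 0:
        raise ValueError("peak table has no signal")
    return {name: 100.0 * area / total_area for name, area in totals.items()}


# Hypothetical peak table for a vector with an expected size of 4440 nt.
example_peaks = [(2300, 10.0), (4400, 80.0), (4450, 5.0), (5200, 5.0)]

for name, percent in summarize_ce_peaks(example_peaks, expected_nt=4440).items():
    print(f"{name}: {percent:.1f}%")
```

The output reports how much signal falls outside the expected size window, but not what those species are, which is the limitation described above.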
Long-read sequencing
Long-read sequencing has become one of the most informative approaches for genome-integrity assessment because it can evaluate packaged genomes as individual molecules rather than as bulk averages. Primary studies using SMRT sequencing and later Nanopore-based ITR-to-ITR workflows showed that direct read-level analysis can reveal full-length genomes, truncations, rearrangements, and contaminating species that are difficult to resolve by size-based assays alone.
This is especially valuable in rAAV because problematic species are not always simple “short fragments.” They may include rearranged constructs, chimeras, and design-dependent off-target packaged sequences. Long-read methods therefore provide the kind of molecular detail needed to understand why a lot looks heterogeneous, not just whether it does.
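The read-level view described above can be sketched as a simple classifier over alignment summaries. This is an illustrative simplification, not a published pipeline: the record fields, the category names, and the 50-base end slack are hypothetical, and a real workflow would derive these values from an aligner's output and handle ITR regions with more care.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ReadAlignment:
    """Hypothetical per-read summary taken from an alignment step."""
    read_id: str
    target: str      # "vector", "backbone", or "host"
    ref_start: int   # first aligned reference position (0-based)
    ref_end: int     # last aligned reference position (exclusive)
    n_segments: int  # number of separate alignment segments for the read


def classify_read(aln, vector_length, end_slack=50):
    """Assign one read to a coarse structural category."""
    if aln.n_segments > 1:
        # Split alignments suggest a rearrangement or a chimeric molecule.
        return "rearranged_or_chimeric"
    if aln.target != "vector":
        return "non_vector"
    covers_start = aln.ref_start <= end_slack
    covers_end = aln.ref_end >= vector_length - end_slack
    return "full_length" if covers_start and covers_end else "truncated"


def tally_reads(reads, vector_length):
    """Count reads per category."""
    return Counter(classify_read(r, vector_length) for r in reads)


# Hypothetical reads against a 4440-base ITR-to-ITR reference.
example_reads = [
    ReadAlignment("r1", "vector", 0, 4440, 1),
    ReadAlignment("r2", "vector", 0, 2300, 1),
    ReadAlignment("r3", "vector", 10, 4435, 1),
    ReadAlignment("r4", "backbone", 100, 1500, 1),
    ReadAlignment("r5", "vector", 0, 4440, 2),
]

for category, count in tally_reads(example_reads, vector_length=4440).items():
    print(f"{category}: {count}")
```

Because each read is classified individually, the tally separates truncations from rearranged and non-vector species, which a size-only assay would report simply as unexpected peaks.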
That said, long-read sequencing also does not replace every other assay. It requires thoughtful sample preparation, data interpretation, and method controls, especially around ITRs and low-frequency species. In practice, the most defensible strategy is often a combined workflow in which CE or another size-based assay is used for routine lot comparison and long-read sequencing is used for deeper structural resolution. This layered approach is consistent with how recent reviews describe modern AAV characterization.
What genome-integrity data can and cannot tell you
Genome-integrity testing is extremely informative, but it is important not to overstate it. These assays can identify heterogeneity, truncation patterns, and abnormal packaged species. They can support process development, comparability, design optimization, and risk assessment. But they do not, by themselves, fully predict in vivo efficacy or safety. Those outcomes still depend on dose, tissue exposure, immunogenicity, capsid biology, route of administration, and other product attributes. High-quality reviews of AAV analytical testing consistently present genome integrity as one important CQA within a broader characterization framework, not as a standalone release surrogate for clinical performance.
Practical implications for vector design and CMC development
From a development perspective, the clearest lesson is that genome integrity should be addressed upstream, not only tested downstream. If a cassette is too close to or beyond AAV’s practical packaging limit, if ITRs are unstable, or if vector architecture promotes abnormal resolution or recombination, the final product may be heterogeneous before purification even begins. That is why recent CMC thinking treats vector design itself as a manufacturing variable.
This has direct consequences for program strategy. During research and preclinical development, genome-integrity testing can help teams compare construct variants, troubleshoot poor potency, and detect design-linked packaging problems early. As programs advance, it becomes increasingly valuable for process characterization, lot comparability, and building a more coherent analytical story for regulators. The growing emphasis on genome integrity in analytical reviews reflects this broader shift from “nice-to-have” characterization toward more integrated product understanding.
The bottom line
AAV genome integrity is not just a technical assay endpoint. It is a window into how well a vector was designed, how cleanly it was packaged, and how faithfully the final product represents the intended therapeutic construct. Primary studies and recent reviews now make it clear that heterogeneous packaged genomes are common enough to deserve routine attention, especially when cassette size, sequence architecture, or process changes may stress the system. The most reliable characterization strategies therefore combine orthogonal tools: size-based methods such as CE for efficient lot assessment, and long-read sequencing for high-resolution structural insight.
At PackGene, we support AAV programs across research, preclinical, and translational stages with integrated capabilities in AAV vector design, packaging and production, analytical testing, and CMC support. PackGene’s current AAV analytical portfolio includes AAV genome integrity analysis by CE, AAV genome sequencing by Nanopore, ddPCR-based genome titer assays, empty/full characterization, and broader analytical workflows designed to help teams evaluate vector quality from early development through GMP-oriented programs. For groups working to understand or improve genome integrity, that combination of size-based and sequence-level characterization can provide a more complete picture of product quality and manufacturability.
About PackGene
PackGene Biotech is a world-leading CRO and CDMO, excelling in AAV vectors, mRNA, plasmid DNA, and lentiviral vector solutions. Our comprehensive offerings span from vector design and construction to AAV, lentivirus, and mRNA services. With a sharp focus on early-stage drug discovery, preclinical development, and cell and gene therapy trials, we deliver cost-effective, dependable, and scalable production solutions. Leveraging our groundbreaking π-alpha 293 AAV high-yield platform, we amplify AAV production by up to 10-fold, yielding up to 1e+17 vg per batch to meet diverse commercial and clinical project needs. Moreover, our tailored mRNA and LNP products and services cater to every stage of drug and vaccine development, from research to GMP production, providing a seamless, end-to-end solution.