For complex cases in which the prenatal genetic diagnosis is ambiguous, this project will analyze long-read DNA sequencing data from birth defect cases and family samples. The emphasis is on extracting and identifying individual-specific genomic characteristics and on developing detection algorithms for all categories of structural variation (SV), including complex SV. The project will establish a pan-genome reference map specific to the Chinese population to facilitate the identification of pathogenic SVs in birth defect cases and family samples, and will delineate the detailed SV spectrum of major birth defects in this population. It will also analyze in depth the genetic and pathogenic roles of different SV types in birth defects, providing a theoretical foundation for the early warning, intervention, and prevention of major birth defects in China.
Building on the long-read DNA sequencing of cases carried out in Project Topics 1 and 3, this study incorporates case and control samples together with their long-read sequencing data. The PacBio Revio platform was chosen for long-read DNA sequencing, and standard whole-genome long-read sequencing analysis was performed on 50 cases. Starting from the first-phase Chinese population pan-genome reference map that had already been constructed, 50 representative control samples were added for long-read sequencing to construct a new Chinese population pan-genome reference map. The specific details are as follows:

1. Extraction of structural variation (SV) characteristics and encoding of birth defect case genomes, and establishment of an SV detection method based on feature encoding. Sequence imaging was used to eliminate background repetitive information from site alignment signals, thereby facilitating SV detection in complex genomic regions. An "image compression" encoding method was explored, using a "stacking" approach to characterize, within a single image, the abnormal sequence features that distinguish birth defect cases from parental controls or population controls. Based on whole-genome alignment results, the genomic regions containing abnormal sequences in birth defect cases were identified. A local, breakpoint-sensitive realignment method based on collinear segments was established to extract the SV fragment features carried in sequencing reads that differ between birth defect cases and parental controls or population controls. The study focused on an isomorphic convolutional neural network framework capable of simultaneous target segmentation and classification, so that both are achieved in SV detection.

2. Construction of a Chinese population pan-genome reference map based on long-read DNA sequencing data. Using third-generation whole-genome sequencing technology, DNA samples from multiple ethnic groups in China were sequenced, and the pan-genome assembly was visualized. Ethnic-specific reference genomes were constructed and combined into a pan-genome graph that integrates DNA sequences from different populations. Genetic variants, or sequences that differ, were treated as nodes, and adjacent sequences were connected by edges, thereby identifying core and specific gene sequences in the Chinese population. This work aims to establish a high-quality pan-genome reference map exclusive to the Chinese population, with a focus on whole-genome SV maps, to support the precise analysis of rare or novel SVs in birth defects.

3. Mapping of the fine SV map of major birth defects in the Chinese population. SV detection is carried out through two approaches: the linear genome approach and the pan-genome approach. The linear genome approach uses conventional linear detection methods to identify genetic variants. The pan-genome approach constructs a genome map from de novo assembled genomes, the universal human reference genome (GRCh38), genetic variants discovered in the Chinese population, and birth defect case samples. The two approaches validate and complement each other, and their SV results are integrated. Thresholds are set based on criteria such as variant location and sequence similarity, and redundant results are removed.
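The integration step described above (combining SV calls from the linear and pan-genome approaches and removing redundant results using location and sequence-similarity thresholds) can be illustrated with a minimal sketch. This is not the study's pipeline: the `SVCall` record, the function names, and the thresholds `max_dist=50` and `min_sim=0.9` are all hypothetical, chosen only to show the idea, and sequence similarity is approximated here with Python's standard `difflib.SequenceMatcher`.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class SVCall:
    """Hypothetical minimal SV record; real pipelines carry far more fields."""
    chrom: str
    pos: int
    svtype: str   # e.g. "INS" or "DEL"
    seq: str      # inserted or deleted sequence
    sources: set = field(default_factory=set)


def is_redundant(a, b, max_dist=50, min_sim=0.9):
    """Treat two calls as the same SV if they share chromosome and type,
    lie within max_dist bp, and their sequences are at least min_sim similar.
    Both thresholds are illustrative assumptions, not values from the study."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return False
    if abs(a.pos - b.pos) > max_dist:
        return False
    sim = SequenceMatcher(None, a.seq, b.seq, autojunk=False).ratio()
    return sim >= min_sim


def merge_calls(linear_calls, pangenome_calls, max_dist=50, min_sim=0.9):
    """Combine calls from both approaches, keeping one record per SV and
    noting which approach(es) supported it."""
    tagged = [(c, "linear") for c in linear_calls]
    tagged += [(c, "pangenome") for c in pangenome_calls]
    merged = []
    for call, source in sorted(tagged, key=lambda t: (t[0].chrom, t[0].pos)):
        for kept in merged:
            if is_redundant(kept, call, max_dist, min_sim):
                kept.sources.add(source)  # cross-validated by a second approach
                break
        else:
            merged.append(
                SVCall(call.chrom, call.pos, call.svtype, call.seq, {source})
            )
    return merged
```

In this sketch, a call reported by both approaches ends up as a single record whose `sources` set contains both labels, which mirrors the mutual-validation idea; calls seen by only one approach are retained with a single label.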
Study Type
OBSERVATIONAL
Enrollment
100
The sample DNA was sequenced using long - read DNA sequencing technology.
Complex genomic structural aberrations
The test detected that the subject carried complex genomic structural aberrations.
Time frame: When the test is completed, up to 6 weeks