Deep Learning Approaches to SNP-Gene Mapping: Leveraging Autoencoders for Insight into Periodontitis Genetics
DOI: https://doi.org/10.70135/seejph.vi.5975
Abstract
Background: Genetic studies reveal the complexity of human diseases, particularly polygenic disorders such as periodontitis. Autoencoders, a class of deep learning models, can be applied to single-nucleotide polymorphism (SNP) data to identify latent variables and model complex relationships. This helps identify biomarkers for early diagnosis, indicators of disease severity, and therapeutic targets. However, challenges remain, including those related to large datasets and the interpretability of deep learning models. Improving interpretability is crucial for translating genomic findings into clinically relevant applications such as precision medicine and personalized treatment strategies. This study employs a Graph Autoencoder (GAE) to analyze SNP data, focusing on the relationships between SNPs and their associated genes in periodontitis.
Methods: The GAE model was trained to learn embeddings that capture the underlying graph structure, enabling predictions and insights into the genetic network. A dataset of SNPs and gene information for periodontitis was merged from two sources using data management tools such as SQL or pandas in Python. Node features were created by encoding the categorical variables (SNP_ID and Associated_Gene) and scaling them with StandardScaler. Edges were constructed to represent bidirectional connections between SNPs and their associated genes. The GAE uses a two-layer graph convolutional network (GCN) encoder to capture the graph structure and reconstructs the adjacency matrix. It was trained with a binary cross-entropy loss and an adjustable learning rate, using PyTorch Geometric for efficient data handling.
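The data preparation steps described in the Methods can be sketched roughly as follows. This is a minimal, standard-library illustration under stated assumptions, not the authors' pipeline: the records are toy placeholders, the function names (merge_sources, label_encode, standardize, build_graph) are hypothetical, and the layout of the feature matrix (one row per merged record, with SNP nodes indexed before gene nodes) is assumed because the abstract does not specify it. In practice the merge and scaling would more likely use pandas and StandardScaler.

```python
from statistics import mean, pstdev

# Toy placeholder records (SNP_ID, Associated_Gene); not data from the study.
source_a = [("snp_1", "gene_x"), ("snp_2", "gene_x")]
source_b = [("snp_2", "gene_x"), ("snp_3", "gene_y")]


def merge_sources(*sources):
    """Concatenate sources and drop duplicate pairs, keeping first-seen order."""
    seen, merged = set(), []
    for source in sources:
        for pair in source:
            if pair not in seen:
                seen.add(pair)
                merged.append(pair)
    return merged


def label_encode(values):
    """Map each distinct value to an integer code, assigned in sorted order."""
    return {value: code for code, value in enumerate(sorted(set(values)))}


def standardize(column):
    """Scale to zero mean and unit variance using the population std."""
    mu, sigma = mean(column), pstdev(column)
    return [(x - mu) / sigma if sigma else 0.0 for x in column]


def build_graph(pairs):
    """Return (features, edges) for a list of (SNP_ID, Associated_Gene) pairs."""
    snp_codes = label_encode(snp for snp, _ in pairs)
    gene_codes = label_encode(gene for _, gene in pairs)
    n_snps = len(snp_codes)

    # Assumed layout: one feature row per record, two scaled columns.
    snp_col = standardize([snp_codes[snp] for snp, _ in pairs])
    gene_col = standardize([gene_codes[gene] for _, gene in pairs])
    features = [list(row) for row in zip(snp_col, gene_col)]

    # Bidirectional edges: SNP nodes come first, gene nodes are offset by n_snps.
    edges = []
    for snp, gene in pairs:
        s, g = snp_codes[snp], n_snps + gene_codes[gene]
        edges.append((s, g))
        edges.append((g, s))
    return features, edges


merged = merge_sources(source_a, source_b)
features, edges = build_graph(merged)
```

The resulting edge list stores each SNP-gene link in both directions, which is the form a library such as PyTorch Geometric typically expects for an undirected graph.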
Results: The model achieved a final reconstruction loss of 0.8047 after 100 epochs, indicating effective graph structure learning. Its high recall but low precision suggest that the model over-predicts connections. The learned embeddings show a clear clustering of SNPs and account for 98.2% of the variance.
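To illustrate how high recall can coincide with low precision, the sketch below scores every node pair and compares the predicted edges with the true ones. It rests on assumptions: the abstract does not state how the adjacency matrix is decoded, so an inner-product decoder with a sigmoid and a 0.5 threshold (a common choice for graph autoencoders) is assumed, and the embeddings and edges are toy values rather than results from the study.

```python
import math
from itertools import combinations


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict_edges(embeddings, threshold=0.5):
    """Assumed inner-product decoder: predict edge (i, j) when sigmoid(z_i . z_j) > threshold."""
    predicted = set()
    for i, j in combinations(range(len(embeddings)), 2):
        dot = sum(a * b for a, b in zip(embeddings[i], embeddings[j]))
        if sigmoid(dot) > threshold:
            predicted.add((i, j))
    return predicted


def precision_recall(predicted, true_edges):
    """Precision and recall over undirected edges stored as (i, j) with i < j."""
    true_positives = len(predicted & true_edges)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(true_edges) if true_edges else 0.0
    return precision, recall


# Toy embeddings whose pairwise dot products are all positive, so every
# pair is predicted as connected (not values from the study).
embeddings = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4]]
true_edges = {(0, 1), (2, 3)}

predicted = predict_edges(embeddings)
precision, recall = precision_recall(predicted, true_edges)
```

Because every pair clears the threshold here, all true edges are recovered (recall of 1.0) while most predicted edges are false positives, which is the over-prediction pattern the Results describe.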
Conclusion: While the Graph Autoencoder demonstrates high recall, it requires further optimization for precision. It provides valuable insights into SNP-gene associations and disease mechanisms that warrant further study.
License
Copyright (c) 2025 Asok Mathew, Pradeep Kumar Yadalam, Subasree S, Jayaraj Kodangattil Narayanan

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.
