Findings from tissue-of-origin and molecular classification studies are becoming increasingly important in the development of cancer therapies. In the clinic, the primary site cannot be identified in up to 5% of cancer cases, a condition known as cancer of unknown primary (CUP). Clinicians need to identify the patients who are likely to respond to treatment and choose the appropriate therapy for them. For CUP, the most common approach is empirical chemotherapy, which is associated with poor overall survival. Determining the cancer tissue of origin is therefore an urgent problem. There is also a critical need to identify the precise genetic events that drive cancer development, since these events are often accompanied by cell proliferation and uncontrolled metabolic alterations. However, in the age of massive biomedical data, experimental methods alone cannot give a comprehensive picture of these genetic characteristics. Although a number of computational techniques have been developed in this field, their accuracy is often inadequate for clinical applications.
Molecular classification systems can help optimize cancer treatment strategies. As data accumulate, particularly single-cell sequencing data, the molecular classification of different cancer types will improve, to the benefit of patients. Better biomarkers are expected to emerge, leading to more efficient therapies and the development of new drugs. This Research Topic collects research papers and reviews covering not only computational techniques for inferring the tissue of origin and molecular classification of cancers, but also translational studies of cancer therapy in hospital settings. Together, these articles shed light on the development of cancer treatments, with particular emphasis on cutting-edge computational applications in cancer diagnostics.
A total of 19 articles were published, comprising 18 research papers and one review. The articles apply computational methods to determine cancer tissue of origin and molecular classification in a variety of cancer types, including but not limited to hepatocellular carcinoma (HCC), pancreatic cancer (PC), ovarian cancer (OC), glioma, gastric cancer (GC), and EC, as well as to circulating tumor cells (CTCs).
Seven research papers present distinct approaches to capturing gene signatures (models) for similar purposes. First, Li et al. used the limma R package to identify the top 5,000 significant differentially expressed genes (DEGs) in HCC. Weighted correlation network analysis (WGCNA) then grouped these DEGs into nine modules. Next, six genes were screened using univariate, LASSO, and multivariate Cox regression analyses, and the resulting six-gene signature was shown to be a significant independent prognostic predictor in survival analysis (Li et al.). Zhang et al. constructed a stemness index-based gene signature for lower-grade glioma (LGG), applying most of the same bioinformatic methods. The same group also built an immune-related signature for prognosis prediction and risk stratification in LGG using data from The Cancer Genome Atlas (TCGA), Genotype-Tissue Expression (GTEx), and the Chinese Glioma Genome Atlas (CGGA). Ding et al. conducted a comparable study in CC and EC. Notably, they validated their gene signature with a variety of techniques, including Gene Ontology (GO), KEGG pathway, and gene set enrichment analysis (GSEA) enrichment analyses, Kaplan-Meier survival curves, receiver operating characteristic curves, and immune cell infiltration analysis. Furthermore, Pan et al. showed that gene methylation signatures can be used to classify gliomas into distinct subtypes. Methylation features associated with glioma subtypes were detected using computational techniques such as Monte Carlo feature selection (MCFS), incremental feature selection (IFS), and support vector machine (SVM). In a related study, Hou et al. investigated the roles and mechanisms of N6-methyladenosine (m6A) modification in the development of pancreatic cancer (PC), using LASSO regression to identify a six-m6A-regulator signature associated with overall survival (OS). Finally, by integrating transcriptomic and genomic data for high-grade OC, Kieffer and colleagues developed gene signatures for the disease.
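Several of the signature studies above end in survival analysis. A widely used workflow, sketched here only as an illustration, scores each patient as a weighted sum of signature-gene expression using Cox regression coefficients, splits patients into high- and low-risk groups at the median score, and compares the groups with Kaplan-Meier curves. The sketch assumes the coefficients have already been fitted (for example by LASSO Cox regression); the gene names, coefficients, and follow-up data are hypothetical, and the individual articles may use different cutoffs or software.

```python
from statistics import median

# Hypothetical Cox coefficients for a two-gene signature (illustration only).
COEFS = {"GENE_A": 0.8, "GENE_B": -0.5}


def risk_scores(coefs, samples):
    """Risk score per patient: sum of coefficient * expression over signature genes."""
    return [sum(beta * expr[gene] for gene, beta in coefs.items()) for expr in samples]


def stratify(scores):
    """Label patients 'high' (above the median score) or 'low' risk."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up times; events: 1 if the event (e.g. death) was observed,
    0 if the patient was censored. Returns (time, survival) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Collect every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after time t.
        at_risk -= removed
    return curve


# Hypothetical expression values and follow-up for four patients.
samples = [
    {"GENE_A": 2.0, "GENE_B": 1.0},
    {"GENE_A": 0.5, "GENE_B": 2.0},
    {"GENE_A": 1.5, "GENE_B": 0.5},
    {"GENE_A": 0.2, "GENE_B": 1.5},
]
times = [6, 20, 9, 25]  # hypothetical follow-up (months)
events = [1, 0, 1, 1]   # 1 = death observed, 0 = censored

scores = risk_scores(COEFS, samples)
groups = stratify(scores)
curves = {}
for label in ("high", "low"):
    idx = [i for i, g in enumerate(groups) if g == label]
    curves[label] = kaplan_meier([times[i] for i in idx], [events[i] for i in idx])
```

In practice the two curves would be plotted and compared with a log-rank test, which this sketch does not implement.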
Three research papers specifically illustrate the use of machine learning to capture features. Liu et al. used the random forest method to extract genetic features from DNA somatic mutation data and then used these features to build a logistic regression-based classifier. With a feature matrix extracted from 300 functional genes, the classifier reached a prediction accuracy of up to 81% in 10-fold cross-validation. He and colleagues developed a deep learning-based cell identification algorithm for detecting CTCs, aiming to reduce the effort of CTC counting and increase automation. In their study, CTC images from 600 in-house patients were segmented with a pipeline the researchers built using the OpenCV library in Python. A convolutional neural network was then trained on 1,300 cells, and the remaining cells were used for testing. The final identification specificity and sensitivity were 91.3% and 90.3%, respectively. Qian and colleagues developed a support vector machine (SVM)-based method for cancer lectin prediction, using a fusion of g-gap dipeptide features for feature extraction.
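The g-gap dipeptide features mentioned for Qian and colleagues can be illustrated with a short sketch of the encoding as it is commonly defined. For a protein sequence of length L, it counts how often each of the 400 ordered amino-acid pairs occurs with exactly g residues between them, normalized by the number of such positions (L - g - 1); g = 0 reduces to the ordinary dipeptide composition, and a fusion concatenates the vectors for several gap sizes. The gap values, the example sequence, and the handling of non-standard residues are assumptions for illustration; the exact settings in Qian et al. and the SVM trained on these vectors are not reproduced here.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order so feature vectors are comparable.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def g_gap_dipeptide(seq, g):
    """Frequencies of residue pairs (seq[i], seq[i + g + 1]) separated by g residues.

    g = 0 gives the ordinary adjacent-dipeptide composition.
    """
    seq = seq.upper()
    total = len(seq) - g - 1  # number of positions where such a pair exists
    if total <= 0:
        return [0.0] * len(PAIRS)
    counts = dict.fromkeys(PAIRS, 0)
    for i in range(total):
        pair = seq[i] + seq[i + g + 1]
        # Pairs with non-standard residues are skipped but still counted in `total`.
        if pair in counts:
            counts[pair] += 1
    return [counts[p] / total for p in PAIRS]


def fused_g_gap(seq, gaps):
    """Concatenate g-gap dipeptide vectors for several gap sizes (hypothetical fusion)."""
    features = []
    for g in gaps:
        features.extend(g_gap_dipeptide(seq, g))
    return features


# Toy sequence (hypothetical); each gap size contributes a 400-dimensional block.
features = fused_g_gap("ACDA", gaps=(0, 1))
```

Fusing several gap sizes lets the classifier see both adjacent and longer-range residue correlations, at the cost of 400 extra dimensions per gap, which is why such studies often add a feature-selection step.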
Three research papers develop computational methods for other scientific problems. Zhu et al. proposed BHCMDA, a miRNA-disease association prediction model based on the biased heat conduction (BHC) algorithm, which identifies potentially disease-associated miRNAs by integrating known miRNA-disease associations, disease semantic similarity, miRNA functional similarity, and Gaussian interaction profile kernel similarity. Zhao and colleagues developed a new computational method, the multiplex biological network (MON), which integrates protein interaction networks (PINs), protein domains, and gene expression data. By extending random walk with restart to tensor data, the method was shown to identify essential proteins. Using histopathological images of patients from the TCGA database, Wu et al. developed a convolutional neural network (CNN) framework called DeepLRHE for predicting lung cancer recurrence after surgical resection, which achieved an area under the receiver operating characteristic (ROC) curve (AUC) of 0.79 and a prediction accuracy of 80%.
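The random walk with restart (RWR) procedure that Zhao and colleagues extended to tensor data can be illustrated in its basic, single-network form. This is a minimal sketch assuming an ordinary undirected graph, not the multiplex network and tensor formulation of MON: at each step the walker moves to a random neighbour, or with probability r jumps back to a seed set, and the steady-state probabilities rank nodes by their proximity to the seeds. The graph, protein names, seed choice, and restart probability are hypothetical.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a random walk with restart (RWR).

    adj: dict mapping each node to the set of its neighbours (undirected graph;
         every neighbour must also be a key).
    seeds: non-empty set of nodes the walker jumps back to, uniformly.
    restart: probability of returning to the seeds at each step.
    Iterates p <- restart * p0 + (1 - restart) * W p until the L1 change < tol.
    """
    nodes = list(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            share = (1.0 - restart) * p[n]
            if adj[n]:
                # Spread the walker's mass evenly over n's neighbours.
                for m in adj[n]:
                    nxt[m] += share / len(adj[n])
            else:
                # An isolated node has nowhere to go, so its mass returns to the seeds.
                for s in nodes:
                    nxt[s] += share * p0[s]
        diff = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if diff < tol:
            break
    return p


# Hypothetical three-protein path graph P1 - P2 - P3, seeded at P1.
graph = {"P1": {"P2"}, "P2": {"P1", "P3"}, "P3": {"P2"}}
scores = random_walk_with_restart(graph, seeds={"P1"}, restart=0.5)
```

Because each step keeps the total probability at 1 and shrinks differences by a factor of (1 - r), the iteration converges for any restart probability above zero; nodes would then be ranked by their scores.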
Finally, the systematic review shows in detail that the CLDN18-ARHGAP fusion is a major molecular feature of diffuse GC and a substantial independent prognostic risk factor. Drawing on cutting-edge sources, the research papers and review in this Research Topic investigate the origins and gene signatures of various malignancies, evaluate current computational techniques, and serve as a resource for clinicians.