Unique Insights From ClinicalTrials.gov By Mining Protein Mutations and RSids In Addition To Applying The Human Phenotype Ontology

Developed by Shray Alag, The Harker School, San Jose, CA


Publications

Alag S (May 2020). Unique insights from ClinicalTrials.gov by mining protein mutations and RSids in addition to applying the Human Phenotype Ontology. PLOS ONE 15(5): e0233438. https://doi.org/10.1371/journal.pone.0233438 PMID: 32459809


Alag S (September 2020). Analysis of COVID-19 clinical trials: A data-driven, ontology-based, and natural language processing approach. PLOS ONE 15(9): e0239694. https://doi.org/10.1371/journal.pone.0239694 PMID: 32997699


CovidResearchTrials.com: Analysis of COVID-19 Clinical Trials: A Data-Driven, Ontology-Based, and Natural Language Processing Approach. http://CovidResearchTrials.com/

362,558

Total number of clinical trials

588

Number of unique RSids

15,803

HPO nodes used in analysis

1,077

Number of unique MutationFinder mutations

Reports

Data processed on January 01, 2021.

RSids: An HTML report was created for each of the 588 unique RSids. Each report contains a list of the clinical trials in which the SNP appears, along with the sentences containing the SNP. Each clinical trial report also shows the mapped HPO as well as MeSH terms. The HPO terms and their associated genes are also displayed. Similarly, an HTML report was generated for each of the 394 unique clinical trials that mentioned SNPs.
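As a rough illustration of how RSids could be mined from trial text and tallied, the sketch below uses a simple regular expression (`rs` followed by digits, the dbSNP reference SNP format). The trial IDs, sentences, and function names are hypothetical, and the actual pipeline described in Alag (May 2020) may differ. In particular, counting a "reference" as one distinct RSid per trial is an assumption.

```python
import re
from collections import defaultdict

# dbSNP reference SNP IDs look like "rs" followed by digits, e.g. rs1801133.
RSID_PATTERN = re.compile(r"\brs\d+\b", re.IGNORECASE)


def find_rsids(text):
    """Return the RSids found in text, lower-cased, in order of appearance."""
    return [m.group(0).lower() for m in RSID_PATTERN.finditer(text)]


def summarize(trials):
    """Tally unique RSids, trials with RSids, and (trial, RSid) references.

    `trials` maps a trial ID to its list of sentences.
    """
    trials_by_rsid = defaultdict(set)
    for trial_id, sentences in trials.items():
        for sentence in sentences:
            for rsid in find_rsids(sentence):
                trials_by_rsid[rsid].add(trial_id)
    trials_with_rsids = set()
    for trial_ids in trials_by_rsid.values():
        trials_with_rsids |= trial_ids
    # Assumption: one reference = one distinct (trial, RSid) pair.
    references = sum(len(trial_ids) for trial_ids in trials_by_rsid.values())
    return {
        "unique_rsids": len(trials_by_rsid),
        "trials_with_rsids": len(trials_with_rsids),
        "references": references,
    }


# Hypothetical trial records, for illustration only.
example_trials = {
    "TRIAL-A": ["Carriers of rs1801133 are eligible.", "No genotype required."],
    "TRIAL-B": ["Genotyping of rs1801133 and RS429358 will be performed."],
    "TRIAL-C": ["Participants receive standard care."],
}

print(summarize(example_trials))
```

On this toy input the tally is 2 unique RSids, 2 trials with RSids, and 3 references, which mirrors the three quantities reported for each data snapshot.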

Protein Mutations: An HTML report was created for each of the 1,077 unique protein mutations. All 1,077 protein mutations are listed on the left-hand side of each report for easy navigation. Reports are also available for each clinical trial that references a protein mutation.
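To give a sense of what a protein point mutation mention looks like in text, here is a minimal sketch that recognizes two common notations, the one-letter form (wild-type residue, position, mutant residue, as in V600E) and the three-letter form (as in Val600Glu). This is not MutationFinder itself, which applies a far larger set of patterns; the function name and the example sentence are hypothetical.

```python
import re

# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE_LETTER = "".join(sorted(THREE_TO_ONE.values()))
THREE_LETTER = "|".join(THREE_TO_ONE)

# One-letter form such as V600E: wild-type residue, position, mutant residue.
ONE_LETTER_RE = re.compile(rf"\b([{ONE_LETTER}])([1-9]\d*)([{ONE_LETTER}])\b")
# Three-letter form such as Val600Glu.
THREE_LETTER_RE = re.compile(rf"\b({THREE_LETTER})([1-9]\d*)({THREE_LETTER})\b")


def find_point_mutations(text):
    """Return point mutations in text, normalized to the one-letter form.

    One-letter matches are listed first, then three-letter matches.
    """
    found = []
    for wild, pos, mutant in ONE_LETTER_RE.findall(text):
        found.append(f"{wild}{pos}{mutant}")
    for wild, pos, mutant in THREE_LETTER_RE.findall(text):
        found.append(f"{THREE_TO_ONE[wild]}{pos}{THREE_TO_ONE[mutant]}")
    return found


print(find_point_mutations("Patients with BRAF V600E or EGFR Thr790Met mutations."))
```

A pattern this simple will also match strings that are not mutations (some cell line or product names have the same shape), so any real use would need additional filtering.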


SNPs in Clinical Trials

Report for each SNP

Clinical Trials with SNPs

Report for each clinical trial with a SNP

Protein mutations in Clinical Trials

Report for each protein mutation

Clinical Trials with protein mutations

Report for each clinical trial that mentions a protein mutation

API

Java APIs are available to access the data. Each Java class is a stand-alone program that requires no packages beyond the Java core classes: install Java and a Java IDE, then run the class from the IDE. A Google Colab notebook provides example Python code for accessing the data.

Figures and Tables

Key insights. Please refer to Alag (May 2020) for details.


Pipeline

Steps required to generate the results.

Figure 2

Bubble graph showing the key MeSH nodes used to tag clinical trials with protein mutations.

Figure 3

Common MeSH terms for clinical trials with RSid and protein mutation frequencies.

Figure 4

Frequency of different HPO terms across clinical trials, across trials with RSids, and across trials with protein mutations.

Figure 5

Percentage of clinical trials in each of the eleven categories with RSids and protein mutations.

Table 1

Software libraries

Table 2

Most frequent RSids across ClinicalTrials.gov.

Table 3

HPO terms with the largest number of associated RSids.

Table 4

Most frequent mutations across ClinicalTrials.gov.

Table 5

HPO terms with the most cited protein mutations found by MutationFinder in ClinicalTrials.gov.

Table 6

HPO terms with the largest number of associated mutations.

Table 7

Intervention types for clinical trials with mutations.

Table 8

Related HPO terms using co-occurrences of RSids and HPO terms.

Amino Acids

Reference for amino acids

Data from January 01, 2021

Details of data processed

  • Number of clinical trials: 362,558
  • HPO: nodes: 15,803, Parent-child hierarchy: 19,678, Phenotype to gene: 944,915, Unique genes: 4,503
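To make the HPO counts above concrete: nodes are phenotype terms, the parent-child hierarchy is a set of directed edges between terms, and phenotype-to-gene entries link a term to an associated gene. Because the hierarchy has more edges than nodes, some terms must have more than one parent. The sketch below shows one way to hold these structures in memory; the term IDs, gene names, and helper function are hypothetical placeholders, not actual HPO content or the project's Java SDK.

```python
from collections import defaultdict

# Hypothetical toy data: (child, parent) edges and (term, gene) annotations.
PARENT_EDGES = [
    ("TERM:2", "TERM:1"),
    ("TERM:3", "TERM:1"),
    ("TERM:4", "TERM:2"),
    ("TERM:4", "TERM:3"),  # a term may have more than one parent
]
PHENOTYPE_TO_GENE = [
    ("TERM:2", "GENE_A"),
    ("TERM:4", "GENE_A"),
    ("TERM:4", "GENE_B"),
]

# Index the edges so each term maps to its direct parents.
parents = defaultdict(set)
for child, parent in PARENT_EDGES:
    parents[child].add(parent)


def ancestors(term):
    """Return every term reachable by following parent links from term."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# The same four quantities reported for each data snapshot.
summary = {
    "nodes": len({term for edge in PARENT_EDGES for term in edge}),
    "parent_child_edges": len(PARENT_EDGES),
    "phenotype_to_gene": len(PHENOTYPE_TO_GENE),
    "unique_genes": len({gene for _, gene in PHENOTYPE_TO_GENE}),
}

print(sorted(ancestors("TERM:4")))
print(summary)
```

Walking up to all ancestors is what lets a trial tagged with a specific term also be counted under that term's broader categories.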

Java SDK

The Java API for accessing HPO and MeSH terms with clinical trials.

RSids

Details of RSids across ClinicalTrials.gov:

#  Item                                  Number
1  Number of unique RSids                588
2  Number of clinical trials with RSids  394
3  Number of RSid references             843

Java SDK

Google Colab Notebook

Protein Mutation

Protein mutations found by MutationFinder across ClinicalTrials.gov:

#  Item                                       Number
1  Number of unique MutationFinder mutations  1,077
2  Number of clinical trials with mutation    2,147

Java SDK

Google Colab Notebook

Data from November 07, 2020

Details of data processed

  • Number of clinical trials: 357,017
  • HPO: nodes: 15,656, Parent-child hierarchy: 19,523, Phenotype to gene: 919,672, Unique genes: 4,484

Java SDK

The Java API for accessing HPO and MeSH terms with clinical trials.

RSids

Details of RSids across ClinicalTrials.gov:

#  Item                                  Number
1  Number of unique RSids                588
2  Number of clinical trials with RSids  377
3  Number of RSid references             814

Java SDK

Google Colab Notebook

Protein Mutation

Protein mutations found by MutationFinder across ClinicalTrials.gov:

#  Item                                       Number
1  Number of unique MutationFinder mutations  1,043
2  Number of clinical trials with mutation    2,056

Java SDK

Google Colab Notebook

Data from May 23, 2020

Details of data processed

  • Number of clinical trials: 340,614
  • HPO: nodes: 15,229, Parent-child hierarchy: 18,949, Phenotype to gene: 839,551, Unique genes: 4,315

Java SDK

The Java API for accessing HPO and MeSH terms with clinical trials.

RSids

Details of RSids across ClinicalTrials.gov:

#  Item                                  Number
1  Number of unique RSids                577
2  Number of clinical trials with RSids  377
3  Number of RSid references             814

Java SDK

Google Colab Notebook

Protein Mutation

Protein mutations found by MutationFinder across ClinicalTrials.gov:

#  Item                                       Number
1  Number of unique MutationFinder mutations  966
2  Number of clinical trials with mutation    1,975
3  Number of mutation references              3,923

Java SDK

Google Colab Notebook

Data from March 2020

Details of data processed

  • Number of clinical trials: 332,418
  • HPO: nodes: 14,961, Parent-child hierarchy: 18,547, Phenotype to gene: 820,297, Unique genes: 4,312

RSids

Details of RSids across ClinicalTrials.gov:

#  Item                                  Number
1  Number of unique RSids                566
2  Number of clinical trials with RSids  368
3  Number of RSid references             798

Protein Mutation

Protein mutations found by MutationFinder across ClinicalTrials.gov:

#  Item                                       Number
1  Number of unique MutationFinder mutations  962
2  Number of clinical trials with mutation    1,939
3  Number of mutation references              3,881

Data from August 2019

Details of data processed

  • Number of clinical trials: 310,880
  • HPO: nodes: 14,708, Parent-child hierarchy: 18,312, Phenotype to gene: 521,507, Unique genes: 4,073

RSids

Details of RSids across ClinicalTrials.gov:

#  Item                                  Number
1  Number of unique RSids                530
2  Number of clinical trials with RSids  340
3  Number of RSid references             739

Protein Mutation

Protein mutations found by MutationFinder across ClinicalTrials.gov:

#  Item                                       Number
1  Number of unique MutationFinder mutations  903
2  Number of clinical trials with mutation    1,793
3  Number of mutation references              3,568