Help

This Help page provides 'How To' mini-tutorials on using the website, the reference documents referred to in the Help text, a glossary of key terms used throughout the HGVbaseG2P website, and descriptions of specific parts of the site.

DATABASE 'HOW TO...' GUIDE

How to Understand the Database Content

HGVbaseG2P is built upon a basal layer of Markers that comprises all known SNPs and other variants from public databases such as dbSNP and the DBGV.

Genetic association significance findings are added on top of the Marker data, and are organised in the same way that investigations are reported in typical journal manuscripts. Critically, no individual level genotypes or phenotypes are presented in HGVbaseG2P, only group level aggregated (summary level) data.

The largest unit in a data submission is a Study, which can be thought of as being equivalent to one journal article. This may contain one or more Experiments, one or more Sample Panels of test subjects, and one or more Phenotypes. Sample Panels may be characterised in terms of various Phenotypes, and they also may be combined and/or split into Assayed Panels. The Assayed Panels are used as the basis for reporting genetic association findings (in 'Analysis Experiments'). Environmental factors are handled as part of the Sample Panel and Assayed Panel data structures.
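The relationships described above can be pictured with a minimal Python sketch. The class and field names below are illustrative assumptions made for this Help page; they are not the actual HGVbaseG2P database schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SamplePanel:
    name: str
    # Phenotypes used to characterise the panel (hypothetical field).
    phenotypes: List[str] = field(default_factory=list)


@dataclass
class AssayedPanel:
    name: str
    # Names of the Sample Panels this panel was split from or merged across.
    source_panels: List[str] = field(default_factory=list)


@dataclass
class AnalysisExperiment:
    objective: str
    phenotype: str
    # Names of the Assayed Panels the findings are reported for.
    assayed_panels: List[str] = field(default_factory=list)


@dataclass
class Study:
    title: str
    sample_panels: List[SamplePanel] = field(default_factory=list)
    assayed_panels: List[AssayedPanel] = field(default_factory=list)
    experiments: List[AnalysisExperiment] = field(default_factory=list)


# A made-up Study: one Sample Panel split into two Assayed Panels.
study = Study(title="Example blood pressure study")
study.sample_panels.append(SamplePanel("Cohort A", ["Blood pressure"]))
study.assayed_panels.append(AssayedPanel("Cases", ["Cohort A"]))
study.assayed_panels.append(AssayedPanel("Controls", ["Cohort A"]))
study.experiments.append(
    AnalysisExperiment(
        objective="Test association with blood pressure",
        phenotype="Blood pressure",
        assayed_panels=["Cases", "Controls"],
    )
)
```

In this sketch a Study simply owns lists of its panels and experiments, mirroring the "one journal article" analogy.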

How to find Studies of interest

You can find Studies of interest by identifier, keyword, gene name or chromosomal region. Use the search box on the homepage to look for matches in the Studies, Phenotypes or Markers sections of the database (example queries are given below the search box). The same search functionality is available through the search box located at the top right of every page.

You can also browse all available Studies either from the homepage by clicking on the Studies icon, or from any page by clicking the Studies tab.

To view a short summary of a Study, click on the Study name. To explore a Study in more detail, click on the identifier. On the Study page you will see five new tabs, and you can click on these to get to the Study's detailed content: Summary, Panels, Phenotypes, Analysis Experiments and Markers. From the Markers tab in the Study you will find links to the detailed association datasets (p-values). The Result Sets containing these Markers can be added to the Browser using the 'Add to Browser' link above the tabs. If there is more than one Result Set, you can use the check boxes to choose which ones you would like to see on the Browser.

How to find Markers of interest

To find Markers of interest, click on the Markers tab and enter one or more dbSNP or HGVbaseG2P identifiers (separate multiple IDs with spaces), an HGNC gene symbol, or a genomic region in the search box. Click 'Go' to perform the search.

The search can be refined by displaying Markers with association results where p-values are greater than or equal to a specified value, and by specifying that only Markers with association data be displayed.

To further explore the Marker(s) resulting from a search, click on the Marker identifier, which will take you to the Marker report. The report contains a Summary tab and an Association Results tab, where you can see which Result Sets, if any, contain this Marker, and add them to the Browser. The Association Results can also be reached directly from the Marker search results.

Example Marker searches:

  • dbSNP identifier rs699
  • HGVbaseG2P identifier HGVM13863803
  • HGNC gene symbol BRCA1
  • Genomic region chr19:232346..453453 or 8p22
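As an illustration of the query formats listed above, the following Python sketch sorts a search string into one of the example categories. The regular expressions and the function names are assumptions written for this Help page; they do not describe how the HGVbaseG2P search is actually implemented, and any string that matches no other pattern is simply treated as a possible gene symbol.

```python
import re

# Patterns inferred from the example searches; all are assumptions.
DBSNP_ID = re.compile(r"^rs\d+$", re.IGNORECASE)
HGVM_ID = re.compile(r"^HGVM\d+$", re.IGNORECASE)
REGION = re.compile(r"^chr(\d+|[XY]|MT?)\s*:\s*(\d+)\.\.(\d+)$", re.IGNORECASE)
CYTOBAND = re.compile(r"^(\d+|[XY])[pq]\d+(\.\d+)?$", re.IGNORECASE)


def classify_query(query: str) -> str:
    """Return a label for a single Marker search string."""
    q = query.strip()
    if DBSNP_ID.match(q):
        return "dbsnp_id"
    if HGVM_ID.match(q):
        return "hgvbaseg2p_id"
    if REGION.match(q):
        return "region"
    if CYTOBAND.match(q):
        return "cytoband"
    # Fallback: assume anything else is meant as an HGNC gene symbol.
    return "gene_symbol"


def classify_ids(text: str) -> list:
    """Classify each item in a space-separated list of identifiers."""
    return [(item, classify_query(item)) for item in text.split()]


for example in ["rs699", "HGVM13863803", "BRCA1", "chr19:232346..453453", "8p22"]:
    print(example, "->", classify_query(example))
```

The `classify_ids` helper mirrors the space-separated entry of multiple identifiers described above.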

How to Submit Data

All data submitted to HGVbaseG2P will remain the property of the data generators and/or submitters, and all records will be presented in the database with links and acknowledgements leading back to the original data source. Any users who wish to obtain non-aggregated data will be instructed to make suitable requests to the relevant submitter and their data access authorities.

Data can be submitted with embargo dates or conditions attached. We will still process such datasets immediately to ensure the submission is complete and usable, but we will not release the submitted data to the public until instructed to do so.

When submitting genetic association data and/or allele/genotype frequency data to HGVbaseG2P, we require that the utilised Markers are all present in a major public marker/variation database (e.g., dbSNP). If this is not the case, we can assist you in depositing the Markers into a suitable database.

To submit genetic association and/or allele/genotype frequency data, please gather together the required information as specified in the Submission Guidance Notes and paste it into the Submission Data Template form for submission. Each submission will equate to one Study in HGVbaseG2P, but each Study (i.e., each submission) can include one or more Experiments.

Note: for the future, we are devising a standalone software tool that submitters will be able to download and install locally, which will actively guide them through the process of gathering and checking their data before submitting it. The tool will organise submission content into an XML formatted document that is stored on the submitter's hard disk, gather related information from sites across the internet (e.g., journal citation details, Marker IDs, and Allele specifications), and check for any inconsistencies in the total submission. This will make it simpler for users to assemble and check their submissions with care at their own pace, with the added benefit that they will be able to reuse components (e.g., assay details, clinical materials, and phenotype descriptions) from earlier submissions.

Questions on making submissions should be directed to: submissions@hgvbaseg2p.org

USING THE BROWSER

Before using the Browser, please ensure that you have added Result Sets of interest, using the search methods detailed above. You may add a maximum of 16 Result Sets to a Browser session.

Genome view

  • The bar chart shows the number of markers that pass a significance threshold in 3Mb windows across the entire genome. For multiple Studies, stacked plots of these counts are generated.
  • The threshold can be changed in the expandable Settings panel above the bar chart. This panel also includes other settings to customise the display.
  • The Selected Studies tab shows the Result Sets that have been added to the Browser, listed by Study.
  • The Top Markers tab shows a customisable number of Markers from each Result Set, ranked by significance.
  • To show a region in more detail, click on the area of interest represented on the histogram.
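The counting behind the Genome view bar chart can be sketched in Python as follows. This is only an illustration: the function name, the input layout and the rule that a marker "passes" when its p-value is at or below the threshold are assumptions, and the site itself may express the threshold on a different scale (for example -log10 of the p-value).

```python
from collections import Counter

WINDOW_SIZE = 3_000_000  # 3Mb windows, as used in the Genome view


def count_significant(markers, threshold, window_size=WINDOW_SIZE):
    """Count markers passing the threshold in each genomic window.

    markers: iterable of (chromosome, position, p_value) tuples.
    Returns a Counter keyed by (chromosome, window_index).
    """
    counts = Counter()
    for chrom, pos, p_value in markers:
        if p_value <= threshold:  # assumed meaning of "passes the threshold"
            counts[(chrom, pos // window_size)] += 1
    return counts


# Made-up markers for illustration only.
markers = [
    ("1", 1_500_000, 1e-6),
    ("1", 2_900_000, 0.2),
    ("1", 3_100_000, 1e-8),
    ("2", 100, 1e-5),
]
counts = count_significant(markers, threshold=1e-4)
for (chrom, window), n in sorted(counts.items()):
    print(f"chr{chrom} window {window}: {n} marker(s)")
```

For multiple Studies, the same count could be computed once per Result Set and the per-window totals stacked.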

Region view

  • Unlike the Genome View, the Region View provides both high- and low-resolution views of marker significance data:

1. The low-resolution view shows the number of markers that pass a significance threshold in 1Mb windows across large (9Mb or greater) regions. For multiple studies, stacked plots of these counts are generated.

2. The high-resolution view shows the same as the low-resolution view, but in 15Kb windows across smaller regions (< 9Mb). In addition, it shows a combined trace of maximum p-values across the region, the individual markers and those markers present in multiple studies.

  • The threshold can be changed in the expandable Settings and Tracks panel above. Within this panel it is also possible to turn Browser tracks on and off using the check boxes.
  • To find a marker, gene or genomic region of interest within the selected studies, use the search box 'Landmark or Region'.
  • The Region Markers tab gives a summary of which markers are located in a particular region, together with the Study and Result Set they are contained in.
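The switch between the low- and high-resolution Region views can be summarised in a small Python sketch. The function name and the use of inclusive start and end coordinates are assumptions made for illustration; only the 9Mb boundary and the 1Mb and 15Kb window sizes come from the description above.

```python
LOW_RES_WINDOW = 1_000_000   # 1Mb windows for large regions
HIGH_RES_WINDOW = 15_000     # 15Kb windows for smaller regions
LARGE_REGION = 9_000_000     # regions of 9Mb or more use the low-resolution view


def choose_window(start: int, end: int) -> int:
    """Return the window size for a region with inclusive coordinates."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    length = end - start + 1
    return LOW_RES_WINDOW if length >= LARGE_REGION else HIGH_RES_WINDOW


print(choose_window(1, 9_000_000))        # a 9Mb region
print(choose_window(232_346, 453_453))    # the example region on chr19
```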

REFERENCE DOCUMENTS

HGVbaseG2P Nomenclature System

We have devised a completely new HGVbaseG2P Nomenclature System to ensure consistent and unambiguous presentation of alleles and genotypes. The system caters not only for simple sequence alleles and traditional presence/absence genotypes, but also copy-number variants and somatic variants, as well as quantitative and ratio classes of genotypes. It also offers a robust way to represent long alleles.

HGVbaseG2P Object Model v1.0

HGVbaseG2P data is organised into a series of relational database tables, a graphical overview of which is provided as a Data Model Diagram. The detailed structure of these tables is available in the form of a MySQL Relational Schema Definition.

Submitting data to HGVbaseG2P

Instructions on how to submit datasets to HGVbaseG2P are provided in the Submission Guidance Notes. To help you assemble your data correctly, we provide a Submission Data Template.

DEFINITIONS/GLOSSARY

What is a Study?

A Study in HGVbaseG2P is similar in scope to a journal article, comprising information relevant to a given research question or set of related questions. Data and analysis results from a study are grouped into one or more Experiments. The main fields in a Study entry are: Title, Abstract, Background, Objectives, KeyResults, Conclusions, StudyDesign, StudySizeReason, StudyPower, SourcesOfBias, Limitations, Acknowledgements, and SubmissionDate.

What is an Analysis Experiment?

Analysis Experiments in HGVbaseG2P are packages of information that address one discrete research question, providing summaries of genetic association findings in Assayed Panels. An Analysis Experiment may include data for any number of Markers and any number of Assayed Panels, but will address no more than one Phenotype question. The main fields in an Experiment entry are: Objective, Outcome, and Comments.

What is a Sample Panel?

A Sample Panel in HGVbaseG2P is a set of test subjects that are collected together and grouped into a named compilation to address some phenotype of interest. Typically, all the individuals in a Sample Panel are annotated in terms of one or more related Phenotypes, or share some commonality of another key metric (e.g., age, gender, ethnicity). Sample Panels may or may not be equivalent to the eventual groupings that are used as the basis for examining and reporting Experiment data, i.e., the Assayed Panels.

What is an Assayed Panel?

An Assayed Panel in HGVbaseG2P is a set of test subjects that are grouped into a named compilation, and used as the basis for examining and reporting Experiment data. Each Assayed Panel is derived from one or more Sample Panels (by splitting them into subsets and/or merging across Sample Panels) on the basis of some explicit phenotype criterion (such as presence/absence of a Phenotype, or a Phenotype value beyond some inclusion threshold).

What is a Phenotype?

A Phenotype in HGVbaseG2P is a reported characteristic or trait of interest, such as blood pressure. Phenotype information is organised into three sub-components: the 'Phenotype Property', which represents the concept of the trait under study; the 'Phenotype Method', which describes how the Phenotype Property was measured; and the 'Phenotype Value', which is a particular observation/result produced by measuring the Phenotype Property. Schemalet examples of this are available at the PaGE-OM website.

This system is very straightforward to use for the representation of ordinal or nominal Phenotype Values. To solve the problem of presenting quantitative Phenotype Values in a group of individuals (i.e., a Sample Panel or an Assayed Panel), HGVbaseG2P stores various statistics that define the group's distribution (e.g., mean, max, min, standard deviation). HGVbaseG2P does not store Phenotype information for single individuals.
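As a rough illustration of group level summaries, the Python sketch below reduces a list of quantitative Phenotype Values to the kind of statistics mentioned above. The function name, the output keys and the choice of the sample standard deviation are assumptions; the values are made up, and this is not the HGVbaseG2P storage format.

```python
import statistics


def summarise_panel(values):
    """Summarise quantitative values for one panel (needs at least two values)."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
        # Sample standard deviation; a population value would use pstdev.
        "sd": statistics.stdev(values),
    }


# Hypothetical systolic blood pressure readings for one Assayed Panel.
readings = [118.0, 122.0, 130.0, 110.0]
summary = summarise_panel(readings)
print(summary)
```

Only the summary dictionary, not the individual readings, corresponds to what would be presented for a panel.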

What is a Marker?

In HGVbaseG2P we define a Marker as: "A DNA sequence for which identical or highly similar instances exist at one or more locations in a genome. Markers are typically used as the basis for designing an experimental assay for detection of those instances of that sequence". The range of Markers available in HGVbaseG2P is extensive, including the complete Marker content from other public depositories such as dbSNP, UniSTS, and DBGV.

What is a Genotype?

In HGVbaseG2P we define a Genotype as: "A qualitative or quantitative combination of alleles of one or more Markers or DNA regions, implied (by the result of running a genotyping assay) to be resident at one or more positions in the genome of a tested DNA sample". This definition thus focuses on the genotyping result and not absolute reality, i.e., detected genotypes may not always reflect the true status of the genome, since some assays are flawed in their design or application, and some DNA samples may be inaccurately genotyped. This definition also allows for haplotype genotypes, MarkerSet genotypes (composite Marker signals), and genotype classes that are something other than simple presence/absence detections. Specifically, we must also cater for copy-number variation and somatic variation, which implies quantitative and ratio genotypes will need to be supported. A new HGVbaseG2P Nomenclature System for genotypes has been devised, to help manage these various complexities.

What is an Allele?

In HGVbaseG2P we define an Allele as: "A specific version of a set of different sequence alternatives of a Marker or DNA region resident at one or more locations in a genome". To minimise confusion when referring to Alleles, HGVbaseG2P always presents Alleles in the context of their immediate flanking DNA sequences, and a new HGVbaseG2P Nomenclature System for Alleles has been devised.

What are MeSH terms?

Medical Subject Headings (MeSH) is the National Library of Medicine's controlled vocabulary thesaurus. It consists of sets of descriptors structured in a hierarchy that permits searching at various levels of specificity. In HGVbaseG2P two levels of MeSH are implemented. MeSH 'headings' are displayed in the MeSH tree and represent concepts found in the biomedical literature, for example "Neoplasms". MeSH 'terms' are used by the phenotype autocomplete search box and are the various synonyms used to represent those concepts, for example "Benign Neoplasms" and "Cancer".
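The two levels can be pictured as a mapping from each heading to its terms. The Python sketch below uses only the "Neoplasms" example from the text; the dictionary layout and the prefix-matching `suggest` function are assumptions for illustration and do not describe how the phenotype autocomplete box is actually built.

```python
# Heading -> terms (synonyms), using only the example given in the text.
MESH = {
    "Neoplasms": ["Neoplasms", "Benign Neoplasms", "Cancer"],
}

# Reverse index: lower-cased term -> (term, heading).
TERM_INDEX = {
    term.lower(): (term, heading)
    for heading, terms in MESH.items()
    for term in terms
}


def suggest(prefix: str):
    """Return (term, heading) pairs whose term starts with the typed prefix."""
    p = prefix.strip().lower()
    return sorted(pair for key, pair in TERM_INDEX.items() if key.startswith(p))


print(suggest("can"))
print(suggest("ben"))
```

Typing a synonym therefore leads back to the heading under which the concept appears in the MeSH tree.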