Biology (New York/Shanghai/Abu Dhabi)

A guide to resources for finding articles, books, and data in biology. Includes resources to comply with NIH research rigor and transparency initiatives.

Uniquely Identifying Resources

Resource identification refers to the unambiguous reporting of research resources such as genes, organisms, tools, and reagents. These resources should be reported within publications with enough information that reviewers and subsequent researchers can identify the exact strain or reagent used. Preferably, authors should provide the full, descriptive name of the resource, its source and a unique identifier. Doing so allows for:

  • Better evaluation of the methods and interpretation of results
  • Reproducibility of the research
  • Machine readability and potential text mining applications

Many reporting standards will include guidance on how to identify such resources. This page provides further information and sources for unambiguous identifiers.

Use RRIDs when available! Research Resource Identifiers (RRIDs) are persistent and unique identifiers for organisms, cell lines, antibodies, and software tools available through The Resource Identification Initiative. They can be found through the portal below.
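
Because RRIDs are meant to be machine readable, they can be picked out of a methods section with a simple pattern. The following is a minimal sketch, assuming RRIDs are cited as the literal text "RRID:" followed by a registry prefix and a local ID. The pattern is deliberately loose and is not an official grammar. The function name, vendor, tool name, and all identifiers in the sample text are invented placeholders.

```python
import re

# Assumption: an RRID is written as "RRID:" plus a prefix and local ID,
# starting with a letter and ending with a letter or digit.
RRID_PATTERN = re.compile(r"RRID:\s?([A-Za-z][A-Za-z0-9_:\-]*[A-Za-z0-9])")


def find_rrids(text):
    """Return the RRIDs found in text, in order of first appearance."""
    found = []
    for match in RRID_PATTERN.finditer(text):
        rrid = "RRID:" + match.group(1)
        if rrid not in found:  # skip repeats of the same identifier
            found.append(rrid)
    return found


# Placeholder identifiers, invented for illustration only.
methods = (
    "Cells (RRID:CVCL_0000) were stained with an anti-tubulin antibody "
    "(ExampleVendor Cat# 123, RRID:AB_0000000) and analyzed with "
    "ExampleTool (RRID:SCR_000000)."
)
print(find_rrids(methods))
```

A list like this could then be checked by hand against the portal to confirm that each identifier resolves to the intended resource.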

Organism and Strain Identification

Report:

  • For transgenic animals: source, species, strain, sex, age, husbandry, and inbred and strain characteristics.
  • For identifying model organisms: species IDs can be found in NCBI Taxonomy. Strain information can be found in model organism databases and reported using their unique identifiers. For example, for a strain of C. elegans, use the WormBase ID.

Alternatively, RRIDs are available for some species.

Species Identifiers

Model Organism Databases (for strain info)

Reagent & Cell Line Identification

Report:

  • Full, descriptive name of the resource, including host species, if relevant
  • Source of the resource, as in the vendor or lab
  • Unique ID, such as a catalog number, accession number, or CAS ID

Again, RRIDs are available for many antibodies and cell lines and should be used when available.

Sequence & Variant Identification

Genes discussed in the literature often have multiple names, symbols, and IDs associated with them. This can cause confusion when reporting on genes or sequences and also when searching for them. Gene nomenclature committees and molecular sequence databases provide standardized names and identifiers as well as known synonyms.

Approved gene symbol: There are species-specific rules about naming genes. Nomenclature committees that assign gene names and symbols exist for a variety of organisms (see below). If you are publishing a report on a gene that does not have an assigned symbol, you can contact these committees to request that they assign one prior to publication. Some journals will require this step.

Accession IDs: Molecular sequence databases assign unique accession numbers to sequences. The International Nucleotide Sequence Database Collaboration (DDBJ/EMBL-EBI/NCBI) assigns accessions in a specific format. In NCBI, for example, you may find a record with an accession like: EF212037.2. Here, "EF212037" is the GenBank (direct submission) accession number, and ".2" is the version number, indicating that this is the second version of the record. It is, therefore, important to include the dot and version number when reporting accession IDs from these sources.
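
As a quick check before submitting a manuscript, the accession and version parts can be separated programmatically. This is a minimal sketch, assuming identifiers follow the "ACCESSION.VERSION" shape described above. The function name is hypothetical, and the code does not query any database.

```python
def split_accession(identifier):
    """Split 'ACCESSION.VERSION' into (accession, version).

    The version is None when the dot-and-number suffix is missing.
    """
    cleaned = identifier.strip()
    base, dot, suffix = cleaned.rpartition(".")
    if dot and suffix.isdigit():
        return base, int(suffix)
    return cleaned, None


print(split_accession("EF212037.2"))  # ('EF212037', 2)
print(split_accession("EF212037"))    # ('EF212037', None)
```

A result with a version of None flags an accession that was reported without its version number and so does not point to one specific version of the record.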

  • GenBank Accession Number Reference Sheet (includes RefSeq)
  • More about types of reference sequences

Reporting variants: HGVS is the standard for reporting human gene variants. Model organism nomenclature committees include species-specific rules for unambiguously naming variants.

The Human Genome Variation Society (HGVS) provides detailed recommendations for the unambiguous naming of sequence variants. HGVS notation includes a reference sequence, type of reference sequence (DNA, RNA, protein...), nucleotide/residue number, and type of variation (substitution, deletion, duplication...). The same variant can be named in various ways depending on the reference sequence selected. For example, allele 17, the "ultra-rapid metabolizer," of the CYP2C19 gene may be named: 

  • NG_008384.2:g.4195C>T
  • NM_000769.2:c.-806C>T
  • NC_000010.11:g.94761900C>T (GRCh38)
  • NC_000010.10:g.96521657C>T (GRCh37)

You may also see the allele name used, CYP2C19*17, or the rs number from dbSNP, rs12248560.
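
The parts of an HGVS name (reference sequence, sequence type, position, and change) can be pulled apart with a regular expression. The sketch below is an illustration only: it handles simple DNA substitutions and nothing else (no deletions, duplications, intronic offsets, or protein-level changes), and it is not a substitute for a dedicated HGVS validator. The function name is hypothetical, and the variant strings are made-up placeholders rather than real variants.

```python
import re

# Matches only "<reference>:<type>.<position><ref>><alt>" for DNA substitutions.
SUBSTITUTION = re.compile(
    r"^(?P<reference>[A-Z]{2}_\d+\.\d+)"  # versioned reference sequence
    r":(?P<seq_type>[cgmn])\."            # sequence type, e.g. g (genomic) or c (coding)
    r"(?P<position>[-*]?\d+)"             # "-" or "*" marks positions outside the coding region
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"   # reference base > variant base
)


def parse_substitution(variant):
    """Return the named parts of a simple substitution, or None if it does not match."""
    match = SUBSTITUTION.match(variant.strip())
    return match.groupdict() if match else None


# Placeholder variant, invented for illustration only.
print(parse_substitution("NM_000000.0:c.-10G>A"))
# A reference sequence without a version number is rejected.
print(parse_substitution("NM_000000:c.-10G>A"))
```

Requiring the version number in the pattern mirrors the reporting advice for accession IDs: without it, the position in the name is not tied to one specific reference sequence.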

Abbreviation meanings in HGVS

Humans

Other Organisms

Converting Identifiers

More Information

For more information on how to report research resources unambiguously, particularly when no unique identifier is available, the following articles and sites provide guidance and some examples.