(QUArtets in RNA)
Find the terms and definitions used in QUARNA right here.
  1. About QUARNA
  2. Base Pair Geometry
  3. Quartet Topologies
  4. Quartet Nomenclature
  5. Datasets
1. About QUARNA:

QUARNA is a web portal that provides information about quartets in RNA. Nucleotide base quartets are groups of four nucleotide bases that interact with each other via base pairing and are observed recurrently in different functional RNA molecules. In the present work we have classified quartets according to the topological class to which they belong. A comprehensive nomenclature scheme has been proposed based on the topology, the identity of the four participating nucleotide bases and the geometries of the constituent base pairs.
The QUARNA web portal helps RNA researchers identify and annotate base quartets in a given PDB file, and provides complete lists of the different types of quartets present in specified datasets of known RNA structures.

2. Base Pair Geometry:

A base pair is characterized by the base identities, the interacting edges and the respective glycosidic bond orientations of both participating bases. Based on interacting edges and glycosidic bond orientation, Leontis and Westhof classified base pairs into 12 different classes. Figures 1 and 2 explain all possible geometries.

Figure 1. (a) Interaction edges and (b) glycosidic bond orientation of base pairs.

Figure 2. Schematic diagram of the 12 base pairing geometries given by Leontis and Westhof. In this classification some base pairing geometries are not treated as separate geometries; e.g., Watson-Crick:Hoogsteen and Hoogsteen:Watson-Crick are considered the same. If Hoogsteen:Watson-Crick, Sugar:Watson-Crick and Sugar:Hoogsteen are instead treated as different from Watson-Crick:Hoogsteen, Watson-Crick:Sugar and Hoogsteen:Sugar respectively, the total number of geometric families becomes 18 (counting both the cis and trans orientations of the three aforementioned geometries).
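
The counting argument in the caption above can be checked with a short enumeration. This is only an illustrative sketch: the edge and orientation labels are plain strings chosen here, not identifiers used by QUARNA.

```python
from itertools import combinations_with_replacement, product

EDGES = ("Watson-Crick", "Hoogsteen", "Sugar")
ORIENTATIONS = ("cis", "trans")

# Leontis-Westhof convention: the two edges form an unordered pair,
# so Watson-Crick:Hoogsteen and Hoogsteen:Watson-Crick are one geometry.
unordered = [(o, a, b)
             for o in ORIENTATIONS
             for a, b in combinations_with_replacement(EDGES, 2)]

# Alternative convention: the two edges form an ordered pair,
# so Watson-Crick:Hoogsteen and Hoogsteen:Watson-Crick are counted separately.
ordered = [(o, a, b)
           for o in ORIENTATIONS
           for a, b in product(EDGES, repeat=2)]

print(len(unordered))  # 12
print(len(ordered))    # 18
```

The difference of 6 corresponds to the three mixed-edge geometries, each in cis and trans orientation.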

(Reference: Leontis, N.B. and Westhof, E. 2001. Geometric nomenclature and classification of RNA base pairs; RNA, 7:499-512.)

3. Quartet Topologies:

In a quartet, the bases can be arranged in one of the four different ways shown in Figure 3. These are the four quartet topologies observed in RNA. Base identity and exact base pairing geometries are not considered when defining topology; rather, each topological class contains multiple subclasses with unique base combinations and unique base pairing geometries between them.
Type 1 is the linear topology, where four bases are connected linearly through base pairing interactions. In the linear topology there are two internal bases, each of which pairs with two other bases (one internal and one terminal), and two terminal bases, each of which pairs with one of the internal bases.
Type 2 is a star topology, where one central base pairs with three other bases through two (in the case of one bifurcated pair) or three of its edges.
Type 3 is a cyclic topology, where each of the four bases is paired with two other bases.
Type 4 can be called a semi-cyclic topology. It is a variant of the star topology in which three bases are connected to one central base and two of the terminal bases are also connected to each other. In this topology the central base and two terminal bases form a cyclic triple, and the remaining terminal base is paired only with the central base.
Two further quartet topologies are theoretically possible (described in Figure 4 of the supporting information). However, probably because those topologies are sterically unfavorable, no instances of them have been observed in existing RNA structures.

Figure 3. Four topologies of quartets observed in nature.
Figure 4. Two quartet topologies which are not observed in existing RNA structures.
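
If a quartet is viewed as a graph with four bases as nodes and base pairs as edges, the four observed topologies can be told apart by how many partners each base has. The sketch below is one possible way to do this and is not taken from the QUARNA code; the function name, labels and input layout are assumptions made for illustration. It also assumes each base pair is listed once. For reference, the only other connected graphs on four nodes have five or six edges; whether these are the two topologies shown in Figure 4 is an assumption.

```python
from collections import Counter

# Sorted number-of-partners sequence -> topology label (labels are illustrative).
TOPOLOGY_BY_DEGREES = {
    (1, 1, 2, 2): "Type 1 (linear)",
    (1, 1, 1, 3): "Type 2 (star)",
    (2, 2, 2, 2): "Type 3 (cyclic)",
    (1, 2, 2, 3): "Type 4 (semi-cyclic)",
}


def classify_topology(bases, pairs):
    """Classify four bases from the list of base pairs between them.

    bases: four hashable residue identifiers.
    pairs: (residue, residue) tuples, each base pair listed once.
    """
    degree = Counter()
    for a, b in pairs:
        degree[a] += 1
        degree[b] += 1
    key = tuple(sorted(degree[b] for b in bases))
    # Any other sequence is not one of the four observed topologies.
    return TOPOLOGY_BY_DEGREES.get(key, "not an observed quartet topology")


print(classify_topology("abcd", [("a", "b"), ("b", "c"), ("c", "d")]))
print(classify_topology("abcd", [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]))
```

For four bases, each of these sorted sequences corresponds to exactly one connected arrangement, so no separate connectivity check is needed in this sketch.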

4. Quartet Nomenclature:

Basic conventions of the topology-specific nomenclature of RNA base quartets are tabulated in Figure 5, with schematic representations and one example of each topology.

Figure 5. Basic rules of quartet nomenclature.

When naming a quartet, it is mandatory to give its topological class as a prefix to the quartet name. Within each topological class, a few basic priority rules have been implemented for unambiguous naming. The rules are as follows:

Type1/ Linear topology
  1. The name of a linear quartet starts with the terminal nucleotide that comes first in dictionary order; the other nucleotides follow serially according to connectivity.
  2. If both terminal nucleotides are the same, the identities of the central bases are considered, and the quartet name starts with the terminal nucleotide attached to the central nucleotide of lower dictionary order.
  3. If both central residues are also the same, the name starts with the terminal nucleotide having the lower nucleotide number.
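
The three linear-topology rules amount to choosing which end of the chain to read from. A minimal sketch follows, assuming each residue is a hypothetical `(base, number)` tuple and that the four residues are already listed in connectivity order; it returns only the residue order, since the exact name format is defined in Figure 5.

```python
def order_linear(path):
    """Return the four residues of a linear quartet in naming order.

    path: four (base, number) tuples in connectivity order,
          terminal - central - central - terminal.
    """
    forward = list(path)
    backward = forward[::-1]

    def priority(reading):
        terminal, central = reading[0], reading[1]
        # Rule 1: terminal base; rule 2: base of the attached central
        # residue; rule 3: nucleotide number of the terminal residue.
        return (terminal[0], central[0], terminal[1])

    return min(forward, backward, key=priority)


print(order_linear([("G", 10), ("A", 11), ("U", 25), ("C", 40)]))
# [('C', 40), ('U', 25), ('A', 11), ('G', 10)]
```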
Type2/ Star topology
  1. In the star topology, the central base is mentioned first, outside the first bracket.
  2. The three residues to which the central base is connected are then written in alphabetical order within the first bracket.
  3. If the same residue occurs more than once within the bracket, the residue with the lower nucleotide number comes first.
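
For the star topology the ordering reduces to a sort of the three outer residues. The sketch below assumes hypothetical `(base, number)` tuples for residues and returns the central residue together with the ordered bracket contents, leaving the final name formatting to the conventions of Figure 5.

```python
def order_star(center, partners):
    """Return (central residue, bracketed residues in naming order).

    center:   (base, number) tuple of the central residue.
    partners: the three (base, number) tuples paired with the center.
    """
    # Alphabetical by base first, then lower nucleotide number first.
    return center, sorted(partners, key=lambda r: (r[0], r[1]))


print(order_star(("A", 9), [("U", 12), ("G", 23), ("G", 3)]))
# (('A', 9), [('G', 3), ('G', 23), ('U', 12)])
```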
Type3/ Cyclic topology
  1. The name of a cyclic quartet starts with the residue of lowest dictionary order.
  2. If there is more than one potential starting residue (same base), the residue with the lower nucleotide number is chosen.
  3. The remaining residues follow serially according to connectivity. In a cyclic quartet, however, each residue interacts with two other residues, so it is ambiguous which residue comes next. To remove this ambiguity, the dictionary order of the two candidate residues is considered: the residue with the lower dictionary order comes next after the starting residue, and the cycle is completed serially.
  4. If the starting residue is connected to two residues with the same nucleotide name, their nucleotide numbers are considered, and priority is given to the residue with the lower nucleotide number.
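
The cyclic rules pick a starting residue and then a direction around the ring. A minimal sketch, assuming hypothetical `(base, number)` tuples listed in ring order (the last residue pairs with the first); tuple comparison applies dictionary order on the base first and nucleotide number second.

```python
def order_cyclic(cycle):
    """Return the four residues of a cyclic quartet in naming order.

    cycle: four (base, number) tuples in ring order.
    """
    # Rules 1-2: lowest base, ties broken by lower nucleotide number.
    start = min(range(4), key=lambda i: cycle[i])
    following = cycle[(start + 1) % 4]
    preceding = cycle[(start - 1) % 4]
    # Rules 3-4: walk towards the neighbour with the lower base,
    # ties broken by lower nucleotide number.
    step = 1 if following <= preceding else -1
    return [cycle[(start + k * step) % 4] for k in range(4)]


print(order_cyclic([("G", 1), ("U", 2), ("A", 3), ("C", 4)]))
# [('A', 3), ('C', 4), ('G', 1), ('U', 2)]
```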
Type4/ Semi-cyclic topology
  1. The names of semi-cyclic quartets are very similar to those of star quartets. The central base comes first, outside the first bracket. The other three residues, to which the central base is connected, are then written within the first bracket.
  2. The only difference from the star topology is that, within the bracket, the base that is not part of the cycle is mentioned first, followed by the other two in alphabetical order. Here too, if the same residue occurs more than once, priority is given to the lower nucleotide number.
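
In a semi-cyclic quartet the roles of the residues follow from how many partners each has: the central base has three, the base outside the cycle has one, and the two remaining cycle members have two each. The sketch below uses that observation; the `(base, number)` residue tuples and the function name are assumptions for illustration, and only the ordering is returned, not a formatted name.

```python
from collections import Counter


def order_semicyclic(pairs):
    """Return (central residue, bracketed residues in naming order).

    pairs: the four base pairs of a semi-cyclic quartet, each given as a
           tuple of two (base, number) residues and listed once.
    """
    degree = Counter(residue for pair in pairs for residue in pair)
    center = next(r for r, d in degree.items() if d == 3)
    outside_cycle = next(r for r, d in degree.items() if d == 1)
    # Remaining two residues: alphabetical, then lower nucleotide number.
    in_cycle = sorted(r for r, d in degree.items() if d == 2)
    return center, [outside_cycle] + in_cycle


a, g, c, u = ("A", 1), ("G", 2), ("C", 3), ("U", 4)
print(order_semicyclic([(a, g), (a, c), (a, u), (g, c)]))
# (('A', 1), [('U', 4), ('C', 3), ('G', 2)])
```
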
5. Datasets:

We have considered four different datasets to explore the variety of quartets.

HDRNAS non-redundant dataset:
This dataset contains 167 PDB files with nucleotide chain length >30 nt and resolution <=3.5 Å. HDRNAS classifies all the RNA structures available in the RCSB-PDB database according to their functional classes, e.g. transfer RNA (tRNA), messenger RNA (mRNA), ribosomal RNA (rRNA), ribozymes, riboswitches, ribonucleases, etc. The tRNAs and ribosomal RNAs are further classified according to the amino acid they carry and their sedimentation coefficient, respectively. In order to exclude small synthetic RNA constructs, we have used a chain length cut-off of 30 nucleotides or larger for the classification procedure. The structures are then further classified according to the source organism from which the RNA molecules were isolated and crystallized. The non-redundant dataset consists of the best representative RNA structure from each of these sub-classes, determined by the best resolution and R-factor (free R-value).
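
The selection of one representative per sub-class can be pictured as a filter followed by a "keep the best" pass. The sketch below is not the HDRNAS procedure itself: the record fields, the function name, the use of `>=` for the chain length cut-off, and the choice to compare resolution first and free R-value second are all assumptions made for illustration.

```python
def pick_representatives(structures, min_length=30, max_resolution=3.5):
    """Keep the best structure per sub-class (hypothetical record layout).

    structures: dicts with keys 'id', 'subclass', 'chain_length',
                'resolution' (in Å) and 'r_free'.
    """
    best = {}
    for s in structures:
        # Assumed cut-offs: drop short chains and low-resolution structures.
        if s["chain_length"] < min_length or s["resolution"] > max_resolution:
            continue
        rank = (s["resolution"], s["r_free"])  # lower is better for both
        current = best.get(s["subclass"])
        if current is None or rank < (current["resolution"], current["r_free"]):
            best[s["subclass"]] = s
    return best


# Made-up records, for illustration only.
records = [
    {"id": "s1", "subclass": "tRNA-Phe", "chain_length": 76, "resolution": 2.0, "r_free": 0.25},
    {"id": "s2", "subclass": "tRNA-Phe", "chain_length": 76, "resolution": 1.9, "r_free": 0.27},
    {"id": "s3", "subclass": "riboswitch", "chain_length": 20, "resolution": 1.5, "r_free": 0.20},
]
print({k: v["id"] for k, v in pick_representatives(records).items()})
# {'tRNA-Phe': 's2'}
```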

NDB non-redundant dataset:
The non-redundant list of RNA structures available at NDB (release id: 1.89) contains 838 RNA structures with a resolution cutoff of <=3.5 Å. No restriction on chain length is applied to this list, so it contains both synthetic and functional RNA molecules. The NDB non-redundant list was created by removing redundancy: all the RNA structures were clustered based on sequence comparison, structural superposition and geometric analysis, and one representative was selected from each cluster. The names of the 838 PDB files in the NDB non-redundant list are given in the supporting information.

X-ray structure dataset from RCSB-PDB:
This dataset was created by considering all RNA-containing crystal structures with a resolution cutoff of <=3.5 Å that were available in the RCSB-PDB database up to 21 September 2015. It contains a total of 1873 RNA-containing X-ray crystal structures, of which 139 are available in mmCIF/PDBx format. These 139 structures are larger biomolecular assemblies such as ribosomes.
As our program deals with structure files written in PDB format, for this dataset we have considered the multiple PDB format-like files (available at the RCSB-PDB website) corresponding to each of the 139 mmCIF/PDBx files. For example, 4u3m is a structure of the yeast 80S ribosome and is available as an mmCIF/PDBx file. For the present study we have taken the 5 corresponding PDB format-like files, named 4u3m-pdb-bundle1.pdb to 4u3m-pdb-bundle5.pdb, instead of the single 4u3m mmCIF file. The total number of PDB files in this dataset is therefore higher, as there are 559 PDB format-like files corresponding to the 139 mmCIF files.

NMR structure dataset from RCSB-PDB:
This dataset contains all RNA-containing NMR structures (591 PDB files) available in the RCSB-PDB database.


BPFIND is a tool that detects and annotates base pairing interactions present in nucleic acid structures (DNA and RNA). It implements a hypothesis-driven algorithm for detecting base pairs that are stabilized by at least two good hydrogen bonds and have a significantly planar geometry. To quantify the goodness of a base pair, BPFIND calculates a composite parameter called the E-value. A lower E-value implies a better base pairing geometry. (Reference: Das,J., Mukherjee,S., Mitra,A. and Bhattacharyya,D. (2006) Non-Canonical Base Pairs and Higher Order Structures in Nucleic Acids: Crystal Structure Database Analysis. J Biomol Struct Dyn, 24, 91–202.)
We have used the BPFIND program to identify all base pairs present in RNA structures. Our quartet finder program takes the output of BPFIND as input for the identification and classification of quartets.

QUARNA version 1.0 © CCNSB, IIIT Hyderabad