QUARNA is a web portal that provides information about quartets in RNA. Nucleotide base quartets are groups of four nucleotide bases that interact with each other through base pairing and are observed recurrently in different functional RNA molecules. In the present work, we have classified quartets according to the topological class to which they belong. A comprehensive nomenclature scheme is proposed, based on the topology, the identities of the four participating nucleotide bases, and the geometries of the constituent base pairs.
The QUARNA web portal helps RNA researchers identify and annotate base quartets in a given PDB file, and provides complete lists of the different types of quartets present in specified datasets of known RNA structures.
A base pair is characterized by the identities of the bases, their interacting edges, and the glycosidic bond orientation of the two participating bases. Based on the interacting edges and the glycosidic bond orientation, Leontis and Westhof classified base pairs into 12 different classes. Figures 1 and 2 explain all the possible geometries.
Figure 2. Schematic diagram of the 12 base pairing geometries given by Leontis and Westhof. In this classification, some base pairing geometries are not treated as separate geometries; e.g., Watson-Crick:Hoogsteen and Hoogsteen:Watson-Crick are considered the same. If Hoogsteen:Watson-Crick, Sugar:Watson-Crick and Sugar:Hoogsteen are instead treated as distinct from Watson-Crick:Hoogsteen, Watson-Crick:Sugar and Hoogsteen:Sugar, respectively, the total number of geometric families becomes 18 (counting both the cis and trans orientations of the three aforementioned geometries).
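The counts of 12 and 18 families can be checked with a short enumeration. This is only an illustrative sketch: the edge and orientation labels are taken from the text above, while the function names are hypothetical and not part of QUARNA.

```python
from itertools import combinations_with_replacement, product

EDGES = ("Watson-Crick", "Hoogsteen", "Sugar")
ORIENTATIONS = ("cis", "trans")


def unordered_families():
    """Families when edge1:edge2 and edge2:edge1 are treated as the same."""
    return [(o, a, b)
            for o in ORIENTATIONS
            for a, b in combinations_with_replacement(EDGES, 2)]


def ordered_families():
    """Families when edge1:edge2 and edge2:edge1 are treated as different."""
    return [(o, a, b)
            for o in ORIENTATIONS
            for a, b in product(EDGES, repeat=2)]


print(len(unordered_families()))  # 12
print(len(ordered_families()))    # 18
```

The difference of 6 comes from the three mixed-edge geometries, each counted once more in both the cis and the trans orientation.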
(Reference: Leontis, N.B. and Westhof, E. 2001. Geometric nomenclature and classification of RNA base pairs; RNA, 7:499-512.)
3. Quartet Topologies:
In a quartet, the bases can be arranged in one of the four ways shown in Figure 3. These are the four quartet topologies observed in RNA. Base identities and exact base pairing geometries are not used to define the topology; rather, each topological class contains multiple subclasses with unique base combinations and unique base pairing geometries between them.
Type 1 is the linear topology, in which four bases are connected linearly through base pairing interactions. A linear topology has two internal bases, each of which pairs with two other bases (one internal and one terminal), and two terminal bases, each of which pairs with one of the internal bases.
Type 2 is a star topology, in which one central base pairs with three other bases, either through two of its edges (when one of the pairs is bifurcated) or through three different edges.
Type 3 is the cyclic topology, in which each of the four bases is paired with two other bases.
Type 4 can be called a semi-cyclic topology. It is a variant of the star topology in which three bases are connected to one central base and two of the terminal bases are also connected to each other. In this topology, the central base and two terminal bases form a cyclic triple, and the remaining terminal base is paired only with the central base.
Two further quartet topologies are theoretically possible (described in Figure 4 of the supporting information). However, probably because those topologies are sterically unfavorable, no instances of either type have been observed in existing RNA structures.
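The four observed topologies can be told apart simply by counting how many partners each base has. The following is a minimal sketch under that assumption; it is not the QUARNA implementation, and the function name, the input format (a list of base-label pairs) and the returned labels are hypothetical.

```python
from collections import Counter

# Sorted number of pairing partners per base -> topology described above.
TOPOLOGY_BY_DEGREES = {
    (1, 1, 2, 2): "Type 1 (linear)",
    (1, 1, 1, 3): "Type 2 (star)",
    (2, 2, 2, 2): "Type 3 (cyclic)",
    (1, 2, 2, 3): "Type 4 (semi-cyclic)",
}


def classify_quartet(pairs):
    """Classify a quartet from its base pairs.

    pairs: iterable of 2-tuples of base labels, one tuple per base pair.
    Any other arrangement of four bases is reported as "unclassified".
    """
    unique_pairs = {frozenset(p) for p in pairs}
    if any(len(p) != 2 for p in unique_pairs):
        raise ValueError("each pair must join two distinct bases")
    partners = Counter(base for p in unique_pairs for base in p)
    if len(partners) != 4:
        raise ValueError("a quartet must involve exactly four bases")
    return TOPOLOGY_BY_DEGREES.get(tuple(sorted(partners.values())),
                                   "unclassified")


print(classify_quartet([("A", "B"), ("B", "C"), ("C", "D")]))
# Type 1 (linear)
```

For four bases that each take part in at least one pair, each of these four partner-count patterns corresponds to exactly one arrangement, so no explicit graph traversal is needed in this sketch.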
The basic conventions of the topology-specific nomenclature of RNA base quartets are tabulated in Figure 5, with schematic representations and one example of each topology.
When naming a particular quartet, it is mandatory to give its topological class as a prefix to the quartet name. In addition, within each topological class, a few basic priority rules have been implemented for unambiguous naming. The rules are described as follows:
We have considered four different datasets to explore the variety of quartets.
HDRNAS non-redundant dataset:
This dataset contains 167 PDB files with nucleotide chain length >30 nt and resolution <=3.5 Å. HDRNAS (here) classifies all the RNA structures available in the RCSB-PDB database according to their functional classes, e.g. transfer RNA (tRNA), messenger RNA (mRNA), ribosomal RNA (rRNA), ribozymes, riboswitches, ribonucleases, etc. The tRNAs and ribosomal RNAs are further classified according to the amino acid they carry and their sedimentation coefficient, respectively. To exclude small synthetic RNA constructs, a chain length cut-off of 30 nucleotides or larger was used in the classification procedure. The structures are then further classified according to the source organism from which the RNA molecules were isolated and crystallized. The non-redundant dataset consists of the best representative RNA structure from each of these sub-classes, chosen by the best resolution and R-factor (free R-value).
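The selection of one representative per sub-class can be sketched as follows. This is an illustration only, not the HDRNAS procedure: the record fields, the function name and the ranking (lowest resolution value first, free R-value as tie-breaker) are assumptions, and the inclusive 30 nt threshold follows the "30 nucleotides or larger" wording.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Structure:
    pdb_id: str        # hypothetical identifier
    sub_class: str     # e.g. functional class plus source organism
    chain_length: int  # nucleotides
    resolution: float  # in Angstrom, lower is better
    r_free: float      # free R-value, lower is better


def pick_representatives(structures, min_length=30, max_resolution=3.5):
    """Return {sub_class: best structure} after applying the cut-offs."""
    best = {}
    for s in structures:
        if s.chain_length < min_length or s.resolution > max_resolution:
            continue  # fails the chain length or resolution cut-off
        current = best.get(s.sub_class)
        # Assumed ranking: resolution first, then free R-value.
        if current is None or (s.resolution, s.r_free) < (current.resolution,
                                                          current.r_free):
            best[s.sub_class] = s
    return best
```

Called on a list of Structure records, the function returns a dictionary with one entry per sub-class that has at least one structure passing both cut-offs.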
NDB non-redundant dataset:
The non-redundant list of RNA structures available at NDB (release id: 1.89) contains 838 RNA structures with a resolution cutoff of <=3.5 Å (here). No restriction on chain length is applied in this list, so it contains both synthetic and functional RNA molecules. The NDB non-redundant list was created by clustering all RNA structures on the basis of sequence comparison, structural superposition and geometric analysis, and selecting one representative from each cluster, thereby removing uninteresting redundancy. The names of the 838 PDB files in the NDB non-redundant list are given in the supporting information.
X-ray structure dataset from RCSB-PDB:
This dataset was created by taking all RNA-containing crystal structures with a resolution cutoff of <=3.5 Å that were available in the RCSB-PDB database (here) up to 21st September, 2015. It contains a total of 1873 RNA-containing X-ray crystal structures, of which 139 are available in mmCIF/PDBx format. These 139 structures are larger biomolecular assemblies such as ribosomes.
As our program handles structure files written in PDB format, for each of the 139 mmCIF/PDBx files in this dataset we have used the corresponding multiple PDB format-like files (available at the RCSB-PDB website). For example, 4u3m is a structure of the yeast 80S ribosome and is available as an mmCIF/PDBx file. For the present study we have taken the 5 corresponding PDB format-like files, named 4u3m-pdb-bundle1.pdb to 4u3m-pdb-bundle5.pdb, instead of the single 4u3m mmCIF file. The total number of PDB files in this dataset is therefore larger, as there are 559 PDB format-like files corresponding to the 139 mmCIF files.
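The bundle file names and the resulting file count can be sketched as below. The naming pattern is taken from the 4u3m example above; the helper name is hypothetical, and the total assumes that each of the remaining structures contributes exactly one PDB file, which the text does not state explicitly.

```python
def bundle_names(pdb_id, n_bundles):
    """PDB format-like bundle file names, following the 4u3m example."""
    return [f"{pdb_id}-pdb-bundle{i}.pdb" for i in range(1, n_bundles + 1)]


print(bundle_names("4u3m", 5)[0])   # 4u3m-pdb-bundle1.pdb
print(bundle_names("4u3m", 5)[-1])  # 4u3m-pdb-bundle5.pdb

# Figures quoted in the text above.
total_structures = 1873
mmcif_structures = 139
bundle_files = 559

# Assumption: one PDB file for every structure that is not an mmCIF entry.
pdb_files_total = (total_structures - mmcif_structures) + bundle_files
print(pdb_files_total)  # 2293
```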
NMR structure dataset from RCSB-PDB:
This dataset contains all RNA-containing NMR structures (591 PDB files) available in the RCSB-PDB database (here).
BPFIND is a tool (here) that detects and annotates base pairing interactions present in nucleic acid structures (DNA and RNA). It implements a hypothesis-driven algorithm for the detection of base pairs that are stabilized by at least two good hydrogen bonds and are significantly planar. To quantify the quality of a base pair, BPFIND calculates a composite parameter called the E value. A lower E value implies a better base pairing geometry.
(Reference: Das,J., Mukherjee,S., Mitra,A. and Bhattacharyya,D. (2006) Non-Canonical Base Pairs and Higher Order Structures in Nucleic Acids: Crystal Structure Database Analysis. J Biomol Struct Dyn, 24, 91–202.)
We have used the BPFIND program to identify all base pairs present in the RNA structures. Our quartet finder program takes the output of BPFIND as its input for the identification and classification of quartets.
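How quartets might be collected from a list of detected base pairs can be sketched as follows. This is not the actual quartet finder program, and it does not parse the real BPFIND output format: the input is assumed to be an already extracted list of base-label pairs, the labels such as "A:12" are hypothetical, and the brute-force search is meant for small examples only.

```python
from itertools import combinations


def find_quartets(base_pairs):
    """Return every group of four bases that is connected via base pairs.

    base_pairs: iterable of (base_label, base_label) tuples.
    """
    neighbours = {}
    for a, b in base_pairs:
        if a == b:
            continue  # ignore malformed self-pairs
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    def is_connected(group):
        members = set(group)
        start = group[0]
        seen, stack = {start}, [start]
        while stack:  # depth-first walk restricted to the four bases
            node = stack.pop()
            for nxt in neighbours[node] & members:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen == members

    return [group for group in combinations(sorted(neighbours), 4)
            if is_connected(group)]


pairs = [("A:1", "A:2"), ("A:2", "A:3"), ("A:3", "A:4"), ("A:7", "A:8")]
print(find_quartets(pairs))  # [('A:1', 'A:2', 'A:3', 'A:4')]
```

Note that in this sketch a cluster of five or more mutually connected bases yields several overlapping groups of four; how such cases are treated in QUARNA is not described here.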