Insilico Functional Annotation of Hypothetical ORFs in Human Chromosome2


DOI: 10.21522/TIJAR.2014.03.01.Art022

Authors : Sivashankari Selvarajan, Piramanayagam Shanmughavel


High-throughput genome projects have led to a rapid accumulation of genome sequences for a large number of organisms, along with a large number of genes of unknown function (hypothetical genes). To fully realize the value of these data, scientists need to identify the proteins encoded by these genes and understand how those proteins function in a living cell. Because experimentally verified information on protein function lags far behind, computational methods are needed for reliable, large-scale functional annotation of proteins. Functional annotation is the process of identifying, for a given gene, its biological function, its interactions with other elements, its involvement in metabolic pathways, and any other information that helps in understanding when and how the gene influences the overall system. Many biological processes and disease mechanisms remain unknown because the functions of hypothetical genes in human are not known. Once these functions are revealed, the unknown mechanisms of the human genome can be better understood. Hence, the present study uses computational approaches to annotate the function of hypothetical genes on human chromosome 2. The annotation was carried out at both the nucleotide and protein levels. Of the 41 uncharacterized hypothetical genes on human chromosome 2, functions were successfully annotated for 27. Experimental validation remains essential to confirm the predicted functions.

