The preponderance of short repeating patterns is an important phenomenon in biological sequences. Here we present a compression algorithm, RGP2, which compresses data by searching for exact repeat, genetic palindrome and palindrome substrings, substituting them, and creating a Library file. The output of RGP2 is then compressed again with Huffman's algorithm. The method can also provide data security through the use of ASCII codes, an on-line Library file acting as a signature, and a Huffman tree taken at a particular level on the basis of a key tree node. The overall compression rate is 2.263592 bits per base. The algorithm achieves a moderate compression rate and provides strong data security; its running time is a few seconds and its complexity is O(n²).
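The building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the RGP2 implementation: the function names, the definitions used here (a palindrome reads the same when reversed; a genetic palindrome equals its own reverse complement; an exact repeat is an earlier, non-overlapping copy of a substring) and the way bits per base are counted are all assumptions made for illustration. The Library file and the key-based security step are not modelled.

```python
import heapq
from collections import Counter

# Watson-Crick base pairing, used for the reverse complement.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(s):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[b] for b in reversed(s))


def is_palindrome(s):
    """Assumed definition: the substring reads the same when reversed."""
    return s == s[::-1]


def is_genetic_palindrome(s):
    """Assumed definition: the substring equals its own reverse complement."""
    return s == reverse_complement(s)


def find_earlier_repeat(seq, pos, length):
    """Index of an earlier, non-overlapping exact copy of
    seq[pos:pos+length], or -1 if there is none."""
    return seq.find(seq[pos:pos + length], 0, pos)


def huffman_code_lengths(freqs):
    """Build a Huffman tree and return {symbol: code length in bits}."""
    # Each heap entry: (weight, unique tiebreak, {symbol: depth so far}).
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {sym: 1 for sym in heap[0][2]}
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every leaf one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def bits_per_base(symbols, n_bases):
    """Huffman-coded size of `symbols` divided by the original base count."""
    freqs = Counter(symbols)
    lengths = huffman_code_lengths(freqs)
    total_bits = sum(freqs[s] * lengths[s] for s in freqs)
    return total_bits / n_bases


if __name__ == "__main__":
    print(is_genetic_palindrome("GAATTC"))
    print(is_palindrome("ACGCA"))
    print(find_earlier_repeat("ACGTTTACGT", 6, 4))
    print(bits_per_base("AAAACCGT", 8))
```

In this sketch `bits_per_base` counts only the Huffman payload; a full accounting would also have to include the cost of storing the code table and any substitution library.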
DNA Sequence, Huffman Code, Compression Rate, Node, Encode, Decode, Lossless Compression, Repeat, Genetic Palindrome, Palindrome, Substitution and Encryption.