A String-Matching Operation using Finite Automata and Online Interface for Bioinformatics Algorithms

Authors

Keywords:

Bioinformatics, A-BOM, Interface, Approximate Pattern Matching

Abstract

In this study, I present a new web interface for major bioinformatics algorithms and introduce a novel approximate string-matching algorithm. The web interface runs major algorithms in the field for computational biologists, students, and other interested researchers. The algorithms are grouped into three sections: sequence alignment, pattern matching, and motif finding. Each section offers several algorithms so that users can find the one that best fits a specific dataset and problem. For each algorithm, the interface reports execution time, memory usage, and context-specific results such as the alignment score. The interface is built on open-source languages and tools. To keep it lightweight and user-friendly, all parts of the interface are coded in Python, and Django is used as the web framework. The second contribution of the study is the novel A-BOM algorithm, which is designed for the approximate pattern matching problem. The algorithm is an approximate-matching variation of Backward Oracle Matching. I compare my algorithm with popular approximate string-matching algorithms. The results indicate that A-BOM reduces runtime by 30% to 80% compared with current approximate pattern matching algorithms on long patterns.
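To illustrate the kind of report described above (a result together with execution time and memory usage), the following minimal Python sketch profiles a naive approximate matcher. It is not A-BOM and not the interface's code: the names `hamming_matches` and `profile` are hypothetical, and approximate matching is assumed here to mean at most `k` character mismatches (Hamming distance), which the abstract does not specify.

```python
import time
import tracemalloc


def hamming_matches(text, pattern, k):
    """Return start positions where pattern occurs in text with at most k mismatches.

    Naive O(n*m) baseline, used only for illustration (assumption: not A-BOM).
    """
    m = len(pattern)
    hits = []
    for i in range(len(text) - m + 1):
        mismatches = 0
        for j in range(m):
            if text[i + j] != pattern[j]:
                mismatches += 1
                if mismatches > k:
                    break  # too many mismatches at this alignment
        else:
            # Inner loop finished without exceeding k mismatches.
            hits.append(i)
    return hits


def profile(func, *args):
    """Run func(*args) and return (result, elapsed seconds, peak traced bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak


if __name__ == "__main__":
    positions, seconds, peak_bytes = profile(hamming_matches, "ACGTACGAACGT", "ACGT", 1)
    print(positions, seconds, peak_bytes)
```

Any matcher with the same call shape could be passed to `profile`, which is how several algorithms might be compared on one dataset.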

How to cite this article:
Pattnaik S. A String-Matching Operation using Finite Automata and Online Interface for Bioinformatics Algorithms. J Engr Desg Anal 2020; 3(2): 1-7.

Published

2021-04-28