Need for Databases in Bioinformatics

written by: Finn Orfano • edited by: Leigh A. Zaykoski • updated: 7/23/2008

There is a huge volume of biological information spread across countries all over the world. The best way to share these data across borders is an internet database, from which any researcher anywhere can retrieve information relevant to their field.

    First, what is a database? A database is a structured collection of data or records stored in a computer. It helps us organize data and keeps it available whenever it is needed.

    Bioinformatics is the field that applies computing to biological data. Since the discovery of DNA and other biological breakthroughs, a huge amount of research at the macro, intermediate and micro levels has been carried out around the world. This work has helped produce new drugs and has driven further research into millions of biological substances and sequences, their functions and their uses. Researchers worldwide therefore launched an initiative to establish "internet" databases, hosted in different parts of the world, that would log the research done so far and make it available to researchers everywhere, who in turn would add their own discoveries and observations. These databases keep growing, and there is no fixed upper limit on their size.

    These databases are meant to hold the information their users need. In bioinformatics, their main purpose is data integration: bringing together records from many separate sources so they can be searched as one. Data integration is one of the central concepts of bioinformatics.
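As a minimal sketch of what integration means in practice, the Python below merges two hypothetical sources that share a gene identifier into one record per gene. The source names, field names and the helper integrate() are illustrative assumptions, not the layout or API of any real database.

```python
# Two hypothetical sources keyed by the same gene identifier (toy values).
sequence_source = {
    "BRCA1": {"chromosome": "17"},
    "TP53": {"chromosome": "17"},
}
function_source = {
    "BRCA1": {"function": "DNA repair"},
    "TP53": {"function": "tumor suppression"},
    "MYC": {"function": "transcription regulation"},
}


def integrate(*sources):
    """Merge records from several sources into one combined record per identifier.

    Hypothetical helper: if two sources supply the same field for the same
    identifier, the later source silently overwrites the earlier one.
    """
    merged = {}
    for source in sources:
        for key, fields in source.items():
            merged.setdefault(key, {}).update(fields)
    return merged


combined = integrate(sequence_source, function_source)
for gene, fields in sorted(combined.items()):
    print(gene, fields)
```

Genes that appear in only one source (MYC here) still get a record, just with fewer fields. A real integration pipeline would also need an explicit rule for conflicting values rather than letting the last source win.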

    As a result, researchers anywhere can access other researchers' data and extract the information their own work requires.

    Databases are usually built with a database management system such as Oracle or Microsoft Access and queried with SQL. Bioinformatics data sources often have large, complex structures that reflect the richness of the underlying scientific concepts. Many of them hold information such as genes, proteins, sequence annotations or microarray results. Different sources frequently describe similar or overlapping elements while using conflicting data definitions, so their contents must be reconciled before they can be combined.
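A familiar case of conflicting definitions is genomic coordinates: BED files use 0-based, half-open intervals, while GFF files use 1-based, inclusive ones. The sketch below converts both conventions into one internal form so that records describing the same feature compare equal. The Interval class and the two converter functions are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A feature in one agreed convention: 0-based start, exclusive end."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def from_one_based_closed(chrom: str, start: int, end: int) -> Interval:
    """Convert a 1-based, inclusive interval (GFF-style) to 0-based, half-open."""
    return Interval(chrom, start - 1, end)


def from_zero_based_half_open(chrom: str, start: int, end: int) -> Interval:
    """Records already in the 0-based, half-open convention (BED-style) pass through."""
    return Interval(chrom, start, end)


# The same ten-base feature as two hypothetical sources might report it.
a = from_one_based_closed("chr1", 101, 110)
b = from_zero_based_half_open("chr1", 100, 110)
print(a == b)           # True
print(a.end - a.start)  # 10
```

Picking a single internal convention and converting at the boundary keeps the comparison logic simple; the same idea applies to units, naming schemes and identifiers.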

    Plain data files are often messy: they may be missing commas or blank lines, or contain odd header lines. XML files avoid much of this, because their structure is explicit and can be checked by a parser. Once the data are in a database, it is also easy to keep backups of everything, and entries can be modified with the database's commands.
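The sketch below uses Python's standard xml.etree.ElementTree module to read a small XML file of gene records. The element and attribute names (genes, gene, id, name, sequence) and the helper parse_genes() are hypothetical; the point is that a malformed document is rejected with a parse error instead of being silently misread.

```python
import xml.etree.ElementTree as ET

# Hypothetical record format; element and attribute names are illustrative only.
RECORDS = """
<genes>
  <gene id="G1">
    <name>geneA</name>
    <sequence>ATGGCC</sequence>
  </gene>
  <gene id="G2">
    <name>geneB</name>
    <sequence>ATGTTTAAA</sequence>
  </gene>
</genes>
"""


def parse_genes(xml_text: str) -> list[dict]:
    """Return one dict per <gene> element; malformed XML raises ET.ParseError."""
    root = ET.fromstring(xml_text.strip())
    genes = []
    for gene in root.findall("gene"):
        genes.append({
            "id": gene.get("id"),
            "name": gene.findtext("name"),
            "sequence": gene.findtext("sequence", default=""),
        })
    return genes


for g in parse_genes(RECORDS):
    print(g["id"], g["name"], len(g["sequence"]))

# A record with a mismatched closing tag is rejected outright.
try:
    parse_genes("<genes><gene id='G3'><name>geneC</gene></genes>")
except ET.ParseError as err:
    print("rejected malformed record:", err)
```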

    Binary data files are even harder to handle, since they cannot be read or edited directly and are easy to misinterpret. Databases spare us much of this trouble. They support several data access methods, such as linear (sequential) scans, hashing and B-trees, along with other indexing techniques that speed up access.

    There are also many kinds of databases: flat-file, hierarchical, network, relational, object-oriented and others. In relational databases, SQL provides the commands for defining and manipulating data: CREATE defines structures such as tables, while INSERT and UPDATE add and change rows.
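To illustrate these SQL commands, here is a sketch using Python's built-in sqlite3 module with an in-memory database. The table, columns and accession values are hypothetical. CREATE TABLE and CREATE INDEX define structure, INSERT adds rows and UPDATE changes one; SQLite stores its indexes as B-trees, one of the access methods listed above.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE defines structure.
cur.execute("""
    CREATE TABLE sequences (
        accession TEXT PRIMARY KEY,
        organism  TEXT NOT NULL,
        length    INTEGER
    )
""")
# A B-tree index on organism lets the query planner avoid scanning every row.
cur.execute("CREATE INDEX idx_sequences_organism ON sequences (organism)")

# INSERT adds rows; ? placeholders keep values out of the SQL string.
cur.executemany(
    "INSERT INTO sequences (accession, organism, length) VALUES (?, ?, ?)",
    [
        ("SEQ001", "Homo sapiens", 1200),
        ("SEQ002", "Mus musculus", 950),
        ("SEQ003", "Homo sapiens", 430),
    ],
)

# UPDATE modifies an existing row, e.g. correcting a recorded length.
cur.execute("UPDATE sequences SET length = ? WHERE accession = ?", (440, "SEQ003"))
conn.commit()

rows = cur.execute(
    "SELECT accession, length FROM sequences WHERE organism = ? ORDER BY accession",
    ("Homo sapiens",),
).fetchall()
print(rows)  # [('SEQ001', 1200), ('SEQ003', 440)]
conn.close()
```

The same statements work, with minor dialect differences, in other relational systems such as Oracle.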