What is an online database

Part 4: An overview of important database systems

Basic course in databases: keeping track of things

In the last parts of our series of articles, we gave a comprehensive introduction to the basics of databases. In addition to building a relational database system, we dealt with the query language SQL. And now? We present a selection of some typical representatives from the database scene and take a look at their essential properties. And as an addition, you get a look at the non-relational database systems, i.e. the so-called NoSQL databases.

This article provides an overview of some of the database management systems (DBMS) available on the market and the underlying techniques used by the systems. Both relational (SQL-based) and non-relational (NoSQL) DBMS are presented. The possible uses must be weighed up in each specific case. With the SQL databases you need a schema that describes the purpose of the database in operation. An example: A personnel database needs a table person with the corresponding columns, which identify the characteristics of a person in more detail. This is usually not the case with NoSQL databases. They work largely without a schema and mostly document-based, i. H. all information is stored in documents. A prerequisite is a more complex logic in the application in order to be able to deal with this type of data management. The box "Overview of the series of articles" provides information on the basic access options to a database.

Overview of the series of articles

Relational databases

Relational databases are based on a schema. A schema is an abstract illustration of the relationships (relations) between the data, or a kind of model. There is a large number of relational DBMSs that differ in the functions they make available. Possible properties of a relational DBMS are:

  • Transaction support: The database is designed in such a way that there is a consistent database at all times. This requires a powerful concept to group requests, changes and deletion processes and to process them jointly. A transaction is always identified by a beginning and an end. All database operations of a transaction are carried out together (COMMIT) or withdrawn (ROLLBACK).
  • Native cluster support: The aim is to achieve a load distribution over several computers in order to be able to process a large number of database operations.
  • Support of sequences: A sequence is a database object with which a unique and consecutive numbering (in the case of artificial primary keys) can be generated.
  • Support of the embedded mode: Another feature is the ability to start the database within an application and access the database using API access. This so-called embedded mode is useful for data-intensive applications. The database is directly linked to the program and is started and stopped together with it. This mode is limited to smaller databases.

With that in mind, we'll look at some relational DBMS in the following sections of text. See Table 1 for more information.

The access options to a database (modes)

There are various ways of accessing the data within the database:

  • Embedded mode: The database serves as a structured, fast and reliable memory for data that is generated during the runtime of the application and where fast access is required. This mode is mainly used in applications that manage very large amounts of data and in which fast access is required, for example in image management software.
  • A remote connection is a classic client / server communication. Communication takes place, for example, via the ODBC interface. For this purpose, the DBMS runs in its own process and makes its services available to the other applications.
  • Mixed mode: The database is started as an embedded DBMS. In this mode, however, it is possible to access the data within the database from other applications.

MySQl

In 1994, MySQL was initially developed as a post-implementation (clone) for mSQL. At that time it was used to make databases of the mask-controlled database system UNIREG available in web applications. MySQL was therefore compatible with both mSQL and UNIREG. The software was released in 1997 under version number 3.21. This version number should indicate that it is based on a core that has already been further developed. Right from the start, it was designed for large amounts of data and very good performance. Stability and availability were occasionally critical. Today, MySQL is one of the most popular database systems that are being developed as open source. The system is very widely used in web applications. One advantage is the very good integration in a wide variety of systems. For example, content management systems such as Joomla or WordPress have a native connection to MySQL. The database supports embedded and server mode.

PostgreSQL

PostgreSQL is mainly used in the professional environment. The big advantage here is the fact that this system is available as open source, but still offers a range of functionalities that can compete with commercial systems. If you look closely, this DBMS belongs to the object-relational database management systems (ORDBMS). They represent a kind of link between the relational databases and object databases and are used wherever quantities of objects have to be related to other data or objects. PostgreSQL has natively supported replication over the network since version 9.0. The DBMS is being developed as an open source project by the PostgreSQL Global Development Group and has existed under this name since 1996.

Postgres originally emerged from a database development at the University of California at Berkeley. It started with the Ingres project, with Michael Stonebraker in charge of leaving the university in 1982 to sell Ingres commercially. When Stonebraker returned to the university in 1985, he started Post-Ingres to solve the problems of the database management systems of the time. In 1989 the first version of Postgres was completed and in 1994 the students Andrew Yu and Jolly Chen added an SQL interpreter to it. The software was released as open source under the name Postgres95. The Postgres95 code complied with the ANSI-C standard and was reduced in size by 25 percent, while performance and reliability were also improved. In the course of the development of the World Wide Web, interest in databases increased. The development of PostgreSQL started in 1996, together with the change to its current name. The first published version under this name is 6.0. Since then, PostgreSQL has been continuously developed.

Here are some technical facts about PostgreSQL: In addition to a comprehensive transaction concept, the Multiversion Concurrency Control (Box: Multiversion Concurrency Control), the DBMS also supports complex queries including subqueries. The referential integrity is ensured, set operations are possible. The maximum database size is only limited by the available memory. The programming of triggers and procedures is possible in different languages, import and export possibilities of database schemes are available.

Multiversion Concurrency Control

Multiversion Concurrency Control (MVCC) is a technology that enables competing (simultaneous) accesses to a relational database as efficiently as possible without blocking or endangering the consistency of the database.

Derby

The DBMS was originally developed by Cloudscape Inc. under the name JBMS. A first version was published in 1997. The product was later renamed "Cloudscape" and in 1999 it was bought by Informix Software Inc. Their database division was again taken over by IBM in 2001. In 2004, IBM transferred the Cloudscape software to the Apache Software Foundation under the name "Derby" as free software. From the beginning of 2005, Sun Microsystems also participated in Derby. In 2006 Derby was integrated into the Java DB Development Kit. Derby itself is completely implemented in Java. The main properties of the database are:

  • Size: The jar file is less than a megabyte in size
  • Platform independent: Derby is completely implemented in Java and can therefore be easily ported to other platforms
  • Standards-based: Derby fully supports the SQL92 and SQL99 standards

Derby is being developed under the project "The Apache DB Project". An embedded mode for Java applications is also supported.

Oracle DB (or Oracle RDBMS)

This DBMS is one of the top dogs among the databases and offers pretty much everything in terms of functions in a carefree package that is expected from a DBMS. Some important functions are:

  • Implementation of the ACID properties
  • Cross-platform support for distributed databases
  • Data warehouse functionality
  • Java Message Service
  • OLAP and data mining
  • Intelligent data backup (optionally with block or row change tracking)
  • Stored procedures in PL / SQL or Java
  • Regular expression support in queries
  • Versioning of tables (long-term transactions)

HSQLDB

HSQL is a relational SQL database (RDBMS) programmed entirely in Java and is available under a modified BSD license. The system is very widespread because it is used as an embedded database in some projects, including it is used as a standalone database in OpenOffice.org. HSQL differs from other database systems in many ways. The Java file is only 700 KB in size. It is characterized by a high degree of integrability, as HSQL includes its own server. This means that the database can receive queries over the network or be embedded in the relevant application (access then takes place via API calls). A so-called “memory only” mode is supported. Here all data is only held in memory and not saved on the data carrier. Changes are lost when the server is closed (or the application in standalone mode). HSQL offers the possibility to use all SQL commands that modify the table data (CREATE, ALTER, INSERT, UPDATE) in a log file. The size of a table / field is currently limited to 8 GB.

H2 database engine

The H2 engine has existed since 2004. In December 2005, H2 was released for the first time. The name stands for "Hypersonic 2". The H2-DBMS is also one of the leaner systems; the underlying Java file is only 1.2 MB. Nevertheless, the system offers a good range of functions. The database can be run without installation (so-called "rapid prototyping"). The H2 DBMS supports both an embedded mode and a server mode. The database can also be operated completely in the main memory. The range of functions corresponds to the HSQLDB system, but has been newly implemented and optimized.

MySQLPostgreSQLH2DerbyOracle DB
APIJavaC, Delphi, C ++, Java / JDBC, Tcl, PHP, Perl, Python, Ruby, .NETJavaJava native, C, C ++ via ODBCJava
Written inC, C ++C.JavaJavaC, C ++
LicenseDual license system (proprietary and GPL)BSD licenseDual license system (Eclipse and Mozilla Public License)Apache license 2.0Proprietary
Native clusteringYesYesYesYesYes
Full text searchYesYesYesNoYes
Row level lockingYesYesYesYesYes
Multi Version ConcurrencyYesYesYesNoYes
Role Based SecurityYesYesYesYesYes
Custom aggregate functionsYesYesYesNoYes
SequencesNoYesYesNoYes
Transactions
CLOB / BLOB compressionNoNoYesNoNo

Table 1: Overview of some relational DBMS

NoSQL databases

Now we come to the NoSQL databases. Table 2 gives a condensed overview. These databases were created by the changing demands on a database on the web and the increasingly loosing requirements on the underlying schema of a database. NoSQL should not be understood as "No SQL", but rather as "Not only SQL". Quotation from the book NoSQL [1]: "NoSQL wants to highlight new alternatives to the ubiquitous relational data model and to common database technologies such as transaction management, which are superior to the ubiquitous relational database system’ for certain application classes in terms of operating costs, application development or scalability. "

NoSQL databases are based on the following principles:

  • Map / Reduce: This approach arose from the need to be able to process the ever-increasing amount of data better and faster. A so-called map / reduce framework enables efficient and parallel processing of large amounts of data. It was developed by Google in 2004 and patented in 2010. Let us now consider this algorithm using an example: The aim is to generate word lists with frequency information from a set of text documents. The input data for this algorithm consists of a key (for example a document name) and the document itself. This usually involves a large number of documents that have to be processed simultaneously. This should be done on several computers. The process consists of two phases. First comes the map phase. In this phase the data is processed in such a way that at the end there is a list of key-value pairs for each document. These are intermediate results, each containing the word and the number of occurrences. In the reduce phase, all interim results that belong together are added up, thus determining the total number of occurrences per word.
  • BASE (Basically Available, Soft State, Eventually Consistent): One difference between the SQL and NoSQL databases is the different view of consistency within the database. Almost all SQL-based databases follow the so-called ACID paradigm (box: ACID). In contrast, many NoSQL database systems rely on BASE. The main concern here is high data availability. This takes precedence over the absolute consistency of the data.

ACID

ACID is an acronym. It describes desirable properties of transactions in database management systems and distributed systems:

  • Atomicity: A transaction should either be carried out entirely or not at all
  • Consistency: A database should be transferred from one consistent state to another by means of a transaction, i.e. it should never be inconsistent
  • Isolation: Each transaction should be carried out in isolation
  • Durability: Every result of a transaction should be saved permanently

CouchDB

One of the most popular databases in this field is CouchDB. It was developed to meet the new requirements of the web 2.0. It is one of the document-based databases, is based on Google's BigTable and follows the principle of Map / Reduce. Inspired by the benefits of these new technologies, former Lotus Notes developer Damien Katz started developing CouchDB in 2005. It was later continued as the Apache project. CouchDB focuses on an uncomplicated use of the database. Documents of any syntax can be stored in CouchDB. The documents are stored in B-trees and are given a document ID and a revision ID for indexing. Each time an instance is updated, a new revision ID is generated, which later enables the changes to be found incrementally. The API access takes place via a REST interface. The CAP theorem (CAP - Consistency, Availability, Partition Tolerance) describes some strategies for distributing application logic over a network. CouchDB uses replication to synchronize changes between the individual nodes and uses the CAP theorem:

  • Consistency: All database users see the same data, even with competing updates
  • Availability: All database users can access one version of the data
  • Partition tolerance: The database can be distributed over several servers

Another interesting detail is that CouchDB is written in the Erlang programming language and therefore takes advantage of many specific advantages of this language, such as parallel processing, high availability, fault tolerance and the ability to switch modules at runtime.

MongoDB

MongoDB was designed with high performance in mind; H. very short response times even with large amounts of data. The database is created in the programming language C ++. As a document-oriented database, it is well suited for extensive data structures such as a blog post with comments without relational relationships to other data structures. Both a separate syntax and an implementation of the Map / Reduce algorithm are available for queries.

Neo4J

Neo4J is written in Java and the database fully implements the ACID paradigm.The structures are mapped onto a network (graph) in the file system. In addition to the data, a graph database also saves the relationships in a structured form. Graphs are made up of nodes and edges and contain additional information, e.g. B. About dependencies and directions. The first version of this database was used in a content management system as early as 2003. In 2007 it was outsourced as a single project and converted into an open source project.

Neo4JMongoDBCouchDB
APIJava, REST, JRuby, Ruby, Python, Jython, Scala, Clojure,C, C ++, Java, PHP, Ruby, Perl, PythonRESTful (see box) JSON API, JavaScript, plug-in architecture PHP, Perl, Ruby
Written inJavaC ++Erlang
LicenseAGPL and commercialGNU AGPL v3.0Apache license 2.0
Transaction modelConcurrent read accesses, synchronization at node level for write operationsUpdate-in-placeMVCC (Multiversion Concurrency Control)
ReplicationMaster-slaveMaster-slaveincremental replication with bidirectional conflict detection and management. Master-master, master-slave

Table 2: Overview of some NoSQL databases

REST (ful)

The acronym REST stands for Representational State Transfer and denotes a software architecture style for distributed hypermedia information systems such as the World Wide Web. Restful refers to an API or technique that relates to REST as a thought model. Mostly the RESTful Web Services (JSR 311) are meant here.

Conclusion

We are at the end of our four-part series of articles on databases. The aim was to give you a compact introduction to database technologies. In addition to fundamental aspects of database design, we also looked at the topic of SQL for processing the data. In relation to relational databases, SQL can still be seen as the central tool for handling the data. But the development continues. In addition to relational databases, so-called NoSQL databases are also on the agenda today. Using these approaches, it should be possible in particular to get the flood of data on the web under control. We have given a brief overview of this approach, which initially appears to be very unusual, in this final part. We are sure that in one or two years the development in this area will take one more step forward and there will be more news to report.

Developer magazine

This article was published in the developer magazine.

Of course, you can also read the developer magazine digitally via the developer.kiosk in your browser or on your Android and iOS devices. The Developer Magazine is also available in our shop by subscription or as a single issue.

Our editorial team recommends: