Tuesday, August 5, 2014

Relational Database Basics Example 1: Part 1

    Relational Database Basics
Example 1: Part 1

    I'm going to start off my series of MySQL tutorials with a basic overview of a simple relational database model. This model was created in MySQL Workbench, but I will also plan to cover the same example in Jet Brain's new 0xDBE, and as a simple SQL script in separate posts. The start of this series will be very basic, and these tutorials are aimed at programmers who want a basic understanding of SQL languages and relational databases. With that in mind lets start with the entity relationship diagram for this example. This may be a little confusing to look at initially, but it will all make sense in time.

Term Definitions

    Before we start, we should go over some relational database terms. (You can also skip these, and come back to them when you need to. All terms are presented in bold throughout the article).
  • Entity - A table. Entities have attributes and usually represent a collection of a single object or concept, called entity instances. It may be useful to think of an entity as a Java object that is then instantiated with the data specific to each entity instance.
  • Attribute - a property that belongs to all members of an entity. Attributes have data types and are the columns of a table.
  • Entity Instance - A row in a table. An entity instance is a single member of the group that is the entity.
  • Schema - I think schema is best explained by this Stack Overflow answer.
    "A database schema is a way to logically group objects such as tables, views, stored procedures etc. Think of a schema as a container of objects.
    You can assign a user login permissions to a single schema so that the user can only access the objects they are authorized to access.
    Schemas can be created and altered in a database, and users can be granted access to a schema. A schema can be owned by any user, and schema ownership is transferable."
  • DDL - Data Definition Language, SQL used to define the structure of the database
  • DML - Data Manipulation Language, SQL used for managing data
  • Primary Key - An unique id for each row of a table. We use these to index data for a table, and they become foreign keys in other tables.
  • Foreign Key - a primary key from another table. We store these to connect entity instances between tables.
  • Relation/Relationship - a connection between two entities created with primary and foreign keys
  • Entity Relation Diagram - a graphical depiction of the relations between entities
  • Associative Entity - a table that exists to connect and store data between two other tables. Associative entities always have a one to many relation with both tables it is connected to.

MySQL Workbench Example

This is the overall ERD (entity relationship diagram). We will examine all of its components, and then put it all back together.
        In this example, we are selling customers service plans. The bill is a record of which customer has which service plan, and the address table is a list of all of the addresses associated with each customer. These table along with their relationships make up the foundation of our database. But we can strip this down even more to get a better understanding of each table and relation.

Our database without relations. Notice that some tables lost some attributes when we stripped them of their relations.
    Here we're starting with just the tables. These tell us all the information that we store about our business. We need: a name, a phone number, an email, an address, the name of a service plan, and the price that we charge the client for the service plan. In this example, the bill table doesn't have any data of its own, it is just a way to associate clients with service plans. Since we know that clients have addresses, we need to connect the tables to associate each client with their address(es).

The relation between client and address. Note that client_id is now in the address table as a foreign key.

    Now we have the client_id attribute in the address table. The line between the tables is done in Crow's Foot Notation. The notation on the line means that one client MUST have one to many addresses. In SQL we just have relationships involving 0, 1, or many. The double dash by client means that an address MUST belong to a client. If it was a single line with a 0 next to it, we could include addresses in the database that don't belong to a client. The dash and crows foot next to the address table means that a client must have an address, and that they can have more than one address. We could associate a client with a single address, and have both sides of the line would have two dashes. Bellow is what the relation looks like if we have clients who don't have addresses, but clients still can have multiple addresses.

This is what the relation looks like if we have clients without addresses.

    I put address in another table to make the client table look cleaner for the example, but we could just have the address information in the client table if we knew that all clients only had one address. Having addresses in another table is also useful to find all of the clients in a certain zip code or city, without searching through a large text field that stores addresses in the client table. That's all a bit of optimization that we'll deal with later.

Bill is an associative entity between client and service plan.

    Now let's look at the relation between client, bill, and service plan. Bill is an associative entity between client and service plan. If we had information to store that was relevant to the bill itself, something like a Boolean has_paid_yet, we would store it here. In our example bill merely keeps a record of which clients have which service plans. An associative entity is used because a client can have one to many service plans and a service plan can be purchased by one to many clients. 

Here's our completed database again.
    The service plan relation is particularly useful. We could store the service plan in the bill table, or even in the client table. If we did that though, we would have to look at the service plan column for every row of the table and then find everything matching "tier 1" in that column to find a list of all of the customers who have "tier 1" service. It is much easier to query, to store, and optimize this data if service plan is its own entity. Since service plan is just a statically sized table that contains all of service plan offerings, we can have a table with as many rows as we have service plans. If we store the data in the client table, we store as many service plans in VARCHAR and DECIMAL as we have clients. This is a fundamental benefit of proper database design. This means that we may have 5 service plans stored in a table, instead of storing those same 5 service plans duplicated in 100,000 clients. We just connect the two with an INT in the client table that acts as out foreign key. This is a more efficient way to store data, but it does initially make it harder to access the data again. We will cover joining the tables toget to look up a client and their address in the next part of this series.
    In the next part of this series, we will look at adding data to the tables and writing queries. Then I will do the same example in raw SQL without the diagrams. I also plan to show the creation of this example in MySQL Workbench and 0xDBE.