MariaDB Crash Course - Ben Forta [42]
Now suppose you had multiple catalog items created by the same vendor. Where would you store the vendor information (things such as vendor name, address, and contact information)? You wouldn’t want to store that data along with the products for several reasons:
• Because the vendor information is the same for each product that vendor produces, repeating the information for each product is a waste of time and storage space.
• If vendor information changes (for example, if the vendor moves or his area code changes), you would need to update every occurrence of the vendor information.
• When data is repeated (that is, the vendor information is used with each product), there is a high likelihood that the data will not be entered exactly the same way each time. Inconsistent data is extremely difficult to use in reporting.
The key here is that having multiple occurrences of the same data is never a good thing, and that principle is the basis for relational database design. Relational tables are designed so information is split into multiple tables, one for each data type. The tables are related to each other through common values (and thus the relational in relational design).
In our example, you can create two tables, one for vendor information and one for product information. The vendors table contains all the vendor information, one table row per vendor, along with a unique identifier for each vendor. This value, called a primary key, can be a vendor ID, or any other unique value. (Primary keys were first mentioned in Chapter 1, “Understanding SQL”).
The products table stores only product information, and no vendor specific information other than the vendor ID (the vendors table’s primary key). This key, called a foreign key, relates the vendors table to the products table, and using this vendor ID enables you to use the vendors table to find the details about the appropriate vendor.
* * *
New Term: Foreign key
A column in one table that contains the primary key values from another table, thus defining the relationships between tables.
* * *
What does this do for you? Well, consider the following:
• Vendor information is never repeated, and so time and space are not wasted.
• If vendor information changes, you can update a single record in the vendors table. Data in related tables does not change.
• As no data is repeated, the data used is obviously consistent, making data reporting and manipulation much simpler.
The bottom line is that relational data can be stored efficiently and manipulated easily. Because of this, relational databases scale far better than nonrelational databases.
* * *
New Term: Scale
Able to handle an increasing load without failing. A well-designed database or application is said to scale well.
* * *
Why Use Joins?
As just explained, breaking data into multiple tables enables more efficient storage, easier manipulation, and greater scalability. But these benefits come with a price.
If data is stored in multiple tables, how can you retrieve that data with a single SELECT statement?
The answer is to use a join. Simply put, a join is a mechanism used to associate tables within a SELECT statement (and thus the name join). Using a special syntax, multiple tables can be joined so a single set of output is returned, and the join associates the correct rows in each table on-the-fly.
* * *
Note: Maintaining Referential Integrity
It is important to understand that a join is not a physical entity—in other words, it does not exist in the actual database tables. A join is created by MariaDB as needed, and it persists for the duration of the query execution.
When using relational tables, it is important that only valid data is inserted into relational columns. Going back to the example, if products were stored in the products table with