HTML, XHTML and CSS All-In-One for Dummies - Andy Harris [260]
XML is commonly used as a common data language, especially in AJAX applications.
Chapter 3: Normalizing Your Data
In This Chapter
Understanding why single-table databases are inadequate
Recognizing common data anomalies
Creating entity-relationship diagrams
Using MySQL Workbench to create data diagrams
Understanding the first three normal forms
Defining data relationships
Databases can be deceptive. Even though databases are pretty easy to create, beginners usually run into problems as soon as they start working with actual data.
Computer scientists (particularly a gentleman named E. F. Codd in the 1970s) have studied potential data problems and defined techniques for organizing data. This scheme is called data normalization. In this chapter, you discover why single-table databases rarely work for real-world data and how to create a well-defined data structure according to basic normalization rules.
On the CD-ROM, I include a script called buildHero.sql that builds all the tables in this chapter. Feel free to load that script into your MySQL environment to see all these tables for yourself.
Recognizing Problems with Single-Table Data
Packing everything you’ve got into a single table is tempting. Although you can do it pretty easily (especially with SQL), and it seems like a good solution, things can go wrong pretty quickly.
Table 3-1 shows a seemingly simple database describing some superheroes.
Table 3-1 A Sample Database
Name
Powers
Villain
Plot
Mission
Age
The Plumber
Sewer snake of doom, unclogging, ability to withstand smells
Septic Slime Master
Overcome Chicago with slime
Stop the Septic Slime
37
Binary Boy
Hexideci-mation beam, obfuscation
Octal
Eliminate the numerals 8 and 9
Make the world safe for binary representation
19
The Janitor
Mighty Mop
Septic Slim Master
Overcome New York with slime
Stop the Septic Slime
41
It seems that not much can go wrong here because the database is only three records and six fields. The data is simple, and there isn’t that much of it. Still, a lot of trouble is lurking just under the surface. The following sections outline potential problems.
The identity crisis
What’s Table 3-1 about? At first, it seems to be about superheroes, but some of the information isn’t about the superhero as much as things related to the superhero, such as villains and missions. This issue may not seem like a big deal, but it causes all kinds of practical problems later on. A table should be about only one thing. When it tries to be about more than that, it can’t do its job as well.
Every time a beginner (and, often, an advanced data developer) creates a table, the table usually contains fields that don’t belong there. You have to break things up into multiple tables so that each table is really about only one thing. The process for doing so solves a bunch of other problems, as well.
The listed powers
Take a look at the powers field. Each superhero can have more than one power. Some heroes have tons of powers. The problem is, how do you handle a situation where one field can have a lot of values? You frequently see the following solutions:
♦ One large text field: That’s what I did in this case. I built a massive (255 character) VARCHAR field and hoped it would be enough. The user just has to type all the possible skills.
♦ Multiple fields: Sometimes, a data designer just makes a bunch of fields, such as power1, power2, and so on.
Both these solutions have the same general flaw. You never know how much room to designate because you never know exactly how many items will be in the list. Say that you choose the large text field approach. You may have a really clever hero with a lot of powers, so you fill up the entire field with a list of powers. What happens if your hero learns one more power? Should you