Relations

Introduction  

Naming convention differences exist between database management systems (DBMS) textbooks.  For clarity, we will ascribe to the formatting and naming conventions presented in the Garcia-Molina, Ullman and Widom (2009) and the Ramakrishnan and Gehrke (2000) textbooks.  This naming convention stipulates that the first letter of a relation name (table name) be capitalized whereas attribute names will begin with a lower case letter.   Also note that DBMS relation names must be unique within the same relational DBMS (RDBMS) and attributes names must be unique within a single relation (Silberschatz, Korth & Sudarshan, 1999).

As a basis, the online Miriam Webster Dictionary (2010) defines a relation as a property that holds between an ordered pair of objects.  Relational mathematics and database theory go beyond this definition as they accommodate associations among multiple entities and therefore relations of degrees greater two (Codd, 1990; Rob & Coronel, 2007).  RDBMS theory extends the mathematical relational model further by requiring set members to be of the same type (i.e. same domain) (Codd, 1990).

Codd’s  relational model for database management is based on first order predicate logic, set theory and the theory of relations (Codd, 1990; Garcia-Molina, Ullman &Widom, 2009; Silberschatz, Korth & Sudarshan, 1999).  With this set theory basis, the reader is reminded that the order of the R-table’s columns and rows is irrelevant and tuples are accessed and manipulated based on their content rather than some type of ordering (Codd, 1990).

By definition, a relation is an unordered set of tuples that draw their components from an unordered sequence of attribute sets (Codd, 1990).  Each of these attribute sets has its own domain in accord with the RDBMS’s purpose (Codd, 1990; Garcia-Molina, Ullman &Widom, 2009; Rob & Coronel, 2007; Silberschatz, Korth & Sudarshan, 1999).  It must be reiterated that relations take place at the data level (logical addressing) rather than the memory/address level that you use to access data in Prog & Logic I (primarily using symbols – ex. int x; and x = y + 2;).  Formally, an attribute may be defined by its name and associated type or domain (Codd, 1990).

In Codd’s R-table format, the tuples are represented as rows and their components also known as attributes are represented as columns (Codd, 1990; Garcia-Molina, Ullman &Widom, 2009; Rob & Coronel, 2007).  The rowed tuples are also known as the relation’s extension and represent assertions (Codd, 1990).  The number of tuples in a relation is known as its cardinality (Codd, 1990).

The attribute’s domains and their constraints comprise the relation’s descriptive data also known as the relation’s intension (Codd, 1990).  The relation’s degree is determined from the number of sets from which a relation draws its components (Codd, 1990).  This can be easily seen in R-table format as the number of columns.

Applied DBMS Relational Theory

To assist the reader and serve as a quick reference, the Garcia-Molina, Ullman and Widom (2009, p.28) Accounts and Customers relation instances are reproduced and presented below.  This representation is consistent with Codd’s (1990) relational table (R-table) format where by definition, an R-table is a snapshot or instance of a stored relation (Codd, 1990; Garcia-Molina, Ullman & Widom, 2009; Silberschatz, Korth & Sudarshan, 1999).  In this format, schema attributes may be further identified as italicized column headers and the tuple’s attribute values are listed in the table’s rows as un-italicized text.

 Accounts relation in table form (Garcia-Molina, Ullman & Widom, 2009, p.28).

acctNo type balance
12345 savings 12000
23456 checking 1000
34567 savings 25

Customers relation in table form (Garcia-Molina, Ullman & Widom, 2009, p.28).

firstName lastName idNo account
Robbie Banks 901-222 12345
Lena Hand 805-333 12345
Lena Hand 805-333 23456 

Relation Attributes

As introduced above, attributes are unordered sets and are represented in columns in R-table format.  Consistent with RDBMS theory, each attribute is atomic and is in accord with the component’s domain (Codd, 1990; Garcia-Molina, Ullman &Widom, 2009; Silberschatz, Korth & Sudarshan, 1999).  As previously cited, the Accounts relation is of degree three and its three attributes are acctNo, type and balance.  As previously cited, the Customers relation is of degree four and its attributes are firstName, lastName, idNo and account.

With this introductory basis, both the Accounts and Customers relations’ cardinality is three.  Additionally, the Accounts relation is of degree three and the Customers relation is of degree four.

Relation Tuples of Each Relation

As introduced above, tuples contain one component of each attribute and collectively represent an unordered set of assertions. As previously cited, both the Accounts and Customers relations are cardinality three. When tuples are listed outside an R-table, tuples are typically enclosed in parenthesis and the attributes are separated by commas.  Using this standard parenthetical RDBMS representation, the three Accounts tuples are (12345, savings, 12000), (23456, checking, 1000) and (34567, savings, 25).  Using this standard parenthetical DBMS representation, the three Customers tuples are (Robbie, Banks, 901-222, 12345), (Lena, Hand, 805-333, 12345) and (Lena, Hand, 805-333, 23456).

Tuple Components

The first tuple components from the Accounts and Customers relations as presented in the Appendix and their associated attribute or column names are as follows.  The Accounts relations’ first tuple’s components assert 123456 is the acctNo, savings is the type and 12000 is the balance.  The Customers relations’ first tuple’s components assert Robbie is the firstName, Banks is the lastName, 901-222 is the idNo and 12345 is the account.

Relation schemas

A relation schema is a textual representation of a relation and is defined by the relation name and an unordered set of attributes (Rob & Coronel, 2007; Silberschatz, Korth & Sudarshan, 1999).  The standard RDBMS representation for a relation schema is the relation name followed by the set of attributes enclosed in parenthesis.  Following this standardized formatting, the relation schema is for the Accounts relation is Accounts (acctNo, type, balance).  Following this standardized formatting, the relation schema is for the Customers relation is Customers (firstName, lastName, idNo, account).  To assist the reader with future research and alternative representations, it should be noted that the relation name and its attributes are often displayed on separate lines and the attributes are often indented to reflect their nesting.

Database Schema

A database schema is defined as the set of relation schemas and ascribes to the formatting specified above in relation schemas.  The database schema is comprised of the Accounts and Customers relations as follows: Accounts (acctNo, type, balance) and Customers (firstName, lastName, idNo, account).  Again, the relation names and attributes may be displayed on separate lines and the attributes may be indented to reflect their nesting.

Domain Attributes

As a basis, RDBMS domains go beyond traditional programming language data types (e.g. Java and C++) as they semantically capture the meaning and constraints of the data type in accord with the RDBMS purpose (Codd, 1990; Silberschatz, Korth & Sudarshan, 1999).  With this basis, domain attributes may be defined as an extended data type that captures the RDBMS intension.  Domains are an essential concept to RDBMS as they serve as the most elemental integrity constraint and provide the necessary basis for meaningful comparison (Codd, 1990; Silberschatz, Korth & Sudarshan, 1999).  Attributes may be thought of as specific domain instances in accord with the RDBMS’s intended purpose (Codd, 1990).  As a result, identifying suitable domains for the Accounts and Customers relations requires knowledge about the business’s and RDBMS’s purpose.

With respect to our instance analysis, consider that the Account’s acctNo could be either an integer or a fixed or variable character string.  If account numbers can begin with a leading zero it would make sense to record acctNo as a character string since leading zeros would be dropped in an integer representation.  Character strings provide a query performance advantage and variable character strings can accommodate future and unforeseen growth and purpose (Rob & Coronel, 2007; Silberschatz, Korth & Sudarshan, 1999).  By observation of the relation instance presented in the Appendix, an acctNo does not exist that begins with a leading zero therefore acctNo may be implemented as an integer.

When documenting domain attributes it is customary to format the domain attributes inside parenthesis pairing the attribute and domain with a colon and separating attribute domain attribute pairs with commas.  Reviewing the Accounts and Customers relation instances, one suitable domain attribute pairing using this formatting convention is as follows.  The Accounts relation could be implemented as Accounts (acctNo:integer, type:string, balance:integer).  The Customer’s relation could be implemented as Customers (firstName: string, lastName: string, idNo: string, account: integer).  It should be noted that idNo was implemented as a string due to the presence of hyphens.

Alternative Relation Presentation

As cited above, the order of the attributes or columns and tuples or rows is irrelevant.  An alternative representation of each relation as presented in the Appendix can be achieved by interchanging columns and rows.  An alternative presentation of the Accounts relation is illustrated in table 1 below.  This alternative Accounts relation presentation exchanges the balance and type columns as well as the first and second tuples.  In table 2, a similar transformation is applied to the Customers relation where the third and fourth columns are interchanged and the first and second tuples are also interchanged.

Table 1

Alternative Accounts Relation

acctNo balance type
23456 1000 checking
12345 12000 savings
34567 25 savings

Table 2

Alternative Customers Relation

firstName lastName account idNo
Lena Hand 12345 805-333
Robbie Banks 12345 901-222
Lena Hand 23456 805-333

References

Codd, E. F. (1990). The relational model for database management: version 2.  Boston, MA: Addison-Wesley Longman Publishing Co.  Retrieved June 1, 2010 from, http://portal.acm.org/citation.cfm?id=77708.

Elmasri, R., & Navathe, S. B. (2000).  Fundamentals of database systems.  Reading MA: Addison-Wesley.

Garcia-Molina, H., Ullman, J.D., & Widom, J. (2009).   Database systems: The complete book, (2nd ed.).   Upper Saddle River, NJ: Pearson Prentice Hall.

Ramakrishnan, R., & Gehrke, J. (2000).  Database Management systems (2nd ed.).  Boston, MA: McGraw-Hill.

Rob, P., & Coronel, C.M. (2007). Database systems design, implementation, and management (7th ed.). Boston, MA: Thomson Course Technology.

Silberschatz, A., Korth, H. F., & Sudarshan, S. (1999). Database system concepts (3rd ed.). Boston, MA: McGraw-Hill.