Views and Indices

Introduction

In this module we will evaluate the importance of views and indices to database applications.  Views and indices improve RDBMS functionality and performance and are central to providing an organization’s diverse constituencies with accurate, relevant and timely data (Garcia-Molina, Ullman & Widom, 2009).  In an effort to constrain the length of this module, we will constrain index analysis to its semantic basis and defer analysis of index data structures and implementations (e.g. single level indices, multiple level indices, bit-mapped indices, function based indices, B-Trees, index optimization, etc.).   For clarity, will continue the formatting and terminology conventions established in previous modules.

RDBMS View and Index Design and Functionality

As a basis, Elmasri and Navathe (200) define the three important defining characteristics of RDBMS to be their ability to support: (a) physical and logical data independence, (b) multiple views, and (c) a catalog to store the database description or schema.

In 1976 the American National Standards Institute (ANSI) Standards Planning and Requirements Committee (SPARC) defined a three level RDBMS architecture comprised of internal, conceptual and external schemas (Codd, 1990; Elmasri & Navathe, 2000).  The internal schema defined in the storage definition language (SDL) describes the physical storage mechanisms and how the base relations and database descriptions or catalog are mapped to the physical storage (Elmasri & Navathe, 2000; Rob & Coronel, 2007).  The internal schema provides an abstract storage platform for the conceptual level facilitating physical data independence.  Physical data independence allows the internal schema to be modified without impacting the higher levels.

The conceptual schema defines the entity relationships, base relations, data types and their necessary operations through the data manipulation language (DML) and data definition language (DDL) (Elmasri & Navathe, 2000; Garcia-Molina, Ullman & Widom, 20096).  Lastly, the external schema also known as the view level provides users with logically independent and authorized access to the base relations through the view definition language (VDL) (Codd, 1990; Elmasri & Navathe, 2000).   Logical data independence allows the conceptual schema to be modified without affecting the external schema’s functionality.  The combination of logical and physical data independence insulates users and applications from the specifics of the DBMS and allows the individual schema components to be improved independently (Codd, 1990).

In order to optimize processing and provide secure and relevant data retrieval, database administrators (DBAs) may implement views and indices.   It is customary for DBAs to provide users with view level access to the base relations rather than the base relations themselves (Codd, 1990; Garcia-Molina, Ullman & Widom, 2009).  View level access allows the DBA to tailor and restrict information for specific constituencies.  It is also customary for DBAs to implement indices to improve performance and provide timely access to finely granular relevant data (Codd, 1990; Garcia-Molina, Ullman & Widom, 2009).

It must be noted this improved access of indices and views comes with a cost.  Implicit in this three tiered architecture is the need to map data requests and transform data between levels.  While this three tiered architecture provides a robust and adaptable DBMS environment, the necessary transformations and mappings between levels consume both time and resources (Elmasri & Navathe, 2000).  This mapping information is also a component of the database’s catalog (Elmasri & Navathe, 2000).

RDBMS Views

As introduced in activity three, the RDBMS model provides a consistent data retrieval mechanism through the external schema or views.  Views, also known as virtual relations are defined as relations derived from other relations (Codd, 1990; Garcia-Molina, Ullman & Widom, 2009).  Views extract aggregate and relational data from base tables stored in the database and previously defined views (Codd, 1990; Elmasri & Navathe, 2000).  Critically important, views also allow DBAs to selectively provide and restrict access to data on a finely granular basis through authentication an authorization.

A view’s purpose is to provide a flexible mechanism for retrieving and tailoring data for a specific group of users in real time (Garcia-Molina, Ullman & Widom, 2009).  Functionally, views are derived from SELECT FROM SQL statements and the generic SQL CREATE VIEW command statement is presented in Table 1 below (Casteel, 2010; Codd, 1990; Garcia-Molina, Ullman & Widom, 2009; Rob & Coronel, 2007).     As evidenced in Table 1, a VIEW is created as a SELECT FROM clause where the AS keyword must specifies a complete SELECT FROM clause (Casteel, 2010).  With this basis views may draw from the SQL’s SELECT FROM clause’s flexible and powerful functionality however the reader is directed to activity three for a more comprehensive analysis of the SQL SELECT FROM statement.  Views can be easily deleted through SQL’s DROP VIEW viewName; command statement.

Table 1

Generic SQL CREATE VIEW statement

            CREATE VIEW

                        AS SELECT attribute(s)

                        FROM Table(s)

                        WHERE condition(s);

Views differ from stored tables in that only the view’s definition is stored and the view instance is either maintained or realized every time it is accessed (Codd, 1990; Elmasri & Navathe, 2000; Garcia-Molina, Ullman & Widom, 2009).   With this basis, a view is always up-to-date and dynamically reflects all modifications to its associated base relations (Elmasri & Navathe, 2000).  Maintaining the relevancy of a view may incur significant and recurring processing overhead if multiple joins are required for the view’s SELECT FROM clause.  Complex mechanisms to incrementally update or materialize views exist however views based on multiple base relations and views that use grouping and aggregate functions are generally not updatable (Elmasri & Navathe, 2000).   To overcome this processing overhead and complexity, DBAs may implement snapshots that capture a static or historical view instance if this functionality is in accord with organizational needs and the data’s fitness for use (Codd, 1990).

A considerable portion of a view’s processing overhead depends on the presence and use of primary and foreign keys and indices to be discussed separately below.  Consistent with a relation’s theoretical basis, each view should contain a primary key however views may contain projected columns that allow missing data (Codd, 1990).  With this basis it may not be possible to enforce view-level primary keys that support entity integrity (Codd, 1990, p. 67).

As introduced above, views are derived relations and their definitions are stored in the RDBMS catalog (Codd, 1990).  According to Codd, the RDBMS catalog minimally contains the following items for each defined view:

“(1) the view name, (2) synonyms for this name, if any, (3) the name of each simple column, (4) for each column, the name of an already declared domain (unless the column is not directly derived from a single base column), (5) whether the column is a component (possibly the only one) of the primary key (if applicable) of the view, (6) the RL expression that defines the view, (7) whether insertions of new rows in the view are permitted by the DBMS, (8) whether deletions of rows from the view are permitted by the DBMS, and (9) for each column of the view, whether updating of its values is permitted by the DBMS” (Codd, 1990, p. 305).

This robust set of criteria can provide DBAs with invaluable guidance when creating and maintaining views as it may pay dividends to evaluate all views against Codd’s (1990) catalog view criteria.

As introduced above, views can control access to data at a finely granular level and therefore serve as powerful confidentiality and authorization or discretionary data access control mechanisms (Codd, 1990; Elmasri & Navathe, 2000).  Confidentiality and authorization are common tenets of information assurance citing that information must be secured from unintentional and willful access and misuse (Bishop, 2003; Silberschatz, Galvin & Gagne, 2010).  Confidentiality models are derived from the military’s Bell-LaPadula model and organizational discretionary access control is typically implemented through an access matrix (Bishop, 2003; Elmasri & Navathe, 2000).

The Bell-LaPadula model establishes a hierarchical mechanism that governs the granting and revoking of privileges based on authentication and authorization (Bishop, 2003; Elmasri & Navathe, 2000).  RDBMS views allow discretionary access control to be extended beyond user account level authorization to select tuple and projected attribute access (Elmasri & Navathe, 2000).  Discretionary access control can be further tuned to restrict access based on resource consumption (Codd, 1990).  In summary, database confidentiality and authorization is critical functionality for today’s organizations and their extended value chains that are now comprised of numerous internal and external constituencies.

As previously introduced, views are derived from SELECT FROM SQL statements and their read only basis restricts them from performing update operations to base tables.  This implicit read only nature of views derived from SELECT FROM SQL is in accord with information assurance policies as views cannot corrupt data therefore ensure data integrity and availability.  It should be noted that modern DBMS may provide the functionality for views to update base tables (Garcia-Molina, Ullman & Widom, 2009) however according to Codd (1990), many contemporary RDBMS products are not in accord with his relational theory.  If it is desirable for a view to modify base table data it may be more prudent to modify the base table directly to remain in accord with Codd’s relational model (1990).

RDBMS Indices

RDBMS indices provide a mechanism to access a database’s tuples logically.  With this basis, indices are of great importance to DBAs and serve as necessary component of RDBMS design for several reasons (Garcia-Molina, Ullman & Widom, 2009; Rob & Coronel, 2007).  Using today’s Von Neumann architecture as a model, data must be in memory to be accessed or processed however typical databases are often too large to fit into main memory (Garcia-Molina, Ullman & Widom, 2009; Silberschatz, Korth & Sudarshan, 1999).  As a result, major portions of a database must reside in secondary storage and the stark differences in speeds throughout the storage or memory hierarchy is well documented (Garcia-Molina, Ullman & Widom, 2009; Nutt, 2004; Silberschatz & Galvin, 1999).

To assess this environment in further detail, recall that file systems typically store data in 512 byte sectors or blocks therefore the finest granular access to the file system is at the block level (Nutt, 2004; Silberschatz, Galvin & Gagne, 2010).  When records are stored in a file system, the records can only be sorted, ordered and stored based on a single field’s value (Nutt, 2004; Silberschatz, Galvin & Gagne, 2010).  As a result, if auxiliary RDBMS retrieval mechanisms are not present, RDBMS components that reside in secondary storage can only be directly accessed and retrieved based on a single attribute (Silberschatz, Galvin & Gagne, 2010; Silberschatz, Korth & Sudarshan, 1999).  If access is required on fields other than the sorted order, the tuples must be searched linearly.

Recall that a component of a RDBMS’s requisite functionality is that data can be quickly retrieved based on multiple dimensions of its atomic and composite values.  Also recall that a relation is an unordered set of tuples and the tuples themselves are comprised of an unordered set of attributes (Codd, 1990).  This definition is consistent with most RDBMS implementations as the CREATE TABLE SQL command is typically implemented a heap-organized table that is implicitly unordered (Casteel, 2010).  This contrast between a RDBMS’s requisite functionality, RDBMS implementations and the Von Neumann architecture demonstrates the need for auxiliary mechanisms to locate finely granular data located in memory and storage.  Indices provide the necessary mechanism to provide timely and finely granular access to a RDBMS’s data.

It must be noted at the outset that RDBMS indices exist outside of the pure theoretical RDBMS model as they are a tool to improve performance therefore indices should only be used for this purpose (Casteel, 2010; Codd, 1990).   With this basis, indices must support rather than impose domain structure and constraints as the domain should be clearly specified at the conceptual level and recorded in the RDBMS catalog (Codd, 1990).  As an example, indices should not impose uniqueness of values within a column as this should be specified as one of the properties of that column (Codd, 1990).

Functionally, indices provide a mapping between column values known as an index key and a row ID where the row ID is a pointer to the row’s physical addresses (Rob & Coronel, 2007).  As a result, an index provides a secondary access path to database data independent of the tuple’s physical location and sorted order (Codd, 1990).  Indices improve performance by quickly locating data necessary to compare data and execute joins (Codd, 1990; Garcia-Molina, Ullman & Widom, 2009).  It must be noted that while indices improve access efficiency, managing the index incurs processing overhead as they must be updated whenever their mapped data or underlying schema is changed.  Furthermore, indices incur memory and storage management similar to an operating system’s file management system as the indices’ structure must also be in memory to be maintained and accessed (Nutt, 2004; Silberschatz, Galvin & Gagne, 2010).

Theoretically, an index can be defined for any column and multiple indices can be constructed on same relation (Codd, 1990).  Indices are typically defined by the DBA to support relevant and timely data retrieval in accord with organization needs.  A generic SQL CREATE INDEX command statement is provided in Table 2 below.  Indices may also be automatically defined by the RDBMS for primary and foreign keys by the RDBMS system (Casteel, 2010).    Defining indexes for a database’s keys makes sense since keys are derived in accord with an organization’s business processes and their unique nature allows them to return just one record or no records at all (Garcia-Molina, Ullman & Widom, 2009).  Defining indices for primary and foreign keys not only speeds up joins but this can also assist the DBA perform referential integrity testing by identifying duplicate attribute values (Codd, 1990; Rob & Coronel, 2007).

Table 2

SQL CREATE INDEX command statement

            CREATE INDEX indexName

ON tableName (attributeName);

Indices may also be defined on non unique attributes or sets of attributes to support an organization’ needs.  If indices are defined on non-unique attributes, it pays dividends if the attributes may be clustered into small groups as an index that retrieves a large data set may not provide a performance improvement when compared with accessing the entire table (Garcia-Molina, Ullman & Widom, 2009).

As introduced above, indices introduce increased management and complexity as they must be updated to remain in accord with the database’s data and its conceptual schema (Codd, 1990; Garcia-Molina, Ullman & Widom, 2009).  Citing the overhead incurred by defining and managing indexes, defining additional indexes requires that the DBA properly balancing their knowledge of the RDBMS structure with the organizational needs.  As an example, small tables may not benefit from indices citing that full table scans may be more efficient than maintaining an index (Casteel, 2010, p. 192).  Management of schema changes can be mitigated if the system supports automated cascading actions (Codd, 1990).  As an example, it would be intuitive to automatically drop an index if the domain it is based on is deleted or dropped.

Summary

RDBMS views and indices are of primary importance to RDBMS design and administration.  To illustrate view’s import to RDBMS design, consider that RDBMS design may be driven by the user’s view perspective (Elmasri & Navathe, 2000).  A RDBMS design driven by this top down approach strives to accommodate the user’s and application’s present and expected use of the data.  Critically important to today’s sensitive business transactions, views provide a framework to extend and restrict access to users based on authentication and authorization.  DBAs and RDBMS strive to foment physical and logical independence and provide information assurance and views provide the requisite functionality to achieve these goals.

Indices are critically important to RDBMS functionality as organizations need timely, relevant and accurate data to drive their decisions.  To illustrate the import of indices further consider that support for indices can be a major dimension in evaluating RDBMS performance and as a result, selecting the appropriate RDBMS product (Codd, 1990).  In summary, RDBMS views and indices are a primary design dimension and play a major role in RDBMS performance and fitness for use.

 

 

References

Bishop, M. (2003). Computer security. Boston, MA: Addison Wesley.

Casteel, J. (2010). Oracle 11g SQL. Boston, MA: Thomson Course Technology.

Codd, E. F. (1990). The relational model for database management: version 2.

Boston, MA: Addison-Wesley Longman Publishing Co.  Retrieved June 1, 2010 from, http://portal.acm.org/citation.cfm?id=77708.

Elmasri, R., & Navathe, S. B. (2000).  Fundamentals of database systems.

Reading MA: Addison-Wesley.

Garcia-Molina, H., Ullman, J.D., & Widom, J.   (2009).   Database systems: The complete book,

(2nd ed.).   Upper Saddle River, NJ: Pearson Prentice Hall.

Nutt, G. (2004). Operating systems (3rd ed.). Boston, MA: Addison Wesley.

Pearlson, K. E., & Saunders, C. S. (2006). Managing and using information systems (3rd ed.).

Hoboken, NY: Wiley Publishing.

Satinger, J. W., Jackson, R. B., & Burd, S. D., (2002). Systems analysis and design (2nd ed.)

Boston, MA: Course Technology.

Silberschatz, A., & Galvin, P. B. (1999). Operating systems concepts (5th ed.).

New York: John Wiley & Sons.

Silberschatz, A., Galvin,  P. B., & Gagne, G. (2010). Operating system concepts with Java

(8th ed.).  Hoboken, NY: John Wiley & Sons.

Silberschatz, A., Korth, H. F., & Sudarshan, S. (1999). Database system concepts (3rd ed.).

Boston, MA: McGraw-Hill.

Rob, P., & Coronel, C.M. (2007). Database systems design, implementation, and management

(7th ed.). Boston, MA: Thomson Course Technology.