Introduction
In this module we will evaluate SQL query performance tuning in a relational database management system (RDBMS) setting. As a basis, consider that organizational data contributes to an organization’s competitive advantage and this data must be accessible, timely, relevant and accurate (Porter, 1985; Robbins & Judge, 2007; Satinger, Jackson & Burd, 2002). It is intuitive that the time necessary to access information can determine success or failure in critical real time applications (e.g. military and medical systems) however simply providing quick access to information is not enough since data can also be qualified as poor quality if it does not meet user expectations (Shasha, 1996; Russom, 2006). This aspect has particular relevance today as organizational marketing data is exposed to Web-based consumers using the same Web portals that serve as the financial transaction points of contact with the consumer.
To ensure timely data access it is understood that RDBMS performance tuning is vital to organizational success however this task is far from trivial (Herodotou & Babu, 2009; Martin, Powley, Zheng & Romanafu, 2005; Russom, 2006; Shasha, 1996). Today’s business climate has been characterized as highly volatile and dynamic as workloads, business needs and technology evolves and emerges without warning (Friedman, 2005; Herodotou & Babu, 2009; Robbins & Judge, 2007). As a result, today’s RDBMS must be continually tuned to adapt to the environment, organizational needs and changes to the underlying computer systems’ architecture (Shasha, 1996).
RDBMS Performance Tuning
It has been established that configuring and tuning today’s large database management systems (DBMS) is a complex and time-consuming process (Garcia-Molina, Ullman & Widom, 2009; Herodotou & Babu, 2009; Martin, Powley, Zheng & Romanafu, 2005; Raab, 2009; Shasha, 1996). RDBMS performance tuning has been defined as the attempt to improve a query execution plan produced by query optimizer however the query optimizer does not work in isolation and is impacted by nearly every component in an enterprise RDBMS (Herodotou & Babu, 2009; Martin, Powley, Zheng & Romanafu, 2005). Furthermore, this tuning complexity is rapidly increasing as RDBMS continue to evolve and expand to support emergent application areas that include online analytical processing (OLAP), multimedia (e.g. images, audio and video) and recursive knowledge base type queries (Martin, Powley, Zheng & Romanafu, 2005; Raab, 2009). The end result is an increasingly complex, continually changing and diverse bursty workload (Martin, Powley, Zheng & Romanafu, 2005). While a detailed discussion of all of these emergent trends would exceed the length restrictions of this paper, consider the impact of the convergence between OLAP and transactional RDBMS introduced below.
Out of necessity, early marketing systems embraced dedicated analytical databases however present marketing analytics often require queries posed in real time against transactional systems (Kotler & Keller, 2007; Raab, 2009). Recall the difference between OLAP and transactional systems where transactional systems queries are posed over small sets of data (i.e. few relations and records) and OLAP queries typically read all records from cubed sets of aggregated data (Garcia-Molina, Ullman & Widom, 2009; Raab, 2008a). This converged functionality can be supported by today’s hardware and maintaining a singular database may be more cost-effective than maintaining a separate analytical database (Raab, 2009). Maintaining a singular RDBMS for both transactional and OLAP processing increases complexity as these two diverse types of queries represent very different perspectives and incur very different overheads. The impact of this shift is the need to tune a RDBMS from several dimensions and this requires deep knowledge of systems and their integration (Shasha, 1996).
In order to achieve good DBMS performance, database administrators must possess detailed and comprehensive knowledge of the system’s desired characteristics, the individual system components and the interrelationships between the individual system components in relation to the desired characteristics (Garcia-Molina, Ullman & Widom, 2009; Herodotou & Babu, 2009; Martin, Powley, Zheng & Romanafu, 2005; Wiese, Rabinovitch, Reichert & Arenswald, 2008). The individual components that impact RDBMS performance include: (a) the RDBMS design and functionality (e.g. normalization, view and index selection), (b) the operating system (OS) functionality and its architecture (e.g. process scheduling, I/O buffering, paging and file system) and whether this is controlled by the OS or RDBMS, (c) the system hardware (e.g. amount and functionality of memory and storage and network components), (d) the programming languages and data structures and their impact on run time execution (e.g. searching, sorting, hash tables and binary trees), (e) the data domains and their application within organizational processes, (f) the Web accessible multi-tiered architecture or application stack that accesses the RDBMS, and,(g) the real and expected workload of the RDBMS throughout the organization’s extended supply and value chain (Garcia-Molina, Ullman & Widom, 2009; Herodotou & Babu, 2009; Kormilitsin, Chirkova, Fathi & Stallmann, 2008; Martin, Powley, Zheng & Romanafu, 2005; Raab, 2008a; Shasha, 1996; Wiese, Rabinovitch, Reichert & Arenswald, 2008).
Garcia-Molina, Ullman and Widom (2009) go into detail about tuning and optimizing SQL queries with respect to optimizing the storage and memory hierarchy however to illustrate the comprehensive interconnectedness of performance tuning consider the following additional concepts related to index tuning. To adequately support the convergence of transactional and analytical systems, many of today’s RDBMS systems provide options to create multiple index types and sort sequences allowing users and DBAs to tune the system to their requirements (Raab, 2008b). As a basis, index selection plays an important component in DBMS performance and represents a considerable portion of RDBMS continuing maintenance and total cost of ownership (TCO) (Kormilitsin, Chirkova Fathi & Stallmann, 2008; Shasha, 1996). As presented in previous coursework, dense indexes can consume considerable portions of memory and accessing them from storage incurs additional delays. In contrast, sparse clustering indexes implemented with B-trees have fewer leaves allowing the system to conserve memory and speed up a tree’s linked list traversal (Shasha, 1996). If a singular system is supporting both transactional and OLAP functionality, it is intuitive that the DBA must pay special attention the high number and granularity of indexes.
Tuning indexes for performance must also take into account today’s distributed transaction based environment. While transaction control provides a RDBMS with its necessary serializability in a concurrent environment its necessary locks play an important role in determining responsive data access. As a result, concurrent transactional control must also be optimized with respect to performance tuning and this requires the DBA to assess and reduce the number and duration of queries blocked and waiting on resource locks (Shasha, 1996). This endeavor cannot be treated in isolation as it is integrated with the very data structures used to store and process the indexes. As an example, consider a monotonically increasing sequential key implemented with a B-tree. In this example, the standard B-tree insertion based on a monotonically increasing value will place all of these new entries along the same rightmost linked list path (Shasha, 1996). System performance will then depend on whether the transactions emanate from a single source or from multiple sources contending for the same lock (Shasha, 1996). Additionally, system responsiveness will depend on whether the system employs record level locking (e.g. Oracle) or page level locking (e.g. Sybase System 10). It is intuitive that page level locking will render more of the database inaccessible than record level locking and as presented earlier, this scenario could be exacerbated further in a converged transactional OLAP RDBMS.
Performance Tuning Methodologies
We have established the need to improve RDBMS performance to meet present, expected and emergent loads and functionality and we have established performance tuning is an increasingly complex and integrated endeavor and requires knowledge of DBMS, OS, system architectures, programming and data structures, and organizational processes. This literature review has revealed three primary approaches to achieving acceptable performance tuning and includes: (a) experimentation and testing, (b) automation and, (c) collaboration. While these three approaches provide a DBA with a methodology framework for improvement it must be noted that according to Oracle, it almost impossible to predict the impact of any change on actual query performance in a large complex production system (Herodotou & Babu, 2009; Yagoub et al., 2008).
Experimentation and Testing
Using experimentation and testing to conduct RDBMS Performance tuning requires the DBA to plan and implement experiments and gather and assess statistics to improve system performance (Herodotou & Babu, 2009; Wiese, Rabinovitch, Reichert & Arenswald, 2008). This is not only labor intensive but it is also difficult and even risky in a real time system (Herodotou & Babu, 2009). As an example, results may vary widely if the system is tested under heavy and light loads and running the tests may even impact the very queries being measured.
On a positive note, experimentation may allow the DBA to establish upper and lower run time bounds and thereby choose the solution that provides the greatest reduction in cost (Kormilitsin, Chirkova Fathi & Stallmann, 2008). Experimentation allows the DBA to test for expected loads and unique circumstances in accord with the transitory nature of today’s technology and business environment. As an example, in order to optimize and make efficient use of memory, some of today’s DBMS make use of compression to fit as much data into memory as possible (Raab, 2008a). It is widely known that compression consumes computational resources therefore using experimentation the DBA may be able to statistically measure and assess the application of compression on the system. It must be noted that while experimentation and testing is a valid performance tuning tool, it is also foreseeable that system improvements could be short lived.
Automation
Automated performance tuning attempts to mitigate the overhead incurred by experimentation and testing by gathering and providing DBAs with relevant statistics and metrics. DBMS manufacturers have recognized the need provide DBAs with assistance in tuning their DBMS and now provide tools for assessing and automating changes to the RDBMS (Wiese, Rabinovitch, Reichert & Arenswald, 2008). Central to this functionality is the need to encapsulate tuning best practices so that they may be automatically applied (Wiese, Rabinovitch, Reichert & Arenswald, 2008). While this approach has merit it is intuitive that this expert system methodology is purely reactive as expert best practices must be discovered, modeled, encoded and distributed before an end user DBA can avail themselves to this functionality.
Collaboration
It has been established that organizations need to develop formalized database tuning best practices. Citing the complexity and diverse skill sets established above, it appears a community-based or open source development approach may be appropriate. This approach is increasingly valid citing the success of other open-source software development projects that include Eclipse and Mozilla Firefox (Wiese, Rabinovitch, Reichert & Arenswald, 2008). On open-source approach can cull the expertise of domain experts and effectively distribute the overwhelming workload. Particularly important, it is also understood that security and information assurance must be foremost in every DBA’s mind and an open-source community can cull the expertise of security experts and also assess performance tuning changes from a security standpoint.
Summary
RDBMS performance tuning is an overwhelming task and as a result is driving the need for increased assistance and automatic self tuning (Martin, Powley, Zheng & Romanafu, 2005). In an optimal scenario, the move towards autonomic or self tuning RDBMS algorithms may allow administrators to specify their expectations or goals allowing the system to dynamically reconfigure (Martin, Powley, Zheng & Romanafu, 2005). This goal oriented approach may allow the DBA to specify goals in terms of response times for classes of transactions (Martin, Powley, Zheng & Romanafu, 2005). This of course requires that the system first identify bottlenecks, constraints and solutions. To this extent it may be advisable engage the diverse open source community in this critical endeavor. To formalize this methodology it may be beneficial to draw from disciplines that operate in other complex and transitory environments such as health care (Wiese, Rabinovitch, Reichert & Arenswald, 2008). The health care industry serves as a nice model since it has normalized typical approaches for diagnosis and treatment in an effort to reduce costs and improve the quality of service (Wiese, Rabinovitch, Reichert & Arenswald, 2008).
References
Friedman, T. L. (2005). The world is flat: A brief history of the 21st century. New York: Farrar, Straus and Giroux.
Garcia-Molina, H., Ullman, J.D., & Widom, J. (2009). Database systems: The complete book, (2nd ed.). Upper Saddle River, NJ: Pearson Prentice Hall.
Herodotou, H., & Babu, S. (2009). Automated SQL tuning through trial and (sometimes) error. Retrieved July 31, 2010 from, http://portal.acm.org/ft_gateway.cfm?id=1594171&type=pdf&coll=GUIDE&dl=GUIDE&CFID=93632109&CFTOKEN=77805535.
Kormilitsin, M., Chirkova, R., Fathi, R., & Stallmann, M. (2008). View and index selection for Query performance improvement: Quality-centered algorithms and heuristics. Retrieved July 31, 2010 from, http://portal.acm.org/ft_gateway.cfm?id=1458261&type=pdf&coll=GUIDE&dl=GUIDE&CFID=93632019&CFTOKEN=55881423
Kotler, P. & Keller, K. L. (2007). Marketing management (12th ed.). Upper Saddle River, NJ: Pearson Publishing.
Martin, P., Powley, W., Zheng, M., & Romanafu, K. (2005). Experimental study of a self-tuning algorithm for DBMS buffer pools. Journal of Database Management, 16(2), 1-20.
Porter, M., (1985). Competitive advantage: Creating and sustaining superior performance. New York: The Free Press.
Raab, D. (2008a). Analytical database options. DM Review, 18 (7), 27.
Raab, D. (2008b). How to judge a columnar database, revisited. DM Review, 18 (10), 15.
Raab, D. (2009). An about face on the database: Markers may shift from analytical two transactional databases as focus on real-time interactions increases. Information Management, 19 (5), 38.
Robbins, S. P., & Judge, T. A. (2007). Organizational behavior. Upper Saddle River, NJ USA: Prentice Hall.
Russom, P. (2006). Liability and leverage. DM Review, 16(8), 43.
Satinger, J. W., Jackson, R. B., & Burd, S. D., (2002). Systems analysis and design (2nd ed.) Boston, MA: Course Technology
Shasha, D. (1996). Tuning databases for high performance. ACM Computing Surveys, 28(1). Retrieved July 31, 2010 from, http://portal.acm.org/ft_gateway.cfm?id=234363&type=pdf&coll=GUIDE&dl=GUIDE&CFID=93635027&CFTOKEN=77739318.
Wiese, D., Rabinovitch, G., Reichert, M., & Arenswald, S. (2008). Autonomic tuning expert – a framework for best-practice oriented autonomic database tuning. IBM Centre for Advanced Studies Conference: Proceedings of the 2008 conference of the center for advanced studies on collaborative research: meeting of minds. Retrieved July 31, 2010 from, http://portal.acm.org/ft_gateway.cfm?id=1463792&type=pdf&coll=GUIDE&dl=GUIDE&CFID=95570219&CFTOKEN=30288333.
Yagoub, K., Belknap, P., Dageville, B., Dias, K., Joshi, S., & Yu, B. (2008). Oracle’s SQL Performance analyzer. DEB, 31(1). Retrieved July 31, 2010 from, ftp://ftp.research.microsoft.com/pub/debull/A08mar/yagoub.pdf.