Sigmod 2013
jninanyac14 de Junio de 2014
8.730 Palabras (35 Páginas)169 Visitas
Performance and Resource Modeling in Highly-Concurrent
OLTP Workloads
Barzan Mozafari∗ Carlo Curino† Alekh Jindal∗ Samuel Madden∗
∗CSAIL MIT †Microsoft
{barzan,alekh,madden}@csail.mit.edu ccurino@microsoft.com
ABSTRACT
Database administrators of Online Transaction Processing (OLTP)
systems constantly face difficult questions. For example, “What is
the maximum throughput I can sustain with my current hardware?”,
“How much disk I/O will my system perform if the requests per second
double?”, or “What will happen if the ratio of transactions in my
system changes?”. Resource prediction and performance analysis are
both vital and difficult in this setting. Here the challenge is due to
high degrees of concurrency, competition for resources, and complex
interactions between transactions, all of which non-linearly impact
performance.
Although difficult, such analysis is a key component in enabling
database administrators to understand which queries are eating up
the resources, and how their system would scale under load. In
this paper, we introduce our framework, called DBSeer, that addresses
this problem by employing statistical models that provide
resource and performance analysis and prediction for highly concurrent
OLTP workloads. Our models are built on a small amount
of training data from standard log information collected during
normal system operation. These models are capable of accurately
measuring several performance metrics, including resource consumption
on a per-transaction-type basis, resource bottlenecks, and
throughput at different load levels. We have validated these models
on MySQL/Linux with numerous experiments on standard benchmarks
(TPC-C) and real workloads (Wikipedia), observing high accuracy
(within a few percent error) when predicting all of the above
metrics.
Categories and Subject Descriptors
H.2.4 [Systems]: Relational databases
Keywords
OLTP, Performance Predictions, Multi-tenancy
1. INTRODUCTION
Operating a large database management system (DBMS) or
a multi-tenant “database-as-a-service” [16] is a challenging and
stressful task for database administrators (DBA), especially as the
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
SIGMOD’13, June 22–27, 2013, New York, New York, USA.
Copyright 2013 ACM 978-1-4503-2037-5/13/06 ...$15.00.
DBMS starts experiencing heavy concurrent load. Although some
databases provide tools for measuring the run-time of an individual
query, many performance problems are a result of interactions
between concurrent queries, which existing systems are not capable
of modeling. Many transactions that run fine in isolation become
much slower when run together, as they interact in complex
ways and contend for shared resources. Load that is added over
time may cause resources that were previously abundant to become
constrained, and query performance to plummet. Applications may
generate unpredictable, time-varying load that puts strain on different
resources (e.g., RAM, disk, or CPU) at different times. To handle
all these scenarios, we need a way to attribute system load, on a perresource
basis to different queries, transactions, or applications. This
attribution enables a number of useful applications, including:
● Diagnosis / Performance Inspection: Why is a given query
running slow (in the presence of concurrency)? Which transaction
groups are causing the spike of lock wait times in the DBMS?
● Run-time Performance Isolation: In a database supporting
multiple applications, which application/transaction is using more
than its allocated share of resources? At what rate should transactions
of a given application be admitted/ dropped in order to avoid
any SLA (service-level agreement) violations?
● Billing: What is the actual contribution of each workload to
the overall resource consumption?
A second challenge for DBAs is to understand how database resource
consumption and performance vary as load on the system
changes, e.g., when partitioning the database, or when a sudden increase
in popularity of a website imposes unexpected load on the
back-end database To address the second challenge, we need what-if
analysis tools to allowDBAs to answer two other classes of questions:
● Performance Prediction: What will the performance (e.g. latency
or throughput) of a given query or application be if the rate of
‘new order’ transactions doubles?
● Provisioning: Which resource (e.g., disk, CPU, RAM)will bottleneck
first if the load on the system increases? What is the hardware
required to deliver a desired throughput?
To address these challenges, we introduce our approach, called
DBSeer, for statistical performance modeling and prediction. Our
techniques involve collecting a limited set of low-overhead statistics
fromthe DBMS and the operating system,measured during normal
system operation.We then use these statistics to build offlinemodels
for the given DBMS and the workload(s) running on it.Thesemodels
allow us to perform attribution and what-if analysis by assessing
the resource (e.g., CPU, disk, RAM, cache, DB locks) requirements
of individual queries or applications and estimating how those requirements
change as the database grows in size, as queries in the
workloads change, or as allocated resources vary.
In DBSeer, we develop two classes of models and compare their
performance. First, we develop black-box models, that make minimal
assumptions about the nature of the underlying system, and
train statistical regression models to predict future performance
based on past performance statistics. Second, we develop white-box
models that take the major components of the underlying database
system into account, to enable more accurate predictions. Specifically,
we develop white-box models for disk I/O, lock contention,
and memory utilization of MySQL. Although these models are focused
onMySQL, we believe that even solving performance prediction
in the context of this one system represents an important step
forward, as MySQL alone is used by millions of users. Moreover, in
Section 8.5, we report preliminary (but promising) results in applying
our models to another DBMS (PostgreSQL).
As we show in our experiments, the trade-off between these two
classes of models is that black-box models are more general but are
also less effective in making predictions outside of the range of inputs
on which they were trained. Unfortunately, many interesting
questions (including many what-if scenarios) require such “out of
range” predictions, e.g., predicting performance when dramatic and
heretofore unseen changes happen in the workload. This is why developing
white-box models is also necessary: they are less general
than black-boxmodels (as they make assumptions about the nature
of the database) but they provide higher extrapolation power.
Our approach in DBSeer1 is specifically designed for highly concurrent
OLTP (transaction processing) applications . These applications
run lightweight transactions that read or write a few records at
a time.We focus on this class of problems becauseOLTP settings are
most frustrating for DBAs, due to their high levels of concurrency
and the complex interactions between transactions (i.e.,competition
for different resources such as locks, cache, I/O). Such competitions
can lead to non-linear effects, where a small change in load can trigger
a large change in performance. Though some prior work has
addressed performance prediction in the case of OLAP (a.k.a. analytical)
databases [12, 3, 8], this problem has not been well studied in
OLTP.Thus, a key contribution of our work is to develop non-linear
models that capture highly concurrent locking and logging operations
of OLTP workloads.
In summary, we make several contributions towards modeling
transactional workload, including:
● Resource Models: we have developed white and black-box
models for predicting different resources, includingCPU,RAM, network,
disk I/O, and lock contention. Our primary contribution here
is a set of novel white-box models for predicting disk I/O and lock
contention.
● Extracting transaction types: we have developed highly accurate
clustering techniques to automatically extract and summarize
“classes of similar transactions” froma query log that allow us to accurately
group similar transactions. We show that this
...