should be made to determine if the needed statistics are available. Also in environments where long
running queries are run only one time, it may be beneficial to ensure that statistics are available prior to
running the queries.
Some properties of database column statistics are as follows:
y
Column statistics occupy little storage, on average 8-12k per column.
y
Column Statistics are gathered through one full scan of the database file for any given number of
columns in the database file.
y
Column statistics are maintained periodically through means of statistics refreshing mechanisms
that require a full scan of the database file.
y
Column statistics are packed in one concise data structure that requires few I/Os to page it into
main memory during query optimization.
As stated above, statistics may have a direct effect on the quality of the access plan chosen by the query
optimizer and thereby influence the end user query performance. Shown below is an illustrative example
that underscores the effect of statistics on access plan selection process.
Statistic Usage Example:
Select * from T1, T2 where T1.A=T2.A and T1.B = ’VALUE1’ and T2.C = ‘VALUE2’
Database characteristics: indexes on T1.A and T2.A exist, NO column statistics, T1 has 100 million rows,
T2 has 10 million rows. T1 is 1 GB and T2 0.1 GB
Since statistics are not available, the optimizer has to consider default estimates for selectivity of T1.B =
’VALUE1’ ==> 10% T2.C = ‘VALUE2’ ==> 10%
The actual estimates are T1.B = ’VALUE1’ ===>10% and T2.C = ‘VALUE2’ ===>0.00001%
Based on selectivity estimates the optimizer will select the following access plan
Scan(T1) - Probe (T2.A index) - > Probe (T2 Table) ---
the real cost for the above access plan would be approximately 8192 I/Os + 3600 I/Os ~
11792 I/Os
If column statistics existed on T2.C the selectivity estimate for T2.C = ‘VALUE2’ would be 10 rows or
0.00001%
And the query optimizer would select the following plan instead
Scan(T2) - Probe (T1.A index) - > Probe (T1 Table)
Accordingly the real cost could be calculated as follows:
819 I/Os + 10 I/Os ~
830 I/Os. The result of having statistics on T2.C led to an access plan that is
faster by order of magnitude from a case where no statistics exist
.
For more information on database statistics collection see the
DB2 for i5/OS Database Performance and
Query Optimization
manual.
IBM i 6.1 Performance Capabilities Reference - January/April/October 2008
©
Copyright IBM Corp. 2008
Chapter 4 - DB2 Performance
51