Table 1: FR
Aj
ratios for 1GB TPC-H data warehouse.
Attribute (A
j
) Table (T
k
)
FR
Aj
(ΣAIR
QiTkAj
)
C_MktSegment Customer 0,0004
C_Phone Customer 0,0000
L_ShipDate LineItem 0,0740
L_ShipMode LineItem 0,0438
L_ShipInstruct LineItem 0,0384
L_Quantity LineItem 0,0284
L_ReturnFlag LineItem 0,0142
L_ReceiptDate LineItem 0,0125
L_Discount LineItem 0,0106
O_OrderDate Orders 0,0738
O_OrderStatus Orders 0,0213
O_Comment Orders 0,0000
P_Brand Part 0,0020
P_Container Part 0,0017
P_Size Part 0,0014
P_Name Part 0,0009
P_Type Part 0,0008
S_Comment Supplier 0,0000
Table 2: Partitioning attributes cardinality and range
values.
Partitioning
Attribute (PA
j
)
Cardinality of
PA
j
in T
k
(#)
Range Values of
PA
j
in T
k
C_MktSegment 5
‘AUTOMOBILE’,
FURNITURE’, ’MACHINERY’,
’HOUSEHOLD’, ’BUILDING’
L_ShipDate 2526 02-01-1992…01-12-1998
O_OrderDate 2406 01-01-1992…02-08-1998
P_Brand 25 ‘Brand#11’…‘Brand#55’
S_Comment 9999 ‘Customer%’…‘water%’
Table 3: Partitioning schema for 1GB TPC-H using PIN.
Table (Tk) Partitions
Customer
Create partition by List on C_MktSegment (
Partition 1 with values 'BUILDING',
Partition 2 with values 'AUTOMOBILE',
Partition 3 with values 'FURNITURE',
Partition 4 with values 'MACHINERY',
Partition 5 with values 'HOUSEHOLD')
LineItem
Create partition by Range on L_ShipDate (
Partition 1 with values between [01-01-92; 31-01-92],
Partition 2 with values between [01-02-92; 28-02-92],
…
Partition 84 with values between [01-12-98; 31-12-98])
Orders
Create partition by Range on L_ShipDate (
Partition 1 with values between [01-01-92; 31-03-92],
Partition 2 with values between [01-04-92; 30-06-92],
…
Partition 27 with values between [01-07-98; 30-09-98])
Part
Create partition by List on P_Brand (
Partition 1 with values 'Brand#11',
Partition 2 with values 'Brand#12',
…
Partition 25 with values 'Brand#55')
Supplier
Create partition by List on S_Comment (
Partition 1 with values '%Customer%Complaints%',
Partition 2 with all other values)
4.2 Indexing the Data Warehouse
As stated previously, primary keys and referential
integrity constraints will be maintained, so we will
only refer to other indexing for our method. This
means that all attributes used as restriction
predicates in the OLAP queries will be indexed
according to the rule defined in section 3.3 of this
paper. The resulting indexing schema is presented in
Table 4.
Table 4: Indexing schema for TPC-H using PIN.
Attribute (A
j
) Table (T
k
) Index Type
C_MktSegment Customer Bitmap
C_Phone Customer B*Tree
L_ShipDate, L_ShipMode,
L_ShipInstruct, L_Quantity,
L_ReturnFlag, L_ReceiptDate,
L_Discount
LineItem Bitmap
O_OrderDate, O_OrderStatus Orders Bitmap
O_Comment Orders B*Tree
P_Brand, P_Container, P_Size, _Type Part Bitmap
P_Name Part B*Tree
S_Comment Supplier B*Tree
4.3 Results
To analyze the performance of our method, we have
conducted four series of experiments for each setup:
(1) implemented without optimization techniques,
just maintaining its standard primary keys and
integrity constraints; (2) optimized using standard
indexing techniques, such as join indexes; (3)
optimized only by standard partitioning techniques
using our fragmentation rules; and (4) with full
optimization using our method with partitioning and
indexing. Figures 1 and 2 present the results relating
the execution of the workload for size 1GB and 8GB
data warehouses, while Figure 3 presents the results
according to predefined scenarios on 8 simultaneous
users executing the workload.
For performance optimization of the fourth series
of experiments (our method), we executed many
performance optimization tests that included bitmap
join indexes, which usually improve OLAP query
performance (Bellatreche, 2004), and in this case it
did not improve. Contrarily, performance degraded.
We observed that, for our method, only the
restriction predicates should be indexed. This
enforces the validity for our rule defining the
indexing schema, only building indexes for all the
attributes that appear in the OLAP queries WHERE
clause after adequately partitioning the tables
according to our method.
Figures 1 to 3 and Tables 5 and 6 show that PIN
has best scalability features and is the most efficient
technique, outstanding the others in every execution,
both relating to DW size and number of
simultaneous users. Analyzing average gains from a
DB size point of view, the gain for standard
indexing is around 25%, while for standard
partitioning the results show an average gain of
48%. PIN has an average gain of 55%. In
simultaneous users execution, PIN’s advantage is
also evident, with an average gain of 56% against
49% for standard partitioning and 25% for standard
indexing. It also increases the gain compared with
other methods as the DB grows in size and/or
PIN: A PARTITIONING & INDEXING OPTIMIZATION METHOD FOR OLAP
175