一、簡介
在數據庫日漸龐大的今天,為了方便對數據庫數據的管理,比如按時間,按地區去統計一些數據時,基數過于龐大,多有不便。很多商業數據庫都提供分區的概念,按不同的維度去存放數據,便于后期的管理,PostgreSQL也不例外。
PostgresSQL分區的意思是把邏輯上的一個大表分割成物理上的幾塊兒。分區不僅能帶來訪問速度的提升,關鍵的是,它能帶來管理和維護上的方便。
分區的具體好處是:
- 某些類型的查詢性能可以得到極大提升。
- 更新的性能也可以得到提升,因為表的每塊的索引要比在整個數據集上的索引要小。如果索引不能全部放在內存里,那么在索引上的讀和寫都會產生更多的磁盤訪問。
- 批量刪除可以用簡單的刪除某個分區來實現。
- 可以將很少用的數據移動到便宜的、轉速慢的存儲介質上。
在PG里表分區是通過表繼承來實現的,一般都是建立一個主表,里面是空,然后每個分區都去繼承它。無論何時,都應保證主表里面是空的。
小表分區不實際,表在多大情況下才考慮分區呢?PostgresSQL官方給出的建議是:當表本身大小超過了機器物理內存的實際大小時(the size of the table should exceed the physical memory of the database server),可以考慮分區。
PG目前(9.2.2)僅支持范圍分區和列表分區,尚未支持散列分區。
二、環境
系統環境:CentOS release 6.3 (Final)
PostgreSQL版本:PostgreSQL 9.2.2 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4), 64-bit
三、實現分區
3.1 創建主表
david
=
#
create
table
tbl_partition (
david(# id
integer
,
david(# name
varchar
(
20
),
david(# gender boolean,
david(# join_date date,
david(# dept
char
(
4
));
CREATE
TABLE
david
=
#
3.2 創建分區表
david
=
#
create
table
tbl_partition_201211 (
check
( join_date
>=
DATE
'
2012-11-01
'
AND
join_date
<
DATE
'
2012-12-01
'
)
) INHERITS (tbl_partition);
CREATE
TABLE
david
=
#
create
table
tbl_partition_201212 (
check
( join_date
>=
DATE
'
2012-12-01
'
AND
join_date
<
DATE
'
2013-01-01
'
)
) INHERITS (tbl_partition);
CREATE
TABLE
david
=
#
create
table
tbl_partition_201301 (
check
( join_date
>=
DATE
'
2013-01-01
'
AND
join_date
<
DATE
'
2013-02-01
'
)
) INHERITS (tbl_partition);
CREATE
TABLE
david
=
#
create
table
tbl_partition_201302 (
check
( join_date
>=
DATE
'
2013-02-01
'
AND
join_date
<
DATE
'
2013-03-01
'
)
) INHERITS (tbl_partition);
CREATE
TABLE
david
=
#
create
table
tbl_partition_201303 (
check
( join_date
>=
DATE
'
2013-03-01
'
AND
join_date
<
DATE
'
2013-04-01
'
)
) INHERITS (tbl_partition);
CREATE
TABLE
david
=
#
create
table
tbl_partition_201304 (
check
( join_date
>=
DATE
'
2013-04-01
'
AND
join_date
<
DATE
'
2013-05-01
'
)
) INHERITS (tbl_partition);
CREATE
TABLE
david
=
#
create
table
tbl_partition_201305 (
check
( join_date
>=
DATE
'
2013-05-01
'
AND
join_date
<
DATE
'
2013-06-01
'
)
) INHERITS (tbl_partition);
CREATE
TABLE
david
=
#
3.3 分區鍵上建索引
david
=
#
create
index
tbl_partition_201211_joindate
on
tbl_partition_201211 (join_date);
CREATE
INDEX
david
=
#
create
index
tbl_partition_201212_joindate
on
tbl_partition_201212 (join_date);
CREATE
INDEX
david
=
#
create
index
tbl_partition_201301_joindate
on
tbl_partition_201301 (join_date);
CREATE
INDEX
david
=
#
create
index
tbl_partition_201302_joindate
on
tbl_partition_201302 (join_date);
CREATE
INDEX
david
=
#
create
index
tbl_partition_201303_joindate
on
tbl_partition_201303 (join_date);
CREATE
INDEX
david
=
#
create
index
tbl_partition_201304_joindate
on
tbl_partition_201304 (join_date);
CREATE
INDEX
david
=
#
create
index
tbl_partition_201305_joindate
on
tbl_partition_201305 (join_date);
CREATE
INDEX
david
=
#
對于開發人員來說,希望數據庫是透明的,只管 insert into tbl_partition。對于數據插向哪個分區,則希望由DB決定。這點,ORACLE實現了,但是PG不行,需要前期人工處理下。
3.4 創建觸發器函數
david
=
#
CREATE
OR
REPLACE
FUNCTION
tbl_partition_insert_trigger()
RETURNS
TRIGGER
AS
$$
BEGIN
IF
( NEW.join_date
>=
DATE
'
2012-11-01
'
AND
NEW.join_date
<
DATE
'
2012-12-01
'
)
THEN
INSERT
INTO
tbl_partition_201211
VALUES
(NEW.
*
);
ELSIF ( NEW.join_date
>=
DATE
'
2012-12-01
'
AND
NEW.join_date
<
DATE
'
2013-01-01
'
)
THEN
INSERT
INTO
tbl_partition_201212
VALUES
(NEW.
*
);
ELSIF ( NEW.join_date
>=
DATE
'
2013-01-01
'
AND
NEW.join_date
<
DATE
'
2013-02-01
'
)
THEN
INSERT
INTO
tbl_partition_201301
VALUES
(NEW.
*
);
ELSIF ( NEW.join_date
>=
DATE
'
2013-02-01
'
AND
NEW.join_date
<
DATE
'
2013-03-01
'
)
THEN
INSERT
INTO
tbl_partition_201302
VALUES
(NEW.
*
);
ELSIF ( NEW.join_date
>=
DATE
'
2013-03-01
'
AND
NEW.join_date
<
DATE
'
2013-04-01
'
)
THEN
INSERT
INTO
tbl_partition_201303
VALUES
(NEW.
*
);
ELSIF ( NEW.join_date
>=
DATE
'
2013-04-01
'
AND
NEW.join_date
<
DATE
'
2013-05-01
'
)
THEN
INSERT
INTO
tbl_partition_201304
VALUES
(NEW.
*
);
ELSIF ( NEW.join_date
>=
DATE
'
2013-05-01
'
AND
NEW.join_date
<
DATE
'
2013-06-01
'
)
THEN
INSERT
INTO
tbl_partition_201305
VALUES
(NEW.
*
);
ELSE
RAISE EXCEPTION
'
Date out of range. Fix the tbl_partition_insert_trigger() function!
'
;
END
IF
;
RETURN
NULL
;
END
;
$$
LANGUAGE plpgsql;
CREATE
FUNCTION
david
=
#
說明: 如果不想丟失數據,上面的ELSE 條件可以改成?INSERT INTO tbl_partition_error_join_date VALUES (NEW.*); 同時需要創建一張結構和tbl_partition 一樣的表tbl_partition_error_join_date,這樣,錯誤的join_date 數據就可以插入到這張表中而不是報錯了。
3.5 創建觸發器
david
=
#
CREATE
TRIGGER
insert_tbl_partition_trigger
david
-
# BEFORE
INSERT
ON
tbl_partition
david
-
#
FOR
EACH ROW
EXECUTE
PROCEDURE
tbl_partition_insert_trigger();
CREATE
TRIGGER
david
=
#
四、查看表
4.1 查看所有表
david
=
# \dt
List
of
relations
Schema
|
Name
|
Type
|
Owner
--
------+----------------------+-------+----------
public
|
tbl_partition
|
table
|
postgres
public
|
tbl_partition_201211
|
table
|
postgres
public
|
tbl_partition_201212
|
table
|
postgres
public
|
tbl_partition_201301
|
table
|
postgres
public
|
tbl_partition_201302
|
table
|
postgres
public
|
tbl_partition_201303
|
table
|
postgres
public
|
tbl_partition_201304
|
table
|
postgres
public
|
tbl_partition_201305
|
table
|
postgres
(
8
rows)
david
=
#
4.2 查看主表
david
=
# \d tbl_partition
Table
"
public
.tbl_partition"
Column
|
Type
|
Modifiers
--
---------+-----------------------+-----------
id
|
integer
|
name
|
character
varying
(
20
)
|
gender
|
boolean
|
join_date
|
date
|
dept
|
character
(
4
)
|
Triggers:
insert_tbl_partition_trigger BEFORE
INSERT
ON
tbl_partition
FOR
EACH ROW
EXECUTE
PROCEDURE
tbl_partition_insert_trigger()
Number
of
child tables:
7
(
Use
\d
+
to
list them.)
david
=
#
4.3 查看分區表
david
=
# \d tbl_partition_201304
Table
"
public
.tbl_partition_201304"
Column
|
Type
|
Modifiers
--
---------+-----------------------+-----------
id
|
integer
|
name
|
character
varying
(
20
)
|
gender
|
boolean
|
join_date
|
date
|
dept
|
character
(
4
)
|
Indexes:
"tbl_partition_201304_joindate" btree (join_date)
Check
constraints:
"tbl_partition_201304_join_date_check"
CHECK
(join_date
>=
'
2013-04-01
'
::date
AND
join_date
<
'
2013-05-01
'
::date)
Inherits: tbl_partition
david
=
#
五、測試
5.1 插入數據
david
=
#
insert
into
tbl_partition
values
(
1
,
'
David
'
,
'
1
'
,
'
2013-01-10
'
,
'
TS
'
);
INSERT
0
0
david
=
#
insert
into
tbl_partition
values
(
2
,
'
Sandy
'
,
'
0
'
,
'
2013-02-10
'
,
'
TS
'
);
INSERT
0
0
david
=
#
insert
into
tbl_partition
values
(
3
,
'
Eagle
'
,
'
1
'
,
'
2012-11-01
'
,
'
TS
'
);
INSERT
0
0
david
=
#
insert
into
tbl_partition
values
(
4
,
'
Miles
'
,
'
1
'
,
'
2012-12-15
'
,
'
SD
'
);
INSERT
0
0
david
=
#
insert
into
tbl_partition
values
(
5
,
'
Simon
'
,
'
1
'
,
'
2012-12-10
'
,
'
SD
'
);
INSERT
0
0
david
=
#
insert
into
tbl_partition
values
(
6
,
'
Rock
'
,
'
1
'
,
'
2012-11-10
'
,
'
SD
'
);
INSERT
0
0
david
=
#
insert
into
tbl_partition
values
(
7
,
'
Peter
'
,
'
1
'
,
'
2013-01-11
'
,
'
SD
'
);
INSERT
0
0
david
=
#
insert
into
tbl_partition
values
(
8
,
'
Sally
'
,
'
0
'
,
'
2013-03-10
'
,
'
BCSC
'
);
INSERT
0
0
david
=
#
insert
into
tbl_partition
values
(
9
,
'
Carrie
'
,
'
0
'
,
'
2013-04-02
'
,
'
BCSC
'
);
INSERT
0
0
david
=
#
insert
into
tbl_partition
values
(
10
,
'
Lee
'
,
'
1
'
,
'
2013-01-05
'
,
'
BMC
'
);
INSERT
0
0
david
=
#
insert
into
tbl_partition
values
(
11
,
'
Nicole
'
,
'
0
'
,
'
2012-11-10
'
,
'
PROJ
'
);
INSERT
0
0
david
=
#
insert
into
tbl_partition
values
(
12
,
'
Renee
'
,
'
0
'
,
'
2013-01-10
'
,
'
TS
'
);
INSERT
0
0
david
=
#
5.2 查看主表數據
david
=
#
select
*
from
tbl_partition;
id
|
name
|
gender
|
join_date
|
dept
--
--+--------+--------+------------+------
3
|
Eagle
|
t
|
2012
-
11
-
01
|
TS
6
|
Rock
|
t
|
2012
-
11
-
10
|
SD
11
|
Nicole
|
f
|
2012
-
11
-
10
|
PROJ
4
|
Miles
|
t
|
2012
-
12
-
15
|
SD
5
|
Simon
|
t
|
2012
-
12
-
10
|
SD
1
|
David
|
t
|
2013
-
01
-
10
|
TS
7
|
Peter
|
t
|
2013
-
01
-
11
|
SD
10
|
Lee
|
t
|
2013
-
01
-
05
|
BMC
12
|
Renee
|
f
|
2013
-
01
-
10
|
TS
2
|
Sandy
|
f
|
2013
-
02
-
10
|
TS
8
|
Sally
|
f
|
2013
-
03
-
10
|
BCSC
9
|
Carrie
|
f
|
2013
-
04
-
02
|
BCSC
(
12
rows)
david
=
#
5.3 查看分區表數據
david
=
#
select
*
from
tbl_partition_201301 ;
id
|
name
|
gender
|
join_date
|
dept
--
--+-------+--------+------------+------
1
|
David
|
t
|
2013
-
01
-
10
|
TS
7
|
Peter
|
t
|
2013
-
01
-
11
|
SD
10
|
Lee
|
t
|
2013
-
01
-
05
|
BMC
12
|
Renee
|
f
|
2013
-
01
-
10
|
TS
(
4
rows)
david
=
#
六、管理分區
6.1 移除數據/分區
實現分區表之后,我們就可以很容易地移除不再使用的舊數據了,最簡單的方法就是:
david
=
#
drop
table
tbl_partition_201304;
這樣可以快速移除大量數據,而不是逐條刪除數據。
另一個推薦做法是將分區從分區表中移除,但是保留訪問權限。
david
=
#
alter
table
tbl_partition_201304 no inherit tbl_partition;
ALTER
TABLE
david
=
#
和直接DROP 相比,該方式僅僅是使子表脫離了原有的主表,而存儲在子表中的數據仍然可以得到訪問,因為此時該表已經被還原成一個普通的數據表了。這樣對于數據庫的DBA來說,就可以在此時對該表進行必要的維護操作,如數據清理、歸檔等,在完成諸多例行性的操作之后,就可以考慮是直接刪除該表(DROP TABLE),還是先清空該表的數據(TRUNCATE TABLE),之后再讓該表重新繼承主表。
david
=
#
alter
table
tbl_partition_201304 inherit tbl_partition;
ALTER
TABLE
david
=
#
6.2 增加分區
我們可以像之前那樣增加一個分區
david
=
#
create
table
tbl_partition_201306 (
check
( join_date
>=
DATE
'
2013-06-01
'
AND
join_date
<
DATE
'
2013-07-01
'
)
) INHERITS (tbl_partition);
david
=
#
create
index
tbl_partition_201306_joindate
on
tbl_partition_201306 (join_date);
同時,需要修改觸發器函數,將插入條件改成相應的值。
說明: 創建觸發器函數時,最好把插入條件寫更未來一點,比如多寫十年,這樣以后增加新分區時就不需要重新創建觸發器函數了,也可以避免一些不必要的錯誤。
另外,還可以如下增加新的分區:
david
=
#
create
table
tbl_partition_201307
david
-
# (
LIKE
tbl_partition INCLUDING DEFAULTS INCLUDING CONSTRAINTS);
CREATE
TABLE
david
=
#
david
=
#
alter
table
tbl_partition_201307
add
constraint
tbl_partition_201307_join_date_check
david
-
#
check
( join_date
>=
DATE
'
2013-07-01
'
AND
join_date
<
DATE
'
2013-08-01
'
);
ALTER
TABLE
david
=
#
david
=
#
create
index
tbl_partition_201307_joindate
on
tbl_partition_201307 (join_date);
david=# copy tbl_partition_201307 from '/tmp/tbl_partition_201307.sql'; //從文件中拷貝數據,這些數據可以是事前準備的
david
=
#
alter
table
tbl_partition_201307 inherit tbl_partition;
七、約束排除
約束排除(Constraint exclusion)是一種查詢優化技巧,它改進了用上面方法定義的表分區的性能。
確保postgresql.conf 里的配置參數constraint_exclusion 是打開的。沒有這個參數,查詢不會按照需要進行優化。這里我們需要做的是確保該選項在配置文件中沒有被注釋掉。
如果沒有約束排除,查詢會掃描tbl_partition 表中的每一個分區。打開了約束排除之后,規劃器將檢查每個分區的約束然后再試圖證明該分區不需要被掃描,因為它不能包含任何符合WHERE子句條件的數據行。如果規劃器可以證明這個,它就把該分區從查詢規劃里排除出去。
可以使用EXPLAIN 命令顯示一個規劃在constraint_exclusion 關閉和打開情況下的不同:
7.1 約束排除關閉
david
=
#
set
constraint_exclusion
=
off
;
SET
david
=
# explain
select
count
(
*
)
from
tbl_partition
where
join_date
>=
DATE
'
2013-04-01
'
;
QUERY
PLAN
--
-----------------------------------------------------------------------------------------------
Aggregate (cost
=
172.80
..
172.81
rows
=
1
width
=
0
)
->
Append (cost
=
0.00
..
167.62
rows
=
2071
width
=
0
)
->
Seq Scan
on
tbl_partition (cost
=
0.00
..
0.00
rows
=
1
width
=
0
)
Filter: (join_date
>=
'
2013-04-01
'
::date)
->
Seq Scan
on
tbl_partition_201211 tbl_partition (cost
=
0.00
..
18.62
rows
=
230
width
=
0
)
Filter: (join_date
>=
'
2013-04-01
'
::date)
->
Seq Scan
on
tbl_partition_201212 tbl_partition (cost
=
0.00
..
18.62
rows
=
230
width
=
0
)
Filter: (join_date
>=
'
2013-04-01
'
::date)
->
Seq Scan
on
tbl_partition_201301 tbl_partition (cost
=
0.00
..
18.62
rows
=
230
width
=
0
)
Filter: (join_date
>=
'
2013-04-01
'
::date)
->
Seq Scan
on
tbl_partition_201302 tbl_partition (cost
=
0.00
..
18.62
rows
=
230
width
=
0
)
Filter: (join_date
>=
'
2013-04-01
'
::date)
->
Seq Scan
on
tbl_partition_201303 tbl_partition (cost
=
0.00
..
18.62
rows
=
230
width
=
0
)
Filter: (join_date
>=
'
2013-04-01
'
::date)
->
Seq Scan
on
tbl_partition_201305 tbl_partition (cost
=
0.00
..
18.62
rows
=
230
width
=
0
)
Filter: (join_date
>=
'
2013-04-01
'
::date)
->
Seq Scan
on
tbl_partition_201304 tbl_partition (cost
=
0.00
..
18.62
rows
=
230
width
=
0
)
Filter: (join_date
>=
'
2013-04-01
'
::date)
->
Seq Scan
on
tbl_partition_201306 tbl_partition (cost
=
0.00
..
18.62
rows
=
230
width
=
0
)
Filter: (join_date
>=
'
2013-04-01
'
::date)
->
Seq Scan
on
tbl_partition_201307 tbl_partition (cost
=
0.00
..
18.62
rows
=
230
width
=
0
)
Filter: (join_date
>=
'
2013-04-01
'
::date)
(
22
rows)
david
=
#
從上面的查詢計劃中可以看出,PostgreSQL 掃描了所有分區。下面我們再看一下打開約束排除之后的查詢計劃: ?
7.2 約束排除開啟
david
=
#
set
constraint_exclusion
=
on
;
SET
david
=
# explain
select
count
(
*
)
from
tbl_partition
where
join_date
>=
DATE
'
2013-04-01
'
;
QUERY
PLAN
--
-----------------------------------------------------------------------------------------------
Aggregate (cost
=
76.80
..
76.81
rows
=
1
width
=
0
)
->
Append (cost
=
0.00
..
74.50
rows
=
921
width
=
0
)
->
Seq Scan
on
tbl_partition (cost
=
0.00
..
0.00
rows
=
1
width
=
0
)
Filter: (join_date
>=
'
2013-04-01
'
::date)
->
Seq Scan
on
tbl_partition_201305 tbl_partition (cost
=
0.00
..
18.62
rows
=
230
width
=
0
)
Filter: (join_date
>=
'
2013-04-01
'
::date)
->
Seq Scan
on
tbl_partition_201304 tbl_partition (cost
=
0.00
..
18.62
rows
=
230
width
=
0
)
Filter: (join_date
>=
'
2013-04-01
'
::date)
->
Seq Scan
on
tbl_partition_201306 tbl_partition (cost
=
0.00
..
18.62
rows
=
230
width
=
0
)
Filter: (join_date
>=
'
2013-04-01
'
::date)
->
Seq Scan
on
tbl_partition_201307 tbl_partition (cost
=
0.00
..
18.62
rows
=
230
width
=
0
)
Filter: (join_date
>=
'
2013-04-01
'
::date)
(
12
rows)
david
=
#
可以看到,PostgreSQL 只掃描四月份以后的分區表。
八、可選的分區方式
還可以通過建立規則的方式進行分區。
CREATE
RULE
insert_tbl_partition_201211
AS
ON
INSERT
TO
tbl_partition
WHERE
( join_date
>=
DATE
'
2012-11-01
'
AND
join_date
<
DATE
'
2012-12-01
'
)
DO INSTEAD
INSERT
INTO
tbl_partition_201211
VALUES
(NEW.
*
);
CREATE
RULE
insert_tbl_partition_201212
AS
ON
INSERT
TO
tbl_partition
WHERE
( join_date
>=
DATE
'
2012-12-01
'
AND
join_date
<
DATE
'
2013-01-01
'
)
DO INSTEAD
INSERT
INTO
tbl_partition_201212
VALUES
(NEW.
*
);
...
CREATE
RULE
insert_tbl_partition_201306
AS
ON
INSERT
TO
tbl_partition
WHERE
( join_date
>=
DATE
'
2013-06-01
'
AND
join_date
<
DATE
'
2013-07-01
'
)
DO INSTEAD
INSERT
INTO
tbl_partition_201306
VALUES
(NEW.
*
);
CREATE
RULE
insert_tbl_partition_201307
AS
ON
INSERT
TO
tbl_partition
WHERE
( join_date
>=
DATE
'
2013-07-01
'
AND
join_date
<
DATE
'
2013-08-01
'
)
DO INSTEAD
INSERT
INTO
tbl_partition_201307
VALUES
(NEW.
*
);
CREATE
RULE
insert_tbl_partition_error_join_date
AS
ON
INSERT
TO
tbl_partition
WHERE
( join_date
>=
DATE
'
2013-08-01
'
OR
join_date
<
DATE
'
2012-11-01
'
)
DO INSTEAD
INSERT
INTO
tbl_partition_error_join_date
VALUES
(NEW.
*
);
九、注意事項
VACUUM?或 ANALYZE tbl_partition?只會對主表起作用,要想分析表,需要分別分析每個分區表。
十、參考資料
- PostgreSQL官方說明: http://www.postgresql.org/docs/9.2/static/ddl-partitioning.html
- ITEYE:http://diegoball.iteye.com/blog/713826
- kenyon(君羊):http://my.oschina.net/Kenyon/blog/59455
更多文章、技術交流、商務合作、聯系博主
微信掃碼或搜索:z360901061
微信掃一掃加我為好友
QQ號聯系: 360901061
您的支持是博主寫作最大的動力,如果您喜歡我的文章,感覺我的文章對您有幫助,請用微信掃描下面二維碼支持博主2元、5元、10元、20元等您想捐的金額吧,狠狠點擊下面給點支持吧,站長非常感激您!手機微信長按不能支付解決辦法:請將微信支付二維碼保存到相冊,切換到微信,然后點擊微信右上角掃一掃功能,選擇支付二維碼完成支付。
【本文對您有幫助就好】元

