All you need to know about partitioning (Part 1)

Part 2

Good evening/afternoon/morning dear habradi! Continue to develop and complement a blog about my favorite open source rdbms Postgresql. Miraculously, it so happened that the theme of today's topic still here never rose. I must say that partitioning in postgresql is very good described in the documentation, but does it stop me?).

Introduction


In General, under partitioning in the General case do not understand some technology, but rather an approach to database design, which appeared long before DBMSs have started to support the so-called partitioned tables. The idea is very simple — divide the table into several smaller parts. There are two subspecies — horizontal and vertical partitioning.

Horizontal partitioning

Part of the tables contain different rows. Put we have a log table of some abstract application LOGS. We can break it down into parts — one for logs in January 2009, the other in February, 2009, etc.

Vertical partitioning

Part of the tables contain different columns. The use of vertical partitioning (when it is really justified) is somewhat more complicated than for horizontal. As a spherical horse propose to consider this option: table NEWS has columns ID, SHORTTEXT, LONGTEXT, and let the LONGTEXT field uses much less of the first two. In this case, it makes sense to break the NEWS table by column (to create two tables for the SHORTTEXT and LONGTEXT, respectively, related to primary keys + to create a NEWS view that contains both columns). Thus, when we only need a description of the news, the DBMS does not have to read from disk, and the entire text of the news.

Support partitioning in modern DBMS

Most modern DBMS support table partitioning in one form or another.
the

    Oracle — supports partitioning starting from 8th version. Work with sections on the one hand very simple (all you can not to think about them, work with a regular table*), and with another — all very flexible. Sections can be divided into "subpartitions", delete, share, move. Supported different variants of the index of a partitioned table (global index, partitioned index). a Reference to the volumetric description.

    Microsoft SQL Server — support partition appeared recently (in 2005). First impression of use — "Well, finally!!:)", the second "Works, everything seems OK". msdn Documentation

    MySQL is supported since version 5.1. a Very good description on habré
    the

  • And more...

*that's a lie, of course, there is a standard set of challenges — time to create a new partition, the old throw, etc., but still somehow everything is simple and clear.

Partitioning in Postgresql


Table partitioning in postgresql differs slightly in implementation from the other databases. The basis for the partitioning is the inheritance tables (thing unique to postgresql). That is, we should have a main table (master table), and its sections will be table-heirs. We will consider partitioning the problem close to reality.

problem Statement

The database is used to collect and analyze data about the site visitors. The data volumes are large enough to think about partitioning. In the analysis in most cases data for the last day.
1. Create a basic table:
CREATE TABLE analytics.events
(
event_id BIGINT DEFAULT nextval('analytics.seq_events') PRIMARY KEY
user_id UUID NOT NULL
event_time TIMESTAMP DEFAULT now() NOT NULL
url VARCHAR(1024) NOT NULL
referrer VARCHAR(1024)
ip INET NOT NULL
);


* This source code was highlighted with Source Code Highlighter.

2. Partition will be in the days of the event_time field. Every day we create a new section. Call section will be according to the rule: analytics.events_DDMMYYYY. Here's an example section for the 1st of January 2010.
CREATE TABLE analytics.events_01012010
(
event_id BIGINT DEFAULT nextval('analytics.seq_events') PRIMARY KEY
CHECK ( event_time >= TIMESTAMP '2010-01-01 00:00:00' AND event_time < TIMESTAMP '2010-01-02 00:00:00' )
) INHERITS (analytics.events);


* This source code was highlighted with Source Code Highlighter.

When you create a section explicitly set the field event_id (PRIMARY KEY is not inherited) and create a CHECK CONSTRAINT on the field event_time, so as not to insert superfluous.

3. Create an index on the event_time field. If you split the tables into sections, we assume that most queries to the table events will use the condition on the field event_time, so that the index on this field would help us a lot.
CREATE INDEX events_01012010_event_time_idx ON analytics.events_01012010 USING btree(event_time);

* This source code was highlighted with Source Code Highlighter.

4. We want to ensure that when you insert into the main table, the data has been found in their intended sections. To do this, do the following trick — create a trigger that will control the data flow.
CREATE OR REPLACE FUNCTION analytics.events_insert_trigger()
RETURNS TRIGGER AS $$
BEGIN
IF (NEW.event_time >= TIMESTAMP '2010-01-01 00:00:00' AND
NEW.event_time < TIMESTAMP '2010-01-02 00:00:00') THEN
INSERT INTO analytics.events_01012010 VALUES (NEW.*);
ELSE
RAISE EXCEPTION 'Date % is out of range. Fix analytics.events_insert_trigger' NEW.event_time;
END IF;
RETURN NULL;
END;
$$
LANGUAGE plpgsql;


* This source code was highlighted with Source Code Highlighter.

CREATE TRIGGER events_before_insert
BEFORE INSERT ON analytics.events
FOR EACH ROW EXECUTE PROCEDURE analytics.events_insert_trigger();


* This source code was highlighted with Source Code Highlighter.


5. Everything is ready, we now have a partitioned table analytics.events. Can start fiercely to analyze its data. By the way, CHECK constraints, we were not only created in order to protect the section from incorrect data. Postgresql can use them when compiling the query plan (though living index on event_time win it will give the minimum), you can use the Directive constraint_exclusion
SET constraint_exclusion = on;
SELECT * FROM analytics.events WHERE event_time > CURRENT_DATE;


* This source code was highlighted with Source Code Highlighter.


End of part

1. A table of events, broken into sections, the analysis of available data in recent days has become easier and faster.
2. The horror of knowing that this was necessary as-that to support, create time partition, not forgetting to change the trigger accordingly.

How easy and carefree to work with partitioned tables will tell in the second part.

UPD1: Replaced States for dividing the partitioning
UPD2:
In explanation of remarks from readers, which, unfortunately, account on habré:
With inheritance there are several important issues to consider when designing. Partitions do not inherit primary key and foreign keys in their columns. That is, when you create a partition, you must explicitly create a PRIMARY KEY and FOREIGN KEYs on the columns section. From myself I will notice that to create the FOREIGN KEY columns of a partitioned table is not the best way. In most cases, the partitioned table is the "fact table" and she refers to "dimension" tables.
Article based on information from habrahabr.ru

Комментарии

Популярные сообщения из этого блога

March Habrameeting in Kiev

PostgreSQL load testing using JMeter, Yandex.Tank and Overload

Monitoring PostgreSQL with Zabbix