Performance of PostgreSQL-based applications: explicit and implicit latency
If you are trying to optimize the performance of your PostgreSQL-based application, you probably already use the basic tools: EXPLAIN (BUFFERS, ANALYZE), pg_stat_statements, auto_explain, log_min_duration_statement, and so on.
Maybe you are also looking at lock conflicts with log_lock_waits, monitoring your checkpoint behaviour, and so on.
But have you ever thought about network latency? Gamers know all about it, but does it matter for your application server?

The latency effect
Typical client/server round-trip latency ranges from 0.01 ms (localhost) to about 0.5 ms on a switched network, around 5 ms over Wi-Fi, 20 ms over ADSL, 300 ms on intercontinental routes, and even more for things like satellite and WWAN links.
A trivial SELECT can take on the order of 0.1 ms to execute on the server side; a trivial INSERT, about 0.5 ms.
Every time your application executes a query, it has to wait for the server to respond with success or failure and, possibly, a result set, query metadata, and so on. That costs at least one network round trip of latency.
When the workload consists mostly of small, simple queries, network latency can make up a significant share of their execution time if the database and the application sit on different servers.
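A quick way to get a feel for this on your own setup is to time a run of trivial queries; almost all of the per-query cost is the round trip rather than server-side work. A minimal sketch with psycopg2 (the connection string is illustrative, not from the article):

import time
import psycopg2

conn = psycopg2.connect("host=dbhost dbname=test user=test")
cur = conn.cursor()

n = 1000
start = time.perf_counter()
for _ in range(n):
    cur.execute("SELECT 1")  # trivial query: its cost is dominated by the round trip
    cur.fetchone()
elapsed = time.perf_counter() - start
print(f"{n} trivial SELECTs: {elapsed:.3f} s total, {1000 * elapsed / n:.3f} ms per query")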
Most applications, especially those built on ORMs, tend to execute a large number of fairly simple queries. For example, if your Hibernate application fetches an entity with a lazily fetched @OneToMany relationship to 1000 child objects, it will probably execute 1001 queries thanks to the n+1 selects problem, if not more. That means it spends a thousand times your network round-trip latency just waiting. You can use left join fetch to avoid this... but then you transfer the parent 1000 times in the JOIN and have to deduplicate it.
Similarly, if you populate the database through the ORM, you are probably performing hundreds of thousands of individual INSERTs... and after every single one of them waiting for the server to acknowledge that everything is fine.
It is easy to focus on a query's execution time and try to optimize that, but there is only so much you can do with a simple INSERT INTO ... VALUES .... Drop some indexes and constraints, make sure it runs inside a transaction, and you are nearly done.
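To illustrate the "make sure it runs inside a transaction" part, here is a minimal psycopg2 sketch (table, columns and connection string are made up): every INSERT is still a separate round trip, but there is only one commit at the end instead of one per row.

import psycopg2

conn = psycopg2.connect("host=dbhost dbname=test user=test")
conn.autocommit = False  # keep everything in one explicit transaction
cur = conn.cursor()

rows = [(i, f"name-{i}") for i in range(10000)]
for r in rows:
    # Each execute() still costs a round trip, but there is only one
    # commit (and one WAL flush) at the end instead of one per row.
    cur.execute("INSERT INTO items (id, name) VALUES (%s, %s)", r)
conn.commit()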
But what about getting rid of all that time spent waiting on the network? Even on a LAN it starts to add up once you run thousands of queries.
COPY
One way to avoid the delays is to use COPY. To use PostgreSQL's COPY support, your application or driver has to produce a CSV-like set of rows and stream them to the server in a continuous sequence. Alternatively, the server can be asked to send your application a CSV-like stream.
Either way, the application cannot interleave COPY with other queries, and the copied data has to be loaded directly into a destination table. A common approach is to COPY into a temporary table and then use INSERT INTO ... SELECT ..., UPDATE ... FROM ..., DELETE FROM ... USING ..., and so on, to apply the copied data to the main tables in a single operation.
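Here is a minimal psycopg2 sketch of that temp-table approach (table, columns and connection string are made up):

import io
import psycopg2

conn = psycopg2.connect("host=dbhost dbname=test user=test")
cur = conn.cursor()

# Stage the rows into a temporary table with a single streamed COPY...
cur.execute("CREATE TEMP TABLE items_stage (id int, name text) ON COMMIT DROP")
csv_data = io.StringIO("".join(f"{i},name-{i}\n" for i in range(10000)))
cur.copy_expert("COPY items_stage (id, name) FROM STDIN WITH (FORMAT csv)", csv_data)

# ...then apply them to the real table in one statement.
cur.execute("INSERT INTO items (id, name) SELECT id, name FROM items_stage")
conn.commit()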
This works well if you write your own SQL directly, but frameworks and ORMs do not support it, and it is not a drop-in replacement for plain INSERTs: your application, framework or driver has to deal with the special representation COPY needs, obtain the necessary metadata, and so on.
(Well-known drivers that do support COPY include libpq, PgJDBC, psycopg2 and the Pg gem... but the frameworks and ORMs built on top of them do not necessarily support COPY too.)
Batching in PgJDBC
PostgreSQL's JDBC driver already has a solution for this problem. It relies on the JDBC batching API: the driver sends a batch of queries to the server and then waits for a single confirmation that the whole set succeeded.
At least in theory. In practice, some implementation issues limit it to a few hundred queries per batch at best, and the driver can only batch queries that return result sets if it can determine in advance how big those results will be. Despite these limitations, Statement.executeBatch() can offer a very large performance gain to applications that do things like bulk-loading data into a remote database instance.
Since this is a standard API, it can be used by applications that run on multiple DBMSs. Hibernate, for example, can use JDBC batching, although it does not do so by default (it has to be enabled explicitly, e.g. via the hibernate.jdbc.batch_size setting).
libpq and batching
Most (all?) other PostgreSQL drivers do not support batching. PgJDBC implements the PostgreSQL protocol entirely on its own, while most other drivers are built on the libpq C library that ships as part of PostgreSQL.
libpq has an asynchronous, non-blocking API, but the client can still only have one query in flight at a time: it has to wait for that query's results before sending the next one.
The PostgreSQL server supports batching just fine, and PgJDBC already uses it. So I wrote batch support for libpq and proposed it for the next version of PostgreSQL. Since it changes only the client, if it is accepted it will speed things up even when connecting to older servers.
I would be very interested in feedback from authors and advanced users of libpq-based drivers and from developers of libpq-based applications. The patch applies on top of PostgreSQL 9.6 beta 1 if you want to try it; detailed documentation and comprehensive example programs are included.
Performance
I think a database server hosted on something like RDS or Heroku Postgres is a good example of where this functionality could be useful. In particular, accessing such a server from outside its own network shows very clearly how much latency can hurt.
At ~320 ms network latency:
- 500 inserts, unbatched: 167.0 seconds
- 500 inserts, batched: 1.2 seconds
... which is about 120 times faster. Almost all of the difference is time spent waiting on the network: 500 round trips at ~0.32 s each already add up to roughly 160 seconds.
Normally you would not have an intercontinental link between your application server and your database server; it is used here only to make the latency effect obvious. Even over a Unix socket on localhost I saw a performance increase of about 50% for 10,000 inserts.
Batching in an existing application
Unfortunately, batching cannot simply be switched on automatically for existing applications. They have to use a slightly different interface, in which they send a series of queries and only afterwards ask for the results.
Adapting applications that already use the asynchronous libpq interface should be fairly easy, especially if they use non-blocking mode and a select()/poll()/epoll()/WaitForMultipleObjectsEx loop. Applications that use the synchronous libpq interface will require more changes.
Batching in other drivers
Similarly, drivers and ORM frameworks will usually need interface and internal changes to allow batching. If they are built around an event loop and non-blocking I/O, they should be fairly easy to change.
I would love Python, Ruby and other users to be able to use this functionality, so I am curious to see who is interested. Imagine being able to do this:
import psycopg2
conn = psycopg2.connect(...)
cur = conn.cursor()
# this is just an idea, this code does not work with psycopg2:
futures = [ cur.async_execute(sql) for sql in my_queries ]
for future in futures:
    result = future.result  # waits if the result is not ready yet
    ... process the result ...
conn.commit()
Asynchronous batching should not be difficult from the user's point of view.
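As an aside, Python users do have a rough client-side stopgap: psycopg2 2.7 and later ship psycopg2.extras.execute_batch, which packs many statements into far fewer round trips, although it is not true protocol-level batching like the libpq patch described above. A minimal sketch (table, columns and connection string are made up):

import psycopg2
import psycopg2.extras  # execute_batch is available from psycopg2 2.7 onwards

conn = psycopg2.connect("host=dbhost dbname=test user=test")
cur = conn.cursor()

rows = [(i, f"name-{i}") for i in range(10000)]
# Groups the statements into pages (100 by default) and sends each page in a
# single call, cutting the number of client/server round trips dramatically.
psycopg2.extras.execute_batch(
    cur, "INSERT INTO items (id, name) VALUES (%s, %s)", rows, page_size=100
)
conn.commit()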
COPY is the fastest
Where it is practical, clients should still prefer COPY. Here are some results from my laptop:
inserting 1000000 rows batched, unbatched and with COPY
batch insert elapsed: 23.715315 s
sequential insert elapsed: 36.150162 s
COPY elapsed: 1.743593 s
Done.
Batching gives a surprisingly large performance boost even over a local Unix socket... but COPY leaves both individual-INSERT approaches far behind in the dust.
Use COPY.
Image
The image for this article shows the Goldfields Water Supply Scheme, a pipeline running from Mundaring Weir near Perth in Western Australia to the inland (desert) goldfields. It suits this article because, during its construction and under a barrage of criticism, its creator and prime mover, C. Y. O'Connor, committed suicide some twelve months before the pipeline was put into operation. Locals often (wrongly) say that he died after the pipeline was opened but the water failed to flow: it took so long to arrive that everyone assumed the project had failed. Weeks later, though, the water came through.