This shows you the differences between two versions of the page.
|
projects:pgfincore [2010/03/12 00:28] Cédric Villemain update pgfadv_WILLNEED |
projects:pgfincore [2010/05/23 07:15] (current) Cédric Villemain |
||
|---|---|---|---|
| Line 1: | Line 1: | ||
| - | ====== PgFincore ====== | ||
| - | |||
| A talk about PgFincore is scheduled at [[http://www.pgcon.org/2010/|PgCon10]]. | A talk about PgFincore is scheduled at [[http://www.pgcon.org/2010/|PgCon10]]. | ||
| + | ====== PgFincore ====== | ||
| PgFincore is a set of functions to manage blocks in memory. | PgFincore is a set of functions to manage blocks in memory. | ||
| Line 8: | Line 7: | ||
| Each Table or Index is truncated in segment of (usually) 1GB, and each segment is truncated in block in filesystem. | Each Table or Index is truncated in segment of (usually) 1GB, and each segment is truncated in block in filesystem. | ||
| - | Those functions let you know which and how many disk block from a relation are in the buffer cache of the operating system, and eventually write the result to a file. Then using this file, it is possible to restore the buffer cache state for each block of the relation. | + | Those functions let you know which and how many disk block from a relation are in the page cache of the operating system, and eventually write the result to a file. Then using this file, it is possible to restore the page cache state for each block of the relation. |
| - | Other functions are used to set a //POSIX_FADVISE// flag on the entire relation (each segment). The more usefull are probably //WILLNEED// and //DONTNEED// which push and pop blocks of each segments of a relation from buffer cache, respectively. | + | Other functions are used to set a //POSIX_FADVISE// flag on the entire relation (each segment). The more usefull are probably //WILLNEED// and //DONTNEED// which push and pop blocks of each segments of a relation from page cache, respectively. |
| Each functions are call with at least a table name or an index name (or oid) as a parameter and | Each functions are call with at least a table name or an index name (or oid) as a parameter and | ||
| Line 17: | Line 16: | ||
| ===== Download ===== | ===== Download ===== | ||
| - | Lastest release is 0.4 : [[http://pgfoundry.org/frs/download.php/2541/pgfincore-v0.4.0.tar.gz|direct download ]] | + | Lastest release is 0.4.1 : [[http://pgfoundry.org/frs/download.php/2670/pgfincore-v0.4.1.tar.gz|direct download ]] |
| You can grab the latest code with git : | You can grab the latest code with git : | ||
| Line 31: | Line 30: | ||
| First, make stage : | First, make stage : | ||
| - | export USE_PGXS=1 ; make clean ; make | + | USE_PGXS=1 make clean |
| - | su ; export USE_PGXS=1 ; make install | + | USE_PGXS=1 make |
| + | su | ||
| + | USE_PGXS=1 make install | ||
| Then, create the functions from the sql script (it should be in your contrib directory) : | Then, create the functions from the sql script (it should be in your contrib directory) : | ||
| Line 42: | Line 43: | ||
| Here are some examples of usage. If you want more details go to [[#Documentation]] | Here are some examples of usage. If you want more details go to [[#Documentation]] | ||
| - | ==== Load a table or an index in OS Page Buffer ==== | + | ==== Load a table or an index in OS Page Cache ==== |
| - | You may want to try to keep a table or an index into the OS Page Buffer, or preload a table before your well know big query is executed (reducing the query time). | + | You may want to try to keep a table or an index into the OS Page Cache, or preload a table before your well know big query is executed (reducing the query time). |
| - | To do so, just execute the following query frequently enough (perhaps using crontab): | + | To do so, just execute the following query: |
| <code sql> | <code sql> | ||
| - | select * from pgfadv_willneed('pgbench_accounts'); | + | cedric=# select * from pgfadv_willneed('pgbench_accounts'); |
| - | ... | + | relpath | block_size | block_disk | block_free |
| + | --------------------+------------+------------+------------ | ||
| + | base/16384/24598 | 4096 | 262144 | 111882 | ||
| + | base/16384/24598.1 | 4096 | 55318 | 56764 | ||
| + | (2 rows) | ||
| + | |||
| + | Time: 39309,294 ms | ||
| </code> | </code> | ||
| - | ==== Snapshot and Restore the OS Page Buffer state of a table or an index (or more) ==== | + | //The column "block_size" report that block size of the filesystem is 4KB.// |
| + | //The column "block_disk" is the number of blocks of the specified file.// | ||
| + | //The column "block_free" is the number of free blocks in memory (for caching).// | ||
| - | You may want to restore a table or an index into the OS Page Buffer as it was while you do the snapshot. For example if you have to reboot your server, then when PostgreSQL start up the first queries might be slow because nether PostgreSQL or the OS have cache pages about the relations involved in those first queries. | + | ==== Snapshot and Restore the OS Page Cache state of a table or an index (or more) ==== |
| + | |||
| + | You may want to restore a table or an index into the OS Page Cache as it was while you did the snapshot. For example if you have to reboot your server, then when PostgreSQL start up the first queries might be slow because nether PostgreSQL or the OS have cache pages about the relations involved in those first queries. | ||
| Executing a snapshot and a restore is very simple : | Executing a snapshot and a restore is very simple : | ||
| <code sql> | <code sql> | ||
| -- Snapshot | -- Snapshot | ||
| - | select * from pgmincore_snapshot('pgbench_accounts'); | + | cedric=# select * from pgmincore_snapshot('pgbench_accounts'); |
| - | ... | + | relpath | block_size | block_disk | block_mem | group_mem |
| + | --------------------+------------+------------+-----------+----------- | ||
| + | base/16384/24598 | 4096 | 262144 | 131745 | 1 | ||
| + | base/16384/24598.1 | 4096 | 55318 | 55318 | 1 | ||
| -- Restore | -- Restore | ||
| - | select * from pgfadv_willneed_snapshot('pgbench_accounts'); | + | cedric=# select * from pgfadv_willneed_snapshot('pgbench_accounts'); |
| - | ... | + | relpath | block_size | block_disk | block_free |
| + | --------------------+------------+------------+------------ | ||
| + | base/16384/24598 | 4096 | 262144 | 105335 | ||
| + | base/16384/24598.1 | 4096 | 55318 | 50217 | ||
| + | (2 rows) | ||
| + | |||
| + | Time: 38745,140 ms | ||
| </code> | </code> | ||
| + | |||
| + | //The column "block_mem" report how many blocks of the file are in memory.// | ||
| + | //The column "group_mem" report that all the bloks in memory are contigous (only one group).// | ||
| + | |||
| + | A more complete example is at [[./postgres_stuff|my postgres stuff]]. | ||
| ===== Documentation ===== | ===== Documentation ===== | ||
| Line 83: | Line 109: | ||
| ==== pgsysconf ==== | ==== pgsysconf ==== | ||
| - | This function output size of OS blocks, number of free page in the OS Page Buffer. | + | This function output size of OS blocks, number of free page in the OS Page Cache. |
| === Example === | === Example === | ||
| Line 96: | Line 122: | ||
| ==== pgmincore ==== | ==== pgmincore ==== | ||
| - | This function provide information about the file system cache (buffer cache). | + | This function provide information about the file system cache (page cache). |
| For the specified relation (can be call with tableoid too) it return : | For the specified relation (can be call with tableoid too) it return : | ||
| Line 103: | Line 129: | ||
| * block_size : the size of one block disk | * block_size : the size of one block disk | ||
| * block_disk : the total number of file system blocks of the relation | * block_disk : the total number of file system blocks of the relation | ||
| - | * block_mem : the total number of file system blocks of the relation in buffer cache. (not the shared buffers from PostgreSQL but the OS cache) | + | * block_mem : the total number of file system blocks of the relation in page cache. (not the shared buffers from PostgreSQL but the OS cache) |
| * group_mem : the number of groups of adjacent block_mem | * group_mem : the number of groups of adjacent block_mem | ||
| Line 163: | Line 189: | ||
| * block_size : The size in bytes of the filesystem block. | * block_size : The size in bytes of the filesystem block. | ||
| * block_disk : the total number of file system blocks of the relation | * block_disk : the total number of file system blocks of the relation | ||
| - | * block_free : | + | * block_free : the number of free blocks in memory (for caching). |
| === pgfadv_WILLNEED === | === pgfadv_WILLNEED === | ||
| Line 186: | Line 212: | ||
| <code> | <code> | ||
| + | cedric=# select * from pgfadv_willneed_snapshot('pgbench_accounts'); | ||
| + | relpath | block_size | block_disk | block_free | ||
| + | --------------------+------------+------------+------------ | ||
| + | base/16384/24598 | 4096 | 262144 | 105335 | ||
| + | base/16384/24598.1 | 4096 | 55318 | 50217 | ||
| </code> | </code> | ||
| Line 195: | Line 226: | ||
| <code> | <code> | ||
| + | cedric=# select * from pgfadv_dontneed('pgbench_accounts'); | ||
| + | relpath | block_size | block_disk | block_free | ||
| + | --------------------+------------+------------+------------ | ||
| + | base/16384/24598 | 4096 | 262144 | 178743 | ||
| + | base/16384/24598.1 | 4096 | 55318 | 234078 | ||
| </code> | </code> | ||
| Line 204: | Line 240: | ||
| <code> | <code> | ||
| + | cedric=# select * from pgfadv_NORMAL('pgbench_accounts'); | ||
| + | relpath | block_size | block_disk | block_free | ||
| + | --------------------+------------+------------+------------ | ||
| + | base/16384/24598 | 4096 | 262144 | 233954 | ||
| + | base/16384/24598.1 | 4096 | 55318 | 233954 | ||
| </code> | </code> | ||
| Line 213: | Line 254: | ||
| <code> | <code> | ||
| + | cedric=# select * from pgfadv_RANDOM('pgbench_accounts'); | ||
| + | relpath | block_size | block_disk | block_free | ||
| + | --------------------+------------+------------+------------ | ||
| + | base/16384/24598 | 4096 | 262144 | 234233 | ||
| + | base/16384/24598.1 | 4096 | 55318 | 234233 | ||
| </code> | </code> | ||
| Line 222: | Line 268: | ||
| <code> | <code> | ||
| + | cedric=# select * from pgfadv_SEQUENTIAL('pgbench_accounts'); | ||
| + | relpath | block_size | block_disk | block_free | ||
| + | --------------------+------------+------------+------------ | ||
| + | base/16384/24598 | 4096 | 262144 | 233985 | ||
| + | base/16384/24598.1 | 4096 | 55318 | 233985 | ||
| </code> | </code> | ||
| Line 228: | Line 279: | ||
| You can debug the function with the following error level : //DEBUG1// and //DEBUG5// | You can debug the function with the following error level : //DEBUG1// and //DEBUG5// | ||
| - | set client_min_messages TO debug5; | + | set client_min_messages TO debug1; -- debug5 is only usefull to trace each block |
| + | |||
| + | |||
| + | ===== Notes ===== | ||
| + | |||
| + | * linux 2.6.34 : POSIX_FADV_RANDOM now does not deactivate readahead 100% . see http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=0141450f66c3c12a3aaa869748caa64241885cdf | ||
| + | --> impact O_DIRECT | ||
| + | --> improv | ||