manpagez: man pages & more
man cluster(l)
Home | html | info | man
CLUSTER()                        SQL Commands                        CLUSTER()




NAME

       CLUSTER - cluster a table according to an index



SYNOPSIS

       CLUSTER tablename [ USING indexname ]
       CLUSTER



DESCRIPTION

       CLUSTER  instructs  PostgreSQL to cluster the table specified by table-
       name based on the index specified by indexname. The index must  already
       have been defined on tablename.

       When  a  table  is  clustered,  it is physically reordered based on the
       index information. Clustering is a one-time operation: when  the  table
       is  subsequently  updated,  the  changes are not clustered. That is, no
       attempt is made to store new or updated rows according to  their  index
       order.  (If  one  wishes, one can periodically recluster by issuing the
       command again. Also, setting the table's FILLFACTOR  storage  parameter
       to  less  than  100%  can  aid  in  preserving  cluster ordering during
       updates, since updated rows are preferentially kept on the same  page.)

       When  a  table  is  clustered,  PostgreSQL remembers which index it was
       clustered by. The form CLUSTER tablename reclusters the table using the
       same index as before.

       CLUSTER  without  any parameter reclusters all the previously-clustered
       tables in the current database that the calling user owns, or all  such
       tables  if  called  by a superuser. This form of CLUSTER cannot be exe-
       cuted inside a transaction block.

       When a table is being clustered, an ACCESS EXCLUSIVE lock  is  acquired
       on  it.  This  prevents  any  other database operations (both reads and
       writes) from operating on the table until the CLUSTER is finished.


PARAMETERS

       tablename
              The name (possibly schema-qualified) of a table.

       indexname
              The name of an index.


NOTES

       In cases where you are accessing single rows randomly within  a  table,
       the  actual  order of the data in the table is unimportant. However, if
       you tend to access some data more than others, and there  is  an  index
       that groups them together, you will benefit from using CLUSTER.  If you
       are requesting a range of indexed values from  a  table,  or  a  single
       indexed  value  that  has  multiple  rows that match, CLUSTER will help
       because once the index identifies the table page for the first row that
       matches, all other rows that match are probably already on the same ta-
       ble page, and so you save disk accesses and speed up the query.

       During the cluster operation, a temporary copy of the table is  created
       that  contains  the  table data in the index order. Temporary copies of
       each index on the table are created as well. Therefore, you  need  free
       space on disk at least equal to the sum of the table size and the index
       sizes.

       Because CLUSTER remembers the clustering information, one  can  cluster
       the  tables  one  wants  clustered manually the first time, and setup a
       timed event similar to VACUUM  so  that  the  tables  are  periodically
       reclustered.

       Because the planner records statistics about the ordering of tables, it
       is advisable to run ANALYZE [analyze(l)] on the newly clustered  table.
       Otherwise, the planner might make poor choices of query plans.

       There  is another way to cluster data. The CLUSTER command reorders the
       original table by scanning it using the index you specify. This can  be
       slow  on  large  tables  because the rows are fetched from the table in
       index order, and if the table is disordered, the entries are on  random
       pages,  so there is one disk page retrieved for every row moved. (Post-
       greSQL has a cache, but the majority of a big table will not fit in the
       cache.)  The other way to cluster a table is to use:

       CREATE TABLE newtable AS
           SELECT * FROM table ORDER BY columnlist;

       which  uses  the  PostgreSQL sorting code to produce the desired order;
       this is usually much faster than an index  scan  for  disordered  data.
       Then  you  drop  the  old  table,  use ALTER TABLE ... RENAME to rename
       newtable to the old name, and recreate the table's  indexes.   The  big
       disadvantage  of  this approach is that it does not preserve OIDs, con-
       straints, foreign key  relationships,  granted  privileges,  and  other
       ancillary  properties  of  the table -- all such items must be manually
       recreated. Another disadvantage is that this way requires a sort tempo-
       rary  file  about the same size as the table itself, so peak disk usage
       is about three times the table size instead of twice the table size.


EXAMPLES

       Cluster the table employees on the basis of its index employees_ind:

       CLUSTER employees USING employees_ind;


       Cluster the employees table using the same index that was used before:

       CLUSTER employees;


       Cluster all tables in the database that have previously been clustered:

       CLUSTER;



COMPATIBILITY

       There is no CLUSTER statement in the SQL standard.

       The syntax

       CLUSTER indexname ON tablename

       is also supported for compatibility with pre-8.3 PostgreSQL versions.


SEE ALSO

       clusterdb [clusterdb(1)]



SQL - Language Statements         2008-09-19                         CLUSTER()

postgresql 8.3.4 - Generated Thu Oct 2 08:36:42 CDT 2008
© manpagez.com 2000-2024
Individual documents may contain additional copyright information.