How do you use the TABLESAMPLE clause to retrieve a random sample of rows?
Posted by PaulAnd
Last Updated: June 23, 2024
The TABLESAMPLE clause is a SQL feature that allows you to retrieve a random sample of rows from a table. This is especially useful for querying large datasets where retrieving the entire table may be inefficient. The exact syntax may vary slightly depending on the SQL database you are using, but it follows a general structure. Here's how you can use the TABLESAMPLE clause in SQL:
Basic Syntax
SELECT *
FROM table_name
TABLESAMPLE (percentage OR row_count) [REPEATABLE (seed)]
Parameters
- percentage: This specifies the percentage of rows to sample. For example, TABLESAMPLE (10 PERCENT) retrieves approximately 10% of the rows from the table. - row_count: In some database systems, you can specify the number of rows to sample rather than a percentage. For example, TABLESAMPLE (100) would retrieve approximately 100 rows. - REPEATABLE (seed): This is an optional clause that allows you to specify a seed value for the random sampling. Using the same seed will yield the same sample results each time you run the query, which can be useful for reproducibility.
Example Queries
1. Retrieving a Random Sample by Percentage:
SELECT *
FROM customers
TABLESAMPLE (10 PERCENT);
This query retrieves about 10% of the rows from the customers table. 2. Retrieving a Fixed Number of Rows:
SELECT *
FROM orders
TABLESAMPLE (100);
This retrieves approximately 100 random rows from the orders table. 3. Using the REPEATABLE Clause:
SELECT *
FROM products
TABLESAMPLE (10 PERCENT) REPEATABLE (123);
This command retrieves approximately 10% of the rows from the products table and will consistently return the same set of rows each time it is executed due to the specified seed.
Database Specific Notes
- SQL Server: The TABLESAMPLE clause is supported in SQL Server and follows the syntax above. - PostgreSQL: PostgreSQL does not support the TABLESAMPLE syntax exactly as described, but it has a similar functionality using the TABLESAMPLE clause with different options (BERNOULLI and SYSTEM). - Oracle: As of recent versions, Oracle supports random sampling through the use of the DBMS_RANDOM package, but it does not support TABLESAMPLE in its standard form. - MySQL: MySQL does not have a direct TABLESAMPLE clause, but you can achieve similar results using ORDER BY RAND() with a LIMIT. Refer to the specific SQL database documentation for detailed information regarding the use of the TABLESAMPLE clause and the supported syntax.