Creating Tables in ClickHouse
Like most databases, ClickHouse logically groups tables into databases. Use the CREATE DATABASE command to create a new database in ClickHouse:
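The listing for this step is missing from the text. A minimal sketch follows; the database name helloworld comes from the table example later in this section, and the IF NOT EXISTS clause is an optional addition so the statement can be re-run safely:

```sql
-- Create the database used in the rest of this section.
-- IF NOT EXISTS is optional; it makes the statement a no-op if the database already exists.
CREATE DATABASE IF NOT EXISTS helloworld;
```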
Similarly, use CREATE TABLE to define a new table. (If you do not specify the database name, the table will be created in the
default database.) The following statement creates a table named my_first_table in the helloworld database:
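The statement itself is missing from the text. The sketch below is reconstructed from the description in this section (four columns, the MergeTree engine, and a primary key of (user_id, timestamp)), so treat it as an illustration consistent with the prose, not a verbatim copy of the original listing:

```sql
-- Reconstructed from the column and key descriptions in this section.
CREATE TABLE helloworld.my_first_table
(
    user_id   UInt32,    -- 32-bit unsigned integer
    message   String,    -- arbitrary-length string
    timestamp DateTime,  -- an instant in time
    metric    Float32    -- 32-bit floating point number
)
ENGINE = MergeTree
PRIMARY KEY (user_id, timestamp);
```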
In the example above, my_first_table is a MergeTree table with four columns:
- user_id: a 32-bit unsigned integer
- message: a String data type, which replaces types like VARCHAR, BLOB, CLOB and others from other database systems
- timestamp: a DateTime value, which represents an instant in time
- metric: a 32-bit floating point number
Note: The table engine determines:
- How and where the data is stored
- Which queries are supported
- Whether or not the data is replicated
There are many engines to choose from, but for a simple table on a single-node ClickHouse server, MergeTree is your likely choice.
A Brief Intro to Primary Keys
Before you go any further, it is important to understand how primary keys work in ClickHouse (the implementation of primary keys might seem unexpected!):
- Primary keys in ClickHouse are not unique for each row in a table.
The primary key of a ClickHouse table determines how the data is sorted when written to disk. For every 8,192 rows or 10 MB of
data (referred to as the index granularity), ClickHouse creates an entry in the primary key index file. This granularity concept
creates a sparse index that can easily fit in memory, and each granule represents the smallest stripe of
column data that gets processed during SELECT queries.
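To check the granularity values on a given server, one option is to query the system.merge_tree_settings table. This is a small sketch; it assumes the two MergeTree settings index_granularity (rows) and index_granularity_bytes (bytes) are the ones behind the limits described above:

```sql
-- Show the row-based and byte-based granularity limits for MergeTree tables.
SELECT name, value
FROM system.merge_tree_settings
WHERE name IN ('index_granularity', 'index_granularity_bytes');
```

The returned values should correspond to the row and size limits mentioned above, unless they have been overridden on the server.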
The primary key can be defined using the PRIMARY KEY parameter. If you define a table without a PRIMARY KEY specified,
then the key becomes the tuple specified in the ORDER BY clause. If you specify both a PRIMARY KEY and an ORDER BY, the primary key must be a prefix of the sort order.
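As an illustration of the prefix rule, the sketch below defines a table that specifies both clauses. The table name my_second_table is hypothetical and does not appear elsewhere in this guide; the columns are borrowed from my_first_table:

```sql
-- Hypothetical table: the primary key (user_id) is a prefix of the
-- sorting key (user_id, timestamp), so the definition is accepted.
CREATE TABLE helloworld.my_second_table
(
    user_id   UInt32,
    message   String,
    timestamp DateTime,
    metric    Float32
)
ENGINE = MergeTree
PRIMARY KEY (user_id)
ORDER BY (user_id, timestamp);
```

Here the data is still sorted on disk by user_id, then timestamp, but only user_id is stored in the sparse index. Reversing the two tuples (a primary key that is not a prefix of the sorting key) would be rejected.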
In my_first_table, the primary key is also the sorting key: the tuple (user_id, timestamp). Therefore, the data stored in each
column file will be sorted by user_id, then timestamp.
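One way to see the effect of this ordering is to ask ClickHouse how it would use the primary key for a query that filters on the leading key column. The following is a sketch only; the value 101 is an arbitrary placeholder, and the exact plan output depends on the server version and the data in the table:

```sql
-- Show which indexes are applied and how many granules are selected
-- for a filter on the first primary key column.
EXPLAIN indexes = 1
SELECT message, metric
FROM helloworld.my_first_table
WHERE user_id = 101;
```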
For more details, check out the Modeling Data training module in ClickHouse Academy.