ClickHosue 入门(一)

简单了解一下ClickHouse。

翻译 Quick Start | ClickHouse Docs

安装和启动

第一步,本地下载 ClickHouse 最简单的方式是运行 curl https://clickhouse.com/ | sh 命令。如果系统支持 ClickHouse,会自动下载 ClickHouse 二进制包并且添加可执行权限。

第二步,运行 sudo ./clickhouse install 命令,会为 ClickHouse 创建相应的软链接。

第三步,脚本运行结束后会提示用户输入密码。

1
2
3
4
5
6
7
Creating log directory /var/log/clickhouse-server.
Creating data directory /var/lib/clickhouse.
Creating pid directory /var/run/clickhouse-server.
chown -R clickhouse:clickhouse '/var/log/clickhouse-server'
chown -R clickhouse:clickhouse '/var/run/clickhouse-server'
chown clickhouse:clickhouse '/var/lib/clickhouse'
Enter password for default user:

ClickHouse 安装成功后输出以下日志:

1
2
3
4
5
Start clickhouse-server with:
sudo clickhouse start

Start clickhouse-client with:
clickhouse-client

第四步,运行以下命令来启动服务器:

1
sudo clickhouse start

连接 ClickHouse

缺省时 ClickHouse 在 8123 端口启动 Http 服务。http://127.0.0.1:8123/play 提供内置的SQL查询器(根据实际情况修改hostname)

1
SHOW databases

点击 RUN 按钮后结果展示如下:

创建表

跟大部分数据库管理系统类似,ClickHouse 从逻辑上将以数据库(databases)的形式对数据表(tables)分组。

ClickHouse 中即使最简单的表也必须指定表引擎(table engine)。表引擎决定以下细节:

  • 数据如何存放,数据存放到哪里
  • 支持哪些查询
  • 数据是否可复制

有许多种引擎可供使用,但对单节点的 ClickHouse 服务器来说MergeTree是常见的选择。以下命令在数据库 helloworld 中创建名为 my_first_table 的数据表。

1
2
3
4
5
6
7
8
9
CREATE TABLE helloworld.my_first_table
(
user_id UInt32,
message String,
timestamp DateTime,
metric Float32
)
ENGINE = MergeTree()
PRIMARY KEY (user_id, timestamp)

插入数据

使用 INSERT INTO TABLE 在 ClickHouse 中插入数据。不过要注意的是每次向 MergeTree 表中插入数据时都会导致数据存储中生成一个 part

使用 ClickHouse 的最佳实践是一次性批量插入大量数据,比如说上万行甚至百万行。(别担心,ClickHouse 可以处理这种量级的数据)。

  • 即使对于简单的例子,我们也是一次插入多条数据:
1
2
3
4
5
INSERT INTO helloworld.my_first_table (user_id, message, timestamp, metric) VALUES
(101, 'Hello, ClickHouse!', now(), -1.0 ),
(102, 'Insert a lot of rows per batch', yesterday(), 1.41421 ),
(102, 'Sort your data based on your commonly-used queries', today(), 2.718 ),
(101, 'Granules are the smallest chunks of data read', now() + 5, 3.14159 )
  • 确认是否正常工作
1
SELECT * FROM helloworld.my_first_table

可以看到已经插入了四行数据:

客户端

可以使用 clickhouse-client 命令行工具连接 ClickHouse server。

1
clickhouse-client

如果看到笑脸提示符,说明已经成功连接到 ClickHouse server。

1
:)

连接成功后可以尝试查询。

1
2
3
SELECT *
FROM helloworld.my_first_table
ORDER BY timestamp

注意返回结果以表格形式展示:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
SELECT *
FROM helloworld.my_first_table
ORDER BY timestamp ASC

Query id: f7a33012-bc8c-4f0f-9641-260ee1ffe4b8

┌─user_id─┬─message────────────────────────────────────────────┬───────────timestamp─┬──metric─┐
│ 102 │ Insert a lot of rows per batch │ 2022-03-21 00:00:00 │ 1.41421 │
│ 102 │ Sort your data based on your commonly-used queries │ 2022-03-22 00:00:00 │ 2.718 │
│ 101 │ Hello, ClickHouse! │ 2022-03-22 14:04:09 │ -1 │
│ 101 │ Granules are the smallest chunks of data read │ 2022-03-22 14:04:14 │ 3.14159 │
└─────────┴────────────────────────────────────────────────────┴─────────────────────┴─────────┘

4 rows in set. Elapsed: 0.008 sec.

可以添加 FORMAT 子句来指定输出格式

1
2
3
4
SELECT *
FROM helloworld.my_first_table
ORDER BY timestamp
FORMAT TabSeparated

上述查询中,输出结果以 tab 分隔符返回:

1
2
3
4
5
6
7
8
Query id: 3604df1c-acfd-4117-9c56-f86c69721121

102 Insert a lot of rows per batch 2022-03-21 00:00:00 1.41421
102 Sort your data based on your commonly-used queries 2022-03-22 00:00:00 2.718
101 Hello, ClickHouse! 2022-03-22 14:04:09 -1
101 Granules are the smallest chunks of data read 2022-03-22 14:04:14 3.14159

4 rows in set. Elapsed: 0.005 sec.

使用 exit 命令退出 clickhouse-client

1
2
:) exit
Bye.

导入CSV

A common task when getting started with a database is to insert some data that you already have in files. We have some sample data online that you can insert that represents clickstream data - it includes a user ID, a URL that was visited, and the timestamp of the event.

一个常见任务是将文件中的数据导入到数据库。我们有一些在线的点击流示例数据,包括用户ID、访问的URL以及事件戳。

假设一个名为 data.csv 的文件中有以下数据:

1
2
3
102,This is data in a file,2022-02-22 10:43:28,123.45
101,It is comma-separated,2022-02-23 00:00:00,456.78
103,Use FORMAT to specify the format,2022-02-21 10:43:30,678.90

以下命令将数据插入 my_first_table 表。

1
clickhouse-client --query='INSERT INTO helloworld.my_first_table FORMAT CSV' < data.csv

可以看到新的数据入库了。

下一步

常见问题

问题一:启动时提示监控端口失败,无法访问 ClickHouse

1
2
3
2022.06.29 11:44:33.144576 [ 23415 ] {} <Warning> Application: Listen [::1]:8123 failed: Poco::Exception. Code: 1000, e.code() = 99, Net Exception: Cannot assign requested address: [::1]:8123 (version 22.7.1.823 (official build)). If it is an IPv6 or IPv4 address and your host has disabled IPv6 or IPv4, then consider to specify not disabled IPv4 or IPv6 address to listen in <listen_host> element of configuration file. Example for disabled IPv6: <listen_host>0.0.0.0</listen_host> . Example for disabled IPv4: <listen_host>::</listen_host>
2022.06.29 11:44:33.144986 [ 23415 ] {} <Warning> Application: Listen [::1]:9000 failed: Poco::Exception. Code: 1000, e.code() = 99, Net Exception: Cannot assign requested address: [::1]:9000 (version 22.7.1.823 (official build)). If it is an IPv6 or IPv4 address and your host has disabled IPv6 or IPv4, then consider to specify not disabled IPv4 or IPv6 address to listen in <listen_host> element of configuration file. Example for disabled IPv6: <listen_host>0.0.0.0</listen_host> . Example for disabled IPv4: <listen_host>::</listen_host>
2022.06.29 11:44:33.145252 [ 23415 ] {} <Warning> Application: Listen [::1]:9004 failed: Poco::Exception. Code: 1000, e.code() = 99, Net Exception: Cannot assign requested address: [::1]:9004 (version 22.7.1.823 (official build)). If it is an IPv6 or IPv4 address and your host has disabled IPv6 or IPv4, then consider to specify not disabled IPv4 or IPv6 address to listen in <listen_host> element of configuration file. Example for disabled IPv6: <listen_host>0.0.0.0</listen_host> . Example for disabled IPv4: <listen_host>::</listen_host>

解决办法:clickhouse-server -- --listen_host=0.0.0.0