- StarRocks介绍
- 快速开始
- 表设计
- 数据导入
- 数据提取
- 使用StarRocks
- 参考手册
- SQL参考
- 用户账户管理
- 集群管理
- ADMIN CANCEL REPAIR
- ADMIN CHECK TABLET
- ADMIN REPAIR
- ADMIN SET CONFIG
- ADMIN SET REPLICA STATUS
- ADMIN SHOW CONFIG
- ADMIN SHOW REPLICA DISTRIBUTION
- ADMIN SHOW REPLICA STATUS
- ALTER CLUSTER
- ALTER SYSTEM
- CANCEL DECOMMISSION
- CREATE CLUSTER
- CREATE FILE
- DROP CLUSTER
- DROP FILE
- ENTER
- INSTALL PLUGIN
- LINK DATABASE
- MIGRATE DATABASE
- SHOW BACKENDS
- SHOW BROKER
- SHOW FILE
- SHOW FRONTENDS
- SHOW FULL COLUMNS
- SHOW INDEX
- SHOW MIGRATIONS
- SHOW PLUGINS
- SHOW TABLE STATUS
- UNINSTALL PLUGIN
- DDL
- ALTER DATABASE
- ALTER TABLE
- ALTER VIEW
- BACKUP
- CANCEL ALTER
- CANCEL BACKUP
- CANCEL RESTORE
- CREATE DATABASE
- CREATE INDEX
- CREATE MATERIALIZED VIEW
- CREATE REPOSITORY
- CREATE RESOURCE
- CREATE TABLE AS SELECT
- CREATE TABLE LIKE
- CREATE TABLE
- CREATE VIEW
- CREATE FUNCTION
- DROP DATABASE
- DROP INDEX
- DROP MATERIALIZED VIEW
- DROP REPOSITORY
- DROP RESOURCE
- DROP TABLE
- DROP VIEW
- DROP FUNCTION
- HLL
- RECOVER
- RESTORE
- SHOW RESOURCES
- SHOW FUNCTION
- TRUNCATE TABLE
- DML
- ALTER ROUTINE LOAD
- BROKER LOAD
- CANCEL LOAD
- DELETE
- EXPORT
- GROUP BY
- INSERT
- PAUSE ROUTINE LOAD
- RESUME ROUTINE LOAD
- CREATE ROUTINE LOAD
- SELECT
- SHOW ALTER
- SHOW BACKUP
- SHOW DATA
- SHOW DATABASES
- SHOW DELETE
- SHOW DYNAMIC PARTITION TABLES
- SHOW EXPORT
- SHOW LOAD
- SHOW PARTITIONS
- SHOW PROPERTY
- SHOW REPOSITORIES
- SHOW RESTORE
- SHOW ROUTINE LOAD
- SHOW ROUTINE LOAD TASK
- SHOW SNAPSHOT
- SHOW TABLES
- SHOW TABLET
- SHOW TRANSACTION
- SPARK LOAD
- STOP ROUTINE LOAD
- STREAM LOAD
- 数据类型
- 辅助命令
- 函数参考
- 日期函数
- convert_tz
- curdate
- current_timestamp
- curtime
- datediff
- date_add
- date_format
- date_sub
- date_trunc
- day
- dayname
- dayofmonth
- dayofweek
- dayofyear
- from_days
- from_unixtime
- hour
- minute
- month
- monthname
- now
- second
- str_to_date
- timediff
- timestampadd
- timestampdiff
- to_date
- to_days
- unix_timestamp
- utc_timestamp
- weekofyear
- year
- 地理位置函数
- 字符串函数
- append_trailing_char_if_absent
- ascii
- char_length
- concat
- concat_ws
- ends_with
- find_in_set
- get_json_double
- get_json_int
- get_json_string
- group_concat
- instr
- lcase
- left
- length
- locate
- lower
- lpad
- ltrim
- money_format
- null_or_empty
- regexp_extract
- regexp_replace
- repeat
- reverse
- right
- rpad
- split
- split_part
- starts_with
- strleft
- strright
- 聚合函数
- Bitmap函数
- 数组函数
- cast函数
- hash函数
- 加密函数
- 日期函数
- 系统变量
- 错误码
- 系统限制
- SQL参考
- 管理手册
- 常见问题解答
- 性能测试
- Release Notes
HLL
description
HLL是基于HyperLogLog算法的工程实现,用于保存HyperLogLog计算过程的中间结果,它只能作为表的value列类型 通过聚合来不断的减少数据量,以此来实现加快查询的目的,基于它到的是一个估算结果,误差大概在1%左右
hll列是通过其它列或者导入数据里面的数据生成的,导入的时候通过hll_hash函数来指定数据中哪一列用于生成hll列 它常用于替代count distinct,通过结合rollup在业务上用于快速计算uv等
相关函数:
HLL_UNION_AGG(hll) 此函数为聚合函数,用于计算满足条件的所有数据的基数估算。此函数还可用于分析函数,只支持默认窗口,不支持window从句。
HLL_RAW_AGG(hll) 此函数为聚合函数,用于聚合hll类型字段,并且返回的还是hll类型。
HLL_CARDINALITY(hll) 此函数用于计算单条hll列的基数估算
HLL_HASH(column_name) 生成HLL列类型,用于insert或导入的时候,导入的使用见相关说明
HLL_EMPTY() 生成空HLL列,用于insert或导入的时候补充默认值,导入的使用见相关说明
example
首先创建一张含有hll列的表
create table test( dt date, id int, name char(10), province char(10), os char(1), set1 hll hll_union, set2 hll hll_union) distributed by hash(id) buckets 32;
导入数据,导入的方式见相关help curl
a. 使用表中的列生成hll列 curl --location-trusted -uname:password -T data -H "label:load_1" \ -H "columns:dt, id, name, province, os, set1=hll_hash(id), set2=hll_hash(name)" http://host/api/test_db/test/_stream_load b. 使用数据中的某一列生成hll列 curl --location-trusted -uname:password -T data -H "label:load_1" \ -H "columns:dt, id, name, province, sex, cuid, os, set1=hll_hash(cuid), set2=hll_hash(os)" http://host/api/test_db/test/_stream_load
聚合数据,常用方式3种:(如果不聚合直接对base表查询,速度可能跟直接使用approx_count_distinct速度差不多)
a. 创建一个rollup,让hll列产生聚合, alter table test add rollup test_rollup(dt, set1); b. 创建另外一张专门计算uv的表,然后insert数据) create table test_uv( dt date, id int, uv_set hll hll_union) distributed by hash(id) buckets 32; insert into test_uv select dt, set1 from test; c. 创建另外一张专门计算uv的表,然后insert并通过hll_hash根据test其它非hll列生成hll列 create table test_uv( dt date, id_set hll hll_union) distributed by hash(id) buckets 32; insert into test_uv select dt, hll_hash(id) from test;
查询,hll列不允许直接查询它的原始值,可以通过配套的函数进行查询
a. 求总uv select HLL_UNION_AGG(uv_set) from test_uv; b. 求每一天的uv select dt, HLL_CARDINALITY(uv_set) from test_uv; c. 求test表中set1的聚合值 select dt, HLL_CARDINALITY(uv) from (select dt, HLL_RAW_AGG(set1) as uv from test group by dt) tmp; select dt, HLL_UNION_AGG(set1) as uv from test group by dt;
keyword
HLL