- StarRocks介绍
- 快速开始
- 表设计
- 数据导入
- 数据提取
- 使用StarRocks
- 参考手册
- SQL参考
- 用户账户管理
- 集群管理
- ADMIN CANCEL REPAIR
- ADMIN CHECK TABLET
- ADMIN REPAIR
- ADMIN SET CONFIG
- ADMIN SET REPLICA STATUS
- ADMIN SHOW CONFIG
- ADMIN SHOW REPLICA DISTRIBUTION
- ADMIN SHOW REPLICA STATUS
- ALTER CLUSTER
- ALTER SYSTEM
- CANCEL DECOMMISSION
- CREATE CLUSTER
- CREATE FILE
- DROP CLUSTER
- DROP FILE
- ENTER
- INSTALL PLUGIN
- LINK DATABASE
- MIGRATE DATABASE
- SHOW BACKENDS
- SHOW BROKER
- SHOW FILE
- SHOW FRONTENDS
- SHOW FULL COLUMNS
- SHOW INDEX
- SHOW MIGRATIONS
- SHOW PLUGINS
- SHOW TABLE STATUS
- UNINSTALL PLUGIN
- DDL
- ALTER DATABASE
- ALTER TABLE
- ALTER VIEW
- BACKUP
- CANCEL ALTER
- CANCEL BACKUP
- CANCEL RESTORE
- CREATE DATABASE
- CREATE INDEX
- CREATE MATERIALIZED VIEW
- CREATE REPOSITORY
- CREATE RESOURCE
- CREATE TABLE AS SELECT
- CREATE TABLE LIKE
- CREATE TABLE
- CREATE VIEW
- CREATE FUNCTION
- DROP DATABASE
- DROP INDEX
- DROP MATERIALIZED VIEW
- DROP REPOSITORY
- DROP RESOURCE
- DROP TABLE
- DROP VIEW
- DROP FUNCTION
- HLL
- RECOVER
- RESTORE
- SHOW RESOURCES
- SHOW FUNCTION
- TRUNCATE TABLE
- DML
- ALTER ROUTINE LOAD
- BROKER LOAD
- CANCEL LOAD
- DELETE
- EXPORT
- GROUP BY
- INSERT
- PAUSE ROUTINE LOAD
- RESUME ROUTINE LOAD
- CREATE ROUTINE LOAD
- SELECT
- SHOW ALTER
- SHOW BACKUP
- SHOW DATA
- SHOW DATABASES
- SHOW DELETE
- SHOW DYNAMIC PARTITION TABLES
- SHOW EXPORT
- SHOW LOAD
- SHOW PARTITIONS
- SHOW PROPERTY
- SHOW REPOSITORIES
- SHOW RESTORE
- SHOW ROUTINE LOAD
- SHOW ROUTINE LOAD TASK
- SHOW SNAPSHOT
- SHOW TABLES
- SHOW TABLET
- SHOW TRANSACTION
- SPARK LOAD
- STOP ROUTINE LOAD
- STREAM LOAD
- 数据类型
- 辅助命令
- 函数参考
- 日期函数
- convert_tz
- curdate
- current_timestamp
- curtime
- datediff
- date_add
- date_format
- date_sub
- date_trunc
- day
- dayname
- dayofmonth
- dayofweek
- dayofyear
- from_days
- from_unixtime
- hour
- minute
- month
- monthname
- now
- second
- str_to_date
- timediff
- timestampadd
- timestampdiff
- to_date
- to_days
- unix_timestamp
- utc_timestamp
- weekofyear
- year
- hours_diff
- minutes_diff
- months_diff
- seconds_diff
- weeks_diff
- years_diff
- quarter
- timestamp
- time_to_sec
- str2date
- microseconds_add
- microseconds_sub
- 加密函数
- 地理位置函数
- 字符串函数
- append_trailing_char_if_absent
- ascii
- char_length
- concat
- concat_ws
- ends_with
- find_in_set
- get_json_double
- get_json_int
- get_json_string
- group_concat
- instr
- lcase
- left
- length
- locate
- lower
- lpad
- ltrim
- rtrim
- money_format
- null_or_empty
- regexp_extract
- regexp_replace
- repeat
- reverse
- right
- rpad
- split
- split_part
- starts_with
- strleft
- strright
- hex
- unhex
- substr
- space
- parse_url
- JSON 函数
- JSON 函数和运算符
- JSON 构造函数
- JSON 查询和处理函数
- JSON 运算符
- 模糊/正则匹配函数
- 工具函数
- 聚合函数
- Bitmap函数
- 数组函数
- bit函数
- cast函数
- hash函数
- 条件函数
- 百分位函数
- 数学函数
- 日期函数
- 系统变量
- 错误码
- 系统限制
- SQL参考
- 管理指南
- 常见问题解答
- 性能测试
- Release Notes
retention
功能
通过应用一系列条件表达式,计算得到关于留存分布的结果(Array)。
语法
output retention(input)
参数说明
- input:一系列表示事件的表达式构成的Array,类型为Array\<BOOLEAN>。
返回值说明
类型为Array\<BOOLEAN>,长度与input一致,其中:
- output中第1个元素的值是input[1]。
- output中第n(n > 1)个元素的值是true,如果input[1]和output[n]都为true。
示例
示例一:
mysql> SELECT retention([lo_orderdate = '1997-08-01' AND lo_orderpriority = '2-HIGH',
lo_orderdate = '1997-08-02' AND lo_orderpriority = '1-URGENT', lo_orderdate = '1997-08-03' AND
lo_orderpriority = '5-LOW']) AS r FROM lineorder_flat GROUP BY lo_linenumber;
+---------+
| r |
+---------+
| [1,1,1] |
| [1,1,1] |
| [1,1,1] |
| [1,1,1] |
| [1,1,1] |
| [1,1,1] |
| [1,1,1] |
+---------+
示例二:
计算满足条件lo_orderdate = '1997-08-02' AND lo_orderpriority = '1-URGENT'
和
满足条件lo_orderdate = '1997-08-03' AND lo_orderpriority = '5-LOW'
所占的比例
mysql> SELECT sum(r[1]), sum(r[2]) / sum(r[1]), sum(r[3]) / sum(r[1]) FROM
(SELECT retention([lo_orderdate = '1997-08-01' AND lo_orderpriority = '2-HIGH',
lo_orderdate = '1997-08-02' AND lo_orderpriority = '1-URGENT', lo_orderdate = '1997-08-03' AND
lo_orderpriority = '5-LOW']) AS r FROM lineorder_flat GROUP BY lo_suppkey) t;
+-------------+---------------------------+---------------------------+
| sum(`r`[1]) | sum(`r`[2]) / sum(`r`[1]) | sum(`r`[3]) / sum(`r`[1]) |
+-------------+---------------------------+---------------------------+
| 43951 | 0.2228163181725103 | 0.2214056562990603 |
+-------------+---------------------------+---------------------------+
示例三:
假设现在有表lineorder_flat
,其中数据为
+--------------+------------------+------------+
| lo_orderdate | lo_orderpriority | lo_custkey |
+--------------+------------------+------------+
| 1022-11-20 | 4-NOT SPECI | 309050 |
| 1222-10-31 | 2-HIGH | 492238 |
| 1380-09-30 | 5-LOW | 123099 |
| 1380-09-30 | 5-LOW | 460237 |
| 1380-09-30 | 5-LOW | 426502 |
| 1022-11-20 | 4-NOT SPECI | 197081 |
| 1380-09-30 | 5-LOW | 918918 |
| 1022-11-20 | 4-NOT SPECI | 327825 |
| 1380-09-30 | 5-LOW | 252542 |
| 1380-09-30 | 5-LOW | 194171 |
+--------------+------------------+------------+
10 rows in set (0.02 sec)
我们使用retention
函数,
mysql> SELECT lo_custkey,
retention([lo_orderdate='1022-11-20' AND lo_orderpriority='4-NOT SPECI',
lo_orderdate='1022-11-21' AND lo_orderpriority='4-LONG']) AS retention
FROM lineorder_flat
GROUP BY lo_custkey;
+------------+-----------+
| lo_custkey | retention |
+------------+-----------+
| 327825 | [1,0] |
| 309050 | [1,0] |
| 252542 | [0,0] |
| 123099 | [0,0] |
| 460237 | [0,0] |
| 194171 | [0,0] |
| 197081 | [1,0] |
| 918918 | [0,0] |
| 492238 | [0,0] |
| 426502 | [0,0] |
+------------+-----------+
10 rows in set (0.01 sec)
可以看到没有第2位是1的结果,原因在于没有满足条件的数据。 我们插入一条数据包含
(lo_orderdate='1022-11-21', lo_orderpriority='4-LONG', lo_custkey=460237)
再次执行
mysql> SELECT lo_custkey,
retention([lo_orderdate='1022-11-20' AND lo_orderpriority='4-NOT SPECI',
lo_orderdate='1022-11-21' AND lo_orderpriority='4-LONG']) AS retention
FROM lineorder_flat
GROUP BY lo_custkey;
+------------+-----------+
| lo_custkey | retention |
+------------+-----------+
| 327825 | [1,0] |
| 309050 | [1,0] |
| 252542 | [0,0] |
| 123099 | [0,0] |
| 460237 | [0,0] |
| 194171 | [0,0] |
| 197081 | [1,0] |
| 918918 | [0,0] |
| 492238 | [0,0] |
| 426502 | [0,0] |
+------------+-----------+
10 rows in set (0.01 sec)
结果依然没变,因为custkey(460237)
对应的第1位条件不满足,
我们插入另一条数据包含
(lo_orderdate='1022-11-21', lo_orderpriority='4-LONG', lo_custkey=327825)
再次执行
mysql> SELECT lo_custkey,
retention([lo_orderdate='1022-11-20' AND lo_orderpriority='4-NOT SPECI',
lo_orderdate='1022-11-21' AND lo_orderpriority='4-LONG']) AS retention
FROM lineorder_flat
GROUP BY lo_custkey;
+------------+-----------+
| lo_custkey | retention |
+------------+-----------+
| 327825 | [1,1] |
| 309050 | [1,0] |
| 252542 | [0,0] |
| 123099 | [0,0] |
| 460237 | [0,0] |
| 194171 | [0,0] |
| 197081 | [1,0] |
| 918918 | [0,0] |
| 492238 | [0,0] |
| 426502 | [0,0] |
+------------+-----------+
10 rows in set (0.01 sec)
可以看到 327825
对应结果的第二位变为了1。
关键字
RETENTION