StarRocks version 3.1
3.1.11
Release date: April 28, 2024
Behavior Changes
- Users are not allowed to drop views in the system database `information_schema` using DROP TABLE. #43556
- Users are not allowed to specify duplicate keys in the ORDER BY clause when creating a Primary Key table. #43374
Improvements
- Queries on Parquet-formatted Iceberg v2 tables support equality deletes.
Bug Fixes
Fixed the following issues:
- When a user queries data from an external table in an external catalog, access to this table is denied even when the user has the SELECT privilege on this table. SHOW GRANTS also shows that the user has this privilege. #44061
- `str_to_map` may cause BEs to crash. #43930
- While a Routine Load job is in progress, running `show proc '/routine_loads'` hangs due to a deadlock. #44249
- Persistent Index of Primary Key tables may cause BEs to crash due to issues in concurrency control. #43720
- The `pending_task_run_count` displayed on the page of `leaderFE_IP:8030` is incorrect. The displayed number is the sum of Pending and Running tasks, not the number of Pending tasks. In addition, the information of the metric `refresh_pending` cannot be displayed using `followerFE_IP:8030`. #43052
- Querying `information_schema.task_runs` fails frequently. #43052
- Some SQL queries that contain CTEs may encounter the `Invalid plan: PhysicalTopNOperator` error. #44185
3.1.10 (Yanked)
This version has been taken offline due to privilege issues in querying external tables in external catalogs such as Hive and Iceberg.
- Problem: When a user queries data from an external table in an external catalog, access to this table is denied even when the user has the SELECT privilege on this table. SHOW GRANTS also shows that the user has this privilege.
- Impact scope: This problem only affects queries on external tables in external catalogs. Other queries are not affected.
- Temporary workaround: The query succeeds after the SELECT privilege on this table is granted to the user again. But `SHOW GRANTS` will return duplicate privilege entries. After an upgrade to v3.1.11, users can run `REVOKE` to remove one of the privilege entries.
Release date: March 29, 2024
New Features
- Primary Key tables support Size-tiered Compaction. #42474
- Added a pattern-matching function `regexp_extract_all`. #42178
Behavior Changes
- When null values in JSON data are evaluated with the `IS NULL` operator, they are treated as NULL values, in line with SQL semantics. For example, `true` is returned for `SELECT parse_json('{"a": null}') -> 'a' IS NULL` (before this behavior change, `false` was returned). #42815
Improvements
- When Broker Load is used to load data from ORC files that contain TIMESTAMP-type data, StarRocks supports retaining microseconds in the timestamps when converting the timestamps to match its own DATETIME data type. #42348
Bug Fixes
Fixed the following issues:
- In shared-data mode, the garbage collection and thread eviction mechanisms for handling persistent indexes created on Primary Key tables cannot take effect on CN nodes. As a result, obsolete data cannot be deleted. #42241
- When users query ORC files by using Hive catalogs, the query results may be incorrect because StarRocks used to read ORC files from Hive based on mapping by position. To resolve this issue, users can set the session variable `orc_use_column_names` to `true`, which specifies to read ORC files from Hive based on mapping by column name. #42905
- When LDAP authentication for the AD system is adopted, logins without passwords are allowed. #42476
- When disk device names end with digits, the values of monitoring metrics remain 0s because the disk device names may be invalid after such digits are removed. #42741
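The disk-name issue in the last item can be pictured with a small Python sketch. It assumes the digit stripping was meant to map a partition name to its parent disk. The helper `strip_trailing_digits` and the device names are illustrative assumptions, not StarRocks code.

```python
import re


def strip_trailing_digits(device: str) -> str:
    """Drop trailing digits, as one might do to map a partition to its disk."""
    return re.sub(r"\d+$", "", device)


# Partition-style name: "sda1" is a partition of the disk "sda".
print(strip_trailing_digits("sda1"))     # sda
# Disk whose own name ends in a digit: "nvme0n" is not a valid device,
# so a metric lookup keyed on the stripped name would find nothing.
print(strip_trailing_digits("nvme0n1"))  # nvme0n
```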
3.1.9
Release date: March 8, 2024
New Features
- Cloud-native Primary Key tables in shared-data clusters support Size-tiered Compaction to reduce write I/O amplification for the loading of a large number of small-sized files. #41610
- Added the view `information_schema.partitions_meta`, which records detailed metadata of partitions. #41101
- Added the view `sys.fe_memory_usage`, which records the memory usage for StarRocks. #41083
Behavior Changes
- The logic of dynamic partitioning is changed. Now partition columns of the DATE type do not support hour-level data. Note that partition columns of the DATETIME type still support hour-level data. #40328
- The user who can refresh materialized views is changed from the `root` user to the user who creates the materialized views. This change does not affect existing materialized views. #40698
- By default, when comparing columns of constant and string types, StarRocks compares them as strings. Users can use the session variable `cbo_eq_base_type` to adjust the default rule used for the comparison. For example, users can set `cbo_eq_base_type` to `decimal`, and StarRocks then compares the columns as numeric values. #41712
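The effect of the comparison base type can be illustrated with a minimal Python sketch. It only models the idea of comparing "as strings" versus "as numeric values". The function `values_equal` and the way the number is rendered as text are assumptions made for illustration, not the optimizer's actual logic.

```python
from decimal import Decimal


def values_equal(string_value: str, numeric_value: int, base_type: str = "varchar") -> bool:
    """Compare a string with a number under the chosen base type."""
    if base_type == "varchar":
        # Compare as strings: the number is rendered as text first (assumed rendering).
        return string_value == str(numeric_value)
    if base_type == "decimal":
        # Compare as numeric values: the string is parsed as a decimal.
        return Decimal(string_value) == Decimal(numeric_value)
    raise ValueError(f"unsupported base type: {base_type}")


print(values_equal("1.0", 1))                       # False
print(values_equal("1.0", 1, base_type="decimal"))  # True
```

The same pair of values is unequal as strings but equal as numbers, which is why the choice of base type can change query results.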
Improvements
- StarRocks supports using the parameter `s3_compatible_fs_list` to specify which S3-compatible object storage can be accessed via AWS SDK, and supports using the parameter `fallback_to_hadoop_fs_list` to specify non-S3-compatible object storage that requires access via HDFS Schema (this method necessitates the use of vendor-provided JAR packages). #41612
- The compatibility with Trino's SQL statement syntax is optimized to support converting the following functions of Trino: `current_catalog`, `current_schema`, `to_char`, `from_hex`, `to_date`, `to_timestamp`, and `index`. #41505 #41270 #40838
- A new session variable `cbo_materialized_view_rewrite_related_mvs_limit` is added to control the maximum number of candidate materialized views allowed during query planning. The default value of this session variable is `64`. This session variable helps mitigate the excessive resource consumption caused by a large number of candidate materialized views for a query during query planning. #39829
- The `agg_type` of BITMAP-type columns in an Aggregate table can be set to `replace_if_not_null` to support updates only to a few columns of the table. #42102
- The session variable `cbo_eq_base_type` is optimized to support specifying the implicit conversion rule applied to the comparison of data that contains both string and numeric data types. By default, such data is compared as strings. #40619
- More DATE-type data (for example, "%Y-%m-%e %H:%i") can be recognized to better support partition expressions for Iceberg tables. #40474
- The JDBC connector supports the TIME data type. #31940
- The `path` parameter in the SQL statement for creating a file external table supports wildcards (`*`). However, like the `DATA INFILE` parameter in the SQL statement for creating a Broker Load job, the `path` parameter supports using wildcards (`*`) to match at most one level of directory or file. #40844
- A new internal SQL log file is added to record log data related to statistics and materialized views. #40682
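The "at most one level" wildcard rule for the `path` parameter can be sketched in Python. This is only a model of the semantics described above, not the matcher StarRocks uses. The function name `matches_one_level` and the sample paths are hypothetical.

```python
from fnmatch import fnmatchcase


def matches_one_level(pattern: str, path: str) -> bool:
    """Match segment by segment so that `*` never spans a `/`."""
    pattern_parts = pattern.strip("/").split("/")
    path_parts = path.strip("/").split("/")
    # A wildcard covers one directory or file level, so depths must agree.
    if len(pattern_parts) != len(path_parts):
        return False
    return all(fnmatchcase(part, pat) for pat, part in zip(pattern_parts, path_parts))


print(matches_one_level("data/*/part.parquet", "data/2024/part.parquet"))     # True
print(matches_one_level("data/*/part.parquet", "data/2024/01/part.parquet"))  # False
```

Splitting on `/` first matters because `fnmatchcase` on the whole string would let `*` match across directory separators.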
Bug Fixes
Fixed the following issues:
- "Analyze Error" is thrown if inconsistent letter cases are assigned to the names or aliases of tables or views queried in the creation of a Hive view. #40921
- I/O usage reaches the upper limit if persistent indexes are created on Primary Key tables. #39959
- In shared-data clusters, the primary key index directory is deleted every 5 hours. #40745
- After a table for which list partitioning is enabled is truncated or its partitions are truncated, queries based on the partitioning keys of the table return no data. #40495
- After users manually execute ALTER TABLE COMPACT, the memory usage statistics for compaction operations are abnormal. #41150
- During data migration between clusters, if only some columns are updated in column mode, the destination cluster may crash. #40692
- The SQL blacklist may not take effect if the submitted SQL statement contains multiple spaces or newline characters. #40457
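To see why extra spaces or newlines can defeat a pattern-based blacklist, consider the Python sketch below. It assumes blacklist entries are regular expressions written against single-spaced SQL. The helpers `normalize_sql` and `is_blacklisted` are hypothetical and do not describe how StarRocks implements or fixed the feature.

```python
import re
from typing import List


def normalize_sql(sql: str) -> str:
    """Collapse every run of whitespace (spaces, tabs, newlines) into one space."""
    return " ".join(sql.split())


def is_blacklisted(sql: str, patterns: List[str]) -> bool:
    """Check the normalized statement against each blacklist pattern."""
    normalized = normalize_sql(sql)
    return any(re.fullmatch(p, normalized, flags=re.IGNORECASE) for p in patterns)


patterns = [r"select count\(\*\) from .+"]
statement = "select  count(*)\n from t1"

# Matching the raw text misses the statement because of the extra whitespace.
print(re.fullmatch(patterns[0], statement) is not None)  # False
# Matching after normalization catches it.
print(is_blacklisted(statement, patterns))               # True
```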
3.1.8
Release date: February 5, 2024
New Features
- StarRocks Community provides the StarRocks Cross-cluster Data Migration Tool, which supports migrating data from a shared-nothing cluster to either another shared-nothing cluster or a shared-data cluster.
- Supports creating synchronous materialized views with the WHERE clause specified.
- Added metrics that show memory usage of the data cache to MemTracker. #39600
Parameter Change
- Added a BE configuration item, `lake_pk_compaction_max_input_rowsets`, which controls the maximum number of input rowsets allowed in a Primary Key table compaction task in a shared-data StarRocks cluster. This helps optimize resource consumption for compaction tasks. #39611
Improvements
- Supports ORDER BY and INDEX clauses in CTAS statements. #38886
- Supports equality deletes on ORC-formatted Iceberg v2 tables. #37419
- Supports setting the `datacache.partition_duration` property for cloud-native tables created with the list partitioning strategy. This property controls the validity period of the data cache and can be dynamically configured. #35681 #38509
- Optimized the BE configuration item `update_compaction_per_tablet_min_interval_seconds`. This parameter was originally used only to control the frequency of compaction tasks on Primary Key tables. After the optimization, it can also be used to control the frequency of major compaction tasks on Primary Key table indexes. #39640
- Parquet Reader supports converting INT32-type data in Parquet-formatted data to DATETIME-type data and storing the resulting data to StarRocks. #39808
Bug Fixes
Fixed the following issues:
- Using NaN (Not a Number) columns as ORDER BY columns may cause BEs to crash. #30759
- Failure to update primary key indexes may cause the error "get_applied_rowsets failed". #27488
- The resources occupied by compaction_state_cache are not recycled after compaction task failures. #38499
- If partition columns in external tables contain null values, queries against those tables will cause BEs to crash. #38888
- After a table is dropped and then re-created with the same table name, refreshing asynchronous materialized views created on that table fails. #38008
- Refreshing asynchronous materialized views created on empty Iceberg tables fails. #24068
3.1.7
Release date: January 12, 2024
New Features
- Added a new function, `unnest_bitmap`. #38136
- Supports conditional updates for Broker Load. #37400
Behavior Change
- Added the session variable `enable_materialized_view_for_insert`, which controls whether materialized views rewrite the queries in INSERT INTO SELECT statements. The default value is `false`. #37505
- The FE dynamic parameter `enable_new_publish_mechanism` is changed to a static parameter. You must restart the FE after you modify the parameter settings. #35338
- Added the session variable `enable_strict_order_by`. When this variable is set to the default value `TRUE`, an error is reported for the following query pattern: a duplicate alias is used in different expressions of the query, and this alias is also a sorting field in ORDER BY, for example, `select distinct t1.* from tbl1 t1 order by t1.k1;`. The logic is the same as that in v2.3 and earlier. When this variable is set to `FALSE`, a loose deduplication mechanism is used, which processes such queries as valid SQL queries. #37910
Parameter Change
- Added the FE configuration item `routine_load_unstable_threshold_second`. #36222
- Added the FE configuration item `http_worker_threads_num`, which specifies the number of threads for HTTP server to deal with HTTP requests. The default value is `0`. If the value for this parameter is set to a negative value or `0`, the actual thread number is twice the number of CPU cores. #37530
- Added the BE configuration item `pindex_major_compaction_limit_per_disk` to configure the maximum concurrency of compaction on a disk. This addresses the issue of uneven I/O across disks due to compaction. This issue can cause excessively high I/O for certain disks. The default value is `1`. #36681
- Added session variables `transaction_read_only` and `tx_read_only` to specify the transaction access mode, which are compatible with MySQL versions 5.7.20 and above. #37249
- Added the FE configuration item `default_mv_refresh_immediate`, which specifies whether to immediately refresh the materialized view after the materialized view is created. The default value is `true`. #37093
- Added a new BE configuration item `lake_enable_vertical_compaction_fill_data_cache`, which specifies whether to allow compaction tasks to cache data on local disks in a shared-data cluster. The default value is `false`. #37296
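The sizing rule stated for `http_worker_threads_num` can be written as a short Python sketch. The function `effective_http_worker_threads` is a hypothetical helper that restates the rule from the item above. It is not FE code.

```python
def effective_http_worker_threads(configured: int, cpu_cores: int) -> int:
    """Return the thread count implied by the documented rule."""
    # A negative value or 0 falls back to twice the number of CPU cores.
    if configured <= 0:
        return 2 * cpu_cores
    return configured


print(effective_http_worker_threads(0, cpu_cores=16))   # 32
print(effective_http_worker_threads(-1, cpu_cores=16))  # 32
print(effective_http_worker_threads(48, cpu_cores=16))  # 48
```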
Improvements
- INSERT INTO FILE() SELECT FROM supports reading BINARY-type data from tables and exporting the data to Parquet-formatted files in remote storage. #36797
- Asynchronous materialized views support dynamically setting the `datacache.partition_duration` property, which controls the validity period of the hot data in the data cache. #35681
- When using JDK, the default GC algorithm is G1. #37386
- The `date_trunc`, `adddate`, and `time_slice` functions support setting the `interval` parameter to values that are accurate to the millisecond and microsecond. #36386
- When the string on the right side of the LIKE operator within the WHERE clause does not include `%` or `_`, the LIKE operator is converted into the `=` operator. #37515
- A new field `LatestSourcePosition` is added to the return result of SHOW ROUTINE LOAD to record the position of the latest message in each partition of the Kafka topic, helping check the latencies of data loading. #38298
- Added a new resource group property, `spill_mem_limit_threshold`, to control the memory usage threshold (percentage) at which a resource group triggers the spilling of intermediate results when the system variable `spill_mode` is set to `auto`. The valid range is (0, 1). The default value is `1`, indicating the threshold does not take effect. #37707
- The result returned by the SHOW ROUTINE LOAD statement now includes the timestamps of consumption messages from each partition. #36222
- The scheduling policy for Routine Load is optimized, so that slow tasks do not block the execution of the other normal tasks. #37638
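The LIKE-to-`=` conversion listed above can be pictured with the following Python sketch. It is a simplified model that ignores escape characters and quoting. The function `rewrite_like` is a hypothetical name, and the real rewrite happens inside the query optimizer rather than on SQL text.

```python
def rewrite_like(column: str, pattern: str) -> str:
    """Return an equality predicate when the pattern has no wildcards."""
    # `%` and `_` are the LIKE wildcards; without them LIKE is plain equality.
    if "%" in pattern or "_" in pattern:
        return f"{column} LIKE '{pattern}'"
    return f"{column} = '{pattern}'"


print(rewrite_like("name", "alice"))  # name = 'alice'
print(rewrite_like("name", "al%"))    # name LIKE 'al%'
```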
Bug Fixes
Fixed the following issues:
- The execution of ANALYZE TABLE gets stuck occasionally. #36836
- The memory consumption by PageCache exceeds the threshold specified by the BE dynamic parameter `storage_page_cache_limit` in certain circumstances. #37740
- Hive metadata in Hive catalogs is not automatically refreshed when new fields are added to Hive tables. #37668
- In some cases, `bitmap_to_string` may return incorrect results due to data type overflow. #37405
- Executing the DELETE statement on an empty table returns "ERROR 1064 (HY000): Index: 0, Size: 0". #37461
- When the FE dynamic parameter `enable_sync_publish` is set to `TRUE`, queries on data that is written after the BEs crash and then restart may fail. #37398
- The value of the `TABLE_CATALOG` field in `views` of the StarRocks Information Schema is `null`. #37570
- When `SELECT ... FROM ... INTO OUTFILE` is executed to export data into CSV files, the error "Unmatched number of columns" is reported if the FROM clause contains multiple constants. #38045
3.1.6
Release date: December 18, 2023
New Features
- Added the now(p) function to return the current date and time with the specified fractional seconds precision (accurate to the microsecond). If `p` is not specified, this function returns only date and time accurate to the second. #36676
- Added a new metric `max_tablet_rowset_num` for setting the maximum allowed number of rowsets. This metric helps detect possible compaction issues and thus reduces the occurrences of the error "too many versions". #36539
- Supports obtaining heap profiles by using a command line tool, making troubleshooting easier. #35322
- Supports creating asynchronous materialized views with common table expressions (CTEs). #36142
- Added the following bitmap functions: subdivide_bitmap, bitmap_from_binary, and bitmap_to_binary. #35817 #35621
- Optimized the logic used to compute compaction scores for Primary Key tables, thereby aligning the compaction scores for Primary Key tables within a more consistent range with the other three table types. #36534
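The precision argument of now(p) from the first item can be modeled in Python as follows. The sketch assumes `p` ranges from 0 to 6 and that extra digits are truncated rather than rounded. Both are assumptions, and `format_now` is a hypothetical helper rather than the function's implementation.

```python
from datetime import datetime


def format_now(moment: datetime, p: int = 0) -> str:
    """Format a timestamp with p fractional-second digits (0 means none)."""
    if not 0 <= p <= 6:
        raise ValueError("precision must be between 0 and 6")
    text = moment.strftime("%Y-%m-%d %H:%M:%S")
    if p == 0:
        return text
    # Keep the first p digits of the six-digit microsecond field (truncation assumed).
    fraction = f"{moment.microsecond:06d}"[:p]
    return f"{text}.{fraction}"


fixed = datetime(2023, 12, 18, 10, 30, 45, 123456)
print(format_now(fixed))     # 2023-12-18 10:30:45
print(format_now(fixed, 3))  # 2023-12-18 10:30:45.123
print(format_now(fixed, 6))  # 2023-12-18 10:30:45.123456
```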
Parameter Change
- The default retention period of trash files is changed to 1 day from the original 3 days. #37113
- A new BE configuration item `enable_stream_load_verbose_log` is added. The default value is `false`. With this parameter set to `true`, StarRocks can record the HTTP requests and responses for Stream Load jobs, making troubleshooting easier. #36113
- A new BE configuration item `enable_lazy_delta_column_compaction` is added. The default value is `true`, indicating that StarRocks does not perform frequent compaction operations on delta columns. #36654
- A new FE configuration item `enable_mv_automatic_active_check` is added to control whether the system automatically checks and re-activates the asynchronous materialized views that are set inactive because their base tables (views) had undergone Schema Change or had been dropped and re-created. The default value is `true`. #36463
Improvements
- A new value option `GROUP_CONCAT_LEGACY` is added to the session variable sql_mode to provide compatibility with the implementation logic of the group_concat function in versions earlier than v2.5. #36150
- The Primary Key table size returned by the SHOW DATA statement includes the sizes of .cols files (these are files related to partial column updates and generated columns) and persistent index files. #34898
- Queries on MySQL external tables and the external tables within JDBC catalogs support including keywords in the WHERE clause. #35917
- Plugin loading failures will no longer cause an error or cause an FE start failure. Instead, the FE can properly start, and the error status of the plug-in can be queried using SHOW PLUGINS. #36566
- Dynamic partitioning supports random distribution. #35513
- The result returned by the SHOW ROUTINE LOAD statement provides a new field `OtherMsg`, which shows information about the last failed task. #35806
- The authentication information `aws.s3.access_key` and `aws.s3.access_secret` for AWS S3 in Broker Load jobs are hidden in audit logs. #36571
- The `be_tablets` view in the `information_schema` database provides a new field `INDEX_DISK`, which records the disk usage (measured in bytes) of persistent indexes. #35615
Bug Fixes
Fixed the following issues:
- The BEs crash if users create persistent indexes in the event of data corruption. #30841
- If users create an asynchronous materialized view that contains nested queries, the error "resolve partition column failed" is reported. #26078
- If users create an asynchronous materialized view on a base table whose data is corrupted, the error "Unexpected exception: null" is reported. #30038
- If users run a query that contains a window function, the SQL error "[1064] [42000]: Row count of const column reach limit: 4294967296" is reported. #33561
- The FE performance plunges after the FE configuration item `enable_collect_query_detail_info` is set to `true`. #35945
- In the StarRocks shared-data mode, the error "Reduce your request rate" may be reported when users attempt to delete files from object storage. #35566
- Deadlocks may occur when users refresh materialized views. #35736
- After the DISTINCT window operator pushdown feature is enabled, errors are reported if SELECT DISTINCT operations are performed on the complex expressions of the columns computed by window functions. #36357
- The BEs crash if the source data file is in ORC format and contains nested arrays. #36127
- Some S3-compatible object storage returns duplicate files, causing the BEs to crash. #36103
- The array_distinct function occasionally causes the BEs to crash. #36377
- Global Runtime Filter may cause BEs to crash in certain scenarios. #35776
3.1.5
Release date: November 28, 2023
New Features
- The CN nodes of a StarRocks shared-data cluster now support data export. #34018
Improvements
- The `COLUMNS` view in the system database `INFORMATION_SCHEMA` can display ARRAY, MAP, and STRUCT columns. #33431
- Supports queries against Parquet, ORC, and CSV formatted files that are compressed by using LZO and stored in Hive. #30923 #30721
- Supports updates onto the specified partitions of an automatically partitioned table. If the specified partitions do not exist, an error is returned. #34777
- Supports automatic refresh of materialized views when Swap, Drop, or Schema Change operations are performed on the tables and views (including the other tables and materialized views associated with these views) on which these materialized views are created. #32829
- Optimized the performance of some Bitmap-related operations, including:
Bug Fixes
Fixed the following issues:
- If a filtering condition is specified in a Broker Load job, BEs may crash during the data loading in certain circumstances. #29832
- An unknown error is reported when SHOW GRANTS is executed. #30100
- When data is loaded into a table that uses expression-based automatic partitioning, the error "Error: The row create partition failed since Runtime error: failed to analyse partition value" may be thrown. #33513
- The error "get_applied_rowsets failed, tablet updates is in error state: tablet:18849 actual row size changed after compaction" is returned for queries. #33246
- In a StarRocks shared-nothing cluster, queries against Iceberg or Hive tables may cause BEs to crash. #34682
- In a StarRocks shared-nothing cluster, if multiple partitions are automatically created during data loading, the data loaded may occasionally be written to unmatched partitions. #34731
- Long-time, frequent data loading into a Primary Key table with persistent index enabled may cause BEs to crash. #33220
- The error "Exception: java.lang.IllegalStateException: null" is returned for queries. #33535
- If a query starts while `show proc '/current_queries';` is being executed, BEs may crash. #34316
- Errors may be thrown if large amounts of data are loaded into a Primary Key table with persistent index enabled. #34352
- After StarRocks is upgraded from v2.4 or earlier to a later version, compaction scores may rise unexpectedly. #34618
- If `INFORMATION_SCHEMA` is queried by using the database driver MariaDB ODBC, the `CATALOG_NAME` column returned in the `schemata` view holds only `null` values. #34627
- FEs crash due to the abnormal data loaded and cannot restart. #34590
- If schema changes are being executed while a Stream Load job is in the PREPARED state, a portion of the source data to be loaded by the job is lost. #34381
- Including two or more slashes (`/`) at the end of the HDFS storage path causes the backup and restore of the data from HDFS to fail. #34601
- Setting the session variable `enable_load_profile` to `true` makes Stream Load jobs prone to fail. #34544
- Performing partial updates in column mode onto a Primary Key table causes some tablets of the table to show data inconsistencies between their replicas. #34555
- The `partition_live_number` property added by using the ALTER TABLE statement does not take effect. #34842
- FEs fail to start and report the error "failed to load journal type 118". #34590
- Setting the FE parameter `recover_with_empty_tablet` to `true` may cause FEs to crash. #33071
- Failures in replaying replica operations may cause FEs to crash. #32295
Parameter Change
FE/BE Parameters
- Added an FE configuration item `enable_statistics_collect_profile`, which controls whether to generate profiles for statistics queries. The default value is `false`. #33815
- The FE configuration item `mysql_server_version` is now mutable. The new setting can take effect for the current session without requiring an FE restart. #34033
- Added a BE/CN configuration item `update_compaction_ratio_threshold`, which controls the maximum proportion of data that a compaction can merge for a Primary Key table in a StarRocks shared-data cluster. The default value is `0.5`. We recommend shrinking this value if a single tablet becomes excessively large. For a StarRocks shared-nothing cluster, the proportion of data that a compaction can merge for a Primary Key table is still automatically adjusted. #35129
System Variables
- Added a session variable `cbo_decimal_cast_string_strict`, which controls how the CBO converts data from the DECIMAL type to the STRING type. If this variable is set to `true`, the logic built in v2.5.x and later versions prevails and the system implements strict conversion (namely, the system truncates the generated string and fills 0s based on the scale length). If this variable is set to `false`, the logic built in versions earlier than v2.5.x prevails and the system processes all valid digits to generate a string. The default value is `true`. #34208
- Added a session variable `cbo_eq_base_type`, which specifies the data type used for data comparison between DECIMAL-type data and STRING-type data. The default value is `VARCHAR`, and `DECIMAL` is also a valid value. #34208
- Added a session variable `big_query_profile_second_threshold`. When the session variable `enable_profile` is set to `false` and the amount of time taken by a query exceeds the threshold specified by the `big_query_profile_second_threshold` variable, a profile is generated for that query. #33825
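The interaction between `enable_profile` and `big_query_profile_second_threshold` can be summarized in a small Python sketch. It assumes that `enable_profile = true` produces a profile for every query. It does not model how a zero or unset threshold is treated, because the notes above do not say. The function `should_generate_profile` is a hypothetical helper.

```python
def should_generate_profile(enable_profile: bool,
                            elapsed_seconds: float,
                            threshold_seconds: float) -> bool:
    """Decide whether a query gets a profile under the described rule."""
    if enable_profile:
        # Assumption: profiling is on for all queries in this case.
        return True
    # With profiling off, only queries slower than the threshold get a profile.
    return elapsed_seconds > threshold_seconds


print(should_generate_profile(False, elapsed_seconds=45, threshold_seconds=30))  # True
print(should_generate_profile(False, elapsed_seconds=5, threshold_seconds=30))   # False
```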
3.1.4
Release date: November 2, 2023
New Features
- Supports sort keys for Primary Key tables created in shared-data StarRocks clusters.
- Supports using the str2date function to specify partition expressions for asynchronous materialized views. This helps facilitate incremental updates and query rewrites of asynchronous materialized views created on tables that reside in external catalogs and use the STRING-type data as their partitioning expressions. #29923 #31964
- Added a new session variable `enable_query_tablet_affinity`, which controls whether to direct multiple queries against the same tablet to a fixed replica. This session variable is set to `false` by default. #33049
- Added the utility function `is_role_in_session`, which is used to check whether the specified roles are activated in the current session. It supports checking nested roles granted to a user. #32984
- Supports setting resource group-level query queue, which is controlled by the global variable `enable_group_level_query_queue` (default value: `false`). When the global-level or resource group-level resource consumption reaches a predefined threshold, new queries are placed in queue, and will be run when both the global-level resource consumption and the resource group-level resource consumption fall below their thresholds.
  - Users can set `concurrency_limit` for each resource group to limit the maximum number of concurrent queries allowed per BE.
  - Users can set `max_cpu_cores` for each resource group to limit the maximum CPU consumption allowed per BE.
- Added two parameters, `plan_cpu_cost_range` and `plan_mem_cost_range`, for resource group classifiers.
  - `plan_cpu_cost_range`: the CPU consumption range estimated by the system. The default value `NULL` indicates no limit is imposed.
  - `plan_mem_cost_range`: the memory consumption range estimated by the system. The default value `NULL` indicates no limit is imposed.
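The queuing condition for resource group-level query queues described above can be expressed as a minimal Python sketch. It treats "reaches a threshold" as greater than or equal to, and it uses a single abstract usage number per level. The function `must_queue` and its parameters are hypothetical and do not reflect the scheduler's real inputs.

```python
def must_queue(global_usage: float, global_limit: float,
               group_usage: float, group_limit: float) -> bool:
    """Queue a new query if either level has reached its threshold."""
    return global_usage >= global_limit or group_usage >= group_limit


# Group level is saturated, so the query waits even though the cluster has room.
print(must_queue(global_usage=40, global_limit=100, group_usage=10, group_limit=10))  # True
# Both levels are below their thresholds, so the query can run.
print(must_queue(global_usage=40, global_limit=100, group_usage=3, group_limit=10))   # False
```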