StarRocks version 3.1

3.1.11

Release date: April 28, 2024

Behavior Changes

Users are not allowed to drop views in the system database information_schema using DROP TABLE. #43556
Users are not allowed to specify duplicate keys in the ORDER BY clause when creating a Primary Key table. #43374

Improvements

Queries on Parquet-formatted Iceberg v2 tables support equality deletes.

Bug Fixes

Fixed the following issues:

When a user queries data from an external table in an external catalog, access to this table is denied even when the user has the SELECT privilege on this table. SHOW GRANTS also shows that the user has this privilege. #44061
str_to_map may cause BEs to crash. #43930
When a Routine Load job is running, show proc '/routine_loads' hangs due to a deadlock. #44249
Persistent Index of Primary Key tables may cause BEs to crash due to issues in concurrency control. #43720
The pending_task_run_count displayed on the page of leaderFE_IP:8030 is incorrect. The displayed number is the sum of Pending and Running tasks, not Pending tasks. In addition, the information of the metric refresh_pending cannot be displayed using followerFE_IP:8030. #43052
Querying information_schema.task_runs fails frequently. #43052
Some SQL queries that contain CTEs may encounter the Invalid plan: PhysicalTopNOperator error. #44185

3.1.10 (Yanked)


This version has been taken offline due to privilege issues in querying external tables in external catalogs such as Hive and Iceberg.

Problem: When a user queries data from an external table in an external catalog, access to this table is denied even when the user has the SELECT privilege on this table. SHOW GRANTS also shows that the user has this privilege.
Impact scope: This problem only affects queries on external tables in external catalogs. Other queries are not affected.
Temporary workaround: The query succeeds after the SELECT privilege on this table is granted to the user again. But SHOW GRANTS will return duplicate privilege entries. After an upgrade to v3.1.11, users can run REVOKE to remove one of the privilege entries.

Release date: March 29, 2024

New Features

Primary Key tables support Size-tiered Compaction. #42474
Added a pattern-matching function regexp_extract_all. #42178
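
As a rough illustration of what an "extract all" function returns, the Python sketch below collects one capture group from every match in a string. The helper name extract_all and its argument order (string, pattern, group index) are assumptions made for this example, and Python's re engine may behave differently from the regex engine StarRocks uses.

```python
import re

def extract_all(text: str, pattern: str, group: int) -> list:
    """Return the chosen capture group from every match of pattern in text."""
    return [m.group(group) for m in re.finditer(pattern, text)]

# Collect the value part of every key=value pair.
print(extract_all("k1=v1;k2=v2", r"(\w+)=(\w+)", 2))
```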

Behavior Changes

When null values in JSON data are evaluated with the IS NULL operator, they are treated as NULL values, in line with SQL semantics. For example, SELECT parse_json('{"a": null}') -> 'a' IS NULL now returns true (before this behavior change, it returned false). #42815
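
The before-and-after behavior can be modeled with a short Python sketch. The helper is_null and its legacy flag are hypothetical names used only for illustration; they are not part of StarRocks.

```python
import json

def is_null(json_value, legacy: bool = False) -> bool:
    """Model of IS NULL applied to a value extracted from JSON data."""
    # A JSON null parses to Python None.
    if json_value is None:
        # New behavior: JSON null counts as NULL. Legacy behavior: it did not.
        return not legacy
    return False

value = json.loads('{"a": null}')["a"]
print(is_null(value))               # behavior from v3.1.10 onwards
print(is_null(value, legacy=True))  # behavior before the change
```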

Improvements

When Broker Load is used to load data from ORC files that contain TIMESTAMP-type data, StarRocks supports retaining microseconds in the timestamps when converting the timestamps to match its own DATETIME data type. #42348

Bug Fixes

Fixed the following issues:

In shared-data mode, the garbage collection and thread eviction mechanisms for handling persistent indexes created on Primary Key tables cannot take effect on CN nodes. As a result, obsolete data cannot be deleted. #42241
When users query ORC files by using Hive catalogs, the query results may be incorrect because StarRocks used to read ORC files from Hive based on mapping by position. To resolve this issue, users can set the session variable orc_use_column_names to true, which specifies to read ORC files from Hive based on mapping by column name. #42905
When LDAP authentication for the AD system is adopted, logins without passwords are allowed. #42476
When disk device names end with digits, the values of monitoring metrics remain 0s because the disk device names may be invalid after such digits are removed. #42741
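
The following Python sketch illustrates the failure mode described in the last item above. It is not StarRocks code, and strip_trailing_digits is a hypothetical helper: stripping trailing digits turns a partition name such as sda1 into a valid disk name, but turns device names such as nvme0n1 or md0 into names that do not exist.

```python
def strip_trailing_digits(device: str) -> str:
    """Remove every digit at the end of a device name."""
    return device.rstrip("0123456789")

for name in ("sda1", "nvme0n1", "md0"):
    print(name, "->", strip_trailing_digits(name))
```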

3.1.9

Release date: March 8, 2024

New Features

Cloud-native Primary Key tables in shared-data clusters support Size-tiered Compaction to reduce write I/O amplification for the loading of a large number of small-sized files. #41610
Added the view information_schema.partitions_meta, which records detailed metadata of partitions. #41101
Added the view sys.fe_memory_usage, which records the memory usage for StarRocks. #41083

Behavior Changes

The logic of dynamic partitioning is changed. Now partition columns of the DATE type do not support hour-level data. Note that partition columns of the DATETIME type still support hour-level data. #40328
The user who can refresh materialized views is changed from the root user to the user who creates the materialized views. This change does not affect existing materialized views. #40698
By default, when comparing columns of constant and string types, StarRocks compares them as strings. Users can use the session variable cbo_eq_base_type to adjust the default rule used for the comparison. For example, users can set cbo_eq_base_type to decimal, and StarRocks then compares the columns as numeric values. #41712
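
To see why the comparison base type matters, consider a string value '1.0' compared with the constant 1. The Python sketch below models the two rules; the function equals is hypothetical and only mimics the idea of comparing as strings versus comparing as decimals.

```python
from decimal import Decimal

def equals(column_value: str, constant, base_type: str = "varchar") -> bool:
    """Compare a string value with a constant using the given base type."""
    if base_type == "varchar":
        # Compare as strings: '1.0' and '1' differ.
        return column_value == str(constant)
    if base_type == "decimal":
        # Compare as numeric values: 1.0 and 1 are equal.
        return Decimal(column_value) == Decimal(str(constant))
    raise ValueError(f"unsupported base type: {base_type}")

print(equals("1.0", 1))             # compared as strings
print(equals("1.0", 1, "decimal"))  # compared as numeric values
```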

Improvements

StarRocks supports using the parameter s3_compatible_fs_list to specify which S3-compatible object storage can be accessed via AWS SDK, and supports using the parameter fallback_to_hadoop_fs_list to specify non-S3-compatible object storage that requires access via HDFS Schema (this method necessitates the use of vendor-provided JAR packages). #41612
The compatibility with Trino's SQL statement syntax is optimized to support converting the following functions of Trino: current_catalog, current_schema, to_char, from_hex, to_date, to_timestamp, and index. #41505 #41270 #40838
A new session variable cbo_materialized_view_rewrite_related_mvs_limit is added to control the maximum number of candidate materialized views allowed during query planning. The default value of this session variable is 64. This session variable helps mitigate the excessive resource consumption caused by a large number of candidate materialized views for a query during the query planning. #39829
The agg_type of BITMAP-type columns in an Aggregate table can be set to replace_if_not_null to support updates only to a few columns of the table. #42102
The session variable cbo_eq_base_type is optimized to support specifying the implicit conversion rule applied to the comparison of data that contains both string and numeric data types. By default, such data is compared as strings. #40619
More DATE-type data (for example, "%Y-%m-%e %H:%i") can be recognized to better support partition expressions for Iceberg tables. #40474
The JDBC connector supports the TIME data type. #31940
The path parameter in the SQL statement for creating a file external table supports wildcards (*). However, like the DATA INFILE parameter in the SQL statement for creating a Broker Load job, the path parameter supports using wildcards (*) to match at most one level of directory or file. #40844
A new internal SQL log file is added to record log data related to statistics and materialized views. #40682
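
One reading of the "at most one level" wildcard rule for the path parameter is that a * never crosses a directory separator. The Python sketch below models that reading by matching the pattern segment by segment. The helper matches_one_level is hypothetical, and fnmatchcase also treats ? and [...] as special characters, which the release note does not mention.

```python
from fnmatch import fnmatchcase

def matches_one_level(pattern: str, path: str) -> bool:
    """Match path against pattern so that '*' stays within one path segment."""
    pattern_parts = pattern.strip("/").split("/")
    path_parts = path.strip("/").split("/")
    # A wildcard cannot absorb extra directory levels.
    if len(pattern_parts) != len(path_parts):
        return False
    return all(fnmatchcase(part, pat)
               for pat, part in zip(pattern_parts, path_parts))

print(matches_one_level("/data/*/file.parquet", "/data/2024/file.parquet"))
print(matches_one_level("/data/*", "/data/2024/file.parquet"))
```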

Bug Fixes

Fixed the following issues:

"Analyze Error" is thrown if inconsistent letter cases are assigned to the names or aliases of tables or views queried in the creation of a Hive view. #40921
I/O usage reaches the upper limit if persistent indexes are created on Primary Key tables. #39959
In shared-data clusters, the primary key index directory is deleted every 5 hours. #40745
After a table for which list partitioning is enabled is truncated or its partitions are truncated, queries based on the partitioning keys of the table return no data. #40495
After users manually execute ALTER TABLE COMPACT, the memory usage statistics for compaction operations are abnormal. #41150
During data migration between clusters, if only some columns are updated in column mode, the destination cluster may crash. #40692
The SQL blacklist may not take effect if the submitted SQL statement contains multiple spaces or newline characters. #40457

3.1.8

Release date: February 5, 2024

New Features

StarRocks Community provides the StarRocks Cross-cluster Data Migration Tool, which supports migrating data from a shared-nothing cluster to either another shared-nothing cluster or a shared-data cluster.
Supports creating synchronous materialized views with the WHERE clause specified.
Added metrics that show memory usage of the data cache to MemTracker. #39600

Parameter Change

Added a BE configuration item, lake_pk_compaction_max_input_rowsets, which controls the maximum number of input rowsets allowed in a Primary Key table compaction task in a shared-data StarRocks cluster. This helps optimize resource consumption for compaction tasks. #39611

Improvements

Supports ORDER BY and INDEX clauses in CTAS statements. #38886
Supports equality deletes on ORC-formatted Iceberg v2 tables. #37419
Supports setting the datacache.partition_duration property for cloud-native tables created with the list partitioning strategy. This property controls the validity period of the data cache and can be dynamically configured. #35681 #38509
Optimized the BE configuration item update_compaction_per_tablet_min_interval_seconds. This parameter was originally used only to control the frequency of compaction tasks on Primary Key tables. After the optimization, it can also be used to control the frequency of major compaction tasks on Primary Key table indexes. #39640
Parquet Reader supports converting INT32-type data in Parquet-formatted data to DATETIME-type data and storing the resulting data to StarRocks. #39808

Bug Fixes

Fixed the following issues:

Using NaN (Not a Number) columns as ORDER BY columns may cause BEs to crash. #30759
Failure to update primary key indexes may cause the error "get_applied_rowsets failed". #27488
The resources occupied by compaction_state_cache are not recycled after compaction task failures. #38499
If partition columns in external tables contain null values, queries against those tables will cause BEs to crash. #38888
After a table is dropped and then re-created with the same table name, refreshing asynchronous materialized views created on that table fails. #38008
Refreshing asynchronous materialized views created on empty Iceberg tables fails. #24068

3.1.7

Release date: January 12, 2024

New Features

Added a new function, unnest_bitmap. #38136
Supports conditional updates for Broker Load. #37400

Behavior Change

Added the session variable enable_materialized_view_for_insert, which controls whether materialized views rewrite the queries in INSERT INTO SELECT statements. The default value is false. #37505
The FE dynamic parameter enable_new_publish_mechanism is changed to a static parameter. You must restart the FE after you modify the parameter settings. #35338
Added the session variable enable_strict_order_by. When this variable is set to the default value TRUE, an error is reported if the same alias is used in different expressions of a query and that alias is also a sorting field in ORDER BY, for example, select distinct t1.* from tbl1 t1 order by t1.k1;. This logic is the same as that in v2.3 and earlier. When this variable is set to FALSE, a loose deduplication mechanism is used, which processes such queries as valid SQL queries. #37910

Parameter Change

Added the FE configuration item routine_load_unstable_threshold_second. #36222
Added the FE configuration item http_worker_threads_num, which specifies the number of threads that the HTTP server uses to handle HTTP requests. The default value is 0. If this parameter is set to 0 or a negative value, the actual number of threads is twice the number of CPU cores. #37530
Added the BE configuration item pindex_major_compaction_limit_per_disk to configure the maximum concurrency of compaction on a disk. This addresses the issue of uneven I/O across disks due to compaction. This issue can cause excessively high I/O for certain disks. The default value is 1. #36681
Added session variables transaction_read_only and tx_read_only to specify the transaction access mode, which are compatible with MySQL versions 5.7.20 and above. #37249
Added the FE configuration item default_mv_refresh_immediate, which specifies whether to immediately refresh the materialized view after the materialized view is created. The default value is true. #37093
Added a new BE configuration item lake_enable_vertical_compaction_fill_data_cache, which specifies whether to allow compaction tasks to cache data on local disks in a shared-data cluster. The default value is false. #37296
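
The rule for http_worker_threads_num can be written as a small function. This is a sketch of the rule as stated above, not the FE implementation, and the function name is hypothetical.

```python
def effective_http_worker_threads(configured: int, cpu_cores: int) -> int:
    """Resolve the actual HTTP worker thread count from the configured value."""
    # Zero (the default) or a negative value falls back to 2 x CPU cores.
    if configured <= 0:
        return 2 * cpu_cores
    return configured

print(effective_http_worker_threads(0, 8))   # default setting on an 8-core host
print(effective_http_worker_threads(32, 8))  # explicit setting
```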

Improvements

INSERT INTO FILE() SELECT FROM supports reading BINARY-type data from tables and exporting the data to Parquet-formatted files in remote storage. #36797
Asynchronous materialized views support dynamically setting the datacache.partition_duration property, which controls the validity period of the hot data in the data cache. #35681
When using JDK, the default GC algorithm is G1. #37386
The date_trunc, adddate, and time_slice functions support setting the interval parameter to values that are accurate to the millisecond and microsecond. #36386
When the string on the right side of the LIKE operator within the WHERE clause does not include % or _, the LIKE operator is converted into the = operator. #37515
A new field LatestSourcePosition is added to the return result of SHOW ROUTINE LOAD to record the position of the latest message in each partition of the Kafka topic, helping check the latencies of data loading. #38298
Added a new resource group property, spill_mem_limit_threshold, to control the memory usage threshold (percentage) at which a resource group triggers the spilling of intermediate results when the system variable spill_mode is set to auto. The valid range is (0, 1). The default value is 1, indicating the threshold does not take effect. #37707
The result returned by the SHOW ROUTINE LOAD statement now includes the timestamps of consumption messages from each partition. #36222
The scheduling policy for Routine Load is optimized, so that slow tasks do not block the execution of the other normal tasks. #37638
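
The LIKE-to-equality conversion listed above can be pictured with the following Python sketch. The function rewrite_like is hypothetical, builds SQL text only for display, and ignores escaped wildcards and quoting, which a real optimizer would have to handle.

```python
def rewrite_like(column: str, pattern: str) -> str:
    """Rewrite a LIKE predicate to '=' when the pattern has no wildcard."""
    if "%" in pattern or "_" in pattern:
        # Wildcards present: the LIKE predicate is kept as is.
        return f"{column} LIKE '{pattern}'"
    # No wildcard: LIKE behaves like an equality test.
    return f"{column} = '{pattern}'"

print(rewrite_like("name", "abc"))
print(rewrite_like("name", "a%"))
```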

Bug Fixes

Fixed the following issues:

The execution of ANALYZE TABLE gets stuck occasionally. #36836
The memory consumption by PageCache exceeds the threshold specified by the BE dynamic parameter storage_page_cache_limit in certain circumstances. #37740
Hive metadata in Hive catalogs is not automatically refreshed when new fields are added to Hive tables. #37668
In some cases, bitmap_to_string may return incorrect results due to data type overflow. #37405
Executing the DELETE statement on an empty table returns "ERROR 1064 (HY000): Index: 0, Size: 0". #37461
When the FE dynamic parameter enable_sync_publish is set to TRUE, queries on data that is written after the BEs crash and then restart may fail. #37398
The value of the TABLE_CATALOG field in views of the StarRocks Information Schema is null. #37570
When SELECT ... FROM ... INTO OUTFILE is executed to export data into CSV files, the error "Unmatched number of columns" is reported if the FROM clause contains multiple constants. #38045

3.1.6

Release date: December 18, 2023

New Features

Added the now(p) function to return the current date and time with the specified fractional seconds precision (accurate to the microsecond). If p is not specified, this function returns only date and time accurate to the second. #36676
Added a new metric max_tablet_rowset_num for setting the maximum allowed number of rowsets. This metric helps detect possible compaction issues and thus reduces the occurrences of the error "too many versions". #36539
Supports obtaining heap profiles by using a command line tool, making troubleshooting easier. #35322
Supports creating asynchronous materialized views with common table expressions (CTEs). #36142
Added the following bitmap functions: subdivide_bitmap, bitmap_from_binary, and bitmap_to_binary. #35817 #35621
Optimized the logic used to compute compaction scores for Primary Key tables, thereby aligning the compaction scores for Primary Key tables within a more consistent range with the other three table types. #36534
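
The effect of the precision argument of now(p) can be sketched in Python as follows. The helper format_now is hypothetical, the accepted range 0 to 6 is inferred from "accurate to the microsecond", and truncating (rather than rounding) the fraction is an assumption.

```python
from datetime import datetime

def format_now(ts: datetime, p: int = 0) -> str:
    """Format ts with p digits of fractional seconds (0 means seconds only)."""
    if not 0 <= p <= 6:
        raise ValueError("precision must be between 0 and 6")
    base = ts.strftime("%Y-%m-%d %H:%M:%S")
    if p == 0:
        return base
    # Keep the first p digits of the six-digit microsecond field.
    fraction = f"{ts.microsecond:06d}"[:p]
    return f"{base}.{fraction}"

ts = datetime(2023, 12, 18, 10, 30, 45, 123456)
print(format_now(ts))
print(format_now(ts, 3))
print(format_now(ts, 6))
```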

Parameter Change

The default retention period of trash files is changed to 1 day from the original 3 days. #37113
A new BE configuration item enable_stream_load_verbose_log is added. The default value is false. With this parameter set to true, StarRocks can record the HTTP requests and responses for Stream Load jobs, making troubleshooting easier. #36113
A new BE configuration item enable_lazy_delta_column_compaction is added. The default value is true, indicating that StarRocks does not perform frequent compaction operations on delta columns. #36654
A new FE configuration item enable_mv_automatic_active_check is added to control whether the system automatically checks and re-activates the asynchronous materialized views that are set inactive because their base tables (views) had undergone Schema Change or had been dropped and re-created. The default value is true. #36463

Improvements

A new value option GROUP_CONCAT_LEGACY is added to the session variable sql_mode to provide compatibility with the implementation logic of the group_concat function in versions earlier than v2.5. #36150
The Primary Key table size returned by the SHOW DATA statement includes the sizes of .cols files (these are files related to partial column updates and generated columns) and persistent index files. #34898
Queries on MySQL external tables and the external tables within JDBC catalogs support including keywords in the WHERE clause. #35917
Plugin loading failures will no longer cause an error or cause an FE start failure. Instead, the FE can properly start, and the error status of the plug-in can be queried using SHOW PLUGINS. #36566
Dynamic partitioning supports random distribution. #35513
The result returned by the SHOW ROUTINE LOAD statement provides a new field OtherMsg, which shows information about the last failed task. #35806
The authentication information aws.s3.access_key and aws.s3.access_secret for AWS S3 in Broker Load jobs are hidden in audit logs. #36571
The be_tablets view in the information_schema database provides a new field INDEX_DISK, which records the disk usage (measured in bytes) of persistent indexes. #35615

Bug Fixes

Fixed the following issues:

The BEs crash if users create persistent indexes in the event of data corruption. #30841
If users create an asynchronous materialized view that contains nested queries, the error "resolve partition column failed" is reported. #26078
If users create an asynchronous materialized view on a base table whose data is corrupted, the error "Unexpected exception: null" is reported. #30038
If users run a query that contains a window function, the SQL error "[1064] [42000]: Row count of const column reach limit: 4294967296" is reported. #33561
The FE performance plunges after the FE configuration item enable_collect_query_detail_info is set to true. #35945
In the StarRocks shared-data mode, the error "Reduce your request rate" may be reported when users attempt to delete files from object storage. #35566
Deadlocks may occur when users refresh materialized views. #35736
After the DISTINCT window operator pushdown feature is enabled, errors are reported if SELECT DISTINCT operations are performed on the complex expressions of the columns computed by window functions. #36357
The BEs crash if the source data file is in ORC format and contains nested arrays. #36127
Some S3-compatible object storage returns duplicate files, causing the BEs to crash. #36103
The array_distinct function occasionally causes the BEs to crash. #36377
Global Runtime Filter may cause BEs to crash in certain scenarios. #35776

3.1.5

Release date: November 28, 2023

New Features

The CN nodes of a StarRocks shared-data cluster now support data export. #34018

Improvements

The COLUMNS view in the system database INFORMATION_SCHEMA can display ARRAY, MAP, and STRUCT columns. #33431
Supports queries against Parquet, ORC, and CSV formatted files that are compressed by using LZO and stored in Hive. #30923 #30721
Supports updates onto the specified partitions of an automatically partitioned table. If the specified partitions do not exist, an error is returned. #34777
Supports automatic refresh of materialized views when Swap, Drop, or Schema Change operations are performed on the tables and views (including the other tables and materialized views associated with these views) on which these materialized views are created. #32829
Optimized the performance of some Bitmap-related operations, including:
- Optimized nested loop joins. #340804 #35003
- Optimized the bitmap_xor function. #34069
- Supports Copy on Write to optimize Bitmap performance and reduce memory consumption. #34047

Bug Fixes

Fixed the following issues:

If a filtering condition is specified in a Broker Load job, BEs may crash during the data loading in certain circumstances. #29832
An unknown error is reported when SHOW GRANTS is executed. #30100
When data is loaded into a table that uses expression-based automatic partitioning, the error "Error: The row create partition failed since Runtime error: failed to analyse partition value" may be thrown. #33513
The error "get_applied_rowsets failed, tablet updates is in error state: tablet:18849 actual row size changed after compaction" is returned for queries. #33246
In a StarRocks shared-nothing cluster, queries against Iceberg or Hive tables may cause BEs to crash. #34682
In a StarRocks shared-nothing cluster, if multiple partitions are automatically created during data loading, the data loaded may occasionally be written to unmatched partitions. #34731
Long-time, frequent data loading into a Primary Key table with persistent index enabled may cause BEs to crash. #33220
The error "Exception: java.lang.IllegalStateException: null" is returned for queries. #33535
When show proc '/current_queries'; is being executed and meanwhile a query begins to be executed, BEs may crash. #34316
Errors may be thrown if large amounts of data are loaded into a Primary Key table with persistent index enabled. #34352
After StarRocks is upgraded from v2.4 or earlier to a later version, compaction scores may rise unexpectedly. #34618
If INFORMATION_SCHEMA is queried by using the database driver MariaDB ODBC, the CATALOG_NAME column returned in the schemata view holds only null values. #34627
FEs crash due to the abnormal data loaded and cannot restart. #34590
If schema changes are being executed while a Stream Load job is in the PREPARED state, a portion of the source data to be loaded by the job is lost. #34381
Including two or more slashes (/) at the end of the HDFS storage path causes the backup and restore of the data from HDFS to fail. #34601
Setting the session variable enable_load_profile to true makes Stream Load jobs prone to fail. #34544
Performing partial updates in column mode onto a Primary Key table causes some tablets of the table to show data inconsistencies between their replicas. #34555
The partition_live_number property added by using the ALTER TABLE statement does not take effect. #34842
FEs fail to start and report the error "failed to load journal type 118". #34590
Setting the FE parameter recover_with_empty_tablet to true may cause FEs to crash. #33071
Failures in replaying replica operations may cause FEs to crash. #32295

Parameter Change

FE/BE Parameters

Added an FE configuration item enable_statistics_collect_profile, which controls whether to generate profiles for statistics queries. The default value is false. #33815
The FE configuration item mysql_server_version is now mutable. The new setting can take effect for the current session without requiring an FE restart. #34033
Added a BE/CN configuration item update_compaction_ratio_threshold, which controls the maximum proportion of data that a compaction can merge for a Primary Key table in a StarRocks shared-data cluster. The default value is 0.5. We recommend shrinking this value if a single tablet becomes excessively large. For a StarRocks shared-nothing cluster, the proportion of data that a compaction can merge for a Primary Key table is still automatically adjusted. #35129

System Variables

Added a session variable cbo_decimal_cast_string_strict, which controls how the CBO converts data from the DECIMAL type to the STRING type. If this variable is set to true, the logic built in v2.5.x and later versions prevails and the system implements strict conversion (namely, the system truncates the generated string and fills 0s based on the scale length). If this variable is set to false, the logic built in versions earlier than v2.5.x prevails and the system processes all valid digits to generate a string. The default value is true. #34208
Added a session variable cbo_eq_base_type, which specifies the data type used for data comparison between DECIMAL-type data and STRING-type data. The default value is VARCHAR, and DECIMAL is also a valid value. #34208
Added a session variable big_query_profile_second_threshold. When the session variable enable_profile is set to false and the amount of time taken by a query exceeds the threshold specified by the big_query_profile_second_threshold variable, a profile is generated for that query. #33825
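
The interaction between enable_profile and big_query_profile_second_threshold can be summarized in a short sketch. The function below is hypothetical, and treating a non-positive threshold as "disabled" is an assumption that the release note does not state.

```python
def should_generate_profile(enable_profile: bool,
                            elapsed_seconds: float,
                            threshold_seconds: float) -> bool:
    """Decide whether a profile is generated for a finished query."""
    if enable_profile:
        # Profiling is switched on for the session: always generate.
        return True
    # Assumption: a non-positive threshold disables the big-query rule.
    return threshold_seconds > 0 and elapsed_seconds > threshold_seconds

print(should_generate_profile(False, 45.0, 30))  # slow query, profiling off
print(should_generate_profile(False, 5.0, 30))   # fast query, profiling off
```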

3.1.4

Release date: November 2, 2023

New Features

Supports sort keys for Primary Key tables created in shared-data StarRocks clusters.
Supports using the str2date function to specify partition expressions for asynchronous materialized views. This helps facilitate incremental updates and query rewrites of asynchronous materialized views created on tables that reside in external catalogs and use the STRING-type data as their partitioning expressions. #29923 #31964
Added a new session variable enable_query_tablet_affinity, which controls whether to direct multiple queries against the same tablet to a fixed replica. This session variable is set to false by default. #33049
Added the utility function is_role_in_session, which is used to check whether the specified roles are activated in the current session. It supports checking nested roles granted to a user. #32984
Supports setting resource group-level query queue, which is controlled by the global variable enable_group_level_query_queue (default value: false). When the global-level or resource group-level resource consumption reaches a predefined threshold, new queries are placed in queue, and will be run when both the global-level resource consumption and the resource group-level resource consumption fall below their thresholds.
- Users can set concurrency_limit for each resource group to limit the maximum number of concurrent queries allowed per BE.
- Users can set max_cpu_cores for each resource group to limit the maximum CPU consumption allowed per BE.
Added two parameters, plan_cpu_cost_range and plan_mem_cost_range, for resource group classifiers.
- plan_cpu_cost_range: the CPU consumption range estimated by the system. The default value NULL indicates no limit is imposed.
- plan_mem_cost_range: the memory consumption range estimated by the system. The default value NULL indicates no limit is imposed.
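
The admission rule for the resource group-level query queue can be expressed as a single predicate: a query runs only when both the global-level and the resource group-level consumption are below their thresholds, and is queued otherwise. The Python sketch below is a simplified model with hypothetical names; it uses one number per level, whereas the real system tracks several resources.

```python
def can_run(global_usage: int, global_limit: int,
            group_usage: int, group_limit: int) -> bool:
    """Return True if a new query may start, False if it must wait in queue."""
    # Reaching either threshold sends the query to the queue.
    return global_usage < global_limit and group_usage < group_limit

print(can_run(3, 10, 1, 2))   # both levels below their thresholds
print(can_run(3, 10, 2, 2))   # resource group threshold reached
print(can_run(10, 10, 0, 2))  # global threshold reached
```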

Improvements

Window functions COVAR_SAMP, COVAR_POP, CORR, VARIANCE, VAR_SAMP, STD, and STDDEV_SAMP now support the ORDER BY clause and Window clause. #30786
An error instead of NULL is returned if a decimal overflow occurs during queries on the DECIMAL type data. #30419
The number of concurrent queries allowed in a query queue is now managed by the leader FE. Each follower FE notifies the leader FE when a query starts and finishes. If the number of concurrent queries reaches the global-level or resource group-level concurrency_limit, new queries are rejected or placed in queue.

Bug Fixes

Fixed the following issues:

Spark or Flink may report data read errors due to inaccurate memory usage statistics. #30702 #30751
Memory usage statistics for Metadata Cache are inaccurate. #31978
BEs crash when libcurl is invoked. #31667
When StarRocks materialized views created on Hive views are refreshed, an error "java.lang.ClassCastException: com.starrocks.catalog.HiveView cannot be cast to com.starrocks.catalog.HiveMetaStoreTable" is returned. #31004
If the ORDER BY clause contains aggregate functions, an error "java.lang.IllegalStateException: null" is returned. #30108
In shared-data StarRocks clusters, the information of table keys is not recorded in information_schema.COLUMNS. As a result, DELETE operations cannot be performed when data is loaded by using Flink Connector. #31458
When data is loaded by using Flink Connector, the load job is suspended unexpectedly if there are highly concurrent load jobs and both the number of HTTP threads and the number of Scan threads have reached their upper limits. #32251
When a field of only a few bytes is added, executing SELECT COUNT(*) before the data change finishes returns an error that reads "error: invalid field name". #33243
Query results are incorrect after the query cache is enabled. #32781
Queries fail during hash joins, causing BEs to crash. #32219
DATA_TYPE and COLUMN_TYPE for BINARY or VARBINARY data types are displayed as unknown in the information_schema.columns view. #32678

Behavior Change

From v3.1.4 onwards, persistent indexing is enabled by default for Primary Key tables created in new StarRocks clusters (this does not apply to existing StarRocks clusters whose versions are upgraded to v3.1.4 from an earlier version). #33374
Added a new FE parameter enable_sync_publish, which is set to true by default. When this parameter is set to true, the Publish phase of a data load into a Primary Key table returns the execution result only after the Apply task finishes. As such, the data loaded can be queried immediately after the load job returns a success message. However, setting this parameter to true may cause data loads into Primary Key tables to take a longer time. (Before this parameter was added, the Apply task was asynchronous with the Publish phase.) #27055

3.1.3

Release date: September 25, 2023

New Features

Primary Key tables created in shared-data StarRocks clusters support index persistence onto local disks in the same way as they do in shared-nothing StarRocks clusters.
The aggregate function group_concat supports the DISTINCT keyword and the ORDER BY clause. #28778
Stream Load, Broker Load, Kafka Connector, Flink Connector, and Spark Connector support partial updates in column mode on a Primary Key table. #28288
Data in partitions can be automatically cooled down over time. (This feature is not supported for list partitioning.) #29335 #29393

Improvements

Executing SQL commands with invalid comments now returns results consistent with MySQL. #30210

Bug Fixes

Fixed the following issues:

If the BITMAP or HLL data type is specified in the WHERE clause of a DELETE statement to be executed, the statement cannot be properly executed. #28592
After a follower FE is restarted, CpuCores statistics are not up-to-date, resulting in query performance degradation. #28472 #30434
The execution cost of the to_bitmap() function is incorrectly calculated. As a result, an inappropriate execution plan is selected for the function after materialized views are rewritten. #29961
In certain use cases of the shared-data architecture, after a follower FE is restarted, queries submitted to the follower FE return an error that reads "Backend node not found. Check if any backend node is down". #28615
If data is continuously loaded into a table that is being altered by using the ALTER TABLE statement, an error "Tablet is in error state" may be thrown. #29364
Modifying the FE dynamic parameter max_broker_load_job_concurrency using the ADMIN SET FRONTEND CONFIG command does not take effect. #29964 #29720
BEs crash if the time unit in the date_diff() function is a constant but the dates are not constants. #29937
In the shared-data architecture, automatic partitioning does not take effect after asynchronous load is enabled. #29986
If users create a Primary Key table by using the CREATE TABLE LIKE statement, an error "Unexpected exception: Unknown properties: {persistent_index_type=LOCAL}" is thrown. #30255
Restoring Primary Key tables causes metadata inconsistency after BEs are restarted. #30135
If users load data into a Primary Key table on which truncate operations and queries are concurrently performed, an error "java.lang.NullPointerException" is thrown in certain cases. #30573
If predicate expressions are specified in materialized view creation statements, the refresh results of those materialized views are incorrect. #29904
After users upgrade their StarRocks cluster to v3.1.2, the storage volume properties of the tables created before the upgrade are reset to null. #30647
If checkpointing and restoration are concurrently performed on tablet metadata, some tablet replicas will be lost and cannot be retrieved. #30603
If users use CloudCanal to load data into table columns that are set to NOT NULL but have no default value specified, an error "Unsupported dataFormat value is : \N" is thrown. #30799

Behavior Change

When using the group_concat function, users must use the SEPARATOR keyword to declare the separator.
The default value of the session variable group_concat_max_len, which controls the maximum length of the string returned by the group_concat function, is changed from unlimited to 1024.
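
The two changes above can be illustrated with a minimal sketch. The table employees and its columns dept and name are hypothetical, and the value 4096 is only an example.

```sql
-- Hypothetical table: employees(dept VARCHAR, name VARCHAR).
-- From v3.1.3 the separator must be introduced with the SEPARATOR keyword.
-- DISTINCT and ORDER BY are the additions listed under New Features.
SELECT dept,
       group_concat(DISTINCT name ORDER BY name SEPARATOR ', ') AS members
FROM employees
GROUP BY dept;

-- The returned string is now limited to 1024 by default.
-- Raise the limit for the current session if longer results are expected.
SET group_concat_max_len = 4096;
```

Queries written for earlier versions that pass the separator as a trailing argument should be reviewed, because that argument is no longer interpreted as the separator.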

3.1.2

Release date: August 25, 2023

Bug Fixes

Fixed the following issues:

If a user specifies a default database when connecting and has privileges only on tables in that database but not on the database itself, an error stating that the user does not have permissions on the database is thrown. #29767
The values returned by the RESTful API action show_data for cloud-native tables are incorrect. #29473
BEs crash if queries are canceled while the array_agg() function is being run. #29400
The Default field values returned by the SHOW FULL COLUMNS statement for columns of the BITMAP or HLL data type are incorrect. #29510
If the array_map() function in queries involves multiple tables, the queries fail due to pushdown strategy issues. #29504
Queries against ORC-formatted files fail because the bugfix ORC-1304 (apache/orc#1299) from Apache ORC is not merged. #29804

Behavior Change

For a newly deployed StarRocks v3.1 cluster, you must have the USAGE privilege on the destination external catalog if you want to run SET CATALOG to switch to that catalog. You can use GRANT to grant the required privileges.

For a v3.1 cluster upgraded from an earlier version, you can run SET CATALOG with inherited privilege.
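
As a rough sketch of this behavior change on a newly deployed cluster, the statements below grant the privilege and then switch catalogs. The catalog name hive_catalog, the role analyst_role, and the user jack are assumptions for illustration.

```sql
-- Run as a user who is allowed to grant privileges.
-- hive_catalog, analyst_role, and 'jack'@'%' are hypothetical names.
GRANT USAGE ON CATALOG hive_catalog TO ROLE analyst_role;
GRANT USAGE ON CATALOG hive_catalog TO USER 'jack'@'%';

-- Run as the grantee: switching succeeds only with USAGE on the catalog.
SET CATALOG hive_catalog;
```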

3.1.1

Release date: August 18, 2023

New Features

Supports Azure Blob Storage for shared-data clusters.
Supports List partitioning for shared-data clusters.
Supports aggregate functions COVAR_SAMP, COVAR_POP, and CORR.
Supports the following window functions: COVAR_SAMP, COVAR_POP, CORR, VARIANCE, VAR_SAMP, STD, and STDDEV_SAMP.
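
A minimal sketch of the new statistical functions, used first as aggregate functions and then as window functions. The table samples and its columns grp, x, and y are hypothetical.

```sql
-- Hypothetical table: samples(grp INT, x DOUBLE, y DOUBLE).

-- Aggregate form: one row per group.
SELECT grp,
       covar_samp(x, y) AS sample_covariance,
       covar_pop(x, y)  AS population_covariance,
       corr(x, y)       AS correlation
FROM samples
GROUP BY grp;

-- Window form: the per-group value is attached to every row.
SELECT grp, x, y,
       corr(x, y)  OVER (PARTITION BY grp) AS correlation,
       var_samp(x) OVER (PARTITION BY grp) AS x_sample_variance
FROM samples;
```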

Improvements

Supports implicit conversions for all compound predicates and for all expressions in the WHERE clause. You can enable or disable implicit conversions by using the session variable enable_strict_type. The default value of this session variable is false.

Bug Fixes

Fixed the following issues:

When data is loaded into tables that have multiple replicas, a large number of invalid log records are written if some partitions of the tables are empty. #28824
Inaccurate estimation of average row size causes partial updates in column mode on Primary Key tables to occupy excessively large memory. #27485
If clone operations are triggered on tablets in an ERROR state, disk usage increases. #28488
Compaction causes cold data to be written to the local cache. #28831

3.1.0

Release date: August 7, 2023

New Features

Shared-data cluster

Added support for Primary Key tables, on which persistent indexes cannot be enabled.
Supports the AUTO_INCREMENT column attribute, which enables a globally unique ID for each data row and thus simplifies data management.
Supports automatically creating partitions during loading and using partitioning expressions to define partitioning rules, thereby making partition creation easier to use and more flexible.
Supports abstraction of storage volumes, in which users can configure storage location and authentication information, in shared-data StarRocks clusters. Users can directly reference an existing storage volume when creating a database or table, making authentication configuration easier.
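
The storage volume workflow might look like the following sketch for an S3 bucket with access-key authentication. The volume name, bucket path, region, endpoint, and table are assumptions, and the credentials are placeholders.

```sql
-- my_s3_volume and the bucket path are hypothetical.
CREATE STORAGE VOLUME my_s3_volume
TYPE = S3
LOCATIONS = ("s3://my-bucket/starrocks")
PROPERTIES (
    "enabled" = "true",
    "aws.s3.region" = "us-west-2",
    "aws.s3.endpoint" = "https://s3.us-west-2.amazonaws.com",
    "aws.s3.use_aws_sdk_default_behavior" = "false",
    "aws.s3.use_instance_profile" = "false",
    "aws.s3.access_key" = "<access_key>",
    "aws.s3.secret_key" = "<secret_key>"
);

-- Inspect the volume, then make it the default for new databases and tables.
DESC STORAGE VOLUME my_s3_volume;
SET my_s3_volume AS DEFAULT STORAGE VOLUME;

-- A table can also reference the volume explicitly.
-- row_id uses the AUTO_INCREMENT attribute listed above.
CREATE TABLE orders (
    order_id BIGINT NOT NULL,
    row_id   BIGINT NOT NULL AUTO_INCREMENT,
    status   VARCHAR(20)
)
PRIMARY KEY (order_id)
DISTRIBUTED BY HASH(order_id)
PROPERTIES ("storage_volume" = "my_s3_volume");
```

Because the authentication details live in the storage volume, they do not have to be repeated in each table definition.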

Data Lake analytics

Supports accessing views created on tables within Hive catalogs.
Supports accessing Parquet-formatted Iceberg v2 tables.
Supports sinking data to Parquet-formatted Iceberg tables.
[Preview] Supports accessing data stored in Elasticsearch by using Elasticsearch catalogs. This simplifies the creation of Elasticsearch external tables.
[Preview] Supports performing analytics on streaming data stored in Apache Paimon by using Paimon catalogs.

Storage engine, data ingestion, and query

Upgraded automatic partitioning to expression partitioning. Users only need to use a simple partition expression (either a time function expression or a column expression) to specify a partitioning method at table creation, and StarRocks will automatically create partitions based on the data characteristics and the rule defined in the partition expression during data loading. This method of partition creation is suitable for most scenarios and is more flexible and user-friendly.
Supports list partitioning. Data is partitioned based on a list of values predefined for a particular column, which can accelerate queries and manage clearly categorized data more efficiently.
Added a new table named loads to the information_schema database. Users can query the results of Broker Load and Insert jobs from the loads table.
Supports logging the unqualified data rows that are filtered out by Stream Load, Broker Load, and Spark Load jobs. Users can use the log_rejected_record_num parameter in their load job to specify the maximum number of data rows that can be logged.
Supports random bucketing. With this feature, users do not need to configure bucketing columns at table creation, and StarRocks randomly distributes the loaded data across buckets. Used together with the automatic setting of the number of buckets (BUCKETS), which StarRocks has provided since v2.5.7, this feature means that users no longer need to consider bucket configurations, and table creation statements are greatly simplified. For large data volumes and performance-critical scenarios, however, we recommend that users continue using hash bucketing, because it allows bucket pruning to accelerate queries.
Supports using the table function FILES() in INSERT INTO to directly load the data of Parquet- or ORC-formatted data files stored in AWS S3. The FILES() function can automatically infer the table schema, which removes the need to create external catalogs or file external tables before data loading and therefore greatly simplifies the data loading process.
Supports generated columns. With the generated column feature, StarRocks can automatically generate and store the values of column expressions and automatically rewrite queries to improve query performance.
Supports loading data from Spark to StarRocks by using Spark connector. Compared to Spark Load, the Spark connector provides more comprehensive capabilities. Users can define a Spark job to perform ETL operations on the data, and the Spark connector serves as the sink in the Spark job.
Supports loading data into columns of the MAP and STRUCT data types, and supports nesting Fast Decimal values in ARRAY, MAP, and STRUCT.
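
Several of the features in this list can be sketched together. All table names, column names, the S3 path, and the region below are hypothetical, and the credentials are placeholders.

```sql
-- Expression partitioning plus random bucketing:
-- one partition per day is created automatically during loading,
-- and no bucketing column has to be chosen.
CREATE TABLE site_access (
    event_day DATETIME NOT NULL,
    site_id   INT,
    city_code VARCHAR(100),
    pv        BIGINT
)
DUPLICATE KEY (event_day, site_id)
PARTITION BY date_trunc('day', event_day)
DISTRIBUTED BY RANDOM;

-- List partitioning: rows are placed according to predefined values.
CREATE TABLE recharge_detail (
    id      BIGINT,
    user_id BIGINT,
    city    VARCHAR(20) NOT NULL
)
DUPLICATE KEY (id)
PARTITION BY LIST (city) (
    PARTITION pCalifornia VALUES IN ("Los Angeles", "San Francisco", "San Diego"),
    PARTITION pTexas      VALUES IN ("Houston", "Dallas", "Austin")
)
DISTRIBUTED BY HASH(id);

-- Generated column: avg_value is computed from data_array and stored.
CREATE TABLE readings (
    id         INT NOT NULL,
    data_array ARRAY<INT> NOT NULL,
    avg_value  DOUBLE NULL AS array_avg(data_array)
)
PRIMARY KEY (id)
DISTRIBUTED BY HASH(id);

-- Load a Parquet file from AWS S3 without an external catalog or external table.
INSERT INTO site_access
SELECT * FROM FILES(
    "path" = "s3://my-bucket/path/to/site_access.parquet",
    "format" = "parquet",
    "aws.s3.access_key" = "<access_key>",
    "aws.s3.secret_key" = "<secret_key>",
    "aws.s3.region" = "us-west-2"
);
```

The SELECT * form assumes that the file's columns line up with the target table. List the columns explicitly when they do not.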

SQL reference

Added the following storage volume-related statements: CREATE STORAGE VOLUME, ALTER STORAGE VOLUME, DROP STORAGE VOLUME, SET DEFAULT STORAGE VOLUME, DESC STORAGE VOLUME, SHOW STORAGE VOLUMES.
Supports altering table comments using ALTER TABLE. #21035
Added the following functions:
- Struct functions: struct (row), named_struct
- Map functions: str_to_map, map_concat, map_from_arrays, element_at, distinct_map_keys, cardinality
- Higher-order Map functions: map_filter, map_apply, transform_keys, transform_values
- Array functions: array_agg supports ORDER BY, array_generate, element_at, cardinality
- Higher-order Array functions: all_match, any_match
- Aggregate functions: min_by, percentile_disc
- Table functions: FILES, generate_series
- Date functions: next_day, previous_day, last_day, makedate, date_diff
- Bitmap functions: bitmap_subset_limit, bitmap_subset_in_range
- Math functions: cosine_similarity, cosine_similarity_norm
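
A few of the functions listed above, shown as standalone queries. The literal values are arbitrary examples, and results are not shown because they depend on the inputs chosen.

```sql
-- Struct and Map construction.
SELECT named_struct('id', 1, 'name', 'alice');
SELECT map_from_arrays([1, 2], ['a', 'b']);
SELECT str_to_map('a:1,b:2', ',', ':');

-- Array and table functions.
SELECT array_generate(1, 5);
SELECT * FROM TABLE(generate_series(1, 5));

-- Date functions.
SELECT next_day('2023-08-07', 'Monday');
SELECT last_day('2023-08-07');
SELECT date_diff('day', '2023-08-18 00:00:00', '2023-08-07 00:00:00');
```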

Privileges and security

Added privilege items related to storage volumes and privilege items related to external catalogs, and supports using GRANT and REVOKE to grant and revoke these privileges.

Improvements

Shared-data cluster

Optimized the data cache in shared-data StarRocks clusters. The optimized data cache allows for specifying the range of hot data. It can also prevent queries against cold data from occupying the local disk cache, thereby ensuring the performance of queries against hot data.

Materialized view

Optimized the creation of an asynchronous materialized view:
- Supports random bucketing. If users do not specify bucketing columns, StarRocks adopts random bucketing by default.
- Supports using ORDER BY to specify a sort key.
- Supports specifying attributes such as colocate_group, storage_medium, and storage_cooldown_time.
- Supports using session variables. Users can configure these variables by using the properties("session.<variable_name>" = "<value>") syntax to flexibly adjust view refreshing strategies.
- Enables the spill feature for all asynchronous materialized views and implements a query timeout duration of 1 hour by default.
- Supports creating materialized views based on views. This makes materialized views easier to use in data modeling scenarios, because users can flexibly use views and materialized views based on their varying needs to implement layered modeling.
Optimized query rewrite with asynchronous materialized views:
- Supports Stale Rewrite, which allows materialized views that are not refreshed within a specified time interval to be used for query rewrite regardless of whether the base tables of the materialized views are updated. Users can specify the time interval by using the mv_rewrite_staleness_second property at materialized view creation.
- Supports rewriting View Delta Join queries against materialized views that are created on Hive catalog tables (a primary key and a foreign key must be defined).
- Optimized the mechanism for rewriting queries that contain union operations, and supports rewriting queries that contain joins or functions such as COUNT DISTINCT and time_slice.
Optimized the refreshing of asynchronous materialized views:
- Optimized the mechanism for refreshing materialized views that are created on Hive catalog tables. StarRocks now can perceive partition-level data changes, and refreshes only the partitions with data changes during each automatic refresh.
- Supports using the REFRESH MATERIALIZED VIEW WITH SYNC MODE syntax to synchronously invoke materialized view refresh tasks.
Enhanced the use of asynchronous materialized views:
- Supports using ALTER MATERIALIZED VIEW {ACTIVE | INACTIVE} to enable or disable a materialized view. Materialized views that are disabled (in the INACTIVE state) cannot be refreshed or used for query rewrite, but can be directly queried.
- Supports using ALTER MATERIALIZED VIEW SWAP WITH to swap two materialized views. Users can create a new materialized view and then perform an atomic swap with an existing materialized view to implement schema changes on the existing materialized view.
Optimized synchronous materialized views:
- Supports direct queries against synchronous materialized views by using the SQL hint [_SYNC_MV_], which provides a workaround for the rare cases in which queries cannot be properly rewritten.
- Supports more expressions, such as CASE-WHEN, CAST, and mathematical operations, which make materialized views suitable for more business scenarios.
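
The materialized view improvements above can be sketched as follows. The base table site_access, the view names, and the property values are assumptions chosen for illustration.

```sql
-- No bucketing columns are specified, so random bucketing is used.
-- The property values (1 hour of staleness, spill enabled) are example settings.
CREATE MATERIALIZED VIEW daily_pv_mv
REFRESH ASYNC
PROPERTIES (
    "mv_rewrite_staleness_second" = "3600",
    "session.enable_spill" = "true"
)
AS
SELECT date_trunc('day', event_day) AS dt, site_id, sum(pv) AS total_pv
FROM site_access
GROUP BY date_trunc('day', event_day), site_id;

-- Wait for the refresh task to finish instead of returning immediately.
REFRESH MATERIALIZED VIEW daily_pv_mv WITH SYNC MODE;

-- Disable and re-enable the view. While INACTIVE it can still be queried directly.
ALTER MATERIALIZED VIEW daily_pv_mv INACTIVE;
ALTER MATERIALIZED VIEW daily_pv_mv ACTIVE;

-- Atomically swap in a replacement view (daily_pv_mv_v2 is assumed to exist).
ALTER MATERIALIZED VIEW daily_pv_mv SWAP WITH daily_pv_mv_v2;

-- Query a synchronous materialized view directly (sync_pv_mv is hypothetical).
SELECT * FROM sync_pv_mv [_SYNC_MV_];
```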

Data Lake analytics

Optimized metadata caching and access for Iceberg to improve Iceberg data query performance.
Optimized the data cache to further improve data lake analytics performance.

Storage engine, data ingestion, and query

Announced the general availability of the spill feature, which supports spilling the intermediate computation results of some blocking operators to disk. With the spill feature enabled, when a query contains aggregate, sort, or join operators, StarRocks can cache the intermediate computation results of the operators to disk to reduce memory consumption, thereby minimizing query failures caused by memory limits.
Supports pruning on cardinality-preserving joins. If users maintain a large number of tables that are organized in the star schema (for example, SSB) or the snowflake schema (for example, TPC-H) but query only a small number of these tables, this feature helps prune unnecessary tables to improve the performance of joins.
Supports partial updates in column mode. Users can enable the column mode when they perform partial updates on Primary Key tables by using the UPDATE statement. The column mode is suitable for updating a small number of columns but a large number of rows, and can improve the updating performance by up to 10 times.
Optimized the collection of statistics for the CBO. This reduces the impact of statistics collection on data ingestion and increases statistics collection performance.
Optimized the merge algorithm to increase the overall performance by up to 2 times in permutation scenarios.
Optimized the query logic to reduce dependency on database locks.
Dynamic partitioning now supports year as the partitioning unit. #28386
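
As a sketch of how two of these improvements are switched on for a session, the statements below enable spilling and run a column-mode partial update. The orders table and its columns are hypothetical.

```sql
-- Let blocking operators (aggregate, sort, join) spill intermediate results to disk.
SET enable_spill = true;
SET spill_mode = 'auto';

-- Use column mode for partial updates issued through UPDATE.
-- It suits statements that touch few columns but many rows.
SET partial_update_mode = 'column';
UPDATE orders
SET status = 'archived'
WHERE order_date < '2023-01-01';
```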

SQL reference

Conditional functions case, coalesce, if, ifnull, and nullif support the ARRAY, MAP, STRUCT, and JSON data types.
The following Array functions support nested types MAP, STRUCT, and ARRAY:
- array_agg
- array_contains, array_contains_all, array_contains_any
- array_slice, array_concat
- array_length, array_append, array_remove, array_position
- reverse, array_distinct, array_intersect, arrays_overlap
- array_sortby
The following Array functions support the Fast Decimal data type:
- array_agg
- array_append, array_remove, array_position, array_contains
- array_length
- array_max, array_min, array_sum, array_avg
- arrays_overlap, array_difference
- array_slice, array_distinct, array_sort, reverse, array_intersect, array_concat
- array_sortby, array_contains_all, array_contains_any

Bug Fixes

Fixed the following issues:

Requests to reconnect to Kafka for Routine Load jobs cannot be properly processed. #23477
For SQL queries that involve multiple tables and contain a WHERE clause, if these SQL queries have the same semantics but the order of the tables in each SQL query is different, some of these SQL queries may fail to be rewritten to benefit from the related materialized views. #22875
Duplicate records are returned for queries that contain a GROUP BY clause. #19640
Invoking the lead() or lag() function may cause BE crashes. #22945
Rewriting partial partition queries based on materialized views that are created on external catalog tables fails. #19011
SQL statements that contain both a backslash (\) and a semicolon (;) cannot be properly parsed. #16552
A table cannot be truncated if a materialized view created on the table is removed. #19802

Behavior Change

The storage_cache_ttl parameter is deleted from the table creation syntax used for shared-data StarRocks clusters. Now the data in the local cache is evicted based on the LRU algorithm.
The BE configuration items disable_storage_page_cache and alter_tablet_worker_count and the FE configuration item lake_compaction_max_tasks are changed from immutable parameters to mutable parameters.
The default value of the BE configuration item block_cache_checksum_enable is changed from true to false.
The default value of the BE configuration item enable_new_load_on_memory_limit_exceeded is changed from false to true.
The default value of the FE configuration item max_running_txn_num_per_db is changed from 100 to 1000.
The default value of the FE configuration item http_max_header_size is changed from 8192 to 32768.
The default value of the FE configuration item tablet_create_timeout_second is changed from 1 to 10.
The default value of the FE configuration item max_routine_load_task_num_per_be is changed from 5 to 16, and error information will be returned if a large number of Routine Load tasks are created.
The FE configuration item quorom_publish_wait_time_ms is renamed as quorum_publish_wait_time_ms, and the FE configuration item async_load_task_pool_size is renamed as max_broker_load_job_concurrency.
The BE configuration item routine_load_thread_pool_size is deprecated. Now the routine load thread pool size per BE node is controlled only by the FE configuration item max_routine_load_task_num_per_be.
The BE configuration item txn_commit_rpc_timeout_ms and the system variable tx_visible_wait_timeout are deprecated.
The FE configuration items max_broker_concurrency and load_parallel_instance_num are deprecated.
The FE configuration item max_routine_load_job_num is deprecated. Now StarRocks dynamically infers the maximum number of Routine Load tasks supported by each individual BE node based on the max_routine_load_task_num_per_be parameter and provides suggestions on task failures.
The CN configuration item thrift_port is renamed as be_port.
Two new Routine Load job properties, task_consume_second and task_timeout_second, are added to control the maximum amount of time to consume data and the timeout duration for individual load tasks within a Routine Load job, making job adjustment more flexible. If users do not specify these two properties in their Routine Load job, the FE configuration items routine_load_task_consume_second and routine_load_task_timeout_second prevail.
The session variable enable_resource_group is deprecated because the Resource Group feature has been enabled by default since v3.1.0.
Two new reserved keywords, COMPACTION and TEXT, are added.
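
Two of the changes above, sketched under assumed names. The database, job, table, topic, broker address, and numeric values are illustrative only.

```sql
-- The FE item formerly named async_load_task_pool_size is now set under its new name.
ADMIN SET FRONTEND CONFIG ("max_broker_load_job_concurrency" = "10");

-- Per-job overrides for the Routine Load task consume time and timeout.
-- If omitted, the FE items routine_load_task_consume_second and
-- routine_load_task_timeout_second apply.
CREATE ROUTINE LOAD example_db.orders_load ON orders
COLUMNS TERMINATED BY ","
PROPERTIES (
    "task_consume_second" = "15",
    "task_timeout_second" = "60"
)
FROM KAFKA (
    "kafka_broker_list" = "<broker_host>:9092",
    "kafka_topic" = "orders"
);
```

Scripts and configuration files that still reference the old or deprecated item names should be updated as part of the upgrade.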
