Metadata Recovery
You may need to manually recover your FE if any of the following happens:
- FE fails to start bdbje
- FE fails to synchronize with other FEs
- Can’t perform metadata write operation
- No MASTER found
Recovery Principle
To manually recover FEs, start a new MASTER using the metadata stored in the current `meta_dir`, and then add the other FEs one by one.
Recovery example
Please strictly follow the steps below:
1. Stop all FE processes. To avoid unexpected problems, prevent anyone from accessing the data while the recovery is in progress.
2. Find the FE node with the latest metadata.
   a. Back up the `meta_dir` directories of all FEs first.
   b. Usually, the Master FE has the most recent metadata.
   c. Confirm which metadata is the most recent by checking the suffix of the `image.xxxx` file in the `meta_dir/image` directory. The larger the suffix, the newer the metadata.
   d. The `meta_dir` path can be found in `fe.conf`.
      ![8-1](../assets/8-1.png)
   e. The `meta_dir` folder structure is as follows:
      ![8-2](../assets/8-2.png)
   f. Compare the suffixes of the `image.xxxx` files in the `image` directory of each node to identify the node with the most recent metadata.
      ![8-3](../assets/8-3.png)
   g. Use the FE node with the most recent metadata for recovery. Choose a FOLLOWER node if possible. Use `cat ROLE` to see the node role.
3. Perform the following operations on the FE node selected in step 2.
   a. If the node is an `OBSERVER`, change `role=OBSERVER` to `role=FOLLOWER` in the `meta_dir/image/ROLE` file. (Recovering from an `OBSERVER` node can be tricky, as explained later in this step.) If the node is a FOLLOWER, skip this step.
   b. Add `metadata_failure_recovery=true` to `fe.conf`. This setting clears the metadata of "bdbje", so that bdbje no longer contacts the other FEs and the node starts as a standalone FE. Set this parameter to true only for the recovery startup, and set it back to false once that startup is complete. Otherwise, the bdbje metadata is cleared again on the next restart and the other FEs will not work properly.
   c. Run `sh bin/start_fe.sh --daemon` to start the FE. If the startup succeeds, the FE starts as MASTER and `transfer from XXXX to MASTER` appears in `fe.log`.
   d. Connect to this FE and run some queries and data loads to check that it is accessible. If an error occurs, troubleshoot using the FE logs and restart the FE.
   e. If no error occurs, `show frontends;` should list all the FEs that were previously added to the cluster, with the current FE as MASTER.
   f. IMPORTANT STEP. Remove `metadata_failure_recovery=true` from `fe.conf`, or set it to false, and restart this FE.

   If the recovery is done with the metadata of an OBSERVER node, `show frontends;` shows the current FE as `OBSERVER` while `IsMaster` is shown as true. The inconsistency arises because the "OBSERVER" record is stored in StarRocks' metadata, whereas the value of `IsMaster` is recorded in bdbje's metadata. It prevents later operations (for example, `load`) from being performed, so it must be fixed with the following steps:

   g. Drop all FE nodes except for this "OBSERVER" one.
   h. Add a new FOLLOWER FE with `ADD FOLLOWER`, assuming that it is on hostA.
   i. Start a brand new FE on hostA and have it join the cluster with the `--helper` method.
   j. Run `show frontends;`. You should see two FEs, the previous OBSERVER and the newly added FOLLOWER, with the OBSERVER as master.
   k. Make sure that the new FOLLOWER is working properly, and then re-execute the failover operation (step b to step f) with the metadata of this new FOLLOWER. (If the synchronization of IDs shown in the figure has completed, the new FOLLOWER is working properly.)
      ![8-4](../assets/8-4.png)

   The purpose of steps g to k is to manually create the metadata of a FOLLOWER node, and then use that metadata to run the fault recovery again.
4. After step 3 has been executed successfully, remove the other FEs from the metadata with `ALTER SYSTEM DROP FOLLOWER/OBSERVER`, and then add them back one by one as new FEs.
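The comparison of `image.xxxx` suffixes in step 2 is easy to get wrong by eye, because the suffixes must be compared as numbers rather than as text (`image.10` is newer than `image.9`). The following is a minimal sketch of that comparison, not an official StarRocks tool. The function names, the host names `fe1` and `fe2`, and the example directory listings are all hypothetical; in practice you would collect the listing of `meta_dir/image` from each FE yourself.

```python
import re
from typing import Dict, Iterable, Optional, Tuple

# Matches only files named image.<digits>; other files in the directory are ignored.
IMAGE_NAME = re.compile(r"^image\.(\d+)$")


def newest_image_suffix(filenames: Iterable[str]) -> Optional[int]:
    """Return the largest numeric suffix among image.<number> files, or None."""
    best = None
    for name in filenames:
        match = IMAGE_NAME.match(name)
        if match is None:
            continue
        suffix = int(match.group(1))  # compare numerically, not as strings
        if best is None or suffix > best:
            best = suffix
    return best


def pick_recovery_node(listings: Dict[str, Iterable[str]]) -> Optional[Tuple[str, int]]:
    """Return (node, suffix) for the node whose image file has the largest suffix."""
    best = None
    for node, filenames in listings.items():
        suffix = newest_image_suffix(filenames)
        if suffix is not None and (best is None or suffix > best[1]):
            best = (node, suffix)
    return best


# Hypothetical listings of meta_dir/image gathered from two FE nodes.
listings = {
    "fe1": ["ROLE", "image.998"],
    "fe2": ["ROLE", "image.1002"],
}
print(pick_recovery_node(listings))  # ('fe2', 1002)
```

The sketch only reports which node holds the newest image. Whether that node is a FOLLOWER or an OBSERVER still has to be checked separately, as described in step 2.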
The above steps complete the recovery.
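Two of the edits in step 3 are small text changes that are easy to forget to undo: switching `metadata_failure_recovery` on and off in `fe.conf`, and changing `role=OBSERVER` to `role=FOLLOWER` in the `ROLE` file. The sketch below shows both as pure string transformations, assuming `fe.conf` uses one `key = value` pair per line. The helper names `set_conf_option` and `promote_observer` are hypothetical, and the sample file contents are invented for illustration. Reading and writing the real files, and taking a backup first, are left to the operator.

```python
import re


def set_conf_option(conf_text: str, key: str, value: str) -> str:
    """Set key=value in fe.conf-style text: replace an existing line or append one."""
    # Only matches uncommented lines that start with the key.
    pattern = re.compile(r"^\s*" + re.escape(key) + r"\s*=.*$", re.MULTILINE)
    new_line = f"{key}={value}"
    if pattern.search(conf_text):
        return pattern.sub(lambda _match: new_line, conf_text)
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"
    return conf_text + new_line + "\n"


def promote_observer(role_text: str) -> str:
    """Rewrite role=OBSERVER as role=FOLLOWER in the contents of the ROLE file."""
    return role_text.replace("role=OBSERVER", "role=FOLLOWER")


# Invented sample contents, for illustration only.
conf = "meta_dir = /data/meta\n"

# Before the recovery startup (step b).
recovering = set_conf_option(conf, "metadata_failure_recovery", "true")
print(recovering)

# After the recovery startup has completed (step f).
restored = set_conf_option(recovering, "metadata_failure_recovery", "false")
print(restored)

print(promote_observer("role=OBSERVER\n"))
```

Setting the option back to `false` explicitly, rather than deleting the line, leaves a visible record in `fe.conf` that the recovery flag was deliberately switched off. Either choice is allowed by step f.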