Class ServerCrashProcedure
- All Implemented Interfaces:
Comparable<Procedure<MasterProcedureEnv>>, ServerProcedureInterface
- Direct Known Subclasses:
HBCKServerCrashProcedure
The procedure flow varies depending on whether meta is assigned and whether we are to split logs.
We come in here after ServerManager has noticed that a server has expired. Procedures queued on the RPC should have been notified of the failure and should concurrently be getting themselves ready to assign elsewhere.
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.hadoop.hbase.procedure2.StateMachineProcedure
StateMachineProcedure.Flow
Nested classes/interfaces inherited from class org.apache.hadoop.hbase.procedure2.Procedure
Procedure.LockState
Nested classes/interfaces inherited from interface org.apache.hadoop.hbase.master.procedure.ServerProcedureInterface
ServerProcedureInterface.ServerOperationType
-
Field Summary
Modifier and Type | Field | Description
private boolean | carryingMeta |
private org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProcedureProtos.ServerCrashState | currentRunningState |
static final boolean | DEFAULT_MASTER_SCP_RETAIN_ASSIGNMENT | Default value of MASTER_SCP_RETAIN_ASSIGNMENT
private static final org.slf4j.Logger | LOG |
static final String | MASTER_SCP_RETAIN_ASSIGNMENT | Configuration parameter to enable/disable the retain region assignment during ServerCrashProcedure.
private boolean | notifiedDeadServer | Whether DeadServer knows that we are processing it.
private int | processedRegions |
private List<RegionInfo> | regionsOnCrashedServer | Regions that were on the crashed server.
private ServerName | serverName | Name of the crashed server to process.
private boolean | shouldSplitWal |
private MonitoredTask | status |
private CompletableFuture<Void> | updateMetaFuture |
Fields inherited from class org.apache.hadoop.hbase.procedure2.StateMachineProcedure
stateCount
Fields inherited from class org.apache.hadoop.hbase.procedure2.Procedure
NO_PROC_ID, NO_TIMEOUT
-
Constructor Summary
Constructor | Description
ServerCrashProcedure() | Used when deserializing from a procedure store; we'll construct one of these then call deserializeStateData(ProcedureStateSerializer).
ServerCrashProcedure(MasterProcedureEnv env, ServerName serverName, boolean shouldSplitWal, boolean carryingMeta) | Call this constructor when queuing up a Procedure.
-
Method Summary
Modifier and Type | Method | Description
protected boolean | abort(MasterProcedureEnv env) | The abort() call is asynchronous and each procedure must decide how to deal with it, if they want to be abortable.
protected Procedure.LockState | acquireLock | The user should override this method if they need a lock on an Entity.
private void | assignRegions(MasterProcedureEnv env, List<RegionInfo> regions) | Assign the regions on the crashed RS to other RSes.
private void | cleanupSplitDir |
private Procedure[] | createSplittingWalProcedures(MasterProcedureEnv env, boolean splitMeta) |
protected void | deserializeStateData(ProcedureStateSerializer serializer) | Called on store load to allow the user to decode the previously serialized state.
protected StateMachineProcedure.Flow | executeFromState(MasterProcedureEnv env, org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProcedureProtos.ServerCrashState state) | Called to perform a single step of the specified 'state' of the procedure.
private boolean | filterDefaultMetaRegions |
protected org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProcedureProtos.ServerCrashState | getInitialState() | Return the initial state object that will be used for the first call to executeFromState().
protected ProcedureMetrics | getProcedureMetrics | Override this method to provide procedure specific counters for submitted count, failed count and time histogram.
(package private) List<RegionInfo> | getRegionsOnCrashedServer | Returns List of Regions on crashed server.
ServerName | getServerName | Returns Name of this server instance.
ServerProcedureInterface.ServerOperationType | getServerOperationType | Given an operation type we can take decisions about what to do with pending operations.
protected org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProcedureProtos.ServerCrashState | getState(int stateId) | Convert an ordinal (or state id) to an Enum (or more descriptive) state object.
protected int | getStateId(org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProcedureProtos.ServerCrashState state) | Convert the Enum (or more descriptive) state object to an ordinal (or state id).
private CompletableFuture<Void> | getUpdateMetaFuture |
boolean | hasMetaTableRegion | Returns True if this server has an hbase:meta table region.
protected boolean | holdLock | Used to keep the procedure lock even when the procedure is yielding or suspended.
private boolean | isDefaultMetaRegion |
boolean | isInRecoverMetaState |
protected boolean | isMatchingRegionLocation | Moved out here so it can be overridden by the HBCK fix-up SCP to be less strict about what it will tolerate as a 'match'.
private boolean | isSplittingDone(MasterProcedureEnv env, boolean splitMeta) |
protected void | releaseLock | The user should override this method, and release lock if necessary.
protected void | rollbackState(MasterProcedureEnv env, org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProcedureProtos.ServerCrashState state) | Called to perform the rollback of the specified state.
protected void | serializeStateData(ProcedureStateSerializer serializer) | The user-level code of the procedure may have some state to persist (e.g. input arguments or current position in the processing state) to be able to resume on failure.
protected boolean | setTimeoutFailure | Called by the ProcedureExecutor when the timeout set by setTimeout() is expired.
private void | setUpdateMetaFuture |
protected boolean | shouldWaitClientAck | By default, the executor will keep the procedure result around until the eviction TTL is expired.
void | toStringClassDetails | Extend the toString() information with the procedure details e.g. className and parameters.
(package private) void | updateProgress(boolean updateState) |
static void | updateProgress(MasterProcedureEnv env, long parentId) |
private void | zkCoordinatedSplitLogs | Split logs using 'classic' zk-based coordination.
private void | zkCoordinatedSplitMetaLogs | Split hbase:meta logs using 'classic' zk-based coordination.
Methods inherited from class org.apache.hadoop.hbase.procedure2.StateMachineProcedure
addChildProcedure, execute, failIfAborted, getCurrentState, getCurrentStateId, getCycles, isEofState, isRollbackSupported, isRollbackSupported, isYieldAfterExecutionStep, isYieldBeforeExecuteFromState, rollback, setNextState, toStringState
Methods inherited from class org.apache.hadoop.hbase.procedure2.Procedure
addStackIndex, afterReplay, beforeReplay, bypass, compareTo, completionCleanup, doExecute, doRollback, elapsedTime, getChildrenLatch, getException, getLastUpdate, getNonceKey, getOwner, getParentProcId, getProcId, getProcIdHashCode, getResult, getRootProcedureId, getRootProcId, getStackIndexes, getState, getSubmittedTime, getTimeout, getTimeoutTimestamp, hasChildren, hasException, hasLock, hasOwner, hasParent, hasTimeout, haveSameParent, incChildrenLatch, isBypass, isFailed, isFinished, isInitializing, isLockedWhenLoading, isRunnable, isSuccess, isWaiting, removeStackIndex, setAbortFailure, setChildrenLatch, setExecuted, setFailure, setFailure, setLastUpdate, setNonceKey, setOwner, setOwner, setParentProcId, setProcId, setResult, setRootProcId, setStackIndexes, setState, setSubmittedTime, setTimeout, skipPersistence, suspend, toString, toStringClass, toStringDetails, toStringSimpleSB, updateMetricsOnFinish, updateMetricsOnSubmit, updateTimestamp, waitInitialized, wasExecuted
-
Field Details
-
LOG
-
MASTER_SCP_RETAIN_ASSIGNMENT
Configuration parameter to enable/disable the retain region assignment during ServerCrashProcedure.
By default, retain assignment is disabled, which makes failover faster and improves availability; this is useful in cloud scenarios where region block locality is not important. Enable it when RegionServers are deployed on the same hosts as the DataNodes; this improves read performance through local reads.
See HBASE-24900 for more details.
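As a rough illustration of how a boolean switch like this is typically read with a default, here is a minimal sketch. It uses java.util.Properties as a stand-in for the HBase configuration object, and the key string is an assumption: this page names the constant but does not show its value. The default of false follows from the statement above that retain assignment is disabled by default.

```java
import java.util.Properties;

public class RetainAssignmentConfigSketch {
    // Assumed key string; this page names the constant but not its value.
    public static final String MASTER_SCP_RETAIN_ASSIGNMENT =
        "hbase.master.scp.retain.assignment";
    // Disabled by default, as described in the field documentation.
    public static final boolean DEFAULT_MASTER_SCP_RETAIN_ASSIGNMENT = false;

    // Returns the configured value, or the default when the key is absent.
    public static boolean retainAssignment(Properties conf) {
        String raw = conf.getProperty(MASTER_SCP_RETAIN_ASSIGNMENT);
        return raw == null
            ? DEFAULT_MASTER_SCP_RETAIN_ASSIGNMENT
            : Boolean.parseBoolean(raw.trim());
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(retainAssignment(conf));
        // Co-located RegionServers and DataNodes: opt in to retained assignment.
        conf.setProperty(MASTER_SCP_RETAIN_ASSIGNMENT, "true");
        System.out.println(retainAssignment(conf));
    }
}
```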
-
DEFAULT_MASTER_SCP_RETAIN_ASSIGNMENT
Default value of MASTER_SCP_RETAIN_ASSIGNMENT
-
serverName
Name of the crashed server to process.
-
notifiedDeadServer
Whether DeadServer knows that we are processing it.
-
regionsOnCrashedServer
Regions that were on the crashed server.
-
carryingMeta
-
shouldSplitWal
-
status
-
currentRunningState
private org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProcedureProtos.ServerCrashState currentRunningState
-
updateMetaFuture
-
processedRegions
-
-
Constructor Details
-
ServerCrashProcedure
public ServerCrashProcedure(MasterProcedureEnv env, ServerName serverName, boolean shouldSplitWal, boolean carryingMeta)
Call this constructor when queuing up a Procedure.
- Parameters:
serverName - Name of the crashed server.
shouldSplitWal - True if we should split WALs as part of crashed server processing.
carryingMeta - True if carrying hbase:meta table region.
-
ServerCrashProcedure
public ServerCrashProcedure()
Used when deserializing from a procedure store; we'll construct one of these then call deserializeStateData(ProcedureStateSerializer). Do not use directly.
-
-
Method Details
-
isInRecoverMetaState
-
executeFromState
protected StateMachineProcedure.Flow executeFromState(MasterProcedureEnv env, org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProcedureProtos.ServerCrashState state) throws ProcedureSuspendedException, ProcedureYieldException
Description copied from class: StateMachineProcedure
Called to perform a single step of the specified 'state' of the procedure.
- Specified by:
executeFromState in class StateMachineProcedure<MasterProcedureEnv, org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProcedureProtos.ServerCrashState>
- Parameters:
state - state to execute
- Returns:
Flow.NO_MORE_STATE if the procedure is completed, Flow.HAS_MORE_STATE if there is another step.
- Throws:
ProcedureSuspendedException
ProcedureYieldException
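To make the executeFromState contract concrete, the following self-contained sketch drives a toy state machine until it reports Flow.NO_MORE_STATE. The state names are hypothetical (the real ServerCrashState values are not listed on this page), and the driver loop is a simplified stand-in for the procedure executor. It only illustrates how the flow can vary with a flag such as shouldSplitWal.

```java
import java.util.ArrayList;
import java.util.List;

public class ScpFlowSketch {
    // Mirrors the two return values named in the executeFromState documentation.
    public enum Flow { HAS_MORE_STATE, NO_MORE_STATE }

    // Hypothetical states, for illustration only.
    public enum DemoState { START, SPLIT_LOGS, ASSIGN, FINISH }

    private final boolean shouldSplitWal;
    private DemoState next = DemoState.START;

    public ScpFlowSketch(boolean shouldSplitWal) {
        this.shouldSplitWal = shouldSplitWal;
    }

    // Performs one step and records which state should run next.
    public Flow executeFromState(DemoState state) {
        switch (state) {
            case START:
                // The path taken depends on whether logs must be split.
                next = shouldSplitWal ? DemoState.SPLIT_LOGS : DemoState.ASSIGN;
                return Flow.HAS_MORE_STATE;
            case SPLIT_LOGS:
                next = DemoState.ASSIGN;
                return Flow.HAS_MORE_STATE;
            case ASSIGN:
                next = DemoState.FINISH;
                return Flow.HAS_MORE_STATE;
            default:
                // FINISH: nothing left to do.
                return Flow.NO_MORE_STATE;
        }
    }

    // Simplified driver: keeps stepping while the procedure reports more states.
    public static List<DemoState> run(boolean shouldSplitWal) {
        ScpFlowSketch proc = new ScpFlowSketch(shouldSplitWal);
        List<DemoState> visited = new ArrayList<>();
        Flow flow;
        do {
            DemoState current = proc.next;
            visited.add(current);
            flow = proc.executeFromState(current);
        } while (flow == Flow.HAS_MORE_STATE);
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(run(true));
        System.out.println(run(false));
    }
}
```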
-
getRegionsOnCrashedServer
Returns List of Regions on crashed server.
-
cleanupSplitDir
-
isSplittingDone
-
createSplittingWalProcedures
private Procedure[] createSplittingWalProcedures(MasterProcedureEnv env, boolean splitMeta) throws IOException
- Throws:
IOException
-
filterDefaultMetaRegions
-
isDefaultMetaRegion
-
zkCoordinatedSplitMetaLogs
Split hbase:meta logs using 'classic' zk-based coordination. Superseded by procedure-based WAL splitting.
- Throws:
IOException
-
zkCoordinatedSplitLogs
Split logs using 'classic' zk-based coordination. Superseded by procedure-based WAL splitting.
- Throws:
IOException
-
updateProgress
-
rollbackState
protected void rollbackState(MasterProcedureEnv env, org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProcedureProtos.ServerCrashState state) throws IOException
Description copied from class: StateMachineProcedure
Called to perform the rollback of the specified state.
- Specified by:
rollbackState in class StateMachineProcedure<MasterProcedureEnv, org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProcedureProtos.ServerCrashState>
- Parameters:
state - state to roll back
- Throws:
IOException - temporary failure, the rollback will retry later
-
getState
protected org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProcedureProtos.ServerCrashState getState(int stateId)
Description copied from class: StateMachineProcedure
Convert an ordinal (or state id) to an Enum (or more descriptive) state object.
- Specified by:
getState in class StateMachineProcedure<MasterProcedureEnv, org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProcedureProtos.ServerCrashState>
- Parameters:
stateId - the ordinal() of the state enum (or state id)
- Returns:
the state enum object
-
getStateId
protected int getStateId(org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProcedureProtos.ServerCrashState state)
Description copied from class: StateMachineProcedure
Convert the Enum (or more descriptive) state object to an ordinal (or state id).
- Specified by:
getStateId in class StateMachineProcedure<MasterProcedureEnv, org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProcedureProtos.ServerCrashState>
- Parameters:
state - the state enum object
- Returns:
stateId the ordinal() of the state enum (or state id)
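The getState/getStateId pair is a two-way mapping between a numeric id and an enum value. The sketch below shows one way such a mapping can round-trip. It uses a hypothetical enum with explicit ids, with accessors named getNumber/forNumber in the style of generated protobuf enums; the actual ServerCrashState values and numbers are not shown on this page.

```java
public class StateIdSketch {
    // Hypothetical states and ids, for illustration only.
    public enum DemoState {
        START(1), ASSIGN(2), FINISH(3);

        private final int id;

        DemoState(int id) {
            this.id = id;
        }

        public int getNumber() {
            return id;
        }

        // Returns the matching state, or null when the id is unknown.
        public static DemoState forNumber(int id) {
            for (DemoState s : values()) {
                if (s.id == id) {
                    return s;
                }
            }
            return null;
        }
    }

    // Convert a state id to the more descriptive enum object.
    public static DemoState getState(int stateId) {
        return DemoState.forNumber(stateId);
    }

    // Convert the enum object back to its state id.
    public static int getStateId(DemoState state) {
        return state.getNumber();
    }

    public static void main(String[] args) {
        DemoState s = getState(2);
        System.out.println(s + " -> " + getStateId(s));
    }
}
```

Keeping explicit ids, rather than relying on declaration order, means the persisted id stays stable if states are later reordered.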
-
getInitialState
protected org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProcedureProtos.ServerCrashState getInitialState()
Description copied from class: StateMachineProcedure
Return the initial state object that will be used for the first call to executeFromState().
- Specified by:
getInitialState in class StateMachineProcedure<MasterProcedureEnv, org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProcedureProtos.ServerCrashState>
- Returns:
the initial state enum object
-
abort
Description copied from class: Procedure
The abort() call is asynchronous, and each procedure must decide how to deal with it if it wants to be abortable. The simplest implementation is to have an AtomicBoolean set in the abort() method; execute() then checks whether the abort flag is set. abort() may be called multiple times from the client, so the implementation must be idempotent.
NOTE: abort() is not like Thread.interrupt(). It is just a notification that allows the procedure implementor to abort.
- Overrides:
abort in class StateMachineProcedure<MasterProcedureEnv, org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProcedureProtos.ServerCrashState>
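The AtomicBoolean approach described for abort() can be sketched as follows. This is a standalone toy class, not the HBase Procedure API: the method signatures are simplified assumptions, and execute() returns a boolean only so the effect of the flag is easy to observe.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class AbortableSketch {
    private final AtomicBoolean aborted = new AtomicBoolean(false);
    private int stepsRun = 0;

    // Idempotent: calling this any number of times leaves the flag set.
    public boolean abort() {
        aborted.set(true);
        return true;
    }

    // Checks the flag before doing work; returns false once aborted.
    public boolean execute() {
        if (aborted.get()) {
            return false;
        }
        stepsRun++;
        return true;
    }

    public int stepsRun() {
        return stepsRun;
    }

    public static void main(String[] args) {
        AbortableSketch proc = new AbortableSketch();
        proc.execute();
        proc.abort();
        proc.abort(); // a repeated abort is harmless
        System.out.println(proc.execute() + ", steps=" + proc.stepsRun());
    }
}
```

Unlike Thread.interrupt(), nothing here stops a step that is already running; the procedure only notices the request at its next check.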
-
acquireLock
Description copied from class: Procedure
The user should override this method if they need a lock on an Entity. A lock can be anything, and it is up to the implementor. The Procedure Framework will call this method just before it invokes Procedure.execute(Object). It calls Procedure.releaseLock(Object) after the call to execute. If you need to hold the lock for the life of the Procedure (i.e. you do not want any other Procedure interfering while this Procedure is running), see Procedure.holdLock(Object).
Example: in our Master we can execute requests in parallel for different tables. We can create t1 and create t2, and these creates can be executed at the same time. Anything else on t1/t2 is queued, waiting for that specific table create to happen.
There are 3 LockState values:
- LOCK_ACQUIRED should be returned when the proc has the lock and the proc is ready to execute.
- LOCK_YIELD_WAIT should be returned when the proc does not have the lock and the framework should take care of re-adding the procedure to the runnable set for retry.
- LOCK_EVENT_WAIT should be returned when the proc does not have the lock and someone will take care of re-adding the procedure to the runnable set when the lock is available.
- Overrides:
acquireLock in class Procedure<MasterProcedureEnv>
- Returns:
the lock state as described above.
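The table-create example in the acquireLock description can be sketched with a toy per-entity lock table. Everything here except the three LockState names is an assumption made for illustration: the real framework passes an environment object and uses its own scheduler and lock queues. This sketch only ever returns LOCK_ACQUIRED or LOCK_EVENT_WAIT.

```java
import java.util.HashMap;
import java.util.Map;

public class LockStateSketch {
    // The three states named in the acquireLock documentation.
    public enum LockState { LOCK_ACQUIRED, LOCK_YIELD_WAIT, LOCK_EVENT_WAIT }

    // Hypothetical lock table: entity name (e.g. a table) -> owning procedure id.
    private final Map<String, Long> owners = new HashMap<>();

    public LockState acquireLock(String entity, long procId) {
        Long owner = owners.putIfAbsent(entity, procId);
        if (owner == null || owner == procId) {
            return LockState.LOCK_ACQUIRED;
        }
        // Someone else holds the lock; the caller waits to be woken on release.
        return LockState.LOCK_EVENT_WAIT;
    }

    // Releases the lock only if procId is the current owner.
    public void releaseLock(String entity, long procId) {
        owners.remove(entity, procId);
    }

    public static void main(String[] args) {
        LockStateSketch locks = new LockStateSketch();
        System.out.println(locks.acquireLock("t1", 1L)); // create t1
        System.out.println(locks.acquireLock("t2", 2L)); // create t2 runs in parallel
        System.out.println(locks.acquireLock("t1", 3L)); // other work on t1 must wait
        locks.releaseLock("t1", 1L);
        System.out.println(locks.acquireLock("t1", 3L));
    }
}
```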
-
releaseLock
Description copied from class: Procedure
The user should override this method, and release lock if necessary.
- Overrides:
releaseLock in class Procedure<MasterProcedureEnv>
-
setTimeoutFailure
Description copied from class: Procedure
Called by the ProcedureExecutor when the timeout set by setTimeout() has expired. Another use of this method is to implement retrying. A procedure can set its state to WAITING_TIMEOUT by calling the setState method and throw a ProcedureSuspendedException to halt its execution; do not forget to call Procedure.setTimeout(int) to set the timeout. You should also override this method to wake up the procedure, and return false to tell the ProcedureExecutor that the timeout event has been handled.
- Overrides:
setTimeoutFailure in class Procedure<MasterProcedureEnv>
- Returns:
true to let the framework handle the timeout as abort, false in case the procedure handled the timeout itself.
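The retry pattern described for setTimeoutFailure can be illustrated with a small simulation. This is a sketch under stated assumptions, not the framework code: the class, its exception type, the linear backoff, and the driver loop that stands in for the executor's timer are all hypothetical. Only the shape of the pattern (set WAITING_TIMEOUT, set a timeout, suspend, then wake up and return false) comes from the description above.

```java
public class TimeoutRetrySketch {
    public enum ProcState { RUNNABLE, WAITING_TIMEOUT, SUCCESS }

    // Stand-in for ProcedureSuspendedException.
    public static class SuspendedException extends Exception {
        private static final long serialVersionUID = 1L;
    }

    private final int failuresBeforeSuccess;
    private ProcState state = ProcState.RUNNABLE;
    private int timeoutMillis = -1;
    private int attempts = 0;

    public TimeoutRetrySketch(int failuresBeforeSuccess) {
        this.failuresBeforeSuccess = failuresBeforeSuccess;
    }

    // One execution attempt; suspends itself with a timeout when the step fails.
    public void execute() throws SuspendedException {
        attempts++;
        if (attempts <= failuresBeforeSuccess) {
            timeoutMillis = 1000 * attempts; // assumed linear backoff
            state = ProcState.WAITING_TIMEOUT;
            throw new SuspendedException();
        }
        state = ProcState.SUCCESS;
    }

    // Wakes the procedure up and reports that the timeout was handled here.
    public boolean setTimeoutFailure() {
        state = ProcState.RUNNABLE;
        return false;
    }

    // Simplified executor: pretends each timeout fires immediately.
    public static int runToCompletion(int failuresBeforeSuccess) {
        TimeoutRetrySketch proc = new TimeoutRetrySketch(failuresBeforeSuccess);
        while (proc.state != ProcState.SUCCESS) {
            try {
                proc.execute();
            } catch (SuspendedException e) {
                if (proc.setTimeoutFailure()) {
                    throw new IllegalStateException("timeout treated as abort");
                }
            }
        }
        return proc.attempts;
    }

    public static void main(String[] args) {
        System.out.println(runToCompletion(2));
    }
}
```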
-
toStringClassDetails
Description copied from class: Procedure
Extend the toString() information with the procedure details, e.g. className and parameters.
- Overrides:
toStringClassDetails in class Procedure<MasterProcedureEnv>
- Parameters:
sb - the string builder to use to append the proc specific information
-
getProcName
- Overrides:
getProcName in class Procedure<MasterProcedureEnv>
-
serializeStateData
Description copied from class: Procedure
The user-level code of the procedure may have some state to persist (e.g. input arguments or current position in the processing state) to be able to resume on failure.
- Overrides:
serializeStateData in class StateMachineProcedure<MasterProcedureEnv, org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProcedureProtos.ServerCrashState>
- Parameters:
serializer - stores the serializable state
- Throws:
IOException
-
deserializeStateData
Description copied from class: Procedure
Called on store load to allow the user to decode the previously serialized state.
- Overrides:
deserializeStateData in class StateMachineProcedure<MasterProcedureEnv, org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProcedureProtos.ServerCrashState>
- Parameters:
serializer - contains the serialized state
- Throws:
IOException
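serializeStateData and deserializeStateData form a persist-and-restore pair: whatever one writes, the other must read back in the same order into an instance built with the no-argument constructor. The sketch below shows that round trip with plain java.io streams as a stand-in for ProcedureStateSerializer. The wire format and the helper names are assumptions for illustration; only the three field names come from this page.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class StateDataSketch {
    public String serverName;
    public boolean carryingMeta;
    public boolean shouldSplitWal;

    // Persist the inputs needed to resume after a restart.
    public byte[] serializeStateData() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(serverName);
            out.writeBoolean(carryingMeta);
            out.writeBoolean(shouldSplitWal);
        }
        return bytes.toByteArray();
    }

    // Restore the fields in exactly the order they were written.
    public void deserializeStateData(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            serverName = in.readUTF();
            carryingMeta = in.readBoolean();
            shouldSplitWal = in.readBoolean();
        }
    }

    // Mimics a store reload: construct an empty instance, then deserialize into it.
    public static StateDataSketch roundTrip(String serverName, boolean carryingMeta,
                                            boolean shouldSplitWal) {
        StateDataSketch original = new StateDataSketch();
        original.serverName = serverName;
        original.carryingMeta = carryingMeta;
        original.shouldSplitWal = shouldSplitWal;
        try {
            StateDataSketch restored = new StateDataSketch();
            restored.deserializeStateData(original.serializeStateData());
            return restored;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StateDataSketch r = roundTrip("rs1.example.org,16020,1700000000000", true, false);
        System.out.println(r.serverName + " " + r.carryingMeta + " " + r.shouldSplitWal);
    }
}
```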
-
getServerName
Description copied from interface: ServerProcedureInterface
Returns Name of this server instance.
- Specified by:
getServerName in interface ServerProcedureInterface
-
hasMetaTableRegion
Description copied from interface: ServerProcedureInterface
Returns True if this server has an hbase:meta table region.
- Specified by:
hasMetaTableRegion in interface ServerProcedureInterface
-
getServerOperationType
Description copied from interface: ServerProcedureInterface
Given an operation type, we can make decisions about what to do with pending operations. For example, if we get a crash handler and we have some assignment operation pending, we can abort those operations.
- Specified by:
getServerOperationType in interface ServerProcedureInterface
- Returns:
the operation type that the procedure is executing.
-
shouldWaitClientAck
Description copied from class: Procedure
By default, the executor will keep the procedure result around until the eviction TTL has expired. The client can cut down the waiting time by requesting that the result be removed from the executor. In the case of a system-started procedure, we can force the executor to auto-ack.
- Overrides:
shouldWaitClientAck in class Procedure<MasterProcedureEnv>
- Parameters:
env - the environment passed to the ProcedureExecutor
- Returns:
true if the executor should wait for the client ack for the result. Defaults to return true.
-
isMatchingRegionLocation
Moved out here so it can be overridden by the HBCK fix-up SCP to be less strict about what it will tolerate as a 'match'.
- Returns:
True if the region location in rsn matches that of this crashed server.
-
getUpdateMetaFuture
-
setUpdateMetaFuture
-
assignRegions
private void assignRegions(MasterProcedureEnv env, List<RegionInfo> regions) throws IOException, ProcedureSuspendedException
Assign the regions on the crashed RS to other RSes. In this method we go through all the RegionStateNodes of the given regions to find out whether there is already a TRSP for the region; if so, we interrupt it and let it retry on another server, otherwise we schedule a TRSP to bring the region online. We also check whether the table for a region is enabled; if not, we skip assigning it.
-
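The decision logic described for assignRegions can be sketched as a simple loop over region nodes. All types here are hypothetical stand-ins (the real method works with RegionStateNode and TRSP objects), and the order of the checks is an assumption read off the prose: an in-flight transition is handled first, then the table-enabled check, then a new assignment. The sketch returns the chosen action per region as a string so the branching is easy to follow.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

public class AssignRegionsSketch {
    // Hypothetical stand-in for a region state node.
    public static class RegionNode {
        final String region;
        final String table;
        final boolean hasPendingTransition; // true if a TRSP already exists for it

        public RegionNode(String region, String table, boolean hasPendingTransition) {
            this.region = region;
            this.table = table;
            this.hasPendingTransition = hasPendingTransition;
        }
    }

    // Decides, per region, which of the three documented outcomes applies.
    public static List<String> assignRegions(List<RegionNode> nodes, Set<String> enabledTables) {
        List<String> actions = new ArrayList<>();
        for (RegionNode node : nodes) {
            if (node.hasPendingTransition) {
                // Existing procedure: interrupt it so it retries on another server.
                actions.add("interrupt:" + node.region);
            } else if (!enabledTables.contains(node.table)) {
                // Table is not enabled: do not assign the region.
                actions.add("skip:" + node.region);
            } else {
                // No procedure yet: schedule one to bring the region online.
                actions.add("assign:" + node.region);
            }
        }
        return actions;
    }

    public static void main(String[] args) {
        List<RegionNode> nodes = Arrays.asList(
            new RegionNode("r1", "t1", true),
            new RegionNode("r2", "t2", false),
            new RegionNode("r3", "t1", false));
        System.out.println(assignRegions(nodes, Collections.singleton("t1")));
    }
}
```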
getProcedureMetrics
Description copied from class: Procedure
Override this method to provide procedure specific counters for submitted count, failed count and time histogram.
- Overrides:
getProcedureMetrics in class Procedure<MasterProcedureEnv>
- Parameters:
env - The environment passed to the procedure executor
- Returns:
Container object for procedure related metric
-
holdLock
Description copied from class: Procedure
Used to keep the procedure lock even when the procedure is yielding or suspended.
- Overrides:
holdLock in class Procedure<MasterProcedureEnv>
- Returns:
true if the procedure should hold on the lock until completionCleanup()
-
updateProgress
-