Class SnapshotManager

All Implemented Interfaces:
Stoppable

@LimitedPrivate("Configuration") @Unstable public class SnapshotManager extends MasterProcedureManager implements Stoppable
This class manages the procedure of taking and restoring snapshots. There is only one SnapshotManager for the master.

The class provides methods for monitoring in-progress snapshot actions.

Note: Currently there can only be one snapshot being taken at a time over the cluster. This is a simplification in the current implementation.

  • Method Details

    • getCompletedSnapshots

      public List<org.apache.hadoop.hbase.shaded.protobuf.generated.SnapshotProtos.SnapshotDescription> getCompletedSnapshots() throws IOException
      Gets the list of all completed snapshots.
      Returns:
      list of SnapshotDescriptions
      Throws:
      IOException - File system exception
    • getCompletedSnapshots

      private List<org.apache.hadoop.hbase.shaded.protobuf.generated.SnapshotProtos.SnapshotDescription> getCompletedSnapshots(org.apache.hadoop.fs.Path snapshotDir, boolean withCpCall) throws IOException
      Gets the list of all completed snapshots.
      Parameters:
      snapshotDir - snapshot directory
      withCpCall - Whether to call CP hooks
      Returns:
      list of SnapshotDescriptions
      Throws:
      IOException - File system exception
    • resetTempDir

      private void resetTempDir() throws IOException
      Cleans up any zk-coordinated snapshots in the snapshot/.tmp directory that were left from failed snapshot attempts. For unfinished procedure2-coordinated snapshots, keep the working directory.
      Throws:
      IOException - if we can't reach the filesystem
    • deleteSnapshot

      public void deleteSnapshot(org.apache.hadoop.hbase.shaded.protobuf.generated.SnapshotProtos.SnapshotDescription snapshot) throws IOException
      Delete the specified snapshot.
      Throws:
      SnapshotDoesNotExistException - If the specified snapshot does not exist.
      IOException - For filesystem IOExceptions
    • isSnapshotDone

      public boolean isSnapshotDone(org.apache.hadoop.hbase.shaded.protobuf.generated.SnapshotProtos.SnapshotDescription expected) throws IOException
      Check if the specified snapshot is done.
      Returns:
      true if snapshot is ready to be restored, false if it is still being taken.
      Throws:
      IOException - if an error is returned from HDFS or RPC
      UnknownSnapshotException - if snapshot is invalid or does not exist.
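      A caller waiting for a snapshot to finish would typically poll this check. The following is a minimal sketch of such a loop, not HBase code: SnapshotDoneCheck is a hypothetical stand-in for a call to isSnapshotDone(expected) (which, unlike the stand-in, can throw IOException), and SnapshotPoller, awaitDone, maxAttempts, and pauseMillis are illustrative names.

```java
/** Hypothetical stand-in for a call to isSnapshotDone(expected). */
interface SnapshotDoneCheck {
  boolean isDone();
}

final class SnapshotPoller {
  private SnapshotPoller() {}

  /**
   * Polls the check up to maxAttempts times, pausing pauseMillis between attempts.
   * Returns true as soon as the check reports done, and false if the attempts
   * run out or the thread is interrupted while waiting.
   */
  static boolean awaitDone(SnapshotDoneCheck check, int maxAttempts, long pauseMillis) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      if (check.isDone()) {
        return true;
      }
      if (attempt < maxAttempts) {
        try {
          Thread.sleep(pauseMillis);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // preserve the interrupt for the caller
          return false;
        }
      }
    }
    return false;
  }
}
```

      Bounding the number of attempts keeps a caller from waiting forever on a snapshot that never completes.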
    • isTakingSnapshot

      boolean isTakingSnapshot(org.apache.hadoop.hbase.shaded.protobuf.generated.SnapshotProtos.SnapshotDescription snapshot, boolean checkTable)
      Check to see if there is a snapshot in progress with the same name or on the same table. Currently only a single snapshot per table is allowed at a time, and snapshots with the same name are not allowed either.
      Parameters:
      snapshot - description of the snapshot being checked.
      checkTable - check if the table is already taking a snapshot.
      Returns:
      true if there is a snapshot in progress with the same name or on the same table.
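      The two conditions can be illustrated with a small model. This is a sketch under assumptions, not the HBase implementation: InProgressSnapshots and its map of table name to snapshot name are hypothetical, and plain strings stand in for TableName and SnapshotDescription.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical model: in-progress snapshots keyed by table name, value is the snapshot name. */
final class InProgressSnapshots {
  private final Map<String, String> snapshotNameByTable = new ConcurrentHashMap<>();

  /** Records that a snapshot with the given name has started on the given table. */
  void start(String table, String snapshotName) {
    snapshotNameByTable.put(table, snapshotName);
  }

  /**
   * Returns true if a snapshot with the same name is running, or, when checkTable
   * is set, if any snapshot is running on the same table.
   */
  boolean isTakingSnapshot(String table, String snapshotName, boolean checkTable) {
    if (checkTable && snapshotNameByTable.containsKey(table)) {
      return true;
    }
    return snapshotNameByTable.containsValue(snapshotName);
  }
}
```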
    • isTakingSnapshot

      public boolean isTakingSnapshot(TableName tableName)
      Check to see if the specified table has a snapshot in progress. Currently we have a limitation only allowing a single snapshot per table at a time.
      Parameters:
      tableName - name of the table being snapshotted.
      Returns:
      true if there is a snapshot in progress on the specified table.
    • isTableTakingAnySnapshot

      public boolean isTableTakingAnySnapshot(TableName tableName)
    • isTakingSnapshot

      private boolean isTakingSnapshot(TableName tableName, boolean checkProcedure)
      Check to see if the specified table has a snapshot in progress. Since the introduction of SnapshotProcedure, the check depends on the caller. For a zk-coordinated snapshot, only the tables in snapshotHandlers need to be considered. For MergeTableRegionsProcedure and SplitTableRegionProcedure, the tables in snapshotToProcIdMap must be considered as well. For the snapshot procedure itself, there is no need to check whether the table is in a snapshot.
      Parameters:
      tableName - name of the table being snapshotted.
      checkProcedure - true if we should check tables in snapshotToProcIdMap
      Returns:
      true if there is a snapshot in progress on the specified table.
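      A minimal sketch of the two-map check follows. It is a model under stated assumptions rather than the actual implementation: TableSnapshotCheck and Desc are hypothetical, the handler map stores a simple "still running" flag instead of a SnapshotSentinel, and strings stand in for TableName.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical model of the two in-progress maps named in the description. */
final class TableSnapshotCheck {
  /** Minimal stand-in for a snapshot description; used as an identity key for brevity. */
  static final class Desc {
    final String name;
    final String table;

    Desc(String name, String table) {
      this.name = name;
      this.table = table;
    }
  }

  // table name -> true while a zk-coordinated handler is still running
  final Map<String, Boolean> snapshotHandlers = new ConcurrentHashMap<>();
  // snapshot description -> id of the snapshot procedure
  final Map<Desc, Long> snapshotToProcIdMap = new ConcurrentHashMap<>();

  boolean isTakingSnapshot(String table, boolean checkProcedure) {
    // zk-coordinated snapshots are always considered
    if (Boolean.TRUE.equals(snapshotHandlers.get(table))) {
      return true;
    }
    // procedure-coordinated snapshots are considered only on request
    if (checkProcedure) {
      return snapshotToProcIdMap.keySet().stream().anyMatch(d -> d.table.equals(table));
    }
    return false;
  }
}
```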
    • prepareWorkingDirectory

      public void prepareWorkingDirectory(org.apache.hadoop.hbase.shaded.protobuf.generated.SnapshotProtos.SnapshotDescription snapshot) throws HBaseSnapshotException
      Check to make sure that we are OK to run the passed snapshot. Checks to make sure that we aren't already running a snapshot or restore on the requested table.
      Parameters:
      snapshot - description of the snapshot we want to start
      Throws:
      HBaseSnapshotException - if the filesystem could not be prepared to start the snapshot
    • updateWorkingDirAclsIfRequired

      private static void updateWorkingDirAclsIfRequired(org.apache.hadoop.fs.Path workingDir, org.apache.hadoop.fs.FileSystem workingDirFS) throws IOException
      If the parent dir of the snapshot working dir (e.g. /hbase/.hbase-snapshot) has non-empty ACLs, use them for the current working dir (e.g. /hbase/.hbase-snapshot/.tmp/{snapshot-name}) so that regardless of whether the snapshot commit phase performs atomic rename or non-atomic copy of the working dir to new snapshot dir, the ACLs are retained.
      Parameters:
      workingDir - working dir to build the snapshot.
      workingDirFS - working dir file system.
      Throws:
      IOException - If ACL read/modify operation fails.
    • snapshotDisabledTable

      private void snapshotDisabledTable(org.apache.hadoop.hbase.shaded.protobuf.generated.SnapshotProtos.SnapshotDescription snapshot) throws IOException
      Take a snapshot of a disabled table.
      Parameters:
      snapshot - description of the snapshot to take. Modified to be SnapshotProtos.SnapshotDescription.Type.DISABLED.
      Throws:
      IOException - if the snapshot could not be started or filesystem for snapshot temporary directory could not be determined
    • snapshotEnabledTable

      private void snapshotEnabledTable(org.apache.hadoop.hbase.shaded.protobuf.generated.SnapshotProtos.SnapshotDescription snapshot) throws IOException
      Take a snapshot of an enabled table.
      Parameters:
      snapshot - description of the snapshot to take.
      Throws:
      IOException - if the snapshot could not be started or filesystem for snapshot temporary directory could not be determined
    • snapshotTable

      private void snapshotTable(org.apache.hadoop.hbase.shaded.protobuf.generated.SnapshotProtos.SnapshotDescription snapshot, TakeSnapshotHandler handler) throws IOException
      Take a snapshot using the specified handler. On failure the snapshot temporary working directory is removed. NOTE: prepareToTakeSnapshot(), called before this method, takes care of rejecting the snapshot request if the table is busy with another snapshot/restore operation.
      Parameters:
      snapshot - the snapshot description
      handler - the snapshot handler
      Throws:
      IOException
    • getTakingSnapshotLock

    • isTakingAnySnapshot

      public boolean isTakingAnySnapshot()
      A snapshot operation proceeds as follows:
      1. Create a snapshot handler and do some initialization.
      2. Put the handler into snapshotHandlers.
      So, to decide whether any snapshot is being taken, both the takingSnapshotLock and snapshotHandlers must be considered.
      Returns:
      true if there are running snapshots.
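      The reason both must be consulted is that a snapshot in step 1 holds the lock but is not yet visible in the handler map. The sketch below models this with a ReentrantReadWriteLock; that the lock is taken in read mode, and the names AnySnapshotCheck and startSnapshot, are assumptions made for illustration and are not confirmed by this page.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical model of the lock-plus-map check. */
final class AnySnapshotCheck {
  final ReentrantReadWriteLock takingSnapshotLock = new ReentrantReadWriteLock(true);
  final Map<String, Object> snapshotHandlers = new ConcurrentHashMap<>();

  /** Runs step 1 under the read lock and publishes the handler (step 2) before unlocking. */
  void startSnapshot(String table, Object handler) {
    takingSnapshotLock.readLock().lock();
    try {
      // step 1: handler initialization would happen here
      snapshotHandlers.put(table, handler); // step 2
    } finally {
      takingSnapshotLock.readLock().unlock();
    }
  }

  /** True if a snapshot is being set up (lock held) or already registered (map non-empty). */
  boolean isTakingAnySnapshot() {
    return takingSnapshotLock.getReadLockCount() > 0 || !snapshotHandlers.isEmpty();
  }
}
```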
    • takeSnapshot

      public void takeSnapshot(org.apache.hadoop.hbase.shaded.protobuf.generated.SnapshotProtos.SnapshotDescription snapshot) throws IOException
      Take a snapshot based on the enabled/disabled state of the table.
      Throws:
      HBaseSnapshotException - when a snapshot specific exception occurs.
      IOException - when some sort of generic IO exception occurs.
    • takeSnapshot

      public long takeSnapshot(org.apache.hadoop.hbase.shaded.protobuf.generated.SnapshotProtos.SnapshotDescription snapshot, long nonceGroup, long nonce) throws IOException
      Throws:
      IOException
    • submitSnapshotProcedure

      private long submitSnapshotProcedure(org.apache.hadoop.hbase.shaded.protobuf.generated.SnapshotProtos.SnapshotDescription snapshot, long nonceGroup, long nonce) throws IOException
      Throws:
      IOException
    • takeSnapshotInternal

      private void takeSnapshotInternal(org.apache.hadoop.hbase.shaded.protobuf.generated.SnapshotProtos.SnapshotDescription snapshot) throws IOException
      Throws:
      IOException
    • sanityCheckBeforeSnapshot

      private TableDescriptor sanityCheckBeforeSnapshot(org.apache.hadoop.hbase.shaded.protobuf.generated.SnapshotProtos.SnapshotDescription snapshot, boolean checkTable) throws IOException
      Check if the snapshot can be taken. Currently there are some limitations: for a zk-coordinated snapshot, neither a snapshot with the same name nor multiple simultaneous snapshots of the same table are allowed; for a procedure-coordinated snapshot, a snapshot with the same name is not allowed.
      Parameters:
      snapshot - description of the snapshot being checked.
      checkTable - check if the table is already taking a snapshot. For a zk-coordinated snapshot, we need to check whether another zk-coordinated snapshot is in progress; for the snapshot procedure, this is unnecessary.
      Returns:
      the table descriptor of the table
      Throws:
      IOException
    • setSnapshotHandlerForTesting

      public void setSnapshotHandlerForTesting(TableName tableName, SnapshotSentinel handler)
      Set the handler for the current snapshot

      Exposed for TESTING

      Parameters:
      handler - handler the master should use. TODO: get rid of this if possible by repackaging and modifying tests.
    • getCoordinator

      Returns the distributed commit coordinator for all running snapshots.
    • isSnapshotCompleted

      private boolean isSnapshotCompleted(org.apache.hadoop.hbase.shaded.protobuf.generated.SnapshotProtos.SnapshotDescription snapshot) throws IOException
      Check to see if the snapshot is one of the currently completed snapshots. Returns true if the snapshot exists in the "completed snapshots folder".
      Parameters:
      snapshot - expected snapshot to check
      Returns:
      true if the snapshot is stored on the FileSystem, false if it is not stored
      Throws:
      IOException - if the filesystem throws an unexpected exception.
      IllegalArgumentException - if snapshot name is invalid.
    • cloneSnapshot

      private long cloneSnapshot(org.apache.hadoop.hbase.shaded.protobuf.generated.SnapshotProtos.SnapshotDescription reqSnapshot, TableName tableName, org.apache.hadoop.hbase.shaded.protobuf.generated.SnapshotProtos.SnapshotDescription snapshot, TableDescriptor snapshotTableDesc, NonceKey nonceKey, boolean restoreAcl, String customSFT) throws IOException
      Clone the specified snapshot. The clone will fail if the destination table has a snapshot or restore in progress.
      Parameters:
      reqSnapshot - Snapshot Descriptor from request
      tableName - table to clone
      snapshot - Snapshot Descriptor
      snapshotTableDesc - Table Descriptor
      nonceKey - unique identifier to prevent duplicated RPC
      Returns:
      procId the ID of the clone snapshot procedure
      Throws:
      IOException
    • cloneSnapshot

      long cloneSnapshot(org.apache.hadoop.hbase.shaded.protobuf.generated.SnapshotProtos.SnapshotDescription snapshot, TableDescriptor tableDescriptor, NonceKey nonceKey, boolean restoreAcl, String customSFT) throws HBaseSnapshotException
      Clone the specified snapshot into a new table. The operation will fail if the destination table has a snapshot or restore in progress.
      Parameters:
      snapshot - Snapshot Descriptor
      tableDescriptor - Table Descriptor of the table to create
      nonceKey - unique identifier to prevent duplicated RPC
      Returns:
      procId the ID of the clone snapshot procedure
      Throws:
      HBaseSnapshotException
    • restoreOrCloneSnapshot

      public long restoreOrCloneSnapshot(org.apache.hadoop.hbase.shaded.protobuf.generated.SnapshotProtos.SnapshotDescription reqSnapshot, NonceKey nonceKey, boolean restoreAcl, String customSFT) throws IOException
      Restore or clone the specified snapshot.
      Parameters:
      nonceKey - unique identifier to prevent duplicated RPC
      Throws:
      IOException
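      How a nonce can prevent a retried RPC from being executed twice is sketched below. This is a generic illustration, not the HBase mechanism: NonceDedup, its Key class, and the in-memory map are hypothetical, with the (nonceGroup, nonce) pair standing in for NonceKey.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical model: a retried request with the same nonce gets the original procedure id. */
final class NonceDedup {
  /** Stand-in for NonceKey: a (nonceGroup, nonce) pair with value equality. */
  static final class Key {
    final long nonceGroup;
    final long nonce;

    Key(long nonceGroup, long nonce) {
      this.nonceGroup = nonceGroup;
      this.nonce = nonce;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof Key)) {
        return false;
      }
      Key other = (Key) o;
      return nonceGroup == other.nonceGroup && nonce == other.nonce;
    }

    @Override
    public int hashCode() {
      return Objects.hash(nonceGroup, nonce);
    }
  }

  private final Map<Key, Long> procIdByNonce = new ConcurrentHashMap<>();
  private final AtomicLong nextProcId = new AtomicLong(1);

  /** Submits a new procedure, or returns the id already assigned to this nonce. */
  long submit(long nonceGroup, long nonce) {
    return procIdByNonce.computeIfAbsent(
        new Key(nonceGroup, nonce), k -> nextProcId.getAndIncrement());
  }
}
```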
    • restoreSnapshot

      private long restoreSnapshot(org.apache.hadoop.hbase.shaded.protobuf.generated.SnapshotProtos.SnapshotDescription reqSnapshot, TableName tableName, org.apache.hadoop.hbase.shaded.protobuf.generated.SnapshotProtos.SnapshotDescription snapshot, TableDescriptor snapshotTableDesc, NonceKey nonceKey, boolean restoreAcl) throws IOException
      Restore the specified snapshot. The restore will fail if the destination table has a snapshot or restore in progress.
      Parameters:
      reqSnapshot - Snapshot Descriptor from request
      tableName - table to restore
      snapshot - Snapshot Descriptor
      snapshotTableDesc - Table Descriptor
      nonceKey - unique identifier to prevent duplicated RPC
      restoreAcl - true to restore acl of snapshot
      Returns:
      procId the ID of the restore snapshot procedure
      Throws:
      IOException
    • restoreSnapshot

      private long restoreSnapshot(org.apache.hadoop.hbase.shaded.protobuf.generated.SnapshotProtos.SnapshotDescription snapshot, TableDescriptor tableDescriptor, NonceKey nonceKey, boolean restoreAcl) throws HBaseSnapshotException
      Restore the specified snapshot. The restore will fail if the destination table has a snapshot or restore in progress.
      Parameters:
      snapshot - Snapshot Descriptor
      tableDescriptor - Table Descriptor
      nonceKey - unique identifier to prevent duplicated RPC
      restoreAcl - true to restore acl of snapshot
      Returns:
      procId the ID of the restore snapshot procedure
      Throws:
      HBaseSnapshotException
    • isRestoringTable

      private boolean isRestoringTable(TableName tableName)
      Verify if the restore of the specified table is in progress.
      Parameters:
      tableName - table under restore
      Returns:
      true if there is a restore in progress of the specified table.
    • removeSentinelIfFinished

      private SnapshotSentinel removeSentinelIfFinished(Map<TableName,SnapshotSentinel> sentinels, org.apache.hadoop.hbase.shaded.protobuf.generated.SnapshotProtos.SnapshotDescription snapshot)
      Return the handler if it is currently live and has the same snapshot target name. The handler is removed from the sentinels map if completed.
      Parameters:
      sentinels - live handlers
      snapshot - snapshot description
      Returns:
      null if the handler doesn't match, else the live handler.
    • cleanupSentinels

      private void cleanupSentinels()
      Removes "abandoned" snapshot/restore requests. As part of the HBaseAdmin snapshot/restore API, the operation status is checked until completion, and the in-progress maps are cleaned up when the status of a completed task is requested. To avoid sentinels staying around for a long time if something fails on the client side, each operation also tries to remove from the in-progress maps the sentinels that finished a long time ago.
    • cleanupSentinels

      private void cleanupSentinels(Map<TableName,SnapshotSentinel> sentinels)
      Remove the sentinels that are marked as finished and the completion time has exceeded the removal timeout.
      Parameters:
      sentinels - map of sentinels to clean
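      The removal rule can be sketched as follows. This is a simplified model, not the HBase code: SentinelCleanup and its Sentinel class are hypothetical stand-ins for SnapshotSentinel, and the current time and removal timeout are passed in explicitly so the behavior is easy to check.

```java
import java.util.Map;

/** Hypothetical model of timeout-based sentinel cleanup. */
final class SentinelCleanup {
  /** Stand-in for SnapshotSentinel: a finished flag and a completion timestamp. */
  static final class Sentinel {
    final boolean finished;
    final long completionTimestampMillis;

    Sentinel(boolean finished, long completionTimestampMillis) {
      this.finished = finished;
      this.completionTimestampMillis = completionTimestampMillis;
    }
  }

  private SentinelCleanup() {}

  /** Removes entries that are finished and whose completion is older than the timeout. */
  static void cleanupSentinels(
      Map<String, Sentinel> sentinels, long nowMillis, long removalTimeoutMillis) {
    sentinels.values().removeIf(
        s -> s.finished && nowMillis - s.completionTimestampMillis > removalTimeoutMillis);
  }
}
```

      Running sentinels and recently finished ones are left in place, so a client that is still polling can observe the final status.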
    • cleanupCompletedRestoreInMap

      Remove the procedures that are marked as finished
    • cleanupCompletedSnapshotInMap

      Remove the procedures that are marked as finished
    • stop

      public void stop(String why)
      Description copied from interface: Stoppable
      Stop this service. Implementers should favor logging errors over throwing RuntimeExceptions.
      Specified by:
      stop in interface Stoppable
      Parameters:
      why - Why we're stopping.
    • isStopped

      public boolean isStopped()
      Description copied from interface: Stoppable
      Returns true if Stoppable.stop(String) has been called.
      Specified by:
      isStopped in interface Stoppable
    • checkSnapshotSupport

      Throws an exception if snapshot operations (take a snapshot, restore, clone) are not supported. Called at the beginning of snapshot() and restoreSnapshot() methods.
      Throws:
      UnsupportedOperationException - if snapshots are not supported
    • checkSnapshotSupport

      private void checkSnapshotSupport(org.apache.hadoop.conf.Configuration conf, MasterFileSystem mfs) throws IOException, UnsupportedOperationException
      Called at startup to verify whether snapshot operations are supported, and to avoid starting the master if there are snapshots present but the required cleaners are missing. Otherwise we can end up with snapshot data loss.
      Parameters:
      conf - The Configuration object to use
      mfs - The MasterFileSystem to use
      Throws:
      IOException - in case of file-system operation failure
      UnsupportedOperationException - in case cleaners are missing and there are snapshots in the system
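      The startup decision described here reduces to a small rule, sketched below. The model is hypothetical: SnapshotSupportCheck is an illustrative name, the cleaner names are placeholders (this page does not list the real cleaner classes), and sets and a flag replace the Configuration and MasterFileSystem arguments.

```java
import java.util.Set;

/** Hypothetical model of the startup check. */
final class SnapshotSupportCheck {
  // Placeholder names; the real required cleaner classes are not listed on this page.
  static final Set<String> REQUIRED_CLEANERS = Set.of("ExampleHFileCleaner", "ExampleLinkCleaner");

  private SnapshotSupportCheck() {}

  /**
   * Returns true if snapshots are supported (all required cleaners are configured).
   * Refuses to continue if snapshots already exist but the cleaners are missing,
   * since snapshot data could then be lost.
   */
  static boolean checkSnapshotSupport(Set<String> configuredCleaners, boolean snapshotsPresent) {
    boolean cleanersPresent = configuredCleaners.containsAll(REQUIRED_CLEANERS);
    if (!cleanersPresent && snapshotsPresent) {
      throw new UnsupportedOperationException(
          "snapshots exist but the required cleaners are not configured");
    }
    return cleanersPresent;
  }
}
```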
    • initialize

      public void initialize(MasterServices master, MetricsMaster metricsMaster) throws org.apache.zookeeper.KeeperException, IOException, UnsupportedOperationException
      Description copied from class: MasterProcedureManager
      Initialize a globally barriered procedure for master.
      Specified by:
      initialize in class MasterProcedureManager
      Parameters:
      master - Master service interface
      Throws:
      org.apache.zookeeper.KeeperException
      IOException
      UnsupportedOperationException
    • restoreUnfinishedSnapshotProcedure

    • getProcedureSignature

      Description copied from class: ProcedureManager
      Return the unique signature of the procedure. This signature uniquely identifies the procedure. By default, this signature is the string used in the procedure controller (i.e., the root ZK node name for the procedure).
      Specified by:
      getProcedureSignature in class ProcedureManager
    • execProcedure

      public void execProcedure(org.apache.hadoop.hbase.shaded.protobuf.generated.HBaseProtos.ProcedureDescription desc) throws IOException
      Description copied from class: MasterProcedureManager
      Execute a distributed procedure on the cluster.
      Overrides:
      execProcedure in class MasterProcedureManager
      Parameters:
      desc - Procedure description
      Throws:
      IOException
    • checkPermissions

      public void checkPermissions(org.apache.hadoop.hbase.shaded.protobuf.generated.HBaseProtos.ProcedureDescription desc, AccessChecker accessChecker, User user) throws IOException
      Description copied from class: MasterProcedureManager
      Check for required permissions before executing the procedure.
      Specified by:
      checkPermissions in class MasterProcedureManager
      Throws:
      IOException - if permissions requirements are not met.
    • isProcedureDone

      public boolean isProcedureDone(org.apache.hadoop.hbase.shaded.protobuf.generated.HBaseProtos.ProcedureDescription desc) throws IOException
      Description copied from class: MasterProcedureManager
      Check if the procedure has finished successfully.
      Specified by:
      isProcedureDone in class MasterProcedureManager
      Parameters:
      desc - Procedure description
      Returns:
      true if the specified procedure is finished successfully
      Throws:
      IOException
    • toSnapshotDescription

      private org.apache.hadoop.hbase.shaded.protobuf.generated.SnapshotProtos.SnapshotDescription toSnapshotDescription(org.apache.hadoop.hbase.shaded.protobuf.generated.HBaseProtos.ProcedureDescription desc) throws IOException
      Throws:
      IOException
    • registerSnapshotProcedure

      public void registerSnapshotProcedure(org.apache.hadoop.hbase.shaded.protobuf.generated.SnapshotProtos.SnapshotDescription snapshot, long procId)
    • unregisterSnapshotProcedure

      public void unregisterSnapshotProcedure(org.apache.hadoop.hbase.shaded.protobuf.generated.SnapshotProtos.SnapshotDescription snapshot, long procId)
    • snapshotProcedureEnabled

      public boolean snapshotProcedureEnabled()
    • acquireSnapshotVerifyWorker

      Throws:
      ProcedureSuspendedException
    • releaseSnapshotVerifyWorker

    • restoreWorkers

      private void restoreWorkers()
    • getAvailableWorker

      public Integer getAvailableWorker(ServerName serverName)