Class FSUtils

java.lang.Object
org.apache.hadoop.hbase.util.FSUtils

@Private public final class FSUtils extends Object
Utility methods for interacting with the underlying file system.
  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Class
    Description
    static class 
    FSUtils.BlackListDirFilter
    Directory filter that doesn't include any of the directories in the specified blacklist
    static class 
    FSUtils.DirFilter
    A PathFilter that only allows directories.
    static class 
    FSUtils.FamilyDirFilter
    Filter for all dirs that are legal column family names.
    (package private) static class 
    FSUtils.FileFilter
    A PathFilter that returns only regular files.
    static class 
    FSUtils.HFileFilter
    Filter for HFiles that excludes reference files.
    static class 
    FSUtils.HFileLinkFilter
    Filter for HFileLinks (StoreFiles and HFiles not included).
    (package private) static interface 
    FSUtils.ProgressReporter
    Called every so-often by storefile map builder getTableStoreFilePathMap to report progress.
    static class 
    FSUtils.ReferenceAndLinkFileFilter
     
    static class 
    FSUtils.ReferenceFileFilter
     
    static class 
    FSUtils.RegionDirFilter
    Filter for all dirs that don't start with '.'
    static class 
    FSUtils.UserTableDirFilter
    A PathFilter that returns usertable directories.
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    private static final int
    DEFAULT_THREAD_POOLSIZE
     
    private static final org.slf4j.Logger
    LOG
     
    private static final String
    THREAD_POOLSIZE
     
    static final boolean
    WINDOWS
    Set to true on Windows platforms
  • Constructor Summary

    Constructors
    Modifier
    Constructor
    Description
    private
    FSUtils()
     
  • Method Summary

    Modifier and Type
    Method
    Description
    static void
    addToHDFSBlocksDistribution(HDFSBlocksDistribution blocksDistribution, org.apache.hadoop.fs.BlockLocation[] blockLocations)
    Update blocksDistribution with blockLocations
    static boolean
    checkClusterIdExists(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path rootdir, long wait)
    Checks that a cluster ID file exists in the HBase root directory
    static void
    checkDfsSafeMode(org.apache.hadoop.conf.Configuration conf)
    Check whether dfs is in safemode.
    static void
    checkFileSystemAvailable(org.apache.hadoop.fs.FileSystem fs)
    Checks to see if the specified file system is available
    static void
    checkShortCircuitReadBufferSize(org.apache.hadoop.conf.Configuration conf)
    Check if short circuit read buffer size is set and if not, set it to hbase value.
    static void
    checkVersion(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path rootdir, boolean message)
    Verifies current version of file system
    static void
    checkVersion(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path rootdir, boolean message, int wait, int retries)
    Verifies current version of file system
    static HDFSBlocksDistribution
    computeHDFSBlocksDistribution(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.FileStatus status, long start, long length)
    Compute HDFS blocks distribution of a given file, or a portion of the file
    static HDFSBlocksDistribution
    computeHDFSBlocksDistribution(org.apache.hadoop.hdfs.client.HdfsDataInputStream inputStream)
    Compute HDFS block distribution of a given HdfsDataInputStream.
    private static List<org.apache.hadoop.fs.Path>
    copyFiles(org.apache.hadoop.fs.FileSystem srcFS, org.apache.hadoop.fs.Path src, org.apache.hadoop.fs.FileSystem dstFS, org.apache.hadoop.fs.Path dst, org.apache.hadoop.conf.Configuration conf, ExecutorService pool, List<Future<Void>> futures)
     
    static List<org.apache.hadoop.fs.Path>
    copyFilesParallel(org.apache.hadoop.fs.FileSystem srcFS, org.apache.hadoop.fs.Path src, org.apache.hadoop.fs.FileSystem dstFS, org.apache.hadoop.fs.Path dst, org.apache.hadoop.conf.Configuration conf, int threads)
     
    static org.apache.hadoop.fs.FSDataOutputStream
    create(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path, org.apache.hadoop.fs.permission.FsPermission perm, InetSocketAddress[] favoredNodes)
    Create the specified file on the filesystem.
    static boolean
    deleteRegionDir(org.apache.hadoop.conf.Configuration conf, RegionInfo hri)
    Delete the region directory if exists.
    static List<org.apache.hadoop.fs.FileStatus>
    filterFileStatuses(Iterator<org.apache.hadoop.fs.FileStatus> input, FileStatusFilter filter)
    Filters FileStatuses in an iterator and returns a list
    static List<org.apache.hadoop.fs.FileStatus>
    filterFileStatuses(org.apache.hadoop.fs.FileStatus[] input, FileStatusFilter filter)
    Filters FileStatuses in an array and returns a list
    static ClusterId
    getClusterId(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path rootdir)
    Returns the value of the unique cluster ID stored for this HBase instance.
    static org.apache.hadoop.hdfs.DFSHedgedReadMetrics
    getDFSHedgedReadMetrics(org.apache.hadoop.conf.Configuration c)
    Returns the DFSClient DFSHedgedReadMetrics instance, or null if it can't be found or the filesystem is not HDFS.
    static List<org.apache.hadoop.fs.Path>
    getFamilyDirs(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path regionDir)
    Given a particular region dir, return all the familydirs inside it
    private static List<org.apache.hadoop.fs.Path>
    getFilePaths(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path dir, org.apache.hadoop.fs.PathFilter pathFilter)
     
    private static String[]
    getHostsForLocations(org.apache.hadoop.hdfs.protocol.LocatedBlock block)
     
    static List<org.apache.hadoop.fs.Path>
    getLocalTableDirs(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path rootdir)
     
    private static Set<InetSocketAddress>
    getNNAddresses(org.apache.hadoop.hdfs.DistributedFileSystem fs, org.apache.hadoop.conf.Configuration conf)
    Returns A set containing all namenode addresses of fs
    static List<org.apache.hadoop.fs.Path>
    getReferenceAndLinkFilePaths(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path familyDir)
     
    static List<org.apache.hadoop.fs.Path>
    getReferenceFilePaths(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path familyDir)
     
    static Map<String,Map<String,Float>>
    getRegionDegreeLocalityMappingFromFS(org.apache.hadoop.conf.Configuration conf)
    This function is to scan the root path of the file system to get the degree of locality for each region on each of the servers having at least one block of that region.
    static Map<String,Map<String,Float>>
    getRegionDegreeLocalityMappingFromFS(org.apache.hadoop.conf.Configuration conf, String desiredTable, int threadPoolSize)
    This function is to scan the root path of the file system to get the degree of locality for each region on each of the servers having at least one block of that region.
    static org.apache.hadoop.fs.Path
    getRegionDirFromRootDir(org.apache.hadoop.fs.Path rootDir, RegionInfo region)
     
    static org.apache.hadoop.fs.Path
    getRegionDirFromTableDir(org.apache.hadoop.fs.Path tableDir, String encodedRegionName)
     
    static org.apache.hadoop.fs.Path
    getRegionDirFromTableDir(org.apache.hadoop.fs.Path tableDir, RegionInfo region)
     
    static List<org.apache.hadoop.fs.Path>
    getRegionDirs(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path tableDir)
    Given a particular table dir, return all the regiondirs inside it, excluding files such as .tableinfo
    private static void
    getRegionLocalityMappingFromFS(org.apache.hadoop.conf.Configuration conf, String desiredTable, int threadPoolSize, Map<String,Map<String,Float>> regionDegreeLocalityMapping)
    This function is to scan the root path of the file system to get either the mapping between the region name and its best locality region server or the degree of locality of each region on each of the servers having at least one block of that region.
    static int
    getRegionReferenceAndLinkFileCount(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path p)
     
    static int
    getRegionReferenceFileCount(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path p)
     
    static List<org.apache.hadoop.fs.Path>
    getTableDirs(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path rootdir)
     
    static Map<String,Integer>
    getTableFragmentation(HMaster master)
    Runs through the HBase rootdir and checks how many stores for each table have more than one file in them.
    static Map<String,Integer>
    getTableFragmentation(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path hbaseRootDir)
    Runs through the HBase rootdir and checks how many stores for each table have more than one file in them.
    static Map<String,org.apache.hadoop.fs.Path>
    getTableStoreFilePathMap(Map<String,org.apache.hadoop.fs.Path> map, org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path hbaseRootDir, TableName tableName)
    Runs through the HBase rootdir/tablename and creates a reverse lookup map for table StoreFile names to the full Path.
    static Map<String,org.apache.hadoop.fs.Path>
    getTableStoreFilePathMap(Map<String,org.apache.hadoop.fs.Path> resultMap, org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path hbaseRootDir, TableName tableName, org.apache.hadoop.fs.PathFilter sfFilter, ExecutorService executor, FSUtils.ProgressReporter progressReporter)
    Runs through the HBase rootdir/tablename and creates a reverse lookup map for table StoreFile names to the full Path.
    static Map<String,org.apache.hadoop.fs.Path>
    getTableStoreFilePathMap(Map<String,org.apache.hadoop.fs.Path> resultMap, org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path hbaseRootDir, TableName tableName, org.apache.hadoop.fs.PathFilter sfFilter, ExecutorService executor, HbckErrorReporter progressReporter)
    Deprecated.
    Since 2.3.0.
    static Map<String,org.apache.hadoop.fs.Path>
    getTableStoreFilePathMap(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path hbaseRootDir)
    Runs through the HBase rootdir and creates a reverse lookup map for table StoreFile names to the full Path.
    static Map<String,org.apache.hadoop.fs.Path>
    getTableStoreFilePathMap(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path hbaseRootDir, org.apache.hadoop.fs.PathFilter sfFilter, ExecutorService executor, FSUtils.ProgressReporter progressReporter)
    Runs through the HBase rootdir and creates a reverse lookup map for table StoreFile names to the full Path.
    static Map<String,org.apache.hadoop.fs.Path>
    getTableStoreFilePathMap(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path hbaseRootDir, org.apache.hadoop.fs.PathFilter sfFilter, ExecutorService executor, HbckErrorReporter progressReporter)
    Deprecated.
    Since 2.3.0.
    static int
    getTotalTableFragmentation(HMaster master)
    Returns the total overall fragmentation percentage.
    static String
    getVersion(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path rootdir)
    Verifies current version of file system
    static boolean
    isDistributedFileSystem(org.apache.hadoop.fs.FileSystem fs)
    Returns true if fs is an instance of DistributedFileSystem
    private static boolean
    isInSafeMode(org.apache.hadoop.hdfs.DistributedFileSystem dfs)
    Inquire the Active NameNode's safe mode status.
    static boolean
    isMatchingTail(org.apache.hadoop.fs.Path pathToSearch, org.apache.hadoop.fs.Path pathTail)
    Compare path component of the Path URI; e.g.
    static boolean
    isSameHdfs(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.FileSystem srcFs, org.apache.hadoop.fs.FileSystem desFs)
     
    static List<org.apache.hadoop.fs.FileStatus>
    listStatusWithStatusFilter(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path dir, FileStatusFilter filter)
    Calls fs.listStatus() and treats FileNotFoundException as non-fatal. This accommodates differences between Hadoop versions: Hadoop 1 does not throw a FileNotFoundException and returns an empty FileStatus[], while Hadoop 2 throws FileNotFoundException.
    static boolean
    metaRegionExists(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path rootDir)
    Checks if meta region exists
    (package private) static String
    parseVersionFrom(byte[] bytes)
    Parse the content of the ${HBASE_ROOTDIR}/hbase.version file.
    private static void
    rewriteAsPb(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path rootdir, org.apache.hadoop.fs.Path p, ClusterId cid)
     
    static void
    setClusterId(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path rootdir, ClusterId clusterId, int wait)
    Writes a new unique identifier for this cluster to the "hbase.id" file in the HBase root directory
    static void
    setupShortCircuitRead(org.apache.hadoop.conf.Configuration conf)
    Do our short circuit read setup.
    static void
    setVersion(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path rootdir)
    Sets version of file system
    static void
    setVersion(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path rootdir, int wait, int retries)
    Sets version of file system
    static void
    setVersion(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path rootdir, String version, int wait, int retries)
    Sets version of file system
    (package private) static byte[]
    toVersionByteArray(String version)
    Create the content to write into the ${HBASE_ROOTDIR}/hbase.version file.
    static void
    waitOnSafeMode(org.apache.hadoop.conf.Configuration conf, long wait)
    If DFS, check safe mode and if so, wait until we clear it.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

  • Constructor Details

  • Method Details

    • isDistributedFileSystem

      public static boolean isDistributedFileSystem(org.apache.hadoop.fs.FileSystem fs) throws IOException
      Returns true if fs is an instance of DistributedFileSystem
      Throws:
      IOException
    • isMatchingTail

      public static boolean isMatchingTail(org.apache.hadoop.fs.Path pathToSearch, org.apache.hadoop.fs.Path pathTail)
      Compare the path component of the Path URI; e.g. given hdfs://a/b/c and /a/b/c, it compares the '/a/b/c' part. If you pass in hdfs://a/b/c and b/c, it returns true. Does not consider the scheme; i.e. if the schemes differ but the path or subpath matches, the two will equate.
      Parameters:
      pathToSearch - Path we will be trying to match.
      pathTail - the tail we are looking for at the end of pathToSearch
      Returns:
      True if pathTail is the tail of pathToSearch's path
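
      For illustration, a minimal sketch of the tail-matching behavior described above (the paths are hypothetical):

      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.hbase.util.FSUtils;

      public class MatchingTailExample {
        public static void main(String[] args) {
          // Scheme and authority are ignored; only trailing path components are compared.
          Path fullPath = new Path("hdfs://nn1:8020/hbase/data/default/t1");
          System.out.println(FSUtils.isMatchingTail(fullPath, new Path("default/t1"))); // true
          System.out.println(FSUtils.isMatchingTail(fullPath, new Path("t2")));         // false
        }
      }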
    • deleteRegionDir

      public static boolean deleteRegionDir(org.apache.hadoop.conf.Configuration conf, RegionInfo hri) throws IOException
      Delete the region directory if it exists.
      Returns:
      True if the region directory was deleted.
      Throws:
      IOException
    • create

      public static org.apache.hadoop.fs.FSDataOutputStream create(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path, org.apache.hadoop.fs.permission.FsPermission perm, InetSocketAddress[] favoredNodes) throws IOException
      Create the specified file on the filesystem. By default, this will:
      1. overwrite the file if it exists
      2. apply the umask in the configuration (if it is enabled)
      3. use the fs configured buffer size (or 4096 if not set)
      4. use the configured column family replication, or the filesystem's default replication if the column family value is ColumnFamilyDescriptorBuilder.DEFAULT_DFS_REPLICATION
      5. use the default block size
      6. not track progress
      Parameters:
      conf - configurations
      fs - FileSystem on which to write the file
      path - Path to the file to write
      perm - permissions
      favoredNodes - favored data nodes
      Returns:
      output stream to the created file
      Throws:
      IOException - if the file cannot be created
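
      A usage sketch, assuming a reachable filesystem; the target path and favored datanode address are hypothetical:

      import java.net.InetSocketAddress;
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FSDataOutputStream;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.fs.permission.FsPermission;
      import org.apache.hadoop.hbase.HBaseConfiguration;
      import org.apache.hadoop.hbase.util.FSUtils;

      public class CreateExample {
        public static void main(String[] args) throws Exception {
          Configuration conf = HBaseConfiguration.create();
          FileSystem fs = FileSystem.get(conf);
          Path path = new Path("/tmp/fsutils-demo");           // hypothetical target file
          InetSocketAddress[] favoredNodes = {                 // hypothetical datanode
              new InetSocketAddress("dn1.example.com", 50010) };
          try (FSDataOutputStream out =
              FSUtils.create(conf, fs, path, FsPermission.getFileDefault(), favoredNodes)) {
            out.writeUTF("hello");                             // overwrites if the file exists
          }
        }
      }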
    • checkFileSystemAvailable

      public static void checkFileSystemAvailable(org.apache.hadoop.fs.FileSystem fs) throws IOException
      Checks to see if the specified file system is available
      Parameters:
      fs - filesystem
      Throws:
      IOException - e
    • isInSafeMode

      private static boolean isInSafeMode(org.apache.hadoop.hdfs.DistributedFileSystem dfs) throws IOException
      Inquire the Active NameNode's safe mode status.
      Parameters:
      dfs - A DistributedFileSystem object representing the underlying HDFS.
      Returns:
      whether we're in safe mode
      Throws:
      IOException
    • checkDfsSafeMode

      public static void checkDfsSafeMode(org.apache.hadoop.conf.Configuration conf) throws IOException
      Check whether dfs is in safemode.
      Throws:
      IOException
    • getVersion

      public static String getVersion(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path rootdir) throws IOException, DeserializationException
      Verifies current version of file system
      Parameters:
      fs - filesystem object
      rootdir - root hbase directory
      Returns:
      null if no version file exists, version string otherwise
      Throws:
      IOException - if the version file fails to open
      DeserializationException - if the version data cannot be translated into a version
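
      A usage sketch that reads the version string; it assumes hbase.rootdir is set in the configuration:

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.hbase.HBaseConfiguration;
      import org.apache.hadoop.hbase.util.FSUtils;

      public class VersionExample {
        public static void main(String[] args) throws Exception {
          Configuration conf = HBaseConfiguration.create();
          Path rootDir = new Path(conf.get("hbase.rootdir")); // standard root dir key
          FileSystem fs = rootDir.getFileSystem(conf);
          String version = FSUtils.getVersion(fs, rootDir);
          System.out.println(version == null ? "no hbase.version file" : "version " + version);
        }
      }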
    • parseVersionFrom

      static String parseVersionFrom(byte[] bytes) throws DeserializationException
      Parse the content of the ${HBASE_ROOTDIR}/hbase.version file.
      Parameters:
      bytes - The byte content of the hbase.version file
      Returns:
      The version found in the file as a String
      Throws:
      DeserializationException - if the version data cannot be translated into a version
    • toVersionByteArray

      static byte[] toVersionByteArray(String version)
      Create the content to write into the ${HBASE_ROOTDIR}/hbase.version file.
      Parameters:
      version - Version to persist
      Returns:
      Serialized protobuf with version content and a bit of pb magic for a prefix.
    • checkVersion

      public static void checkVersion(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path rootdir, boolean message) throws IOException, DeserializationException
      Verifies current version of file system
      Parameters:
      fs - file system
      rootdir - root directory of HBase installation
      message - if true, issues a message on System.out
      Throws:
      IOException - if the version file cannot be opened
      DeserializationException - if the contents of the version file cannot be parsed
    • checkVersion

      public static void checkVersion(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path rootdir, boolean message, int wait, int retries) throws IOException, DeserializationException
      Verifies current version of file system
      Parameters:
      fs - file system
      rootdir - root directory of HBase installation
      message - if true, issues a message on System.out
      wait - wait interval
      retries - number of times to retry
      Throws:
      IOException - if the version file cannot be opened
      DeserializationException - if the contents of the version file cannot be parsed
    • setVersion

      public static void setVersion(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path rootdir) throws IOException
      Sets version of file system
      Parameters:
      fs - filesystem object
      rootdir - hbase root
      Throws:
      IOException - e
    • setVersion

      public static void setVersion(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path rootdir, int wait, int retries) throws IOException
      Sets version of file system
      Parameters:
      fs - filesystem object
      rootdir - hbase root
      wait - time to wait for retry
      retries - number of times to retry before failing
      Throws:
      IOException - e
    • setVersion

      public static void setVersion(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path rootdir, String version, int wait, int retries) throws IOException
      Sets version of file system
      Parameters:
      fs - filesystem object
      rootdir - hbase root directory
      version - version to set
      wait - time to wait for retry
      retries - number of times to retry before throwing an IOException
      Throws:
      IOException - e
    • checkClusterIdExists

      public static boolean checkClusterIdExists(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path rootdir, long wait) throws IOException
      Checks that a cluster ID file exists in the HBase root directory
      Parameters:
      fs - the root directory FileSystem
      rootdir - the HBase root directory in HDFS
      wait - how long to wait between retries
      Returns:
      true if the file exists, otherwise false
      Throws:
      IOException - if checking the FileSystem fails
    • getClusterId

      public static ClusterId getClusterId(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path rootdir) throws IOException
      Returns the value of the unique cluster ID stored for this HBase instance.
      Parameters:
      fs - the root directory FileSystem
      rootdir - the path to the HBase root directory
      Returns:
      the unique cluster identifier
      Throws:
      IOException - if reading the cluster ID file fails
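
      A sketch combining checkClusterIdExists and getClusterId; it assumes hbase.rootdir is set in the configuration:

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.hbase.ClusterId;
      import org.apache.hadoop.hbase.HBaseConfiguration;
      import org.apache.hadoop.hbase.util.FSUtils;

      public class ClusterIdExample {
        public static void main(String[] args) throws Exception {
          Configuration conf = HBaseConfiguration.create();
          Path rootDir = new Path(conf.get("hbase.rootdir"));
          FileSystem fs = rootDir.getFileSystem(conf);
          if (FSUtils.checkClusterIdExists(fs, rootDir, 1000L)) { // 1s between retries
            ClusterId id = FSUtils.getClusterId(fs, rootDir);
            System.out.println("cluster id: " + id);
          }
        }
      }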
    • rewriteAsPb

      private static void rewriteAsPb(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path rootdir, org.apache.hadoop.fs.Path p, ClusterId cid) throws IOException
      Throws:
      IOException
    • setClusterId

      public static void setClusterId(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path rootdir, ClusterId clusterId, int wait) throws IOException
      Writes a new unique identifier for this cluster to the "hbase.id" file in the HBase root directory
      Parameters:
      fs - the root directory FileSystem
      rootdir - the path to the HBase root directory
      clusterId - the unique identifier to store
      wait - how long (in milliseconds) to wait between retries
      Throws:
      IOException - if writing to the FileSystem fails and no wait value was provided for retries
    • waitOnSafeMode

      public static void waitOnSafeMode(org.apache.hadoop.conf.Configuration conf, long wait) throws IOException
      If DFS, check safe mode and if so, wait until we clear it.
      Parameters:
      conf - configuration
      wait - Sleep between retries
      Throws:
      IOException - e
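
      A sketch of the two safe-mode entry points; waitOnSafeMode blocks and retries, while checkDfsSafeMode fails fast:

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.hbase.HBaseConfiguration;
      import org.apache.hadoop.hbase.util.FSUtils;

      public class SafeModeExample {
        public static void main(String[] args) throws Exception {
          Configuration conf = HBaseConfiguration.create();
          // Block until HDFS leaves safe mode, sleeping 10 seconds between checks...
          FSUtils.waitOnSafeMode(conf, 10_000L);
          // ...or throw immediately if the cluster is still in safe mode.
          FSUtils.checkDfsSafeMode(conf);
        }
      }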
    • metaRegionExists

      public static boolean metaRegionExists(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path rootDir) throws IOException
      Checks if meta region exists
      Parameters:
      fs - file system
      rootDir - root directory of HBase installation
      Returns:
      true if exists
      Throws:
      IOException
    • computeHDFSBlocksDistribution

      public static HDFSBlocksDistribution computeHDFSBlocksDistribution(org.apache.hadoop.hdfs.client.HdfsDataInputStream inputStream) throws IOException
      Compute HDFS block distribution of a given HdfsDataInputStream. All HdfsDataInputStreams are backed by a series of LocatedBlocks, which are fetched periodically from the namenode. This method retrieves those blocks from the input stream and uses them to calculate the HDFSBlocksDistribution. The underlying method in DFSInputStream does attempt to use locally cached blocks, but may hit the namenode if the cache is determined to be incomplete. The method also involves making copies of all LocatedBlocks rather than returning the underlying blocks themselves.
      Throws:
      IOException
    • getHostsForLocations

      private static String[] getHostsForLocations(org.apache.hadoop.hdfs.protocol.LocatedBlock block)
    • computeHDFSBlocksDistribution

      public static HDFSBlocksDistribution computeHDFSBlocksDistribution(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.FileStatus status, long start, long length) throws IOException
      Compute HDFS blocks distribution of a given file, or a portion of the file
      Parameters:
      fs - file system
      status - file status of the file
      start - start position of the portion
      length - length of the portion
      Returns:
      The HDFS blocks distribution
      Throws:
      IOException
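
      A locality sketch; the file path is hypothetical, and the HDFSBlocksDistribution accessors used (getTopHosts, getBlockLocalityIndex) are assumed from that class:

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileStatus;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.hbase.HBaseConfiguration;
      import org.apache.hadoop.hbase.HDFSBlocksDistribution;
      import org.apache.hadoop.hbase.util.FSUtils;

      public class BlocksDistributionExample {
        public static void main(String[] args) throws Exception {
          Configuration conf = HBaseConfiguration.create();
          FileSystem fs = FileSystem.get(conf);
          FileStatus status = fs.getFileStatus(new Path("/tmp/some-hfile")); // hypothetical file
          HDFSBlocksDistribution dist =
              FSUtils.computeHDFSBlocksDistribution(fs, status, 0, status.getLen());
          for (String host : dist.getTopHosts()) {
            System.out.println(host + " locality=" + dist.getBlockLocalityIndex(host));
          }
        }
      }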
    • addToHDFSBlocksDistribution

      public static void addToHDFSBlocksDistribution(HDFSBlocksDistribution blocksDistribution, org.apache.hadoop.fs.BlockLocation[] blockLocations) throws IOException
      Update blocksDistribution with blockLocations
      Parameters:
      blocksDistribution - the hdfs blocks distribution
      blockLocations - an array containing block locations
      Throws:
      IOException
    • getTotalTableFragmentation

      public static int getTotalTableFragmentation(HMaster master) throws IOException
      Returns the total overall fragmentation percentage. Includes hbase:meta and -ROOT- as well.
      Parameters:
      master - The master defining the HBase root and file system
      Returns:
      The total fragmentation percentage across all tables
      Throws:
      IOException - When scanning the directory fails
    • getTableFragmentation

      public static Map<String,Integer> getTableFragmentation(HMaster master) throws IOException
      Runs through the HBase rootdir and checks how many stores for each table have more than one file in them. Checks -ROOT- and hbase:meta too. The total percentage across all tables is stored under the special key "-TOTAL-".
      Parameters:
      master - The master defining the HBase root and file system.
      Returns:
      A map for each table and its percentage (never null).
      Throws:
      IOException - When scanning the directory fails.
    • getTableFragmentation

      public static Map<String,Integer> getTableFragmentation(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path hbaseRootDir) throws IOException
      Runs through the HBase rootdir and checks how many stores for each table have more than one file in them. Checks -ROOT- and hbase:meta too. The total percentage across all tables is stored under the special key "-TOTAL-".
      Parameters:
      fs - The file system to use
      hbaseRootDir - The root directory to scan
      Returns:
      A map for each table and its percentage (never null)
      Throws:
      IOException - When scanning the directory fails
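
      A usage sketch; per the description above, the overall percentage sits under the "-TOTAL-" key:

      import java.util.Map;
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.hbase.HBaseConfiguration;
      import org.apache.hadoop.hbase.util.FSUtils;

      public class FragmentationExample {
        public static void main(String[] args) throws Exception {
          Configuration conf = HBaseConfiguration.create();
          Path rootDir = new Path(conf.get("hbase.rootdir")); // assumes hbase.rootdir is set
          FileSystem fs = rootDir.getFileSystem(conf);
          Map<String, Integer> frag = FSUtils.getTableFragmentation(fs, rootDir);
          // Per-table percentages, plus the overall value under "-TOTAL-".
          frag.forEach((table, pct) -> System.out.println(table + ": " + pct + "%"));
        }
      }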
    • getTableDirs

      public static List<org.apache.hadoop.fs.Path> getTableDirs(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path rootdir) throws IOException
      Throws:
      IOException
    • getLocalTableDirs

      public static List<org.apache.hadoop.fs.Path> getLocalTableDirs(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path rootdir) throws IOException
      Returns:
      All the table directories under rootdir. Ignores non-table HBase folders such as .logs, .oldlogs, and .corrupt.
      Throws:
      IOException
    • getRegionDirs

      public static List<org.apache.hadoop.fs.Path> getRegionDirs(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path tableDir) throws IOException
      Given a particular table dir, return all the regiondirs inside it, excluding files such as .tableinfo
      Parameters:
      fs - A file system for the Path
      tableDir - Path to a specific table directory <hbase.rootdir>/<tabledir>
      Returns:
      List of paths to valid region directories in table dir.
      Throws:
      IOException
    • getRegionDirFromRootDir

      public static org.apache.hadoop.fs.Path getRegionDirFromRootDir(org.apache.hadoop.fs.Path rootDir, RegionInfo region)
    • getRegionDirFromTableDir

      public static org.apache.hadoop.fs.Path getRegionDirFromTableDir(org.apache.hadoop.fs.Path tableDir, RegionInfo region)
    • getRegionDirFromTableDir

      public static org.apache.hadoop.fs.Path getRegionDirFromTableDir(org.apache.hadoop.fs.Path tableDir, String encodedRegionName)
    • getFamilyDirs

      public static List<org.apache.hadoop.fs.Path> getFamilyDirs(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path regionDir) throws IOException
      Given a particular region dir, return all the familydirs inside it
      Parameters:
      fs - A file system for the Path
      regionDir - Path to a specific region directory
      Returns:
      List of paths to valid family directories in region dir.
      Throws:
      IOException
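
      A walk from table dir down to family dirs; the table name 't1' is hypothetical, and data/<namespace>/<table> is the standard HBase layout under the root dir:

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.hbase.HBaseConfiguration;
      import org.apache.hadoop.hbase.util.FSUtils;

      public class DirWalkExample {
        public static void main(String[] args) throws Exception {
          Configuration conf = HBaseConfiguration.create();
          Path rootDir = new Path(conf.get("hbase.rootdir")); // assumes hbase.rootdir is set
          FileSystem fs = rootDir.getFileSystem(conf);
          Path tableDir = new Path(rootDir, "data/default/t1");
          for (Path regionDir : FSUtils.getRegionDirs(fs, tableDir)) {
            for (Path familyDir : FSUtils.getFamilyDirs(fs, regionDir)) {
              System.out.println(familyDir);
            }
          }
        }
      }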
    • getReferenceFilePaths

      public static List<org.apache.hadoop.fs.Path> getReferenceFilePaths(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path familyDir) throws IOException
      Throws:
      IOException
    • getReferenceAndLinkFilePaths

      public static List<org.apache.hadoop.fs.Path> getReferenceAndLinkFilePaths(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path familyDir) throws IOException
      Throws:
      IOException
    • getFilePaths

      private static List<org.apache.hadoop.fs.Path> getFilePaths(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path dir, org.apache.hadoop.fs.PathFilter pathFilter) throws IOException
      Throws:
      IOException
    • getRegionReferenceAndLinkFileCount

      public static int getRegionReferenceAndLinkFileCount(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path p)
    • getTableStoreFilePathMap

      public static Map<String,org.apache.hadoop.fs.Path> getTableStoreFilePathMap(Map<String,org.apache.hadoop.fs.Path> map, org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path hbaseRootDir, TableName tableName) throws IOException, InterruptedException
      Runs through the HBase rootdir/tablename and creates a reverse lookup map for table StoreFile names to the full Path.
      Example...
      Key = 3944417774205889744
      Value = hdfs://localhost:51169/user/userid/-ROOT-/70236052/info/3944417774205889744
      Parameters:
      map - map to add values. If null, this method will create and populate one to return
      fs - The file system to use.
      hbaseRootDir - The root directory to scan.
      tableName - name of the table to scan.
      Returns:
      Map keyed by StoreFile name with a value of the full Path.
      Throws:
      IOException - When scanning the directory fails.
      InterruptedException
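
      A usage sketch of this overload; the table name 't1' is hypothetical, and passing null asks the method to allocate the result map:

      import java.util.Map;
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.hbase.HBaseConfiguration;
      import org.apache.hadoop.hbase.TableName;
      import org.apache.hadoop.hbase.util.FSUtils;

      public class StoreFileMapExample {
        public static void main(String[] args) throws Exception {
          Configuration conf = HBaseConfiguration.create();
          Path rootDir = new Path(conf.get("hbase.rootdir")); // assumes hbase.rootdir is set
          FileSystem fs = rootDir.getFileSystem(conf);
          Map<String, Path> storeFiles =
              FSUtils.getTableStoreFilePathMap(null, fs, rootDir, TableName.valueOf("t1"));
          storeFiles.forEach((name, path) -> System.out.println(name + " -> " + path));
        }
      }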
    • getTableStoreFilePathMap

      @Deprecated public static Map<String,org.apache.hadoop.fs.Path> getTableStoreFilePathMap(Map<String,org.apache.hadoop.fs.Path> resultMap, org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path hbaseRootDir, TableName tableName, org.apache.hadoop.fs.PathFilter sfFilter, ExecutorService executor, HbckErrorReporter progressReporter) throws IOException, InterruptedException
      Deprecated.
      Since 2.3.0. For removal in HBase 4. Use the ProgressReporter override instead.
      Runs through the HBase rootdir/tablename and creates a reverse lookup map for table StoreFile names to the full Path. Note that because this method can be called on a 'live' HBase system, we will skip files that no longer exist by the time we traverse them; similarly, the user of the result needs to consider that some entries in this map may not exist by the time this call completes.
      Example...
      Key = 3944417774205889744
      Value = hdfs://localhost:51169/user/userid/-ROOT-/70236052/info/3944417774205889744
      Parameters:
      resultMap - map to add values. If null, this method will create and populate one to return
      fs - The file system to use.
      hbaseRootDir - The root directory to scan.
      tableName - name of the table to scan.
      sfFilter - optional path filter to apply to store files
      executor - optional executor service to parallelize this operation
      progressReporter - Instance or null; gets called every time we move to a new region or family dir and for each store file.
      Returns:
      Map keyed by StoreFile name with a value of the full Path.
      Throws:
      IOException - When scanning the directory fails.
      InterruptedException
    • getTableStoreFilePathMap

      public static Map<String,org.apache.hadoop.fs.Path> getTableStoreFilePathMap(Map<String,org.apache.hadoop.fs.Path> resultMap, org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path hbaseRootDir, TableName tableName, org.apache.hadoop.fs.PathFilter sfFilter, ExecutorService executor, FSUtils.ProgressReporter progressReporter) throws IOException, InterruptedException
      Runs through the HBase rootdir/tablename and creates a reverse lookup map for table StoreFile names to the full Path. Note that because this method can be called on a 'live' HBase system, we will skip files that no longer exist by the time we traverse them; similarly, the user of the result needs to consider that some entries in this map may not exist by the time this call completes.
      Example...
      Key = 3944417774205889744
      Value = hdfs://localhost:51169/user/userid/-ROOT-/70236052/info/3944417774205889744
      Parameters:
      resultMap - map to add values. If null, this method will create and populate one to return
      fs - The file system to use.
      hbaseRootDir - The root directory to scan.
      tableName - name of the table to scan.
      sfFilter - optional path filter to apply to store files
      executor - optional executor service to parallelize this operation
      progressReporter - Instance or null; gets called every time we move to a new region or family dir and for each store file.
      Returns:
      Map keyed by StoreFile name with a value of the full Path.
      Throws:
      IOException - When scanning the directory fails.
      InterruptedException - the thread is interrupted, either before or during the activity.
    • getRegionReferenceFileCount

      public static int getRegionReferenceFileCount(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path p)
    • getTableStoreFilePathMap

      public static Map<String,org.apache.hadoop.fs.Path> getTableStoreFilePathMap(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path hbaseRootDir) throws IOException, InterruptedException
      Runs through the HBase rootdir and creates a reverse lookup map for table StoreFile names to the full Path.
      Example...
      Key = 3944417774205889744
      Value = hdfs://localhost:51169/user/userid/-ROOT-/70236052/info/3944417774205889744
      Parameters:
      fs - The file system to use.
      hbaseRootDir - The root directory to scan.
      Returns:
      Map keyed by StoreFile name with a value of the full Path.
      Throws:
      IOException - When scanning the directory fails.
      InterruptedException
    • getTableStoreFilePathMap

      @Deprecated public static Map<String,org.apache.hadoop.fs.Path> getTableStoreFilePathMap(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path hbaseRootDir, org.apache.hadoop.fs.PathFilter sfFilter, ExecutorService executor, HbckErrorReporter progressReporter) throws IOException, InterruptedException
      Deprecated.
      Since 2.3.0. For removal in HBase 4. Use the ProgressReporter override instead.
      Runs through the HBase rootdir and creates a reverse lookup map for table StoreFile names to the full Path.
      Example...
      Key = 3944417774205889744
      Value = hdfs://localhost:51169/user/userid/-ROOT-/70236052/info/3944417774205889744
      Parameters:
      fs - The file system to use.
      hbaseRootDir - The root directory to scan.
      sfFilter - optional path filter to apply to store files
      executor - optional executor service to parallelize this operation
      progressReporter - Instance or null; gets called every time we move to a new region or family dir and for each store file.
      Returns:
      Map keyed by StoreFile name with a value of the full Path.
      Throws:
      IOException - When scanning the directory fails.
      InterruptedException
    • getTableStoreFilePathMap

      public static Map<String,org.apache.hadoop.fs.Path> getTableStoreFilePathMap(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path hbaseRootDir, org.apache.hadoop.fs.PathFilter sfFilter, ExecutorService executor, FSUtils.ProgressReporter progressReporter) throws IOException, InterruptedException
      Runs through the HBase rootdir and creates a reverse lookup map for table StoreFile names to the full Path.
      Example...
      Key = 3944417774205889744
      Value = hdfs://localhost:51169/user/userid/-ROOT-/70236052/info/3944417774205889744
      Parameters:
      fs - The file system to use.
      hbaseRootDir - The root directory to scan.
      sfFilter - optional path filter to apply to store files
      executor - optional executor service to parallelize this operation
      progressReporter - Instance or null; gets called every time we move to a new region or family dir and for each store file.
      Returns:
      Map keyed by StoreFile name with a value of the full Path.
      Throws:
      IOException - When scanning the directory fails.
      InterruptedException
    • filterFileStatuses

      public static List<org.apache.hadoop.fs.FileStatus> filterFileStatuses(org.apache.hadoop.fs.FileStatus[] input, FileStatusFilter filter)
      Filters FileStatuses in an array and returns a list
      Parameters:
      input - An array of FileStatuses
      filter - A required filter to filter the array
      Returns:
      A list of FileStatuses
    • filterFileStatuses

      public static List<org.apache.hadoop.fs.FileStatus> filterFileStatuses(Iterator<org.apache.hadoop.fs.FileStatus> input, FileStatusFilter filter)
      Filters FileStatuses in an iterator and returns a list
      Parameters:
      input - An iterator of FileStatuses
      filter - A required filter to filter the array
      Returns:
      A list of FileStatuses
    • listStatusWithStatusFilter

      public static List<org.apache.hadoop.fs.FileStatus> listStatusWithStatusFilter(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path dir, FileStatusFilter filter) throws IOException
      Calls fs.listStatus() and treats FileNotFoundException as non-fatal. This accommodates differences between Hadoop versions: Hadoop 1 does not throw a FileNotFoundException and returns an empty FileStatus[], while Hadoop 2 throws FileNotFoundException.
      Parameters:
      fs - file system
      dir - directory
      filter - file status filter
      Returns:
      null if dir is empty or doesn't exist, otherwise FileStatus list
      Throws:
      IOException
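
      A usage sketch; it assumes FileStatusFilter is a single-method interface (accept(FileStatus)) so a lambda works, and the directory path is hypothetical:

      import java.util.List;
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileStatus;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.hbase.HBaseConfiguration;
      import org.apache.hadoop.hbase.util.FSUtils;
      import org.apache.hadoop.hbase.util.FileStatusFilter;

      public class ListStatusExample {
        public static void main(String[] args) throws Exception {
          Configuration conf = HBaseConfiguration.create();
          FileSystem fs = FileSystem.get(conf);
          FileStatusFilter dirsOnly = f -> f.isDirectory();
          // Returns null (not an exception) when the directory is empty or missing.
          List<FileStatus> dirs =
              FSUtils.listStatusWithStatusFilter(fs, new Path("/hbase/data"), dirsOnly);
          if (dirs != null) {
            dirs.forEach(d -> System.out.println(d.getPath()));
          }
        }
      }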
    • getRegionDegreeLocalityMappingFromFS

      public static Map<String,Map<String,Float>> getRegionDegreeLocalityMappingFromFS(org.apache.hadoop.conf.Configuration conf) throws IOException
      This function is to scan the root path of the file system to get the degree of locality for each region on each of the servers having at least one block of that region. This is used by the tool RegionPlacementMaintainer.
      Parameters:
      conf - the configuration to use
      Returns:
      the mapping from region encoded name to a map of server names to locality fraction
      Throws:
      IOException - in case of file system errors or interrupts
    • getRegionDegreeLocalityMappingFromFS

      public static Map<String,Map<String,Float>> getRegionDegreeLocalityMappingFromFS(org.apache.hadoop.conf.Configuration conf, String desiredTable, int threadPoolSize) throws IOException
      This function is to scan the root path of the file system to get the degree of locality for each region on each of the servers having at least one block of that region.
      Parameters:
      conf - the configuration to use
      desiredTable - the table you wish to scan locality for
      threadPoolSize - the thread pool size to use
      Returns:
      the mapping from region encoded name to a map of server names to locality fraction
      Throws:
      IOException - in case of file system errors or interrupts
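
      A usage sketch; the table name and thread-pool size are hypothetical:

      import java.util.Map;
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.hbase.HBaseConfiguration;
      import org.apache.hadoop.hbase.util.FSUtils;

      public class LocalityExample {
        public static void main(String[] args) throws Exception {
          Configuration conf = HBaseConfiguration.create();
          // Encoded region name -> (server name -> fraction of the region's blocks on that server).
          Map<String, Map<String, Float>> locality =
              FSUtils.getRegionDegreeLocalityMappingFromFS(conf, "t1", 4);
          locality.forEach((region, perServer) -> System.out.println(region + " -> " + perServer));
        }
      }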
    • getRegionLocalityMappingFromFS

      private static void getRegionLocalityMappingFromFS(org.apache.hadoop.conf.Configuration conf, String desiredTable, int threadPoolSize, Map<String,Map<String,Float>> regionDegreeLocalityMapping) throws IOException
      This function is to scan the root path of the file system to get either the mapping between the region name and its best-locality region server, or the degree of locality of each region on each of the servers having at least one block of that region. The output map parameter is optional.
      Parameters:
      conf - the configuration to use
      desiredTable - the table you wish to scan locality for
      threadPoolSize - the thread pool size to use
      regionDegreeLocalityMapping - the map into which to put the locality degree mapping, or null; must be a thread-safe implementation
      Throws:
      IOException - in case of file system errors or interrupts
    • setupShortCircuitRead

      public static void setupShortCircuitRead(org.apache.hadoop.conf.Configuration conf)
      Do our short circuit read setup. Checks buffer size to use and whether to do checksumming in hbase or hdfs.
    • checkShortCircuitReadBufferSize

      public static void checkShortCircuitReadBufferSize(org.apache.hadoop.conf.Configuration conf)
      Check if short circuit read buffer size is set and if not, set it to hbase value.
    • getDFSHedgedReadMetrics

      public static org.apache.hadoop.hdfs.DFSHedgedReadMetrics getDFSHedgedReadMetrics(org.apache.hadoop.conf.Configuration c) throws IOException
      Returns the DFSClient DFSHedgedReadMetrics instance, or null if it can't be found or the filesystem is not HDFS.
      Throws:
      IOException
    • copyFilesParallel

      public static List<org.apache.hadoop.fs.Path> copyFilesParallel(org.apache.hadoop.fs.FileSystem srcFS, org.apache.hadoop.fs.Path src, org.apache.hadoop.fs.FileSystem dstFS, org.apache.hadoop.fs.Path dst, org.apache.hadoop.conf.Configuration conf, int threads) throws IOException
      Throws:
      IOException
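
      A usage sketch; the source and destination URIs are hypothetical:

      import java.util.List;
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.hbase.HBaseConfiguration;
      import org.apache.hadoop.hbase.util.FSUtils;

      public class CopyExample {
        public static void main(String[] args) throws Exception {
          Configuration conf = HBaseConfiguration.create();
          Path src = new Path("hdfs://nn1:8020/backup/src"); // hypothetical source tree
          Path dst = new Path("hdfs://nn2:8020/backup/dst"); // hypothetical destination
          FileSystem srcFs = src.getFileSystem(conf);
          FileSystem dstFs = dst.getFileSystem(conf);
          List<Path> copied = FSUtils.copyFilesParallel(srcFs, src, dstFs, dst, conf, 8);
          System.out.println("copied " + copied.size() + " files");
        }
      }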
    • copyFiles

      private static List<org.apache.hadoop.fs.Path> copyFiles(org.apache.hadoop.fs.FileSystem srcFS, org.apache.hadoop.fs.Path src, org.apache.hadoop.fs.FileSystem dstFS, org.apache.hadoop.fs.Path dst, org.apache.hadoop.conf.Configuration conf, ExecutorService pool, List<Future<Void>> futures) throws IOException
      Throws:
      IOException
    • getNNAddresses

      private static Set<InetSocketAddress> getNNAddresses(org.apache.hadoop.hdfs.DistributedFileSystem fs, org.apache.hadoop.conf.Configuration conf)
      Returns A set containing all namenode addresses of fs
    • isSameHdfs

      public static boolean isSameHdfs(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.FileSystem srcFs, org.apache.hadoop.fs.FileSystem desFs)
      Parameters:
      conf - the Configuration of HBase
      srcFs - the source FileSystem
      desFs - the destination FileSystem
      Returns:
      Whether srcFs and desFs are on the same hdfs or not
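
      A usage sketch; the namenode URIs are hypothetical:

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.hbase.HBaseConfiguration;
      import org.apache.hadoop.hbase.util.FSUtils;

      public class SameHdfsExample {
        public static void main(String[] args) throws Exception {
          Configuration conf = HBaseConfiguration.create();
          FileSystem a = new Path("hdfs://nn1:8020/").getFileSystem(conf);
          FileSystem b = new Path("hdfs://nn1.example.com:8020/").getFileSystem(conf);
          // True when both handles resolve to the same set of namenode addresses.
          System.out.println(FSUtils.isSameHdfs(conf, a, b));
        }
      }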