Class HFileCorruptionChecker

java.lang.Object
org.apache.hadoop.hbase.util.hbck.HFileCorruptionChecker

@Private public class HFileCorruptionChecker extends Object
This class marches through all of the region's hfiles and verifies that they are all valid files. One just needs to instantiate the class, use checkTables(Collection<Path>), and then retrieve the corrupted hfiles (and quarantined files if in quarantining mode). The implementation currently parallelizes at the regionDir level.
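The regionDir-level parallelism mentioned above can be illustrated with a small stand-alone sketch. It does not use HBase: the class name RegionParallelCheckSketch, the findCorrupted method, the map of region name to file names, and the isValid predicate are all hypothetical stand-ins, and the actual implementation may be organized differently. The only idea it is meant to show is one task per region directory submitted to a shared executor, with the per-region results merged afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Predicate;

/** Hypothetical sketch of region-level parallel checking; not the HBase implementation. */
final class RegionParallelCheckSketch {

    /**
     * Runs one task per region and returns every file the predicate rejects.
     * filesByRegion and isValid are illustrative stand-ins for region
     * directories and for the real hfile validation.
     */
    static Set<String> findCorrupted(Map<String, List<String>> filesByRegion,
                                     Predicate<String> isValid,
                                     ExecutorService executor)
            throws InterruptedException, ExecutionException {
        List<Callable<List<String>>> tasks = new ArrayList<>();
        for (List<String> regionFiles : filesByRegion.values()) {
            // Files inside a single region are checked sequentially.
            tasks.add(() -> {
                List<String> bad = new ArrayList<>();
                for (String file : regionFiles) {
                    if (!isValid.test(file)) {
                        bad.add(file);
                    }
                }
                return bad;
            });
        }
        // invokeAll blocks until every region task has completed.
        Set<String> corrupted = new TreeSet<>();
        for (Future<List<String>> result : executor.invokeAll(tasks)) {
            corrupted.addAll(result.get());
        }
        return corrupted;
    }
}
```

With this shape, the degree of parallelism is bounded by the number of regions and by the size of the executor's thread pool, whichever is smaller.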
  • Method Details

    • checkHFile

      protected void checkHFile(org.apache.hadoop.fs.Path p) throws IOException
      Checks a path to see if it is a valid hfile.
      Parameters:
      p - full Path to an HFile
      Throws:
      IOException - This is a connectivity related exception
    • createQuarantinePath

      org.apache.hadoop.fs.Path createQuarantinePath(org.apache.hadoop.fs.Path hFile) throws IOException
      Given a path, generates a new path to where we move a corrupted hfile (bad trailer, no trailer).
      Parameters:
      hFile - Path to a corrupt hfile (assumes that it is HBASE_DIR/table/region/cf/file)
      Returns:
      path to where corrupted files are stored. This should be HBASE_DIR/.corrupt/table/region/cf/file.
      Throws:
      IOException
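The path mapping described for createQuarantinePath can be sketched with java.nio.file.Path in place of org.apache.hadoop.fs.Path. This is an illustration under stated assumptions, not the HBase code: the class QuarantinePathSketch and its quarantinePath method are hypothetical, and the ".corrupt" directory name is taken from the Returns text above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical stand-alone sketch; not the HBase implementation. */
final class QuarantinePathSketch {

    /** Quarantine directory name under HBASE_DIR, as given in the Returns text. */
    static final String CORRUPT_DIR = ".corrupt";

    /** Maps HBASE_DIR/table/region/cf/file to HBASE_DIR/.corrupt/table/region/cf/file. */
    static Path quarantinePath(Path hFile) {
        // Walk up four levels: cf, region, table, then HBASE_DIR itself.
        Path cfDir = parentOf(hFile);
        Path regionDir = parentOf(cfDir);
        Path tableDir = parentOf(regionDir);
        Path hbaseDir = parentOf(tableDir);
        // Rebuild the same table/region/cf/file suffix under the quarantine directory.
        return hbaseDir.resolve(CORRUPT_DIR)
                .resolve(tableDir.getFileName())
                .resolve(regionDir.getFileName())
                .resolve(cfDir.getFileName())
                .resolve(hFile.getFileName());
    }

    /** Returns the parent, rejecting paths shallower than the assumed layout. */
    private static Path parentOf(Path p) {
        Path parent = p.getParent();
        if (parent == null) {
            throw new IllegalArgumentException(
                    "expected HBASE_DIR/table/region/cf/file but path is too shallow");
        }
        return parent;
    }

    public static void main(String[] args) {
        // Example input using made-up table, region, and family names.
        System.out.println(quarantinePath(Paths.get("/hbase/t1/r1/cf/f1")));
    }
}
```

Keeping the table/region/cf/file suffix intact under the quarantine directory means the original location of a quarantined file can be recovered from its quarantine path alone.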
    • checkColFamDir

      protected void checkColFamDir(org.apache.hadoop.fs.Path cfDir) throws IOException
      Check all files in a column family dir.
      Parameters:
      cfDir - column family directory
      Throws:
      IOException
    • checkMobColFamDir

      protected void checkMobColFamDir(org.apache.hadoop.fs.Path cfDir) throws IOException
      Check all files in a mob column family dir.
      Parameters:
      cfDir - mob column family directory
      Throws:
      IOException
    • checkMobFile

      protected void checkMobFile(org.apache.hadoop.fs.Path p) throws IOException
      Checks a path to see if it is a valid mob file.
      Parameters:
      p - full Path to a mob file
      Throws:
      IOException - This is a connectivity related exception
    • checkMobRegionDir

      private void checkMobRegionDir(org.apache.hadoop.fs.Path regionDir) throws IOException
      Checks all the mob files of a table.
      Parameters:
      regionDir - The mob region directory
      Throws:
      IOException
    • checkRegionDir

      protected void checkRegionDir(org.apache.hadoop.fs.Path regionDir) throws IOException
      Check all column families in a region dir.
      Parameters:
      regionDir - region directory
      Throws:
      IOException
    • checkTableDir

      void checkTableDir(org.apache.hadoop.fs.Path tableDir) throws IOException
      Check all the regiondirs in the specified tableDir.
      Parameters:
      tableDir - path to a table
      Throws:
      IOException
    • createMobRegionDirChecker

      private HFileCorruptionChecker.MobRegionDirChecker createMobRegionDirChecker(org.apache.hadoop.fs.Path tableDir)
      Creates an instance of MobRegionDirChecker.
      Parameters:
      tableDir - The current table directory.
      Returns:
      An instance of MobRegionDirChecker.
    • checkTables

      public void checkTables(Collection<org.apache.hadoop.fs.Path> tables) throws IOException
      Check the specified table dirs for bad hfiles.
      Throws:
      IOException
    • getFailures

      public Collection<org.apache.hadoop.fs.Path> getFailures()
      Returns the set of check failure file paths after checkTables is called.
    • getCorrupted

      public Collection<org.apache.hadoop.fs.Path> getCorrupted()
      Returns the set of corrupted file paths after checkTables is called.
    • getHFilesChecked

      public int getHFilesChecked()
      Returns the number of hfiles checked in the last HFileCorruptionChecker run.
    • getQuarantined

      public Collection<org.apache.hadoop.fs.Path> getQuarantined()
      Returns the set of successfully quarantined paths after checkTables is called.
    • getMissing

      public Collection<org.apache.hadoop.fs.Path> getMissing()
      Returns:
      the set of paths that were missing. Likely due to deletion/moves from compaction or flushes.
    • getFailureMobFiles

      public Collection<org.apache.hadoop.fs.Path> getFailureMobFiles()
      Returns the set of check failure mob file paths after checkTables is called.
    • getCorruptedMobFiles

      public Collection<org.apache.hadoop.fs.Path> getCorruptedMobFiles()
      Returns the set of corrupted mob file paths after checkTables is called.
    • getMobFilesChecked

      public int getMobFilesChecked()
      Returns the number of mob files checked in the last HFileCorruptionChecker run.
    • getQuarantinedMobFiles

      public Collection<org.apache.hadoop.fs.Path> getQuarantinedMobFiles()
      Returns the set of successfully quarantined mob file paths after checkTables is called.
    • getMissedMobFiles

      public Collection<org.apache.hadoop.fs.Path> getMissedMobFiles()
      Returns:
      the set of paths that were missing. Likely due to table deletion or deletion/moves from compaction.
    • report

      public void report(HbckErrorReporter out)
      Print a human readable summary of hfile quarantining operations.
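The getters above separate checked files into corrupted, failed, and missing sets. A minimal sketch of one way such a classification could be driven by exception types follows. Everything in it is an assumption made for illustration: the CheckOutcomeSketch class, the CorruptFileException type, and the FileCheck interface are hypothetical and do not come from HBase, and the real class may decide these categories differently.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of outcome bookkeeping; not the HBase implementation. */
final class CheckOutcomeSketch {

    /** Hypothetical marker for a file that could be opened but failed validation. */
    static final class CorruptFileException extends IOException {
        CorruptFileException(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for opening and validating one file. */
    interface FileCheck {
        void check(String path) throws IOException;
    }

    final List<String> corrupted = new ArrayList<>();
    final List<String> missing = new ArrayList<>();
    final List<String> failures = new ArrayList<>();
    int checked;

    /** Attempts every path and files it under at most one outcome list. */
    void checkAll(List<String> paths, FileCheck validator) {
        for (String path : paths) {
            checked++; // counts attempts, whether or not the check succeeds
            try {
                validator.check(path);
            } catch (CorruptFileException e) {
                corrupted.add(path); // content is invalid
            } catch (FileNotFoundException e) {
                missing.add(path);   // file vanished, e.g. moved or deleted
            } catch (IOException e) {
                failures.add(path);  // could not complete the check at all
            }
        }
    }
}
```

Separating "missing" from "failure" matters for interpretation: a missing file is usually benign on a live cluster, while a failure means the file's state is still unknown.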