Class FileLink

java.lang.Object
org.apache.hadoop.hbase.io.FileLink
Direct Known Subclasses:
HFileLink, WALLink

@Private public class FileLink extends Object
FileLink is a kind of hard link that allows access to a file given a set of locations.

The Problem:

  • HDFS does not support hard links, which makes it impossible to reference the same data blocks under different names.
  • HBase stores files in one location (e.g. table/region/family/) and, when a file is no longer needed (e.g. after a compaction or region deletion), moves it to an archive directory.
To create a reference to a file, we must remember that it may be in its original location or in the archive folder. The FileLink class abstracts this: given a set of locations, it switches between them, making the move transparent to the user. HFileLink is a more concrete implementation of FileLink.
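The fallback between locations can be illustrated with a minimal sketch. It is not the HBase implementation: it uses java.nio.file.Path instead of org.apache.hadoop.fs.Path, the class name LocationFallbackSketch and the method firstAvailable are hypothetical, the paths are illustrative, and the existence check is passed in as a predicate (standing in for a FileSystem lookup) so the logic can be exercised without a cluster.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

/** Hypothetical stand-in for the location fallback; not the HBase class. */
final class LocationFallbackSketch {
    // Candidate locations, origin first, then the alternatives in the order given.
    private final List<Path> locations;

    LocationFallbackSketch(Path originPath, Path... alternativePaths) {
        List<Path> all = new ArrayList<>();
        all.add(originPath);
        all.addAll(Arrays.asList(alternativePaths));
        this.locations = Collections.unmodifiableList(all);
    }

    /** Returns the first location accepted by the existence check, if any. */
    Optional<Path> firstAvailable(Predicate<Path> exists) {
        for (Path candidate : locations) {
            if (exists.test(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Illustrative paths only: an original store file and an archived copy.
        LocationFallbackSketch link = new LocationFallbackSketch(
            Paths.get("/hbase/table/region-x/cf/file-k"),
            Paths.get("/hbase/.archive/table/region-x/cf/file-k"));
        // Files::exists plays the role of the file system lookup here.
        System.out.println(
            link.firstAvailable(Files::exists).map(Path::toString).orElse("not found"));
    }
}
```

Returning an Optional is a choice made for this sketch; the real getAvailablePath is declared to throw IOException, so callers of FileLink should expect an exception rather than an empty result.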

Back-references: To help the CleanerChore keep track of the links to a particular file, a new file is placed inside a back-reference directory when a FileLink is created. There is one back-reference directory for each file that has links, and the directory contains one file per link.

HFileLink Example

  • /hbase/table/region-x/cf/file-k (Original File)
  • /hbase/table-cloned/region-y/cf/file-k.region-x.table (HFileLink to the original file)
  • /hbase/table-2nd-cloned/region-z/cf/file-k.region-x.table (HFileLink to the original file)
  • /hbase/.archive/table/region-x/.links-file-k/region-y.table-cloned (Back-reference to the link in table-cloned)
  • /hbase/.archive/table/region-x/.links-file-k/region-z.table-2nd-cloned (Back-reference to the link in table-2nd-cloned)
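The naming scheme in the listing above can be sketched as follows. This is an illustration under stated assumptions rather than the HBase code: it uses java.nio.file.Path in place of org.apache.hadoop.fs.Path, the class BackReferenceNamingSketch and the constant name LINKS_DIR_PREFIX are hypothetical, and only the ".links-" prefix and the "region.table" back-reference name are taken from this page.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Illustrative naming helpers; a sketch, not the HBase implementation. */
final class BackReferenceNamingSketch {
    // Prefix as documented in the field summary: ".links-<hfile>/".
    // The constant name is hypothetical.
    static final String LINKS_DIR_PREFIX = ".links-";

    /** Builds the back-reference directory for a file inside a store directory. */
    static Path backReferencesDir(Path storeDir, String fileName) {
        return storeDir.resolve(LINKS_DIR_PREFIX + fileName);
    }

    /** Checks whether the last path component carries the back-reference prefix. */
    static boolean isBackReferencesDir(Path dirPath) {
        Path name = dirPath.getFileName();
        return name != null && name.toString().startsWith(LINKS_DIR_PREFIX);
    }

    /** Recovers the referenced file name by stripping the prefix. */
    static String backReferenceFileName(Path dirPath) {
        if (!isBackReferencesDir(dirPath)) {
            throw new IllegalArgumentException("Not a back-reference directory: " + dirPath);
        }
        return dirPath.getFileName().toString().substring(LINKS_DIR_PREFIX.length());
    }

    public static void main(String[] args) {
        Path storeDir = Paths.get("/hbase/.archive/table/region-x");
        Path linksDir = backReferencesDir(storeDir, "file-k");
        // One file per link; the "region.table" name follows the example listing.
        Path backRef = linksDir.resolve("region-y" + "." + "table-cloned");
        System.out.println(backRef);
        System.out.println(isBackReferencesDir(linksDir));
        System.out.println(backReferenceFileName(linksDir));
    }
}
```

Keeping one small file per link means a cleaner can decide whether a file is still referenced just by checking whether its back-reference directory is empty.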
  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Class
    Description
    protected static class 
    FileLink InputStream that handles the switch between the original path and the alternative locations, when the file is moved.
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    static final String
    Define the Back-reference directory name prefix: .links-<hfile>/
    private org.apache.hadoop.fs.Path[]
     
    private static final org.slf4j.Logger
     
  • Constructor Summary

    Constructors
    Modifier
    Constructor
    Description
    protected
    FileLink()
     
     
    FileLink(Collection<org.apache.hadoop.fs.Path> locations)
     
     
    FileLink(org.apache.hadoop.fs.Path originPath, org.apache.hadoop.fs.Path... alternativePaths)
     
  • Method Summary

    Modifier and Type
    Method
    Description
    boolean
    equals(Object obj)
     
    boolean
    exists(org.apache.hadoop.fs.FileSystem fs)
    Returns true if the file pointed to by the link exists.
    org.apache.hadoop.fs.Path
    getAvailablePath(org.apache.hadoop.fs.FileSystem fs)
    Returns the path of the first available link.
    static String
    getBackReferenceFileName(org.apache.hadoop.fs.Path dirPath)
    Get the referenced file name from the reference link directory path.
    static org.apache.hadoop.fs.Path
    getBackReferencesDir(org.apache.hadoop.fs.Path storeDir, String fileName)
    Get the directory to store the link back references
    org.apache.hadoop.fs.FileStatus
    getFileStatus(org.apache.hadoop.fs.FileSystem fs)
    Get the FileStatus of the referenced file.
    org.apache.hadoop.fs.Path[]
    getLocations()
    Returns the locations to look for the linked file.
    static org.apache.hadoop.fs.FSDataInputStream
    getUnderlyingFileLinkInputStream(org.apache.hadoop.fs.FSDataInputStream stream)
    If the passed FSDataInputStream is backed by a FileLink, returns the underlying InputStream for the resolved link target.
    private static IOException
    handleAccessLocationException(FileLink fileLink, IOException newException, IOException previousException)
    Handles exceptions thrown while accessing the locations of a file link.
    int
    hashCode()
     
    static boolean
    isBackReferencesDir(org.apache.hadoop.fs.Path dirPath)
    Checks if the specified directory path is a back reference links folder.
    org.apache.hadoop.fs.FSDataInputStream
    open(org.apache.hadoop.fs.FileSystem fs)
    Open the FileLink for read.
    org.apache.hadoop.fs.FSDataInputStream
    open(org.apache.hadoop.fs.FileSystem fs, int bufferSize)
    Open the FileLink for read.
    protected void
    setLocations(org.apache.hadoop.fs.Path originPath, org.apache.hadoop.fs.Path... alternativePaths)
    NOTE: This method must be used only in the constructor! It creates a List with the specified locations for the link.
     

    Methods inherited from class java.lang.Object

    clone, finalize, getClass, notify, notifyAll, wait, wait, wait
  • Field Details

  • Constructor Details

    • FileLink

      protected FileLink()
    • FileLink

      public FileLink(org.apache.hadoop.fs.Path originPath, org.apache.hadoop.fs.Path... alternativePaths)
      Parameters:
      originPath - Original location of the file to link
      alternativePaths - Alternative locations to look for the linked file
    • FileLink

      public FileLink(Collection<org.apache.hadoop.fs.Path> locations)
      Parameters:
      locations - locations to look for the linked file
  • Method Details

    • getLocations

      public org.apache.hadoop.fs.Path[] getLocations()
      Returns the locations to look for the linked file.
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • exists

      public boolean exists(org.apache.hadoop.fs.FileSystem fs) throws IOException
      Returns true if the file pointed to by the link exists.
      Throws:
      IOException
    • getAvailablePath

      public org.apache.hadoop.fs.Path getAvailablePath(org.apache.hadoop.fs.FileSystem fs) throws IOException
      Returns the path of the first available link.
      Throws:
      IOException
    • getFileStatus

      public org.apache.hadoop.fs.FileStatus getFileStatus(org.apache.hadoop.fs.FileSystem fs) throws IOException
      Get the FileStatus of the referenced file.
      Parameters:
      fs - FileSystem on which to get the file status
      Returns:
      FileStatus of the referenced file.
      Throws:
      IOException - on unexpected error.
    • handleAccessLocationException

      private static IOException handleAccessLocationException(FileLink fileLink, IOException newException, IOException previousException) throws IOException
      Handles exceptions thrown while accessing the locations of a file link.
      Parameters:
      fileLink - the file link
      newException - the exception caught while accessing the current location
      previousException - the exception previously caught while accessing the other locations
      Returns:
      An AccessControlException if one was caught while accessing any of the locations; otherwise a FileNotFoundException. An AccessControlException can be thrown when the user scan snapshot feature is enabled; see SnapshotScannerHDFSAclController.
      Throws:
      IOException - if the exception is neither AccessControlException nor FileNotFoundException
    • open

      public org.apache.hadoop.fs.FSDataInputStream open(org.apache.hadoop.fs.FileSystem fs) throws IOException
      Open the FileLink for read.

      It uses a wrapper of FSDataInputStream that is agnostic to the location of the file, even if the file switches between locations.

      Parameters:
      fs - FileSystem on which to open the FileLink
      Returns:
      InputStream for reading the file link.
      Throws:
      IOException - on unexpected error.
    • open

      public org.apache.hadoop.fs.FSDataInputStream open(org.apache.hadoop.fs.FileSystem fs, int bufferSize) throws IOException
      Open the FileLink for read.

      It uses a wrapper of FSDataInputStream that is agnostic to the location of the file, even if the file switches between locations.

      Parameters:
      fs - FileSystem on which to open the FileLink
      bufferSize - the size of the buffer to be used.
      Returns:
      InputStream for reading the file link.
      Throws:
      IOException - on unexpected error.
    • getUnderlyingFileLinkInputStream

      public static org.apache.hadoop.fs.FSDataInputStream getUnderlyingFileLinkInputStream(org.apache.hadoop.fs.FSDataInputStream stream)
      If the passed FSDataInputStream is backed by a FileLink, returns the underlying InputStream for the resolved link target. Otherwise, returns null.
    • setLocations

      protected void setLocations(org.apache.hadoop.fs.Path originPath, org.apache.hadoop.fs.Path... alternativePaths)
      NOTE: This method must be used only in the constructor! It creates a List with the specified locations for the link.
    • getBackReferencesDir

      public static org.apache.hadoop.fs.Path getBackReferencesDir(org.apache.hadoop.fs.Path storeDir, String fileName)
      Get the directory to store the link back references

      To simplify reference counting, a back-reference is added to the back-reference directory of the specified file when a FileLink is created.

      Parameters:
      storeDir - Root directory for the link reference folder
      fileName - File Name with links
      Returns:
      Path for the link back references.
    • getBackReferenceFileName

      public static String getBackReferenceFileName(org.apache.hadoop.fs.Path dirPath)
      Get the referenced file name from the reference link directory path.
      Parameters:
      dirPath - Link references directory path
      Returns:
      Name of the file referenced
    • isBackReferencesDir

      public static boolean isBackReferencesDir(org.apache.hadoop.fs.Path dirPath)
      Checks if the specified directory path is a back reference links folder.
      Parameters:
      dirPath - Directory path to verify
      Returns:
      True if the specified directory is a link references folder
    • equals

      public boolean equals(Object obj)
      Overrides:
      equals in class Object
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object
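
The exception policy described for handleAccessLocationException can be sketched as below. This is a simplified illustration, not the actual method: java.nio.file.AccessDeniedException stands in for Hadoop's AccessControlException, the class AccessFailurePolicySketch and the method choose are hypothetical names, and the fileLink parameter is omitted.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.AccessDeniedException;

/** Sketch of the "prefer access-denied over not-found" policy. */
final class AccessFailurePolicySketch {
    /**
     * Decides which failure to remember after trying one more location.
     * AccessDeniedException is a stand-in for Hadoop's AccessControlException.
     */
    static IOException choose(IOException newException, IOException previousException)
            throws IOException {
        if (newException instanceof FileNotFoundException) {
            // Keep an earlier failure if there was one, so an access-denied
            // error is not masked by a later not-found error.
            return previousException != null ? previousException : newException;
        }
        if (newException instanceof AccessDeniedException) {
            return newException;
        }
        // Anything else is unexpected and is rethrown immediately.
        throw newException;
    }

    public static void main(String[] args) throws IOException {
        IOException remembered = null;
        remembered = choose(new AccessDeniedException("/origin/file-k"), remembered);
        remembered = choose(new FileNotFoundException("/archive/file-k"), remembered);
        // Prints "AccessDeniedException": the earlier access failure is kept.
        System.out.println(remembered.getClass().getSimpleName());
    }
}
```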