Class AbstractFSWAL<W extends WALProvider.WriterBase>

java.lang.Object
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL<W>
All Implemented Interfaces:
Closeable, AutoCloseable, WALFileLengthProvider, WAL
Direct Known Subclasses:
AsyncFSWAL, FSHLog

@Private public abstract class AbstractFSWAL<W extends WALProvider.WriterBase> extends Object implements WAL
Implementation of WAL to go against FileSystem; i.e. keep WALs in HDFS. Only one WAL is ever being written at a time. When a WAL hits a configured maximum size, it is rolled. This is done internal to the implementation.

As data is flushed from the MemStore to other on-disk structures (files sorted by key, hfiles), a WAL becomes obsolete. We can let go of all the log edits/entries for a given HRegion-sequence id. A bunch of work in the below is done keeping account of these region sequence ids -- what is flushed out to hfiles, and what is yet in WAL and in memory only.

It is only practical to delete entire files. Thus, we delete an entire on-disk file F when all of the edits in F have a log-sequence-id that's older (smaller) than the most-recent flush.
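
The whole-file deletion rule above can be sketched as a per-file check. This is a minimal illustration, not the HBase implementation: the class name, the string region keys, and both maps are hypothetical, and it assumes that a region's flushed id covers every edit at or below it.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the whole-file deletion rule; not the HBase implementation. */
final class WalArchiveSketch {

    /**
     * @param highestSeqIdInFile region name -> highest sequence id that region wrote to file F
     * @param lastFlushedSeqId   region name -> highest sequence id already flushed to hfiles
     * @return true only if every edit in F is covered by a flush of its region
     */
    static boolean canDelete(Map<String, Long> highestSeqIdInFile,
                             Map<String, Long> lastFlushedSeqId) {
        for (Map.Entry<String, Long> e : highestSeqIdInFile.entrySet()) {
            Long flushed = lastFlushedSeqId.get(e.getKey());
            // A region that never flushed, or that has newer edits in F, pins the file.
            if (flushed == null || e.getValue() > flushed) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Long> file = new HashMap<>();
        file.put("regionA", 10L);
        Map<String, Long> flushed = new HashMap<>();
        flushed.put("regionA", 12L);
        System.out.println(canDelete(file, flushed));
    }
}
```

A single region with one unflushed edit in F is enough to keep the entire file, which is why a lagging region can hold back cleanup of old files.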

To read a WAL, call WALFactory.createStreamReader(FileSystem, Path) for a one-way read, or call WALFactory.createTailingReader(FileSystem, Path, Configuration, long) for replication, where we may want to tail the active WAL file.

Failure Semantic

If an exception is thrown on append or sync, roll the WAL, because the current WAL is now a lame duck; any further appends or syncs will also fail with the same original exception. If we have made successful appends to the WAL and are then unable to sync them, our current semantic is to return an error to the client that the appends failed, but also to abort the current context, usually the hosting server. We need to replay the WALs.
TODO: Change this semantic. A roll of WAL may be sufficient as long as we have flagged client that the append failed.
TODO: replication may pick up these last edits though they have been marked as failed append (Need to keep our own file lengths, not rely on HDFS).
  • Method Details

    • getFilenum

      public long getFilenum()
    • getFileNumFromFileName

      protected long getFileNumFromFileName(org.apache.hadoop.fs.Path fileName)
      A log file has a creation timestamp (in ms) in its file name (filenum). This helper method returns the creation timestamp from a given log file. It extracts the timestamp assuming the filename is created with the computeFilename(long filenum) method.
      Returns:
      timestamp, as in the log file name.
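
A minimal sketch of the round trip between a file number and a file name follows. The `<prefix>.<filenum><suffix>` layout, the class, and both method names are assumptions made for illustration; the source only states that the name is produced by computeFilename(long filenum).

```java
/** Hypothetical sketch: round-trips a file number through a WAL-style file name. */
final class WalFileNameSketch {
    // Assumed layout, for illustration only: <prefix>.<filenum><suffix>
    static final String DELIMITER = ".";

    static String computeName(String prefix, long filenum, String suffix) {
        return prefix + DELIMITER + filenum + suffix;
    }

    static long fileNumFromName(String name, String prefix, String suffix) {
        String head = prefix + DELIMITER;
        boolean shapeOk = name.length() > head.length() + suffix.length()
            && name.startsWith(head)
            && name.endsWith(suffix);
        if (!shapeOk) {
            throw new IllegalArgumentException("Not a WAL-style name: " + name);
        }
        // What remains between prefix and suffix is the creation timestamp in ms.
        String digits = name.substring(head.length(), name.length() - suffix.length());
        return Long.parseLong(digits);
    }

    public static void main(String[] args) {
        String name = computeName("wal", System.currentTimeMillis(), ".meta");
        System.out.println(name + " -> " + fileNumFromName(name, "wal", ".meta"));
    }
}
```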
    • calculateMaxLogFiles

      private int calculateMaxLogFiles(org.apache.hadoop.conf.Configuration conf, long logRollSize)
    • getPreallocatedEventCount

      protected final int getPreallocatedEventCount()
    • init

      public void init() throws IOException
      Used to initialize the WAL. Usually just call rollWriter to create the first log writer.
      Throws:
      IOException
    • registerWALActionsListener

      Description copied from interface: WAL
      Registers WALActionsListener
      Specified by:
      registerWALActionsListener in interface WAL
    • unregisterWALActionsListener

      Description copied from interface: WAL
      Unregisters WALActionsListener
      Specified by:
      unregisterWALActionsListener in interface WAL
    • getCoprocessorHost

      Description copied from interface: WAL
      Returns Coprocessor host.
      Specified by:
      getCoprocessorHost in interface WAL
    • startCacheFlush

      public Long startCacheFlush(byte[] encodedRegionName, Set<byte[]> families)
      Description copied from interface: WAL
      WAL keeps track of the sequence numbers that are as yet not flushed in memstores in order to be able to do accounting to figure out which WALs can be let go. This method tells WAL that some region is about to flush. The flush can be the whole region or for a column family of the region only.

      Currently, it is expected that the update lock is held for the region; i.e. no concurrent appends while we set up cache flush.

      Specified by:
      startCacheFlush in interface WAL
      Parameters:
      families - Families to flush. May be a subset of all families in the region.
      Returns:
      Returns HConstants.NO_SEQNUM if we are flushing the whole region OR if we are flushing a subset of all families but there are no edits in those families not being flushed; in other words, this is effectively the same as a flush of all of the region though we were passed a subset of families. Otherwise, it returns the sequence id of the oldest/lowest outstanding edit.
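
The return rule described above can be sketched with a per-family map of lowest unflushed sequence ids. Everything here is hypothetical (class, string family keys, onAppend), and the value of the NO_SEQNUM sentinel is an assumption; the sketch also assumes that sequence ids only grow.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch of flush accounting for one region; not the HBase implementation. */
final class FlushAccountingSketch {
    static final long NO_SEQNUM = -1L; // sentinel value assumed for this sketch

    // family name -> lowest sequence id still unflushed in that family
    private final Map<String, Long> lowestUnflushed = new TreeMap<>();

    /** Records an edit; only the first (lowest) id per family is kept. */
    void onAppend(String family, long seqId) {
        lowestUnflushed.putIfAbsent(family, seqId);
    }

    /** Returns the oldest edit that stays outstanding, or NO_SEQNUM if none remains. */
    long startCacheFlush(Set<String> familiesToFlush) {
        long oldestRemaining = NO_SEQNUM;
        for (Map.Entry<String, Long> e : lowestUnflushed.entrySet()) {
            if (familiesToFlush.contains(e.getKey())) {
                continue; // this family is being flushed
            }
            if (oldestRemaining == NO_SEQNUM || e.getValue() < oldestRemaining) {
                oldestRemaining = e.getValue();
            }
        }
        lowestUnflushed.keySet().removeAll(familiesToFlush);
        return oldestRemaining;
    }
}
```

Flushing a subset of families that happens to cover every family with outstanding edits returns the sentinel, which matches the "effectively a whole-region flush" case in the description.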
    • startCacheFlush

      public Long startCacheFlush(byte[] encodedRegionName, Map<byte[],Long> familyToSeq)
      Specified by:
      startCacheFlush in interface WAL
    • completeCacheFlush

      public void completeCacheFlush(byte[] encodedRegionName, long maxFlushedSeqId)
      Description copied from interface: WAL
      Complete the cache flush.
      Specified by:
      completeCacheFlush in interface WAL
      Parameters:
      encodedRegionName - Encoded region name.
      maxFlushedSeqId - The maxFlushedSeqId for this flush. There is no edit in memory that is less than this sequence id.
    • abortCacheFlush

      public void abortCacheFlush(byte[] encodedRegionName)
      Description copied from interface: WAL
      Abort a cache flush. Call if the flush fails. Note that the only recovery for an aborted flush currently is a restart of the regionserver so the snapshot content dropped by the failure gets restored to the memstore.
      Specified by:
      abortCacheFlush in interface WAL
      Parameters:
      encodedRegionName - Encoded region name.
    • getEarliestMemStoreSeqNum

      public long getEarliestMemStoreSeqNum(byte[] encodedRegionName)
      Description copied from interface: WAL
      Gets the earliest unflushed sequence id in the memstore for the region.
      Specified by:
      getEarliestMemStoreSeqNum in interface WAL
      Parameters:
      encodedRegionName - The region to get the number for.
      Returns:
      The earliest/lowest/oldest sequence id if present, HConstants.NO_SEQNUM if absent.
    • getEarliestMemStoreSeqNum

      public long getEarliestMemStoreSeqNum(byte[] encodedRegionName, byte[] familyName)
      Description copied from interface: WAL
      Gets the earliest unflushed sequence id in the memstore for the store.
      Specified by:
      getEarliestMemStoreSeqNum in interface WAL
      Parameters:
      encodedRegionName - The region to get the number for.
      familyName - The family to get the number for.
      Returns:
      The earliest/lowest/oldest sequence id if present, HConstants.NO_SEQNUM if absent.
    • rollWriter

      public Map<byte[],List<byte[]>> rollWriter() throws FailedLogCloseException, IOException
      Description copied from interface: WAL
      Roll the log writer. That is, start writing log messages to a new file.

      The implementation is synchronized in order to make sure there's one rollWriter running at any given time.

      Specified by:
      rollWriter in interface WAL
      Returns:
      If there are many logs, the stores of the returned regions should be flushed so that the logs can be cleaned on the next pass. Returns null if there is nothing to flush. Names are actual region names as returned by RegionInfo.getEncodedName()
      Throws:
      FailedLogCloseException
      IOException
    • sync

      public final void sync() throws IOException
      Description copied from interface: WAL
      Sync what we have in the WAL.
      Specified by:
      sync in interface WAL
      Throws:
      IOException
    • sync

      public final void sync(long txid) throws IOException
      Description copied from interface: WAL
      Sync the WAL if the txId was not already sync'd.
      Specified by:
      sync in interface WAL
      Parameters:
      txid - Transaction id to sync to.
      Throws:
      IOException
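
The "sync only if not already sync'd" behaviour can be sketched with a high-water mark. This is an illustration under stated assumptions: the class, the counter, and the use of a plain synchronized method are all hypothetical and do not describe how the subclasses actually coordinate syncs.

```java
/** Hypothetical sketch of skipping a sync for an already-synced transaction id. */
final class SyncSkipSketch {
    private long highestSyncedTxid = 0L;
    private int physicalSyncs = 0;

    /** Syncs only when txid lies beyond what an earlier sync already covered. */
    synchronized void sync(long txid) {
        if (txid <= highestSyncedTxid) {
            return; // an earlier sync already covered this transaction
        }
        physicalSyncs++;           // stand-in for the real flush to storage
        highestSyncedTxid = txid;
    }

    synchronized int physicalSyncs() {
        return physicalSyncs;
    }
}
```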
    • sync

      public final void sync(boolean forceSync) throws IOException
      Specified by:
      sync in interface WAL
      Parameters:
      forceSync - Flag to force sync rather than flushing to the buffer. Example - Hadoop hflush vs hsync.
      Throws:
      IOException
    • sync

      public final void sync(long txid, boolean forceSync) throws IOException
      Specified by:
      sync in interface WAL
      Parameters:
      txid - Transaction id to sync to.
      forceSync - Flag to force sync rather than flushing to the buffer. Example - Hadoop hflush vs hsync.
      Throws:
      IOException
    • doSync

      protected abstract void doSync(boolean forceSync) throws IOException
      Throws:
      IOException
    • doSync

      protected abstract void doSync(long txid, boolean forceSync) throws IOException
      Throws:
      IOException
    • computeFilename

      protected org.apache.hadoop.fs.Path computeFilename(long filenum)
      This is a convenience method that computes a new filename with a given file-number.
      Parameters:
      filenum - to use
    • getCurrentFileName

      public org.apache.hadoop.fs.Path getCurrentFileName()
      This is a convenience method that computes a new filename using the current WAL file-number.
    • getNewPath

      private org.apache.hadoop.fs.Path getNewPath() throws IOException
      Retrieves the next path to use for writing. Increments the internal filenum.
      Throws:
      IOException
    • getOldPath

      public org.apache.hadoop.fs.Path getOldPath()
    • tellListenersAboutPreLogRoll

      private void tellListenersAboutPreLogRoll(org.apache.hadoop.fs.Path oldPath, org.apache.hadoop.fs.Path newPath) throws IOException
      Tell listeners about pre log roll.
      Throws:
      IOException
    • tellListenersAboutPostLogRoll

      private void tellListenersAboutPostLogRoll(org.apache.hadoop.fs.Path oldPath, org.apache.hadoop.fs.Path newPath) throws IOException
      Tell listeners about post log roll.
      Throws:
      IOException
    • getNumRolledLogFiles

      public int getNumRolledLogFiles()
      Returns the number of rolled log files
    • getNumLogFiles

      public int getNumLogFiles()
      Returns the number of log files in use
    • findRegionsToForceFlush

      Map<byte[],List<byte[]>> findRegionsToForceFlush() throws IOException
      If the number of un-archived WAL files ('live' WALs) is greater than maximum allowed, check the first (oldest) WAL, and return those regions which should be flushed so that it can be let-go/'archived'.
      Returns:
      stores of regions (encodedRegionNames) to flush in order to archive the oldest WAL file
      Throws:
      IOException
    • markClosedAndClean

      protected final void markClosedAndClean(org.apache.hadoop.fs.Path path)
      Mark this WAL file as closed and call cleanOldLogs to see if we can archive this file.
    • cleanOldLogs

      private void cleanOldLogs()
      Archive old logs. A WAL is eligible for archiving if all its WALEdits have been flushed.

      Use synchronized because we may call this method in different threads, normally when replacing writer, and since now close writer may be asynchronous, we will also call this method in the closeExecutor, right after we actually close a WAL writer.

    • archive

      protected void archive(Pair<org.apache.hadoop.fs.Path,Long> log)
    • getWALArchivePath

      public static org.apache.hadoop.fs.Path getWALArchivePath(org.apache.hadoop.fs.Path archiveDir, org.apache.hadoop.fs.Path p)
    • archiveLogFile

      protected void archiveLogFile(org.apache.hadoop.fs.Path p) throws IOException
      Throws:
      IOException
    • logRollAndSetupWalProps

      protected final void logRollAndSetupWalProps(org.apache.hadoop.fs.Path oldPath, org.apache.hadoop.fs.Path newPath, long oldFileLen)
    • createSpan

      private io.opentelemetry.api.trace.Span createSpan(String name)
    • replaceWriter

      org.apache.hadoop.fs.Path replaceWriter(org.apache.hadoop.fs.Path oldPath, org.apache.hadoop.fs.Path newPath, W nextWriter) throws IOException
      Cleans up current writer closing it and then puts in place the passed in nextWriter.

      • In the case of creating a new WAL, oldPath will be null.
      • In the case of rolling over from one file to the next, none of the parameters will be null.
      • In the case of closing out this FSHLog with no further use newPath and nextWriter will be null.
      Parameters:
      oldPath - may be null
      newPath - may be null
      nextWriter - may be null
      Returns:
      the passed in newPath
      Throws:
      IOException - if there is a problem flushing or closing the underlying FS
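
The three null-parameter cases listed above can be made explicit with a small classifier. The enum, the class, and the use of String paths are hypothetical; the sketch only encodes the combinations the description names.

```java
/** Hypothetical sketch classifying the replaceWriter cases by which arguments are null. */
final class ReplaceWriterSketch {
    enum Transition { CREATE_FIRST, ROLL, CLOSE_OUT }

    static Transition classify(String oldPath, String newPath, Object nextWriter) {
        if (newPath == null && nextWriter == null) {
            return Transition.CLOSE_OUT;      // closing out with no further use
        }
        if (newPath == null || nextWriter == null) {
            throw new IllegalArgumentException(
                "newPath and nextWriter must both be set or both be null");
        }
        // A missing old path means this is the first WAL file being created.
        return oldPath == null ? Transition.CREATE_FIRST : Transition.ROLL;
    }
}
```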
    • blockOnSync

      protected final void blockOnSync(SyncFuture syncFuture) throws IOException
      Throws:
      IOException
    • ensureIOException

    • convertInterruptedExceptionToIOException

    • rollWriterInternal

      private Map<byte[],List<byte[]>> rollWriterInternal(boolean force) throws IOException
      Throws:
      IOException
    • rollWriter

      public Map<byte[],List<byte[]>> rollWriter(boolean force) throws IOException
      Description copied from interface: WAL
      Roll the log writer. That is, start writing log messages to a new file.

      The implementation is synchronized in order to make sure there's one rollWriter running at any given time. If force is true, a new writer is created even if no entries have been written to the current writer.

      Specified by:
      rollWriter in interface WAL
      Returns:
      If there are many logs, the stores of the returned regions should be flushed so that the logs can be cleaned on the next pass. Returns null if there is nothing to flush. Names are actual region names as returned by RegionInfo.getEncodedName()
      Throws:
      IOException
    • getLogFileSize

      public long getLogFileSize()
      Returns the size of log files in use
    • requestLogRoll

      public void requestLogRoll()
    • getFiles

      org.apache.hadoop.fs.FileStatus[] getFiles() throws IOException
      Get the backing files associated with this WAL.
      Returns:
      may be null if there are no files.
      Throws:
      IOException
    • shutdown

      public void shutdown() throws IOException
      Description copied from interface: WAL
      Stop accepting new writes. If we have unsynced writes still in buffer, sync them. Extant edits are left in place in backing storage to be replayed later.
      Specified by:
      shutdown in interface WAL
      Throws:
      IOException
    • close

      public void close() throws IOException
      Description copied from interface: WAL
      Caller no longer needs any edits from this WAL. Implementers are free to reclaim underlying resources after this call; i.e. filesystem based WALs can archive or delete files.
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Specified by:
      close in interface WAL
      Throws:
      IOException
    • getInflightWALCloseCount

      Returns number of WALs currently in the process of closing.
    • updateStore

      public void updateStore(byte[] encodedRegionName, byte[] familyName, Long sequenceid, boolean onlyIfGreater)
      Updates the sequence number of a specific store. Depending on the flag, replaces the current sequence number only if the given sequence id is bigger, or unconditionally, even if it is lower than the existing one.
      Specified by:
      updateStore in interface WAL
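
A minimal sketch of the onlyIfGreater flag follows. The class, the single string key standing in for the region and family pair, and the get accessor are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the onlyIfGreater flag; not the HBase implementation. */
final class UpdateStoreSketch {
    // store key (region plus family, flattened to a String here) -> sequence id
    private final Map<String, Long> seqIdByStore = new HashMap<>();

    void updateStore(String storeKey, long sequenceId, boolean onlyIfGreater) {
        if (onlyIfGreater) {
            // Keep whichever id is larger; a lower id is ignored.
            seqIdByStore.merge(storeKey, sequenceId, Long::max);
        } else {
            // Overwrite unconditionally, even with a lower id.
            seqIdByStore.put(storeKey, sequenceId);
        }
    }

    Long get(String storeKey) {
        return seqIdByStore.get(storeKey);
    }
}
```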
    • getSyncFuture

      protected final SyncFuture getSyncFuture(long sequence, boolean forceSync)
    • isLogRollRequested

      protected boolean isLogRollRequested()
    • requestLogRoll

    • getUnflushedEntriesCount

    • isUnflushedEntries

    • atHeadOfRingBufferEventHandlerAppend

      Exposed for testing only. Use it for tricks such as halting the ring buffer appending.
    • appendEntry

      protected final boolean appendEntry(W writer, FSWALEntry entry) throws IOException
      Throws:
      IOException
    • postAppend

      private long postAppend(WAL.Entry e, long elapsedTime) throws IOException
      Throws:
      IOException
    • postSync

      protected final void postSync(long timeInNanos, int handlerSyncs)
    • stampSequenceIdAndPublishToRingBuffer

      protected final long stampSequenceIdAndPublishToRingBuffer(RegionInfo hri, WALKeyImpl key, WALEdit edits, boolean inMemstore, com.lmax.disruptor.RingBuffer<RingBufferTruck> ringBuffer) throws IOException
      Throws:
      IOException
    • toString

      public String toString()
      Description copied from interface: WAL
      Human readable identifying information about the state of this WAL. Implementors are encouraged to include information appropriate for debugging. Consumers are advised not to rely on the details of the returned String; it does not have a defined structure.
      Specified by:
      toString in interface WAL
      Overrides:
      toString in class Object
    • getLogFileSizeIfBeingWritten

      public OptionalLong getLogFileSizeIfBeingWritten(org.apache.hadoop.fs.Path path)
      If the given path is currently being written, returns its length.

      This is used by replication to prevent replicating unacked log entries. See https://issues.apache.org/jira/browse/HBASE-14004 for more details.

      Specified by:
      getLogFileSizeIfBeingWritten in interface WALFileLengthProvider
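
The idea of exposing a length only for the file being written can be sketched as below. The class, the onSync hook, and the string paths are hypothetical; the sketch assumes the tracked length is updated only after a successful sync, so a reader bounded by it never sees unacked entries.

```java
import java.util.OptionalLong;

/** Hypothetical sketch of reporting a length only for the file currently being written. */
final class FileLengthSketch {
    private String currentPath;
    private long ackedLength;

    /** Called after a successful sync; records how many bytes are safe to read. */
    synchronized void onSync(String path, long length) {
        currentPath = path;
        ackedLength = length;
    }

    /** Empty for any file other than the one being written. */
    synchronized OptionalLong getLogFileSizeIfBeingWritten(String path) {
        return path.equals(currentPath)
            ? OptionalLong.of(ackedLength)
            : OptionalLong.empty();
    }
}
```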
    • appendData

      public long appendData(RegionInfo info, WALKeyImpl key, WALEdit edits) throws IOException
      Description copied from interface: WAL
      Append a set of data edits to the WAL. 'Data' here means that the content in the edits will also have transitioned through the memstore.

      The WAL is not flushed/sync'd after this transaction completes BUT on return this edit must have its region edit/sequence id assigned else it messes up our unification of mvcc and sequenceid. On return key will have the region edit/sequence id filled in.

      Specified by:
      appendData in interface WAL
      Parameters:
      info - the regioninfo associated with append
      key - Modified by this call; we add to it this edit's region edit/sequence id.
      edits - Edits to append. MAY CONTAIN NO EDITS for case where we want to get an edit sequence id that is after all currently appended edits.
      Returns:
      Returns a 'transaction id' and key will have the region edit/sequence id in it.
      Throws:
      IOException
    • appendMarker

      public long appendMarker(RegionInfo info, WALKeyImpl key, WALEdit edits) throws IOException
      Description copied from interface: WAL
      Append an operational 'meta' event marker edit to the WAL. A marker meta edit could be a FlushDescriptor, a compaction marker, or a region event marker; e.g. region open or region close. The difference between a 'marker' append and a 'data' append as in WAL.appendData(RegionInfo, WALKeyImpl, WALEdit) is that a marker will not have transitioned through the memstore.

      The WAL is not flushed/sync'd after this transaction completes BUT on return this edit must have its region edit/sequence id assigned else it messes up our unification of mvcc and sequenceid. On return key will have the region edit/sequence id filled in.

      Specified by:
      appendMarker in interface WAL
      Parameters:
      info - the regioninfo associated with append
      key - Modified by this call; we add to it this edit's region edit/sequence id.
      edits - Edits to append. MAY CONTAIN NO EDITS for case where we want to get an edit sequence id that is after all currently appended edits.
      Returns:
      Returns a 'transaction id' and key will have the region edit/sequence id in it.
      Throws:
      IOException
    • append

      protected abstract long append(RegionInfo info, WALKeyImpl key, WALEdit edits, boolean inMemstore) throws IOException
      Append a set of edits to the WAL.

      The WAL is not flushed/sync'd after this transaction completes BUT on return this edit must have its region edit/sequence id assigned else it messes up our unification of mvcc and sequenceid. On return key will have the region edit/sequence id filled in.

      NOTE: This append, at a time that is usually after this call returns, starts an mvcc transaction by calling 'begin', during which we assign this update a sequenceid. At assignment time, we stamp all the passed-in Cells inside the WALEdit with their sequenceId. You must 'complete' this mvcc transaction by calling MultiVersionConcurrencyControl#complete(...) or a variant; otherwise mvcc will get stuck. Do it in the finally of a try/finally block that contains this append and any subsequent operations such as sync or update of memstore. Get the WriteEntry to pass to mvcc out of the passed-in key parameter. Be warned that the WriteEntry is not immediately available on return from this method. It WILL be available subsequent to a sync of this append; otherwise, you will just have to wait on the WriteEntry to get filled in.

      Parameters:
      info - the regioninfo associated with append
      key - Modified by this call; we add to it this edit's region edit/sequence id.
      edits - Edits to append. MAY CONTAIN NO EDITS for case where we want to get an edit sequence id that is after all currently appended edits.
      inMemstore - Always true except for the case where we are writing a region event meta marker edit, for example, a compaction completion record into the WAL or noting a Region Open event. In these cases the entry is just so we can finish an unfinished compaction after a crash when the new Server reads the WAL on recovery, etc. These transition event 'Markers' do not go via the memstore. When inMemstore is false, we presume a Marker event edit.
      Returns:
      Returns a 'transaction id' and key will have the region edit/sequence id in it.
      Throws:
      IOException
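
The try/finally pattern called for in the NOTE can be sketched with stand-in types. Mvcc, Key, and write are hypothetical and are not the HBase classes; as a simplification, this sketch begins the transaction synchronously inside append, whereas the description above says the real WriteEntry is filled in later.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of completing an mvcc transaction in a finally block. */
final class MvccCompleteSketch {

    /** Stand-in for the mvcc handle; counts transactions left open. */
    static final class Mvcc {
        private final AtomicInteger open = new AtomicInteger();
        private long next = 1L;

        synchronized long begin() {
            open.incrementAndGet();
            return next++;
        }

        void complete(long writeEntry) {
            open.decrementAndGet();
        }

        int openCount() {
            return open.get();
        }
    }

    /** Stand-in for the key; the append fills in writeEntry. */
    static final class Key {
        Long writeEntry;
    }

    static void append(Mvcc mvcc, Key key) {
        key.writeEntry = mvcc.begin();
    }

    static void write(Mvcc mvcc, Runnable afterAppend) {
        Key key = new Key();
        try {
            append(mvcc, key);
            afterAppend.run(); // sync, memstore update, etc.; may throw
        } finally {
            if (key.writeEntry != null) {
                mvcc.complete(key.writeEntry); // always release, or mvcc gets stuck
            }
        }
    }
}
```

Because the completion sits in the finally block, a failure in the sync or memstore step still releases the transaction.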
    • doAppend

      protected abstract void doAppend(W writer, FSWALEntry entry) throws IOException
      Throws:
      IOException
    • createWriterInstance

      protected abstract W createWriterInstance(org.apache.hadoop.fs.Path path) throws IOException, CommonFSUtils.StreamLacksCapabilityException
      Throws:
      IOException
      CommonFSUtils.StreamLacksCapabilityException
    • doReplaceWriter

      protected abstract void doReplaceWriter(org.apache.hadoop.fs.Path oldPath, org.apache.hadoop.fs.Path newPath, W nextWriter) throws IOException
      Notice that you need to clear the rollRequested flag in this method, as the new writer will begin to work before returning from this method. If we clear the flag after returning from this call, we may miss a roll request. The implementation class should choose a proper place to clear the rollRequested flag, so we do not miss a roll request, typically before you start writing to the new writer.
      Throws:
      IOException
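
The ordering requirement above, clearing the flag before the new writer starts taking writes, can be sketched as follows. The class, the String writer handle, and the compare-and-set request method are assumptions made for illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch of clearing the roll-requested flag before switching writers. */
final class RollFlagSketch {
    private final AtomicBoolean rollRequested = new AtomicBoolean(false);
    private volatile String writer = "w0"; // stand-in for a writer handle

    /** Returns true if this call set the flag, i.e. a roll should be scheduled. */
    boolean requestLogRoll() {
        return rollRequested.compareAndSet(false, true);
    }

    void replaceWriter(String nextWriter) {
        // Clear first: a request raised once the new writer is live must not be wiped.
        rollRequested.set(false);
        writer = nextWriter;
    }

    boolean isLogRollRequested() {
        return rollRequested.get();
    }

    String currentWriter() {
        return writer;
    }
}
```

If the flag were cleared after the switch instead, a roll requested against the new writer in between would be lost, which is the missed request the description warns about.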
    • doShutdown

      protected abstract void doShutdown() throws IOException
      Throws:
      IOException
    • doCheckLogLowReplication

      protected abstract boolean doCheckLogLowReplication()
    • doCheckSlowSync

      protected boolean doCheckSlowSync()
      Returns true if we exceeded the slow sync roll threshold over the last check interval
    • checkLogLowReplication

      public void checkLogLowReplication(long checkInterval)
    • getPipeline

      abstract org.apache.hadoop.hdfs.protocol.DatanodeInfo[] getPipeline()
      This method gets the pipeline for the current WAL.
    • getLogReplication

      abstract int getLogReplication()
      This method gets the datanode replication count for the current WAL.
    • split

      private static void split(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path p) throws IOException
      Throws:
      IOException
    • getWriter

    • usage

      private static void usage()
    • main

      public static void main(String[] args) throws IOException
      Pass one or more log file names, and it will either dump out a text version on stdout or split the specified log files.
      Throws:
      IOException