Class SequenceIdAccounting

java.lang.Object
org.apache.hadoop.hbase.regionserver.wal.SequenceIdAccounting

@Private public class SequenceIdAccounting extends Object
Accounting of sequence ids per region and then by column family. To keep this accounting current, call startCacheFlush and then completeCacheFlush or abortCacheFlush so that this instance can track the state of sequence id persistence. Also call update on every append.

The implementation assumes that every encodedRegionName passed in was obtained from RegionInfo.getEncodedNameAsBytes(), so it is safe to use as a hash key. For family names, ImmutableByteArray is used as the key. A hash-based map is much faster than a red-black tree or ConcurrentSkipListMap (CSLM), and this code is on the critical write path. See HBASE-16278 for more details.
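The difference between the two kinds of key can be illustrated with a minimal sketch. BytesKey and KeyDemo below are hypothetical names invented for this illustration; BytesKey is a stand-in for a content-comparing wrapper and is not the HBase ImmutableByteArray class. The sketch only shows why a raw byte[] works as a hash key when the same array instance is always passed in, and why a wrapper is needed otherwise.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical content-comparing wrapper; NOT the HBase ImmutableByteArray class.
final class BytesKey {
  private final byte[] bytes;
  private final int hash;

  BytesKey(byte[] bytes) {
    this.bytes = bytes.clone(); // defensive copy so the key cannot change after insertion
    this.hash = Arrays.hashCode(this.bytes); // computed once, reused on every lookup
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof BytesKey && Arrays.equals(bytes, ((BytesKey) o).bytes);
  }

  @Override
  public int hashCode() {
    return hash;
  }
}

final class KeyDemo {
  // Two distinct arrays with equal content find the same entry when wrapped.
  static boolean wrappedLookupHits() {
    Map<BytesKey, Long> m = new HashMap<>();
    m.put(new BytesKey(new byte[] {'c', 'f'}), 7L);
    return m.containsKey(new BytesKey(new byte[] {'c', 'f'}));
  }

  // Raw byte[] keys use identity equals/hashCode, so an equal-content copy misses.
  static boolean rawLookupHits() {
    Map<byte[], Long> m = new HashMap<>();
    m.put(new byte[] {'c', 'f'}, 7L);
    return m.containsKey(new byte[] {'c', 'f'});
  }

  // The same array instance always finds its entry, which is the case the
  // encoded region name relies on.
  static boolean sameInstanceLookupHits() {
    Map<byte[], Long> m = new HashMap<>();
    byte[] key = {'r', '1'};
    m.put(key, 7L);
    return m.containsKey(key);
  }
}
```

In this sketch the wrapped and same-instance lookups succeed, while the lookup with an equal-content copy of a raw array does not.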

  • Field Details

    • LOG

      private static final org.slf4j.Logger LOG
    • tieLock

      private final Object tieLock
      This lock ties all operations on the flushingSequenceIds and lowestUnflushedSequenceIds Maps. lowestUnflushedSequenceIds has the lowest outstanding sequence ids EXCEPT when flushing. When we flush, the current set of lowest sequence ids for the region/column family is moved (atomically because of this lock) to flushingSequenceIds.

      The two Maps are tied by this locking object EXCEPT when we update the lowest entry; see lowestUnflushedSequenceIds. That update is a putIfAbsent call on lowestUnflushedSequenceIds: it adds the sequence id as the lowest only if there is no entry for the current column family. There is no entry only if we have just come up OR the current set of lowest sequence ids has been moved aside because it is being flushed (by putting it into flushingSequenceIds). This is how we pick up the next 'lowest' sequence id per region per column family, which is used to work out what is in the next flush.
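      The interplay between the lock and putIfAbsent can be sketched as follows. This is a simplified model and not the actual implementation: TwoMapSketch and its method names are hypothetical, it tracks a single region, and it keys families by String rather than by byte arrays.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical single-region model of the two maps tied together by one lock.
final class TwoMapSketch {
  private final Object tieLock = new Object();
  private final ConcurrentMap<String, Long> lowestUnflushed = new ConcurrentHashMap<>();
  private final Map<String, Long> flushing = new HashMap<>();

  // Append path: takes no lock; records the id only if the family has no entry yet.
  void append(String family, long sequenceId) {
    lowestUnflushed.putIfAbsent(family, sequenceId);
  }

  // Flush start: move the family's lowest id aside while holding the lock.
  Long startFlush(String family) {
    synchronized (tieLock) {
      Long lowest = lowestUnflushed.remove(family);
      if (lowest != null) {
        flushing.put(family, lowest);
      }
      return lowest;
    }
  }

  // Flush end: the flushed id is no longer outstanding.
  void completeFlush(String family) {
    synchronized (tieLock) {
      flushing.remove(family);
    }
  }

  Long getLowestUnflushed(String family) {
    return lowestUnflushed.get(family);
  }

  Long getFlushing(String family) {
    synchronized (tieLock) {
      return flushing.get(family);
    }
  }
}
```

      In this model, the first append after startFlush finds no entry for the family, so its sequence id becomes the next lowest unflushed id.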

    • lowestUnflushedSequenceIds

      Map of encoded region names and family names to their OLDEST -- i.e. their first, the longest-lived, their 'earliest', the 'lowest' -- sequence id.

      When we flush, the current lowest sequence ids are cleared from this Map and added to flushingSequenceIds. The next append that comes in is then added here to lowestUnflushedSequenceIds as the next lowest sequence id.

      If a flush fails, the server is currently aborted, so there is no need to restore the previous sequence ids.

      These need to be concurrent Maps because putIfAbsent is used when updating the oldest sequence id.

    • flushingSequenceIds

      Map of encoded region names and family names to their lowest or OLDEST sequence/edit id currently being flushed out to hfiles. Entries are moved here from lowestUnflushedSequenceIds while the lock tieLock is held (so movement between the Maps is atomic).
    • highestSequenceIds

      private Map<byte[],Long> highestSequenceIds

      Map of region encoded names to the latest/highest region sequence id. Updated on each call to append.

      This map uses byte[] as the key, and uses reference equality. It works in our use case as we use RegionInfo.getEncodedNameAsBytes() as keys. For a given region, it always returns the same array.

  • Constructor Details

  • Method Details

    • getLowestSequenceId

      public long getLowestSequenceId(byte[] encodedRegionName)
      Returns the lowest unflushed sequence id for the region.
      Returns:
      Lowest outstanding unflushed sequenceid for encodedRegionName. Returns HConstants.NO_SEQNUM when there is none.
    • getLowestSequenceId

      long getLowestSequenceId(byte[] encodedRegionName, byte[] familyName)
      Returns:
      Lowest outstanding unflushed sequenceid for encodedRegionName and familyName. The returned sequenceid may be for an edit currently being flushed.
    • resetHighest

      Map<byte[],Long> resetHighest()
      Reset the accounting of highest sequenceid by regionname.
      Returns:
      The previous accounting Map of regions to the last sequence id written into each.
    • update

      void update(byte[] encodedRegionName, Set<byte[]> families, long sequenceid, boolean lowest)
      We've been passed a new sequenceid for the region. Set it as highest seen for this region and if we are to record oldest, or lowest sequenceids, save it as oldest seen if nothing currently older.
      Parameters:
      lowest - Whether to keep running account of oldest sequence id.
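      The behavior described for update can be sketched as below. UpdateSketch is a hypothetical class with String keys in place of byte arrays, and it assumes that sequence ids arrive in increasing order per region, so the latest id seen is also the highest.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of update(): track the highest id per region and,
// optionally, the lowest id per region and family.
final class UpdateSketch {
  final Map<String, Long> highest = new ConcurrentHashMap<>();
  final Map<String, ConcurrentHashMap<String, Long>> lowest = new ConcurrentHashMap<>();

  void update(String region, Set<String> families, long sequenceId, boolean recordLowest) {
    // Assumption: ids increase per region, so the newest id is the highest.
    highest.put(region, sequenceId);
    if (recordLowest) {
      ConcurrentHashMap<String, Long> perFamily =
          lowest.computeIfAbsent(region, r -> new ConcurrentHashMap<>());
      for (String family : families) {
        // Keeps the existing (older) id if one is already recorded.
        perFamily.putIfAbsent(family, sequenceId);
      }
    }
  }
}
```

      Because putIfAbsent never overwrites, a family keeps its first recorded id until that entry is removed, for example when a flush moves it aside.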
    • onRegionClose

      void onRegionClose(byte[] encodedRegionName)
      Clear all the records of the given region as it is going to be closed.

      This is called once we get the region close marker. It is needed because, with Durability.ASYNC_WAL, some in-flight WAL entries may still be unprocessed after startCacheFlush is called. These would leave orphan records in lowestUnflushedSequenceIds and in turn cause too many WAL files to accumulate.

      See HBASE-23157 for more details.

    • updateStore

      void updateStore(byte[] encodedRegionName, byte[] familyName, Long sequenceId, boolean onlyIfGreater)
      Update the store sequence id, e.g., upon executing in-memory compaction.
    • getOrCreateLowestSequenceIds

    • getLowestSequenceId

      private static long getLowestSequenceId(Map<?,Long> sequenceids)
      Parameters:
      sequenceids - Map to search for lowest value.
      Returns:
      Lowest value found in sequenceids.
    • flattenToLowestSequenceId

      private <T extends Map<?, Long>> Map<byte[],Long> flattenToLowestSequenceId(Map<byte[],T> src)
      Returns:
      A new Map with the same keys as src, but where each value is the smallest sequence id found in the corresponding Map of src.
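      The flattening can be sketched as follows. FlattenSketch, lowestOf and flatten are hypothetical names; the local NO_SEQNUM constant is a placeholder sentinel standing in for HConstants.NO_SEQNUM, and leaving out regions whose inner Map is empty is a choice made for this sketch rather than documented behavior.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: reduce each region's family -> id Map to its smallest id.
final class FlattenSketch {
  // Placeholder sentinel for "no sequence id"; stands in for HConstants.NO_SEQNUM.
  static final long NO_SEQNUM = -1L;

  // Smallest value in the Map, or NO_SEQNUM when the Map is empty.
  static long lowestOf(Map<?, Long> ids) {
    long lowest = NO_SEQNUM;
    for (long id : ids.values()) {
      if (lowest == NO_SEQNUM || id < lowest) {
        lowest = id;
      }
    }
    return lowest;
  }

  // Same keys as src, each mapped to the smallest id of its inner Map.
  // Sketch choice: keys whose inner Map is empty are left out of the result.
  static <K, T extends Map<?, Long>> Map<K, Long> flatten(Map<K, T> src) {
    Map<K, Long> out = new HashMap<>();
    for (Map.Entry<K, T> e : src.entrySet()) {
      long lowest = lowestOf(e.getValue());
      if (lowest != NO_SEQNUM) {
        out.put(e.getKey(), lowest);
      }
    }
    return out;
  }
}
```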
    • startCacheFlush

      Long startCacheFlush(byte[] encodedRegionName, Set<byte[]> families)
      Parameters:
      encodedRegionName - Region to flush.
      families - Families to flush. May be a subset of all families in the region.
      Returns:
      Returns HConstants.NO_SEQNUM if we are flushing the whole region OR if we are flushing a subset of all families but there are no edits in the families not being flushed; in other words, this is effectively the same as a flush of the whole region even though we were passed a subset of families. Otherwise, it returns the sequence id of the oldest/lowest outstanding edit.
    • startCacheFlush

      Long startCacheFlush(byte[] encodedRegionName, Map<byte[],Long> familyToSeq)
    • completeCacheFlush

      void completeCacheFlush(byte[] encodedRegionName, long maxFlushedSeqId)
    • abortCacheFlush

      void abortCacheFlush(byte[] encodedRegionName)
    • areAllLower

      boolean areAllLower(Map<byte[],Long> sequenceids, Collection<byte[]> keysBlocking)
      See if passed sequenceids are lower -- i.e. earlier -- than any outstanding sequenceids, sequenceids we are holding on to in this accounting instance.
      Parameters:
      sequenceids - Keyed by encoded region name. Cannot be null (doesn't make sense for it to be null).
      keysBlocking - An optional collection that is used to return the specific keys that are causing this method to return false.
      Returns:
      true if all the passed sequenceids are lower (older) than the outstanding sequenceids held in this instance.
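      The comparison can be sketched as below. AllLowerSketch is hypothetical: it keeps a single flattened Map of the lowest outstanding id per region (String keys instead of byte arrays), and it assumes that "lower" is strict, so a passed id equal to the outstanding id counts as blocking.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: compare passed ids against the lowest outstanding id per region.
final class AllLowerSketch {
  // Flattened view: region -> lowest outstanding (unflushed or flushing) id.
  private final Map<String, Long> lowestOutstanding = new HashMap<>();

  void setLowestOutstanding(String region, long id) {
    lowestOutstanding.put(region, id);
  }

  boolean areAllLower(Map<String, Long> sequenceIds, Collection<String> keysBlocking) {
    boolean allLower = true;
    for (Map.Entry<String, Long> e : sequenceIds.entrySet()) {
      Long outstanding = lowestOutstanding.get(e.getKey());
      // Assumption: strict comparison; regions with nothing outstanding never block.
      if (outstanding != null && outstanding <= e.getValue()) {
        if (keysBlocking == null) {
          return false; // caller does not want the blocking keys, so stop early
        }
        keysBlocking.add(e.getKey());
        allLower = false;
      }
    }
    return allLower;
  }
}
```

      When keysBlocking is supplied, the loop keeps going after the first blocking region so that every blocking key is collected.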
    • findLower

      Map<byte[],List<byte[]>> findLower(Map<byte[],Long> sequenceids)
      Iterates over the given Map and compares its sequence ids with the corresponding entries in lowestUnflushedSequenceIds. If a region in lowestUnflushedSequenceIds has a sequence id lower than the one passed in sequenceids for that region, it is returned.
      Parameters:
      sequenceids - Sequenceids keyed by encoded region name.
      Returns:
      stores of regions found in this instance with sequence ids less than those passed in.