org.apache.hadoop.hbase.regionserver.KeyValueHeap

All Implemented Interfaces: Closeable, AutoCloseable, InternalScanner, KeyValueScanner, Shipper

Direct Known Subclasses: ReversedKeyValueHeap

@Private public class KeyValueHeap extends NonReversedNonLazyKeyValueScanner implements KeyValueScanner, InternalScanner

Implements a heap merge across any number of KeyValueScanners.

Implements KeyValueScanner itself.

This class is used at the Region level to merge across Stores and at the Store level to merge across the memstore and StoreFiles.

In the Region case, we also need InternalScanner.next(List), so this class also implements InternalScanner. WARNING: As is, if you try to use this as an InternalScanner at the Store level, you will get runtime exceptions.
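The heap merge described above can be illustrated without HBase types. The following is a minimal sketch, not the HBase implementation: the hypothetical ListScanner class stands in for KeyValueScanner, Integer keys stand in for Cell, and the tie-break that prefers the higher-ordered (newer) scanner reflects our understanding of KVScannerComparator rather than anything stated on this page.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for a KeyValueScanner: a cursor over a sorted list of keys.
class ListScanner {
  private final List<Integer> keys;
  private final long order; // assumption: a higher order means newer data
  private int pos = 0;

  ListScanner(List<Integer> keys, long order) {
    this.keys = keys;
    this.order = order;
  }

  Integer peek() { return pos < keys.size() ? keys.get(pos) : null; }

  Integer next() { return pos < keys.size() ? keys.get(pos++) : null; }

  long order() { return order; }
}

class HeapMerge {
  // Orders scanners by their next key; on a tie, the higher-ordered scanner comes first.
  static final Comparator<ListScanner> BY_PEEK = (a, b) -> {
    int c = Integer.compare(a.peek(), b.peek());
    return c != 0 ? c : Long.compare(b.order(), a.order());
  };

  // k-way merge: repeatedly take the scanner with the smallest next key.
  static List<Integer> merge(List<ListScanner> scanners) {
    PriorityQueue<ListScanner> heap = new PriorityQueue<>(BY_PEEK);
    for (ListScanner s : scanners) {
      if (s.peek() != null) {
        heap.add(s); // exhausted scanners never enter the heap
      }
    }
    List<Integer> out = new ArrayList<>();
    while (!heap.isEmpty()) {
      ListScanner top = heap.poll(); // remove before advancing, so the heap stays valid
      out.add(top.next());
      if (top.peek() != null) {
        heap.add(top); // re-insert so it competes with its new key
      }
    }
    return out;
  }
}
```

Each scanner is polled before it is advanced, because a PriorityQueue does not reposition an element whose ordering key changes while it is in the queue.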

Nested Class Summary

Nested Classes

Modifier and Type

Class

Description

protected static class

KeyValueHeap.KVScannerComparator

Field Summary

Fields

Modifier and Type

Field

Description

protected KeyValueHeap.KVScannerComparator

comparator

protected KeyValueScanner

current

The current sub-scanner, i.e. the one that contains the next key/value to return to the client.

protected PriorityQueue<KeyValueScanner>

heap

private static final org.slf4j.Logger

LOG

protected List<KeyValueScanner>

scannersForDelayedClose

Fields inherited from interface org.apache.hadoop.hbase.regionserver.KeyValueScanner
NO_NEXT_INDEXED_KEY

Constructor Summary

Constructors

Constructor

Description

KeyValueHeap(List<? extends KeyValueScanner> scanners, CellComparator comparator)

Constructor.

KeyValueHeap(List<? extends KeyValueScanner> scanners, KeyValueHeap.KVScannerComparator comparator)

Constructor.

Method Summary

Modifier and Type

Method

Description

void

close()

Close the KeyValue scanner.

private boolean

generalizedSeek(boolean isLazy, Cell seekKey, boolean forward, boolean useBloom)

(package private) KeyValueScanner

getCurrentForTesting()

PriorityQueue<KeyValueScanner>

getHeap()

Returns the current heap.

Cell

getNextIndexedKey()

(package private) boolean

isLatestCellFromMemstore()

Cell

next()

Return the next Cell in this scanner, advancing the scanner.

boolean

next(List<Cell> result, ScannerContext scannerContext)

Gets the next row of keys from the top-most scanner.

Cell

peek()

Look at the next Cell in this scanner, but do not iterate scanner.

protected KeyValueScanner

pollRealKV()

Fetches the top sub-scanner from the priority queue, ensuring that a real seek has been done on it.

void

recordBlockSize(IntConsumer blockSizeConsumer)

Record the size of the current block in bytes, passing as an argument to the blockSizeConsumer.

boolean

requestSeek(Cell key, boolean forward, boolean useBloom)

Similar to KeyValueScanner.seek(org.apache.hadoop.hbase.Cell) (or KeyValueScanner.reseek(org.apache.hadoop.hbase.Cell) if forward is true), but only performs a seek after checking that it is really necessary for the row/column combination specified by the key parameter.

boolean

reseek(Cell seekKey)

Identical to seek(Cell), except that each sub-scanner is repositioned with scanner.reseek(seekKey) instead of scanner.seek(seekKey).

boolean

seek(Cell seekKey)

Seeks all scanners at or below the specified seek key.

void

shipped()

Called after a batch of rows has been scanned and is ready to be returned to the client.

Methods inherited from class org.apache.hadoop.hbase.regionserver.NonReversedNonLazyKeyValueScanner
backwardSeek, seekToLastRow, seekToPreviousRow

Methods inherited from class org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner
doRealSeek, enforceSeek, getFilePath, isFileScanner, realSeekDone, shouldUseScanner

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.hadoop.hbase.regionserver.InternalScanner
next

Methods inherited from interface org.apache.hadoop.hbase.regionserver.KeyValueScanner
backwardSeek, enforceSeek, getFilePath, getScannerOrder, isFileScanner, realSeekDone, seekToLastRow, seekToPreviousRow, shouldUseScanner

Field Details
- LOG
  
  private static final org.slf4j.Logger LOG
- heap
  
  protected PriorityQueue<KeyValueScanner> heap
- scannersForDelayedClose
  
  protected List<KeyValueScanner> scannersForDelayedClose
- current
  
  protected KeyValueScanner current
  
  The current sub-scanner, i.e. the one that contains the next key/value to return to the client. This scanner is NOT included in the heap (but we frequently add it back to the heap and pull the new winner out). We maintain an invariant that the current sub-scanner has already done a real seek, and that current.peek() is always a real key/value (or null), except for the fake last-key-on-row-column supplied by the multi-column Bloom filter optimization, which is OK to propagate to StoreScanner. To ensure that, always use pollRealKV() to update current.
- comparator
  
  protected KeyValueHeap.KVScannerComparator comparator
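The invariant on the current field lets next() skip the heap whenever the current sub-scanner still holds the smallest key. A minimal sketch of that shortcut, assuming hypothetical Cursor and MergingCursor classes over Integer keys; lazy seeks and fake keys are omitted, so a plain heap.poll() stands in for pollRealKV().

```java
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sub-scanner over a sorted list of keys.
class Cursor {
  private final List<Integer> keys;
  private int pos = 0;

  Cursor(List<Integer> keys) { this.keys = keys; }

  Integer peek() { return pos < keys.size() ? keys.get(pos) : null; }

  Integer next() { return pos < keys.size() ? keys.get(pos++) : null; }
}

// Hypothetical merging cursor that keeps the winning sub-scanner outside the heap.
class MergingCursor {
  private final PriorityQueue<Cursor> heap =
      new PriorityQueue<>((a, b) -> Integer.compare(a.peek(), b.peek()));
  private Cursor current; // never inside the heap; null once all input is consumed
  int reinsertions = 0;   // how often current had to go back into the heap

  MergingCursor(List<Cursor> cursors) {
    for (Cursor c : cursors) {
      if (c.peek() != null) {
        heap.add(c);
      }
    }
    current = heap.poll();
  }

  Integer next() {
    if (current == null) {
      return null;
    }
    Integer result = current.next();
    Integer nextKey = current.peek();
    if (nextKey == null) {
      current = heap.poll(); // current is exhausted: promote the heap's top
    } else {
      Cursor top = heap.peek();
      if (top != null && nextKey >= top.peek()) {
        heap.add(current);     // current lost its lead: put it back...
        reinsertions++;
        current = heap.poll(); // ...and pull out the new winner
      }
      // Otherwise current still holds the smallest key and stays outside the heap.
    }
    return result;
  }
}
```

For example, merging [1, 2, 3, 10] with [5, 6] returns 1, 2, 3, 5, 6, 10 and re-inserts a scanner only once, when 10 loses to 5.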
Constructor Details
- KeyValueHeap
  
  public KeyValueHeap(List<? extends KeyValueScanner> scanners, CellComparator comparator) throws IOException
  
  Constructor. This KeyValueHeap will handle closing of passed in KeyValueScanners.
  
  Throws:
  
  IOException
- KeyValueHeap
  
  KeyValueHeap(List<? extends KeyValueScanner> scanners, KeyValueHeap.KVScannerComparator comparator) throws IOException
  
  Constructor.
  
  Throws:
  
  IOException
Method Details
- peek
  
  public Cell peek()
  
  Description copied from interface: KeyValueScanner
  
  Look at the next Cell in this scanner, but do not iterate scanner. NOTICE: The returned cell has not been passed into ScanQueryMatcher, so it may not be what the user needs.
  
  Specified by:
  
  peek in interface KeyValueScanner
  
  Returns:
  
  the next Cell
- isLatestCellFromMemstore
  
  boolean isLatestCellFromMemstore()
- recordBlockSize
  
  public void recordBlockSize(IntConsumer blockSizeConsumer)
  
  Description copied from interface: KeyValueScanner
  
  Record the size of the current block in bytes, passing as an argument to the blockSizeConsumer. Implementations should ensure that blockSizeConsumer is only called once per block.
  
  Specified by:
  
  recordBlockSize in interface KeyValueScanner
  
  Overrides:
  
  recordBlockSize in class NonLazyKeyValueScanner
  
  Parameters:
  
  blockSizeConsumer - to be called with block size in bytes, once per block.
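One way to honor the once-per-block contract is to remember which block was last reported. The sketch below assumes a hypothetical BlockCursor whose cells are grouped into fixed-size blocks identified by index and that only moves forward; it illustrates the contract only and is not how HBase scanners track blocks.

```java
import java.util.function.IntConsumer;

// Hypothetical cursor over cells that are grouped into blocks of known byte sizes.
class BlockCursor {
  private final int[] blockSizes; // size in bytes of each block
  private final int cellsPerBlock;
  private int cell = 0;           // index of the current cell
  private int lastReported = -1;  // index of the block most recently reported

  BlockCursor(int[] blockSizes, int cellsPerBlock) {
    this.blockSizes = blockSizes;
    this.cellsPerBlock = cellsPerBlock;
  }

  // Forward-only movement, so remembering the last reported block is enough.
  void next() {
    cell++;
  }

  // Passes the current block's size to the consumer, only the first time that block is seen.
  void recordBlockSize(IntConsumer blockSizeConsumer) {
    int block = cell / cellsPerBlock;
    if (block < blockSizes.length && block != lastReported) {
      lastReported = block;
      blockSizeConsumer.accept(blockSizes[block]);
    }
  }
}
```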
- next
  
  public Cell next() throws IOException
  
  Description copied from interface: KeyValueScanner
  
  Return the next Cell in this scanner, advancing the scanner.
  
  Specified by:
  
  next in interface KeyValueScanner
  
  Returns:
  
  the next Cell
  
  Throws:
  
  IOException
- next
  
  public boolean next(List<Cell> result, ScannerContext scannerContext) throws IOException
  
  Gets the next row of keys from the top-most scanner.
  This method takes care of updating the heap.
  This can ONLY be called when the sub-scanners implement InternalScanner as well as KeyValueScanner (for example, StoreScanner).
  
  Specified by:
  
  next in interface InternalScanner
  
  Parameters:
  
  result - output list to which the cells of the next row are added
  
  Returns:
  
  true if more rows exist after this one, false if scanner is done
  
  Throws:
  
  IOException
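Delegating the row to the top-most scanner requires treating it as an InternalScanner, which is one way the runtime exceptions mentioned in the class description can arise at the Store level. A minimal sketch, assuming hypothetical RowSource (KeyValueScanner stand-in), RowScanner (InternalScanner stand-in), StoreLike and PlainSource types; ScannerContext limits are omitted, and the cast illustrates the failure mode rather than quoting the HBase code.

```java
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical KeyValueScanner stand-in: exposes only the row key at its head.
interface RowSource {
  String peekRow(); // null when exhausted
}

// Hypothetical InternalScanner stand-in: appends one whole row to the result.
interface RowScanner {
  boolean nextRow(List<String> result); // true if more rows follow
}

// Hypothetical store-level scanner implementing both, like a StoreScanner.
class StoreLike implements RowSource, RowScanner {
  private final String family;
  private final List<String> rows; // sorted row keys
  private int pos = 0;

  StoreLike(String family, List<String> rows) {
    this.family = family;
    this.rows = rows;
  }

  public String peekRow() { return pos < rows.size() ? rows.get(pos) : null; }

  public boolean nextRow(List<String> result) {
    result.add(rows.get(pos++) + "/" + family);
    return pos < rows.size();
  }
}

// A source that is NOT a RowScanner, like a plain sub-scanner at the Store level.
class PlainSource implements RowSource {
  public String peekRow() { return "r1"; }
}

class RowHeap {
  private final PriorityQueue<RowSource> heap =
      new PriorityQueue<>((a, b) -> a.peekRow().compareTo(b.peekRow()));
  private RowSource current;

  RowHeap(List<? extends RowSource> sources) {
    for (RowSource s : sources) {
      if (s.peekRow() != null) {
        heap.add(s);
      }
    }
    current = heap.poll();
  }

  // Emits one row from the top-most source, then lets the heap pick the next source.
  boolean nextRow(List<String> result) {
    if (current == null) {
      return false;
    }
    RowScanner asScanner = (RowScanner) current; // ClassCastException if not a RowScanner
    boolean more = asScanner.nextRow(result);
    if (more && current.peekRow() != null) {
      heap.add(current);
    }
    current = heap.poll();
    return current != null;
  }
}
```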
- close
  
  public void close()
  
  Description copied from interface: KeyValueScanner
  
  Close the KeyValue scanner.
  
  Specified by:
  
  close in interface AutoCloseable
  
  Specified by:
  
  close in interface Closeable
  
  Specified by:
  
  close in interface InternalScanner
  
  Specified by:
  
  close in interface KeyValueScanner
- seek
  
  public boolean seek(Cell seekKey) throws IOException
  
  Seeks all scanners at or below the specified seek key. If we exited a row early, we may end up skipping values that have not been reached yet; rather than iterating down, we want to give the opportunity to re-seek.
  As individual scanners may run past their ends, those scanners are automatically closed and removed from the heap.
  This function (and reseek(Cell)) does not do multi-column Bloom filter and lazy-seek optimizations. To enable those, call requestSeek(Cell, boolean, boolean).
  
  Specified by:
  
  seek in interface KeyValueScanner
  
  Parameters:
  
  seekKey - KeyValue to seek at or after
  
  Returns:
  
  true if KeyValues exist at or after specified key, false if not
  
  Throws:
  
  IOException
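Because every sub-scanner left in the heap is at or after the heap's top, a seek loop can stop as soon as the top scanner is already at or after the seek key. A minimal sketch of that early exit, assuming a hypothetical SeekCursor over sorted Integer keys and seek keys that only move forward; lazy seeks, Bloom filters and delayed close are left out.

```java
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sub-scanner over sorted keys that can jump forward to a target key.
class SeekCursor {
  private final List<Integer> keys;
  private int pos = 0;
  int seeks = 0; // number of seeks actually performed on this cursor

  SeekCursor(List<Integer> keys) { this.keys = keys; }

  Integer peek() { return pos < keys.size() ? keys.get(pos) : null; }

  // Moves to the first key >= target; returns false if no such key exists.
  boolean seek(int target) {
    seeks++;
    while (pos < keys.size() && keys.get(pos) < target) {
      pos++;
    }
    return pos < keys.size();
  }
}

class SeekingHeap {
  private final PriorityQueue<SeekCursor> heap =
      new PriorityQueue<>((a, b) -> Integer.compare(a.peek(), b.peek()));
  private SeekCursor current; // smallest cursor, kept outside the heap

  SeekingHeap(List<SeekCursor> cursors) {
    for (SeekCursor c : cursors) {
      if (c.peek() != null) {
        heap.add(c);
      }
    }
    current = heap.poll();
  }

  Integer peek() { return current == null ? null : current.peek(); }

  // Seeks sub-scanners in key order until the smallest one is already at or after target.
  boolean seek(int target) {
    SeekCursor scanner = current;
    current = null;
    while (scanner != null) {
      if (scanner.peek() >= target) {
        // Every cursor still in the heap is >= this one, hence >= target: stop here.
        heap.add(scanner);
        current = heap.poll();
        return true;
      }
      if (scanner.seek(target)) {
        heap.add(scanner); // still has data: compete again from its new position
      } // otherwise it is exhausted and simply dropped in this sketch
      scanner = heap.poll();
    }
    return false;
  }
}
```

For instance, with sub-scanners positioned at 1, 2 and 7, seeking to 6 repositions the first two and never seeks the scanner already at 7.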
- reseek
  
  public boolean reseek(Cell seekKey) throws IOException
  
  Identical to seek(Cell), except that each sub-scanner is repositioned with scanner.reseek(seekKey) instead of scanner.seek(seekKey).
  
  Specified by:
  
  reseek in interface KeyValueScanner
  
  Parameters:
  
  seekKey - seek value (should be non-null)
  
  Returns:
  
  true if scanner has values left, false if end of scanner
  
  Throws:
  
  IOException
- requestSeek
  
  public boolean requestSeek(Cell key, boolean forward, boolean useBloom) throws IOException
  
  Similar to KeyValueScanner.seek(org.apache.hadoop.hbase.Cell) (or KeyValueScanner.reseek(org.apache.hadoop.hbase.Cell) if forward is true), but only performs a seek after checking that it is really necessary for the row/column combination specified by the key parameter. This function was added to avoid unnecessary disk seeks by checking row-column Bloom filters before a seek on multi-column get/scan queries, and to optimize by looking up more recent files first.
  
  Specified by:
  
  requestSeek in interface KeyValueScanner
  
  Overrides:
  
  requestSeek in class NonLazyKeyValueScanner
  
  Parameters:
  
  forward - do a forward-only "reseek" instead of a random-access seek
  
  useBloom - whether to enable multi-column Bloom filter optimization
  
  Throws:
  
  IOException
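The Bloom filter check can be sketched as follows. This is an illustration under stated assumptions: FileCursor and maybeSeek are hypothetical names, the filter is a toy Bloom filter built on java.util.BitSet with two arbitrary hash functions, and single-letter strings stand in for row/column keys. A negative filter answer is definitive, so the disk seek is skipped; a positive answer may be a false positive, so the real seek still runs.

```java
import java.util.BitSet;
import java.util.List;

// Hypothetical store-file cursor guarded by a toy Bloom filter over its keys.
class FileCursor {
  private static final int BITS = 1024;
  private final BitSet bloom = new BitSet(BITS);
  private final List<String> keys; // sorted keys in this "file"
  private int pos = 0;
  int diskSeeks = 0;               // real seeks performed so far

  FileCursor(List<String> keys) {
    this.keys = keys;
    for (String k : keys) {
      bloom.set(h1(k));
      bloom.set(h2(k));
    }
  }

  // Two arbitrary, deterministic hash functions for this sketch.
  private static int h1(String k) { return Math.floorMod(k.hashCode(), BITS); }

  private static int h2(String k) { return Math.floorMod(k.hashCode() * 31 + 17, BITS); }

  // False means "definitely absent"; true means "possibly present".
  boolean mightContain(String k) { return bloom.get(h1(k)) && bloom.get(h2(k)); }

  String peek() { return pos < keys.size() ? keys.get(pos) : null; }

  // Performs the real seek unless the filter rules the key out; returns true if it seeked.
  boolean maybeSeek(String key, boolean useBloom) {
    if (useBloom && !mightContain(key)) {
      return false; // the key cannot be in this file: skip the disk seek
    }
    diskSeeks++;
    while (pos < keys.size() && keys.get(pos).compareTo(key) < 0) {
      pos++;
    }
    return true;
  }
}
```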
- generalizedSeek
  
  private boolean generalizedSeek(boolean isLazy, Cell seekKey, boolean forward, boolean useBloom) throws IOException
  
  Parameters:
  
  isLazy - whether we are trying to seek to exactly the given row/col. Enables Bloom filter and most-recent-file-first optimizations for multi-column get/scan queries.
  
  seekKey - key to seek to
  
  forward - whether to seek forward (also known as reseek)
  
  useBloom - whether to optimize seeks using Bloom filters
  
  Throws:
  
  IOException
- pollRealKV
  
  protected KeyValueScanner pollRealKV() throws IOException
  
  Fetches the top sub-scanner from the priority queue, ensuring that a real seek has been done on it. It works by fetching the top sub-scanner and, if that scanner has not done a real seek, making it do so (which will modify its top KV), putting it back, and repeating until success. This relies on the fact that on a lazy seek we set the current key of a StoreFileScanner to a KV that is not greater than the real next KV to be read from that file. Therefore the scanner that bubbles up to the top of the heap holds the global next KV in this scanner heap if (1) it has done a real seek and (2) its KV is the top among all top KVs (some of which are fake) in the scanner heap.
  
  Throws:
  
  IOException
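The loop described above can be sketched with sub-scanners whose visible key may be a fake lower bound left by a lazy seek. This minimal sketch assumes a hypothetical LazyCursor whose fake key is never greater than its real next key; its enforceSeek() simply drops the fake key, standing in for the real seek on a StoreFileScanner.

```java
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sub-scanner that may expose a fake key left behind by a lazy seek.
class LazyCursor {
  final String name;
  private final List<Integer> keys;
  private Integer fakeKey; // set until a real seek is done; never greater than the real next key

  LazyCursor(String name, List<Integer> keys, Integer fakeKey) {
    this.name = name;
    this.keys = keys;
    this.fakeKey = fakeKey;
  }

  boolean realSeekDone() { return fakeKey == null; }

  Integer peek() {
    if (fakeKey != null) {
      return fakeKey;
    }
    return keys.isEmpty() ? null : keys.get(0);
  }

  void enforceSeek() { fakeKey = null; } // stands in for the real (disk) seek
}

class LazyHeap {
  final PriorityQueue<LazyCursor> heap =
      new PriorityQueue<>((a, b) -> Integer.compare(a.peek(), b.peek()));
  int realSeeks = 0;

  // Returns a cursor that has done a real seek and holds the globally smallest key.
  LazyCursor pollRealKV() {
    LazyCursor top = heap.poll();
    while (top != null && !top.realSeekDone()) {
      top.enforceSeek(); // top is out of the heap, so changing its key is safe
      realSeeks++;
      Integer key = top.peek();
      if (key != null) {
        LazyCursor next = heap.peek();
        if (next == null || key < next.peek()) {
          return top; // still strictly smallest after the real seek
        }
        heap.add(top); // lost its place: put it back and examine the new top
      } // a cursor that turned out to be empty is simply dropped
      top = heap.poll();
    }
    return top;
  }
}
```

With A exposing fake key 1 (real 8), C exposing fake key 3 (real 4) and B really at 5, the loop performs two real seeks and returns C without ever seeking B.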
- getHeap
  
  public PriorityQueue<KeyValueScanner> getHeap()
  
  Returns the current heap.
- getCurrentForTesting
  
  KeyValueScanner getCurrentForTesting()
- getNextIndexedKey
  
  public Cell getNextIndexedKey()
  
  Specified by:
  
  getNextIndexedKey in interface KeyValueScanner
  
  Overrides:
  
  getNextIndexedKey in class NonLazyKeyValueScanner
  
  Returns:
  
  the next key in the index, usually the first key of the next block OR a key that falls between the last key of the current block and the first key of the next block (see HFileWriterImpl#getMidpoint), or null if not known.
- shipped
  
  public void shipped() throws IOException
  
  Description copied from interface: Shipper
  
  Called after a batch of rows has been scanned and is ready to be returned to the client. Any intermediate cleanup can be done here.
  
  Specified by:
  
  shipped in interface Shipper
  
  Overrides:
  
  shipped in class NonLazyKeyValueScanner
  
  Throws:
  
  IOException
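The scannersForDelayedClose field and shipped() work together: as we understand it, an exhausted sub-scanner is parked rather than closed right away, because cells already returned in the current batch may still reference blocks it holds, and it is closed once the batch has been shipped. A minimal sketch of that lifecycle, assuming a hypothetical Resource type in place of KeyValueScanner and a hypothetical exhausted() hook in place of the heap's internal bookkeeping.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sub-scanner that records whether it was closed or shipped.
class Resource implements AutoCloseable {
  boolean closed = false;
  int shippedCalls = 0;

  void shipped() { shippedCalls++; }

  @Override
  public void close() { closed = true; }
}

class DelayedCloseHeap implements AutoCloseable {
  private final List<Resource> active = new ArrayList<>();
  private final List<Resource> delayedClose = new ArrayList<>();

  DelayedCloseHeap(List<Resource> resources) { active.addAll(resources); }

  // Called when a sub-scanner runs out: park it instead of closing it immediately.
  void exhausted(Resource r) {
    active.remove(r);
    delayedClose.add(r);
  }

  // After a batch is shipped, parked scanners can be closed; live ones are told to ship.
  void shipped() {
    for (Resource r : delayedClose) {
      r.close();
    }
    delayedClose.clear();
    for (Resource r : active) {
      r.shipped();
    }
  }

  // Closing the heap closes everything it still owns, parked or active.
  @Override
  public void close() {
    for (Resource r : delayedClose) {
      r.close();
    }
    delayedClose.clear();
    for (Resource r : active) {
      r.close();
    }
    active.clear();
  }
}
```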
