Class KeyValueHeap

All Implemented Interfaces:
Closeable, AutoCloseable, InternalScanner, KeyValueScanner, Shipper
Direct Known Subclasses:
ReversedKeyValueHeap

Implements a heap merge across any number of KeyValueScanners.

Implements KeyValueScanner itself.

This class is used at the Region level to merge across Stores and at the Store level to merge across the memstore and StoreFiles.

In the Region case, we also need InternalScanner.next(List), so this class also implements InternalScanner. WARNING: As is, if you try to use this as an InternalScanner at the Store level, you will get runtime exceptions.
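The heap merge described above can be illustrated with a small, self-contained sketch. This is not HBase code: `HeapMergeSketch`, `PeekingScanner`, and `merge` are hypothetical names, and plain integers stand in for Cells. It only shows the general technique of merging several sorted sources through a priority queue while holding the current winner outside the heap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Illustrative only: a k-way heap merge over sorted integer lists, not HBase code. */
final class HeapMergeSketch {

  /** Hypothetical stand-in for a KeyValueScanner: supports peek() and next(). */
  static final class PeekingScanner {
    private final Iterator<Integer> it;
    private Integer head;

    PeekingScanner(List<Integer> sorted) {
      this.it = sorted.iterator();
      this.head = it.hasNext() ? it.next() : null;
    }

    /** Returns the next value without advancing, or null when exhausted. */
    Integer peek() {
      return head;
    }

    /** Returns the next value and advances the scanner. */
    Integer next() {
      Integer result = head;
      head = it.hasNext() ? it.next() : null;
      return result;
    }
  }

  /** Merges already-sorted inputs into one sorted list using a priority queue. */
  static List<Integer> merge(List<List<Integer>> inputs) {
    PriorityQueue<PeekingScanner> heap =
        new PriorityQueue<>((a, b) -> Integer.compare(a.peek(), b.peek()));
    for (List<Integer> input : inputs) {
      PeekingScanner s = new PeekingScanner(input);
      if (s.peek() != null) {
        heap.add(s); // exhausted scanners never enter the heap
      }
    }
    List<Integer> out = new ArrayList<>();
    // The current scanner is held outside the heap while it is being read.
    PeekingScanner current = heap.poll();
    while (current != null) {
      out.add(current.next());
      if (current.peek() != null) {
        heap.add(current); // put it back so it competes with the others again
      }
      current = heap.poll(); // pull out the new winner
    }
    return out;
  }

  public static void main(String[] args) {
    List<List<Integer>> inputs = Arrays.asList(
        Arrays.asList(1, 4, 9), Arrays.asList(2, 3, 10), Arrays.asList(5));
    System.out.println(merge(inputs));
  }
}
```

Each scanner is only re-inserted after it has been advanced, so the heap never contains an element whose ordering key changed while it was queued.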

  • Field Details

    • LOG

      private static final org.slf4j.Logger LOG
    • heap

    • scannersForDelayedClose

    • current

      The current sub-scanner, i.e. the one that contains the next key/value to return to the client. This scanner is NOT included in heap (but we frequently add it back to the heap and pull the new winner out). We maintain an invariant that the current sub-scanner has already done a real seek, and that current.peek() is always a real key/value (or null) except for the fake last-key-on-row-column supplied by the multi-column Bloom filter optimization, which is OK to propagate to StoreScanner. In order to ensure that, always use pollRealKV() to update current.
    • comparator

  • Constructor Details

  • Method Details

    • peek

      public Cell peek()
      Description copied from interface: KeyValueScanner
      Look at the next Cell in this scanner, but do not advance the scanner. NOTICE: The returned cell has not been passed into ScanQueryMatcher, so it may not be what the user needs.
      Specified by:
      peek in interface KeyValueScanner
      Returns:
      the next Cell
    • isLatestCellFromMemstore

    • recordBlockSize

      public void recordBlockSize(IntConsumer blockSizeConsumer)
      Description copied from interface: KeyValueScanner
      Record the size of the current block in bytes, passing as an argument to the blockSizeConsumer. Implementations should ensure that blockSizeConsumer is only called once per block.
      Specified by:
      recordBlockSize in interface KeyValueScanner
      Overrides:
      recordBlockSize in class NonLazyKeyValueScanner
      Parameters:
      blockSizeConsumer - to be called with block size in bytes, once per block.
    • next

      public Cell next() throws IOException
      Description copied from interface: KeyValueScanner
      Return the next Cell in this scanner and advance the scanner.
      Specified by:
      next in interface KeyValueScanner
      Returns:
      the next Cell
      Throws:
      IOException
    • next

      public boolean next(List<Cell> result, ScannerContext scannerContext) throws IOException
      Gets the next row of keys from the top-most scanner.

      This method takes care of updating the heap.

      This can ONLY be called when you are using Scanners that implement InternalScanner as well as KeyValueScanner (a StoreScanner).

      Specified by:
      next in interface InternalScanner
      Parameters:
      result - output list that receives the returned cells
      Returns:
      true if more rows exist after this one, false if scanner is done
      Throws:
      IOException
    • close

      public void close()
      Description copied from interface: KeyValueScanner
      Close the KeyValue scanner.
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Specified by:
      close in interface InternalScanner
      Specified by:
      close in interface KeyValueScanner
    • seek

      public boolean seek(Cell seekKey) throws IOException
      Seeks all scanners at or below the specified seek key. If we exited a row early, we may end up skipping values that were never reached. Rather than iterating down, we want to give the opportunity to re-seek.

      As individual scanners may run past their ends, those scanners are automatically closed and removed from the heap.

      This function (and reseek(Cell)) does not do multi-column Bloom filter and lazy-seek optimizations. To enable those, call requestSeek(Cell, boolean, boolean).

      Specified by:
      seek in interface KeyValueScanner
      Parameters:
      seekKey - KeyValue to seek at or after
      Returns:
      true if KeyValues exist at or after specified key, false if not
      Throws:
      IOException
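      The seek behavior described above (position every sub-scanner at or after the key, then close and drop those that ran past their end) can be sketched as follows. This is a simplified illustration under stated assumptions, not the HBase implementation: `SeekSketch`, `ArrayScanner`, and `Heap` are hypothetical names, and integers stand in for Cells.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Illustrative only: seeking every sub-scanner and dropping exhausted ones. */
final class SeekSketch {

  /** Hypothetical scanner over a sorted int array. */
  static final class ArrayScanner {
    private final int[] keys;
    private int pos = 0;
    boolean closed = false;

    ArrayScanner(int... sortedKeys) {
      this.keys = sortedKeys;
    }

    Integer peek() {
      return pos < keys.length ? keys[pos] : null;
    }

    /** Random-access seek to the first key at or after the given key. */
    boolean seek(int key) {
      pos = 0;
      while (pos < keys.length && keys[pos] < key) {
        pos++;
      }
      return pos < keys.length;
    }

    void close() {
      closed = true;
    }
  }

  /** Hypothetical heap that keeps the current winner outside the queue. */
  static final class Heap {
    private final PriorityQueue<ArrayScanner> heap =
        new PriorityQueue<>((a, b) -> Integer.compare(a.peek(), b.peek()));
    private ArrayScanner current;

    Heap(List<ArrayScanner> scanners) {
      for (ArrayScanner s : scanners) {
        if (s.peek() != null) {
          heap.add(s);
        } else {
          s.close();
        }
      }
      current = heap.poll();
    }

    boolean seek(int key) {
      // Take every scanner out first: keys must not change while queued.
      List<ArrayScanner> all = new ArrayList<>(heap);
      if (current != null) {
        all.add(current);
      }
      heap.clear();
      for (ArrayScanner s : all) {
        if (s.seek(key)) {
          heap.add(s);
        } else {
          s.close(); // ran past its end: close it and leave it out of the heap
        }
      }
      current = heap.poll();
      return current != null;
    }

    Integer peek() {
      return current == null ? null : current.peek();
    }
  }
}
```

      For example, with scanners over {1, 5, 9}, {2, 3}, and {7}, a seek to 4 leaves the first scanner as the current one (positioned at 5) and closes the second, which has no key at or after 4.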
    • reseek

      public boolean reseek(Cell seekKey) throws IOException
      This function is identical to the seek(Cell) function except that scanner.seek(seekKey) is changed to scanner.reseek(seekKey).
      Specified by:
      reseek in interface KeyValueScanner
      Parameters:
      seekKey - seek value (should be non-null)
      Returns:
      true if scanner has values left, false if end of scanner
      Throws:
      IOException
    • requestSeek

      public boolean requestSeek(Cell key, boolean forward, boolean useBloom) throws IOException
      Similar to KeyValueScanner.seek(Cell) (or KeyValueScanner.reseek(Cell) if forward is true), but only does a seek operation after checking that it is really necessary for the row/column combination specified by the key parameter. This function was added to avoid unnecessary disk seeks by checking row-column Bloom filters before a seek on multi-column get/scan queries, and to optimize by looking up more recent files first.
      Specified by:
      requestSeek in interface KeyValueScanner
      Overrides:
      requestSeek in class NonLazyKeyValueScanner
      Parameters:
      forward - do a forward-only "reseek" instead of a random-access seek
      useBloom - whether to enable multi-column Bloom filter optimization
      Throws:
      IOException
    • generalizedSeek

      private boolean generalizedSeek(boolean isLazy, Cell seekKey, boolean forward, boolean useBloom) throws IOException
      Parameters:
      isLazy - whether we are trying to seek to exactly the given row/col. Enables Bloom filter and most-recent-file-first optimizations for multi-column get/scan queries.
      seekKey - key to seek to
      forward - whether to seek forward (also known as reseek)
      useBloom - whether to optimize seeks using Bloom filters
      Throws:
      IOException
    • pollRealKV

      Fetches the top sub-scanner from the priority queue, ensuring that a real seek has been done on it. Works by fetching the top sub-scanner and, if it has not done a real seek, making it do so (which will modify its top KV), putting it back, and repeating this until success. Relies on the fact that on a lazy seek we set the current key of a StoreFileScanner to a KV that is not greater than the real next KV to be read from that file. As a result, the scanner that bubbles up to the top of the heap holds the global next KV in this scanner heap if (1) it has done a real seek and (2) its KV is the top among all top KVs (some of which are fake) in the scanner heap.
      Throws:
      IOException
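      The loop described above can be sketched in a few lines. This is a minimal illustration, not the HBase implementation: `PollRealSketch` and `LazyScanner` are hypothetical names, integers stand in for KVs, and the `order` tie-breaker is an assumption added only to make the example deterministic. The fake key set by `lazySeek` is never greater than the real next key, which is the property the description relies on.

```java
import java.util.PriorityQueue;

/** Illustrative only: poll the heap until the top scanner has done a real seek. */
final class PollRealSketch {

  /** Hypothetical scanner that can defer a seek by exposing a fake key. */
  static final class LazyScanner {
    private final int[] keys;
    private final int order; // tie-breaker; lower wins (assumption for this sketch)
    private int pos = 0;
    private Integer fakeKey; // non-null while a seek is still pending

    LazyScanner(int order, int... sortedKeys) {
      this.order = order;
      this.keys = sortedKeys;
    }

    /** Pretends to be positioned at key without doing any work. */
    void lazySeek(int key) {
      fakeKey = key;
    }

    boolean realSeekDone() {
      return fakeKey == null;
    }

    /** Performs the deferred seek; the top key may move forward. */
    void enforceSeek() {
      int target = fakeKey;
      fakeKey = null;
      while (pos < keys.length && keys[pos] < target) {
        pos++;
      }
    }

    Integer peek() {
      if (fakeKey != null) {
        return fakeKey;
      }
      return pos < keys.length ? keys[pos] : null;
    }
  }

  static PriorityQueue<LazyScanner> newHeap() {
    return new PriorityQueue<>((x, y) -> {
      int c = Integer.compare(x.peek(), y.peek());
      return c != 0 ? c : Integer.compare(x.order, y.order);
    });
  }

  /** Returns a scanner whose top key is real and globally smallest, or null. */
  static LazyScanner pollRealKV(PriorityQueue<LazyScanner> heap) {
    LazyScanner top = heap.poll();
    while (top != null) {
      if (top.realSeekDone()) {
        return top; // real key, and no other top key (real or fake) is smaller
      }
      top.enforceSeek();
      if (top.peek() != null) {
        heap.add(top); // put it back with its real key and try again
      }
      top = heap.poll();
    }
    return null;
  }
}
```

      Because a fake key is a lower bound on the real one, a scanner whose real key wins against the remaining fake keys can be returned without forcing the other scanners to seek at all.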
    • getHeap

      Returns the current Heap
    • getCurrentForTesting

    • getNextIndexedKey

      Specified by:
      getNextIndexedKey in interface KeyValueScanner
      Overrides:
      getNextIndexedKey in class NonLazyKeyValueScanner
      Returns:
      the next key in the index, usually the first key of the next block or a key that falls between the last key of the current block and the first key of the next block (see HFileWriterImpl#getMidpoint), or null if not known.
    • shipped

      public void shipped() throws IOException
      Description copied from interface: Shipper
      Called after a batch of rows has been scanned and is set to be returned to the client. Any in-between cleanup can be done here.
      Specified by:
      shipped in interface Shipper
      Overrides:
      shipped in class NonLazyKeyValueScanner
      Throws:
      IOException