Class StoreFileScanner
java.lang.Object
org.apache.hadoop.hbase.regionserver.StoreFileScanner
- All Implemented Interfaces:
Closeable, AutoCloseable, KeyValueScanner, Shipper
@LimitedPrivate("Phoenix")
@Evolving
public class StoreFileScanner
extends Object
implements KeyValueScanner
A KeyValueScanner adapter over a StoreFileReader. It also provides hooks for Bloom filter checks.
-
Field Summary
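Modifier and Type   Field
private final boolean canOptimizeForNonNullColumn
private boolean closed
private Cell cur
private boolean delayedReseek
private Cell delayedSeekKV
private final boolean enforceMVCC
private final boolean hasMVCCInfo
private final HFileScanner hfs
private final boolean isFastSeekingEncoding
private Cell previousRow
private final StoreFileReader reader
private final long readPt
private boolean realSeekDone
private final long scannerOrder
private static LongAdder seekCount
private boolean stopSkippingKVsIfNextRow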
Fields inherited from interface org.apache.hadoop.hbase.regionserver.KeyValueScanner
NO_NEXT_INDEXED_KEY
-
Constructor Summary
Constructor   Description
StoreFileScanner(StoreFileReader reader, HFileScanner hfs, boolean useMVCC, boolean hasMVCC, long readPt, long scannerOrder, boolean canOptimizeForNonNullColumn, boolean isFastSeekingEncoding) - Implements a KeyValueScanner on top of the specified HFileScanner
-
Method Summary
Modifier and Type   Method   Description
boolean backwardSeek(Cell key) - Seek the scanner at or before the row of the specified Cell. It first tries to seek the scanner at or after the specified Cell and returns if the peeked KeyValue has the same row as the specified Cell; otherwise it seeks the scanner to the first Cell of the row preceding the row of the specified KeyValue.
void close() - Close the KeyValue scanner.
void enforceSeek() - Does the real seek operation in case it was skipped by seekToRowCol(KeyValue, boolean) (TODO: Whats this?).
(package private) CellComparator getComparator()
org.apache.hadoop.fs.Path getFilePath()
Cell getNextIndexedKey()
(package private) StoreFileReader getReader()
long getScannerOrder() - Get the order of this KeyValueScanner.
static List<StoreFileScanner> getScannersForCompaction(Collection<HStoreFile> files, boolean canUseDropBehind, long readPt) - Get scanners for compaction.
static List<StoreFileScanner> getScannersForStoreFiles(Collection<HStoreFile> files, boolean cacheBlocks, boolean usePread, boolean isCompaction, boolean useDropBehind, long readPt) - Return a list of scanners corresponding to the given set of store files.
static List<StoreFileScanner> getScannersForStoreFiles(Collection<HStoreFile> files, boolean cacheBlocks, boolean usePread, boolean isCompaction, boolean canUseDrop, ScanQueryMatcher matcher, long readPt) - Return a list of scanners corresponding to the given set of store files, and set the ScanQueryMatcher for each store file scanner for further optimization.
(package private) static final long getSeekCount()
(package private) static final void instrument()
boolean isFileScanner() - Returns true if this is a file scanner.
private boolean isStillAtSeekTargetAfterSkippingNewerKvs
Cell next() - Return the next Cell in this scanner, iterating the scanner.
Cell peek() - Look at the next Cell in this scanner, but do not iterate the scanner.
boolean realSeekDone() - We optimize our store scanners by checking the most recent store file first, so we sometimes pretend we have done a seek but delay it until the store scanner bubbles up to the top of the key-value heap.
void recordBlockSize(IntConsumer blockSizeConsumer) - Record the size of the current block in bytes, passing it as an argument to the blockSizeConsumer.
boolean requestSeek(Cell kv, boolean forward, boolean useBloom) - Pretend we have done a seek but don't do it yet, if possible.
boolean reseek(Cell key) - Reseek the scanner at or after the specified KeyValue.
private boolean reseekAtOrAfter(Cell seekKey)
(package private) static boolean reseekAtOrAfter(HFileScanner s, Cell k)
boolean seek(Cell key) - Seek the scanner at or after the specified KeyValue.
private boolean seekAtOrAfter(Cell seekKey)
static boolean seekAtOrAfter(HFileScanner s, Cell k) - Returns false if not found or if k is after the end.
private boolean seekBefore(Cell seekKey)
private void seekBeforeAndSaveKeyToPreviousRow(Cell seekKey) - Seeks before the seek target cell and saves the location to previousRow.
boolean seekToLastRow() - Seek the scanner at the first KeyValue of the last row.
boolean seekToPreviousRow(Cell originalKey) - Seek the scanner at the first Cell of the row preceding the row of the specified key.
private boolean seekToPreviousRowStateless(Cell originalKey) - This variant of the seekToPreviousRow(Cell) method requires two seeks.
private boolean seekToPreviousRowWithHint() - This variant of the seekToPreviousRow(Cell) method requires one seek and one reseek.
private boolean seekToPreviousRowWithoutHint(Cell originalKey) - This variant of the seekToPreviousRow(Cell) method requires two seeks and one reseek.
protected void setCurrentCell(Cell newVal)
void shipped() - Called after a batch of rows has been scanned and set to be returned to the client.
boolean shouldUseScanner(Scan scan, HStore store, long oldestUnexpiredTS) - Allows filtering out scanners (both StoreFile and memstore) that we don't want to use, based on criteria such as Bloom filters and timestamp ranges.
protected boolean skipKVsNewerThanReadpoint
private boolean skipKvsNewerThanReadpointReversed
String toString()
-
Field Details
-
reader
-
hfs
-
cur
-
closed
-
realSeekDone
-
delayedReseek
-
delayedSeekKV
-
enforceMVCC
-
hasMVCCInfo
-
stopSkippingKVsIfNextRow
-
previousRow
-
isFastSeekingEncoding
-
seekCount
-
canOptimizeForNonNullColumn
-
readPt
-
scannerOrder
-
-
Constructor Details
-
StoreFileScanner
public StoreFileScanner(StoreFileReader reader, HFileScanner hfs, boolean useMVCC, boolean hasMVCC, long readPt, long scannerOrder, boolean canOptimizeForNonNullColumn, boolean isFastSeekingEncoding)
Implements a KeyValueScanner on top of the specified HFileScanner.
- Parameters:
useMVCC - If true, the scanner will filter out updates with an MVCC value larger than readPt.
readPt - MVCC value to use to filter out the updates newer than this scanner.
hasMVCC - Set to true if the underlying store file reader has MVCC info.
scannerOrder - Order of the scanner relative to other scanners. See KeyValueScanner.getScannerOrder().
canOptimizeForNonNullColumn - true if we can make sure there is no null column, otherwise false. This is a hint for optimization.
isFastSeekingEncoding - true if the data block encoding can seek quickly from the beginning of a block (e.g. RIV1), otherwise false. This is a hint for optimization.
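The useMVCC and readPt parameters describe read-point filtering. The following is a minimal, self-contained model of that rule, not HBase code: VersionedCell and visibleKeys are hypothetical names introduced only for illustration, and the real scanner works on Cell sequence ids inside the store file.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of read-point filtering; hypothetical, not part of HBase. */
class MvccFilterSketch {
    /** Hypothetical stand-in for a cell: a key plus its MVCC sequence id. */
    static final class VersionedCell {
        final String key;
        final long sequenceId;

        VersionedCell(String key, long sequenceId) {
            this.key = key;
            this.sequenceId = sequenceId;
        }
    }

    /**
     * Returns the keys of the cells visible at readPt.
     * When useMVCC is false, no cell is filtered out.
     */
    static List<String> visibleKeys(List<VersionedCell> cells, boolean useMVCC, long readPt) {
        List<String> visible = new ArrayList<>();
        for (VersionedCell cell : cells) {
            // Cells with an MVCC value larger than readPt are skipped.
            if (!useMVCC || cell.sequenceId <= readPt) {
                visible.add(cell.key);
            }
        }
        return visible;
    }
}
```

With cells written at sequence ids 5 and 12 and a read point of 10, only the first cell is visible in this model; with useMVCC set to false both are returned.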
-
-
Method Details
-
getScannersForStoreFiles
public static List<StoreFileScanner> getScannersForStoreFiles(Collection<HStoreFile> files, boolean cacheBlocks, boolean usePread, boolean isCompaction, boolean useDropBehind, long readPt) throws IOException
Return a list of scanners corresponding to the given set of store files.
- Throws:
IOException
-
getScannersForStoreFiles
public static List<StoreFileScanner> getScannersForStoreFiles(Collection<HStoreFile> files, boolean cacheBlocks, boolean usePread, boolean isCompaction, boolean canUseDrop, ScanQueryMatcher matcher, long readPt) throws IOException
Return a list of scanners corresponding to the given set of store files, and set the ScanQueryMatcher for each store file scanner for further optimization.
- Throws:
IOException
-
getScannersForCompaction
public static List<StoreFileScanner> getScannersForCompaction(Collection<HStoreFile> files, boolean canUseDropBehind, long readPt) throws IOException
Get scanners for compaction. We will create a separate reader for each store file to avoid contention with normal read requests.
- Throws:
IOException
-
toString
-
peek
Description copied from interface: KeyValueScanner
Look at the next Cell in this scanner, but do not iterate the scanner. NOTICE: The returned cell has not been passed into ScanQueryMatcher, so it may not be what the user needs.
- Specified by:
peek in interface KeyValueScanner
- Returns:
- the next Cell
-
next
Description copied from interface: KeyValueScanner
Return the next Cell in this scanner, iterating the scanner.
- Specified by:
next in interface KeyValueScanner
- Returns:
- the next Cell
- Throws:
IOException
-
seek
Description copied from interface: KeyValueScanner
Seek the scanner at or after the specified KeyValue.
- Specified by:
seek in interface KeyValueScanner
- Parameters:
key - seek value
- Returns:
- true if scanner has values left, false if end of scanner
- Throws:
IOException
-
reseek
Description copied from interface: KeyValueScanner
Reseek the scanner at or after the specified KeyValue. This method is guaranteed to seek at or after the required key only if the key comes after the current position of the scanner. Should not be used to seek to a key which may come before the current position.
- Specified by:
reseek in interface KeyValueScanner
- Parameters:
key - seek value (should be non-null)
- Returns:
- true if scanner has values left, false if end of scanner
- Throws:
IOException
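The difference between seek and reseek can be illustrated with a small model over a sorted array. This is a sketch under simplifying assumptions, not the HFile implementation: the keys are distinct integers, seek is a random-access binary search, and reseek only moves forward from the current position. The class name and its methods are hypothetical.

```java
import java.util.Arrays;

/** Simplified model of seek versus reseek; hypothetical, not part of HBase. */
class SeekVsReseekSketch {
    private final int[] keys; // sorted ascending, distinct
    private int pos = 0;

    SeekVsReseekSketch(int... sortedKeys) {
        this.keys = sortedKeys;
    }

    /** Random-access seek: positions at the first key >= target, wherever the scanner was. */
    boolean seek(int target) {
        int i = Arrays.binarySearch(keys, target);
        pos = i >= 0 ? i : -(i + 1);
        return pos < keys.length;
    }

    /** Forward-only reseek: never moves backwards, so it is only correct for targets ahead. */
    boolean reseek(int target) {
        while (pos < keys.length && keys[pos] < target) {
            pos++;
        }
        return pos < keys.length;
    }

    /** Current key; only meaningful after a seek or reseek that returned true. */
    int current() {
        return keys[pos];
    }
}
```

In this model, a reseek to a key behind the current position leaves the scanner where it is, which is why the contract above restricts reseek to keys that come after the current position.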
-
setCurrentCell
- Throws:
IOException
-
skipKVsNewerThanReadpoint
- Throws:
IOException
-
close
Description copied from interface: KeyValueScanner
Close the KeyValue scanner.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Specified by:
close in interface KeyValueScanner
-
seekAtOrAfter
Returns false if not found or if k is after the end.
- Throws:
IOException
-
reseekAtOrAfter
- Throws:
IOException
-
getScannerOrder
Description copied from interface: KeyValueScanner
Get the order of this KeyValueScanner. This is only relevant for StoreFileScanners. This is required for comparing multiple files to find out which one has the latest data. StoreFileScanners are ordered from 0 (oldest) to newest in increasing order.
- Specified by:
getScannerOrder in interface KeyValueScanner
- See Also:
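As an illustration of how the scanner order can decide between files, the sketch below puts entries from several hypothetical scanners into a heap. It assumes that, for identical keys, the entry from the scanner with the higher order (the newer file) should surface first. Entry, HEAP_ORDER and topOfHeap are names invented for this example and are not part of HBase.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

/** Simplified model of tie-breaking by scanner order; hypothetical, not part of HBase. */
class ScannerOrderSketch {
    /** Hypothetical heap entry: the current key of a scanner, its order, and a value. */
    static final class Entry {
        final String key;
        final long scannerOrder;
        final String value;

        Entry(String key, long scannerOrder, String value) {
            this.key = key;
            this.scannerOrder = scannerOrder;
            this.value = value;
        }
    }

    /** Sort by key; for equal keys the higher scannerOrder (newer file) comes first. */
    static final Comparator<Entry> HEAP_ORDER =
        Comparator.comparing((Entry e) -> e.key)
            .thenComparing(Comparator.comparingLong((Entry e) -> e.scannerOrder).reversed());

    /** Returns the value at the top of the heap; expects at least one entry. */
    static String topOfHeap(Entry... entries) {
        PriorityQueue<Entry> heap = new PriorityQueue<>(HEAP_ORDER);
        for (Entry e : entries) {
            heap.add(e);
        }
        return heap.peek().value;
    }
}
```

Two entries for the same key with orders 0 and 1 resolve to the entry with order 1 in this model, while entries with different keys are ordered by key alone.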
-
requestSeek
Pretend we have done a seek but don't do it yet, if possible. The hope is that we find requested columns in more recent files and won't have to seek in older files. Creates a fake key/value with the given row/column and the highest (most recent) possible timestamp we might get from this file. When users of such a "lazy scanner" need to know the next KV precisely (e.g. when this scanner is at the top of the heap), they run enforceSeek().
Note that this function does guarantee that the current KV of this scanner will be advanced to at least the given KV. Because of this, it does have to do a real seek in cases when the seek timestamp is older than the highest timestamp of the file, e.g. when we are trying to seek to the next row/column and use OLDEST_TIMESTAMP in the seek key.
- Specified by:
requestSeek in interface KeyValueScanner
- Parameters:
forward - do a forward-only "reseek" instead of a random-access seek
useBloom - whether to enable multi-column Bloom filter optimization
- Throws:
IOException
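The lazy-seek protocol described for requestSeek, realSeekDone and enforceSeek can be modeled in a few lines. This is a minimal sketch, not the HBase implementation: keys are plain long values instead of cells, Bloom filters and timestamps are ignored, and the realSeeks counter exists only to show when the deferred work happens.

```java
/** Simplified model of a lazily seeking scanner; hypothetical, not part of HBase. */
class LazySeekSketch {
    private final long[] sortedKeys; // stand-in for the cells of one store file
    private int position = 0;
    private long pendingKey;
    private boolean realSeekDone = true;
    int realSeeks = 0; // counts how many real seeks were actually performed

    LazySeekSketch(long... sortedKeys) {
        this.sortedKeys = sortedKeys;
    }

    /** Pretend to seek: remember the target and expose it as a fake current key. */
    void requestSeek(long key) {
        pendingKey = key;
        realSeekDone = false;
    }

    boolean realSeekDone() {
        return realSeekDone;
    }

    /** Perform the deferred seek, if one is pending. */
    void enforceSeek() {
        if (realSeekDone) {
            return;
        }
        position = 0;
        while (position < sortedKeys.length && sortedKeys[position] < pendingKey) {
            position++;
        }
        realSeeks++;
        realSeekDone = true;
    }

    /** Fake key while a seek is pending, otherwise the real current key, or null at the end. */
    Long peek() {
        if (!realSeekDone) {
            return pendingKey;
        }
        return position < sortedKeys.length ? sortedKeys[position] : null;
    }
}
```

A caller that only needs an ordering hint can use peek() right after requestSeek; a caller that needs the precise next key checks realSeekDone() and calls enforceSeek() first.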
-
getReader
-
getComparator
-
realSeekDone
Description copied from interface: KeyValueScanner
We optimize our store scanners by checking the most recent store file first, so we sometimes pretend we have done a seek but delay it until the store scanner bubbles up to the top of the key-value heap. This method is then used to ensure the top store file scanner has done a seek operation.
- Specified by:
realSeekDone in interface KeyValueScanner
-
enforceSeek
Description copied from interface: KeyValueScanner
Does the real seek operation in case it was skipped by seekToRowCol(KeyValue, boolean) (TODO: Whats this?). Note that this function should never be called on scanners that always do real seek operations (i.e. most of the scanners). The easiest way to achieve this is to call KeyValueScanner.realSeekDone() first.
- Specified by:
enforceSeek in interface KeyValueScanner
- Throws:
IOException
-
isFileScanner
Description copied from interface: KeyValueScanner
Returns true if this is a file scanner. Otherwise a memory scanner is assumed.
- Specified by:
isFileScanner in interface KeyValueScanner
-
recordBlockSize
Description copied from interface: KeyValueScanner
Record the size of the current block in bytes, passing it as an argument to the blockSizeConsumer. Implementations should ensure that blockSizeConsumer is only called once per block.
- Specified by:
recordBlockSize in interface KeyValueScanner
- Parameters:
blockSizeConsumer - to be called with block size in bytes, once per block.
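The once-per-block requirement can be met with a simple flag that is reset whenever the scanner moves to a new block. The sketch below shows one way to do this under that assumption. It is not the HBase implementation, and moveToBlock is a hypothetical hook standing in for whatever loads the next block.

```java
import java.util.function.IntConsumer;

/** Simplified model of once-per-block size reporting; hypothetical, not part of HBase. */
class BlockSizeRecorderSketch {
    private int currentBlockSize;
    private boolean reported = false;

    /** Hypothetical hook: called when the scanner advances to a new block. */
    void moveToBlock(int sizeInBytes) {
        currentBlockSize = sizeInBytes;
        reported = false;
    }

    /** Passes the current block size to the consumer, at most once per block. */
    void recordBlockSize(IntConsumer blockSizeConsumer) {
        if (!reported) {
            blockSizeConsumer.accept(currentBlockSize);
            reported = true;
        }
    }
}
```

A caller can pass an accumulator as the consumer and call recordBlockSize after every cell without counting the same block twice.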
-
getFilePath
- Specified by:
getFilePath in interface KeyValueScanner
- Returns:
- the file path if this is a file scanner, otherwise null.
- See Also:
-
getSeekCount
-
instrument
-
shouldUseScanner
Description copied from interface: KeyValueScanner
Allows filtering out scanners (both StoreFile and memstore) that we don't want to use, based on criteria such as Bloom filters and timestamp ranges.
- Specified by:
shouldUseScanner in interface KeyValueScanner
- Parameters:
scan - the scan that we are selecting scanners for
store - the store we are performing the scan on.
oldestUnexpiredTS - the oldest timestamp we are interested in for this query, based on TTL
- Returns:
- true if the scanner should be included in the query
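The timestamp-range part of this check can be sketched as an interval-overlap test combined with a TTL cutoff. The model below is an assumption-laden illustration, not the HBase logic: it treats the scan range as half-open [scanMinTs, scanMaxTs), treats the file range as closed, and ignores Bloom filters entirely. All names are hypothetical.

```java
/** Simplified model of timestamp-based scanner selection; hypothetical, not part of HBase. */
class TimeRangeFilterSketch {
    /**
     * Returns true if a file whose cells span [fileMinTs, fileMaxTs] could contain
     * data for a scan over [scanMinTs, scanMaxTs) that is not yet expired.
     */
    static boolean shouldUse(long fileMinTs, long fileMaxTs,
                             long scanMinTs, long scanMaxTs,
                             long oldestUnexpiredTS) {
        // The two ranges must overlap for the file to hold any requested cell.
        boolean overlaps = fileMinTs < scanMaxTs && scanMinTs <= fileMaxTs;
        // If even the newest cell is older than the TTL cutoff, everything has expired.
        boolean notExpired = fileMaxTs >= oldestUnexpiredTS;
        return overlaps && notExpired;
    }
}
```

A file covering timestamps 100 to 200 is kept for a scan over [150, 300), but skipped for a scan over [201, 300) or when oldestUnexpiredTS is 250.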
-
seekToPreviousRow
Description copied from interface: KeyValueScanner
Seek the scanner at the first Cell of the row preceding the row of the specified key.
- Specified by:
seekToPreviousRow in interface KeyValueScanner
- Parameters:
originalKey - seek value
- Returns:
- true if the scanner is at the first valid Cell of the previous row, false if no such Cell exists
- Throws:
IOException
-
seekToPreviousRowWithHint
This variant of the seekToPreviousRow(Cell) method requires one seek and one reseek. This method maintains state in previousRow, which only makes sense in the context of a sequential row-by-row reverse scan. previousRow should be reset if that is not the case. This method is faster than seekToPreviousRowStateless(Cell) because seeks are slower, as they need to start from the beginning of the file, while reseeks go forward from the current position.
- Throws:
IOException
-
seekToPreviousRowWithoutHint
This variant of the seekToPreviousRow(Cell) method requires two seeks and one reseek. The extra seek is intended to speed up subsequent calls to seekToPreviousRowWithHint(), for which this method seeds the state by setting previousRow.
- Throws:
IOException
-
seekToPreviousRowStateless
This variant of the seekToPreviousRow(Cell) method requires two seeks. It should be used if the cost for seeking is lower, i.e. when using a fast seeking data block encoding like RIV1.
- Throws:
IOException
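The two-seek pattern of the stateless variant can be modeled with a sorted set. This sketch is not the HBase implementation: it ignores MVCC filtering and assumes a hypothetical key encoding in which a non-negative integer key is row * 100 + column, so the row of a key is key / 100 and the first possible key on a row is row * 100.

```java
import java.util.TreeSet;

/** Simplified model of a two-seek "previous row" lookup; hypothetical, not part of HBase. */
class PreviousRowSketch {
    // Hypothetical encoding: key = row * 100 + column, keys are non-negative.
    private final TreeSet<Integer> keys = new TreeSet<>();
    int seeks = 0; // counts lookups that were actually issued

    PreviousRowSketch(int... sortedKeys) {
        for (int k : sortedKeys) {
            keys.add(k);
        }
    }

    /** Returns the first key of the row preceding the row of originalKey, or null if none. */
    Integer seekToPreviousRow(int originalKey) {
        int firstOnRow = (originalKey / 100) * 100;
        // Seek 1: land on the last key before the current row.
        Integer before = keys.lower(firstOnRow);
        seeks++;
        if (before == null) {
            return null; // no earlier row exists
        }
        // Seek 2: jump to the first key of the row that was just found.
        int firstOnPreviousRow = (before / 100) * 100;
        seeks++;
        return keys.ceiling(firstOnPreviousRow);
    }
}
```

With keys 101, 102, 301 and 305, asking for the row before key 305 returns 101 after two lookups, and asking for the row before key 102 returns null.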
-
seekBefore
- Throws:
IOException
-
seekBeforeAndSaveKeyToPreviousRow
Seeks before the seek target cell and saves the location to previousRow. If there doesn't exist a KV in this file before the seek target cell, reposition the scanner at the beginning of the storefile (in preparation for a reseek at or after the seek key) and set previousRow to null. If previousRow is ever non-null and then transitions to being null again via this method, that's because there doesn't exist a row before the seek target in the storefile (i.e. we're at the beginning of the storefile).
- Throws:
IOException
-
seekAtOrAfter
- Throws:
IOException
-
reseekAtOrAfter
- Throws:
IOException
-
isStillAtSeekTargetAfterSkippingNewerKvs
- Throws:
IOException
-
skipKvsNewerThanReadpointReversed
- Throws:
IOException
-
seekToLastRow
Description copied from interface: KeyValueScanner
Seek the scanner at the first KeyValue of the last row.
- Specified by:
seekToLastRow in interface KeyValueScanner
- Returns:
- true if scanner has values left, false if the underlying data is empty
- Throws:
IOException
-
backwardSeek
Description copied from interface: KeyValueScanner
Seek the scanner at or before the row of the specified Cell. It first tries to seek the scanner at or after the specified Cell and returns if the peeked KeyValue has the same row as the specified Cell; otherwise it seeks the scanner to the first Cell of the row preceding the row of the specified KeyValue.
- Specified by:
backwardSeek in interface KeyValueScanner
- Parameters:
key - seek KeyValue
- Returns:
- true if the scanner is at the valid KeyValue, false if such KeyValue does not exist
- Throws:
IOException
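The two-step behavior described for backwardSeek can be sketched over a sorted set. As with any toy model, this is an illustration under assumptions rather than the HBase code: keys are non-negative integers encoded as row * 100 + column (a hypothetical encoding), and MVCC filtering is left out.

```java
import java.util.TreeSet;

/** Simplified model of backwardSeek; hypothetical, not part of HBase. */
class BackwardSeekSketch {
    /**
     * Returns the key the scanner would end up on, or null if no valid key exists.
     * Hypothetical encoding: key = row * 100 + column, so row = key / 100.
     */
    static Integer backwardSeek(TreeSet<Integer> keys, int key) {
        // Step 1: seek at or after the requested key.
        Integer atOrAfter = keys.ceiling(key);
        if (atOrAfter != null && atOrAfter / 100 == key / 100) {
            return atOrAfter; // still on the requested row
        }
        // Step 2: fall back to the first key of the previous row.
        Integer before = keys.lower((key / 100) * 100);
        if (before == null) {
            return null;
        }
        return keys.ceiling((before / 100) * 100);
    }
}
```

With keys 101, 102 and 301, a backward seek to 102 stays on 102, a backward seek to 205 lands on 101 because row 2 is empty, and a backward seek to 50 finds nothing.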
-
getNextIndexedKey
- Specified by:
getNextIndexedKey in interface KeyValueScanner
- Returns:
- the next key in the index, usually the first key of the next block or a key that falls between the last key of the current block and the first key of the next block (see HFileWriterImpl#getMidpoint), or null if not known.
-
shipped
Description copied from interface: Shipper
Called after a batch of rows has been scanned and set to be returned to the client. Any in-between cleanup can be done here.
- Specified by:
shipped in interface Shipper
- Throws:
IOException
-