Package org.apache.hadoop.hbase.io.hfile
Class HFileWriterImpl
java.lang.Object
org.apache.hadoop.hbase.io.hfile.HFileWriterImpl
- All Implemented Interfaces:
Closeable, AutoCloseable, HFile.Writer, CellSink, ShipperListener
Common functionality needed by all versions of HFile writers.
-
Field Summary
Modifier and Type | Field | Description
private List<HFileBlock.BlockWritable> | additionalLoadOnOpenData | Additional data items to be written to the "load-on-open" section.
protected final HFileDataBlockEncoder | blockEncoder | The data block encoding which will be used.
protected HFileBlock.Writer | blockWriter | block writer
protected final CacheConfig | cacheConf | Cache configuration for caching data on write.
protected final boolean | closeOutputStream | True if we opened the outputStream (and so will close it).
private HFileBlockIndex.BlockIndexWriter | dataBlockIndexWriter |
private final int | encodedBlockSizeLimit | Block size limit after encoding, used to unify encoded block Cache entry size
protected long | entryCount | Total # of key/value entries, i.e. how many times append() was called.
protected HFileInfo | fileInfo | A "file info" block: a key-value map of file-wide metadata.
protected ExtendedCell | firstCellInBlock | First cell in a block.
private long | firstDataBlockOffset | The offset of the first data block or -1 if the file is empty.
protected final HFileContext | hFileContext |
protected final HFileIndexBlockEncoder | indexBlockEncoder |
private List<InlineBlockWriter> | inlineBlockWriters | Inline block writers for multi-level block index and compound Blooms.
static final int | KEY_VALUE_VER_WITH_MEMSTORE | Version for KeyValue which includes memstore timestamp
static final byte[] | KEY_VALUE_VERSION | KeyValue version in FileInfo
protected byte[] | keyOfBiggestCell | Key of the biggest cell.
protected ExtendedCell | lastCell | The Cell previously appended.
private ExtendedCell | lastCellOfPreviousBlock | The last (stop) Cell of the previous data block.
protected long | lastDataBlockOffset | The offset of the last data block or 0 if the file is empty.
protected long | lenOfBiggestCell | Length of the biggest cell.
private static final org.slf4j.Logger | LOG |
protected long | maxMemstoreTS |
private int | maxTagsLength |
private HFileBlockIndex.BlockIndexWriter | metaBlockIndexWriter |
protected List<org.apache.hadoop.io.Writable> | metaData | Writables representing meta block data.
protected List<byte[]> | metaNames | Meta block names.
protected final String | name | Name for this object used when logging or in toString.
protected org.apache.hadoop.fs.FSDataOutputStream | outputStream | FileSystem stream to write into.
protected final org.apache.hadoop.fs.Path | path | May be null if we were passed a stream.
protected long | totalKeyLength | Used for calculating the average key length.
protected long | totalUncompressedBytes | Total uncompressed bytes; may be used to calculate a compression ratio later.
protected long | totalValueLength | Used for calculating the average value length.
static final String | UNIFIED_ENCODED_BLOCKSIZE_RATIO | If this feature is enabled, pre-calculate the encoded data size before the real encoding happens.
private static final long | UNSET |
Fields inherited from interface org.apache.hadoop.hbase.io.hfile.HFile.Writer
MAX_MEMSTORE_TS_KEY
-
Constructor Summary
Constructor | Description
HFileWriterImpl(org.apache.hadoop.conf.Configuration conf, CacheConfig cacheConf, org.apache.hadoop.fs.Path path, org.apache.hadoop.fs.FSDataOutputStream outputStream, HFileContext fileContext) |
-
Method Summary
Modifier and Type | Method | Description
private void | addBloomFilter(BloomFilterWriter bfw, BlockType blockType) |
void | addDeleteFamilyBloomFilter | Store delete family Bloom filter in the file, which is only supported in HFile V2.
void | addGeneralBloomFilter | Store general Bloom filter in the file.
void | addInlineBlockWriter | Adds an inline block writer such as a multi-level block index writer or a compound Bloom filter writer.
void | append(ExtendedCell cell) | Add key/value to file.
void | appendFileInfo(byte[] k, byte[] v) | Add to the file info.
void | appendMetaBlock(String metaBlockName, org.apache.hadoop.io.Writable content) | Add a meta block to the end of the file.
void | beforeShipped | The action that needs to be performed before Shipper.shipped() is performed.
private BlockCacheKey | buildCacheBlockKey(long offset, BlockType blockType) |
protected void | checkBlockBoundary | At a block boundary, writes all the inline blocks and opens a new block.
protected boolean | checkKey | Checks that the given Cell's key does not violate the key order.
protected void | checkValue(byte[] value, int offset, int length) | Checks the given value for validity.
void | close() |
static Compression.Algorithm | compressionByName(String algoName) |
protected static org.apache.hadoop.fs.FSDataOutputStream | createOutputStream(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path, InetSocketAddress[] favoredNodes) | A helper method to create HFile output streams in constructors.
private void | doCacheOnWrite(long offset) | Caches the last written HFile block.
private void | finishBlock | Clean up the data block that is currently being written.
protected void | finishClose(FixedFileTrailer trailer) |
protected void | finishFileInfo |
protected void | finishInit(org.apache.hadoop.conf.Configuration conf) | Additional initialization steps.
 | getFileContext | Return the file context for the HFile this writer belongs to.
 | getLastCell |
private String | getLexicalErrorMessage(Cell cell) |
protected int | getMajorVersion |
static ExtendedCell | getMidpoint(CellComparator comparator, ExtendedCell left, ExtendedCell right) | Try to return a Cell that falls between left and right but that is shorter; i.e. takes up less space.
private static byte[] | getMinimumMidpointArray(byte[] leftArray, int leftOffset, int leftLength, byte[] rightArray, int rightOffset, int rightLength) | Try to get a byte array that falls between left and right and is as short as possible in lexicographical order.
private static byte[] | getMinimumMidpointArray(ByteBuffer left, int leftOffset, int leftLength, ByteBuffer right, int rightOffset, int rightLength) | Try to create a new byte array that falls between left and right and is as short as possible in lexicographical order.
protected int | getMinorVersion |
org.apache.hadoop.fs.Path | getPath() | Returns Path or null if we were passed a stream rather than a Path.
long | getPos() |
protected void | newBlock() | Ready a new block for writing.
 | toString() |
protected final void | writeFileInfo(FixedFileTrailer trailer, DataOutputStream out) | Sets the file info offset in the trailer, finishes up populating fields in the file info, and writes the file info into the given data output.
private void | writeInlineBlocks(boolean closing) | Gives inline block writers an opportunity to contribute blocks.
-
Field Details
-
LOG
-
UNSET
-
UNIFIED_ENCODED_BLOCKSIZE_RATIO
If this feature is enabled, pre-calculate the encoded data size before the real encoding happens.
-
encodedBlockSizeLimit
Block size limit after encoding, used to unify encoded block Cache entry size -
lastCell
The Cell previously appended. Becomes the last cell in the file. -
outputStream
FileSystem stream to write into. -
closeOutputStream
True if we opened the outputStream (and so will close it). -
fileInfo
A "file info" block: a key-value map of file-wide metadata. -
entryCount
Total # of key/value entries, i.e. how many times append() was called. -
totalKeyLength
Used for calculating the average key length. -
totalValueLength
Used for calculating the average value length. -
lenOfBiggestCell
Length of the biggest cell. -
keyOfBiggestCell
Key of the biggest cell. -
totalUncompressedBytes
Total uncompressed bytes; may be used to calculate a compression ratio later. -
metaNames
Meta block names. -
metaData
Writables representing meta block data. -
firstCellInBlock
First cell in a block. This reference should be short-lived since we write hfiles in a burst. -
path
May be null if we were passed a stream. -
cacheConf
Cache configuration for caching data on write. -
name
Name for this object used when logging or in toString. It is either the result of toString() on the stream or the name of the passed file Path. -
blockEncoder
The data block encoding which will be used. NoOpDataBlockEncoder.INSTANCE if there is no encoding. -
indexBlockEncoder
-
hFileContext
-
maxTagsLength
-
KEY_VALUE_VERSION
KeyValue version in FileInfo -
KEY_VALUE_VER_WITH_MEMSTORE
Version for KeyValue which includes memstore timestamp
-
inlineBlockWriters
Inline block writers for multi-level block index and compound Blooms. -
blockWriter
block writer -
dataBlockIndexWriter
-
metaBlockIndexWriter
-
firstDataBlockOffset
The offset of the first data block or -1 if the file is empty. -
lastDataBlockOffset
The offset of the last data block or 0 if the file is empty. -
lastCellOfPreviousBlock
The last (stop) Cell of the previous data block. This reference should be short-lived since we write hfiles in a burst. -
additionalLoadOnOpenData
Additional data items to be written to the "load-on-open" section. -
maxMemstoreTS
-
-
Constructor Details
-
HFileWriterImpl
public HFileWriterImpl(org.apache.hadoop.conf.Configuration conf, CacheConfig cacheConf, org.apache.hadoop.fs.Path path, org.apache.hadoop.fs.FSDataOutputStream outputStream, HFileContext fileContext)
-
-
Method Details
-
appendFileInfo
Add to the file info. All added key/value pairs can be obtained using HFile.Reader.getHFileInfo().
- Specified by:
appendFileInfo in interface HFile.Writer
- Parameters:
k - Key
v - Value
- Throws:
IOException - in case the key or the value are invalid
-
writeFileInfo
protected final void writeFileInfo(FixedFileTrailer trailer, DataOutputStream out) throws IOException
Sets the file info offset in the trailer, finishes up populating fields in the file info, and writes the file info into the given data output. The reason the data output is not always outputStream is that we store file info as a block in version 2.
- Parameters:
trailer - fixed file trailer
out - the data output to write the file info to
- Throws:
IOException
-
getPos
- Throws:
IOException
-
checkKey
Checks that the given Cell's key does not violate the key order.
- Parameters:
cell - Cell whose key to check.
- Returns:
true if the key is duplicate
- Throws:
IOException - if the key or the key order is wrong
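The contract above (return true for a duplicate, throw IOException for a bad key or bad order) can be illustrated with a minimal sketch. This is not the HBase implementation: KeyOrderChecker and checkAndRemember are hypothetical names, plain byte[] keys stand in for Cells, and unsigned lexicographic comparison stands in for the CellComparator. Unlike checkKey, the sketch also remembers the accepted key so that it is self-contained.

```java
import java.io.IOException;
import java.util.Arrays;

/** Hypothetical helper, not part of HBase: enforces non-decreasing key order. */
class KeyOrderChecker {
  // Most recently accepted key, or null before the first key is seen.
  private byte[] lastKey;

  /**
   * Returns true if key equals the previously accepted key (a duplicate),
   * false if it sorts after it. Throws IOException if key is null, empty,
   * or sorts before the previously accepted key.
   */
  boolean checkAndRemember(byte[] key) throws IOException {
    if (key == null || key.length == 0) {
      throw new IOException("Key cannot be null or empty");
    }
    boolean duplicate = false;
    if (lastKey != null) {
      // Assumption: unsigned byte order stands in for the real comparator.
      int cmp = Arrays.compareUnsigned(lastKey, key);
      if (cmp > 0) {
        throw new IOException("Key sorts before the previously added key");
      }
      duplicate = (cmp == 0);
    }
    lastKey = key.clone();
    return duplicate;
  }
}
```

Calling checkAndRemember with {1}, then {1}, then {1, 0} returns false, true, false; a following call with {0, 5} throws IOException because it sorts before {1, 0}.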
-
getLexicalErrorMessage
-
checkValue
Checks the given value for validity.
- Throws:
IOException
-
getPath
Returns Path or null if we were passed a stream rather than a Path.
- Specified by:
getPath in interface HFile.Writer
-
toString
-
compressionByName
-
createOutputStream
protected static org.apache.hadoop.fs.FSDataOutputStream createOutputStream(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path, InetSocketAddress[] favoredNodes) throws IOException
A helper method to create HFile output streams in constructors.
- Throws:
IOException
-
finishInit
Additional initialization steps -
checkBlockBoundary
At a block boundary, writes all the inline blocks and opens a new block.
- Throws:
IOException
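As a rough illustration of the block-boundary idea, the sketch below rolls to a new block once the current one has reached a size limit. It is a simplified stand-in, not the HBase logic: BlockBoundarySketch, its blockSizeLimit threshold, and the use of plain byte counts instead of encoded blocks are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a block-oriented writer; not the HBase code. */
class BlockBoundarySketch {
  private final int blockSizeLimit; // assumed threshold, in bytes
  private final List<Integer> finishedBlockSizes = new ArrayList<>();
  private int currentBlockBytes = 0;

  BlockBoundarySketch(int blockSizeLimit) {
    this.blockSizeLimit = blockSizeLimit;
  }

  /** Checks the boundary first, then accounts for the new entry. */
  void append(int entryBytes) {
    checkBlockBoundary();
    currentBlockBytes += entryBytes;
  }

  /** If the current block has reached the limit, finish it and open a new one. */
  private void checkBlockBoundary() {
    if (currentBlockBytes < blockSizeLimit) {
      return;
    }
    finishBlock();
    // A real writer would let inline block writers contribute blocks here.
    currentBlockBytes = 0; // corresponds to opening a new block
  }

  private void finishBlock() {
    finishedBlockSizes.add(currentBlockBytes);
  }

  /** Flushes the last, possibly partial, block. */
  void close() {
    if (currentBlockBytes > 0) {
      finishBlock();
      currentBlockBytes = 0;
    }
  }

  List<Integer> finishedBlockSizes() {
    return finishedBlockSizes;
  }
}
```

Because the check runs before each append, a block can exceed the limit by up to one entry. With a limit of 10 and four 4-byte entries, the sketch finishes one 12-byte block and, on close, one 4-byte block.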
-
finishBlock
Clean up the data block that is currently being written.
- Throws:
IOException
-
getMidpoint
public static ExtendedCell getMidpoint(CellComparator comparator, ExtendedCell left, ExtendedCell right)
Try to return a Cell that falls between left and right but that is shorter; i.e. takes up less space. This trick is used when building the HFile block index. It is an optimization and does not always work; when it does not, we just return the right cell.
- Returns:
A cell that sorts between left and right.
-
getMinimumMidpointArray
private static byte[] getMinimumMidpointArray(byte[] leftArray, int leftOffset, int leftLength, byte[] rightArray, int rightOffset, int rightLength)
Try to get a byte array that falls between left and right and is as short as possible in lexicographical order.
- Returns:
A new array that is between left and right and minimally sized, or null if left == right.
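One way to build such a minimal array is sketched below. This is an illustration under stated assumptions, not the method's actual source: MidpointSketch and shortestSeparator are hypothetical names, the offset and length parameters are dropped for brevity, bytes are compared as unsigned values, and the result m is chosen so that left < m <= right.

```java
import java.util.Arrays;

/** Hypothetical sketch, not the HBase code: a short byte[] m with left < m <= right. */
class MidpointSketch {
  /** Returns null if left equals right; throws if left sorts after right. */
  static byte[] shortestSeparator(byte[] left, byte[] right) {
    int min = Math.min(left.length, right.length);
    int diff = 0;
    // Skip the common prefix.
    while (diff < min && left[diff] == right[diff]) {
      diff++;
    }
    if (diff == min) {
      if (left.length == right.length) {
        return null; // left == right, nothing fits in between
      }
      if (left.length > right.length) {
        throw new IllegalArgumentException("left sorts after right");
      }
      // left is a proper prefix of right: left followed by 0x00 is the
      // smallest array above left. Arrays.copyOf pads with 0x00.
      return Arrays.copyOf(left, min + 1);
    }
    if ((left[diff] & 0xff) > (right[diff] & 0xff)) {
      throw new IllegalArgumentException("left sorts after right");
    }
    // Keep the common prefix plus one byte, bumped by one. The byte cannot
    // be 0xff here because it is strictly below right's byte.
    byte[] mid = Arrays.copyOf(left, diff + 1);
    mid[diff]++;
    return mid;
  }
}
```

For example, with left = "abcdef" and right = "abzzzz" the sketch returns "abd", which is half the length of either input yet still separates the two keys.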
-
getMinimumMidpointArray
private static byte[] getMinimumMidpointArray(ByteBuffer left, int leftOffset, int leftLength, ByteBuffer right, int rightOffset, int rightLength)
Try to create a new byte array that falls between left and right and is as short as possible in lexicographical order.
- Returns:
A new array that is between left and right and minimally sized, or null if left == right.
-
writeInlineBlocks
Gives inline block writers an opportunity to contribute blocks.
- Throws:
IOException
-
doCacheOnWrite
Caches the last written HFile block.
- Parameters:
offset - the offset of the block we want to cache. Used to determine the cache key.
-
buildCacheBlockKey
-
newBlock
Ready a new block for writing.
- Throws:
IOException
-
appendMetaBlock
Add a meta block to the end of the file. Call before close(). Metadata blocks are expensive. Fill one with a bunch of serialized data rather than do a metadata block per metadata instance. If metadata is small, consider adding to file info using appendFileInfo(byte[], byte[]).
- Specified by:
appendMetaBlock in interface HFile.Writer
- Parameters:
metaBlockName - name of the block
content - will call readFields to get data later (DO NOT REUSE)
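The advice to fill one meta block with many serialized items can be sketched with plain java.io streams. The example below only shows the packing idea: PackedMetaSketch, pack, and unpack are hypothetical names, and the length-prefixed layout is an assumption. A real meta block takes an org.apache.hadoop.io.Writable, which is not modeled here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: packs several small metadata entries into one payload. */
class PackedMetaSketch {
  /** Serializes all entries into a single byte[]: a count, then key/value pairs. */
  static byte[] pack(Map<String, String> entries) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(entries.size());
      for (Map.Entry<String, String> e : entries.entrySet()) {
        out.writeUTF(e.getKey());
        out.writeUTF(e.getValue());
      }
    }
    return bytes.toByteArray();
  }

  /** Reads back the entries in the order they were written. */
  static Map<String, String> unpack(byte[] payload) throws IOException {
    Map<String, String> entries = new LinkedHashMap<>();
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(payload))) {
      int count = in.readInt();
      for (int i = 0; i < count; i++) {
        String key = in.readUTF();
        String value = in.readUTF();
        entries.put(key, value);
      }
    }
    return entries;
  }
}
```

Packing this way produces one payload, and so one meta block, regardless of how many entries there are, which is the trade-off the description above recommends.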
-
close
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Throws:
IOException
-
addInlineBlockWriter
Description copied from interface: HFile.Writer
Adds an inline block writer such as a multi-level block index writer or a compound Bloom filter writer.
- Specified by:
addInlineBlockWriter in interface HFile.Writer
-
addGeneralBloomFilter
Description copied from interface: HFile.Writer
Store general Bloom filter in the file. This does not deal with Bloom filter internals but is necessary, since Bloom filters are stored differently in HFile version 1 and version 2.
- Specified by:
addGeneralBloomFilter in interface HFile.Writer
-
addDeleteFamilyBloomFilter
Description copied from interface: HFile.Writer
Store delete family Bloom filter in the file, which is only supported in HFile V2.
- Specified by:
addDeleteFamilyBloomFilter in interface HFile.Writer
-
addBloomFilter
-
getFileContext
Description copied from interface: HFile.Writer
Return the file context for the HFile this writer belongs to.
- Specified by:
getFileContext in interface HFile.Writer
-
append
Add key/value to file. Keys must be added in an order that agrees with the Comparator passed on construction.
- Specified by:
append in interface CellSink
- Parameters:
cell - the cell to be added; cannot be empty or null
- Throws:
IOException
-
beforeShipped
Description copied from interface: ShipperListener
The action that needs to be performed before Shipper.shipped() is performed.
- Specified by:
beforeShipped in interface ShipperListener
- Throws:
IOException
-
getLastCell
-
finishFileInfo
- Throws:
IOException
-
getMajorVersion
-
getMinorVersion
-
finishClose
- Throws:
IOException
-