java.lang.Object

org.apache.hadoop.hbase.io.hfile.HFileBlock.Writer

All Implemented Interfaces: ShipperListener

Enclosing class: HFileBlock

static class HFileBlock.Writer extends Object implements ShipperListener

Unified version 2 HFile block writer. The intended usage pattern is as follows:

Construct an HFileBlock.Writer, providing a Configuration, a data block encoder, and an HFileContext that carries the compression algorithm.
Call startWriting(org.apache.hadoop.hbase.io.hfile.BlockType) and get a data stream to write to.
Write your data into the stream.
Call Writer#writeHeaderAndData(FSDataOutputStream) as many times as you need to store the serialized block into an external stream.
Repeat to write more blocks.
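
The usage pattern above can be sketched with a small stand-in class. This is a minimal illustration that uses only the JDK, not the HBase implementation: BlockWriterSketch, its State enum (including the INIT value), and the omission of header, compression, and checksum handling are all assumptions made for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical stand-in for the writer's usage protocol; not the HBase class. */
final class BlockWriterSketch {
  /** State names are assumed; the documentation only names "writing" and "block ready". */
  enum State { INIT, WRITING, BLOCK_READY }

  private State state = State.INIT;
  private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
  private DataOutputStream userDataStream;

  /** Begins a new block, discarding whatever the previous block left behind. */
  DataOutputStream startWriting() {
    buffer.reset();
    userDataStream = new DataOutputStream(buffer);
    state = State.WRITING;
    return userDataStream;
  }

  /** Moves from WRITING to BLOCK_READY; does nothing if the block is already finished. */
  void ensureBlockReady() throws IOException {
    if (state == State.BLOCK_READY) {
      return;
    }
    if (state != State.WRITING) {
      throw new IllegalStateException("Expected WRITING but was " + state);
    }
    userDataStream.flush();
    state = State.BLOCK_READY;
  }

  /** Finishes the block if needed and copies it to the external stream (no header in this sketch). */
  void writeHeaderAndData(OutputStream out) throws IOException {
    ensureBlockReady();
    buffer.writeTo(out);
  }

  State state() {
    return state;
  }

  public static void main(String[] args) throws IOException {
    BlockWriterSketch writer = new BlockWriterSketch();
    ByteArrayOutputStream file = new ByteArrayOutputStream();
    // Repeat the start / write / store cycle once per block.
    for (int block = 0; block < 2; block++) {
      DataOutputStream stream = writer.startWriting();
      stream.writeInt(block);
      writer.writeHeaderAndData(file);
    }
    System.out.println(file.size() + " bytes written, state " + writer.state());
  }
}
```

Because the finished block stays buffered until the next startWriting() call, the same block can be stored into more than one stream, which matches the "as many times as you need" step.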

Nested Class Summary

Nested Classes

Modifier and Type

Class

Description

private static enum

HFileBlock.Writer.State

Field Summary

Fields

Modifier and Type

Field

Description

private final ByteBuffAllocator

allocator

private ByteArrayOutputStream

baosInMemory

The stream we use to accumulate data into a block in an uncompressed format.

private BlockType

blockType

Current block type.

private BlockCompressedSizePredicator

compressedSizePredicator

private final HFileDataBlockEncoder

dataBlockEncoder

Data block encoder used for data blocks

private HFileBlockEncodingContext

dataBlockEncodingCtx

private HFileBlockDefaultEncodingContext

defaultBlockEncodingCtx

Block encoding context for non-data blocks

private HFileContext

fileContext

Metadata that holds information about the HFile block

private int

maxSizeUnCompressed

private ByteArrayOutputStream

onDiskBlockBytesWithHeader

Bytes to be written to the file system, including the header.

private byte[]

onDiskChecksum

The on-disk checksum data for this block.

private long

prevOffset

The offset of the previous block of the same type

private long[]

prevOffsetByType

Offset of previous block by block type.

private long

startOffset

Current block's start offset in the HFile.

private HFileBlock.Writer.State

state

Writer state.

private DataOutputStream

userDataStream

A stream that we write uncompressed bytes to, which compresses them and writes them to baosInMemory.

Constructor Summary

Constructors

Constructor

Description

Writer(org.apache.hadoop.conf.Configuration conf, HFileDataBlockEncoder dataBlockEncoder, HFileContext fileContext)

Writer(org.apache.hadoop.conf.Configuration conf, HFileDataBlockEncoder dataBlockEncoder, HFileContext fileContext, ByteBuffAllocator allocator, int maxSizeUnCompressed)

Method Summary

Modifier and Type

Method

Description

void

beforeShipped()

The action that needs to be performed before Shipper.shipped() is performed

int

blockSizeWritten()

Returns the number of bytes written into the current block so far, or zero if not writing the block at the moment.

boolean

checkBoundariesWithPredicate()

private ByteBuff

cloneOnDiskBufferWithHeader()

Clones the header followed by the on-disk (compressed/encoded/encrypted) data.

(package private) ByteBuff

cloneUncompressedBufferWithHeader()

Clones the header followed by the uncompressed data, even if using compression.

int

encodedBlockSizeWritten()

Returns the number of bytes written into the current block so far, or zero if not writing the block at the moment.

(package private) void

ensureBlockReady()

Transitions the block writer from the "writing" state to the "block ready" state.

private void

expectState(HFileBlock.Writer.State expectedState)

private void

finishBlock()

Finish up writing of the block.

protected void

finishBlockAndWriteHeaderAndData(DataOutputStream out)

Writes the header and the compressed data of this block (or uncompressed data when not using compression) into the given stream.

(package private) HFileBlock

getBlockForCaching(CacheConfig cacheConf)

Creates a new HFileBlock.

(package private) EncodingState

getEncodingState()

(package private) byte[]

getHeaderAndDataForTest()

Returns the header followed by the compressed data (or uncompressed data when not using compression) as a byte array.

(package private) int

getOnDiskSizeWithHeader()

Returns the on-disk size of the block.

(package private) int

getOnDiskSizeWithoutHeader()

Returns the on-disk size of the data portion of the block.

int

getUncompressedSizeWithHeader()

The uncompressed size of the block data, including header size.

(package private) int

getUncompressedSizeWithoutHeader()

The uncompressed size of the block data.

(package private) boolean

isWriting()

Returns true if a block is being written

private void

putHeader(byte[] dest, int offset, int onDiskSize, int uncompressedSize, int onDiskDataSize)

Put the header into the given byte array at the given offset.

private void

putHeader(ByteArrayOutputStream dest, int onDiskSize, int uncompressedSize, int onDiskDataSize)

private void

putHeader(ByteBuff buff, int onDiskSize, int uncompressedSize, int onDiskDataSize)

(package private) void

release()

Releases resources used by this writer.

(package private) DataOutputStream

startWriting(BlockType newBlockType)

Starts writing into the block.

(package private) void

write(Cell cell)

Writes the Cell to this block

(package private) void

writeBlock(HFileBlock.BlockWritable bw, org.apache.hadoop.fs.FSDataOutputStream out)

Takes the given HFileBlock.BlockWritable instance, creates a new block of its appropriate type, writes the writable into this block, and flushes the block into the output stream.

(package private) void

writeHeaderAndData(org.apache.hadoop.fs.FSDataOutputStream out)

Similar to finishBlockAndWriteHeaderAndData(DataOutputStream), but records the offset of this block so that it can be referenced in the next block of the same type.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- maxSizeUnCompressed
  
  private int maxSizeUnCompressed
- compressedSizePredicator
  
  private BlockCompressedSizePredicator compressedSizePredicator
- state
  
  private HFileBlock.Writer.State state
  
  Writer state. Used to ensure the correct usage protocol.
- dataBlockEncoder
  
  private final HFileDataBlockEncoder dataBlockEncoder
  
  Data block encoder used for data blocks
- dataBlockEncodingCtx
  
  private HFileBlockEncodingContext dataBlockEncodingCtx
- defaultBlockEncodingCtx
  
  private HFileBlockDefaultEncodingContext defaultBlockEncodingCtx
  
  Block encoding context for non-data blocks
- baosInMemory
  
  private ByteArrayOutputStream baosInMemory
  
  The stream we use to accumulate data into a block in an uncompressed format. We reset this stream at the end of each block and reuse it. The header is written as the first HConstants.HFILEBLOCK_HEADER_SIZE bytes into this stream.
- blockType
  
  private BlockType blockType
  
  Current block type. Set in startWriting(BlockType). Could be changed in finishBlock() from BlockType.DATA to BlockType.ENCODED_DATA.
- userDataStream
  
  private DataOutputStream userDataStream
  
  A stream that we write uncompressed bytes to, which compresses them and writes them to baosInMemory.
- onDiskBlockBytesWithHeader
  
  private ByteArrayOutputStream onDiskBlockBytesWithHeader
  
  Bytes to be written to the file system, including the header. Compressed if compression is turned on. It also includes the checksum data that immediately follows the block data. (header + data + checksums)
- onDiskChecksum
  
  private byte[] onDiskChecksum
  
  The on-disk checksum data for this block. It is used only if the data is not compressed; if the data is compressed, the checksums are already part of onDiskBlockBytesWithHeader.
- startOffset
  
  private long startOffset
  
  Current block's start offset in the HFile. Set in writeHeaderAndData(FSDataOutputStream).
- prevOffsetByType
  
  private long[] prevOffsetByType
  
  Offset of previous block by block type. Updated when the next block is started.
- prevOffset
  
  private long prevOffset
  
  The offset of the previous block of the same type
- fileContext
  
  private HFileContext fileContext
  
  Metadata that holds information about the HFile block
- allocator
  
  private final ByteBuffAllocator allocator
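
The interplay of startOffset, prevOffsetByType, and prevOffset described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the HBase code: PrevOffsetSketch, its three-value Type enum, the -1 sentinel for "no previous block", and the choice to resolve prevOffset inside startWriting are all hypothetical.

```java
import java.util.Arrays;

/** Hypothetical model of per-type previous-offset tracking; not the HBase class. */
final class PrevOffsetSketch {
  /** Assumed subset of block types, for illustration only. */
  enum Type { DATA, LEAF_INDEX, BLOOM_CHUNK }

  /** Assumed sentinel meaning "no offset recorded yet". */
  static final long UNSET = -1L;

  private final long[] prevOffsetByType = new long[Type.values().length];
  private Type blockType;
  private long startOffset = UNSET;
  private long prevOffset = UNSET;

  PrevOffsetSketch() {
    Arrays.fill(prevOffsetByType, UNSET);
  }

  /** Starting the next block publishes the finished block's offset under its type. */
  void startWriting(Type newType) {
    if (blockType != null && startOffset != UNSET) {
      prevOffsetByType[blockType.ordinal()] = startOffset;
    }
    startOffset = UNSET;
    blockType = newType;
    prevOffset = prevOffsetByType[newType.ordinal()];
  }

  /** Records where the current block begins in the file. */
  void writeHeaderAndData(long fileOffset) {
    startOffset = fileOffset;
  }

  /** Offset of the previous block with the same type as the current one. */
  long prevOffset() {
    return prevOffset;
  }

  public static void main(String[] args) {
    PrevOffsetSketch writer = new PrevOffsetSketch();
    writer.startWriting(Type.DATA);
    writer.writeHeaderAndData(0L);
    writer.startWriting(Type.LEAF_INDEX);
    writer.writeHeaderAndData(100L);
    writer.startWriting(Type.DATA);
    System.out.println("previous DATA block offset: " + writer.prevOffset());
  }
}
```

Keeping one slot per block type lets each block point back to the previous block of its own kind even when blocks of other types are written in between.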

Constructor Details
- Writer
  
  public Writer(org.apache.hadoop.conf.Configuration conf, HFileDataBlockEncoder dataBlockEncoder, HFileContext fileContext)
  
  Parameters:
  
  dataBlockEncoder - data block encoding algorithm to use
- Writer
  
  public Writer(org.apache.hadoop.conf.Configuration conf, HFileDataBlockEncoder dataBlockEncoder, HFileContext fileContext, ByteBuffAllocator allocator, int maxSizeUnCompressed)

Method Details
- beforeShipped
  
  public void beforeShipped()
  
  Description copied from interface: ShipperListener
  
  The action that needs to be performed before Shipper.shipped() is performed
  
  Specified by:
  
  beforeShipped in interface ShipperListener
- getEncodingState
  
  EncodingState getEncodingState()
- startWriting
  
  DataOutputStream startWriting(BlockType newBlockType) throws IOException
  
  Starts writing into the block. The previous block's data is discarded.
  
  Returns:
  
  the stream the user can write their data into
  
  Throws:
  
  IOException
- write
  
  void write(Cell cell) throws IOException
  
  Writes the Cell to this block
  
  Throws:
  
  IOException
- ensureBlockReady
  
  void ensureBlockReady() throws IOException
  
  Transitions the block writer from the "writing" state to the "block ready" state. Does nothing if a block is already finished.
  
  Throws:
  
  IOException
- checkBoundariesWithPredicate
  
  public boolean checkBoundariesWithPredicate()
- finishBlock
  
  private void finishBlock() throws IOException
  
  Finish up writing of the block. Flushes the compressing stream (if using compression), fills out the header, does any compression/encryption of bytes to flush out to disk, and manages the cache on write content, if applicable. Sets block write state to "block ready".
  
  Throws:
  
  IOException
- putHeader
  
  private void putHeader(byte[] dest, int offset, int onDiskSize, int uncompressedSize, int onDiskDataSize)
  
  Put the header into the given byte array at the given offset.
  
  Parameters:
  
  onDiskSize - size of the block on disk header + data + checksum
  
  uncompressedSize - size of the block after decompression (but before optional data block decoding) including header
  
  onDiskDataSize - size of the block on disk with header and data but not including the checksums
- putHeader
  
  private void putHeader(ByteBuff buff, int onDiskSize, int uncompressedSize, int onDiskDataSize)
- putHeader
  
  private void putHeader(ByteArrayOutputStream dest, int onDiskSize, int uncompressedSize, int onDiskDataSize)
- writeHeaderAndData
  
  void writeHeaderAndData(org.apache.hadoop.fs.FSDataOutputStream out) throws IOException
  
  Similar to finishBlockAndWriteHeaderAndData(DataOutputStream), but records the offset of this block so that it can be referenced in the next block of the same type.
  
  Throws:
  
  IOException
- finishBlockAndWriteHeaderAndData
  
  protected void finishBlockAndWriteHeaderAndData(DataOutputStream out) throws IOException
  
  Writes the header and the compressed data of this block (or uncompressed data when not using compression) into the given stream. Can be called in the "writing" state or in the "block ready" state. If called in the "writing" state, transitions the writer to the "block ready" state.
  
  Parameters:
  
  out - the output stream to write the header and data to
  
  Throws:
  
  IOException
- getHeaderAndDataForTest
  
  byte[] getHeaderAndDataForTest() throws IOException
  
  Returns the header followed by the compressed data (or uncompressed data when not using compression) as a byte array. Can be called in the "writing" state or in the "block ready" state. If called in the "writing" state, transitions the writer to the "block ready" state. This returns the header + data + checksums stored on disk.
  
  Returns:
  
  header and data as they would be stored on disk in a byte array
  
  Throws:
  
  IOException
- release
  
  void release()
  
  Releases resources used by this writer.
- getOnDiskSizeWithoutHeader
  
  int getOnDiskSizeWithoutHeader()
  
  Returns the on-disk size of the data portion of the block. This is the compressed size if compression is enabled. Can only be called in the "block ready" state. Header is not compressed, and its size is not included in the return value.
  
  Returns:
  
  the on-disk size of the block, not including the header.
- getOnDiskSizeWithHeader
  
  int getOnDiskSizeWithHeader()
  
  Returns the on-disk size of the block. Can only be called in the "block ready" state.
  
  Returns:
  
  the on-disk size of the block ready to be written, including the header size, the data and the checksum data.
- getUncompressedSizeWithoutHeader
  
  int getUncompressedSizeWithoutHeader()
  
  The uncompressed size of the block data. Does not include header size.
- getUncompressedSizeWithHeader
  
  public int getUncompressedSizeWithHeader()
  
  The uncompressed size of the block data, including header size.
- isWriting
  
  boolean isWriting()
  
  Returns true if a block is being written
- encodedBlockSizeWritten
  
  public int encodedBlockSizeWritten()
  
  Returns the number of bytes written into the current block so far, or zero if not writing the block at the moment. Note that this will return zero in the "block ready" state as well.
  
  Returns:
  
  the number of bytes written
- blockSizeWritten
  
  public int blockSizeWritten()
  
  Returns the number of bytes written into the current block so far, or zero if not writing the block at the moment. Note that this will return zero in the "block ready" state as well.
  
  Returns:
  
  the number of bytes written
- cloneUncompressedBufferWithHeader
  
  ByteBuff cloneUncompressedBufferWithHeader()
  
  Clones the header followed by the uncompressed data, even if using compression. This is needed for storing uncompressed blocks in the block cache. Can be called in the "writing" state or the "block ready" state. Returns only the header and data, does not include checksum data.
  
  Returns:
  
  an uncompressed block ByteBuff for caching on write
- cloneOnDiskBufferWithHeader
  
  private ByteBuff cloneOnDiskBufferWithHeader()
  
  Clones the header followed by the on-disk (compressed/encoded/encrypted) data. This is needed for storing packed blocks in the block cache. Returns only the header and data; does not include checksum data.
  
  Returns:
  
  Returns a copy of block bytes for caching on write
- expectState
  
  private void expectState(HFileBlock.Writer.State expectedState)
- writeBlock
  
  void writeBlock(HFileBlock.BlockWritable bw, org.apache.hadoop.fs.FSDataOutputStream out) throws IOException
  
  Takes the given HFileBlock.BlockWritable instance, creates a new block of its appropriate type, writes the writable into this block, and flushes the block into the output stream. The writer is instructed not to buffer uncompressed bytes for cache-on-write.
  
  Parameters:
  
  bw - the block-writable object to write as a block
  
  out - the file system output stream
  
  Throws:
  
  IOException
- getBlockForCaching
  
  HFileBlock getBlockForCaching(CacheConfig cacheConf)
  
  Creates a new HFileBlock. Checksums have already been validated, so the byte buffer passed into the constructor of this newly created block does not have checksum data even though the header minor version is MINOR_VERSION_WITH_CHECKSUM. This is indicated by setting a 0 value in bytesPerChecksum. This method copies the on-disk or uncompressed data to build the HFileBlock which is used only while writing blocks and caching.
  TODO: Should there be an option where a cache can ask that HBase preserve block checksums for checking after a block comes out of the cache? Otherwise, the cache is responsible for blocks being wholesome (ECC memory or, if file-backed, it does checksumming).
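
The three size arguments documented for putHeader above relate to one another in a way that a small example makes easier to see. The sketch below is an assumption-laden illustration, not the HBase implementation: BlockHeaderSketch, the 33-byte field layout, the choice to store sizes without the header, the extra prevOffset/checksumType/bytesPerChecksum parameters, and the "DATABLK*" magic value are all assumed for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical block header layout; field order and widths are assumptions. */
final class BlockHeaderSketch {
  // Assumed layout: 8-byte magic, two int sizes, long previous offset,
  // 1-byte checksum type, int bytes-per-checksum, int on-disk data size.
  static final int HEADER_SIZE = 8 + 4 + 4 + 8 + 1 + 4 + 4;

  /**
   * Writes the header into dest starting at offset.
   * onDiskSize counts header + data + checksum, uncompressedSize counts header + data,
   * and onDiskDataSize counts header + data without the checksums.
   */
  static void putHeader(byte[] dest, int offset, byte[] magic, int onDiskSize,
      int uncompressedSize, long prevOffset, byte checksumType,
      int bytesPerChecksum, int onDiskDataSize) {
    ByteBuffer buf = ByteBuffer.wrap(dest, offset, HEADER_SIZE);
    buf.put(magic, 0, 8);
    buf.putInt(onDiskSize - HEADER_SIZE);        // stored without the header (assumed)
    buf.putInt(uncompressedSize - HEADER_SIZE);  // stored without the header (assumed)
    buf.putLong(prevOffset);
    buf.put(checksumType);
    buf.putInt(bytesPerChecksum);
    buf.putInt(onDiskDataSize);
  }

  public static void main(String[] args) {
    int dataBytes = 100;      // uncompressed, unencoded payload
    int checksumBytes = 4;    // one hypothetical 4-byte checksum
    int onDiskDataSize = HEADER_SIZE + dataBytes;
    int onDiskSize = onDiskDataSize + checksumBytes;
    byte[] block = new byte[onDiskSize];
    byte[] magic = "DATABLK*".getBytes(StandardCharsets.US_ASCII); // assumed magic
    putHeader(block, 0, magic, onDiskSize, onDiskDataSize, -1L, (byte) 1, 16384,
        onDiskDataSize);
    ByteBuffer view = ByteBuffer.wrap(block);
    System.out.println("on-disk size without header: " + view.getInt(8));
    System.out.println("uncompressed size without header: " + view.getInt(12));
  }
}
```

With no compression the uncompressed size and the on-disk data size coincide, and the on-disk size differs from them only by the checksum bytes that follow the data.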
