Class HFileBlock.Writer

java.lang.Object
org.apache.hadoop.hbase.io.hfile.HFileBlock.Writer
All Implemented Interfaces:
ShipperListener
Enclosing class:
HFileBlock

static class HFileBlock.Writer extends Object implements ShipperListener
Unified version 2 HFile block writer. The intended usage pattern is as follows:
  1. Construct an HFileBlock.Writer, providing a compression algorithm.
  2. Call startWriting(org.apache.hadoop.hbase.io.hfile.BlockType) and get a data stream to write to.
  3. Write your data into the stream.
  4. Call Writer#writeHeaderAndData(FSDataOutputStream) as many times as you need to store the serialized block into an external stream.
  5. Repeat to write more blocks.
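The usage pattern above can be sketched as a small state machine. This is a minimal, self-contained illustration, not the HBase implementation: `SketchBlockWriter` is a hypothetical stand-in, in-memory `ByteArrayOutputStream`s replace the real data stream and `FSDataOutputStream`, and header, compression and checksum handling are left out.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical stand-in for HFileBlock.Writer; models only the state transitions. */
final class SketchBlockWriter {
  enum State { INIT, WRITING, BLOCK_READY }

  private State state = State.INIT;
  private ByteArrayOutputStream buffer = new ByteArrayOutputStream();
  private byte[] finished = new byte[0];

  /** Step 2: start a new block; the previous block's data is discarded. */
  ByteArrayOutputStream startWriting() {
    buffer = new ByteArrayOutputStream();
    state = State.WRITING;
    return buffer;
  }

  /** Moves "writing" to "block ready"; does nothing if a block is already finished. */
  void ensureBlockReady() {
    if (state == State.BLOCK_READY) {
      return;
    }
    if (state != State.WRITING) {
      throw new IllegalStateException("Expected WRITING but was " + state);
    }
    finished = buffer.toByteArray(); // the real writer would compress and add a header here
    state = State.BLOCK_READY;
  }

  /** Step 4: finish the block if needed, then copy it to an external stream. */
  void writeHeaderAndData(ByteArrayOutputStream out) {
    ensureBlockReady();
    out.write(finished, 0, finished.length);
  }

  boolean isWriting() {
    return state == State.WRITING;
  }

  /** Bytes written into the current block; zero outside the "writing" state. */
  int blockSizeWritten() {
    return state == State.WRITING ? buffer.size() : 0;
  }

  public static void main(String[] args) {
    SketchBlockWriter writer = new SketchBlockWriter();
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    for (int block = 0; block < 2; block++) {           // step 5: repeat per block
      ByteArrayOutputStream data = writer.startWriting(); // step 2
      data.write(new byte[] {1, 2, 3}, 0, 3);             // step 3
      writer.writeHeaderAndData(sink);                    // step 4
    }
    System.out.println("bytes stored: " + sink.size());
  }
}
```

Calling `writeHeaderAndData` a second time before `startWriting` simply stores the same finished block again, which mirrors the "as many times as you need" wording in step 4.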

  • Field Details

  • Constructor Details

  • Method Details

    • beforeShipped

      public void beforeShipped()
      Description copied from interface: ShipperListener
      The action to perform before Shipper.shipped() is called.
      Specified by:
      beforeShipped in interface ShipperListener
    • getEncodingState

    • startWriting

      Starts writing into the block. The previous block's data is discarded.
      Returns:
      the stream the user can write their data into
      Throws:
      IOException
    • write

      void write(ExtendedCell cell) throws IOException
      Writes the Cell to this block
      Throws:
      IOException
    • ensureBlockReady

      Transitions the block writer from the "writing" state to the "block ready" state. Does nothing if a block is already finished.
      Throws:
      IOException
    • checkBoundariesWithPredicate

      public boolean checkBoundariesWithPredicate()
    • finishBlock

      private void finishBlock() throws IOException
      Finish up writing of the block. Flushes the compressing stream (if using compression), fills out the header, does any compression/encryption of bytes to flush out to disk, and manages the cache on write content, if applicable. Sets block write state to "block ready".
      Throws:
      IOException
    • putHeader

      private void putHeader(byte[] dest, int offset, int onDiskSize, int uncompressedSize, int onDiskDataSize)
      Put the header into the given byte array at the given offset.
      Parameters:
      onDiskSize - size of the block on disk (header + data + checksum)
      uncompressedSize - size of the block after decompression (but before optional data block decoding) including header
      onDiskDataSize - size of the block on disk with header and data but not including the checksums
    • putHeader

      private void putHeader(ByteBuff buff, int onDiskSize, int uncompressedSize, int onDiskDataSize)
    • putHeader

      private void putHeader(ByteArrayOutputStream dest, int onDiskSize, int uncompressedSize, int onDiskDataSize)
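      To make the three size parameters concrete, here is a hedged sketch of how such a header could be packed into a byte array. The field order, the 33-byte header size, the magic string, and the extra parameters (`magic`, `prevOffset`, `checksumType`, `bytesPerChecksum`) are assumptions for illustration and are not stated on this page; check the HBase source for the authoritative layout.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical header packer; the layout is assumed, not taken from this page. */
final class HeaderSketch {
  // Assumed fields: 8-byte magic, two 4-byte sizes, 8-byte previous offset,
  // 1-byte checksum type, 4-byte bytesPerChecksum, 4-byte on-disk data size.
  static final int HEADER_SIZE = 8 + 4 + 4 + 8 + 1 + 4 + 4; // 33 bytes

  static void putHeader(byte[] dest, int offset, byte[] magic, long prevOffset,
      byte checksumType, int bytesPerChecksum,
      int onDiskSize, int uncompressedSize, int onDiskDataSize) {
    // wrap() positions the buffer at offset; ByteBuffer is big-endian by default.
    ByteBuffer buf = ByteBuffer.wrap(dest, offset, HEADER_SIZE);
    buf.put(magic, 0, 8);
    buf.putInt(onDiskSize - HEADER_SIZE);        // stored without the header itself
    buf.putInt(uncompressedSize - HEADER_SIZE);  // stored without the header itself
    buf.putLong(prevOffset);
    buf.put(checksumType);
    buf.putInt(bytesPerChecksum);
    buf.putInt(onDiskDataSize);                  // header + data, no checksums
  }

  public static void main(String[] args) {
    byte[] dest = new byte[HEADER_SIZE];
    byte[] magic = "DATABLK*".getBytes(StandardCharsets.US_ASCII); // assumed magic
    putHeader(dest, 0, magic, -1L, (byte) 1, 16384, 133, 233, 129);
    System.out.println("stored on-disk size: " + ByteBuffer.wrap(dest).getInt(8));
  }
}
```

      In the example call, 133 bytes on disk minus the assumed 33-byte header leaves 100 bytes of data plus checksums, and an on-disk data size of 129 implies 4 bytes of checksum.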
    • writeHeaderAndData

      void writeHeaderAndData(org.apache.hadoop.fs.FSDataOutputStream out) throws IOException
      Similar to finishBlockAndWriteHeaderAndData, but records the offset of this block so that it can be referenced in the next block of the same type.
      Throws:
      IOException
    • finishBlockAndWriteHeaderAndData

      Writes the header and the compressed data of this block (or uncompressed data when not using compression) into the given stream. Can be called in the "writing" state or in the "block ready" state. If called in the "writing" state, transitions the writer to the "block ready" state.
      Parameters:
      out - the output stream to write the header and data to
      Throws:
      IOException
    • getHeaderAndDataForTest

      Returns the header and the compressed data (or uncompressed data when not using compression) as a byte array. Can be called in the "writing" state or in the "block ready" state. If called in the "writing" state, transitions the writer to the "block ready" state. This returns the header + data + checksums stored on disk.
      Returns:
      header and data as they would be stored on disk in a byte array
      Throws:
      IOException
    • release

      void release()
      Releases resources used by this writer.
    • getOnDiskSizeWithoutHeader

      Returns the on-disk size of the data portion of the block. This is the compressed size if compression is enabled. Can only be called in the "block ready" state. Header is not compressed, and its size is not included in the return value.
      Returns:
      the on-disk size of the block, not including the header.
    • getOnDiskSizeWithHeader

      Returns the on-disk size of the block. Can only be called in the "block ready" state.
      Returns:
      the on-disk size of the block ready to be written, including the header size, the data and the checksum data.
    • getUncompressedSizeWithoutHeader

      The uncompressed size of the block data. Does not include header size.
    • getUncompressedSizeWithHeader

      The uncompressed size of the block data, including header size.
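      The relationship between these size getters can be worked through with a little arithmetic. The sketch below is illustrative only: the 33-byte header, the 4-byte checksum per chunk, and the choice to compute checksums over header plus data are assumptions not stated on this page, and `BlockSizeSketch` is a hypothetical helper.

```java
/** Hypothetical size calculator; all constants are assumptions for illustration. */
final class BlockSizeSketch {
  static final int HEADER_SIZE = 33;  // assumed header size in bytes
  static final int CHECKSUM_SIZE = 4; // assumed: one 4-byte checksum per chunk

  /** Checksum bytes needed to cover size bytes in chunks of bytesPerChecksum. */
  static int checksumBytes(int size, int bytesPerChecksum) {
    int chunks = (size + bytesPerChecksum - 1) / bytesPerChecksum; // round up
    return chunks * CHECKSUM_SIZE;
  }

  /** Header + on-disk (possibly compressed) data + checksum data. */
  static int onDiskSizeWithHeader(int onDiskDataSize, int bytesPerChecksum) {
    int headerAndData = HEADER_SIZE + onDiskDataSize;
    return headerAndData + checksumBytes(headerAndData, bytesPerChecksum);
  }

  /** Header + uncompressed data; no checksum data. */
  static int uncompressedSizeWithHeader(int uncompressedSizeWithoutHeader) {
    return HEADER_SIZE + uncompressedSizeWithoutHeader;
  }

  public static void main(String[] args) {
    // 65,000 bytes of compressed data checksummed in 16 KiB chunks.
    System.out.println(onDiskSizeWithHeader(65_000, 16_384));
    // 131,072 bytes of uncompressed data.
    System.out.println(uncompressedSizeWithHeader(131_072));
  }
}
```

      With these assumed constants, 65,000 bytes of compressed data plus the header span four 16 KiB chunks, so 16 bytes of checksum are appended.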
    • isWriting

      boolean isWriting()
      Returns true if a block is being written
    • encodedBlockSizeWritten

      Returns the number of bytes written into the current block so far, or zero if not writing the block at the moment. Note that this will return zero in the "block ready" state as well.
      Returns:
      the number of bytes written
    • blockSizeWritten

      public int blockSizeWritten()
      Returns the number of bytes written into the current block so far, or zero if not writing the block at the moment. Note that this will return zero in the "block ready" state as well.
      Returns:
      the number of bytes written
    • cloneUncompressedBufferWithHeader

      Clones the header followed by the uncompressed data, even if using compression. This is needed for storing uncompressed blocks in the block cache. Can be called in the "writing" state or the "block ready" state. Returns only the header and data; does not include checksum data.
      Returns:
      an uncompressed block ByteBuff for caching on write
    • cloneOnDiskBufferWithHeader

      Clones the header followed by the on-disk (compressed/encoded/encrypted) data. This is needed for storing packed blocks in the block cache. Returns only the header and data; does not include checksum data.
      Returns:
      a copy of block bytes for caching on write
    • expectState

      private void expectState(HFileBlock.Writer.State expectedState)
    • writeBlock

      void writeBlock(HFileBlock.BlockWritable bw, org.apache.hadoop.fs.FSDataOutputStream out) throws IOException
      Takes the given HFileBlock.BlockWritable instance, creates a new block of its appropriate type, writes the writable into this block, and flushes the block into the output stream. The writer is instructed not to buffer uncompressed bytes for cache-on-write.
      Parameters:
      bw - the block-writable object to write as a block
      out - the file system output stream
      Throws:
      IOException
    • getBlockForCaching

      Creates a new HFileBlock. Checksums have already been validated, so the byte buffer passed into the constructor of this newly created block does not have checksum data even though the header minor version is MINOR_VERSION_WITH_CHECKSUM. This is indicated by setting a 0 value in bytesPerChecksum. This method copies the on-disk or uncompressed data to build the HFileBlock which is used only while writing blocks and caching.

      TODO: Should there be an option where a cache can ask that HBase preserve block checksums for checking after a block comes out of the cache? Otherwise, the cache is responsible for blocks being wholesome (ECC memory or, if file-backed, it does checksumming).