Class DataBlockEncodingTool
java.lang.Object
org.apache.hadoop.hbase.regionserver.DataBlockEncodingTool
Tests various algorithms for key compression on an existing HFile. Useful for testing, debugging
and benchmarking.
-
Nested Class Summary
-
Field Summary
Modifier and Type, Field, Description
private static int benchmarkNOmit
private static int benchmarkNTimes
private static final double BYTES_IN_MB
private List<org.apache.hadoop.hbase.io.encoding.EncodedDataBlock> codecs
private final org.apache.hadoop.hbase.io.compress.Compression.Algorithm compressionAlgorithm
private final String compressionAlgorithmName
private final org.apache.hadoop.io.compress.Compressor compressor
private final org.apache.hadoop.conf.Configuration conf
private final org.apache.hadoop.io.compress.Decompressor decompressor
private static final int DEFAULT_BENCHMARK_N_OMIT - How many first runs should not be included in the benchmark.
private static final int DEFAULT_BENCHMARK_N_TIMES - How many times to run the benchmark.
private static final org.apache.hadoop.hbase.io.compress.Compression.Algorithm DEFAULT_COMPRESSION - Compression algorithm to use if not specified on the command line
private static final DecimalFormat DELIMITED_DECIMAL_FORMAT
private static final boolean includesMemstoreTS
private static final String INT_FORMAT
private static final org.slf4j.Logger LOG
private static final double MB_SEC_COEF
private static final double NS_IN_SEC
private static final String OPT_BENCHMARK_N_OMIT - Number of first runs of every benchmark to omit from statistics
private static final String OPT_BENCHMARK_N_TIMES - Number of times to run each benchmark
private static final String OPT_COMPRESSION_ALGORITHM - What compression algorithm to test
private static final String OPT_HFILE_NAME - HFile name to be used in benchmark
private static final String OPT_KV_LIMIT - Maximum number of key/value pairs to process in a single benchmark run
private static final String OPT_MEASURE_THROUGHPUT - Whether to run a benchmark to measure read throughput
private static final String OPT_OMIT_CORRECTNESS_TEST - If this is specified, no correctness testing will be done
private static final String PCT_FORMAT
private byte[] rawKVs
private long totalCFLength
private long totalKeyLength
private long totalKeyRedundancyLength
private long totalPrefixLength
private long totalValueLength
private static boolean USE_TAG
private boolean useHBaseChecksum
-
Constructor Summary
Constructor, Description
DataBlockEncodingTool(org.apache.hadoop.conf.Configuration conf, String compressionAlgorithmName)
-
Method Summary
Modifier and Type, Method, Description
void benchmarkAlgorithm(org.apache.hadoop.hbase.io.compress.Compression.Algorithm algorithm, String name, byte[] buffer, int offset, int length) - Check decompress performance of a given algorithm and print it.
void benchmarkCodecs() - Benchmark codec's speed.
private void benchmarkDefaultCompression(int totalSize, byte[] rawBuffer)
private int benchmarkEncoder(int previousTotalSize, org.apache.hadoop.hbase.io.encoding.EncodedDataBlock codec) - Benchmark compression/decompression throughput.
void checkStatistics(org.apache.hadoop.hbase.regionserver.KeyValueScanner scanner, int kvLimit) - Check statistics for given HFile for different data block encoders.
void displayStatistics() - Display statistics of different compression algorithms.
static void main(String[] args) - A command line interface to benchmarks.
private void outputSavings(String caption, long part, long whole)
private static void outputTuple(String caption, String format, Object... values)
private void outputTuplePct(String caption, long size)
private static void printBenchmarkResult(int totalSize, List<Long> durationsInNanoSec, DataBlockEncodingTool.Manipulation manipulation)
private static void printUsage(org.apache.hbase.thirdparty.org.apache.commons.cli.Options options)
static void testCodecs(org.apache.hadoop.conf.Configuration conf, int kvLimit, String hfilePath, String compressionName, boolean doBenchmark, boolean doVerify) - Test a data block encoder on the given HFile.
boolean verifyCodecs(org.apache.hadoop.hbase.regionserver.KeyValueScanner scanner, int kvLimit) - Verify if all data block encoders are working properly.
-
Field Details
-
LOG
-
includesMemstoreTS
-
DEFAULT_BENCHMARK_N_TIMES
How many times to run the benchmark. More runs give statistically better data but slower execution. Must be strictly larger than DEFAULT_BENCHMARK_N_OMIT.
-
DEFAULT_BENCHMARK_N_OMIT
How many initial runs to exclude from the benchmark. This is done to exclude setup cost.
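The relationship between the two run counts can be illustrated with a small standalone sketch. It is not the HBase implementation: the class name `WarmupBenchmarkSketch`, the method `measure`, and the values 12 and 2 used in `main` are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; names and values are assumptions, not HBase code.
final class WarmupBenchmarkSketch {

  /**
   * Runs the task nTimes and returns the durations (in nanoseconds) of the
   * runs that remain after the first nOmit runs are dropped as warm-up.
   */
  static List<Long> measure(Runnable task, int nTimes, int nOmit) {
    // At least one run must survive, so nTimes has to be strictly larger.
    if (nTimes <= nOmit) {
      throw new IllegalArgumentException("nTimes must be strictly larger than nOmit");
    }
    List<Long> durations = new ArrayList<>();
    for (int run = 0; run < nTimes; run++) {
      long start = System.nanoTime();
      task.run();
      long elapsed = System.nanoTime() - start;
      if (run >= nOmit) {
        durations.add(elapsed); // keep only the runs after the warm-up phase
      }
    }
    return durations;
  }

  public static void main(String[] args) {
    // 12 runs with the first 2 omitted are hypothetical demo values.
    List<Long> kept = measure(() -> Math.sqrt(System.nanoTime()), 12, 2);
    System.out.println("Runs kept: " + kept.size());
  }
}
```

Requiring the run count to exceed the omit count guarantees that at least one measurement is left for the statistics.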
-
OPT_HFILE_NAME
HFile name to be used in benchmark
-
OPT_KV_LIMIT
Maximum number of key/value pairs to process in a single benchmark run
-
OPT_MEASURE_THROUGHPUT
Whether to run a benchmark to measure read throughput
-
OPT_OMIT_CORRECTNESS_TEST
If this is specified, no correctness testing will be done
-
OPT_COMPRESSION_ALGORITHM
What compression algorithm to test
-
OPT_BENCHMARK_N_TIMES
Number of times to run each benchmark
-
OPT_BENCHMARK_N_OMIT
Number of first runs of every benchmark to omit from statistics
-
DEFAULT_COMPRESSION
Compression algorithm to use if not specified on the command line
-
DELIMITED_DECIMAL_FORMAT
-
PCT_FORMAT
-
INT_FORMAT
-
benchmarkNTimes
-
benchmarkNOmit
-
conf
-
codecs
-
totalPrefixLength
-
totalKeyLength
-
totalValueLength
-
totalKeyRedundancyLength
-
totalCFLength
-
rawKVs
-
useHBaseChecksum
-
compressionAlgorithmName
-
compressionAlgorithm
-
compressor
-
decompressor
-
USE_TAG
-
BYTES_IN_MB
-
NS_IN_SEC
-
MB_SEC_COEF
-
-
Constructor Details
-
DataBlockEncodingTool
public DataBlockEncodingTool(org.apache.hadoop.conf.Configuration conf, String compressionAlgorithmName)
Parameters:
compressionAlgorithmName - What kind of algorithm should be used as baseline for comparison (e.g. lzo, gz).
-
-
Method Details
-
checkStatistics
public void checkStatistics(org.apache.hadoop.hbase.regionserver.KeyValueScanner scanner, int kvLimit) throws IOException
Check statistics for given HFile for different data block encoders.
Parameters:
scanner - Scanner over the file that will be compressed.
kvLimit - Maximum number of KeyValues to process.
Throws:
IOException - thrown if scanner is invalid
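As a rough illustration of the kind of key statistics that field names such as totalKeyLength and totalPrefixLength suggest, the sketch below sums key lengths and the prefix each key shares with its predecessor. This is an assumption about what such statistics could look like, not the method's actual logic; `KeyPrefixStatsSketch`, `commonPrefixLength` and `totals` are hypothetical names, and plain byte arrays stand in for KeyValues.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; not the HBase implementation.
final class KeyPrefixStatsSketch {

  /** Length of the longest common prefix of two byte arrays. */
  static int commonPrefixLength(byte[] a, byte[] b) {
    int limit = Math.min(a.length, b.length);
    int i = 0;
    while (i < limit && a[i] == b[i]) {
      i++;
    }
    return i;
  }

  /**
   * Returns {totalKeyLength, totalPrefixLength} over at most kvLimit keys,
   * where the prefix of a key is measured against the previous key.
   */
  static long[] totals(List<byte[]> sortedKeys, int kvLimit) {
    long totalKeyLength = 0;
    long totalPrefixLength = 0;
    byte[] previous = null;
    int processed = 0;
    for (byte[] key : sortedKeys) {
      if (processed >= kvLimit) {
        break; // stop once the limit is reached
      }
      totalKeyLength += key.length;
      if (previous != null) {
        totalPrefixLength += commonPrefixLength(previous, key);
      }
      previous = key;
      processed++;
    }
    return new long[] {totalKeyLength, totalPrefixLength};
  }

  public static void main(String[] args) {
    List<byte[]> keys = new ArrayList<>();
    for (String k : new String[] {"row1/a", "row1/b", "row2/a"}) {
      keys.add(k.getBytes(StandardCharsets.UTF_8));
    }
    long[] t = totals(keys, 10);
    System.out.println("key bytes=" + t[0] + ", shared prefix bytes=" + t[1]);
  }
}
```

A large shared-prefix total relative to the key total hints that prefix-based encoders have a lot of redundancy to remove.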
-
verifyCodecs
public boolean verifyCodecs(org.apache.hadoop.hbase.regionserver.KeyValueScanner scanner, int kvLimit) throws IOException
Verify if all data block encoders are working properly.
Parameters:
scanner - Scanner over the file that was compressed.
kvLimit - Maximum number of KeyValues to process.
Returns:
true if all data block encoders compressed/decompressed correctly.
Throws:
IOException - thrown if scanner is invalid
-
benchmarkCodecs
Benchmark codec's speed.
Throws:
IOException
-
benchmarkEncoder
private int benchmarkEncoder(int previousTotalSize, org.apache.hadoop.hbase.io.encoding.EncodedDataBlock codec)
Benchmark compression/decompression throughput.
Parameters:
previousTotalSize - Total size used for verification. Use -1 if unknown.
codec - Tested encoder.
Returns:
Size of uncompressed data.
-
benchmarkDefaultCompression
Throws:
IOException
-
benchmarkAlgorithm
public void benchmarkAlgorithm(org.apache.hadoop.hbase.io.compress.Compression.Algorithm algorithm, String name, byte[] buffer, int offset, int length) throws IOException
Check the decompression performance of a given algorithm and print it.
Parameters:
algorithm - Compression algorithm.
name - Name of algorithm.
buffer - Buffer to be compressed.
offset - Position of the beginning of the data.
length - Length of data in buffer.
Throws:
IOException
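The general idea of timing decompression for a slice of a buffer can be sketched with the JDK's `java.util.zip` classes standing in for the Hadoop `Compression.Algorithm`. This substitution is an assumption made so the example runs without HBase on the classpath; `DecompressTimingSketch` and its methods are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Illustrative sketch only; uses java.util.zip instead of Hadoop codecs.
final class DecompressTimingSketch {

  /** Compresses buffer[offset, offset + length) with DEFLATE. */
  static byte[] compress(byte[] buffer, int offset, int length) {
    Deflater deflater = new Deflater();
    deflater.setInput(buffer, offset, length);
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] chunk = new byte[4096];
    while (!deflater.finished()) {
      int n = deflater.deflate(chunk);
      out.write(chunk, 0, n);
    }
    deflater.end();
    return out.toByteArray();
  }

  /** Decompresses data that is known to expand to originalLength bytes. */
  static byte[] decompress(byte[] compressed, int originalLength) {
    Inflater inflater = new Inflater();
    inflater.setInput(compressed);
    byte[] result = new byte[originalLength];
    int filled = 0;
    try {
      while (filled < originalLength && !inflater.finished()) {
        int n = inflater.inflate(result, filled, originalLength - filled);
        if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
          break; // no further progress is possible
        }
        filled += n;
      }
    } catch (DataFormatException e) {
      throw new IllegalStateException("corrupt compressed data", e);
    } finally {
      inflater.end();
    }
    if (filled != originalLength) {
      throw new IllegalStateException("unexpected decompressed length: " + filled);
    }
    return result;
  }

  /** Returns the wall-clock time of one decompression, in nanoseconds. */
  static long timeDecompressNanos(byte[] compressed, int originalLength) {
    long start = System.nanoTime();
    decompress(compressed, originalLength);
    return System.nanoTime() - start;
  }

  public static void main(String[] args) {
    byte[] data = new byte[64 * 1024]; // zero-filled, highly compressible demo input
    byte[] compressed = compress(data, 0, data.length);
    long nanos = timeDecompressNanos(compressed, data.length);
    System.out.println("compressed bytes=" + compressed.length + ", decompress ns=" + nanos);
  }
}
```

A single timing like this is noisy; repeating it and discarding the first few runs gives steadier numbers.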
-
printBenchmarkResult
private static void printBenchmarkResult(int totalSize, List<Long> durationsInNanoSec, DataBlockEncodingTool.Manipulation manipulation) -
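A result printer of this shape has to turn a byte count and a list of nanosecond durations into a throughput figure. The sketch below shows one way to do that conversion. The constant values (1 MB taken as 1024 * 1024 bytes, 10^9 ns per second), the choice to average the durations first, and the names `ThroughputSketch` and `meanThroughputMbPerSec` are assumptions for illustration and are not taken from this page.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; constant values are assumed, not documented here.
final class ThroughputSketch {
  static final double BYTES_IN_MB = 1024 * 1024.0;           // assumed value
  static final double NS_IN_SEC = 1_000_000_000.0;           // assumed value
  static final double MB_SEC_COEF = NS_IN_SEC / BYTES_IN_MB; // (bytes per ns) to (MB per s)

  /** Mean throughput in MB/s for totalSize bytes processed in each timed run. */
  static double meanThroughputMbPerSec(int totalSize, List<Long> durationsInNanoSec) {
    if (durationsInNanoSec.isEmpty()) {
      throw new IllegalArgumentException("at least one duration is required");
    }
    double sumNs = 0;
    for (long d : durationsInNanoSec) {
      sumNs += d;
    }
    double meanNs = sumNs / durationsInNanoSec.size();
    // bytes / ns * (ns / s) / (bytes / MB) = MB / s
    return totalSize * MB_SEC_COEF / meanNs;
  }

  public static void main(String[] args) {
    List<Long> durations = Arrays.asList(500_000_000L, 1_500_000_000L);
    System.out.println(meanThroughputMbPerSec(1024 * 1024, durations) + " MB/s");
  }
}
```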
outputTuple
-
displayStatistics
Display statistics of different compression algorithms.
Throws:
IOException
-
outputTuplePct
-
outputSavings
-
testCodecs
public static void testCodecs(org.apache.hadoop.conf.Configuration conf, int kvLimit, String hfilePath, String compressionName, boolean doBenchmark, boolean doVerify) throws IOException
Test a data block encoder on the given HFile. Output results to console.
Parameters:
kvLimit - The maximum number of KeyValues to analyze.
hfilePath - an HFile path on the file system.
compressionName - Compression algorithm used for comparison.
doBenchmark - Run performance benchmarks.
doVerify - Verify correctness.
Throws:
IOException - When hfilePath is incorrect.
-
printUsage
-
main
A command line interface to benchmarks. Parses command-line arguments and runs the appropriate benchmarks.
Parameters:
args - Must have length at least 1 and hold the path to the HFile.
Throws:
IOException - If the wrong file was specified.
-