Class IntegrationTestBigLinkedList

java.lang.Object
  org.apache.hadoop.hbase.util.AbstractHBaseTool
    org.apache.hadoop.hbase.IntegrationTestBase
      org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool
Direct Known Subclasses:
IntegrationTestBigLinkedListWithVisibility, IntegrationTestReplication

This is an integration test borrowed from goraci, written by Keith Turner, which is in turn inspired by the Accumulo test called continuous ingest (ci). The original source code can be found here:

  • https://github.com/keith-turner/goraci
  • https://github.com/enis/goraci/

Apache Accumulo [0] has a simple test suite that verifies that data is not lost at scale. This test suite is called continuous ingest. This test runs many ingest clients that continually create linked lists containing 25 million nodes. At some point the clients are stopped and a map reduce job is run to ensure no linked list has a hole. A hole indicates data was lost.

The keys of the nodes in the linked lists are random. This causes each linked list to spread across the table. Therefore, if one part of a table loses data, it will be detected by references in another part of the table.

THE ANATOMY OF THE TEST

Below is a rough sketch of how data is written. For specific details, look at the Generator code.

  1. Write out 1 million nodes (1M is the configurable 'width' mentioned below)
  2. Flush the client
  3. Write out 1 million more nodes that reference the previous million
  4. If this is the 25th set of 1 million nodes, then update the 1st set of 1 million to point to the last set (25 is configurable; it's the 'wrap multiplier' referred to below)
  5. Go to step 1

The key is that nodes only reference flushed nodes. Therefore a node should never reference a missing node, even if the ingest client is killed at any point in time.
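
As a concrete illustration of the pattern above, here is a minimal, self-contained Java sketch of the write loop. It is not the actual Generator code: the table name, column family, qualifier, and 16-byte key size are illustrative assumptions, and the real job runs as map tasks rather than a single client. It shows the essential invariant: a batch is only referenced after it has been flushed, and after the last batch the first batch is rewritten to point at it, closing the loop.

  import java.io.IOException;
  import java.util.Random;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.BufferedMutator;
  import org.apache.hadoop.hbase.client.Connection;
  import org.apache.hadoop.hbase.client.ConnectionFactory;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  /** Illustrative sketch of the write pattern; not the actual Generator code. */
  public class LinkedListWriteSketch {
    private static final byte[] FAMILY = Bytes.toBytes("meta");  // assumed family name
    private static final byte[] PREV   = Bytes.toBytes("prev");  // assumed qualifier
    private static final int WIDTH = 1_000_000;                  // nodes per batch ('width')
    private static final int WRAP  = 25;                         // batches per loop ('wrap multiplier')

    public static void main(String[] args) throws IOException {
      Configuration conf = HBaseConfiguration.create();
      Random rnd = new Random();
      try (Connection conn = ConnectionFactory.createConnection(conf);
           BufferedMutator mutator =
               conn.getBufferedMutator(TableName.valueOf("IntegrationTestBigLinkedList"))) {
        byte[][] first = null;  // first batch of the current loop
        byte[][] prev = null;   // most recently flushed batch
        for (int batch = 0; batch < WRAP; batch++) {
          byte[][] current = new byte[WIDTH][];
          for (int i = 0; i < WIDTH; i++) {
            byte[] key = new byte[16];
            rnd.nextBytes(key);               // random keys spread rows across the table
            current[i] = key;
            Put put = new Put(key);
            // A node only ever references a node from the previously *flushed* batch.
            // The first batch gets an empty sentinel until the wrap step below rewrites it.
            byte[] ref = (prev == null) ? new byte[0] : prev[i];
            put.addColumn(FAMILY, PREV, ref);
            mutator.mutate(put);
          }
          mutator.flush();                    // step 2: flush before anything references this batch
          if (first == null) first = current;
          prev = current;
        }
        // Step 4: after the 25th batch (WIDTH * WRAP = 25,000,000 nodes), point the
        // first batch back at the last flushed batch to close the circular list.
        // The real Generator then repeats from step 1.
        for (int i = 0; i < WIDTH; i++) {
          Put put = new Put(first[i]);
          put.addColumn(FAMILY, PREV, prev[i]);
          mutator.mutate(put);
        }
        mutator.flush();
      }
    }
  }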

When running this test suite with Accumulo there is a script running in parallel called the Agitator that randomly and continuously kills server processes. Many data loss bugs were found in Accumulo this way. This test suite can also help find bugs that impact uptime and stability when run for days or weeks.

This test suite consists of the following:

  • a few Java programs
  • a little helper script to run the Java programs
  • a Maven script to build it

When generating data, it's best to have each map task generate a multiple of 25 million nodes. The reason is that a circular linked list is closed every 25M nodes (the default width of 1M times the wrap multiplier of 25). Not generating a multiple of 25M will leave some nodes in the linked list without references, and the loss of an unreferenced node cannot be detected.

Below is a description of the Java programs:

  • Generator - A map-only job that generates data. As stated previously, it's best to generate data in multiples of 25M. An option is also available to allow concurrent walkers to select and walk random flushed loops during this phase.
  • Verify - A map reduce job that looks for holes. Look at the counters after running: REFERENCED and UNREFERENCED are ok, any UNDEFINED counts are bad (see the conceptual sketch after this list). Do not run at the same time as the Generator.
  • Walker - A standalone program that starts following a linked list and emits timing info.
  • Print - A standalone program that prints nodes in the linked list.
  • Delete - A standalone program that deletes a single node.
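
To make the Verify counters concrete, here is a conceptual Java sketch of the hole detection. It is not the real Verify implementation (which uses different types and plumbing); it assumes a map phase that emits, for every stored node, the node's own key tagged "DEFINED" plus the key of the node it references tagged "REF". The reducer then classifies each key.

  import java.io.IOException;

  import org.apache.hadoop.io.BytesWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Reducer;

  /** Conceptual sketch only; not the actual Verify reducer. */
  public class VerifySketchReducer
      extends Reducer<BytesWritable, Text, BytesWritable, Text> {

    public enum Counts { REFERENCED, UNREFERENCED, UNDEFINED }

    @Override
    protected void reduce(BytesWritable key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
      boolean defined = false;    // the node itself exists in the table
      boolean referenced = false; // some other node points at it
      for (Text value : values) {
        if ("DEFINED".equals(value.toString())) {
          defined = true;
        } else {
          referenced = true;
        }
      }
      if (defined && referenced) {
        context.getCounter(Counts.REFERENCED).increment(1);   // normal node
      } else if (defined) {
        context.getCounter(Counts.UNREFERENCED).increment(1); // not yet wrapped; ok
      } else {
        context.getCounter(Counts.UNDEFINED).increment(1);    // referenced but missing: a hole
      }
    }
  }
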
This class can be run as a unit test, as an integration test, or from the command line.

ex:

 ./hbase org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList
    loop 2 1 100000 /temp 1 1000 50 1 0
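
The test can also be driven programmatically through the standard Hadoop ToolRunner, since the class implements org.apache.hadoop.util.Tool. A minimal sketch, reusing the same arguments as the command-line example above (it assumes the class's default constructor and an HBase configuration on the classpath):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList;
  import org.apache.hadoop.util.ToolRunner;

  public class RunBigLinkedList {
    public static void main(String[] args) throws Exception {
      // Picks up hbase-site.xml / core-site.xml from the classpath.
      Configuration conf = HBaseConfiguration.create();
      // Same arguments as the command-line example above.
      String[] toolArgs = {"loop", "2", "1", "100000", "/temp", "1", "1000", "50", "1", "0"};
      int exitCode = ToolRunner.run(conf, new IntegrationTestBigLinkedList(), toolArgs);
      System.exit(exitCode);
    }
  }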