Class IntegrationTestBigLinkedList

java.lang.Object
  org.apache.hadoop.hbase.util.AbstractHBaseTool
    org.apache.hadoop.hbase.IntegrationTestBase
      org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool
Direct Known Subclasses:
IntegrationTestBigLinkedListWithVisibility, IntegrationTestReplication

This is an integration test borrowed from goraci, written by Keith Turner, which is in turn inspired by the Accumulo test called continuous ingest (ci). The original source code can be found here:

  • https://github.com/keith-turner/goraci
  • https://github.com/enis/goraci/

Apache Accumulo [0] has a simple test suite that verifies that data is not lost at scale. This test suite is called continuous ingest. This test runs many ingest clients that continually create linked lists containing 25 million nodes. At some point the clients are stopped and a map reduce job is run to ensure no linked list has a hole. A hole indicates data was lost.

The keys of the nodes in the linked lists are random. This causes each linked list to spread across the table. Therefore, if one part of a table loses data, it will be detected by references in another part of the table.

THE ANATOMY OF THE TEST

Below is a rough sketch of how data is written. For specific details, look at the Generator code.

  1. Write out 1 million nodes (1M is the configurable 'width' mentioned below)
  2. Flush the client
  3. Write out 1 million more nodes that reference the previous million
  4. If this is the 25th set of 1 million nodes, then update the 1st set of 1 million to point to the last set (25 is configurable; it's the 'wrap multiplier' referred to below)
  5. Go to step 1

The key is that nodes only reference flushed nodes. Therefore a node should never reference a missing node, even if the ingest client is killed at any point in time.
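
As a concrete illustration of the pattern above, here is a minimal, self-contained Java sketch of the write loop. It is not the actual Generator code: the table name, column family, qualifier, and 16-byte key size are illustrative assumptions, and the real job runs as map tasks rather than a single client. It shows the essential invariant: a batch is only referenced after it has been flushed, and after the last batch the first batch is rewritten to point at it, closing the loop.

  import java.io.IOException;
  import java.util.Random;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.BufferedMutator;
  import org.apache.hadoop.hbase.client.Connection;
  import org.apache.hadoop.hbase.client.ConnectionFactory;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  /** Illustrative sketch of the write pattern; not the actual Generator code. */
  public class LinkedListWriteSketch {
    private static final byte[] FAMILY = Bytes.toBytes("meta");  // assumed family name
    private static final byte[] PREV   = Bytes.toBytes("prev");  // assumed qualifier
    private static final int WIDTH = 1_000_000;                  // nodes per batch ('width')
    private static final int WRAP  = 25;                         // batches per loop ('wrap multiplier')

    public static void main(String[] args) throws IOException {
      Configuration conf = HBaseConfiguration.create();
      Random rnd = new Random();
      try (Connection conn = ConnectionFactory.createConnection(conf);
           BufferedMutator mutator =
               conn.getBufferedMutator(TableName.valueOf("IntegrationTestBigLinkedList"))) {
        byte[][] first = null;  // first batch of the current loop
        byte[][] prev = null;   // most recently flushed batch
        for (int batch = 0; batch < WRAP; batch++) {
          byte[][] current = new byte[WIDTH][];
          for (int i = 0; i < WIDTH; i++) {
            byte[] key = new byte[16];
            rnd.nextBytes(key);               // random keys spread rows across the table
            current[i] = key;
            Put put = new Put(key);
            // A node only ever references a node from the previously *flushed* batch.
            // The first batch gets an empty sentinel until the wrap step below rewrites it.
            byte[] ref = (prev == null) ? new byte[0] : prev[i];
            put.addColumn(FAMILY, PREV, ref);
            mutator.mutate(put);
          }
          mutator.flush();                    // step 2: flush before anything references this batch
          if (first == null) first = current;
          prev = current;
        }
        // Step 4: after the 25th batch (WIDTH * WRAP = 25,000,000 nodes), point the
        // first batch back at the last flushed batch to close the circular list.
        // The real Generator then repeats from step 1.
        for (int i = 0; i < WIDTH; i++) {
          Put put = new Put(first[i]);
          put.addColumn(FAMILY, PREV, prev[i]);
          mutator.mutate(put);
        }
        mutator.flush();
      }
    }
  }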

When running this test suite with Accumulo there is a script running in parallel called the Agitator that randomly and continuously kills server processes. Many data loss bugs were found in Accumulo this way. This test suite can also help find bugs that impact uptime and stability when run for days or weeks.

This test suite consists of the following:

  • a few Java programs
  • a little helper script to run the Java programs
  • a Maven script to build it

When generating data, it's best to have each map task generate a multiple of 25 million nodes. The reason is that a circular linked list is closed every 25M nodes (the default width of 1M times the wrap multiplier of 25). Not generating a multiple of 25M will leave some nodes in the linked list without references, and the loss of an unreferenced node cannot be detected.

Below is a description of the Java programs:

  • Generator - A map-only job that generates data. As stated previously, it's best to generate data in multiples of 25M. An option is also available to allow concurrent walkers to select and walk random flushed loops during this phase.
  • Verify - A map reduce job that looks for holes. Look at the counters after running: REFERENCED and UNREFERENCED are ok, any UNDEFINED counts are bad (see the conceptual sketch after this list). Do not run at the same time as the Generator.
  • Walker - A standalone program that starts following a linked list and emits timing info.
  • Print - A standalone program that prints nodes in the linked list.
  • Delete - A standalone program that deletes a single node.
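
To make the Verify counters concrete, here is a conceptual Java sketch of the hole detection. It is not the real Verify implementation (which uses different types and plumbing); it assumes a map phase that emits, for every stored node, the node's own key tagged "DEFINED" plus the key of the node it references tagged "REF". The reducer then classifies each key.

  import java.io.IOException;

  import org.apache.hadoop.io.BytesWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Reducer;

  /** Conceptual sketch only; not the actual Verify reducer. */
  public class VerifySketchReducer
      extends Reducer<BytesWritable, Text, BytesWritable, Text> {

    public enum Counts { REFERENCED, UNREFERENCED, UNDEFINED }

    @Override
    protected void reduce(BytesWritable key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
      boolean defined = false;    // the node itself exists in the table
      boolean referenced = false; // some other node points at it
      for (Text value : values) {
        if ("DEFINED".equals(value.toString())) {
          defined = true;
        } else {
          referenced = true;
        }
      }
      if (defined && referenced) {
        context.getCounter(Counts.REFERENCED).increment(1);   // normal node
      } else if (defined) {
        context.getCounter(Counts.UNREFERENCED).increment(1); // not yet wrapped; ok
      } else {
        context.getCounter(Counts.UNDEFINED).increment(1);    // referenced but missing: a hole
      }
    }
  }
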
This class can be run as a unit test, as an integration test, or from the command line.

ex:

 ./hbase org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList
    loop 2 1 100000 /temp 1 1000 50 1 0
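
The test can also be driven programmatically through the standard Hadoop ToolRunner, since the class implements org.apache.hadoop.util.Tool. A minimal sketch, reusing the same arguments as the command-line example above (it assumes the class's default constructor and an HBase configuration on the classpath):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList;
  import org.apache.hadoop.util.ToolRunner;

  public class RunBigLinkedList {
    public static void main(String[] args) throws Exception {
      // Picks up hbase-site.xml / core-site.xml from the classpath.
      Configuration conf = HBaseConfiguration.create();
      // Same arguments as the command-line example above.
      String[] toolArgs = {"loop", "2", "1", "100000", "/temp", "1", "1000", "50", "1", "0"};
      int exitCode = ToolRunner.run(conf, new IntegrationTestBigLinkedList(), toolArgs);
      System.exit(exitCode);
    }
  }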