Class GroupingTableMapper

java.lang.Object
org.apache.hadoop.mapreduce.Mapper<ImmutableBytesWritable,Result,KEYOUT,VALUEOUT>
org.apache.hadoop.hbase.mapreduce.TableMapper<ImmutableBytesWritable,Result>
org.apache.hadoop.hbase.mapreduce.GroupingTableMapper
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable

@Public public class GroupingTableMapper extends TableMapper<ImmutableBytesWritable,Result> implements org.apache.hadoop.conf.Configurable
Extract grouping columns from input record.
  • Nested Class Summary

    Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.Mapper

    org.apache.hadoop.mapreduce.Mapper.Context
  • Field Summary

    Fields
    Modifier and Type | Field | Description
    protected byte[][] | columns | The grouping columns.
    private org.apache.hadoop.conf.Configuration | conf | The current configuration.
    static final String | GROUP_COLUMNS | Job configuration parameter that specifies the columns used to build the key emitted by the map phase.
  • Constructor Summary

    Constructors
    Constructor | Description
    GroupingTableMapper()
  • Method Summary

    Modifier and Type | Method | Description
    protected ImmutableBytesWritable | createGroupKey(byte[][] vals) | Create a key by concatenating multiple column values.
    protected byte[][] | extractKeyValues(Result r) | Extract column values from the current record.
    org.apache.hadoop.conf.Configuration | getConf() | Returns the current configuration.
    static void | initJob(String table, Scan scan, String groupColumns, Class<? extends TableMapper> mapper, org.apache.hadoop.mapreduce.Job job) | Use this before submitting a TableMap job.
    void | map(ImmutableBytesWritable key, Result value, org.apache.hadoop.mapreduce.Mapper<ImmutableBytesWritable,Result,ImmutableBytesWritable,Result>.Context context) | Extract the grouping columns from value to construct a new key.
    void | setConf(org.apache.hadoop.conf.Configuration configuration) | Sets the configuration.

    Methods inherited from class org.apache.hadoop.mapreduce.Mapper

    cleanup, run, setup

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • GROUP_COLUMNS

      public static final String GROUP_COLUMNS
      Job configuration parameter that specifies the columns used to build the key emitted by the map phase.
      See Also:
      • Constant Field Values
    • columns

      protected byte[][] columns
      The grouping columns.
    • conf

      private org.apache.hadoop.conf.Configuration conf
      The current configuration.
  • Constructor Details

    • GroupingTableMapper

      public GroupingTableMapper()
  • Method Details

    • initJob

      public static void initJob(String table, Scan scan, String groupColumns, Class<? extends TableMapper> mapper, org.apache.hadoop.mapreduce.Job job) throws IOException
      Use this before submitting a TableMap job. It will appropriately set up the job.
      Parameters:
      table - The table to be processed.
      scan - The scan that selects the columns (and any other restrictions) to read.
      groupColumns - A space-separated list of columns whose values form the map output key.
      mapper - The mapper class.
      job - The current job.
      Throws:
      IOException - When setting up the job fails.
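      The groupColumns argument is a single string. A minimal sketch of assembling it, assuming each column is written as family:qualifier; GroupColumnsBuilder and its Column type are hypothetical helpers, not part of HBase:

```java
import java.util.List;
import java.util.StringJoiner;

/**
 * Hypothetical helper (not part of HBase) that builds the space-separated
 * groupColumns argument for initJob. Assumes the "family:qualifier" form.
 */
final class GroupColumnsBuilder {
    private GroupColumnsBuilder() {}

    /** One grouping column, identified by column family and qualifier. */
    static final class Column {
        final String family;
        final String qualifier;

        Column(String family, String qualifier) {
            this.family = family;
            this.qualifier = qualifier;
        }
    }

    static String build(List<Column> columns) {
        StringJoiner joined = new StringJoiner(" ");
        for (Column c : columns) {
            String name = c.family + ":" + c.qualifier;
            // Whitespace inside a name would split it into two columns
            // when the space-separated list is parsed again.
            if (name.chars().anyMatch(Character::isWhitespace)) {
                throw new IllegalArgumentException("column name contains whitespace: " + name);
            }
            joined.add(name);
        }
        return joined.toString();
    }
}
```

      The result would then be passed as the third argument, for example initJob("mytable", scan, groupColumns, GroupingTableMapper.class, job), where "mytable", scan and job are placeholders for your own table name, Scan and Job.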
    • map

      public void map(ImmutableBytesWritable key, Result value, org.apache.hadoop.mapreduce.Mapper<ImmutableBytesWritable,Result,ImmutableBytesWritable,Result>.Context context) throws IOException, InterruptedException
      Extract the grouping columns from value to construct a new key. Pass the new key and value to reduce. If any of the grouping columns are not found in the value, the record is skipped.
      Overrides:
      map in class org.apache.hadoop.mapreduce.Mapper<ImmutableBytesWritable,Result,ImmutableBytesWritable,Result>
      Parameters:
      key - The current key.
      value - The current value.
      context - The current context.
      Throws:
      IOException - When writing the record fails.
      InterruptedException - When the job is aborted.
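      The documented map() contract (extract the grouping values, skip the record if one is missing, otherwise emit the new key with the unchanged value) can be modelled without Hadoop. In this sketch, which illustrates the contract rather than reproducing the HBase source, a record is a hypothetical Map from "family:qualifier" to a String value, the key is built by joining values with a space (an assumption), and emitted pairs are collected in a list instead of being written to a Context:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Plain-Java model of the map() contract; GroupingMapSketch is hypothetical. */
final class GroupingMapSketch {
    private final String[] columns;
    // Stand-in for context.write(): collects (groupKey, originalRecord) pairs.
    final List<Map.Entry<String, Map<String, String>>> emitted = new ArrayList<>();

    GroupingMapSketch(String... columns) {
        this.columns = columns;
    }

    void map(Map<String, String> record) {
        String[] vals = extractKeyValues(record);
        if (vals == null) {
            return; // a grouping column is missing: the record is skipped
        }
        // The value is passed through unchanged; only the key is replaced.
        emitted.add(new SimpleEntry<>(createGroupKey(vals), record));
    }

    String[] extractKeyValues(Map<String, String> record) {
        String[] vals = new String[columns.length];
        for (int i = 0; i < columns.length; i++) {
            vals[i] = record.get(columns[i]);
            if (vals[i] == null) {
                return null;
            }
        }
        return vals;
    }

    String createGroupKey(String[] vals) {
        return String.join(" ", vals);
    }
}
```

      Passing the original value through means the reducer still sees every column of the row, not only the grouping columns.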
    • extractKeyValues

      protected byte[][] extractKeyValues(Result r)
      Extract column values from the current record. This method returns null if any of the columns are not found.

      Override this method if you want to deal with nulls differently.

      Parameters:
      r - The current record.
      Returns:
      Array of column values, or null if any grouping column is missing.
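      Because extractKeyValues() signals a missing column with null, and map() then drops the record, overriding it is how you keep such records. A sketch under assumptions: KeyValueExtractor models the documented default, LenientKeyValueExtractor is a hypothetical override that substitutes a placeholder value, and a Map from "family:qualifier" to byte[] stands in for Result. Values come back in configured column order here; the HBase implementation may order them differently.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

/** Hypothetical model of the default: null if any grouping column is absent. */
class KeyValueExtractor {
    protected final byte[][] columns;

    KeyValueExtractor(byte[][] columns) {
        this.columns = columns;
    }

    protected byte[][] extractKeyValues(Map<String, byte[]> record) {
        byte[][] vals = new byte[columns.length][];
        for (int i = 0; i < columns.length; i++) {
            byte[] v = record.get(new String(columns[i], StandardCharsets.UTF_8));
            if (v == null) {
                return null; // caller skips the record
            }
            vals[i] = v;
        }
        return vals;
    }
}

/** Hypothetical override that keeps records with missing grouping columns. */
class LenientKeyValueExtractor extends KeyValueExtractor {
    // Placeholder value chosen for illustration only.
    static final byte[] MISSING = "<none>".getBytes(StandardCharsets.UTF_8);

    LenientKeyValueExtractor(byte[][] columns) {
        super(columns);
    }

    @Override
    protected byte[][] extractKeyValues(Map<String, byte[]> record) {
        byte[][] vals = new byte[columns.length][];
        for (int i = 0; i < columns.length; i++) {
            byte[] v = record.get(new String(columns[i], StandardCharsets.UTF_8));
            vals[i] = (v != null) ? v : MISSING; // never returns null
        }
        return vals;
    }
}
```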
    • createGroupKey

      protected ImmutableBytesWritable createGroupKey(byte[][] vals)
      Create a key by concatenating multiple column values.

      Override this method to produce different types of keys.

      Parameters:
      vals - The column values extracted from the current record.
      Returns:
      A key generated by concatenating multiple column values.
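      The sketch below contrasts two key encodings over plain byte arrays (standing in for ImmutableBytesWritable). spaceJoined assumes a single-space separator, which you should confirm against the HBase source for your version; lengthPrefixed is a hypothetical override that prefixes each value with its 4-byte length so that different value tuples cannot produce the same key. GroupKeys is an illustrative name, not an HBase class.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

/** Hypothetical key encoders; neither is claimed to be HBase's implementation. */
final class GroupKeys {
    private GroupKeys() {}

    /** Concatenate values with a single space between them (assumed separator). */
    static byte[] spaceJoined(byte[][] vals) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < vals.length; i++) {
            if (i > 0) {
                out.write(' ');
            }
            out.write(vals[i], 0, vals[i].length);
        }
        return out.toByteArray();
    }

    /** Prefix each value with its big-endian 4-byte length to avoid collisions. */
    static byte[] lengthPrefixed(byte[][] vals) {
        int size = 0;
        for (byte[] v : vals) {
            size += Integer.BYTES + v.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        for (byte[] v : vals) {
            buf.putInt(v.length);
            buf.put(v);
        }
        return buf.array();
    }
}
```

      With a space separator, ("a b", "c") and ("a", "b c") collapse into the same group. The length-prefixed form avoids that, at the cost of keys that are no longer human-readable and that sort by the first value's length before its content, which matters only if reducers rely on key order.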
    • getConf

      public org.apache.hadoop.conf.Configuration getConf()
      Returns the current configuration.
      Specified by:
      getConf in interface org.apache.hadoop.conf.Configurable
      Returns:
      The current configuration.
      See Also:
      • Configurable.getConf()
    • setConf

      public void setConf(org.apache.hadoop.conf.Configuration configuration)
      Sets the configuration. This is used to set up the grouping details.
      Specified by:
      setConf in interface org.apache.hadoop.conf.Configurable
      Parameters:
      configuration - The configuration to set.
      See Also:
      • Configurable.setConf(org.apache.hadoop.conf.Configuration)
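      setConf() is where the grouping details come from. A minimal sketch of turning the configured column list into the byte[][] held by the columns field, assuming a Map stands in for Configuration. The property key below is a placeholder; real code would read GroupingTableMapper.GROUP_COLUMNS. Splitting on any run of whitespace is more lenient than a strict single-space format and is an assumption of this sketch, not documented library behavior.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

/** Hypothetical model of how setConf() could populate the grouping columns. */
final class GroupingConfSketch {
    // Placeholder key for illustration; not the real GROUP_COLUMNS value.
    static final String GROUP_COLUMNS = "example.grouping.columns";

    private GroupingConfSketch() {}

    static byte[][] parseColumns(Map<String, String> conf) {
        String raw = conf.getOrDefault(GROUP_COLUMNS, "").trim();
        if (raw.isEmpty()) {
            return new byte[0][]; // no grouping columns configured
        }
        String[] names = raw.split("\\s+");
        byte[][] cols = new byte[names.length][];
        for (int i = 0; i < names.length; i++) {
            cols[i] = names[i].getBytes(StandardCharsets.UTF_8);
        }
        return cols;
    }
}
```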