org.apache.hadoop.hbase.client.coprocessor (Apache HBase 2.6.0 API)

package org.apache.hadoop.hbase.client.coprocessor

Provides client classes for invoking Coprocessor RPC protocols

Overview
Example Usage

Overview

The coprocessor framework provides a way for custom code to run in place on the HBase region servers with each of a table's regions. These client classes enable applications to communicate with coprocessor instances via custom RPC protocols.

In order to provide a custom RPC protocol to clients, a coprocessor implementation must:

Define a protocol buffer Service and supporting Message types for the RPC methods. See the protocol buffer guide for more details on defining services.
Generate the Service and Message code using the protoc compiler
Implement the generated Service interface in your coprocessor class and implement the org.apache.hadoop.hbase.coprocessor.CoprocessorService interface. The org.apache.hadoop.hbase.coprocessor.CoprocessorService#getService() method should return a reference to the Endpoint's protocol buffer Service instance.

Clients may then call the defined service methods on coprocessor instances via the Table.coprocessorService(byte[]), Table.coprocessorService(Class, byte[], byte[], org.apache.hadoop.hbase.client.coprocessor.Batch.Call), and Table.coprocessorService(Class, byte[], byte[], org.apache.hadoop.hbase.client.coprocessor.Batch.Call, org.apache.hadoop.hbase.client.coprocessor.Batch.Callback) methods.

Since coprocessor Service instances are associated with individual regions within the table, the client RPC calls must ultimately identify which regions should be used in the Service method invocations. Since regions are seldom handled directly in client code and the region names may change over time, the coprocessor RPC calls use row keys to identify which regions should be used for the method invocations. Clients can call coprocessor Service methods against either:

a single region - calling Table.coprocessorService(byte[]) with a single row key. This returns a CoprocessorRpcChannel instance which communicates with the region containing the given row key (even if the row does not exist) as the RPC endpoint. Clients can then use the CoprocessorRpcChannel instance in creating a new Service stub to call RPC methods on the region's coprocessor.
a range of regions - calling Table.coprocessorService(Class, byte[], byte[], org.apache.hadoop.hbase.client.coprocessor.Batch.Call) or Table.coprocessorService(Class, byte[], byte[], org.apache.hadoop.hbase.client.coprocessor.Batch.Call, org.apache.hadoop.hbase.client.coprocessor.Batch.Callback) with a starting row key and an ending row key. All regions in the table from the region containing the start row key to the region containing the end row key (inclusive), will we used as the RPC endpoints.

Note that the row keys passed as parameters to the Table methods are not passed directly to the coprocessor Service implementations. They are only used to identify the regions for endpoints of the remote calls.

The Batch class defines two interfaces used for coprocessor Service invocations against multiple regions. Clients implement Batch.Call to call methods of the actual coprocessor Service instance. The interface's call() method will be called once per selected region, passing the Service instance for the region as a parameter. Clients can optionally implement Batch.Callback to be notified of the results from each region invocation as they complete. The instance's Batch.Callback.update(byte[], byte[], Object) method will be called with the Batch.Call.call(Object) return value from each region.

Example usage

To start with, let's use a fictitious coprocessor, RowCountEndpoint that counts the number of rows and key-values in each region where it is running. For clients to query this information, the coprocessor defines the following protocol buffer service:

message CountRequest {
}

message CountResponse {
  required int64 count = 1 [default = 0];
}

service RowCountService {
  rpc getRowCount(CountRequest)
    returns (CountResponse);
  rpc getKeyValueCount(CountRequest)
    returns (CountResponse);
}

Next run the protoc compiler on the .proto file to generate Java code for the Service interface. The generated RowCountService interface should look something like:

public static abstract class RowCountService
  implements com.google.protobuf.Service {
  ...
  public interface Interface {
    public abstract void getRowCount(
        com.google.protobuf.RpcController controller,
        org.apache.hadoop.hbase.coprocessor.example.generated.ExampleProtos.CountRequest request,
        com.google.protobuf.RpcCallback<org.apache.hadoop.hbase.coprocessor.example.generated.ExampleProtos.CountResponse> done);

    public abstract void getKeyValueCount(
        com.google.protobuf.RpcController controller,
        org.apache.hadoop.hbase.coprocessor.example.generated.ExampleProtos.CountRequest request,
        com.google.protobuf.RpcCallback<org.apache.hadoop.hbase.coprocessor.example.generated.ExampleProtos.CountResponse> done);
  }
}

Our coprocessor Service will need to implement this interface and the org.apache.hadoop.hbase.coprocessor.CoprocessorService in order to be registered correctly as an endpoint. For the sake of simplicity the server-side implementation is omitted. To see the implementing code, please see the org.apache.hadoop.hbase.coprocessor.example.RowCountEndpoint class in the HBase source code.

Now we need a way to access the results that RowCountService is making available. If we want to find the row count for all regions, we could use:

Connection connection = ConnectionFactory.createConnection(conf);
Table table = connection.getTable(TableName.valueOf("mytable"));
final ExampleProtos.CountRequest request = ExampleProtos.CountRequest.getDefaultInstance();
Map<byte[],Long> results = table.coprocessorService(
    ExampleProtos.RowCountService.class, // the protocol interface we're invoking
    null, null,                          // start and end row keys
    new Batch.Call<ExampleProtos.RowCountService,Long>() {
        public Long call(ExampleProtos.RowCountService counter) throws IOException {
          BlockingRpcCallback<ExampleProtos.CountResponse> rpcCallback =
              new BlockingRpcCallback<ExampleProtos.CountResponse>();
          counter.getRowCount(null, request, rpcCallback);
          ExampleProtos.CountResponse response = rpcCallback.get();
          return response.hasCount() ? response.getCount() : 0;
        }
    });

This will return a java.util.Map of the counter.getRowCount() result for the RowCountService instance running in each region of mytable, keyed by the region name.

By implementing Batch.Call as an anonymous class, we can invoke RowCountService methods directly against the Batch.Call.call(Object) method's argument. Calling Table.coprocessorService(Class, byte[], byte[], org.apache.hadoop.hbase.client.coprocessor.Batch.Call) will take care of invoking Batch.Call.call() against our anonymous class with the RowCountService instance for each table region.

Implementing Batch.Call also allows you to perform additional processing against each region's Service instance. For example, if you would like to combine row count and key-value count for each region:

Connection connection = ConnectionFactory.createConnection(conf);
Table table = connection.getTable(TableName.valueOf("mytable"));
// combine row count and kv count for region
final ExampleProtos.CountRequest request = ExampleProtos.CountRequest.getDefaultInstance();
Map<byte[],Long> results = table.coprocessorService(
    ExampleProtos.RowCountService.class, // the protocol interface we're invoking
    null, null,                          // start and end row keys
    new Batch.Call<ExampleProtos.RowCountService,Pair<Long,Long>>() {
       public Long call(ExampleProtos.RowCountService counter) throws IOException {
         BlockingRpcCallback<ExampleProtos.CountResponse> rowCallback =
             new BlockingRpcCallback<ExampleProtos.CountResponse>();
         counter.getRowCount(null, request, rowCallback);

         BlockingRpcCallback<ExampleProtos.CountResponse> kvCallback =
             new BlockingRpcCallback<ExampleProtos.CountResponse>();
         counter.getKeyValueCount(null, request, kvCallback);

         ExampleProtos.CountResponse rowResponse = rowCallback.get();
         ExampleProtos.CountResponse kvResponse = kvCallback.get();
         return new Pair(rowResponse.hasCount() ? rowResponse.getCount() : 0,
             kvResponse.hasCount() ? kvResponse.getCount() : 0);
    }
});

Related Packages

Package

Description

org.apache.hadoop.hbase.client

Provides HBase Client
Class

Description

Batch

A collection of interfaces and utilities used for interacting with custom RPC interfaces exposed by Coprocessors.

Batch.Call<T,R>

Defines a unit of work to be executed.

Batch.Callback<R>

Defines a generic callback to be triggered for each Batch.Call.call(Object) result.

BigDecimalColumnInterpreter

ColumnInterpreter for doing Aggregation's with BigDecimal columns.

DoubleColumnInterpreter

a concrete column interpreter implementation.

LongColumnInterpreter

a concrete column interpreter implementation.

RowProcessorClient

Convenience class that is used to make RowProcessorEndpoint invocations.

Package org.apache.hadoop.hbase.client.coprocessor

Overview

Example usage