Class WARCOutputFormat
java.lang.Object
org.apache.hadoop.mapreduce.OutputFormat<K,V>
org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<org.apache.hadoop.io.NullWritable,WARCWritable>
org.apache.hadoop.hbase.test.util.warc.WARCOutputFormat
public class WARCOutputFormat
extends org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<org.apache.hadoop.io.NullWritable,WARCWritable>
Hadoop OutputFormat for MapReduce jobs ('new' API) that write data to WARC files. Usage:
```java
Job job = Job.getInstance(getConf());
job.setOutputFormatClass(WARCOutputFormat.class);
job.setOutputKeyClass(NullWritable.class);
job.setOutputValueClass(WARCWritable.class);
FileOutputFormat.setCompressOutput(job, true);
```
The tasks generating the output (usually the reducers, but may be the mappers if there are no reducers) should use `NullWritable.get()` as the output key and a `WARCWritable` as the output value.
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.Counter
Field Summary
Fields inherited from class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
BASE_OUTPUT_NAME, COMPRESS, COMPRESS_CODEC, COMPRESS_TYPE, OUTDIR, PART
Constructor Summary
WARCOutputFormat()
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| `org.apache.hadoop.mapreduce.RecordWriter<org.apache.hadoop.io.NullWritable,WARCWritable>` | `getRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext context)` | Creates a new output file in WARC format, and returns a RecordWriter for writing to it. |

Methods inherited from class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
checkOutputSpecs, getCompressOutput, getDefaultWorkFile, getOutputCommitter, getOutputCompressorClass, getOutputName, getOutputPath, getPathForWorkFile, getUniqueFile, getWorkOutputPath, setCompressOutput, setOutputCompressorClass, setOutputName, setOutputPath
Constructor Details
WARCOutputFormat
public WARCOutputFormat()
Method Details
getRecordWriter
public org.apache.hadoop.mapreduce.RecordWriter<org.apache.hadoop.io.NullWritable,WARCWritable> getRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext context) throws IOException, InterruptedException
Creates a new output file in WARC format, and returns a RecordWriter for writing to it.
- Specified by: getRecordWriter in class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<org.apache.hadoop.io.NullWritable,WARCWritable>
- Throws:
  - IOException
  - InterruptedException
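For readers unfamiliar with the file format this writer targets, the following is a minimal JDK-only sketch of what a single WARC/1.0 record looks like on disk and how it can be gzip-compressed. It does not use or reproduce the internals of `WARCOutputFormat`; the class name `WarcRecordSketch`, its helper methods, the record ID, and the target URI are all hypothetical and exist only for illustration. The sketch assumes the standard record layout: a `WARC/1.0` version line, named header fields, a blank line, the payload, and two trailing CRLF pairs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration class; not part of HBase or Hadoop.
public class WarcRecordSketch {

    private static final String CRLF = "\r\n";

    /** Serializes one uncompressed record: version line, headers, blank line, payload, CRLF CRLF. */
    public static byte[] buildRecord(String type, String recordId, String date,
                                     String targetUri, byte[] payload) throws IOException {
        String header = "WARC/1.0" + CRLF
            + "WARC-Type: " + type + CRLF
            + "WARC-Record-ID: " + recordId + CRLF
            + "WARC-Date: " + date + CRLF
            + "WARC-Target-URI: " + targetUri + CRLF
            + "Content-Length: " + payload.length + CRLF
            + CRLF; // blank line separates the headers from the payload
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(header.getBytes(StandardCharsets.UTF_8));
        out.write(payload);
        out.write((CRLF + CRLF).getBytes(StandardCharsets.UTF_8)); // record terminator
        return out.toByteArray();
    }

    /** Compresses the given bytes as a single gzip member. */
    public static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(data);
        }
        return buf.toByteArray();
    }

    /** Decompresses gzip data back into the original bytes. */
    public static byte[] gunzip(byte[] data) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gz.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);
        byte[] record = buildRecord(
            "resource",
            "<urn:uuid:00000000-0000-0000-0000-000000000000>", // placeholder ID
            "2024-01-01T00:00:00Z",
            "http://example.com/",                              // placeholder URI
            payload);
        byte[] compressed = gzip(record);
        System.out.println("raw bytes: " + record.length
            + ", gzip bytes: " + compressed.length);
    }
}
```

In a real job none of this is written by hand: the `RecordWriter` returned by `getRecordWriter` handles serialization, and `FileOutputFormat.setCompressOutput(job, true)` controls whether the output is compressed.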