Class BloomFilterUtil

java.lang.Object
org.apache.hadoop.hbase.util.BloomFilterUtil

@Private public final class BloomFilterUtil extends Object
Utility methods related to BloomFilters
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    static final byte[]
    Bit-value lookup array to prevent doing the same work over and over
    static final double
    Used in computing the optimal Bloom filter size.
    static final String
     
    private static Random
    A random number generator to use for "fake lookups" when testing to estimate the ideal false positive rate.
    static final String
    Record separator for the Bloom filter statistics human-readable string
  • Constructor Summary

    Constructors
    Modifier
    Constructor
    Description
    private
    Private constructor to keep this class from being instantiated.
  • Method Summary

    Modifier and Type
    Method
    Description
    static double
    actualErrorRate(long maxKeys, long bitSize, int functionCount)
    Computes the actual error rate for the given number of elements, number of bits, and number of hash functions.
    (package private) static boolean
    checkBit(int pos, ByteBuff bloomBuf, int bloomOffset)
    Check if bit at specified index is 1.
    static long
    computeBitSize(long maxKeys, double errorRate)
     
    static int
    computeFoldableByteSize(long bitSize, int foldFactor)
    Increases the given byte size of a Bloom filter until it can be folded by the given factor.
    static long
    computeMaxKeys(long bitSize, double errorRate, int hashCount)
    The maximum number of keys we can put into a Bloom filter of a certain size to get the given error rate, with the given number of hash functions.
    static boolean
    contains(byte[] buf, int offset, int length, ByteBuff bloomBuf, int bloomOffset, int bloomSize, Hash hash, int hashCount)
     
    static boolean
    contains(Cell cell, ByteBuff bloomBuf, int bloomOffset, int bloomSize, Hash hash, int hashCount, BloomType type)
     
    private static <T> boolean
    contains(ByteBuff bloomBuf, int bloomOffset, int bloomSize, Hash hash, int hashCount, HashKey<T> hashKey)
     
    createBySize(int byteSizeHint, double errorRate, int hashType, int foldFactor, BloomType bloomType)
    Creates a Bloom filter chunk of the given size.
    static String
    A human-readable string with statistics for the given Bloom filter.
    static byte[]
    getBloomFilterParam(BloomType bloomFilterType, org.apache.hadoop.conf.Configuration conf)
     
    static long
    idealMaxKeys(long bitSize, double errorRate)
    The maximum number of keys we can put into a Bloom filter of a certain size to maintain the given error rate, assuming the number of hash functions is chosen optimally and does not even have to be an integer (hence the "ideal" in the function name).
    static int
    optimalFunctionCount(int maxKeys, long bitSize)
     
    static void
    Sets a random generator to be used for look-ups instead of computing hashes.
    static String
     

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • STATS_RECORD_SEP

      public static final String STATS_RECORD_SEP
      Record separator for the Bloom filter statistics human-readable string
      See Also:
    • LOG2_SQUARED

      public static final double LOG2_SQUARED
      Used in computing the optimal Bloom filter size. This approximately equals 0.480453.
    • randomGeneratorForTest

      A random number generator to use for "fake lookups" when testing to estimate the ideal false positive rate.
    • PREFIX_LENGTH_KEY

      public static final String PREFIX_LENGTH_KEY
      See Also:
    • bitvals

      public static final byte[] bitvals
      Bit-value lookup array to prevent doing the same work over and over
  • Constructor Details

    • BloomFilterUtil

      private BloomFilterUtil()
      Private constructor to keep this class from being instantiated.
  • Method Details

    • computeBitSize

      public static long computeBitSize(long maxKeys, double errorRate)
      Returns:
      the number of bits for a Bloom filter than can hold the given number of keys and provide the given error rate, assuming that the optimal number of hash functions is used and it does not have to be an integer.
    • setRandomGeneratorForTest

      public static void setRandomGeneratorForTest(Random random)
      Sets a random generator to be used for look-ups instead of computing hashes. Can be used to simulate uniformity of accesses better in a test environment. Should not be set in a real environment where correctness matters!

      This gets used in contains(ByteBuff, int, int, Hash, int, HashKey)

      Parameters:
      random - The random number source to use, or null to compute actual hashes
    • idealMaxKeys

      public static long idealMaxKeys(long bitSize, double errorRate)
      The maximum number of keys we can put into a Bloom filter of a certain size to maintain the given error rate, assuming the number of hash functions is chosen optimally and does not even have to be an integer (hence the "ideal" in the function name).
      Returns:
      maximum number of keys that can be inserted into the Bloom filter
      See Also:
    • computeMaxKeys

      public static long computeMaxKeys(long bitSize, double errorRate, int hashCount)
      The maximum number of keys we can put into a Bloom filter of a certain size to get the given error rate, with the given number of hash functions.
      Returns:
      the maximum number of keys that can be inserted in a Bloom filter to maintain the target error rate, if the number of hash functions is provided.
    • actualErrorRate

      public static double actualErrorRate(long maxKeys, long bitSize, int functionCount)
      Computes the actual error rate for the given number of elements, number of bits, and number of hash functions. Taken directly from the Wikipedia Bloom filter article.
      Returns:
      the actual error rate
    • computeFoldableByteSize

      public static int computeFoldableByteSize(long bitSize, int foldFactor)
      Increases the given byte size of a Bloom filter until it can be folded by the given factor.
      Returns:
      Foldable byte size
    • optimalFunctionCount

      public static int optimalFunctionCount(int maxKeys, long bitSize)
    • createBySize

      public static BloomFilterChunk createBySize(int byteSizeHint, double errorRate, int hashType, int foldFactor, BloomType bloomType)
      Creates a Bloom filter chunk of the given size.
      Parameters:
      byteSizeHint - the desired number of bytes for the Bloom filter bit array. Will be increased so that folding is possible.
      errorRate - target false positive rate of the Bloom filter
      hashType - Bloom filter hash function type
      Returns:
      the new Bloom filter of the desired size
    • contains

      public static boolean contains(byte[] buf, int offset, int length, ByteBuff bloomBuf, int bloomOffset, int bloomSize, Hash hash, int hashCount)
    • contains

      private static <T> boolean contains(ByteBuff bloomBuf, int bloomOffset, int bloomSize, Hash hash, int hashCount, HashKey<T> hashKey)
    • contains

      public static boolean contains(Cell cell, ByteBuff bloomBuf, int bloomOffset, int bloomSize, Hash hash, int hashCount, BloomType type)
    • checkBit

      static boolean checkBit(int pos, ByteBuff bloomBuf, int bloomOffset)
      Check if bit at specified index is 1.
      Parameters:
      pos - index of bit
      Returns:
      true if bit at specified index is 1, false if 0.
    • formatStats

      public static String formatStats(BloomFilterBase bloomFilter)
      A human-readable string with statistics for the given Bloom filter.
      Parameters:
      bloomFilter - the Bloom filter to output statistics for;
      Returns:
      a string consisting of "<key>: <value>" parts separated by STATS_RECORD_SEP.
    • toString

      public static String toString(BloomFilterChunk bloomFilter)
    • getBloomFilterParam

      public static byte[] getBloomFilterParam(BloomType bloomFilterType, org.apache.hadoop.conf.Configuration conf) throws IllegalArgumentException
      Throws:
      IllegalArgumentException