diff options
| author | Charles.Forsyth <devnull@localhost> | 2006-12-22 20:52:35 +0000 |
|---|---|---|
| committer | Charles.Forsyth <devnull@localhost> | 2006-12-22 20:52:35 +0000 |
| commit | 46439007cf417cbd9ac8049bb4122c890097a0fa (patch) | |
| tree | 6fdb25e5f3a2b6d5657eb23b35774b631d4d97e4 /man/2/bloomfilter | |
| parent | 37da2899f40661e3e9631e497da8dc59b971cbd0 (diff) | |
20060303-partial
Diffstat (limited to 'man/2/bloomfilter')
| -rw-r--r-- | man/2/bloomfilter | 89 |
1 files changed, 89 insertions, 0 deletions
diff --git a/man/2/bloomfilter b/man/2/bloomfilter new file mode 100644 index 00000000..fe6046dd --- /dev/null +++ b/man/2/bloomfilter @@ -0,0 +1,89 @@ +.TH BLOOMFILTER 2 +.SH NAME +Bloomfilter \- Bloom filters +.SH SYNOPSIS +.EX +include "sets.m"; +include "bloomfilter.m"; +bloomfilter := load Bloomfilter Bloomfilter->PATH; + +init: fn(); +filter: fn(d: array of byte, logm, k: int): Sets->Set; +.EE +.SH DESCRIPTION +A Bloom filter is a method of representing a set to support probabilistic +membership queries. It uses independent hash functions of members +of the set to set elements of a bit-vector. +.I Init +should be called first to initialise the module. +.I Filter +returns a Set +.I s +representing the Bloom filter for the single-member +set +.RI { d }. +.I K +independent hash functions are used, each of range +.RI "[0, 2^" logm ), +to return a Bloom filter +.RI 2^ logm +bits wide. It is an error if +.I logm +is less than 3 or greater than 30. +.PP +Bloom filters can be combined by set union. +The set represented by Bloom filter +.I a +is not a subset of another +.I b +if there are any members in +.I a +that are not in +.IR b . +Together, +.IR logm , +.IR k , +and +.IR n +(the number of members in the set) +determine the +.I "false positve" rate +(the probability that a membership test will not eliminate +a member that is not in fact in the set). +The probability of a +.I "false positive" +is approximately (1-e^(-\fIkn\fP/(2^\fIlogm\fP))^\fIk\fP. +For a given false positive rate, +.IR f , +a useful formula to determine appropriate parameters +is: \fIk\fP=ceil(-log₂(\fIf\fP)), +and \fIlogm\fP=ceil(log₂(\fInk\fP)). +.SH EXAMPLES +Create a 128 bit-wide bloom filter +.I f +representing all the elements +in the string array +.IR elems , +with +.IR k =6. +.EX + A, B, None: import Sets; + for(i:=0; i<len elems; i++) + f = f.X(A|B, filter(array of byte elems[i], 7, 6)); +.EE +Test whether the string +.I s +is a member of +.IR f . +If there were 12 elements in +.IR elems , +the probability of a false positive would be +approximately 0.0063. +.EX + if(filter(array of byte s, 7, 6).X(A&~B, f).eq(None)) + sys->print("'%s' might be a member of f\\n", s); +.EE +.SH SOURCE +.B /appl/lib/bloomfilter.b +.SH SEE ALSO +.IR sets (2) |
