A probabilistic algorithm uses the outcome of a random process to influence execution, which can be helpful for certain kinds of predictions
Why would this ever be useful? Sometimes calculating the best choice takes too much time, and estimating it always introduces bias
For example, determining information about individuals in a large population can be done quicker if we use random sampling
Definition: A probabilistic Turing machine is a type of nondeterministic Turing machine in which each nondeterministic step is called a coin-flip step and has two legal next moves
We assign a probability to each branch of ‘s computation on input as where is the number of coin-flip steps that occur on branch
Then,
Definition: decides language A with probability if and
We also consider error probability bounds as a function of , like which is exponentially small error
We measure time and space complexity as with any nondeterministic Turing machine, taking the worst case computation branch on each input
Definition: is the class of languages decidable by probabilistic polynomial time Turing machines with an error probability of
This class is equivalent for any by virtue of the amplification lemma
Lemma: Let be a fixed constant with , then for any polynomial , a probabilistic Turing machine with error has an equivalent machine with error
The gist of this result is we can simply repeat the same computation times and take the majority result, which reduces the error and ultimately still runs in polynomial time
Primality
While a polynomial time algorithm for testing primality exists, the most practical algorithms are probabilistic
The most obvious way to test primality is by testing for factors, but this is not known to be possible in probabilistic polynomial time
Definition: For any greater than 1, we say that two numbers are equivalent modulo p if they differ by a multiple of , in which case we write
Every number is equivalent modulo to some member of the set , and for convenient we also write
The main idea of this algorithm uses Fermat’s little theorem
Theorem: If is prime and , then
We use this to derive a Fermat test, where we say passes the Fermat test at if
Definition: A number is pseudoprime if it passes Fermat tests for all smaller a’s relatively prime to it
The pseudoprime numbers are identical to the prime numbers with the exception of the infrequent Carmichael numbers
“On input :
- Select randomly in
- Compute for each
- Accept if all computed values are 1”
Because a number that is not pseudoprime it must fail at least half of the tests (not shown here), the probability of a false positive is at most
Modular exponentiation is computable in polynomial time, so this is a probabilistic polynomial time algorithm
To convert this to a full primality algorithm, we use the principle that the number 1 has exactly two square roots, 1 and -1, modulo any prime , but will always have four or more for many composite numbers, including all Carmichael numbers
Because of this, if passes the Fermat test at , , and we can calculate the square root of by dividing the degree of the exponent by 2 until we get a different number (starting with )
“On input :
- If is even, accept if and reject otherwise
- Select randomly in
- For each from 1 to :
- Compute and reject if different from 1
- Let where is odd and compute
- If some element of this sequence is not 1, find the last element that is not 1 and reject if that element is not
- If the test has not rejected at this point, accept”
Let’s demonstrate that this works as intended, and the maximum error probability is
We say is a (compositeness) witness if either test 1 or 2 rejects (stage 4 and 6 respectively)
First, we see that if is an odd prime number, then
cannot be a test 1 witness because of Fermat’s little theorem
cannot be a test 2 witness because this would imply some , which leads to , meaning cannot be prime
To prove the error bound, we use the Chinese remainder theorem, which says that if and are relatively prime, then each number corresponds to a unique pair such that and
We show that for a random by finding a unique witness for each nonwitness
In every nonwitness, the test 2 sequence consists of either all 1s or all 1s along with a -1 at some position
Among all nonwitnesses with a -1 in some test 2 position, we find the witness in which -1 appears in the largest position , such that
Because is composite, either is the power of a prime or we can write as the product of and relatively prime
The Chinese remainder theorem tells us that some number exists such that,
Therefore,
Since , we see that and , so is a witness
Now, note that for each nonwitness , and , due to the way we chose
Then, and , so is a witness
If and are distinct nonwitnesses, then , since if then
Thus we’ve proven our analysis in the case where is not a prime number
Otherwise with prime and , in which case we let
which is by definition equivalent to
Hence is a test 1 witness, because if , then
Then we obtain our other witnesses as above, and our bound is proven
Theorem:
This specific algorithm has a one-sided error, in the sense that rejections are absolute
This is a common feature, so there is a special complexity class
Definition: is the class of language decidable by probabilistic polynomial time Turing machines where inputs in the language are accepted with a probability of at least , and inputs not in the language are rejected with a probability of
Our primality algorithm shows that
Again, we can use a probability amplification technique on these algorithms
Read-Once Branching Programs
We examine branching programs as an interesting example of a problem that can apparently only be solved in polynomial time with probabilism
Definition: A branching program is a directed acyclic graph where all query nodes are labeled by variables with two outgoing edges labeled 0 and 1, except for two output nodes
A branching program determines a Boolean function on the values of the query nodes
Branching programs are related to the class in a way that is analogous to the relationship between Boolean circuits and the class
The problem of testing equivalence for branching programs is -complete
Here we consider a read-once branching program, which can query each variable at most one time on any execution path
Theorem: is in
Our algorithm works by assigning random values to the variables
Instead of Boolean values, we modify our scheme to handle non-Boolean assignments to the variables and allow the values to propagate through the graph, summing the edges flowing into a single node, such that the result is a polynomial
Let be a finite field with at least elements (the coefficient is arbitrary), where is the number of variables of the read-once branching programs we’re investigating
“On input :
- Select at random from
- Accept if ”
We show this decides with an error probability of at most
Because is read-once, we may write as a sum of product terms where each is or 1, and where each product term corresponds to a path in from the start node to the output node labeled 1
We can then rewrite into an equivalent polynomial such that no product terms contain , by repeatedly splitting terms into the sum of a term with and
The result is that contains a product term for each assignment on which evaluates to 1
If and are equivalent then the polynomials and must be the same, and therefore and are equal on every assignment
To show the probability that rejects nonequivalent branching programs, we need some results on polynomials
First, a degree- polynomial on a single variable either has at most roots or is everywhere equal to 0 (we can prove this simply by induction)
Lemma: Now let be a finite field with elements and let be a nonzero polynomial on , with degrees at most ; if are selected randomly in , then
We prove this inductively
For , has at most roots so the probability that is one of them is at most
Assume true for
Let be one of ‘s variables, and let be the polynomial comprising the terms of containing the terms of containing but where has been factored out
Then,
If then either all evaluate to 0 or some doesn’t and is a root of the single variable polynomial obtained by evaluating on
The probability that all evaluate to 0 is at most the probability that some evaluates to 0, which by our induction hypothesis is at most
If some doesn’t evaluate to 0, then this reduces to a nonzero polynomial in the single variable , which our base case shows has probability at most
Thus the probability that is a root of the polynomial is at most proving our lemma
This means the probability that our random assignment is a root of is at most , meaning rejects with a probability of at least
Pseudorandom Generators
It’s important to note that true randomness is generally difficult or impossible to obtain
Practical implementations use pseudorandom generators, which are deterministic algorithms whose output appears random
Pseudorandom generators can never be truly random, but may satisfy various statistical tests
Proving an algorithm works equally well with pseudorandom generators is difficult, and may not always hold
Pseudorandom generators have been devised that can produce results indistinguishable from truly random results by any test that operates in polynomial time, assuming that one-way functions exist