A parallel computer can perform multiple operations simultaneously, as opposed to sequential computers

We focus on massive parallelism

In theoretical work, the Parallel Random Access Machine (PRAM) is also used, where idealized processors with a simple instruction set interact via shared memory

Here, we look at a simpler model based on Boolean circuits

We take each gate to be an individual processor, and so define the processor complexity of a circuit to be its size, and the parallel time complexity to be its depth

As in our exploration of Circuit Complexity, we look at circuit families to handle variable-sized inputs

To make sure this corresponds to the standard PRAM computation model, we also impose a uniformity requirement, so that all circuits in a family can be easily obtained

Definition: A family of circuits $(C_1, C_2, \ldots)$ is uniform if some log space transducer $T$ outputs $\langle C_n \rangle$ when $T$'s input is $1^n$

We consider the simultaneous size-depth circuit complexity: a language has size-depth complexity $(f(n), g(n))$ if it is decided by a uniform circuit family with size $O(f(n))$ and depth $O(g(n))$

Example:
Consider the language over $\{0,1\}$ of all strings with an odd number of 1s

For two variables we can test membership with a standard XOR gate, which can be written with standard AND, OR, and NOT operations as $x \oplus y = (x \wedge \neg y) \vee (\neg x \wedge y)$

To generalize to $n$ variables, we could make a sequence of $n - 1$ XOR gates, which uses size-depth $(O(n), O(n))$

Or even better, we can construct a binary tree of XOR gates, for a size-depth complexity of $(O(n), O(\log n))$
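To make the tree construction concrete, here is a small Python sketch of my own (not part of the notes' construction): it evaluates parity with a balanced tree of two-input XOR gates and reports the number of gates and the depth it used.

```python
# A balanced tree of 2-input XOR gates over n inputs: O(n) gates, O(log n) depth.
def parity_tree(bits):
    """Evaluate parity with a balanced XOR tree; return (value, gates, depth)."""
    layer, gates, depth = list(bits), 0, 0
    while len(layer) > 1:
        nxt = []
        for i in range(0, len(layer) - 1, 2):
            nxt.append(layer[i] ^ layer[i + 1])  # one XOR gate
            gates += 1
        if len(layer) % 2 == 1:                  # odd leftover passes through
            nxt.append(layer[-1])
        layer = nxt
        depth += 1
    return layer[0], gates, depth

print(parity_tree([1, 0, 1, 1, 0, 0, 1, 0]))     # (0, 7, 3): 7 gates, depth 3
```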

Example:
The Boolean matrix multiplication function takes $2m^2$ variables representing two $m \times m$ matrices $A$ and $B$, and outputs $m^2$ values representing the product $C = AB$, where $c_{ik} = \bigvee_{j} (a_{ij} \wedge b_{jk})$

The circuit for this function has AND gates computing each $a_{ij} \wedge b_{jk}$, and for every $c_{ik}$ a binary tree of OR gates to compute the outer expression

Each tree has $m - 1$ OR gates with $O(\log m)$ depth, so this circuit has size $O(m^3)$ and depth $O(\log m)$
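As a rough illustration (a plain Python evaluation of the same function, not the circuit itself), the following computes each entry as an OR of ANDs; in the circuit, the $m$ AND gates per entry and the OR tree over them give the size and depth stated above.

```python
# Boolean matrix product: c_ik = OR_j (a_ij AND b_jk).
def bool_matmul(A, B):
    m = len(A)
    return [[int(any(A[i][j] and B[j][k] for j in range(m))) for k in range(m)]
            for i in range(m)]

A = [[1, 0],
     [1, 1]]
B = [[0, 1],
     [1, 0]]
print(bool_matmul(A, B))   # [[0, 1], [1, 1]]
```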

We extend this example to calculate the transitive closure of $A$, the matrix $A^* = I \vee A \vee A^2 \vee \cdots \vee A^m$

This matrix is closely related to the PATH problem (the canonical NL-complete problem), where we view $A$ as an adjacency matrix of a directed graph

The computation of $A^i$ can be represented with a binary tree of matrix multiplications of size $O(i)$ and depth $O(\log i)$, hence computing $A^i$ takes size $O(i \cdot m^3) \le O(m^4)$ and depth $O(\log i \cdot \log m) \le O(\log^2 m)$

We make circuits for each $A^i$ with $1 \le i \le m$, which adds another factor of $m$ to the size, and ORing the results together adds an additional $O(\log m)$ layer of depth (but not a factor)

Hence the size-depth complexity of transitive closure is $(O(m^5), O(\log^2 m))$
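Here is a small Python sketch of the same computation (again just an evaluation; the circuit's size and depth are only what the text above argues): it ORs together $I, A, A^2, \ldots, A^m$, repeating the Boolean product from the previous sketch so it runs on its own.

```python
# Transitive closure A* = I v A v A^2 v ... v A^m via Boolean matrix products.
def bool_matmul(A, B):
    m = len(A)
    return [[int(any(A[i][j] and B[j][k] for j in range(m))) for k in range(m)]
            for i in range(m)]

def transitive_closure(A):
    m = len(A)
    result = [[int(i == k) for k in range(m)] for i in range(m)]   # start with I
    power = result
    for _ in range(m):
        power = bool_matmul(power, A)             # next power of A
        result = [[result[i][k] | power[i][k] for k in range(m)]
                  for i in range(m)]              # OR it into the closure
    return result

# Edges 0->1 and 1->2, so the closure also records the path 0->2.
print(transitive_closure([[0, 1, 0], [0, 0, 1], [0, 0, 0]]))
# [[1, 1, 1], [0, 1, 1], [0, 0, 1]]
```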

It turns out that many interesting problems have size-depth complexity $(O(n^k), O(\log^k n))$ for some constant $k$, and can therefore be considered highly parallelizable

Definition: Let $\mathrm{NC}^k$ be the class of languages that can be decided by a uniform family of circuits with polynomial size and $O(\log^k n)$ depth

$\mathrm{NC}$ is the class of languages in $\mathrm{NC}^k$ for some $k$
Functions computed by such families are called NC computable

NC stands for Nick’s class (after Nick Pippenger)

A small note is that this is not quite the standard definition for NC

Theorem: $\mathrm{NC}^1 \subseteq \mathrm{L}$

On input $w$ of length $n$, the algorithm can construct the $n$th circuit $C_n$ as needed (by rerunning the transducer) and evaluate the circuit with a depth-first search from the output gate

Since the circuit depth is $O(\log n)$, we can store the current path from the output gate and the partial results along it in logarithmic space
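A toy illustration of the evaluation order (ignoring how the gates are obtained from the transducer, and using a made-up dictionary encoding of the circuit): a recursive depth-first evaluation whose stack holds one gate and one partial result per level, so its height equals the circuit depth.

```python
# Depth-first circuit evaluation; recursion depth equals circuit depth.
def evaluate(circuit, gate, x):
    if isinstance(gate, int):                    # leaf: input variable x_i
        return x[gate]
    op, kids = circuit[gate]
    vals = [evaluate(circuit, k, x) for k in kids]
    if op == "AND":
        return int(all(vals))
    if op == "OR":
        return int(any(vals))
    return int(not vals[0])                      # "NOT"

# (x0 AND x1) OR (NOT x2), evaluated on input 1, 1, 0
circuit = {"g1": ("AND", [0, 1]), "g2": ("NOT", [2]), "out": ("OR", ["g1", "g2"])}
print(evaluate(circuit, "out", [1, 1, 0]))       # 1
```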

Theorem: $\mathrm{NL} \subseteq \mathrm{NC}^2$

Let $A$ be a language (encoded into $\{0,1\}$) decided by an NL machine $M$

We construct a uniform circuit family for $A$

Note that a configuration of $M$ on $w$ in a log space machine describes the state, the contents of the work tape, and the positions of both the input and the work tape heads, but not $w$ itself

This means that there are polynomially many configurations, and we can make the edges of the configuration graph match a computation on $w$ by looking at the transition function and the symbol of $w$ under the input head

Thus, we can output a circuit that computes the transitive closure of a graph that matches the computation graph for $w$ and then outputs the entry indicating the presence of a path from the start configuration to the accepting configuration

This circuit has polynomial size and $O(\log^2 n)$ depth (as explained in our earlier transitive closure example)

That this can be done by a log space transducer is a bit unintuitive, but keep in mind we can be flexible with the output representation of a circuit
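As a toy sketch of what the circuit computes (reusing transitive_closure from the earlier sketch, and simply numbering configurations, which glosses over the real encoding): accept exactly when the closure of the configuration graph has a 1 in the (start, accept) entry.

```python
# Uses transitive_closure from the earlier sketch; configurations are numbered.
def nl_accepts(config_graph, start, accept):
    return transitive_closure(config_graph)[start][accept] == 1

# Configurations 0 -> 1 -> 2, where 2 is the accepting configuration.
print(nl_accepts([[0, 1, 0], [0, 0, 1], [0, 0, 0]], 0, 2))   # True
```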

Theorem: $\mathrm{NC} \subseteq \mathrm{P}$

A polynomial time algorithm can just run the log space transducer to generate $C_n$ and then simulate it on an input $w$ of length $n$
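A sketch of the simulation step (the transducer itself is assumed, and the gate-list format is made up): evaluating every gate once, in topological order, takes time polynomial in the circuit size.

```python
# Evaluate every gate once, in topological order: time polynomial in the size.
def evaluate_in_order(gates, x):
    """gates: (name, op, children) triples in topological order; children are
    gate names or integer indices into the input x."""
    value = {i: v for i, v in enumerate(x)}      # input wires x_0, x_1, ...
    for name, op, kids in gates:
        vals = [value[k] for k in kids]
        value[name] = (int(all(vals)) if op == "AND"
                       else int(any(vals)) if op == "OR"
                       else int(not vals[0]))    # "NOT"
    return value[gates[-1][0]]                   # last gate is the output

# Same circuit as in the earlier DFS sketch: (x0 AND x1) OR (NOT x2) on 1, 1, 0
gates = [("g1", "AND", [0, 1]), ("g2", "NOT", [2]), ("out", "OR", ["g1", "g2"])]
print(evaluate_in_order(gates, [1, 1, 0]))       # 1
```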

$\mathrm{P} = \mathrm{NC}$ would imply all polynomial time solvable problems are highly parallelizable, which seems unlikely

Definition: A language $B$ is P-complete if $B \in \mathrm{P}$ and every $A$ in $\mathrm{P}$ is log space reducible to $B$

Theorem: If $A \leq_L B$ and $B \in \mathrm{NC}$, then $A \in \mathrm{NC}$

This follows because NC circuit families can compute log space reductions, so we can compose the reduction's circuit with the circuit for $B$

Theorem: CIRCUIT-VALUE (given a circuit $C$ and an input $x$, decide whether $C(x) = 1$) is P-complete

We’ve already seen how to reduce a language to a circuit when we first introduced circuit complexity

We reduce a language $A$ in $\mathrm{P}$ on input $w$ by producing a circuit $C$ simulating the polynomial time machine for $A$, and passing in that circuit and $w$ to CIRCUIT-VALUE

We can do this reduction in log space because the circuit has a simple and repetitive structure, so CIRCUIT-VALUE is P-complete