NP-Completeness

NP-completeness is an important advance on the question of P vs NP

Certain problems in $NP$ can be formally linked to the rest of $NP$, meaning a polynomial time solution to any one of these problems implies a polynomial time solution to every problem in $NP$

This is very important practically, because it gives us a means of showing that a problem is probably unsolvable in polynomial time

The theory of NP-completeness starts with the satisfiability problem

A Boolean formula is an expression involving Boolean variables and operations, and is satisfiable if some assignment of variables makes the formula evaluate to $1$

$SAT = \{ \langle ϕ \rangle \mid ϕ \text{ is a satisfiable Boolean formula} \}$

Theorem: $SAT \in P \iff P = NP$

How is this true? We use the concept of reducibility to prove this theorem

Definition: A function $f : Σ^{*} \to Σ^{*}$ is a polynomial time computable function if some polynomial time Turing machine $M$ exists that halts with just $f (w)$ on its tape, when started on any input $w$

Definition: Language $A$ is polynomial time (mapping) reducible to language $B$, written $A \leq_{p} B$, if a polynomial time computable function $f$ exists where, for every $w$, $w \in A \iff f(w) \in B$

We call $f$ a polynomial time reduction

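This is the polynomial time analog of mapping reducibility, and it’s useful because it gives a shortcut for proving membership in $P$ (and therefore in $NP$)

Theorem: If $A \leq_{p} B$ and $B \in P$, then $A \in P$

To decide $A$ on input $w$, we compute $f(w)$ and run the polynomial time decider for $B$ on it; the composition of two polynomials is still a polynomial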

As an example, we first introduce a slight variation on the satisfiability problem

A literal is a Boolean variable or a negated Boolean variable, a clause is several literals connected with $\lor$s, and a Boolean formula is in conjunctive normal form (called a cnf-formula) if it consists of clauses connected with $\land$s

A cnf-formula is called a 3cnf-formula if all clauses have three literals
$3SAT = \{ \langle ϕ \rangle \mid ϕ \text{ is a satisfiable 3cnf-formula} \}$

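Theorem: $3SAT \leq_{p} CLIQUE$

We convert a 3cnf-formula with $k$ clauses into a clique problem by creating a node for each literal occurrence in each clause and adding an edge between two nodes if they are in different clauses and are not negations of each other; then, we test for a $k$-clique (a set of $k$ nodes that are pairwise connected)

A $k$-clique must pick one literal from each clause with no two picks contradicting each other, which is exactly what a satisfying assignment provides

This tells us that if $CLIQUE$ is solvable in polynomial time then so is $3SAT$, even though they are quite different looking problems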
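The construction above can be sketched in a few lines. This is only an illustration under assumptions that are not in the original notes: a clause is a tuple of nonzero integers (`k` stands for $x_{k}$ and `-k` for its negation), and the function names `three_sat_to_clique` and `has_clique` are hypothetical. The clique test is brute force and exponential, so it is only meant for checking tiny examples:

```python
from itertools import combinations


def three_sat_to_clique(clauses):
    """Build the clique instance (nodes, edges, k) for a 3cnf-formula.

    A node is (clause index, position in clause), one per literal occurrence.
    """
    nodes = [(c, p) for c, clause in enumerate(clauses) for p in range(len(clause))]
    edges = set()
    for (c1, p1), (c2, p2) in combinations(nodes, 2):
        different_clause = c1 != c2
        contradictory = clauses[c1][p1] == -clauses[c2][p2]
        if different_clause and not contradictory:
            edges.add(frozenset({(c1, p1), (c2, p2)}))
    return nodes, edges, len(clauses)


def has_clique(nodes, edges, k):
    """Brute-force test for a k-clique (exponential time, tiny inputs only)."""
    return any(
        all(frozenset(pair) in edges for pair in combinations(group, 2))
        for group in combinations(nodes, k)
    )
```

The reduction itself runs in polynomial time (it looks at every pair of literal occurrences once); only the brute-force check is expensive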

Definition: A language $B$ is NP-complete if $B$ is in $NP$ and every $A$ in $NP$ is polynomial time reducible to $B$

If $B$ is $NP$-complete and $B \in P$, then $P = NP$
If $B$ is $NP$-complete and $B \leq_{p} C$ for $C$ in $NP$, then $C$ is $NP$-complete

This is fascinating, but requires us to first establish a single $NP$-complete problem before we can connect other problems to it

Theorem: $SAT$ is $NP$-complete

This is not a simple proof, since it requires showing that any language in $NP$ is polynomial time reducible to $SAT$, a fact that looks ridiculous at first glance (at least to me), but makes more sense when you consider that Boolean operations are the basis for electronic circuitry

The reduction is fairly simple conceptually but requires dealing with many details

Proof:
To start, $SAT$ is in $NP$, since a satisfying assignment is a certificate that can be checked in polynomial time
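Membership in $NP$ rests on being able to check a proposed assignment quickly. A minimal sketch of such a check, assuming a hypothetical encoding of formulas as nested tuples (`("var", name)`, `("not", f)`, `("and", f, g, ...)`, `("or", f, g, ...)`) that is not part of the original notes:

```python
def evaluate(formula, assignment):
    """Evaluate a Boolean formula under an assignment (dict: name -> bool).

    Each node of the formula is visited once, so the check takes time
    linear in the size of the formula.
    """
    op = formula[0]
    if op == "var":
        return assignment[formula[1]]
    if op == "not":
        return not evaluate(formula[1], assignment)
    if op == "and":
        return all(evaluate(sub, assignment) for sub in formula[1:])
    if op == "or":
        return any(evaluate(sub, assignment) for sub in formula[1:])
    raise ValueError(f"unknown operator: {op!r}")
```

For example, `evaluate(("or", ("var", "x"), ("not", ("var", "x"))), {"x": False})` checks the formula $x \lor \overline{x}$ under the assignment $x = 0$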

Let $A$ be any language in $NP$, and let $N$ be a nondeterministic Turing machine that decides $A$ in $n^{k}$ time
A tableau for $N$ on $w$ is an $n^{k} \times n^{k}$ table whose rows are the configurations of a branch of $N$ on $w$

Each row starts and ends with $\#$, and we write the machine’s state at the head’s location on the tape, offsetting the rest of the tape (to make room for these extra symbols, we technically assume the machine decides $A$ in $n^{k} - 3$ time)

A tableau is accepting if any row of the tableau is an accepting configuration, so the problem of determining whether $N$ accepts $w$ amounts to determining if an accepting tableau for $N$ on $w$ exists

Let $C = Q \cup Γ \cup \{\#\}$, where $Q$ is the state set and $Γ$ is the tape alphabet
For each $i, j \in [1, n^{k}]$ and $s \in C$, we have a variable $x_{i, j, s}$
Each of the $(n^{k})^{2}$ entries of the tableau is called a cell, and the variable $x_{i, j, s}$ takes on the value $1$ exactly when the cell in row $i$ and column $j$ contains the symbol $s$

Let’s now produce a formula $ϕ$ that is satisfiable exactly when an accepting tableau for $N$ on $w$ exists, which will take the form $ϕ = ϕ_{cell} \land ϕ_{start} \land ϕ_{move} \land ϕ_{accept}$

We first guarantee that each cell has a single value set, through the expression $ϕ_{cell} = \bigwedge_{1 \leq i, j \leq n^{k}} \left[ \left( \bigvee_{s \in C} x_{i, j, s} \right) \land \left( \bigwedge_{s, t \in C, \, s \neq t} \left( \overline{x_{i, j, s}} \lor \overline{x_{i, j, t}} \right) \right) \right]$
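To get a feel for the size of $ϕ_{cell}$, here is a sketch that lists its clauses for a small table. The representation is an assumption made for illustration: a variable is the tuple `(i, j, s)` and a literal is a pair `(variable, polarity)`; the name `phi_cell` is hypothetical. It enumerates unordered pairs $s \neq t$, which is equivalent to the ordered pairs in the formula:

```python
from itertools import combinations


def phi_cell(size, symbols):
    """List the clauses forcing every cell of a size x size table to hold
    exactly one symbol from `symbols`."""
    clauses = []
    for i in range(1, size + 1):
        for j in range(1, size + 1):
            # At least one symbol is turned on in cell (i, j).
            clauses.append([((i, j, s), True) for s in symbols])
            # No two distinct symbols are turned on in cell (i, j).
            for s, t in combinations(symbols, 2):
                clauses.append([((i, j, s), False), ((i, j, t), False)])
    return clauses
```

The number of clauses per cell depends only on $|C|$, not on $n$, so the whole of $ϕ_{cell}$ grows in proportion to the number of cells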

Then, we explicitly require the first row to be the starting configuration with $ϕ_{start} = x_{1, 1, \#} \land x_{1, 2, q_{0}} \land x_{1, 3, w_{1}} \land \dots \land x_{1, n + 2, w_{n}} \land x_{1, n + 3, \sqcup} \land \dots \land x_{1, n^{k} - 1, \sqcup} \land x_{1, n^{k}, \#}$
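The index bookkeeping in $ϕ_{start}$ is easy to get wrong, so here is a sketch that lists the variables it forces to $1$. The function name `phi_start`, the state name `"q0"`, and the use of `"_"` for the blank symbol are assumptions for illustration; `size` plays the role of $n^{k}$:

```python
def phi_start(w, size, start_state="q0", blank="_"):
    """Return the variables (1, j, s) that must be 1 in the first row."""
    n = len(w)
    if size < n + 3:
        raise ValueError("row is too short to hold the start configuration")
    # Row layout: '#', start state, the input, blanks, '#'.
    row = ["#", start_state, *w] + [blank] * (size - n - 3) + ["#"]
    return [(1, j, s) for j, s in enumerate(row, start=1)]
```

With `w = "ab"` and `size = 6`, the input occupies columns $3$ and $4$, a single blank sits in column $5$, and the closing $\#$ is in column $6$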

We require an accepting configuration to appear somewhere in the tableau with $ϕ_{accept} = \bigvee_{1 \leq i, j \leq n^{k}} x_{i, j, q_{accept}}$

Our final formula $ϕ_{move}$ is the most complex, guaranteeing that configurations are legal by scanning through the cells with a $2 \times 3$ window, which is enough context to guarantee transitions are valid

If the top row of the tableau is the start configuration and every window in the tableau is legal, then each row of the tableau is a configuration that legally follows the preceding one

We omit the precise definition of a legal window and write $ϕ_{move} = \bigwedge_{1 \leq i < n^{k}, \, 1 < j < n^{k}} \left( \bigvee_{a_{1}, \dots, a_{6} \text{ is a legal window}} \left( x_{i, j - 1, a_{1}} \land \dots \land x_{i + 1, j + 1, a_{6}} \right) \right)$

“$a_{1}, \dots, a_{6}$ is a legal window” is a well-defined condition that depends only on the Turing machine’s transition function, not on the input

The total size of $ϕ$ is $O(n^{2 k})$, which is polynomial in $n$, so we can construct $ϕ$ in polynomial time

Thus, we can convert the question of whether $N$ accepts $w$ into a $SAT$ instance in polynomial time, which proves the Cook-Levin theorem, showing $SAT$ is $NP$-complete

Corollary: $3SAT$ is $NP$-complete

We note that we can use some Boolean algebra to produce the $ϕ$ from the $SAT$ proof in 3cnf, by distributing when necessary and splitting larger clauses like $(a_{1} \lor a_{2} \lor a_{3} \lor a_{4})$ into $(a_{1} \lor a_{2} \lor z) \land (\overline{z} \lor a_{3} \lor a_{4})$, where $z$ is a new variable
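The clause-splitting step generalizes to clauses of any length by repeating it. A sketch, assuming clauses are tuples of nonzero integers (`k` for a variable, `-k` for its negation) and that `next_var` is the first index not yet used by the formula; the function name is hypothetical, and clauses with fewer than three literals are returned unchanged rather than padded:

```python
def split_clause(clause, next_var):
    """Split one long clause into an equisatisfiable list of 3-literal clauses.

    Returns (clauses, next unused variable index).
    """
    remaining = list(clause)
    out = []
    while len(remaining) > 3:
        z = next_var  # fresh variable linking this clause to the next one
        next_var += 1
        out.append((remaining[0], remaining[1], z))
        remaining = [-z] + remaining[2:]
    out.append(tuple(remaining))
    return out, next_var
```

Each pass through the loop removes one literal from the remaining clause, so a clause with $m > 3$ literals becomes $m - 2$ clauses using $m - 3$ new variables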

NP-Complete Problems

For reasons not well understood, most naturally occurring problems in $NP$ are either in $P$ or $NP$-complete

Proving a problem is $NP$-complete is useful because it tells you that no polynomial time algorithm exists unless $P = NP$, so for all intents and purposes you can stop looking for one

To do this, we look for structures in the target problem, sometimes called gadgets, that can simulate the variables and clauses in Boolean formulas

Since $CLIQUE$ is in $NP$, our previous theorem $3SAT \leq_{p} CLIQUE$ now tells us $CLIQUE$ is $NP$-complete

A vertex cover of an undirected graph $G$ is a subset of the nodes where every edge of $G$ touches at least one of those nodes

Theorem: $VERTEX\text{-}COVER = \{ \langle G, k \rangle \mid G \text{ is an undirected graph that has a } k\text{-node vertex cover} \}$ is $NP$-complete

We give a reduction from $3SAT$ that converts a 3cnf-formula $ϕ$ into a graph $G$ and a number $k$

For each variable $x$ in $ϕ$, we produce an edge connecting two nodes, labeled $x$ and $\overline{x}$

For each clause in $ϕ$, we produce a node for each literal in the clause, all connected to each other, and each connected to the variable node with the same label as specified above

$G$ has $2 m + 3 l$ nodes, where $ϕ$ has $m$ variables and $l$ clauses, and we let $k = m + 2 l$

We create a cover from a satisfying assignment of $ϕ$ by taking the variable nodes labeled with the true literals, and, in each clause triple, picking one true literal to exclude and taking the other two nodes

A little thought shows that we can also do the inverse, so $3SAT$ polynomially reduces to $VERTEX\text{-}COVER$, and $VERTEX\text{-}COVER$ is therefore $NP$-complete
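A sketch of this reduction, together with the cover built from a satisfying assignment. The encoding is assumed for illustration and is not from the original notes: clauses are 3-tuples of nonzero integers (`k` for $x_{k}$, `-k` for its negation), variable nodes are `("var", literal)`, clause nodes are `("clause", c, p)`, and both function names are hypothetical:

```python
def three_sat_to_vertex_cover(clauses, num_vars):
    """Return (nodes, edges, k) for the vertex cover instance."""
    edges = set()
    # Variable gadgets: one edge joining x and its negation.
    for v in range(1, num_vars + 1):
        edges.add((("var", v), ("var", -v)))
    # Clause gadgets: a triangle, each corner tied to its literal's node.
    for c, clause in enumerate(clauses):
        corners = [("clause", c, p) for p in range(3)]
        edges.update({(corners[0], corners[1]),
                      (corners[0], corners[2]),
                      (corners[1], corners[2])})
        for p, lit in enumerate(clause):
            edges.add((corners[p], ("var", lit)))
    nodes = {u for edge in edges for u in edge}
    return nodes, edges, num_vars + 2 * len(clauses)


def cover_from_assignment(clauses, assignment):
    """Build a cover of size m + 2l from a satisfying assignment
    (dict: variable index -> bool). Fails if a clause is unsatisfied."""
    cover = {("var", v if value else -v) for v, value in assignment.items()}
    for c, clause in enumerate(clauses):
        # Position of one true literal; its corner is left out of the cover.
        skip = next(p for p, lit in enumerate(clause)
                    if assignment[abs(lit)] == (lit > 0))
        cover |= {("clause", c, p) for p in range(3) if p != skip}
    return cover
```

The excluded corner is safe to leave out because its edge to the variable gadget is already covered by the true literal’s node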

The takeaway in this example is that we are looking to simulate variables and clauses in the problem we’re reducing to

Theorem: $HAMPATH$ is $NP$-complete

We look to show that for any 3cnf-formula $ϕ$, we can construct a directed graph $G$ with two distinguished nodes, $s$ and $t$, where a Hamiltonian path exists from $s$ to $t$ $\iff$ $ϕ$ is satisfiable

$ϕ = (a_{1} \lor b_{1} \lor c_{1}) \land \dots \land (a_{k} \lor b_{k} \lor c_{k})$ where each $a, b, c$ is a literal $x_{i}$ or $\overline{x_{i}}$ for $x_{1}, \dots, x_{l}$

The graph looks like a sequence of diamonds, one for each variable, where $s$ is the start of the diamond of $x_{1}$ and $t$ is the end of the diamond of $x_{l}$

Each diamond allows for two possible paths that snake through it: going from the start to the left represents $x_{i}$, and going to the right represents $\overline{x_{i}}$

Hence, with this construction, a path encodes a selection of $x_{i}$ or $\overline{x_{i}}$ for each $i$

Now how do we encode the clauses? The key is that in the center of each diamond we hold $3 k + 1$ intermediate nodes: a pair of nodes for each of the $k$ clauses, and $k + 1$ separator nodes placed before, between, and after these pairs

To the side of our diamonds, we have one node per clause

These nodes are hooked up to the diamonds such that they can be entered and exited along a path encoding $x_{i}$ only if the clause contains $x_{i}$, and they can be entered and exited along a path encoding $\overline{x_{i}}$ only if the clause contains $\overline{x_{i}}$

Thus, a satisfiable expression can be converted into a path quite simply, by snaking through the variables according to the selections of $x_{i}$ and attaching each clause to the path at the first variable that satisfies that clause

The separators in between clause pairs guarantee that any Hamiltonian path must be “normal”, i.e. it follows our expected construction and does not take any shortcuts

Theorem: $UHAMPATH$ (the Hamiltonian path problem for undirected graphs) is $NP$-complete

Our construction before no longer works because, without edge directions, our connections between a clause and a variable can be satisfied whether a path selects $x_{i}$ or $\overline{x_{i}}$

Instead, we make a reduction that takes a general directed graph $G$ with nodes $s$ and $t$ and constructs an undirected graph $G^{'}$ with nodes $s^{'}$ and $t^{'}$, such that $G$ has a Hamiltonian path from $s$ to $t$ $\iff$ $G^{'}$ has a Hamiltonian path from $s^{'}$ to $t^{'}$

We replace each node $u$ of $G$ by a triple of nodes $u^{in}, u^{mid}, u^{out}$ in $G^{'}$, with $u^{mid}$ connected to both $u^{in}$ and $u^{out}$, except for $s$ and $t$ which are replaced with $s^{out} = s^{'}$ and $t^{in} = t^{'}$ respectively

Then, we add an undirected edge between $u^{out}$ and $v^{in}$ in $G^{'}$ if $u \to v$ in $G$

It’s clear that a Hamiltonian path in $G$ corresponds to a Hamiltonian path in $G^{'}$, if we follow the “intended” directions

And furthermore, the only way to form a Hamiltonian path in $G^{'}$ is following the intended direction: each middle node has only two neighbors, so we are forced to pass through $u^{in}, u^{mid}, u^{out}$ in that order and therefore must transition from $u^{out}$ to $v^{in}$ until we get to $t^{in}$
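The node-splitting construction can be sketched as follows. The representation is an assumption for illustration: nodes of $G$ are arbitrary hashable labels, nodes of $G^{'}$ are pairs like `(u, "in")`, undirected edges are `frozenset`s, and the function name is hypothetical. Edges into $s$ or out of $t$ are dropped, since $s^{in}$ and $t^{out}$ do not exist in $G^{'}$:

```python
def directed_to_undirected(nodes, edges, s, t):
    """Build (nodes', edges', s', t') of the undirected graph G'.

    `edges` is an iterable of directed pairs (u, v) meaning u -> v.
    """
    new_nodes, new_edges = [], set()
    for u in nodes:
        if u == s:
            new_nodes.append((u, "out"))
        elif u == t:
            new_nodes.append((u, "in"))
        else:
            new_nodes += [(u, "in"), (u, "mid"), (u, "out")]
            # The middle node is only reachable through in and out.
            new_edges.add(frozenset({(u, "in"), (u, "mid")}))
            new_edges.add(frozenset({(u, "mid"), (u, "out")}))
    for u, v in edges:
        if u != t and v != s:
            new_edges.add(frozenset({(u, "out"), (v, "in")}))
    return new_nodes, new_edges, (s, "out"), (t, "in")
```

A directed path $s \to a \to b \to t$ then shows up in $G^{'}$ as $s^{out}, a^{in}, a^{mid}, a^{out}, b^{in}, b^{mid}, b^{out}, t^{in}$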

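Theorem: $SUBSET\text{-}SUM$ is $NP$-complete

We reduce $3SAT$ to $SUBSET\text{-}SUM$, converting a 3cnf-formula $ϕ$ into a collection of numbers and a target sum

Each number in the problem is written in decimal and has a digit for each variable $x_{i}$ and a digit for each clause

Each $x_{i}$ and $\overline{x_{i}}$ transforms to a number with a $1$ in the digit for $x_{i}$ and a $1$ in the digit of each clause in which it takes part

In addition, each clause transforms to two identical numbers with a $1$ in the corresponding clause digit

Then, the problem is to select numbers that sum to a target with all $1$s in the variable portion and all $3$s in the clause portion

The $1$s force exactly one of $x_{i}$ and $\overline{x_{i}}$ to be selected, and since the two extra numbers contribute at most $2$ to a clause digit, reaching $3$ requires at least one selected literal from that clause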
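A sketch of this digit construction, with a brute-force subset check for tiny examples. The encoding is assumed for illustration: clauses are tuples of nonzero integers (`k` for $x_{k}$, `-k` for its negation), variable digits come first followed by clause digits, and the function names are hypothetical. No column can sum past $5$, so base $10$ never carries:

```python
from itertools import combinations


def three_sat_to_subset_sum(clauses, num_vars):
    """Return (numbers, target) for the subset sum instance."""
    width = num_vars + len(clauses)

    def as_number(digits):
        return int("".join(str(d) for d in digits))

    numbers = []
    for v in range(1, num_vars + 1):
        for lit in (v, -v):
            digits = [0] * width
            digits[v - 1] = 1  # variable digit
            for c, clause in enumerate(clauses):
                if lit in clause:
                    digits[num_vars + c] = 1  # clause digit
            numbers.append(as_number(digits))
    for c in range(len(clauses)):
        digits = [0] * width
        digits[num_vars + c] = 1
        numbers += [as_number(digits)] * 2  # two identical filler numbers
    target = as_number([1] * num_vars + [3] * len(clauses))
    return numbers, target


def has_subset_with_sum(numbers, target):
    """Brute-force check over all subsets (exponential, tiny inputs only)."""
    return any(sum(subset) == target
               for r in range(len(numbers) + 1)
               for subset in combinations(numbers, r))
```

For $(x_{1} \lor x_{2} \lor x_{3}) \land (\overline{x_{1}} \lor \overline{x_{2}} \lor x_{3})$ the target is $11133$, and selecting the numbers for $x_{1}, x_{2}, x_{3}$ plus both filler numbers of the second clause reaches it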
