NP-completeness is an important advance in the question of P vs NP

Some problems can be formally linked to the rest of , meaning a solution to these problems implies a solution to the rest of

This is very important practically, because it means we have a means of proving a problem is probably unsolvable in polynomial time

The theory of NP-completeness starts with the satisfiability problem

A Boolean formula is an expression involving Boolean variables and operations, and is satisfiable if some assignment of variables makes the formula evaluate to

Theorem:

How is this true? We use the concept of reducibility to prove this theorem

Definition: A function is a polynomial time computable function if some polynomial time Turing machine exists that halts with just on its tape, when started on any input

Definition: Language is polynomial time (mapping) reducible to language , written if a polynomial time computable function exists, where

We call a polynomial time reduction

This is an analog to mapping reducibility, and it’s useful because it establishes shortcuts for proving membership to , and therefore

Theorem: If and , then

As an example, we first introduce a slight variation on the satisfiability problem

A literal is a Boolean variable or a negated Boolean variable, a clause is several literals connected with s, and a Boolean formula is in conjunctive normal form (called a cnf-formula) if it comprises of clauses connected with s

A cnf-formula is called a 3cnf-formula if all clauses have three literals

Theorem:

We convert a 3cnf-formula into a clique problem by creating a node for each item in all clauses and making an edge two nodes if they are in different clauses and are not negations of each other; then, we test for a -clique

This tells us that if is solvable in polynomial time then so is , even though they are quite different looking problems

Definition: A language is NP-complete if is in and every in is polynomial time reducible to

If is -complete and , then
If is -complete and for in , then is -complete

This is fascinating, but requires us to have a single -complete problem before we can connect other problems

Theorem: is -complete

This is not a simple proof, since it requires showing that any language in is polynomial time reducible to , a fact that looks ridiculous at first glance (at least to me), but makes more sense when you consider that Boolean operations are the basis for electronic circuitry

The reduction is fairly simple conceptually but requires dealing with many details

Proof:
To start, is trivially in

Let be a nondeterministic Turing machine that decides in time
A tableau for on is an table whose rows are the configurations of a branch of on

Each row starts and ends with and we place the head’s state at its location on the tape, offsetting the rest of the tape (due to this configuration, we technically assume the machine decides in )

A tableau is accepting if any row of the tableau is an accepting configuration, so the problem of determining whether accepts amounts to determining if an accepting tableau for on exists

Let , where is the state set and is the tape alphabet
For each and , we have a variable
Each of the entries of the tableau is called a cell, and its value is designated through the variable which takes on the value

Let’s now produce a formula that corresponds to an accepting tableau for on , which will take the form

We first guarantee that each cell has a single value set, through the expression

Then, we explicitly require the first row to be the starting configuration with

We require an accepting configuration

Our final formula is the most complex, guaranteeing that configurations are legal by scanning through the cells with a window, which is enough context to guarantee transitions are valid

If the top row of the tableau is the start configuration and every window in the tableau is legal, then each row of the tableau is a configuration that legally follows the preceding one

We omit the precise definition and write

is a legal window” is a well-defined boolean expression dependent on the Turing machine’s transition function

‘s total size is , which shows we can define in polynomial time

Thus, we can reduce any Turing machine on into a problem, which proves the Cook-Levin theorem, showing is -complete

Corollary: is -complete

We note that we can use some Boolean algebra to create from the proof in 3cnf, by distributing when necessary and splitting larger clauses like into

NP-Complete Problems

For reasons not well understood, most naturally occurring -problems are either in or -complete

Proving a problem is -complete is useful because it tells you a better solution does not exist (for all intents and purposes)

To do this, we look for structures in a language that can simulate the variables and clauses in Boolean formulas, sometimes called gadgets

Our previous theorem now tells us is -complete

A vertex cover of an undirected graph is a subset of the nodes where every edge of touches one of the nodes

Theorem: - is -complete

For each variable in , we produce an edge connecting two nodes, labeled and

For each clause in , we produce a node for each variable in the clause, all connected to each other, and all connected to their corresponding variable node as specified above

has nodes, where has variables and clauses, and we let

We create a cover from a solution of by taking the true literal variables in the cover, and excluding one true literal in each clause triplet

A little thought shows that we can also do the inverse, so - polynomially reduces to and is therefore -complete

The takeaway in this example is that we are looking to simulate variables and clauses in the problem we’re looking to reduce

Theorem: is -complete

We look to show that for any 3cnf-formula , we can construct a directed graph with two nodes, and , where a Hamiltonian path exists between and is satisfiable

where each is a literal or for

The graph looks like a sequence of diamonds representing each variable, where starts at the start of the diamond of and ends at the end of the diamond of

Each diamond allows for two possible paths that snake through, going from the start to the left represents and going to the right represents

Hence, with this construction, a path encodes a selection of or for each

Now how do we encode the clauses? The key is that in the center of each diamond we hold intermediate nodes, corresponding to pairs for each clause and separators between these pairs

To the side of our diamonds, we have one node per clause

These nodes are hooked up to the diamonds such that they can be entered and exited along a path encoding only if the clause contains , and they can be entered and exited along a path encoding only if the clause contains

Thus, a satisfiable expression can be converted into a path quite simply, by snaking through the variables according to the selections of and attaching each clause to the path at the first variable that satisfies that clause

The separators in between clause pairs guarantee that any Hamiltonian path must be “normal”, i.e. it follows our expected construction and does not take any shortcuts

Theorem: is -complete

Our construction before no longer works because our connections between a clause and a variable can be satisfied whether a path selects or

Instead, we make a reduction that takes a general directed graph with nodes and and constructs an undirected graph with nodes and , such that has a Hamiltonian path from to has a Hamiltonian path from to

We replace each node of by a triple of nodes in , except for and which are replaced with and respectively

Then, we add in if in

It’s clear that a Hamiltonian path in corresponds to a Hamiltonian path in , if we follow the “intended” directions

And furthermore, the only way to form a Hamiltonian path in is following the intended direction, because we are forced to hit the middle nodes and therefore must transition from to until we get to

Theorem: - is -complete

We reduce to an instance of -

Each number in the problem has a digit for each variable and a digit for each clause

Each and transforms to a number with a in the corresponding digit and a in each clause that it takes part

In addition, each clause transforms to two identical numbers with a in the corresponding clause

Then, the problem is to sum to a number with all s in the digit portion and all s in the clause portion