Slide 1: Today we will introduce an important class of problems: those solvable by a nondeterministic program that runs in polynomial time. In the next lecture, we will identify the hardest problems among them, and give evidence that many problems can't be solved quickly.
Slide 2: First, let us introduce a few problems on graphs. A clique is a subset of vertices that are pairwise adjacent. For example, {2,3,4} is a clique in the graph here, because these three vertices are all adjacent to each other. An independent set is a subset of vertices that are pairwise non-adjacent. {1,3} is an independent set in the graph here, because there is no edge between node 1 and node 3. A vertex cover is a set of vertices that touches all the edges in the graph. {3,4} is a vertex cover in the graph here, because every edge in this graph has at least one endpoint in the set {3,4}.
Slide 3: There are a few natural graph problems related to the concepts we introduced on the previous slide. One question is to find the largest clique in the graph. If the graph is a social network, say the friendship relation on Facebook, then a clique corresponds to a community where every member is a friend of every other member, and finding the largest clique corresponds to finding the largest such community. In this course, the computational problems we look at are decision problems: questions with a yes/no answer. So we will rephrase the task of finding the largest clique as the following yes/no question: Given a graph G and an integer k, does the graph contain a clique of k vertices? This decision problem is denoted as CLIQUE. More formally, CLIQUE consists of the yes instances of the decision problem, namely all pairs of a graph G and an integer k such that G contains a clique of size k. As we will see later, the decision version of the clique problem is closely related to the search version of the problem (finding the largest clique).
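To make the three definitions from slide 2 concrete, here is a minimal Python sketch. The graph on the slide is not reproduced in this transcript, so the edge set `EDGES` below is a hypothetical 4-vertex graph chosen only to agree with the examples mentioned ({2,3,4} is a clique, {1,3} is an independent set, {3,4} is a vertex cover). The function names are illustrative as well.

```python
from itertools import combinations

# Hypothetical graph, assumed for illustration: it is consistent with the
# examples in the lecture, but it is not necessarily the graph on the slide.
EDGES = {frozenset(e) for e in [(1, 4), (2, 3), (2, 4), (3, 4)]}


def adjacent(u, v, edges=EDGES):
    """True if there is an (undirected) edge between u and v."""
    return frozenset((u, v)) in edges


def is_clique(vertices, edges=EDGES):
    """Every pair of the given vertices is adjacent."""
    return all(adjacent(u, v, edges) for u, v in combinations(vertices, 2))


def is_independent_set(vertices, edges=EDGES):
    """No pair of the given vertices is adjacent."""
    return not any(adjacent(u, v, edges) for u, v in combinations(vertices, 2))


def is_vertex_cover(vertices, edges=EDGES):
    """Every edge has at least one endpoint among the given vertices."""
    chosen = set(vertices)
    return all(edge & chosen for edge in edges)


print(is_clique({2, 3, 4}))        # True
print(is_independent_set({1, 3}))  # True
print(is_vertex_cover({3, 4}))     # True
```

Each check only looks at pairs of vertices or at edges, so all three run in time polynomial in the size of the graph.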
Analogously, we can look at the Independent Set problem: Given a graph G and an integer k, does G contain an independent set of size k? Again, this decision problem corresponds to the search problem of finding the largest independent set in a graph. We will also look at the Vertex Cover problem: Given a graph G and an integer k, does G contain a vertex cover consisting of k vertices? This decision problem corresponds to the search problem of finding the smallest vertex cover in a graph. Note that Clique and Independent Set are maximization problems: we look for the largest clique and the largest independent set. Vertex Cover is a minimization problem: we look for the smallest vertex cover. We just introduced three decision problems. They all share a key property: when trying to solve one of these problems, if I also give you a candidate solution, then you can quickly check whether the candidate solution is valid or not. This is the key property of the class of problems we introduce today.
Slide 4: Why are candidate solutions quickly verifiable for Clique? Look at this graph here, which is very complicated. If I ask you "Does this graph contain a clique with 5 vertices?", you may not be able to answer this question quickly. But suppose I also give you a candidate solution, such as these 5 vertices: 1, 5, 9, 12, 14, and now I ask you: "Is this candidate solution a clique in the graph?" Just checking whether these 5 vertices form a clique can be done quickly. You go over every pair of vertices in {1,5,9,12,14} and see whether they are adjacent. So you look at (1,5), (1,9), (1,12), ..., (12,14), and see whether all these edges appear in the graph. If they do, then you know {1,5,9,12,14} is indeed a clique. We also make sure there are 5 vertices in the subset {1,5,9,12,14}, so that the clique indeed contains 5 vertices. It is possible that someone may give you a wrong candidate solution, such as {1,2,3,4,5}. This subset is not a clique because 2 is not adjacent to 3.
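The pairwise check just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the complicated graph on the slide is not available here, so `edges` is a hypothetical stand-in in which {1,5,9,12,14} is a clique and 2 is not adjacent to 3, and `verify_clique` is an illustrative name.

```python
from itertools import combinations


def verify_clique(edges, k, candidate):
    """Accept iff `candidate` is a set of exactly k pairwise-adjacent vertices."""
    candidate = set(candidate)
    if len(candidate) != k:
        return False  # wrong number of vertices
    # Look at every pair in the candidate and require an edge between them.
    return all(frozenset(pair) in edges for pair in combinations(candidate, 2))


# Hypothetical stand-in for the slide's graph (an assumption, not the real one):
# the five vertices 1, 5, 9, 12, 14 are pairwise adjacent, plus a few extra
# edges. There is deliberately no edge between 2 and 3.
edges = {frozenset(p) for p in combinations([1, 5, 9, 12, 14], 2)}
edges |= {frozenset(e) for e in [(1, 2), (3, 4), (4, 5)]}

print(verify_clique(edges, 5, {1, 5, 9, 12, 14}))  # True
print(verify_clique(edges, 5, {1, 2, 3, 4, 5}))    # False: 2 and 3 are not adjacent
print(verify_clique(edges, 5, {1, 5, 9, 12}))      # False: only 4 vertices
```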
By checking whether all pairs of vertices in the candidate solution are adjacent or not, we can quickly rule out this candidate solution. Or someone may give you {1,5,9,12} as a candidate solution. This is again an invalid solution, because we want to know whether the graph contains a clique of 5 vertices, but this subset only has 4 vertices.
Slide 5: The algorithm we use to verify a candidate solution is called a verifier. This algorithm takes as input an instance x of the problem together with a candidate solution s. In the example of CLIQUE, the verifier takes as input the instance x consisting of a graph G and an integer k, such as the graph G on the previous slide and k=5. The verifier also takes as input a candidate solution s, such as {1,5,9,12,14} on the previous slide. If this verifier accepts the instance x together with some candidate solution s, then x must be a "yes" instance. For example, if the verifier algorithm for Clique accepts, then the vertex subset must be a clique of k vertices in the graph. Conversely, if the instance x is a "yes" instance, then there must be some candidate solution causing the verifier to accept. In the case of CLIQUE, if the graph G does contain a clique of size k, a candidate solution causing the verifier to accept is simply a correct solution, i.e. a set of k vertices that form a clique. The verifier runs in polynomial time if it runs in time polynomial in the length of the instance. Here the notation |x| means the length of the instance, that is, the number of bits required to represent the instance. In other words, a polynomial-time verifier can quickly verify whether a candidate solution is correct or not. Note that whether the verifier is efficient depends only on the length of the instance, not the length of the candidate solution. But implicitly the candidate solution must be relatively short, that is, it must have length polynomial in the size of the instance.
This is because our polynomial-time verifier can only read a polynomial number of bits from the candidate solution. And here comes the crucial definition: NP denotes the class of problems that have polynomial-time verifiers. That is, problems whose candidate solutions can be verified in polynomial time.
Slide 6: From this definition, the decision problem CLIQUE is in NP. Given an instance of CLIQUE, that is, "Does graph G contain a clique of size k?", together with a candidate solution C, consisting of a subset of vertices in G, here is a verifier algorithm. The verifier makes sure C contains k vertices. It also makes sure all pairs of vertices in C are connected by an edge in G. If all these conditions are satisfied, the verifier accepts (declares the candidate solution valid); otherwise the verifier rejects. This verifier algorithm checks on the order of k squared pairs, where k is the number of vertices in the clique we are interested in. Since k is at most the number of vertices in the graph G, this verifier runs in polynomial time. And since we have a polynomial-time verifier for CLIQUE, this problem is in NP. You should check that the problems (or languages) INDEPENDENT-SET and VERTEX-COVER are also in NP.
Slide 7: If a problem (or language) belongs to the class P, the problem must also belong to the class NP. This is because the verifier algorithm can simply ignore the candidate solution, and solve the problem itself! Such an algorithm does satisfy the definition of a verifier on slide 5. For instance, in the problem PATH, we want to answer the question "In this graph G, is node s connected to node t by a path?" This problem can be solved by Breadth First Search (BFS). A verifier that runs BFS and ignores any candidate path between s and t still decides PATH correctly. The million dollar question: If a problem belongs to NP, does it also belong to P?
In other words, if candidate solutions to a problem can be verified quickly, can the problem also be solved quickly (without being given a candidate solution)? This question is known as P versus NP.
Slide 8: We do not know the answer to this question. In fact, it is one of the seven outstanding questions in mathematics. In 2000, the Clay Mathematics Institute identified seven open problems for the 21st century. P versus NP is the one that comes from computer science. There are well-known open problems related to physics (Yang-Mills existence and mass gap, Navier-Stokes existence and smoothness), and also famous open problems from pure math (Poincaré conjecture, Riemann hypothesis). These are really difficult open problems. Among them, only one has been solved so far, the Poincaré conjecture. It was solved by Perelman, whose proof from 2002-2003 was confirmed by 2006, building on a century of work. Mathematicians also recognize the importance and difficulty of the "P versus NP" question.
Slide 9: Sometimes this question is phrased as "Is P equal to NP?" Here are the two possibilities. If P equals NP, as shown on the left, then solving a problem is as easy as checking a candidate solution, and both can be done in polynomial time. If P is different from NP, then NP strictly contains P, and some problems in NP do not belong to P. As we will show in the next lecture, if P is different from NP, then CLIQUE, INDEPENDENT-SET, and VERTEX-COVER are among the problems in NP but not in P. We don't know whether P equals NP. They are widely believed to be different, because intuitively searching for a valid solution is much harder than simply verifying the correctness of a solution. For example, solving homework problems seems much more difficult than grading them.
Slide 10: This phenomenon holds more generally. In math, searching for a proof is much harder than verifying whether it is correct. For instance, the proof of Fermat's Last Theorem took over 300 years to find. Once it was discovered, other mathematicians could verify it in months.
In science, given a huge collection of data, finding a theory to explain it is much harder than verifying whether the theory is correct. In engineering, designing a solution that meets the constraints is usually harder than verifying whether a proposed solution is feasible. For crimes, searching for the suspects and motives takes a long time, too.
Slide 11: So again, this is the picture we believe to be true: NP strictly contains P. But we aren't sure.
Slide 12: Back to the three decision problems we introduced at the beginning of this lecture. We just argued they are all in NP. But these three problems also share another common property: we do not know how to solve any of them efficiently, that is, in polynomial time.
Slide 13: Let's say you want to solve CLIQUE. One algorithm is to try all possible subsets of k vertices in the graph, and check whether each subset is indeed a clique. For example, take the graph on this slide and suppose we want to know whether it contains a clique with 3 vertices. We can try all candidate solutions on 3 vertices, namely {1,2,3}, {1,2,4}, {1,3,4}, {2,3,4}, and check one by one whether the subset is a clique.
Slide 14: How fast is this algorithm? There are "n choose k" many subsets to look at. And for each subset of size k, it takes about k squared time to check whether all pairs of vertices are adjacent. The overall running time is k squared times "n choose k". When k is n/2, "n choose k" is roughly 2^n divided by the square root of n, which is exponential in n. So the algorithm may run in exponential time. And we do not know of any algorithm for solving CLIQUE that runs faster than exponential time.
Slide 15: We strongly suspect problems like CLIQUE, INDEPENDENT-SET and so on require exponential time to solve, or at least strictly more than polynomial time to solve. While we do not know how to prove this yet, we can at least relate these problems to each other.
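The brute-force search from slides 13 and 14 can be sketched as follows. This is a minimal illustration, not an optimized algorithm: `has_clique` is an illustrative name, and the 4-vertex graph is a hypothetical one chosen to agree with the examples in this lecture, since the slide's graph is not reproduced here. The last loop prints "n choose n/2" for a few values of n to show how fast the number of candidate subsets grows.

```python
from itertools import combinations
from math import comb


def has_clique(vertices, edges, k):
    """Try every k-subset of vertices; accept if some subset is a clique."""
    for subset in combinations(vertices, k):
        # Checking one subset looks at about k^2 pairs.
        if all(frozenset(pair) in edges for pair in combinations(subset, 2)):
            return True
    return False


# Hypothetical graph, assumed for illustration only.
vertices = [1, 2, 3, 4]
edges = {frozenset(e) for e in [(1, 4), (2, 3), (2, 4), (3, 4)]}

print(has_clique(vertices, edges, 3))  # True, because {2,3,4} is a clique
print(has_clique(vertices, edges, 4))  # False

# Number of subsets the search may have to try when k = n/2.
for n in (10, 20, 40):
    print(n, comb(n, n // 2))
```

Doubling n from 20 to 40 multiplies the number of subsets by several hundred thousand, which is why this approach stops being practical even for moderately sized graphs.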
In the next lecture, we will show that if any one of CLIQUE, INDEPENDENT-SET, VERTEX-COVER, and similar problems can be solved in polynomial time, then all of these problems can be solved in polynomial time. We will tie all these problems together, even though we do not know whether any single one of them can be solved quickly.
Slide 16: In the next lecture, we will show that problems such as CLIQUE, INDEPENDENT-SET, and VERTEX-COVER are as hard as one another. Also, we will show that they are at least as hard as any problem in NP.