Topological sort and Connected components in directed graph

Since I’m reading about algorithms, I figured I’d review topological sort and Kosaraju’s algorithm as well.

To sort vertices in a directed graph topologically is to arrange them into a sequence so that no vertex has an edge pointing to a vertex earlier in that sequence. Of course, this requires the graph to be DAG (directed acyclic graph).

The algorithm is simply to find the reverse postorder in DFS (depth first search) of the vertices. This is implemented by pushing a vertex into a stack after its DFS call completes. The order in which vertices are popped out of the stack is the graph’s reverse postorder. The proof is described below:

We need to prove that for every edge from s to v, s would appear before v in reverse postorder. There are several conditions:

  1. v is called and returned before the call to s. In this case, s is pushed to the stack later than v, hence s appears before v.
  2. v is called later than s but returned before s returns. The former conclusion also holds.
  3. v is called later than s has returned. This case is impossible. Since s has an edge to v, v has to be called somewhere earlier than s returns.

Therefore, our algorithm can give us a topological sort.

Next, we use a similar way to look for connected components in DAG. A connected component is defined as a subgraph in which all vertices have a way to reach each other. This algorithm is not so obvious: first get a topological order of the reverse of the graph G – Gr, then use the order to run DFS in G. All vertices reached by one call belong to a connected component, so we can mark them.

The proof also needs some brain.

Consider the second step of the algorithm. First, it is obvious every connected component edge is called in the second step, duh. Then we prove from the other side. Say we have a vertex v that is reached by calling DFS on s. We’re gonna prove there is an edge from v to s as well.

From the way we generated the order, s appears before v in reverse postorder of Gr. Now consider Gr, we have several conditions:

  1. the call to v ends before the call to s starts. This is impossible because v has an edge to s in Gr.
  2. the call to v starts after the call to s, and ends before s ends. This means there is an edge from s to v in Gr, which then proves that in G there is an edge from v to s. Connected component it is.
  3. the call to v starts after the call to s ends. Impossible because s appears before v in reverse postorder of Gr.

Therefore, Kosoraju’s algorithm is correct in finding connected components of DAG.

I appreciate the brilliance of the algorithms. It’s interesting because they don’t seem so hard to me now but I spent days to understand the proof the first time I read about them. I’m reading divide-and-conquer next. I’m hoping all the complicated recursions don’t play with my mind too much.

Exchange argument in Greedy Algorithms

Recently I started reading this algorithm textbook for my CS330 class next semester – Algorithm Design by Jon Kleinberg and Eva Tardos. Today I am going to record my thoughts on Chapter 4.3 – Exchange argument.

Exchange argument is actually the name of a method used to prove a type of greedy algorithm. The question that we focus on is as follows: Suppose there is a computer with limited cache memory, and we need to process a sequence of data from the main memory. What is the best order to “evict” certain data and retrieve data from the main memory so that the total number of times we evict (misses) is the least?

According to Les Belady, the best way to do this is to evict the “farthest-in-the-future” item in the cache memory. For example, consider a sequence like a b c d a d e a d b c and a cache memory of size 3 with initially a b c in it. The best plan is to evict c at step 4 becuase it is “farthest-in-the-future”.

The proof uses the method of “exchange argument”, in which we first suppose the best way to do this, say S, and our plan, say S*, and we revise our plan so that it eventually is identical to S, in the meantime follows our “farthest-in-the-future” rule.

Assume the first j steps of S and S* are identical, so now we have the same sequence of data in cache memories. For the (j + 1)st step, say S evicts the item e but S* evicts f since f is farthest. We would like S* to “come back” to S as soon as possible and avoid adding any more misses. Now consider the (j + 2)nd request.

If there is a request to item g which is not equal to either e or f:

If S evicts any item other than f, we would simply let S* also evict that item. After this step two cache memories would be identical, job done. Else, S evicts f, we would let S* evict e. In this way two cache memories are again identical. Hooray.

Else, there is a request to e:

Note that S*’s cache memory now contains e. If S evicts f, two cache memories are now equal. If S evicts some f’ other than f instead, we would also let S* evict f’, and bring f, then again voila.

There can’t be a request to f before to e because we already know f is farther than e.

The proof above is not complete, but the spirit is shown: first assume we have the best solution, and try to prove that our solution is the best by closing the gap. Quite amazing.

So I came back to this server again…

I used to build my personal website on this server with handwritten html and css. Later I found out it is better to use WordPress. After all I hate calculating pixels and designing layout. WordPress helps me with all that.

I requested a VM from Duke OIT and installed WordPress there. I wrote a few Tech blogs about iOS and just when I thought I could use the service forever and ever, I could no longer access my VM. I don’t know if that’s because I’m back in China but even with VPN I am unable to get on.

Enough said, I came back. I spent some time this morning installing WordPress and I am planning to write here. I’m sharing this server with two of my best friends and talented computer scientists so security is not a problem.

Off we go.