**Acknowledgements** All ideas in this post are due to
@mabokhamis and
@hung-q-ngo; I’m
just writing them down so that I don’t have to bother them when I
forget.

In an earlier post I talked about how to use the AGM bound to bound the run time of the Generic Join algorithm. It turns out we can sometimes bound the run time of recursive queries as well. Consider the transitive closure program in Datalog:

```
P(x, y) :- E(x, y).
P(x, y) :- E(x, z), P(z, y).
```

During semi-naive evaluation, we compute a delta relation $dP$ in every iteration as $dP_{i+1} = (E \bowtie dP_i) - P_i$. The set difference can be done in linear time, so we will focus on the join only.
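As a rough illustration, the iteration above could be sketched in Python as follows. This is a minimal sketch and not an optimized engine: relations are plain sets of pairs, the names `transitive_closure` and `preds` are hypothetical, and it assumes the first delta is $E$ itself (the result of the base rule). The join also projects away `z`, as the Datalog rule does.

```python
def transitive_closure(E):
    """Semi-naive evaluation of:
         P(x, y) :- E(x, y).
         P(x, y) :- E(x, z), P(z, y).
    E is a set of (x, z) pairs; returns the set of (x, y) pairs in P.
    """
    # Index E by its second column: preds[z] = {x : (x, z) in E}.
    preds = {}
    for x, z in E:
        preds.setdefault(z, set()).add(x)

    P = set(E)    # assumption: P starts as the result of the base rule
    dP = set(E)   # delta: tuples that were new in the previous iteration
    while dP:
        # Join E(x, z) with dP(z, y) and project onto (x, y).
        joined = {(x, y) for (z, y) in dP for x in preds.get(z, ())}
        dP = joined - P   # the set difference keeps only unseen tuples
        P |= dP
    return P
```

Calling `transitive_closure({(1, 2), (2, 3), (3, 4)})` on a three-edge path adds the pairs `(1, 3)`, `(2, 4)` and `(1, 4)` to the original edges.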

If we look at *all* iterations, we’ll be computing $(E \bowtie dP_0) \cup (E \bowtie dP_1) \cup \cdots \cup (E \bowtie dP_{k-1})$. Factoring out `E`, we get $E \bowtie (dP_0 \cup \cdots \cup dP_{k-1}) = E \bowtie P_k$, where $P_k$ is the relation $P$ at iteration $k$. Since $P_k$ must be contained in the final output $O$, i.e. $|P_k| \leq |O|$, at this point we can say the whole semi-naive algorithm runs in $O(|E| \times |O|)$. But it turns out we can do better.

To reduce clutter, I’ll write $P$ for $P_k$. Now take a closer look at the join `Q(x, y, z) :- E(x, z), P(z, y)`. For the moment, let’s also add $O$ into the join to make a triangle `Q'(x, y, z) :- E(x, z), P(z, y), O(x, y)`. With this, we can use the AGM bound to get $O(|E|^\frac{1}{2} |P|^\frac{1}{2} |O|^\frac{1}{2}) \leq O(|E|^\frac{1}{2} |O|)$, which is a tighter bound than the above. I now claim we can also use this bound for $Q$. The key is that the execution of Generic Join for $Q'$ is exactly the same as that for $Q$.
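For reference, the inequality can be spelled out as follows. This assumes the usual fractional edge cover for a triangle, which puts weight $\frac{1}{2}$ on each of the three atoms so that every variable is covered with total weight 1, and it uses $|P| \leq |O|$ from earlier:

$$
\begin{aligned}
|Q'| &\leq |E|^{\frac{1}{2}} \, |P|^{\frac{1}{2}} \, |O|^{\frac{1}{2}} \\
     &\leq |E|^{\frac{1}{2}} \, |O|^{\frac{1}{2}} \, |O|^{\frac{1}{2}} = |E|^{\frac{1}{2}} \, |O|
\end{aligned}
$$

Since $|E|^{\frac{1}{2}} \leq |E|$ whenever $E$ is nonempty, this is never worse than the earlier $|E| \times |O|$.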

Consider the variable ordering `z, x, y`. The GJ loop for $Q'$ is the following, where `^` denotes set intersection:

```
for z in E.z ^ P.z
  for x in E[z].x ^ O.x
    for y in P[z].y ^ O[x].y
      output(x, y, z)
```

Since $O$ is the final output of the transitive closure program, we have the invariant $\forall x, y, z : (x, z) \in E \wedge (z, y) \in P \implies (x, y) \in O$. Therefore, we can remove the intersections with $O$ on both inner loops, and the run time remains the same since $O$ does not filter out any value. With $O$ removed, the nested loop now computes exactly $Q$, taking the same time as $Q'$.
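To make the claim concrete, here is a small Python sketch that runs both nested loops and counts loop iterations at every level. It is only an illustration under stated assumptions: hash sets and dictionaries stand in for the tries a real Generic Join would use, the helper names (`index`, `closure`, `gj_with_O`, `gj_without_O`) are hypothetical, and the step counter ignores the cost of computing each intersection.

```python
from collections import defaultdict

def index(rel, key, val):
    """Group a binary relation: value at column `key` -> set of values at column `val`."""
    idx = defaultdict(set)
    for t in rel:
        idx[t[key]].add(t[val])
    return dict(idx)

def closure(E):
    """Naive fixpoint for the transitive closure O of E (kept simple on purpose)."""
    O = set(E)
    while True:
        new = {(x, y) for (x, z) in E for (z2, y) in O if z == z2} - O
        if not new:
            return O
        O |= new

def gj_with_O(E, P, O):
    """Q'(x, y, z) :- E(x, z), P(z, y), O(x, y) with variable order z, x, y."""
    E_by_z = index(E, 1, 0)   # z -> {x : (x, z) in E}
    P_by_z = index(P, 0, 1)   # z -> {y : (z, y) in P}
    O_by_x = index(O, 0, 1)   # x -> {y : (x, y) in O}
    O_xs = set(O_by_x)        # all x values that appear in O
    out, steps = [], 0
    for z in E_by_z.keys() & P_by_z.keys():
        steps += 1
        for x in E_by_z[z] & O_xs:
            steps += 1
            for y in P_by_z[z] & O_by_x[x]:
                steps += 1
                out.append((x, y, z))
    return out, steps

def gj_without_O(E, P):
    """Q(x, y, z) :- E(x, z), P(z, y): the same loops with the O intersections dropped."""
    E_by_z = index(E, 1, 0)
    P_by_z = index(P, 0, 1)
    out, steps = [], 0
    for z in E_by_z.keys() & P_by_z.keys():
        steps += 1
        for x in E_by_z[z]:
            steps += 1
            for y in P_by_z[z]:
                steps += 1
                out.append((x, y, z))
    return out, steps
```

For any `P` contained in `O = closure(E)`, both functions should return the same tuples and the same step count, which is exactly what the invariant predicts.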