Join Trees, the "Hard" Way

Given a conjunctive (datalog) query Q, the hypergraph of Q has a vertex v_x for each variable x, and a hyperedge \{v_x, v_y, v_z, \ldots \} for each atom R(x, y, z, .\ldots). For example, the following hypergraph (credit to Zheng and Wim) represents the query below it.

Q(a,b,c,d,e,f,g,h) :- R(a,c,d), S(b,c,d), W(d,h), U(e,g,h), V(f,g,h).

A query is \alpha-acyclic if it has a join tree: there is a tree node for each atom, and for every variable x, the nodes containing x are connected. One join tree for the query looks like this:

      W(d,h)      
     /      \     
R(a,c,d)  U(e,g,h)
    |        |    
S(b,c,d)  V(f,g,h)

An equivalent definition requires that for any two nodes containing a variable x, the path between the nodes must all contain x. It's not hard to see this is the same as requiring all nodes containing x to be connected.

The join tree ensures that during the semijoin reduction passes of the BCY algorithm, we need only reduce a parent with its children (or vice versa), because all attributes shared between the parent and its descendants are all contained in the children.