The space of partial solutions is represented as a graph. The nodes are situations, and the edges link together situations that are one step apart. Typically the graph is very large or even infinite, and is not completely constructed at the beginning, but only as far as needed: when the search reaches a k-promising situation (v_0, ..., v_{k - 1}), the algorithm generates the adjacent (k + 1)-vectors (v_0, ..., v_{k - 1}, v_k). The graph is thus not explicit but implicit.
Backtracking is a method for searching such large implicit graphs. It tries to minimize unnecessary work by never extending a path from a node whose vector is not promising.
procedure backTrack(int k, int* v) {
    /* v is a k-promising vector */
    if (v is a solution)
        printVector(v);
    /* If we only want to find one solution, we can stop here.
       If the extension of a solution never leads to a further
       solution, the following for loop can be skipped. */
    for (each (k + 1)-vector w = (v_0, ..., v_{k - 1}, v_k))
        if (w is (k + 1)-promising)
            backTrack(k + 1, w);
}
Depending on the problem, the tests and the graph differ. The definition of k-promising may also include extra conditions to guide the search, for example to prevent equivalent vectors from being considered more than once.
The most naive idea is to try all ways of choosing n of the n^2 squares for the queens. This implies testing (n^2 over n) ~= n^n possibilities, already outrageous for n = 8.
Slightly better is to realize that the queens must stand in different columns. So, a solution is given by a vector (v_0, ..., v_{n - 1}) in which v_i indicates the column in which the queen in row i is positioned. If we now also realize that all v_i must be different, then the number of tests is reduced to n! ~= (n / e)^n. Substantially better, but still not good at all.
A shortcoming of these methods is that we first generate a complete solution, and only then test whether it is feasible. Many solutions with a common impossible prefix are generated and tested. Here backtracking comes in and brings large savings (which are very hard to quantify other than by experiment). The program might look as follows. It is called with k == 0.
void queenPlacement(int k, int n, int v[n]) {
    int i;
    bool columns[n];
    bool norm_diag[2 * n - 1];
    bool anti_diag[2 * n - 1];
    if (k == n)
        printVector(n, v);
    else {
        for (i = 0; i < n; i++)
            columns[i] = false;
        for (i = 0; i < k; i++)
            columns[v[i]] = true;
        for (i = 0; i < 2 * n - 1; i++)
            norm_diag[i] = anti_diag[i] = false;
        for (i = 0; i < k; i++) {
            norm_diag[v[i] - i + n - 1] = true;
            anti_diag[v[i] + i] = true;
        }
        for (i = 0; i < n; i++)
            if (! columns[i] && ! norm_diag[i - k + n - 1] && ! anti_diag[i + k]) {
                v[k] = i;
                queenPlacement(k + 1, n, v);
            }
    }
}

The "else" is there because the extension of a solution never gives further solutions (though this would also have been detected by the tests further down). In a real implementation, better performance is achieved by passing even the boolean arrays as arguments. In that case the extra column and diagonals must be added to the sets before the recursion and taken out again after it.
The book gives some numbers for n = 12: whereas 12! = 479001600, the whole backtrack tree has only 856189 nodes, and the first solution is found after testing 262 nodes.
Because this is an optimization problem, the notion of solution is not entirely adequate; backtracking is rather designed for decision problems, in which one should answer questions of the type "is there a feasible solution?" or "is there a feasible solution achieving a value of at least V?". If we asked the latter question, we could output a selection of objects as soon as sum_{j = 0}^{k - 1} v_{i_j} >= V. In this case the extension of a solution might again be a solution.
The search starts with the empty set, which is a 0-promising set. Then the algorithm adds one element and tests whether the weight does not exceed W. If so, it recurses and tries to add a second element. It continues in this way until there are no further elements to add (given the increasing order of the indices), or until adding an element would violate the weight limit.
A nice, non-trivial example is given by a set of six objects with weights 1, 2, 3, 4, 5, 6 and values 2, 8, 10, 10, 20, 25, respectively. A greedy algorithm would probably start by picking the object of weight 6, because it has the largest ratio v / w. However, this leads to a solution of value at most 37 ({1, 2, 6} with value 35, or {1, 3, 6} with value 37), while the best choice is {2, 3, 5} with value 38. The backtracking tree (draw it!) has 32 nodes in total, so even for such a tiny problem with quite a high weight bound there is already a considerable saving compared to testing all 64 subsets (even though each of those can be generated and tested somewhat faster).