Input description: A set S of n points in and a query region Q.
Problem description: Which points from S lie within Q?
Discussion: Range search problems arise in database and geographic information system (GIS) applications. Any data object with d numerical fields, such as person with height, weight, and income, can be modeled as a point in d-dimensional space. A range query describes a region in space and asks for all points or the number of points in the region. For example, asking for all people with income between $0 and $10,000, with height between 6'0'' and 7'0'', and weight between 50 and 140 lbs. defines a box containing people whose body and wallets are both thin.
The difficulty of a range search problem depends on several factors:
If you are querying against a nonconvex polygon, it will pay to partition your polygon into convex pieces or (even better) triangles and then query the point set against each one of the pieces. This is because testing whether a point is inside a convex polygon can be done more quickly and easily than for arbitrary polygons. Algorithms for such convex decompositions are discussed in Section .
A data structure for efficiently answering such aggregate range queries can be based on the dominance ordering of the point set. A point x is said to dominate point y if y lies both below and to the left of x. Let DOM(p) be a function that counts the number of points in S that are dominated by p. The number of points m in the orthogonal rectangle defined by and is given by
The second additive term corrects for the points for the lower left-hand corner that have been subtracted away twice.
To answer arbitrary dominance queries efficiently, partition the space into rectangles by drawing a horizontal and vertical line through each of the n points. The set of points dominated is identical for each point in each rectangle, so the dominance count of the lower left-hand corner of each rectangle can precomputed, stored, and reported for any query point within it. Queries reduce to binary search and thus take time. Unfortunately, this data structure takes quadratic space. However, the same idea can be adapted to kd-trees to create a more space-efficient search structure.
Implementations: LEDA (see Section ) provides excellent support for maintaining planar subdivisions in C++. In particular, it supports orthogonal range queries in time, where n is the complexity of the subdivision and k is the number of points in the rectangular region.
Ranger is a tool for visualizing and experimenting with nearest-neighbor and orthogonal-range queries in high-dimensional data sets, using multidimensional search trees. Four different search data structures are supported by Ranger: naive kd-trees, median kd-trees, nonorthogonal kd-trees, and the vantage point tree. For each of these, Ranger supports queries in up to 25 dimensions under any Minkowski metric. It is available in the algorithm repository.
A bare bones implementation in C of orthogonal range search using kd-trees appears in [GBY91]. Sedgewick [Sed92] provides code fragments of the grid method for orthogonal range search in C++. See Section for details on both of them.
Notes: Good expositions on data structures with worst-case performance for orthogonal-range searching [Wil85] include [PS85]. An exposition on kd-trees for orthogonal range queries in two dimensions appears in [PS85]. Their worst-case performance can be very bad; [LW77] describes an instance in two dimensions requiring time to report that a rectangle is empty.
The problem becomes considerably more difficult for nonorthogonal range queries, where the query region is not an axis-aligned rectangle. For half-plane intersection queries, time and linear space suffice [CGL85]; for range searching with simplex query regions (such as a triangle in the plane), lower bounds preclude efficient worst-case data structures. See [Yao90] for a discussion.
Related Problems: Kd-trees (see page ), point location (see page ),