# VLFeat教程kd-tree forest      給文章打分！ Loading...

VLFeat implements the randomized kd-tree forest from FLANN.
This enables fast medium and large scale nearest neighbor queries among high dimensional data points (such as those produced by SIFT).

# Introduction

A kd-tree is a data structure used to quickly solve nearest-neighbor queries. Consider a set of 2D points uniformly distributed in the unit square:

```  X = rand(2, 100) ;
```

A kd-tree is generated by using the `vl_kdtreebuild` function:

```  kdtree = vl_kdtreebuild(X) ;
```

The returned `kdtree` indexes the set of points `X`.
Given a query point `Q`, the function `vl_kdtreequery` returns
its nearest neighbor in `X`:

```  Q = rand(2, 1) ;
[index, distance] = vl_kdtreequery(kdforest, X, Q) ;
```

Here `index` stores the index of the column of `X` that
is closest to the point `Q``distance` is
the squared euclidean distance between `X(index),Q`.

A kd-tree is a hierarchal structure built by partitioning the data recursively along the dimension of maximum variance. At each iteration the variance of each column is computed and the data is split into two parts on the column with maximum variance.
The splitting threshold can be selected to be the mean or the median (use the `ThresholdMethod` option of `vl_kdtreebuild`).  kd-tree partitions of a uniform set of data points, using the mean (left image) and the median (right image) thresholding options of `vl_kdtreebuild`.
On the bottom right corner a query point is marked along with the ten closest neighbors as found by`vl_kdtreequery`.
Figure generated by `vl_demo_kdtree`.

# Querying

`vl_kdtreequery` uses
a best-bin first search heuristic. This is a branch-and-bound technique that maintains an estimate of the smallest distance from the query point to any of the data points down all of the open paths.

`vl_kdtreequery` supports
two important operations: approximate nearest-neighbor search and k-nearest neighbor search. The latter can be used to return the k nearest neighbors to a given query
point `Q`. For instance:

```[index, distance] = vl_kdtreequery(kdtree, X, Q, 'NumNeighbors', 10) ;
```

returns the closest 10 neighbors to `Q` in `X` and
their distances, stored along the columns of `index` and `distance`.

The `MaxComparisons` option is used to run an ANN query. The parameter specifies how many paths in the best-bin-first search of the
kd-tree can be checked before giving up and returning the closest point encountered so far. For instance:

```[index, distance] = vl_kdtreequery(kdtree, X, Q, 'NumNeighbors', 10, 'MaxComparisons', 15) ;
```

does not compare any point in `Q` with more than 15 points in `X`.    Finding the 10 approximated nearest neighbors for increasing values of the`MaxComparisons` parameter.
Note that at most `MaxComparisons` neighbors can be returned (if more are requested, they are ignored). Figure generated by `vl_demo_kdtree_ann`.

# Randomized kd-tree forests

VLFeat supports constructing randomized forests of kd-trees to improve the effectiveness of the representation in high dimensions. The parameter `NumTrees` of `vl_kdtreebuild` specifies
how many trees to use in constructing the forest. Each tree is constructed independently. Instead of always splitting on the maximally variant dimension, each tree chooses randomly among the top five most variant dimensions at each level. When querying, `vl_kdtreequery` runs
best-bin-first across all the trees in parallel. For instance

```  kdtree = vl_kdtreebuild(X, 'NumTrees', 4) ;
[index, distance] = vl_kdtreequery(kdtree, X, Q) ;
```

constructs four trees and queries them.    The parameter `NumTrees` tells `vl_kdtreebuild` to
construct a number of randomized kd-trees. Figure generated by `vl_demo_kdtree_forest`.
from:http://www.vlfeat.org/overview/kdtree.html