Most previous work in the database literature has focused on indexing lower dimensional data and on types of queries other than similarity queries. The k-d tree was one of the first structures proposed for indexing multidimensional data for nearest neighbor queries. Recently, this type of structure has been used in geographic information systems for queries such as similarity queries, and may be useful for similarity indexing. Other approaches, such as space filling curves, linear quad trees, and grid files, do not scale well to high dimensions, but can be useful for moderate dimensional data.
The R-tree and its most successful variant, the R*-tree, have been used most often for indexing high dimensional data in the database literature. However, because a range is stored for each dimension, the index needs more space and more search time as dimensionality grows. For this reason, higher dimensional data is typically mapped to a lower dimensional space before being indexed in R-trees.
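The space cost of storing a range per dimension can be made concrete with a small calculation. The following is a minimal sketch, not taken from the R-tree or R*-tree papers: the function names, the 4-byte coordinate and pointer sizes, and the 8192-byte page size are all assumptions chosen for illustration.

```python
def rect_entry_bytes(dims, coord_bytes=4, pointer_bytes=4):
    """Bytes for one node entry: a low and a high bound per dimension,
    plus one child pointer. All sizes are illustrative assumptions."""
    return 2 * dims * coord_bytes + pointer_bytes


def fanout(page_bytes, dims, coord_bytes=4, pointer_bytes=4):
    """Maximum number of bounding-rectangle entries that fit in one node
    of page_bytes (node header overhead is ignored for simplicity)."""
    return page_bytes // rect_entry_bytes(dims, coord_bytes, pointer_bytes)


if __name__ == "__main__":
    # Hypothetical 8192-byte page; print entry size and fanout per dimensionality.
    for d in (2, 8, 32, 128):
        print(d, rect_entry_bytes(d), fanout(8192, d))
```

Under these assumed sizes, the entry size grows linearly with the number of dimensions, so the number of entries per node falls and the tree becomes deeper for the same dataset.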
The TV-tree is the only method in the database literature so far that has been proposed specifically for indexing high-dimensional data. Performance comparisons clearly show that the TV-tree can be much more efficient than the R*-tree. However, the superior performance depends on two assumptions. The first assumption is that the dimensions of the feature vectors are ordered by "importance." The second assumption is that sets of feature vectors in the dataset will tend to match exactly on some dimensions, specifically on the first "important" dimensions.
The first assumption is reasonable (if not desirable), as an appropriate transform may be used. The second assumption was not explicitly stated in the paper, but a careful analysis of their algorithms reveals that their performance improvement depends on it. In some applications, the original feature vectors contain a small set of discrete values, so the second assumption does hold.
Unfortunately, this second assumption will normally not be true in visual information systems, nor in many other applications. Features in these applications are normally real-valued, so the chances of exactly matching on dimensions are negligible. In this case, the TV-tree reduces to an index on only the first dimensions. Small changes in the proposed algorithms should allow the TV-tree to be a modest improvement over the R*-tree in these applications. However, in this paper, we will refer to the R-tree (and its variants) as the best previously known structure for similarity indexing, as it has proven itself in many more similarity indexing applications.
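The contrast between discrete and real-valued features can be illustrated with a short experiment. This is a rough sketch under stated assumptions, not part of the TV-tree algorithms: the helper names `matching_pairs` and `make_vectors`, the uniform random data, the 1000-vector dataset size, and the choice of four discrete values are all hypothetical.

```python
import random
from collections import Counter


def matching_pairs(vectors, prefix_dims=1):
    """Count unordered pairs of vectors that agree exactly on the
    first prefix_dims dimensions."""
    counts = Counter(tuple(v[:prefix_dims]) for v in vectors)
    return sum(c * (c - 1) // 2 for c in counts.values())


def make_vectors(n, dims, draw, seed=0):
    """Build n synthetic feature vectors; draw(rng) produces one feature value."""
    rng = random.Random(seed)
    return [[draw(rng) for _ in range(dims)] for _ in range(n)]


# Assumed data: features drawn from four discrete values vs. uniform reals.
discrete = make_vectors(1000, 8, lambda rng: rng.randrange(4))
real_valued = make_vectors(1000, 8, lambda rng: rng.random())

if __name__ == "__main__":
    print("discrete pairs matching on dim 0:", matching_pairs(discrete))
    print("real-valued pairs matching on dim 0:", matching_pairs(real_valued))
```

With only four possible values, many pairs of vectors must agree on the first dimension, whereas with floating-point features an exact match is extremely unlikely, which is the situation in which the second assumption fails.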
There is also related work outside the database literature. In the information retrieval literature, work has been done on cluster files that proposes structures similar to the SS-tree. In the image database community, a static indexing structure based on Kohonen nets was proposed. There is also related work in the computational geometry and vector quantization literature.