I worked with Elasticsearch back in 2015, when it was known mainly for its text search capabilities built on inverted indexes. When I picked it up again last year for another project, I saw that Elasticsearch had added core support for data types other than text, such as numbers, IP addresses, and geospatial data.
As I tried to understand what makes search over such data types efficient, I stumbled upon BKD trees. Surprisingly, not much has been written about BKD trees apart from a white paper and a few blog posts. This post covers the ideas leading up to the development of BKD trees and their advantages, starting from KD trees.
We will start with the BST (Binary Search Tree), which is the base for this post. A BST is a binary tree in which, for every node, all elements in the left subtree are smaller than the node and all elements in the right subtree are greater. This post will not go into the insertion, deletion and searching of elements any further, since there are numerous sources out there.
BSTs, and self-balancing variants such as AVL trees, halve the search space at each node during traversal, which results in an O(log N) search as long as the tree is balanced. An unbalanced BST can degrade to O(N) in the worst case. A BST can be rebalanced by rotating subtrees around a pivot node.
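To make the halving concrete, here is a minimal sketch of BST insertion and search that also counts how many nodes a lookup visits. The `Node`, `insert` and `search` names are made up for this illustration, and the tree does no rebalancing, so the shape depends entirely on insertion order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], key: int) -> Node:
    """Insert key into a plain (non-balancing) BST and return the root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root  # duplicates are ignored


def search(root: Optional[Node], key: int):
    """Return (found, nodes_visited) for a lookup of key."""
    steps = 0
    node = root
    while node is not None:
        steps += 1
        if key == node.key:
            return True, steps
        # Each comparison discards one whole subtree.
        node = node.left if key < node.key else node.right
    return False, steps


# Same keys, two insertion orders: one gives a balanced tree, one a chain.
balanced = None
for k in [8, 4, 12, 2, 6, 10, 14]:
    balanced = insert(balanced, k)

skewed = None
for k in [2, 4, 6, 8, 10, 12, 14]:
    skewed = insert(skewed, k)

print(search(balanced, 14))  # (True, 3)
print(search(skewed, 14))    # (True, 7)
```

With seven keys, the balanced tree finds 14 in three steps, while the sorted insertion order produces a chain that needs all seven. That gap is what self-balancing trees like AVL avoid.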
A major limitation of the BST is that it cannot deal with multiple dimensions. For example, suppose we have a store of latitudes and longitudes and are asked to search on both latitude and longitude. A BST orders its elements along a single dimension, so it is easy to search by latitude or by longitude, but not to search efficiently on both together.
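One way to see the limitation is a bounding-box query over a one-dimensional index. The sketch below uses a list sorted by latitude together with Python's `bisect` module as a stand-in for a balanced BST keyed on latitude. The sample points and the `box_query` helper are hypothetical, chosen only for illustration.

```python
from bisect import bisect_left, bisect_right

# Hypothetical (latitude, longitude) points.
points = [
    (12.97, 77.59),
    (12.97, 80.27),
    (12.97, 74.85),
    (19.07, 72.87),
    (28.61, 77.20),
]

# A one-dimensional index: points ordered by latitude only.
by_lat = sorted(points, key=lambda p: p[0])
lats = [p[0] for p in by_lat]


def box_query(lat_min, lat_max, lon_min, lon_max):
    """Return (matches, candidates_scanned) for a bounding-box search."""
    # The latitude range is narrowed in O(log N) using the ordering...
    lo = bisect_left(lats, lat_min)
    hi = bisect_right(lats, lat_max)
    candidates = by_lat[lo:hi]
    # ...but longitude has no ordering here, so every candidate is checked.
    matches = [p for p in candidates if lon_min <= p[1] <= lon_max]
    return matches, len(candidates)


print(box_query(12.0, 20.0, 77.0, 78.0))  # ([(12.97, 77.59)], 4)
```

The index prunes by latitude, but four candidates still have to be scanned to find the single point that also satisfies the longitude range. On a large dataset with a wide latitude band, that second step approaches a full scan.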
What do we do if we have multiple dimensions or multiple metrics across which we need to run our search queries?
KD or K-Dimensional trees
Th
