Thursday, 1 May 2014

Tree Structures

Definition of Tree:











Some related definitions:
  • Root - the top most node in a tree.
  • Parent - node that has a child.
  • Siblings - nodes with the same parent.
  • Leaves - nodes with no children.
  • Internal nodes - nodes with at least one child.
  • Degree - number of sub trees of a node.
  • Edge - connection between one node to another.
  • Level - The level of a node is defined by 1 + the number of connections between the node and the root
  • Height - The height of a node is the length of the longest downward path between the node and a leaf
  • Forest - A forest is a set of n ≥ 0 disjoint trees.
Summary:
  1. Binary Tree: each node has at most two children
  2. Binary Search Tree: The left sub-tree contains only nodes with keys less than the parent node and The right sub-tree contains only nodes with keys greater than the parent node.
  3. AVL Tree: the heights of the two child subtrees of any node differ by at most one; if at any time they differ by more than one, rebalancing is done to restore this property.
  4. Red–black: Same number of black nodes.
  5. B-tree/B+-Tree: More than 2 children (100), 50% full, balanced.
  6. R-Tree: 2-D B+-Tree, Split and Merge More complex.
  7. binary heap tree: Use Array in the underline.

1. Binary Tree
A tree that each node has at most two children (referred to as the left child and the right child)

Example:
binary search trees
binary heaps (Maximum Heap Tree or Minimum Heap Tree)

2. Binary Search Tree
  • Each node has no more than two child nodes. 
  • Each child must either be a leaf node or the root of another binary search tree. 
  • The left sub-tree contains only nodes with keys less than the parent node
  • The right sub-tree contains only nodes with keys greater than the parent node.
Operations:
  • Insert
  • Delete
  • Query (Note that there is no guarantee for performance)
  • Sort (preorder or postorder)
  • Find Successor
  • Find predecessor
2.1: Find Successor
1). The node has a right subtree.
If the given node has a right subtree then by the BST property the next larger key must be in the right subtree. Since all keys in a right subtree are larger than the key of the given node, the successor must be the smallest of all those keys in the right subtree.

2). The node does not have a right subtree.
In this case we will have to look up the tree since that's the only place we might find the next larger key. There is no point looking at the left subtree as all keys in the left subtree are guaranteed to be smaller than the key in the given tree.


3). In this case, there is no Successor.

2.2:Tree traversal 
  1. Depth-first and 
  2. Breadth-first
There are three types of depth-first traversal: pre-order,[1] in-order,[1] and post-order.[1] For a binary tree, they are defined as operations recursively at each node, starting with the root node as follows: 

Pre-order
  1. Visit the root.
  2. Traverse the left subtree.
  3. Traverse the right subtree

In-order (symmetric) (It is also the sorting order)

  1. Traverse the left subtree.
  2. Visit the root.
  3. Traverse the right subtree.

Post-order

  1. Traverse the left subtree.
  2. Traverse the right subtree.
  3. Visit the root.

3: Self-balancing binary search tree
(This is maybe the most important basic of tree structures, which is efficient for point query and range query)

Automatically keeps its height (maximal number of levels below the root) small in the face of arbitrary item insertions and deletions.

Note that:
any Self-balancing binary search tree can be used to support:
1) Sorting
2) Insert and Delete
3) Find the next larger or the previous smaller. 

3.1: AVL Tree

the heights of the two child subtrees of any node differ by at most one; if at any time they differ by more than one, rebalancing is done to restore this property.

Lookup, insertion, and deletion all take O(log n) time in both the average and worst cases, where n is the number of nodes in the tree prior to the operation. Insertions and deletions may require the tree to be rebalanced by one or more tree rotations.

It has to be ordered (For Sure)









In case of insertion and deletion, the AVL property might be worng. Thus, rotation is required: There are 4 types of rotation, LL, LR, RL and RR.

3.2: Red–black tree











The balancing of the tree is not perfect but it is good enough to allow it to guarantee searching in O(log n) time, where n is the total number of elements in the tree. The insertion and deletion operations, along with the tree rearrangement and recoloring, are also performed in O(log n) time.

4. Other Popular Trees:
4.1: B-tree
B-tree is a tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree is a generalization of a binary search tree in that a node can have more than two children (Comer 1979, p. 123). Unlike self-balancing binary search trees, the B-tree is optimized for systems that read and write large blocks of data. It is commonly used in databases and filesystems.

The B-tree uses all of the ideas described above. In particular, a B-tree:
  • keeps keys in sorted order for sequential traversing
  • uses a hierarchical index to minimize the number of disk reads
  • uses partially full blocks to speed insertions and deletions
  • keeps the index balanced with an elegant recursive algorithm
In addition, a B-tree minimizes waste by making sure the interior nodes are at least half full. A B-tree can handle an arbitrary number of insertions and deletions.















Insertion


Deletion

4.2: R-Tree
Expanded from B-Tree, R-trees are tree data structures used for spatial access methods, i.e., for indexing multi-dimensional information such as geographical coordinates, rectangles or polygons.



















4.3: binary heap (for Heap Tree)
Note that it is not a BST.
   
It has a very beautiful property: 
Left Child: 2 * x + 1 and right child: 2 * x + 2

 


4.4: Suffix Trie
Build a Suffix Trie for T and the support to do with S(m):
1) Sub-string search
2) Suffix Search
3) How many times the sub-string occurs?

Space Cost: O(T^2)
Time Cost of build: O(T^2)
Time Cost of search: O(m)

4.5: Suffix Tree (Compressed Suffix Trie)
(Nodes) Space Cost: O(2*T) 
(All) Space Cost: O(2*T) 













The label size is still O(T^2), so the next change is:












Use (offset, length);
Another thing is that each leaf node has a number, indicating the location of the suffix.












Note:
It can be used for other operations:
1) longest common sub-string.

4.5 + - generalized suffix tree:

Combine different strings together and build a suffix tree.





No comments:

Post a Comment