http://discuss.fogcreek.com/joelonsoftware1/default.asp?cmd=show&ixPost=22948&ixReplies=15
AVL Trees vs. Red-Black Trees?
Hi,
I worked through the chapter in Introduction to Algorithms by Cormen et
al on red-black trees. One of the problems at the end discusses AVL
trees.
Which, overall, is better to use and why? I didn't see much comparison
between the two, and a Google search didn't yield anything satisfying.
If I'm missing something very obvious, I apologize.
Warren Henning
Tuesday, December 17, 2002
I'm
not sure that it'll be much info for you but trees in Java Collections
are implemented using red-black trees. Which probably doesn't mean a
lot.
Evgeny Goldin
Tuesday, December 17, 2002
So are std::map and set::set in STL.
Ivan-Assen Ivanov
Tuesday, December 17, 2002
Creating a tree which balances nodes will tend to improve the search
time for any random term within the tree.
So the red-black tree is meant to provide a O(log n) time. This is
true so long as it remains balanced, and is true only from the root of
the tree.
Any insertions, unless also balanced will tilt the tree, or sub tree.
To re-balance the tree is supposed to also take O(log n) time, which I
find dubious or at any rate counter intuitive.
Regardless of that, a red-black tree's performance will degrade
depending upon the degree of insertion and rotation about a node takes
place so I would not use it for a dynamic index. In situations such as
a view of fixed data, or a search to fixed data it would be the optimum
layout.
Simon Lucy
Tuesday, December 17, 2002
RB-Trees are, as well as AVL trees, self-balancing. Both of them
provide O(log n) lookup and insertion performance.
The difference is that RB-Trees guarantee O(1) rotations per insert
operation. That is what actually costs performance in real
implementations.
Simplified, RB-Trees gain this advantage from conceptually being 2-3
trees without carrying around the overhead of dynamic node structures.
Physically RB-Trees are implemented as binary trees, the
red/black-flags simulate 2-3 behaviour.
Alex
Tuesday, December 17, 2002
It's
been a long time since any data structure lectures, but my
understanding is that in an AVL tree the difference between the
shortest and longest path to from the root to any leaf is at most one.
In a red-black tree the difference can be a factor of 2.
Both of these give O(log n) for look up, but balancing an AVL tree can
require O(log n) rotations, whilst a red black tree will take at most
two rotations to bring it into balance (though it may have to examine
O(log n) nodes to decide where the rotations are necessary). The
rotations themselves are O(1) operations since you are just moving
pointers around.
You might also want to look up 2-3-4 trees, or more generally B trees
for more balanced tree data structures.
Rob Walker
Tuesday, December 17, 2002
Actually,
AVL trees only guarantee that, for each node in the tree, the heights
of its subtrees differ by at most one. Since the height is defined by
the longest path to a leaf, it makes no guarantees about the ratio
between the shortest path to a leaf and the longest. In fact, it is
possible to generate a tree with the AVL property whose whose
shortest/longest path ratio is arbitrarily bad.
In the worst case, the constant factors for search can be much worse
for an AVL tree than a comparably sized RB Tree. On the other hand, AVL
trees thend to be simpler to implement.
Devil's Advocate
Tuesday, December 17, 2002
"Since
the height is defined by the longest path to a leaf, it makes no
guarantees about the ratio between the shortest path to a leaf and the
longest. In fact, it is possible to generate a tree with the AVL
property whose whose shortest/longest path ratio is arbitrarily bad."
Actually, I think I am wrong about this part. At the very least, I want
to think about it more before I feel comfortable asserting it.
Devil's Advocate
Tuesday, December 17, 2002
If
you require a balanced tree type data structure I recommend taking a
look at Splay trees. Splay trees are cool because they are relatively
easy to implement, and they don't require storing any additional
information at each node.
They offer O(logN) amortised performance. This means that a single
action (insert/update/delete) may require more than O(logN) operations,
but in the long run the average number of operations per action is
O(logN).
You could say that Splay trees adapt to the input data, which I think
is pretty neat.
Peter McKenzie
Tuesday, December 17, 2002
In
fact, we need not compare btw. the longest and the shortest path of a
tree to measure its performance. What we need, instead, is the tree's
longest path, or height, relative to the number of its elements, say N.
We all know that a perfectly balanced tree has (log N) height. It may
be optimal to search, but is however too rigid to use if we wanted to
insert or delete an item.
A Red-Black tree has 2(log N) height, considering the fact that its
longest path can be at most twice as its shortest path.
An AVL has 1.44(log N) height. This can be proved by successively
comparing two subtrees up to root, thus getting the relation btw. its
element count and its height.
As we can see, AVL tends to be more balanced, therefore more rigid,
than Red-Black: during the worst case of deletion, we might encounter
O(log N) rotations in AVL but only 3 in Red-Black. Searching for an
item, or inserting an item, has the same complexity, O(log N).
Of course, the is trade-offs: AVL is much simpler to understand and/or
to implement.
Junghyun Chae
Sunday, March 07, 2004
I
forgot to add: by definition, every AVL is therefore subsets of
Red-Black. One should be able to color any AVL tree, without
restructuring or rotation, to transform it into a Red-Black tree.
Junghyun Chae
Sunday, March 07, 2004
"Searching for an item, or inserting an item, has the same
complexity, O(log N)."
Please forgive me for this mistake. Only searching has a complexity of
O(log N). Inserting for both trees has a constant complexity, say O(1).
Junghyun Chae
Sunday, March 07, 2004
What
is the difference between AVL and RB-Trees? It seems as though AVLs use
a balance factor and RB-Tree uses color to tell if the child is out of
ballance? I don know.
Shack Man
Wednesday, March 24, 2004
Search
time is logarithmic for both. The only possible advantage is that with
a red-black tree, there will be a slightly lower amount of rotations
needed to maintain balance.
Greg Holl
Wednesday, March 24, 2004
Just making it clear:
AVL trees have average height 1.44 * log(N+2) - 0.33
Red Black trees have worst case height of 2.00* log(N+1)
Red Black & AVL trees both have O(log N) search, insert and delete,
min, max, previous, next in terms of nodes.
Somebody said insert was O(1). It is not, as it still take O(log N)
steps to find the correct place to insert. What is O(1) for Red Black
insert, is any rebalancing necessary to restore the trees properties.
At most a double left or right rotation is needed. For a AVL tree,
rebalancing is O(log N). This is where a RB tree gains over AVL. There
is less rebalancing of nodes after an insert.
It is swings and roundabouts. RB trees have faster inserts than AVL
trees but at the expense of slower searches.
Stephen Howe
Monday, June 21, 2004
The
reason it matters that red-black trees require O(1) rebalancing
operations (vs. O(log n) for AVL) comes up in the application to
persistent data structures (Sleator, Tarjan, et al.) A persistent data
structure makes it possible to roll back any algorithm to a previous
step. To do so, it must keep a trail of updates. Any persistent data
structure based on an AVL tree is going to have to store O(log n) times
as many updates as one based on a red-black tree.
For any given application, an AVL tree might be the way to go, but
red-black trees were necessary to solve a theoretical problem. BTW,
persistent data structures are themselves used to solve other
algorithmic problems (e.g. planar point location, Sleator and Tarjan)
so ultimately, the constant-rebalancing is necessary to improve the
speed of certain algorithms.
The textbook treatment of red-black trees often hides the actual reason
that many theorists prefer them, and make it seem like they're just
trendier than AVL trees. But the application to persistence is the main
theoretical justification for using them.
Paul Callahan
Wednesday, August 18, 2004
Recent
Topics
Fog Creek Home