July 28
Part 2: The Swiss Army Knife of Data Structures ... in C#
Update:
In a personal communication with Eric Lippert it was revealed that, in his words:
"I worked up a full implementation as well but I decided that it was too complicated to post in the blog. What I was really trying to get across was that immutable data structures were possible and not that hard; a full-on finger tree implementation was a bit too much for that purpose."
I am happy to correct my post and acknowledge that a previous C# implementation did exist. Hopefully, Eric will publish his implementation.
So, the work discussed here is most probably the first published Finger Tree implementation in C#.
In
Part 1 I described my experience in implementing the Finger Tree data structure in C#.
Here is a comparison of the performance of some common operations on sequences of length 100,000 as implemented by the C# Finger Tree and the .NET 3.5 Enumerable extension methods (applied on a List<UInt32> object).
Disclaimer: This comparison was carried out on my 4-year old PC, having CPU speed of 3GHz and RAM of 2GB and running Windows XP. Results will largely vary on other configurations, depending on CPU and memory speed, available memory, cache size, ..., etc. Another factor to consider is the time necessary for jitting the .NET IL code on initial method execution. However the observed ratios (speedup) should remain closely the same. Below is a table, summarizing the results:
Performance Comparison:
FSeq vs. .NET Collection with 100,000 elements
IEnumerable<uint> vs. FSeq
|
Method
|
.NET Av. Time
|
FSeq Av. time
|
FSeq SpeedUp
|
|
Concat()
|
5268 µs
|
56 µs
|
94.07
|
|
ElementAt(),
[ ]
|
1 µs
|
46 µs
|
---
|
|
Skip()
|
2022 µs
|
49 µs
|
41.27
|
|
Take()
|
1513 µs
|
50 µs
|
30.26
|
|
Reverse()
|
4859 µs
|
93 µs
|
52.25 |
As seen from this comparison,
using a Finger Tree as a general sequence data structure in .NET may speed up sequence operations (roughly) 30 to 94 times (extreme cases to more than 100 times). The only exception is element access, and only in the case of an array (note that the
List<T> class in .NET is actually an ArrayList). If the instance on which the
Enumerable methods are called is of some other type, say
LinkedList<T>, then the speedup for all operations would be even bigger. Because the tested methods of
Enumerable are using deferred execution, it was necessary to add an additional operation that references an item located the end of the result-sequence.
Another problem I had to deal with was how to test for "typical" or "average" operations. In other words, how to make the tests representative. I ended up with the decision to include in the Concat(), Insert(), Remove() and Substr() tests strings with a maximum variety of lengths. What in contrast I call extreme cases is when the sequences/strings to be inserted/concatenated/removed/extracted are all with the same maximum length (close to the length of the first collection/string, on which the operation is performed -- in this case close to 100,000)
Another interesting comparison can be made if a string is represented as a sequence of characters. As the table below shows, for long strings the potential speedup of using a Finger-tree-based string implementation is 3 to 5 times (extreme cases 5 to 25 times):
Performance Comparison:
FString vs. .NET string with 100,000 elements
string vs. FString
|
Method
|
.NET Av. Time
|
FString Av. time
|
FSeq SpeedUp
|
|
Concat(), +
|
655 µs
|
129µs
|
5.08
|
|
Insert()
|
653 µs
|
220µs
|
2.97
|
|
Remove()
|
176 µs
|
55 µs
|
3.20
|
|
Substring()
|
167 µs
|
54 µs
|
3.09 |
The source code of the performance tests is
here. Take notice that in its present state this code doesn't produce any output at all. To see the timing results, run under the VS2008 Debugger and put breakpoints at suitable locations.
In the third part of my Finger-Tree series I will present a brief performance comparison between a Finger-Tree-based sequence and XPath sequences, as implemented in existing XSLT 2.0 processors.