BTree Performance


BTree Performance

These classes are quite efficient. The table below shows the times taken (in milliseconds) for the standard test (description of standard test follows table) on a Pentium III 500 with 256Mb RAM running Windows '98.

Type Smalltalk Express Dolphin Smalltalk IBM VisualAge for Smalltalk Java JDK 1.2
Addition 6.16 5.54 3.11 5.68
Deletion 6.16 5.69 3.21 5.72
Fetch 3.34 2.42 1.89 3.68

The standard test consists of 10000 random additions of entries with an average index length of 30 characters and an average data length of 52 characters. This is then followed by a further 20000 random operations divided randomly and roughly equally among additions, deletions and fetches.

Cacheing

By allowing the index and data pages to be cached you can considerably improve the performance of the Btrees. The following table shows the results of the above test if all the index and data pages in the BTree are cached.

Type Smalltalk Express Dolphin Smalltalk IBM VisualAge for Smalltalk Java JDK 1.2
Addition 1.66 1.94 0.80 1.61
Deletion 1.06 1.27 0.67 0.93
Fetch 0.47 0.61 0.56 0.62

Key Compression

Further space efficiency is provided by compressing the keys. A number of algorithms are available to do this depending on the nature of the keys. The method used here is to substitute the common key prefix in subsequent keys e.g. if you have three keys in the one block -

ACCOUNT_BLOGGS_ANNE
ACCOUNT_BLOGGS_JOE
ACCOUNT_BLOGGS_JOE JR

These are represented in the file as -

+ACCOUNT_BLOGGS_ANNE
+JOE
+ JR

Where the ASCII value of the character at + indicates the number of common characters shared with the previous record (up to a maximum of 255). If + is 0 then there is no compression. The first index record in a page is never compressed. Compression introduces an overhead to the manipulation of data in the BTree but shows definite advantages in very large trees where there is sufficient commonality of index prefix to allow a reduction in the number of index levels and hence access time. It also allows the programmer to provide sensible keys to the data without worrying unneccessarily about the space overhead