Monday, 08 September 2008

Dictionaries and Collections

This article appeared in the April 1999 issue of EXE, and discusses the features of both the Dictionary and Collection objects. It also provides some personal commentary on the Bruce McKinney "Declaration of Independence".


This month I’m going to take a look at the Collection object that was introduced as a new feature in Visual Basic 4, and then at the Dictionary object that was subsequently introduced with Visual Basic 6. The latter was actually already available, because it ships with the Visual Basic Scripting Edition (VBScript) rather than with the Visual Basic for Applications library that provides the former. The VBScript library was originally introduced with Microsoft Internet Explorer 3.0, and was then superseded by version 2 as part of Microsoft IIS version 3.0. A more widespread version 3 was made available with Internet Explorer 4, IIS 4, and also with Microsoft Outlook 98. The newest edition, now running at version 4, has been shipped along with Visual Studio 6.0.

Collection object

The Visual Basic documentation defines a collection as "an ordered set of items that can be referred to as a unit". This basically means that it’s an object that can store multiple items of data. A collection is the next step up in complexity from an array; whereas an array needs formal management by the programmer in terms of dimensioning and resizing, a collection automatically handles its own storage requirements.

Each element within a collection has two main attributes, a key and the actual data item itself. One rule that is imposed here is that the key must be a string value, and of course it must be unique so that it can be used to identify the specific element. An attempt to add a new element with a duplicate key will raise an error. The actual item of data being stored can be of any type, including variants, meaning that object references can also be stored.

A collection exposes the familiar Count property, and Add, Item, and Remove methods. Because it’s an object it must be instantiated as such, using the New keyword:

Dim colTest As Collection

Set colTest = New Collection
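
With an instantiated collection in hand, the rule about unique keys mentioned earlier can be tried out. The following is only a minimal sketch: the procedure name DemoDuplicateKey and the sample values are my own illustrative choices, and the output goes to the Immediate window via Debug.Print.

```vb
Option Explicit

' Sketch: adding a second element under an existing key is rejected.
' DemoDuplicateKey is a hypothetical name used for illustration only.
Public Sub DemoDuplicateKey()
    Dim colTest As Collection

    Set colTest = New Collection
    colTest.Add "alpha", "a"

    On Error Resume Next                 ' trap the error rather than halting
    colTest.Add "another alpha", "a"     ' same key, so this Add fails
    If Err.Number <> 0 Then
        Debug.Print "Add failed: "; Err.Number; " "; Err.Description
        Err.Clear
    End If
    On Error GoTo 0                      ' restore normal error handling

    Debug.Print colTest.Count            ' the failed Add stored nothing

    Set colTest = Nothing
End Sub
```

Trapping the error inline like this is just one way to handle it; a routine that expects duplicates might prefer a dedicated error handler.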

To add a new element to an existing collection the formal syntax is

object.Add item, key, before, after

The item and key values we’ve already discussed. The before and after parameters, which are optional, allow you to specify whether the new element is to be placed either before or after another existing element. For example, a line of code such as

colTest.Add "gamma", "c", , "b"

will insert an element whose key value is "c" immediately after the element whose key value is "b". These two optional parameters do incur quite a performance penalty so it’s advisable to add data items in a pre-sorted order if at all possible.
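
As a rough illustration of how the before and after arguments affect ordering, the sketch below builds a small collection and prints it by position. The procedure name and the sample data are assumptions made for the example.

```vb
Option Explicit

' Sketch: positioning new elements with the optional before/after arguments.
' DemoBeforeAfter is a hypothetical name used for illustration only.
Public Sub DemoBeforeAfter()
    Dim colTest As Collection
    Dim i As Long

    Set colTest = New Collection

    colTest.Add "alpha", "a"
    colTest.Add "beta", "b"
    colTest.Add "delta", "d"

    ' Place the new element after the one keyed "b"; before is left empty.
    colTest.Add "gamma", "c", , "b"

    ' Place the new element before the one keyed "a", making it the first.
    colTest.Add "start", "s", "a"

    ' Collection elements are indexed from 1.
    For i = 1 To colTest.Count
        Debug.Print i; colTest(i)
    Next i
    ' Resulting order: start, alpha, beta, gamma, delta

    Set colTest = Nothing
End Sub
```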

Notice the syntax used to denote the omitted before parameter. In Visual Basic, if you supply only some of a procedure’s optional parameters by position then you must observe the ordering of the parameter list and leave a comma placeholder for each omitted parameter that precedes one you do supply. An alternative approach is to use named arguments with the := operator, which resembles the assignment operator found in languages like Pascal and Clipper. The use of this operator is currently limited; it’s only for use with named arguments. It can’t be used for the general assignment of a value to a variable which, as a matter of personal taste, is a shame because I quite like it (along with the == operator for an equality test). However in the case in point it can be used to explicitly identify which parameters you are supplying. The equivalent syntax for the Add operation shown above would be

colTest.Add key := "c", item := "gamma", after := "b"

In order to access the contents of the collection a new language construct has been introduced, the For Each…Next loop. The old standard For…Next loop can still be used in association with the Count property if you prefer. In order to use the For Each…Next loop it is necessary to declare a general purpose Variant variable to store a reference to the element currently being accessed. This process of working through the set is known as enumeration. The collection logic includes a routine known as an enumerator that takes into account such concepts as a Remove operation taking place during a For Each…Next loop. If an element is removed then the enumerator will ensure that the internal index is fixed such that the next element will not be leapfrogged during the subsequent iteration.

Therefore we can wrap up all of these concepts into a general piece of code such as that shown in listing 1. This example shows the instantiation of the collection, various forms of assignment and enumeration, and finally the release of the collection.
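
The listing itself is not reproduced here, so what follows is only a sketch of what such a routine might look like, and not the original code. The procedure name ListingSketch, the variable names, and the sample data are all assumptions of mine.

```vb
Option Explicit

' Sketch of a routine covering instantiation, assignment, enumeration
' and release. Not the original listing; all names are illustrative.
Public Sub ListingSketch()
    Dim colTest As Collection
    Dim varItem As Variant      ' For Each needs a Variant (or Object) here
    Dim i As Long

    ' Instantiate the collection.
    Set colTest = New Collection

    ' Positional arguments: item first, then key.
    colTest.Add "alpha", "a"
    colTest.Add "beta", "b"
    colTest.Add "delta", "d"

    ' Positional, with a comma placeholder for the omitted before argument.
    colTest.Add "gamma", "c", , "b"

    ' Named arguments, which may be given in any order.
    colTest.Add Key:="e", Item:="epsilon", After:="d"

    ' Enumerate with For Each...Next.
    For Each varItem In colTest
        Debug.Print varItem
    Next varItem

    ' Enumerate with a standard For...Next loop and the Count property.
    For i = 1 To colTest.Count
        Debug.Print i; colTest.Item(i)
    Next i

    ' Retrieve one element by key, then remove another by key.
    Debug.Print colTest.Item("c")
    colTest.Remove "a"

    ' Release the collection.
    Set colTest = Nothing
End Sub
```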

Dictionary object

As we already discussed, the VBScript library provides the Dictionary object. Because it doesn’t exist within the default Visual Basic libraries it is necessary to create a reference via the References option under the Project menu – look for Microsoft Scripting Runtime. You will see that the underlying file for this is scrrun.dll.

I tend to think of a Dictionary object as an alternative to a Collection, rather than as an enhanced collection. The Dictionary object gives you a few extra features that you don’t get with the Collection, such as an Exists method to tell you whether a certain key value can be found among the current contents. There is also a useful RemoveAll method to quickly clear the contents. However the underlying architecture is based around an array rather than a collection, the practical upshot of which is that you can’t use the For Each…Next construct to walk the data items directly: applied to a Dictionary it hands back the keys rather than the items. Just as you started to get used to not having to use the standard For…Next syntax you find it’s suddenly back in fashion again.
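
A minimal sketch of these features follows, assuming the project already has a reference to Microsoft Scripting Runtime. The procedure and variable names are illustrative only. It uses the Keys and Items methods to obtain arrays and then walks them with a standard For…Next loop.

```vb
Option Explicit

' Sketch: Exists, RemoveAll and array-based enumeration of a Dictionary.
' Assumes a project reference to Microsoft Scripting Runtime (scrrun.dll).
' DemoDictionary is a hypothetical name used for illustration only.
Public Sub DemoDictionary()
    Dim dicTest As Scripting.Dictionary
    Dim varKeys As Variant
    Dim varItems As Variant
    Dim i As Long

    Set dicTest = New Scripting.Dictionary

    dicTest.Add "a", "alpha"        ' key first, then item
    dicTest.Add "b", "beta"

    ' Test for a key before reading its item.
    If dicTest.Exists("b") Then
        Debug.Print "b -> "; dicTest.Item("b")
    End If
    Debug.Print dicTest.Exists("z") ' no such key was added

    ' Keys and Items each return a zero-based array.
    varKeys = dicTest.Keys
    varItems = dicTest.Items
    For i = 0 To dicTest.Count - 1
        Debug.Print varKeys(i); " = "; varItems(i)
    Next i

    ' Empty the dictionary in one call.
    dicTest.RemoveAll
    Debug.Print dicTest.Count

    Set dicTest = Nothing
End Sub
```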

One other quirk that exists between the two is the syntax of the Add method. Whereas the Collection expects the mandatory parameters to be Item, Key, the Dictionary expects them to be provided in the opposite order. One workaround for this is to remember to use named arguments. A further disparity is that the Dictionary doesn’t accept the Before and After parameters, again further emphasising its array-based implementation.
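
The difference in argument order, and the way named arguments sidestep it, can be sketched as follows. This again assumes a reference to Microsoft Scripting Runtime, and the procedure name is my own.

```vb
Option Explicit

' Sketch: the two Add methods take their arguments in opposite orders.
' Assumes a project reference to Microsoft Scripting Runtime.
' DemoAddOrder is a hypothetical name used for illustration only.
Public Sub DemoAddOrder()
    Dim colTest As Collection
    Dim dicTest As Scripting.Dictionary

    Set colTest = New Collection
    Set dicTest = New Scripting.Dictionary

    ' Positional arguments: easy to transpose by mistake.
    colTest.Add "alpha", "a"            ' Collection: item, key
    dicTest.Add "a", "alpha"            ' Dictionary: key, item

    ' Named arguments: the order no longer matters.
    colTest.Add Item:="beta", Key:="b"
    dicTest.Add Item:="beta", Key:="b"

    ' Both lookups return the same value.
    Debug.Print colTest("b"); " "; dicTest("b")

    Set colTest = Nothing
    Set dicTest = Nothing
End Sub
```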

The Dictionary object offers a few benefits over the collection. Some of these benefits are:

  1. A greater degree of discrimination when searching by key. The CompareMode property allows for binary or textual comparisons, for example.
  2. Methods for extracting the keys or the data values into an array.
  3. A means of changing a key value.
  4. Keys are not limited to being string datatypes.
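
The four points above can be sketched in a few lines. As before, this assumes a reference to Microsoft Scripting Runtime, and the procedure name and sample data are illustrative assumptions rather than anything prescribed by the library.

```vb
Option Explicit

' Sketch: CompareMode, a non-string key, the Key property and Keys.
' Assumes a project reference to Microsoft Scripting Runtime.
' DemoDictionaryExtras is a hypothetical name used for illustration only.
Public Sub DemoDictionaryExtras()
    Dim dicTest As Scripting.Dictionary
    Dim varKeys As Variant

    Set dicTest = New Scripting.Dictionary

    ' 1. Choose textual comparison. CompareMode has to be set while the
    '    dictionary is still empty.
    dicTest.CompareMode = TextCompare
    dicTest.Add "a", "alpha"
    Debug.Print dicTest.Exists("A")     ' True: case is ignored

    ' 4. A key does not have to be a string.
    dicTest.Add 42, "forty-two"
    Debug.Print dicTest.Item(42)

    ' 3. Change an existing key through the Key property.
    dicTest.Key("a") = "first"
    Debug.Print dicTest.Exists("a"); dicTest.Exists("first")

    ' 2. Extract the keys into a zero-based array.
    varKeys = dicTest.Keys
    Debug.Print UBound(varKeys) + 1     ' number of keys extracted

    Set dicTest = Nothing
End Sub
```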

Overall the Dictionary object offers a more powerful approach. Its reliance on underlying arrays does give it a general performance increase, but opinions seem to vary as to how much of an improvement there really is. One claim that I found boasted of a five-fold increase, whereas another piece of research gave the two comparable performance. New features aside I suspect that the real deciding factor for the developer will often be based around a preference, or otherwise, to use the For Each…Next syntax.

And then there’s Bruce…

EXE is a magazine that is dedicated to software development, so for this reason I have never particularly contemplated writing about wider issues that might be happening within the development community itself. After all I’m limited to two pages so I figure I’ve got to make them count. This time, however, I’m making an exception. Bruce McKinney, who authored the first two editions of the best-selling Microsoft Press book Hardcore Visual Basic, has posted a rather lengthy diatribe on the Internet stating that he is ending his involvement with Visual Basic. The full text of his argument can be found here. To understand (I hope) my own reply to his comments will of course necessitate reading his article too.

In an attempt to condense his main points, he is accusing Microsoft of adding too many "doodads" to a product that is built on a "weak foundation". Is his attack on Visual Basic justified? I’ve had quite a bit of discussion with various people over this issue and I still find Visual Basic to be a very worthy tool, but perhaps Bruce has raised some valid points.

Those who have read his book will know that it is concerned with trying to extend the power of Visual Basic applications by diving quite deeply into Windows itself. Not that there’s anything wrong with this; the Windows API offers a very rich set of routines. It’s just that, well, you need to be kind of careful using some of them. For example I occasionally see articles explaining how to create and manipulate threads from Visual Basic. Threads are an integral, and fairly fundamental, feature of all 32-bit versions of Windows, and yet Visual Basic doesn’t provide any direct programming support for them. You can write code to use them, but it means delving into the Windows API and frankly it’s very easy to screw up if you don’t know how threads actually work at the system level.

In his books Bruce demonstrated ways in which the Windows API could be accessed through type libraries with Visual Basic 5 as a step towards more powerful programming, but when Visual Basic 6 appeared he found that his solution was broken, which of course troubled him greatly. The whole point here though is that much of his involvement with the product seems to have been with "pushing the limits" of Visual Basic (as the back cover of his book states). The focus of the Visual Basic design team has obviously been to create a product that is primarily suitable for constructing business-type applications, for example the various tiers of a sales ordering system. The high availability of third-party products such as sophisticated map-rendering engines has opened up the potential use of the product through clearly defined means of extension, namely the ActiveX automation interface. However I think that what Bruce has expended so much effort on is trying to expand the product through avenues that the VB developers haven’t yet provided much direct support for. This chiefly entails the kind of integration with the Win32 API that can be achieved with other languages such as Visual C++.

Whether he is right to expect to be able to do this is very much a matter of opinion, and it is the type of issue that could easily generate wildly differing views. The last couple of releases of the tool have focused on the ability to develop applications and components that fit in with corporate-style applications. The recently added support for Microsoft Transaction Server and Microsoft Message Queue Server is indicative of this. Bruce isn’t wrong to try to extend the product into new territories, but I’m sure that we would all hope that the natural evolution of the product will eventually address these issues in an architecturally stable manner. I say eventually, although of course we would like it NOW!

I’m genuinely sorry that he’s become so disappointed with Visual Basic, and perhaps I can understand his reasoning for wanting to walk away from the product when I consider what his own expectations of it were. However most of us use it for features that, by his own admission, he has no interest in – namely database access and/or Internet development – so I don’t foresee his statement leading the masses away to another tool. The last I heard on the issue was that he hadn’t actually decided what his new development tool would be. Goodbye Bruce, and good luck.

Copyright ©2002 Jon Perkins

I, Jon Michael Perkins, hereby assert and give notice of my right under section 77 of the Copyright, Designs, and Patents Act 1988 to be identified as the author of the foregoing article.