Saturday, October 28, 2006

C Add-Ins in Excel: Performance Comparison with VBA

As Excel is also applied in order to solve industry level business problems, and as therein the speed will be of the essence, the choice of programming in VBA could become unfeasible. In such cases critical parts can be developed in C, which is the topic of this post.

It is curious to note that while other language choices in Microsoft Visual Studio are well documented, writing an Xll in C – that is, registering it, exchanging data via XLOPER structure and running C API – is much more of an uncharted water.

There still exists some vital material: for instance a thorough book “Excel add-in development in C/C++, Applications in Finance” by Steve Dalton. On the net an example of a guide would be interpolation add-in by JChampion, which also goes over the basic concepts, and might be a better read, than Microsoft’s own material on the net.

The example is as follows. The idea was to test C at its best: to have it make 137 000 000 comparisons of text based on Excel data. (Text comparisons should have the biggest superiority of performance in C, because of how it handles the data type).
The data (which is interesting on its own!) comes from perhaps the biggest publicly available database of owners in Estonia –
the daily list of the owners of the stock of company Tallink, consisting of records on about 16000 entities. The add-in works as follows – it is an array formula that compares two snapshots of data and writes out stock owners, who are present in both periods. No optimization of the algorithm is done – just a plain one by one comparison.

The Microsoft Visual Studio 2005 project with C files and the compiled Xll as well as the ready made files on Tallink are downloadable. I would like to express my sincere gratitude to Jüri Vilipõld, who benchmarked the programs. The results showed, that depending on the computer, the program took about an hour to run if VBA operations accessed Excel cells throughout the operation, programs using VBA arrays trimmed that time down to about 40 seconds, depending on the implementation – one using the direct copying of the entire array and the other not. And the C add-in in turn was about 5.5 times faster, completing in less than 10 seconds. The results, thus, give us a picture of the speed of accessing Excel objects excessively, of using VBA or optimized array copying, and of using C.

S.M. Johnson: production schedules - in VBA

Amongst mathematical algorithms taught to business students one was devised by S.M. Johnson back in 1954. The issue is having two different operations undertaken on products, in succession. A range of products at hand has different lapses of time, for both operations. The goal is to arrange such a schedule, which will minimize the cumulative production time. This program was needed by the Chair of Business Mathematics.

ANN solving XOR in VBA

Here is a feedforward, multiconnectivity, Sigmoid transfer function, backpropagation, multi layer perceptron Artificial Neural Network source code in VBA; that I needed to make for my studies. It will solve the XOR nonlinear separation problem and graph out the learning path.

Traveling Salesman Problem (TSP) in Excel/VBA. The use of graphics in Informatics II class

The most advanced topic that has been addressed in our Informatics II for non-IT students: VBA, class is the Traveling Salesman Problem (in this file). As TSP is NP-complete, as far as its complexity is concerned, problems such as traversing all the 24978 towns of Sweden are solved. Our problem is a very simple one – of passing through 16 cities.

The Waites family project VisualBots has created an Excel Add-In by that name. My thanks go to their project regarding the TSP – from which I borrowed the algorithm that is used. The algorithm used is simulated annealing, which is a heuristic that, with a decreasing probability, chooses a longer total distance – in order to avoid getting stuck in a local minima. The VisualBots is an add-in that gives Excel additional graphical capabilities.

With a bit different accentuation it has also been a goal in our university to rely on Excel’s graphical objects. For years the Informatics II for non-IT students has used those to convey the essentials of programming with VBA. Lately, with the advent of educational programming languages, such as Scratch, our approach has been validated, as courses like Harvard’s CS50 also start with graphical objects now. In the present example graphical objects are complemented with Excel’s charts displaying the dynamics of the solution process.

Gauss-Jordan elimination in Excel/VBA. The use of matrix problems in Informatics II class

In our undergraduate Informatics II: VBA for non-IT students, class we do many computations on matrices – in order to let students reason algorithmically. As we start out, it’s good to show that a simple 12 row algorithm can solve linear equation systems by doing Gauss-Jordan elimination. Examples of subsequent test problems include: finding the number of elements between -3 and +3 amongst those above the main diagonal; finding the average of all positive elements of a matrix.

As a huge variety of such problems can be created, this is a topic that guarantees a thorough and genuine understanding of the code written – compared to other topics of the class, where a bigger proportion of material seems to be memorized, and not so thoroughly understood. This topic also gives a good incentive for streamlining testing – as literally hundreds problems can be handed out as WebCT tests; with students entering the multiple choice (again from amongst hundreds) answers after doing programming in Excel/VBA. In order to be sure of students’ comprehension of the topic, multiple tests have to be completed.

Friday, October 27, 2006

NPV and IRR in Execl/VBA

In our undergraduate Informatics II: VBA for non-IT students, class the financial indicator Net-Present-Value is programmed. This program also includes the counterpart, IRR, the difficulty of which also remains within the scope of this class. By the fourth semester, on which Informatics II is held, students have already taken the basic classes such as finance and statistics, so that no time needs to be used for conveying background information.

As IRR can have multiple values, one would have to start out with an initial guess. We take a value that would ensue from a too good to be true project: a version of our current project in which all the future cash flows occur at once (there is no discounting from the future) – thus purportedly overestimating IRR. We then start decreasing our estimation of IRR and at every step check whether we have arrived at a correct answer.