- MPI programming is rarely done the way I think it should be done.
The tools provided by the compiler manufacturers, and all the instructions I've seen on the Internet favor the simple
divide-and-conquer approach where an approximately equal amount of work is apportioned to each core or node.
I consider this a failure of our educational system, and so I've given a couple of lectures over the years on how
to do this a little smarter.
This method might not work for all problems, but then maybe the problems could be re-thought to get it into a form
that could make use of this standard software-engineering approach.
For seismic data processing, the applications we wrote that use my code here are 3.5 to 5 times faster than the
competition (according to Total), and runs on tens of thousands of cores processing around 1 PB of data (the
initial requirement I was given for one application was that it take a tiny dataset of 5 TB and process it
on a 100 core machine in 1 week - the execution was completed in less than 3 days).
- Matrix Inversion
- This method for inverting a matrix will work for any matrix, regardless of the size or shape.
The drawback is the amount of memory that is required, but there are ways to modify this to account for large sparse
matrices that may be suitable.
An early version of this in C++ I wrote in 1992 used a template where you'd have to supply the definition of 0,
1, subtraction, and division for the datatype, and it would even work with complex datatypes (like differential
- Limited Size Binary Tree
I came upon the need for a binary tree for an embedded device in the mid '90s that would throw away the oldest
data and keep the newest, should a problem arise with the transmission of the data to the collection facility.
This particular device collected electricity usage for smart household metering systems (communicating over the
power lines), and when the project was abandoned, and someone had a question about it, the facility was opened
after 7 years and found to be still running, still collecting the data and waiting for the transmission to work
(but the antenna for transmitting the data back to the facility had been stolen).
Here's a simple binary tree: BTree.h BTree.c, and here's a python
version of the fixed-size one: FixedSizeBinaryTree.py - the original was
I've made many more little algorithm implementations over my career, but they would just clutter these pages,
if I got around to re-implementing them. Back in '85, for instance, I beat the Cray-supplied quicksort for selecting
the best input data for an Optimal Interpolation routine for a weather analysis program (by a factor of about 4x)
while working at the Air Force Global Weather Central.
I removed the Householder matrix inversion routine from a 3D refraction statics program and coded in an OpenMP modified
Jacobian Relaxation routine that gave a speedup of about 200x for a test dataset. And once I wrote a program in C that
when compiled and run would output its own source code.
Compile the program and run it and pipe the output to a file
which could be compiled and run and the output would match the input - without referencing any external files (including
its source code). I was challenged by a student on that one, and my solution kept track of which generation it was, it
would randomly generate variable and function names each run, and would toggle between logic states as well. That one took
all day to create, and although the original source code is long gone, here's a recent version that
keeps track of its iteration number. Compile the code with
gcc repeat.c && ./a.out | tee repeat.c,
and then "rinse and repeat".
The games can also be thought of as algorithm implementations.
Of course, I never think of them that way when I'm playing them.