(Photo by Darren Hester)
Some algorithms take a long time to return their results. Whether it is because the algorithm has to operate on a huge data set, or because it has combinatorial complexity; every time you run it you have to wait minutes or even hours for the thing to finish, making errors very expensive.
This post gives some advice on how to prototype slow algorithms with as little frustration as possible. We assume that this algorithm is being implemented experimentally – that is, you will tweak it and change it often before it is finished (it is not the kind of algorithm you type in straight from a text book). For example, I used the ideas outlined here while playing with the texture generating algorithm of the previous post.
This post is not about optimisation, looking for another algorithm, implementing it in another (faster) language (C++) or writing correctness tests for it. Some of the suggestions go against good programming practices for production code, and you should not use them for that.
The idea behind these suggestions is:
- to minimise error (that would force a rerun after correcting it);
- to minimise duplication of calculation between successive runs; and
- to minimise losing a better procedure or parameter set (that would force several reruns in trying to reconstruct it from memory).
I will assume that your code is divided in two parts:
- a function (and its helper functions) that forms the algorithm; and
- a test harness.
To illustrate the ideas proposed below, we will use the following experimental blur algorithm as an example. I give it here in Python, but the tips are not language specific.
def blur(grid, repeat_count, window_radius): window_area = (window_radius * 2 + 1)**2 tmp_grids = [None] * repeat_count
for i in range(repeat_count): tmp_grid[i] = Grid2D(grid.dims)
for index in grid.index_iter(): for window_cell in grid.wrapped_square_iter(index, window_radius): sum += window_cell tmp_grid[i][index] = sum / window_area
grid = tmp_grid[i]
new_grid = Grid2D(grid.dims)
for index in new_grid.index_iter(): new_grid = tmp_grid[floor(random() * repeat_count)]
return new_grid def demo_blur(): grid = load_image(‘bar.png’) grid = to_grey(grid) grid = blur(grid, 20, 5) write_image(grid, ‘bar_blur.png’)
1. Implement a quick-run feature
A quick-run is where all loops and data structure sizes are set to the minimum necessary to make all the code run. This is to prevent large amounts of processing time before hitting a run-time error because of a small mistake near the end of the algorithm. This step is especially helpful if you use a dynamic language, where there is less help from the compiler to check validity of code.
In our example, there are three factors that determine the length of the algorithm:
- the image size;
- the repeat count; and
- the window radius.
We can implement a quick-run feature by changing our harness as follows:
quick_run = True if quick_run: test_base_name = '8x8' test_image = test_base_name +'.png' #This is an 8 x 8 image. repeat_count = 2 window_radius = 1 else: test_base_name = 'bar' test_image = test_base_name +'.png' repeat_count = 20 window_radius = 5 def demo_blur(): grid = load_image(test_image) grid = to_grey(grid) grid = blur(grid, repeat_count, window_radius) write_image(grid, ‘blur.png’)
Every time we make a change, we first run the program with quick_run set to True just to make sure we have not introduced any errors in the code. If you use a language that supports #define macros (C and C++) or some similar functionality, this can be implemented much more seamlessly.
Be careful for degenerate cases that skip loops – make sure all loops run at least once.
2. Consider using disk-caching for intermediate results
If it is faster to load a data structure from disk than to recreate it, you might gain a considerable saving in time by saving it to disk the first time it is created, and loading it thereafter.
In our example, we can store the temporary grids cache the temporary grids on disk by changing the function, allowing us to tweak part of the algorithm following the comment more effectively:
def blur_once(grid, window_radius, step): fname = test_image_base + '_blur_once_' + str(window_radius) + '_' + str(step) f = file(fname) if f.exists(): return load_image(f) else: tmp_grid = Grid2D(grid.dims) for index in grid.index_iter(): for window_cell in grid.wrapped_square_iter(index, window_radius): sum += window_cell tmp_grid = sum / window_area save_image(tmp_grid, fname) return tmp_grid def blur(grid, repeat_count, window_radius): window_area = (window_radius * 2 + 1)**2 tmp_grids = [None] * repeat_count for i in range(repeat_count): tmp_grid[i] = blur_once(grid, window_radius, i) grid = tmp_grid[i] ### Part that can be tweaked follows ### new_grid = Grid2D(grid.dims) for index in new_grid.index_iter(): new_grid = tmp_grid[floor(random() * repeat_count)] return new_grid
When caching is used carefully, it can save time. But it can also waste a lot of time.
- Use it if you use one test case often. In our example, we would not gain much if we tested with a different image every time. But if we used one or perhaps a few images, you can reap the benefits.
- When tweaking different parts of the algorithm, caching might become a hindrance. Never cache results that depend on your current tweaking. Not only will the extra disk saves slow things down, but you might be confusing old results from the cache instead of the results you actually need.
- Make sure that the filename is unique for the function and all the input parameters. You might use results from the cache for the incorrect inputs without realizing it, leading to very frustrating debugging. This works hand-in-hand with tip 4.
- Remember to stabilise randomness; that is, seed the randomiser at appropriate spots. (It is, of course, something you should generally do for debugging and unit tests / regression tests).
- For a little extra work, you can avoid bugs due to a non up-to-date cache, at least in some cases. Perform the algorithm on a subset or sample of the data, and compare this with the cache. Signalling inconsistencies can help flush out bugs. You can even modify the caching to update automatically when such an inconsistency appears – but you should still always signal cache misses, so that you can immediately see when your cache is failing, perhaps because of a bug or unstable randomisation.
- Make your cache easy to clear (define a simple function for this).
- Make it easy to disable caching for a run.
- If you use an interpreted language, you can often use a “soft-cache” instead. This just means that you store intermediate results in variables, and only recalculate them when necessary. This can be a powerful method, but in my experience it requires a lot of mental maintenance, and is therefore error-prone. Because we are dealing with slow algorithms, a crash during an interactive session might be disastrous.
3. Use “yield” to return intermediate results for easy inspection…
…if your programming language allows this (C#, Ruby, Python). When prototyping an algorithm, it is often helpful to inspect (or analyse) intermediate results. Doing this can clutter up the code, especially since you want to put it in and take it out as the need arises.
Here is how we can modify our code:
RESULT = 0 BLUR_ONCE = 1 def blur(grid, repeat_count, window_radius): window_area = (window_radius * 2 + 1)**2 tmp_grids = [None] * repeat_count for i in range(repeat_count): tmp_grid[i] = blur_once(grid, window_radius, i) yield tmp_grid[i], BLUR_ONCE, i grid = tmp_grid[i] new_grid = Grid2D(grid.dims) for index in new_grid.index_iter(): new_grid = tmp_grid[floor(random() * repeat_count)] yield new_grid, RESULT, 0 def demo_blur(): grid = load_image(test_image) grid = to_grey(grid) for result, result_id, step in blur(grid, repeat_count, window_radius) if result_id == BLUR_ONCE: if step % 2 == 0: write_image(res, ‘blur_once_’ + test_base_name + ‘_’ + str(step) + ‘.png’) elif RESULT_ID == RESULT: write_image(result, ‘blur.png’)
Note that you can change what you want to inspect in one place. This is even more useful in complicated or deeply nested algorithms. Using an extravagant language feature such as yield also makes it easier to port the code to the production version – it is easier to spot and remove.
If your language does not support yielding, you can implement a inspect function and still have s astructure very similar to the above:
RESULT = 0 BLUR_ONCE = 1 def blur(grid, repeat_count, window_radius): window_area = (window_radius * 2 + 1)**2 tmp_grids = [None] * repeat_count for i in range(repeat_count): tmp_grid[i] = blur_once(grid, window_radius, i) inspect(tmp_grid[i], BLUR_ONCE, i) grid = tmp_grid[i] new_grid = Grid2D(grid.dims) for index in new_grid.index_iter(): new_grid = tmp_grid[floor(random() * repeat_count)] inspect(new_grid, RESULT, 0) return grid def inspect(result, result_id, step): if result_id == BLUR_ONCE: if step % 2 == 0: write_image(res, ‘blur_once_’ + test_base_name + ‘_’ + str(step) + ‘.png’) elif RESULT_ID == RESULT: write_image(result, ‘blur.png’) def demo_blur(): grid = load_image(test_image) grid = to_grey(grid) #The result is inspected in the inspect function; here we can ignore it. #We still put it in a variable to remind us that the function returns a result. ignore_result = blur(grid, repeat_count, window_radius)
Do not split the function (using polymorphism or overloading to prevent the if-elif logic). The point is that we want a single place where we can choose whether to output the result to disk or not.
4. When tweaking functions, make copies rather than tweaking in-place
Do this even for relatively small changes. Number functions, and associate saved results with the version of that function. One of the most frustrating things is when a change has unwanted effects, and you can’t undo it (and can’t remember how it was when working). Just as frustrating is when you have several results, and do not know which result belongs with which version of the algorithm. Being careful in this regard will prevent long searches for the right settings. It will also make caching (Tip 2) safer.
Under normal coding circumstances, you would refractor these versions to save code and promote maintainability. For prototyping code, I would not suggest this – just copy and paste the function, change the version number, and make the necessary changes in place. Refactoring introduces more room for error – enough to invalidate this tip. Of course, in situations where you want to check out different combinations of algorithms, you might need to refactor after all. As always, do not refactor without the protection of unit tests.
The above applies especially when changing library code – rather make a copy of the library code outside the library (perhaps only a single function or class), make the change there, and use the copy. This allows you to be quick and dirty; you can always refactor the change back in if the change is also useful in other application.
And if you think this approach will lead to a refactoring nightmare later on, you are right. But later on is the right time to deal with it – now you are designing an algorithm, and you need all your brain power. The point of this is to not mess up any existing (tested and working) code.
5. Implement an early-warning system
Even when you do not always know exactly what the state of a data structure should be, you often know what it should not be (for instance, empty). One mistake I make often is to process the wrong variable, for instance:
grid = do_something(grid) new_grid = so_something_else(new_grid) #Should really work on grid
To help eliminate this problem, I implement a few very quick tests to flag suspicious conditions. What tests you implement depends on your application; here are a few generic ones:
- Test for emptiness
- Test for uniformity (for example, to detect all-black images)
- Test for no-change after processing
- Test for excessive deviations after processing
- Use these tests in assert statements if your language supports them.
- When working with large data structures (such as images), it is important not to test the entire structure, but rather just a sample. Your tests should not take any time to run at all! For images, even a sample of 10 pixels can be enough in many situations.
- If you take a random sample, it is a good idea to use a separate randomiser for these tests. This way seeding the randomiser elsewhere (to stabilise for caching, for example) won’t affect the test samples negatively – it is better to test with different sample sets each time.
- Do not strive for mathematical correctness and completeness (unless this is easy to achieve) – rather use simple rules that will flag extreme deviations. For example, a blur should not significantly change the average brightness of a sample on an image. Although bounds for his can be calculated exactly, broader bounds (that are easy to estimate) will already flag many mistakes in the code.
- Avoid complicated test code; in particular, avoid complicated sampling. Again, we are not striving for the mathematical best sampling method – that is for the production version.
- Test your tests aggressively! There is nothing like a faulty test to waste time.
- Do not halt the algorithm on a failed test where only a sample is used – output the suspect data structure to disk and inspect it first to make sure it is not a false alarm. If it is a false alarm, the algorithm is not stopped needlessly, and you do not have to put in (unnecessary) code to check what is going on. Remember to increase the sample size of the sample algorithm before the next run.
- It is good to start with very crude tests, and refine them as you get closer to the final version of the prototype. Obviously, tests for the production implementation should be as exact as possible.
What protection devices do you use for prototyping slow algorithms? Let me know in the comments!