how i write fast code

this is an excerpt (and work in progress, complete with errors and incomplete sections) of a much larger work i am currently writing (summer 2015). all of my major libraries require that the code that calls them be structured as non-blocking event loops. this explains what this means, and is from the "TECHNIQUE" portion of the new project.


loops are a powerful and fundamental tool of computing but also the most common cause of slow unresponsive code. loops are the most common way in which code blocks (and for Arduino specifically, delay() is as bad or worse). loops of all types enclose blocks of code that operate on a range or sequence of datums, often indexed with a loop variable.


roughly speaking, there are two kinds of loops, closed and open. closed loops are what most people think of when they hear the word "loop" -- block of code enclosed by any of a language's loop statement types; for(), while(), do(), etc. the statements so enclosed are executed repeatedly until the loop terminates.

closed loops are a form of subroutine; an "inline subroutine". the block of code within the loop is executed until loop termination. loop code is often a prime candidate for conversion to a full subroutine.

closed loops are easy and satisfying. they they look and act like neat little self-contained subprograms. and that can be a problem -- when enough time is spent inside them, they block.

the key to successfully using closed loops in an event strategy is to make sure that you consider all of the following:

what causes the loop to terminate? loops terminate when some condition is met. if the condition is not met, then the loop will not terminate, and it will spend all of eternity grinding through the contained statements. some loops simply count, often indexing some datums (clearing an array, etc). loops with simple numeric counts often don't block unless the count is "very large" -- and that is dependent on what else is going on in the loop. filling an array with 0's is "fast"; calculating a Bitcoin block would be "slow".

does the loop depend on time in any way? if a loop awaits for a future time through any means, such as delay(), waiting for a value to come up with millis(), then this is the very definition of a blocking loop, if loop termination depends on time in any way. this is covered in the TIME section and in the case studies. referencing time is perfectly fine; delaying or awaiting a future time is blocking.

does code within the loop wait for a particular state of any input? the thing about the unknown is, you don't know what it is. if loop termination depends on an input pin changing to some particular state, then that loop blocks, and worse, likely hangs your program in a stuck state1. any change requiring human intervention or most external processes (weather, wind, light, whatever) needs to be handled as events, not in a loop. even when this sort of event is certain to happen, the time between events is again, almost by definition, too long to be put into a loop. it will block.

how much time does it take for the loop to terminate, worst-case? the good news is that it's rarely important to know or care how long a loop takes to execute. while loop-execution time can easily be measured -- refer to my SRLoopTally library -- it's usually adequate to simply meet the guidelines above and not do anything that obviously blocks. without any concerted effort on my part, most of my modestly complex Arduino installations run 5000 loops per second, under 200 microseconds average, but with an occasional (every 10 seconds) loop that takes one or two milliseconds.

closed loops should be thought of as tools for solving tiny, brief, simple and self-contained tasks such as clearing a table, turning all LEDs on or off, rolling dice, calculating a value. things that terminate essentially immediately. closed loops must terminate deterministically -- closed loops cannot wait for any outside event to occur.

in general, to do anything repeatedly, or to await one or more events by which you want to trigger an action, or want to influence some ongoing activity, use an unwound loop.


all computers are inherently sequential machines. statements are executed sequentially top to bottom unless control flow statements change the sequence, often based upon some condition. program execution invokes the path metaphor, with some non-human extras added (metaphors have their limits).

though our focus will soon be with the nuts and bolts of the programming language, it is important that during the coding process that you do not lose track of your very human intent for the program. our job is: how do we break a given task into smaller, simpler, "atomic" chunks? and how to arrange for those to get done in the right order? when the task is extremely simple, such as the canonical BLINK example, the order (sequence) of actions is obvious enough. you can probably puzzle out from examples and the Reference how to skip (make conditional) small blocks of code. the larger problem is that the strategic ways and means, tools and metaphors for escaping simple linearity are simply not contained in programming language references, because they are higher-level human constructs.

you do not code your way out of needing to block, you think and strategize to not need to before you write a single line of code. non-blocking code doesn't await anything in a loop, or with the horrible delay() function. non-blocking code exists in unwound loops and state machines.

to react as fast as possible to events in your machine, when a resource needed isn't available, you perform other tasks while you await the resource.



when writing computer programs time refers to elapsed time, not wall-clock time. the number line is an excellent visualization. computer time has a number of characteristics that makes it very easy to work with. time is a positive integer which begins at 0 when powered on increasing by one unit up to a maximum of 4294967295 units (the maximum value an unsigned long int can hold). when maximum count is reached it overflows, the count starting over at 0 and increasing as before. 4294967295 milliseconds is approximately 49 days.

Arduino contains both millisecond and microsecond clocks; this discussion considers only the millisecond clock, accessed by the millis() library function. millis() returns the current timer as an unsigned long integer.

calculating the time X milliseconds in the future is easy:

unsigned long futureTime= millis() + X;

to treat futureTime as an event, simply compare to the current time:

if (millis() >= futureTime) { // TIME HAS COME, DO SOMETHING }

extending this into a repeating timed event is also simple:

if (millis() >= futureTime) { futureTime= millis() + X; // schedule NEXT event... // TIME HAS COME, DO SOMETHING }

you might visualize this as hopping up the time line, X units at a time.

while these are perfectly correct and usable if you will be using more than one timer the hand-management of unique variables for each "timer" quickly becomes unwieldy. a practical solution calls for a bit more overhead and complexity, and a very practical timer library will be introduced later.


a switch is a physical device that makes or breaks an electrical circuit. there are many configurations of switches but overwhelmingly the most common type for computer interface is the single-pole, normally-open, momentary operation, aka pushbutton switch. see the XXXXXXXX section for a more detailed description of what a switch is and how it is connected to the computer.

with the suggested electrical arrangement in section XXXXXXXXX the digitalRead() function samples the pin's electrical state and returns it as a numeric value, 0 if closed/pressed otherwise 1. given that in most instances we want to know when the switch was activated by some outside force (eg. your finger) a common approach is something like this:

if (digitalRead (7) == 0) { // switch was pressed // DO SOMETHING HERE }

however, this is almost always a mistake. it fails to detect the event of switch activation, instead responds to the mostly-unchanging rest or activated state of the switch. in fact this code does the opposite of detect-change because it has no memory. this code is not of sufficient complexity to capture an event.

recall from previous that detecting change requires remembering, and remembering is bound to the passing of time, therefore detecting change in a switch requires managing all three components: the current switch state, previous switch state, and elapsed time.

XXXXXXXXXXXXXXXXXXXXXXX work up past, current, time event to sample.

ref THEN WRITE switch lib.


it is standard advice that you put code into a subroutine when it is used more than once. this is correct but barely touches the surface of the powerful advantages of subroutines. subroutines are containers that help you think at a high level of abstraction about events, time and actions. subroutines literally extend the programming language. subroutines surround and contain complexity. subroutines can reduce complex logic using the technique of early exit. subroutines can contain code and data private to that code. subroutines make programs easier to think about, easier to write, easier to read and more reliable.

subroutines are sub-routines -- smaller programs that your larger program calls to do what the code within them does. you use a subroutine for what it does without being concerned for how it does it. this is the container metaphor, full-blown. it synergizes fully with how we think. finally, subroutines are how code becomes reusable and shareable -- all library code is contained in subroutines.

it may be possible to use too many subroutines, but i have never seen it. each time you need to write new code, start by creating a new subroutine. it can't possibly hurt.

while subroutines can be written to accept any number of input parameters, they can have only one (or none) return value. the reason for this is partially historic, due to the original intent of the C language as a "portable assembler"2. it is a constraint that can be used to advantage to structure your code. a subroutine invokation can be inserted into any control-flow or assignment statement and tested as if it were a built-in, a library function or a variable.

here's a simple example.

if ((number < 1) && (number > 100)) { Serial.println ("number is out of range"); }

here is the same subroutined. there is no real difference in execution but readability is increased:

if (outOfRange number, 1, 100)) { Serial.print ("number is out of range"); }
// return true if the input number is outside of the given lower and upper bounds. bool outOfRange (int number, int lower, int upper) { return (number < 1) && (number > 100); }

why bother with the extra effort? one reason is that programs expand as we write and use them. programs are evolving creative works. outOfRange() is now a general purpose function that can be used elsewhere in the program and in other programs. good subroutine names suggest their function, and so the invokation reads more clearly. if you need to change the way your code makes a partcular decision, you only have to make one edit if that code is contained in a subroutine. "if out of range, print message" while terse reads a bit more like prose. most importantly, subroutines hint at a way to think about code, a strategy for coding effectively, to push complexity into managable containers that you can make work one by one to build a larger program.

this is one of those subjects where the rules can only be expressed as a collection of do-and-don't anecdotes because subroutines are so flexible that an exhaustive study isn't possible or practical. if you put related program statements together into a subroutine you will stumble upon most of the advantages if you pay attention and remember some general rules of thumb:

when you think of a section of code as "doing something" subroutine it.

when you have code that calculates something based upon inputs and a bunch of control-flow statements subroutine it.

bounding is one of those things that comes up a lot in good code: determining if a number is within a range, or forcing an value to be within a range (within bounds). these are wordy and messy when done in-line; they are clean neat and readable when subroutined.

printing messages to humans is a messy illogical task because human languages are messy and illogical. subroutine it.

if you need to do something if and only if a large number of conditions are met by all means subroutine it.

i have an Arduino project that uses a purchased add-on board that stores sound files, and plays those sounds upon command. in my program, commands are issued via a timed schedule. the code for initiating playback is complex, depending on a number of settings and possible errors of the board. to play sound file N the code must:

all of the above conditions must be true to play a sound; and if not, each of them must generate an error message. put inline it becomes a messy tangle of multiple if..else if..else if.. conditionals. but put into a subroutine, it becomes easy to read and maintain:

void playSound_loop () { if (! timer (PLAYTIMER)) return; // return if not time yet if (! enabled) { // test MUTED Serial.println ("NOT ENABLED"); // not error per se but report return; } if (loadTrackNumber (n) == 0) { // try to select the track... Serial.print ("FILE DOESNT EXIST"); return; } playSound(); // initiate sound playing }

most important of all is the simple fact that the entire task of handling sound file playing is done with a single subroutine call with no additional intervention. the above code is of course a very small, abstracted and simplified version of the real code. but in the actual project the task of sound playing is handled exactly the same. loop() invokes it simply with:

void loop () { playSound_loop(); }

some of the advantages to this won't be evident until you read through the rest of the sections.


loop()is not just a place to stick your code. a carefully structured loop() is the key to good strategy.

loop() is an unwound loop. the Arduino reference tells you that code placed in loop() executes from top to bottom and exits; after exiting loop() is immediately called again. this process repeats endlessly. this is a loop, unwound like tape off a spool.

it may help to think of loop() as a series of tasks to be executed as quickly as possible, to get from the first statement to the last in the shortest period of time. this is the overriding goal of this programming strategy.

code placed in loop() -- and all code called by your code, all the way to the bottom of every library function called -- must not block. another way to state this: all code called in loop() must complete/return as quickly as possible without waiting. how this is done is covered in a later subsection.

it is important to be aware of the path the CPU follows through your code as your program runs, and this will be examined as we proceed. as the demon executes your program, it follows the path of instructions, setting variables and outputs, reading variables, branching at conditionals, etc. each iteration of loop(), entry to exit, follows some path of statements.

let's examine the main loop function from the FANCY BLINKER case study in Section XXXXXXXX.

void loop () { blink_loop(); // blink the LED down_switch_loop(); // watch for switches up_switch_loop(); // watch for switches report_loop(); // periodically prints blink rate value SRL.tally(); // loop() statistics }

loop() here simply invokes each task subroutine in order. loop() doesn't "know" what each does (though carefully chosen names and written comments help inform the author) and that is the point -- loop() simply invokes each of the tasks. the task subroutines do the work.

recall that our overarching strategy is to react to events quickly enough that the response is seamlessly real time without lag or delay. tfor this to happen each task subroutine must not block. the next section illustrates how that is accomplished, but for the moment assume that each task subroutine above takes an insignificant amount of time.

but how much time does each task take? and what is "quick enough"? the first question is easily answered empirically: the SRL.tally() function measures loop() execution performance and prints out a summary at the rate specified in SRL.begin(). below is a capture from an Arduino Uno running the FANCY BLINKER program:

#loop: 36674/s, 27uS avg, 264uS max

let's consider this terse report in detail. this program executes 36,676 iterations of loop() per second, for an average of 27 microseconds each. in the measurement interval (10 seconds in this example) the longest single loop() execution time was 264 microseconds. therefore that 27 microsecond average was calculated over some 366,760 iterations in the 10 second period making the average quite representative, with the 264 microsecond worst-case not much of an outlier.

but why the variation at all? the reason is that the statements executed within each task subroutine depends on conditions internal to that subroutine. loop() is quite intentionally isolated from the internals of the task subroutines. loop() simply invokes them one by one, endlessly, without knowledge of what goes on within them.

if the average loop execution time of 27 microseconds is equally distributed between each of the five tasks, it means that each is taking just over 5 microseconds to execute. it is generally accepted that response times under 10 milliseconds are subjectively undetectable and feel instantaneous. this code clearly exceeds that requirement by a huge margin.

loop() runs quickly because each task-loop runs quickly. if any task-loop took a significant amount of time to execute this strategy would fail. this strategy depends entirely on the way that task loops are structured and coded. the method is deceivingly quite simple and explained in the next section, though the explanation may not fully make sense until you've gone through the rest of the subsections that follow.

as author, the task-loop subroutine container allows you to think only about what that task must do and not be distracted by the concerns other tasks. as long as the code within the task loop doesn't block it can be completely isolated, without side effects to or from other task loops.

the short answer, expounded upon below, is that the code in task loops do not block. if you recall from the METAPHOR section, blocking is standing in one place waiting for something you need, preventing anything else from happening. because CPU code execution is sequential, code that blocks prevents all other code from executing -- the machine is effectively halted. when a task loop needs some resource in order to continue, it terminates, remembering where it left off, so that when loop() again invokes it later, it can see if the resource is available. once again, now, the past, and time are required. how this is accomplished is explained in the next section.


sequence refers to the order of execution of program statements. statements are executed in the order in which they occur. execution order is affected by control-flow statements, but also by subroutine call and return.

in this strategy program control flow is categorized as one of two types:

trivial sequences are simply inline default order, or with conditionally-skipped blocks eg. if(). trivial sequences are always forward, and never branch backward or loop.

a complex sequence is any sequence that executes in any order other than top to bottom, eg. branch backwards, loop, or more than "a few" conditional blocks.

any closed loop whose block is a trivial sequence, can remain a trivial sequence if it is "unwound" and contained in a subroutine, called from loop() or a task loop. all other loops are complex sequences, no matter how few lines of code they are.


Arduino's loop() is a trivial sequence. trivial refers exclusively to the order of execution, and has nothing to do with what the task loops might do. the subroutines in loop() are executed in order from top to bottom without variation and loop() repeats endlessly.

all of the task loops in the FANCY BLINKER case study are trivial sequences; for the most part, once the time condition is met the code simply executes in-line, with the switch tasks having a single if() statement that inherently does not block.


complex sequences are where program complexity (and bugs) often explodes when constructed with ad hoc logic, such as multiply nested if..else statements. nested if..else sequences quickly get out of hand; as little as two-deep and the logic becomes impenetrable, inflexible and unreadable. nested if..else logic is very difficult to modify.

the solution to constructing any sequence more complex than the most trivial is the state machine.


state machine programming may seem alien to you at first, but they are incredibly simple. do not let mere unfamiliarity get in the way of mastering them, because once you do you will have a way to think about tasks of any complexity, and you will wonder, as i do, why they are not universally used. state machines are capable of handling any sequence possible to be executed by a computer. there is much theoretical treatment of state machines on the internet and in the literature and i swear, all of it is intended to obfuscate and confuse, because few things of such power are easier than a state machine.

a state machine is an abstraction that can be constructed in any programming language using ordinary control-flow statements, eg. if(). the C language however has a control-flow statement specifically to build state machines:

the statement is a metaphor for a multi-position rotary electrical switch. instead of two positions, on vs. off, a rotary switch can be turned to one of N positions. they are very common on appliances to select amongst modes of operation, etc. to construct a state machine from the statement we need to add a state variable, an integer that represents the position of the switch's shaft that selects one-of-N available positions.

as our first example let's construct a state machine to blink an LED. the blink_loop() task subroutine in the FANCY BLINKER case study will be simpler than what we construct here, but the state machine method has advantages that will become apparent later.

the order of decisions and actions in blink_loop() are trivial; once the scheduled time is reached the rest of the statements are executed sequentially. the state machine makes the sequence explicit and controllable, even though in this instance it is not necessary.

let's examine execution order in the FANCY BLINKER blink_loop() task. begin by mentally labelling the statements in blink_loop() 1, 2, 3, and 4. when it is not time yet, millis() < T, the execution order is 1, 1, 1, 1, 1, 1, ... the demon does not get past the first statement. the moment that the blink time is reached, execution order is then 1 (true), 2, 3, 4 and the LED changes state. upon next iteration the sequence will again be 1, 1, 1, 1 ... since step 2 re-scheduled the next blink time.

in a state machine the order of execution, is determined solely by an integer called the state variable. this is of critical and profound importance -- we can now calculate statement execution order. the contents of the state variable directly controls the position of the rotary switch.

here is the blink_loop() code converted to a rather awkward state machine:

void blink_loop() { static unsigned long T; static byte led; static int state; switch (state) { case 0: if (millis() >= T) state= 1; break; case 1: T= millis() + blinkPeriod(); state= 2; break; case 2: led= ! led; state= 3; break; case 3: digitalWrite (LEDpin, led); state= 0; break; } }

let's briefly review the topology of the statement, which begins with switch (state) and encompasses the block of code between { braces }. each case has a matching break and case..break encloses a block of code. defines a rotary switch, with each of the case..break blocks a switch position. state is the ingeniously named state variable.

blink_loop() is invoked repeatedly from loop() as as before. the first statement evaluates the state variable. the state variable answers the question "where did i leave off?" or "where was i?" as you please. this is loosely analogous to an awareness of task -- a intentional rememberance and choice of where to continue. though in this example the sequence is trivial this ability to control execution order shows the power of a state machine.

given that initially state is 0 the switch statement passes control to the first case, 0. statements are executed until the break is reached -- and then the entire switch statement terminates -- control is immediately passed to the statement following the closing brace.

the statement in case 0 only checks if sufficient time has passed. if not the conditional is false, state does not change, the switch terminates and the subroutine exits. this repeats until blink time arrives -- only then is state= 1 executed. therefore the order of statement execution is the same as before; 1, 1, 1, 1, 1, 1, ...

once blink time has arrived the conditional is true and state= 1 is executed, the break is encountered, the switch terminates and the subroutine exits. it is not until the next iteration that the switch statement passes control to case 1, which then schedules the next blink time, sets state to 2, exists the switch statement and the subroutine returns. execution order is then 1, 1, 1, ... 1, 2.

this process continues with for the other states and the statements perform the LED-blink operations as before. note that the last state, case 3, sets state back to 0, to await the next blink time.

now this initial example state machine has some silly features due to the overly literal way that i converted it from inline code for illustration purposes here. let's fix that now and construct a nicely succinct state machine that we will next expand into something more interesting.

the only thing that takes significant time is waiting for the blink ON/OFF time. the rest of the statements are straightfoward and all of them can be combined into one state. the first state (case 0) awaits the time event as before; the other state schedules the next time event, changes the LED to the opposite state and changes the state back to 0.

void blink_loop() { static unsigned long T; static byte led; static int state; switch (state) { case 0: if (millis() >= T) state= 1; break; case 1: T= millis() + blinkPeriod(); led= ! led; digitalWrite (LEDpin, led); state= 0; break; } }

once again, just as in the treatment of events, the present, the past, and time are represented. the state machine and the state variable make up a rememberance of what has already been done (previously executed case/state) and what must be done now (current case/state). the passage of time, as a series of steps, is made explicit.


now that we have introduced all of the basic techniques it is time to more thoroughly examine the critical feature of this strategy, the avoidance of blocking. here a truism is useful:

code is fast, the world is slow.

it should be clear by now that awaiting any resource or event by looping in place blocks execution and must be avoided at all times; and that the solution to this problem always involves the past, the present, and time.

the particular structure of loop() with it's trivial sequence of task-loops, and task-loops as state machines that do things based on a state variable brings with it many advantages not yet outlined here. one of those is testability.

debugging Arduino programs is very primitive, much as it was in the 1960's. usually all we have at our disposal to reveal bad logic and mistakes is embedded print statements, LED blinking, etc. effective debugging requires careful thought and disciplined strategy.

state machines help this process incredibly. with them you can code a state specifically to print out debug info at a human rate, or when a particular condition occurs. thinking in terms of state machines

XXXXXXXXXXXXXXXX render task loops as tifs and paint control flow over it in color -- one color for each path will allow one image to suffice for multiple states.


once your design is scrawled out, program layout can begin.

the Wiring default layout is as good as any and better than most. my programs tend to use the same overall layout regardless of function or complexity. keep in mind that the source file is a script -- a set of instructions to the machine and a description of what you the author intend. here's a very rough overview:

i follow this as a general guideline and violate it whenever appropriate. the idea here and throughout the practice of writing code is to use repetition and familiarity to our advantage. when i need to find a variable, a subroutine, or a declaration, i know approximately where to look without searching. given the large amount of trivia a routine structure helps. this is one way in which code-writing fundamentally differs from prose -- creativity is expressed not at the level of word-craft but in larger functional organization.

strive to include background and supporting information in comments. i tend towards logorrhea in comments, and often have more comment than code. comments are my intent; code is the implementation. code is not self-documenting -- code does not contain meaning.

where possible and not excessively excessive, i try to include copious hardware documentation in the comments for pin declarations, eg. part and pin number, wire colors, that sort of thing. often when debugging you just need a hint to identify a thing to test without referring to schematics etc.


do a "lets examine..." for setup() and loop() and task_loop()


it is a common fallacy that computer memory contains numbers. it does not, exactly; memory cells contain symbols. it is true that if you represent ordinary integers in base-two, that the resulting pattern of digits, 0 or 1, happen to "fit" directly into a memory location (metaphorically, at least), and this is not coincidence. but this does not mean that a cell full of bits is a number; that is simply wrong.

the CPU has particular built-in instructions to manipulate memory contents in all sorts of interesting arithmetical and decidedly non-arithmetical ways. these manipulations are most deep fundamental tricks for doing fun, useful things with a computers. if you imagine each bit to be a physical block or square (as turing did) then with machine instructions you can perform "tricks" with cell contents ("bits") as you can with simple physical manipulation of blocks.


you are limited here only by your ability to visualize "tricks" for sliding, flipping, rotating, copying, transposing. the CASE STUDIES illustrate some of the things that you can do with what is called bit fiddling.