
Procedural is fine, separate IO is good

Question

On Stack Exchange, laggyfrog asked about structuring a game loop in a manner where multiple functions manipulate fields of a shared struct. To condense the example, using Rust syntax:

struct State {
  x: i32,
  y: String,
}

fn task1(x: &mut i32) { ... }
fn task2(x: i32, y: &mut String) { ... }
fn output(x: i32, y: &str) { ... }

fn main() {
  let mut state = State { ... };
  while state.y.len() < 4 {
    if state.x > 0 {
      task1(&mut state.x);
    } else {
      task2(state.x, &mut state.y);
    }
    output(state.x, &state.y);
  }
}
full question by laggyfrog

What are the pros and cons of structuring an application as a pipeline of functions mutating a shared struct?

I could not find anything regarding this question except some lecture notes about game design and a book which describes something similar but not quite the same.

General Description

The approach is as follows.

There are two kinds of state in an application: Core State and Derived State. The Core State is any state that cannot be readily derived from other state. The Derived State is any state that is not core. As a trivial example, if you have N bananas and M apples, you can easily calculate how many fruits S you have with S = N + M. Thus N and M are core state, and S is derived.

The main application components are Data, Systems and the Main Loop.

Data is represented as a single struct that contains all of the core state of the application, publicly accessible. All fields within this struct should ideally be primitive types, PODs, generic collections (a hash set, a list, etc.) or utility types from the standard library (e.g. Option and Result in Rust).

A System is a function that is called in the Main Loop. It must never return a value and must instead mutate the parts of Data that are passed into it, and it must never call another system directly. The second requirement means that systems can only communicate by modifying Data (or shared private state when grouped in a class).

Systems can be grouped into classes with private state, which should ideally be Derived. Instances of such a class are created before the Main Loop; their internal state must be initialized either during construction or in the Main Loop.

The Main Loop is simply the outermost loop inside the main function that keeps the program running. This loop must be located in the main function and not hidden behind a function call.

Example

A simple example that illustrates what the code written in this fashion may look like is given below. It is written in Rust but is not specific to it.

// Farm Data
#[derive(Default)]
struct Data {
    pub keep_going: bool,
    pub money: u32,
    pub time: Time,
    pub sheep: Vec<Sheep>,
    pub cows: Vec<Cow>,
    pub ducks: Vec<Duck>,
    pub food: u32,
    pub last_error: Option<FarmError>,
    pub log_messages: Vec<Log>,
}

fn feed_ducks(ducks: &mut [Duck], food: &mut u32, last_error: &mut Option<FarmError>) {
    for duck in ducks {
        if *food > 2 {
            *food -= 2;
            duck.hunger -= 1;
        } else {
            *last_error = Some(FarmError::new("not enough food"));
        }
    }
}

// If you want to record logs, you must pass the array as a parameter to the place
// where it is needed.
// Using globals to push logs directly from any part of the code is discouraged
// as all application state should reside in Data.
fn sell_cows(
    cows: &mut Vec<Cow>,
    money: &mut u32,
    last_error: &mut Option<FarmError>,
    logs: &mut Vec<Log>,
) {
    // Sell the cows, get money.
}

fn buy_food(food: &mut u32, money: &mut u32) {
    // Buy food to feed ducks (using money).
}

fn shear_sheep(sheep: &mut [Sheep], last_error: &mut Option<FarmError>) {
    // Shear the sheep.
}

fn rest(logs: &mut Vec<Log>) {
    // Rest
}

fn print_logs(logs: &Vec<Log>) {
    for log in logs {
        println!("{log}");
    }
}

fn clear_logs(logs: &mut Vec<Log>) {
    logs.clear();
}

fn check_bankrupt(money: u32, keep_going: &mut bool) {
    if money == 0 {
        *keep_going = false;
    }
}

fn main() {
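    // `Data::default()` alone sets `keep_going` to false, so the loop would never run.
    let mut data = Data {
        keep_going: true,
        ..Data::default()
    };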
    while data.keep_going {
        if data.time == Time::Morning {
            feed_ducks(&mut data.ducks, &mut data.food, &mut data.last_error);
        } else if data.time != Time::Night {
            sell_cows(
                &mut data.cows,
                &mut data.money,
                &mut data.last_error,
                &mut data.log_messages,
            );
            shear_sheep(&mut data.sheep, &mut data.last_error);
            buy_food(&mut data.food, &mut data.money);
        } else {
            rest(&mut data.log_messages);
        }
        check_bankrupt(data.money, &mut data.keep_going);
        print_logs(&data.log_messages);
        clear_logs(&mut data.log_messages);
    }
}

My questions are the following:

  1. Can this way of structuring an application survive in the real world and be used for almost any application imaginable? What are its pros and cons (including the most obvious ones) for a game versus, say, a microservice or other application?
  2. Can any of the problems be removed if some of the constraints were lifted?
  3. Does this approach scale to larger (game or non-game) projects? Can it make them simpler than using the current best practices?

Software Engineering Stack Exchange question by laggyfrog, reproduced here under the CC BY-SA 4.0 license.

Answer

This answer argues that procedural approaches are perfectly fine, shows how the state struct can actually be put to use, discusses the consequences of externalizing I/O operations, and points to related concepts.

Procedural is fine

This is a traditional procedural design.

Procedural programming is entirely fine, and can help structure the program logic into parts that can be tested independently.

A general concern with procedural designs is mutable global state. When every procedure can change any data, it is difficult to track data flows, which may create or obscure bugs. Sometimes this data flow obfuscation happens through literal global variables, though Rust discourages this. But a similar anti-pattern easily happens in OOP languages where there is a class with all the data, and all the procedures are methods on this god object.

The presented design avoids the worst of this, because the individual steps/procedures do not take the entire state as a parameter, but only receive access to the individual parts they need. This is good.

But this also means that the struct is currently useless. While the individual parts of the state are grouped into the struct, the struct itself is never used. This is just a fancy way to specify lots of local variables.

If we rewrite my reduced example to use local variables, we can note that the steps themselves do not have to be changed. Instead, the struct definition is essentially inlined:

fn task1(x: &mut i32) { ... }
fn task2(x: i32, y: &mut String) { ... }
fn output(x: i32, y: &str) { ... }

fn main() {
  let mut x: i32 = ...;
  let mut y: String = ...;
  while y.len() < 4 {
    if x > 0 {
      task1(&mut x);
    } else {
      task2(x, &mut y);
    }
    output(x, &y);
  }
}

Actually using the state struct

However, it can be very useful to have a single struct if we actually use it. For example:

  • we can more easily dump the entire state for debugging
  • we can create an explicit function for each step/iteration
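
The first point can be sketched with the reduced State struct from the question. This is only an illustration: the dump helper is a hypothetical name, and it assumes every field type implements Debug.

```rust
// Deriving `Debug` on the state struct makes the whole state printable at once.
#[derive(Debug)]
struct State {
    x: i32,
    y: String,
}

/// Hypothetical helper: renders the complete state as text,
/// e.g. for a log file or for the message of a failing test.
fn dump(state: &State) -> String {
    format!("{state:?}")
}
```

With loose local variables, each one would have to be printed separately and the dump would have to be updated whenever a variable is added.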

Let's look at this second idea in more detail. We can create a function that represents one loop iteration or one simulation tick, essentially extracting the entire loop body.

There are different ways to specify the signature of such a function. It might mutate the data in-place, consume the old state and return the next state, or create a copy of the state.

mutating the data in-place:

fn step(state: &mut State) -> bool { ... }

Maybe the step function returns a boolean to indicate whether the loop should continue, maybe that is checked externally.

Potential usage:

while step(&mut state) {
  output(&state);
}
output(&state);

This approach improves testability: we can now test one simulation step in isolation, without having to run the full loop.
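
As a rough sketch of what such an isolated test could look like, the following fills in the reduced example. The bodies of task1 and task2 are invented here purely so that there is something to assert on, and the loop condition is checked at the end of step rather than before it.

```rust
#[derive(Debug, PartialEq)]
struct State {
    x: i32,
    y: String,
}

// Hypothetical task bodies; the original example leaves them elided.
fn task1(x: &mut i32) {
    *x -= 1;
}

fn task2(x: i32, y: &mut String) {
    y.push_str(&x.to_string());
}

/// One loop iteration. Returns `true` while the loop should keep running.
fn step(state: &mut State) -> bool {
    if state.x > 0 {
        task1(&mut state.x);
    } else {
        task2(state.x, &mut state.y);
    }
    state.y.len() < 4
}
```

A test can then build a State by hand, call step once, and compare the result against the expected State, without any loop or output involved.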

consuming the state:

fn step(state: State) -> Option<State> { ... }

Here, the step function will completely consume the old state, and can reuse its resources to create the next state. In this example, the function returns an option to indicate when the loop should stop.

Potential usage:

while let Some(next) = step(state) {
  state = next;
  output(&state);
}

This is generally equivalent to the previous approach, but sometimes consuming data is conceptually simpler than updating it.

Notably, we now have a (pure) function, which might be easier to reason about. That is not quite as important in Rust due to its borrow checker, but it can still be useful. In particular, creating a new state is more likely to force you to think about all aspects of the state, making it more difficult to accidentally forget to update something.
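
A minimal sketch of the consuming variant, again using the reduced State. The update rules are made up for illustration; the point is that the next state is built with a struct literal, which has to mention every field.

```rust
struct State {
    x: i32,
    y: String,
}

/// Consumes the old state and builds the next one, or returns `None` to stop.
fn step(state: State) -> Option<State> {
    if state.y.len() >= 4 {
        return None;
    }
    // The struct literal must name every field, so adding a field to `State`
    // makes this function fail to compile until the new field is handled.
    let next = if state.x > 0 {
        State { x: state.x - 1, y: state.y }
    } else {
        let mut y = state.y; // reuse the old string's allocation
        y.push('!');
        State { x: state.x, y }
    };
    Some(next)
}
```

Because the old String is moved into the new state, no copy of its contents is made.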

create a copy of the state:

fn step(state: &State) -> Option<State> { ... }

In this variant, the old state still exists when the new state is created. This has some drawbacks, like being unable to reuse resources from the old state – everything has to be copied.

However, it has the consequence that states are immutable and can be kept around. We can now compare the old and new states, which might be useful for debugging, or for doing time-series analyses. In a simulation, keeping track of the history might be useful for detecting some convergence criterion.

This can also give rise to features like undo by simply resetting the current state to an earlier one, or may allow exploring alternative histories.

While each state is conceptually an independent copy, the immutability of states makes it possible to share data between states. For example, expensive sub-states that are unlikely to be modified could be put behind an Rc or Arc smart pointer, enabling a shallow copy.
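
The following sketch combines the history and the sharing idea. The tick and terrain fields and the run function are assumptions made up for this example; only Rc itself comes from the standard library.

```rust
use std::rc::Rc;

struct State {
    tick: u32,
    // Hypothetical large, rarely modified sub-state shared between snapshots.
    terrain: Rc<Vec<u8>>,
}

/// Creates the next state without touching the old one.
fn step(state: &State) -> Option<State> {
    if state.tick >= 3 {
        return None;
    }
    Some(State {
        tick: state.tick + 1,
        // Cloning the `Rc` bumps a reference count; the bytes are not copied.
        terrain: Rc::clone(&state.terrain),
    })
}

/// Runs the simulation and keeps every state that was produced.
fn run(initial: State) -> Vec<State> {
    let mut history = vec![initial];
    loop {
        let current = history.last().expect("history starts non-empty");
        match step(current) {
            Some(next) => history.push(next),
            None => return history,
        }
    }
}
```

Undo then amounts to truncating the history vector, and comparing two entries shows what a step changed.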

Keeping I/O separate

There is a related aspect about I/O that is worth highlighting: all of these design variants happen to separate I/O from mutation.

The individual steps/tasks register information in the state, which is then printed out at the end of the loop.

This has a couple of interesting consequences:

  • We have decoupled business logic from I/O. This is generally considered to be a good thing.

  • We can test business logic in isolation, without I/O. The tests can also make assertions about the output that would be produced just by looking at the state. This grants independence from having to handle formatting details in the tests.

  • We can change the output approach without having to modify the steps. For example, this could be ported from terminal output to GUI output.
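
To illustrate these points, here is a small sketch loosely based on the farm example. The Data struct is cut down to two fields, logs are plain strings, and the amount of food bought is an arbitrary assumption.

```rust
#[derive(Default)]
struct Data {
    food: u32,
    log_messages: Vec<String>,
}

/// Business logic: records what happened instead of printing it.
/// The fixed amount of 10 is a made-up value for this sketch.
fn buy_food(food: &mut u32, logs: &mut Vec<String>) {
    *food += 10;
    logs.push(format!("bought food, now have {food}"));
}

/// The only place that performs I/O. Replacing this function changes
/// the output target without touching `buy_food`.
fn print_logs(logs: &[String]) {
    for log in logs {
        println!("{log}");
    }
}
```

A test can call buy_food and assert on data.log_messages directly, so it never has to capture standard output.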

By registering data structures representing future output in the state, we are very flexible in what we do with the output – with the caveat that this makes user interaction much more difficult. Such I/O would have to be carried across multiple states. What would normally be ordinary control flow now must be handled as an explicit state machine:

  • normal control flow

    async fn step() {
      let response = prompt("pls give number").await;
      do_something_with(response);
    }
    
  • explicit state machines

    enum PromptState {
      None,
      Prompt(String),
      Response(String),
    }
    
    fn step(state: PromptState) -> PromptState {
      match state {
        // if None, prompt the user
        PromptState::None => PromptState::Prompt("pls give number".into()),
    
        // if the prompt is still active, do nothing
        PromptState::Prompt(p) => PromptState::Prompt(p),
    
        // if we have a response, use it and reset state
        PromptState::Response(response) => {
          do_something_with(response);
          PromptState::None
        }
      }
    }
    

In some cases turning I/O into data is really worth it for the added flexibility and debuggability, but for more interactive scenarios this will be painful.
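
The state machine still needs something outside of step to perform the actual I/O between iterations. One possible shape for that outer part is sketched below; service_io is a hypothetical name, and the input is modelled as an iterator of lines so that the sketch does not depend on a terminal.

```rust
#[derive(Debug, PartialEq)]
enum PromptState {
    None,
    Prompt(String),
    Response(String),
}

/// Hypothetical I/O shell: shows a pending prompt and, if an input line is
/// available, turns it into a response. All other states pass through.
fn service_io(
    state: PromptState,
    input: &mut impl Iterator<Item = String>,
) -> PromptState {
    match state {
        PromptState::Prompt(msg) => {
            println!("{msg}");
            match input.next() {
                Some(line) => PromptState::Response(line),
                // No answer yet: keep the prompt active for the next iteration.
                None => PromptState::Prompt(msg),
            }
        }
        other => other,
    }
}
```

The main loop would alternate between step and service_io, which shows how a single question to the user gets spread over several iterations.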

This is a data-oriented approach to handling such problems, which is common in more functional languages like Haskell or Rust. However, the strategy pattern would be a related OOP approach. This too lets us make the business logic independent from I/O details, by defining an interface for I/O operations and then passing in an object. This tends to integrate better in the normal control flow, but is more difficult to test because interactions need a mock implementation.

Draft of the OOP approach:

trait Prompter {
  fn prompt(&self, msg: &str) -> String;
}

fn step(prompter: &dyn Prompter) { ... }
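
To make the testing cost concrete, here is one way the draft could be filled in together with a mock. The body of step, its return type, and the CannedPrompter type are all assumptions added for illustration.

```rust
trait Prompter {
    fn prompt(&self, msg: &str) -> String;
}

/// Business logic written with ordinary control flow.
/// Parsing and doubling are hypothetical stand-ins for real work.
fn step(prompter: &dyn Prompter) -> Option<i32> {
    let response = prompter.prompt("pls give number");
    response.trim().parse::<i32>().ok().map(|n| n * 2)
}

/// Test double that always returns the same canned answer.
struct CannedPrompter(&'static str);

impl Prompter for CannedPrompter {
    fn prompt(&self, _msg: &str) -> String {
        self.0.to_string()
    }
}
```

A production implementation of Prompter would read from the terminal or a GUI, while tests pass in a CannedPrompter.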

Related concepts

Loops that mutate some application state are common in games, embedded systems, and simulations, with differing degrees of real-time requirements.

In computer graphics, double buffering is somewhat related. Here, state (such as a frame buffer) is shared between multiple processes, for example a renderer and a screen. But if one mutable state is shared, it may temporarily be in an invalid/corrupted state. Instead, the renderer writes to a background buffer, and only provides the buffer to the screen once it is complete. The resources of the previous buffer can then be reused to render the next frame.
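
A single-threaded sketch of the buffer swap, under the assumption that a frame is just a byte vector. The DoubleBuffer type and its methods are invented for this example and do not correspond to any particular graphics API.

```rust
/// Two frame buffers: `front` is what the screen reads, `back` is being drawn.
struct DoubleBuffer {
    front: Vec<u8>,
    back: Vec<u8>,
}

impl DoubleBuffer {
    fn new(size: usize) -> Self {
        Self { front: vec![0; size], back: vec![0; size] }
    }

    /// Hypothetical renderer: fills the back buffer with a single colour.
    fn render(&mut self, colour: u8) {
        self.back.fill(colour);
    }

    /// Publishes the finished frame. The old front buffer becomes the new
    /// back buffer, so its allocation is reused for the next frame.
    fn present(&mut self) {
        std::mem::swap(&mut self.front, &mut self.back);
    }
}
```

Until present is called, readers of front never observe a half-drawn frame.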

Similar ideas (switching to a new state once it is complete and consistent) also appear in journaling file systems and lock-free programming.

The idea of keeping an immutable history can be applied not only to the actual states, but equally to the events that produce them. Based on those input events, the state can be reconstructed for any point in time. This is typically done in enterprise systems to meet auditing requirements, and is known under the term "event sourcing".
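
The core of this idea fits in a few lines. The account domain, the Event variants, and the replay function below are hypothetical and only serve to show state being rebuilt from a log of events.

```rust
/// Hypothetical input events; in practice the log is append-only.
#[derive(Clone, Copy)]
enum Event {
    Deposited(u32),
    Withdrew(u32),
}

#[derive(Debug, Default, PartialEq)]
struct Account {
    balance: u32,
}

/// Applies one event to a state, producing the next state.
fn apply(state: Account, event: &Event) -> Account {
    match *event {
        Event::Deposited(n) => Account { balance: state.balance + n },
        Event::Withdrew(n) => Account { balance: state.balance.saturating_sub(n) },
    }
}

/// Rebuilds the state as it was after the first `upto` events.
fn replay(events: &[Event], upto: usize) -> Account {
    events.iter().take(upto).fold(Account::default(), apply)
}
```

Calling replay with a smaller upto value yields the state at an earlier point in time, which is what makes the event log useful for auditing.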

Externalizing I/O operations from the business logic is very common. As already discussed, there are object-oriented and data-oriented approaches to this. On the OOP-ish side, there are lots of architecture-level patterns such as ports-and-adapters, hexagonal architecture, Onion architecture, and others. These require the business logic to define interfaces that are then implemented by plugins. On the more functional side, architectures like "functional core, imperative shell" exist, where future output operations are represented by data structures, essentially creating a script of actions that will be executed later.
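
A very small sketch of the "script of actions" idea follows. The Action enum and the decide and execute functions are assumptions for illustration, loosely modelled on the bankruptcy check from the question.

```rust
/// Data describing output that should happen later.
#[derive(Debug, PartialEq)]
enum Action {
    Print(String),
    Exit,
}

/// Functional core: decides what to do, performs no I/O.
fn decide(money: u32) -> Vec<Action> {
    if money == 0 {
        vec![Action::Print("bankrupt".to_string()), Action::Exit]
    } else {
        vec![Action::Print(format!("money: {money}"))]
    }
}

/// Imperative shell: executes the script. Returns `false` once `Exit` is seen.
fn execute(actions: &[Action]) -> bool {
    for action in actions {
        match action {
            Action::Print(text) => println!("{text}"),
            Action::Exit => return false,
        }
    }
    true
}
```

Tests can compare the output of decide against an expected list of actions, and only execute needs to know how output is actually produced.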

So some of the ideas in the question's original design appear in other contexts as well. However, this kind of loop with a shared state is not universal, because a lot of software does not have this kind of loop structure. It is often more convenient (and thus less bug-prone) to have temporary state in local variables, using normal control flow, or to use events. In particular, non-game user interfaces commonly use more event-driven approaches (possibly with callbacks, possibly with reactive programming techniques).

licensing CC-BY-SA-4.0