Sleepers: how to inject syscalls into Miou?

This tutorial shows how to inject a new syscall to Miou and extend the API of it with blocking operations. For the example, we're going to implement the sleepers. Unix.sleepf is a blocking operation. The fundamental problem with Miou is that it performs operations in the background (scheduling). So using a blocking operation with Miou prevents it from managing other tasks concurrently (manage tasks entered with Miou.call_cc) or in parallel (wait for parallel process tasks introduced by Miou.call).

As stated in the documentation, and this is a fundamental rule: > you should never give Miou blocking tasks (such as Unix.sleepf)

That said, how do you manage blocking tasks? Miou offers an API that allows you to extend its API with such tasks. The idea is to create a Miou.syscall which will then allow us to create a suspension point. In other words, we will deliberately suspend the execution of our function. Finally, by injecting another function, we'll inform Miou when it's time to unblock our suspension point - and let the function continue.

What we want to do?

So let's get down to business. The aim of this tutorial is to enable you to write this code:

open Miou

let program () =
  Miou.run @@ fun () ->
  let a = Miou.call_cc (fun () -> sleep 1.) in
  let b = Miou.call_cc (fun () -> sleep 2.) in
  Miou.await_all [ a; b ]
  |> List.iter @@ function Ok () -> () | Error exn -> raise exn

let () =
  let t0 = Unix.gettimeofday () in
  program ();
  let t1 = Unix.gettimeofday () in
  assert (t1 -. t0 < 3.)

This code explains simple behaviour: our tasks a and b should run concurrently. In other words, in the end, we should consume strictly less than 3 seconds (about 2 seconds) to complete this little program.

You can have fun replacing sleep with Unix.sleepf and you'll see that we're back to a simple sequential execution where we need 3 seconds to finish the program. And that's normal, Miou doesn't know that Unix.sleepf is blocking, so it will execute the two tasks one after the other without scheduling them correctly.

So we've got our test, which will validate what we're expecting.

Syscalls.

Syscalls are indicators of a "suspension point" that needs to be "continued"/signaled. Signals can only be made when a system event occurs: in our case, when we have waited n seconds. In this way, the user is able to:

  1. create such an indicator (Miou.syscall)
  2. create a suspension point from this indicator (Miou.suspend)
  3. organise their indicator management according to a unique ID (Miou.uid)
  4. transfer the signal once the system has warned us to do so (Miou.signal)

The first function allows us to create our sleep "syscall". The second will allow us to specify the point at which we would like to obtain the result of our blocking operation and the third function will allow us to keep (and store) this syscall so that we can find it again later.

let sleepers = Hashtbl.create 0x100

let sleep until =
  let syscall = Miou.syscall () in
  Hashtbl.add sleepers (Miou.uid syscall) (syscall, until);
  Miou.suspend syscall

As you can see, the implementation of a 'syscall' is relatively simple, but it is always associated with the implementation or extension of another function: the Miou.select function.

Miou is quite stupid, trying to carry out all the tasks we give it in the hope that they will solve our promises. And it does this as long as it has at least one unresolved promise. In our case, the promise we've just created will never be resolved by any task. To clarify Miou's behaviour in this situation, you can run this code:

let dummy _ =
  let select ~block:_ _cancelled_syscalls = [] in
  { Miou.select; Miou.interrupt= ignore }

let () = Miou.(run ~events:dummy @@ fun () -> sleep 1.; ())

This code will never end simply because we didn't give Miou anything (a Miou.signal value) to un-suspend (and resume) our suspension point.

But as you can see, I've specified an Miou.select function here which always returns an empty list. In truth, if Miou has no more tasks to do and there are still syscalls, it will try one last thing: execute our Miou.select function. This can return a new signal that could resolve our syscall. And it's here that we'll be able to continue/resume our sleeper.

Contrary to what we have just said, this Miou.select function (and only this one) can block! And, in reality, this is not a problem as all the tasks have been executed. We can therefore be in a busy waiting state for the next event to unblock our execution flow.

In our case, it's a case of taking the smallest sleeper, waiting and then returning a signal that resume that same sleeper. We also need to update the other sleepers because we're going to consume time.

let select ~block:_ _cancelled_syscalls =
  let min =
    Hashtbl.fold
      (fun uid (syscall, until) -> function
        | Some (_uid', _syscall', until') when until < until' ->
            Some (uid, syscall, until)
        | Some _ as acc -> acc
        | None -> Some (uid, syscall, until))
      sleepers None
  in
  match min with
  | None -> []
  | Some (_, _, until) ->
      let until = Float.min 0.100 until in
      Unix.sleepf until;
      Hashtbl.filter_map_inplace
        (fun _ (syscall, until') ->
          Some (syscall, Float.max 0. (until' -. until)))
        sleepers;
      let cs = ref [] in
      Hashtbl.fold
        (fun uid (syscall, until) acc ->
          if until <= 0. then begin
            cs := Miou.signal syscall :: !cs;
            Hashtbl.remove sleepers uid
          end else acc)
        sleepers [];
      !cs

let events _ = { select; interrupt= ignore }

Usage.

Now that we have our Miou.select function and our syscall sleep, we can use them:

let prgm () =
  Miou.run ~events @@ fun () ->
  let a = Miou.call_cc (fun () -> sleep 1.) in
  let b = Miou.call_cc (fun () -> sleep 2.) in
  ignore (Miou.await a);
  ignore (Miou.await b)

let () =
  let t0 = Unix.gettimeofday () in
  prgm ();
  let t1 = Unix.gettimeofday () in
  assert (t1 -. t0 < 3.)

Note that our Miou.select function has been transferred to Miou.run (via the Miou.events value)! Without it, our code wouldn't work. And that's it! Our program did not fail to run, which means that we used less than 3 seconds (about 2).

$ ocamlfind opt -linkpkg -package unix,miou main.ml
$ ./a.out
$ echo $?
0

And now we have proof that our 2 processes ran "at the same time". We say that they ran concurrently. Sleepers are a good example for understanding the syscalls mechanism with Miou, but of course you can extend this yourself with read, write and select as functions notifying us of system events.

The reason behind this API.

The fundamental objective remains the ability to specify/inject a system other than Unix. The ambition is, of course, to integrate Miou into unikernels. In this respect, we consider that lwt introduced the design that is probably the least costly for the user and the most maintainable for us. As such, we have simply reproduced what lwt already offered: miou and miou.unix.

There are many ways of abstracting and injecting implementations. Functors and value passing are examples. Once again, experience and usage may not be the state of the art of what can be done with OCaml, but they are valid arguments for the choice we have made.

It should also be noted that Miou has been designed for the development of system and network applications. Although the scheduler itself (miou) does not interact with the system, its task management policy is intrinsic to the way in which we can interact with the system today. Here again, experience and usage come first. There are many ways of interacting with the system (and some may be more interesting than others, depending on our specific application). But select() is still the simplest and most widely accepted of all systems (like Windows!).

Given the combination of unikernels and what the systems can offer, we have once again decided to take into account what has already been done. It's certainly not fancy, but it has the merit of having worked for quite some time.

Events & domains.

As you can imagine, this little introduction is not complete if we take into account Miou.call. Miou can launch tasks in parallel and these tasks can perform I/O. In our example, we can replace Miou.call_cc with Miou.call. The problems that will arise from such a change will be, to say the least, difficult to explain in full. However, they focus on a point that is fairly simple to see: we are not protecting our sleepers from changes that several domains can make at the same time.

Overall, this often requires synchronisation mechanisms between domains in order to manage parallel access to our sleepers. However, if you have already done some parallel programming, these mechanisms can:

Based on these findings, we propose a fairly simple design: a syscall is always managed by the domain that launched it. It is local to the domain. It is somewhat equivalent to Miou.call_cc, suspension only (Miou.suspend) operates concurrently with other tasks and each domain manages its own syscalls.

Local events at domains and local storage.

So, if we consider syscalls that can suspend the flow of execution that are always local to a domain, we can consider that each domain should have its own sleepers and that access to them should only be made by a single domain (the one with which they are associated).

From this idea, you can use a local storage. OCaml proposes that you can associate values with domains and retrieve these values according to the domain. This is done using the Domain.DLS module.

let get, set =
  let make () = Hashtbl.create 0x100 in
  let key = Stdlib.Domain.DLS.new_key make in
  let get () = Stdlib.Domain.DLS.get key in
  let set value = Stdlib.Domain.DLS.set key value in
  get, set

let sleep until =
  let syscall = Miou.make (Fun.const ()) in
  let sleepers = get () in
  Hashtbl.add sleepers (Miou.uid syscall) (syscall, until);
  set sleepers;
  Miou.suspend syscall

We then just need to call get () & set () in all the places where we use our hash-table to make sure we're using the one that's local to the domain. And voilà! As you can see, using Domain Local Storage simplifies our code enormously and saves us from having to implement and manage synchronisation mechanisms between domains.

Cancellation & interruption.

There is, however, one final point that we have deliberately omitted from this little tutorial: interruption. It was explained above that our Miou.select function can block and that it's no big deal - in fact, it is. We need to rephrase this assumption: Miou.select can block, but there must be a way for Miou to unblock the function - and by extension, the domain.

It's fair to ask why we would need such a mechanism. The answer is cancellation. It is possible to Miou.cancel a task with Miou.

let prgm () =
  Miou.run ~events @@ fun () ->
  let a = Miou.call (fun () -> sleep 10.) in
  sleep 1.; Miou.cancel a;
  match Miou.await a with
  | Error Miou.Cancelled -> ()
  | _ -> failwith "test"

let () =
  let t0 = Unix.gettimeofday () in
  prgm () ;
  let t1 = Unix.gettimeofday ()  in
  assert (t1 -. t0 < 10.)

In this example, a domain is asked to sleep for 10 seconds. But, at the same time, we want to Miou.cancel this task. At the moment, the domain will wait 10 seconds and then be "cancelled". This is where the interrupt mechanism comes in: Miou will interrupt the domain to tell it that something in its tasks has changed (cancellation). The domain will then recalculate these tasks and re-observe their states before finally realising that the task it was doing has just been cancelled.

The problem is that this interrupt must also interrupt our Unix.sleepf on which our domain is based. It's here, in our Miou.select function, that we're going to replace Unix.sleepf (which can't be interrupted) with Unix.select!

In fact, Unix.select can both wait (like Unix.sleepf) and interrupt itself if an event occurs on one of its file-descriptors. We are going to use the latter mechanism to implement an interrupt mechanism. To do this, we need to create a pipe (Unix.pipe). The interrupt function will be called by Miou whenever domains need to be interrupted (as in the case of cancellation). This interruption consists of writing to one side of the pipe while Unix.select observes the other side.

We also need to handle only syscalls that are pending. A task cancellation clean-up syscalls into the said task and we also need to clean-up the syscalls that have been deleted by Miou in our sleepers. Miou passes to our Miou.select function the points that have been cancelled. From this list, we can 'clean up' our table to keep only the active sleepers.

Finally, we will have to manage 2 cases, the one where we receive an interrupt and the one where we have just consumed our quanta. In the first case, we'll need to consume the byte sent to us by Miou, while the second case is similar to what we did before.

let consume_interrupt ic =
  let buf = Bytes.create 0x1000 in
  let _ = Unix.read ic buf 0 (Bytes.length buf) in
  ()

let update sleepers n =
  Hashtbl.filter_map_inplace
    (fun _ (syscall, until) ->
      let until' = Float.max 0. (until -. n) in
      Some (syscall, until'))
    sleepers

let minimums sleepers =
  let cs = ref [] in
  Hashtbl.filter_map_inplace
    (fun _ (syscall, until) ->
      if until <= 0. then (
        cs := Miou.continue_with syscall (Fun.const ()) :: !cs;
        None)
      else Some (syscall, until))
    sleepers;
  !cs

let select interrupt ~block cancelled =
  let sleepers = get () in
  Hashtbl.filter_map_inplace
    (fun _ (syscall, until) ->
      if List.exists (( = ) (Miou.uid syscall)) cancelled then None
      else Some (syscall, until))
    sleepers;
  let min =
    Hashtbl.fold
      (fun uid (syscall, until) -> function
        | Some (_uid', _syscall', until') when until < until' ->
            Some (uid, syscall, until)
        | Some _ as acc -> acc
        | None -> Some (uid, syscall, until))
      sleepers None
  in
  let ts =
    Option.map (fun (_, _, until) -> until) min |> function
    | Some ts -> Float.min ts 0.100
    | None -> if block then -1.0 else 0.100
  in
  let t0 = Unix.gettimeofday () in
  match Unix.select [ interrupt ] [] [] ts with
  | [], _, _ ->
      let t1 = Unix.gettimeofday () in
      update sleepers (t1 -. t0);
      set sleepers;
      minimums sleepers
  | _ ->
      let t1 = Unix.gettimeofday () in
      update sleepers (t1 -. t0);
      consume_interrupt interrupt;
      set sleepers;
      minimums sleepers

let events _ =
  let ic, oc = Unix.pipe ~cloexec:true () in
  let rec interrupt () =
    if Unix.write oc (Bytes.make 1 '\000') 0 1 = 0 then interrupt ()
  in
  { Miou.select= select ic; interrupt }
The block argument.

There's a situation where Miou has no more tasks to do. In other words, your program only has suspension points waiting for events from the system. In this very specific case, it is possible to wait indefinitely for any event from the system (including an interrupt, of course). In this case, the block value is true. You can therefore give a negative value to select() (which means that it waits indefinitely).

Result.

And there you have it, if you run our example code with cancellation, you can see the interrupt mechanism and the fact that one of our syscalls has been cleaned out. And our program finishes after 1 second.

This code shows the basic architecture of a real scheduler. We centralise everything around the Miou.select. Quite a few issues have not been mentioned here (such as signal management, system interruption, or how to properly close our pipes). Above all, this means that this code is just an example! It does, however, give a general idea of how Miou_unix (Miou's Unix extension) works and how you can extend Miou for more specific system events.

Conclusion.

Miou offers a fairly straightforward API for its extension to system event management. There are many ways of abstracting and subsequently injecting what is fundamentally necessary in the development of system and network applications.

In this respect, we have chosen to re-use what has already suited us for a number of years.

Once again, our approach is part of the development of unikernels too. And it is in this context (of systems and network applications and unikernels) that we have developed the Miou API.

It should also be noted that this tutorial is aimed primarily at those who are curious (in their understanding of Miou_unix) and at those who want to extend Miou to other possible interactions with the system.