OCaml Handles

Let us begin. First, make sure that OCaml is properly installed on your machine. If it is not, the official OCaml website has detailed installation instructions.

Setting Up

With the environment ready, open a terminal and run the command dune init proj blog. This creates a directory called blog with the following structure:

ANSI

blog/
├── dune-project
├── test
│   ├── dune
│   └── blog.ml
├── lib
│   └── dune
├── bin
│   ├── dune
│   └── main.ml
└── blog.opam

First Steps with Code

With the project structure established, replace the contents of the blog/bin/main.ml file with the following code:

OCAML

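(* Unix.mkdir comes from the unix library: add unix to (libraries ...) in
   bin/dune, and run #require "unix";; in utop if Unix is reported unbound. *)
let get_dir path =
  let dir = Filename.concat (Filename.get_temp_dir_name ()) path in
  if not (Sys.file_exists dir) then
    Unix.mkdir dir 0o775; (* Requests rwxrwxr-x; the umask may remove bits *)
  dir
;;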

Interacting with OCaml

Next, start the interactive environment utop¹ from a terminal inside the blog directory. Once utop is running, load the contents of the main.ml file with the command:

OCAML

# #use "bin/main.ml";;

Run the get_dir function with the argument "greetings". The result depends on the system; in my case it was /tmp/greetings. Take note of this directory, because it is where we will store files throughout this series.

Exploring the Concept of Channels

Let us start with some perspective on what a process² is. If we observe a running computer, we see a multitude of processes being executed. The browser you are using to read this text is a process, and so are the many services running in the background. For the purpose of this text, I will use a simplification from the book Sockets and Pipes: "A process is an instance of a running program." As the book itself notes, the distinction is not always so clear; for example, a single instance of a web browser can create a separate process for each tab.

We can create as many processes as we want, but while this multiplicity exists in software, hardware has finite resources. A machine may have only one screen to display graphics, one speaker to play sound, one chip to store all files, one cable or wireless connection to transmit data linking the machine to the internet. It is surprising that a computer can work — have you ever imagined the same scenario in some daily activity? Imagine cooking simultaneously with several people in the same kitchen and sharing the same frying pan while everyone tries to fry an egg, or trying to watch a movie with several people in the same room, everyone trying to watch a different movie. The ability to allow multiple actors to coexist harmoniously, sharing the same resources without harming their individual activities, is known by the technical term multiplexing. This is the job of the operating system: coordinating the shared use of physical resources by scheduling the execution of processes.

When a program performs an action, we are usually talking about some interaction with a physical resource: reading or writing a file on disk, sending or receiving data over the network, drawing to the screen or reading from the keyboard, recording or playing sound. This is input and output, or I/O. All these actions are mediated by the operating system. Saying that a program does something attributes more responsibility to it than it actually has; the only thing a process can do is ask the operating system to act on its behalf. These requests are called system calls.

Each operating system has its own set of system calls that programs running on it use to perform I/O. On Linux, you can run man 2 syscalls at the command prompt to see the list. We will not dwell on the differences between operating systems, because the standard library abstracts most of them for us.

Writing to the World

A pertinent metaphor for explaining I/O is to think of it as a dialogue between a process and the operating system, and we can consider the handle³ as an identifier for that conversation.

The name varies by platform: Windows calls it a file handle, and Linux calls it a file descriptor, generally abbreviated as fd. OCaml's standard library does not expose handles directly; I/O operations are performed through channels. An in_channel is a conduit through which data flows from an external source into the OCaml program, and an out_channel carries data in the opposite direction.

Our first contact with a file handle will be a brief I/O operation that uses basic functions from the Stdlib module:

OCAML

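let write_greeting_file () =
  let dir = get_dir "greetings" in
  let file = dir ^ "/greeting.txt" in
  let oc = open_out file in
  output_string oc "Hello, world!";
  close_out oc; (* Flushes buffered data and releases the handle *)
  file
;;

Add this snippet to your main.ml file, load the file again in utop, run the function with write_greeting_file ();; and then look at the created file. The () parameter is what makes write_greeting_file a function; without it, the body would run only once, at the moment the file is loaded.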
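One way to look at the file without leaving OCaml is to read it back through an in_channel. The following is only a minimal sketch: read_first_line is a hypothetical helper name (it is not part of the standard library), and the temporary file is there just to keep the example self-contained.

```ocaml
(* Hypothetical helper: returns the first line of the file at [path]. *)
let read_first_line path =
  let ic = open_in path in
  let line = input_line ic in
  close_in ic;
  line

(* Self-contained demonstration: write a greeting, then read it back. *)
let () =
  let file = Filename.temp_file "greeting" ".txt" in
  let oc = open_out file in
  output_string oc "Hello, world!";
  close_out oc;
  print_endline (read_first_line file)
```

In utop, calling read_first_line with the path of the file created above (for example /tmp/greetings/greeting.txt on a system whose temporary directory is /tmp) should return the greeting.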

The only argument the open_out function receives is the path to the file we want to open. If the file does not exist, it will be created. In its implementation, open_out uses the open_out_gen function which has the type:

OCAML

val open_out_gen : open_flag list -> int -> string -> out_channel

This signature says that the first argument of open_out_gen is a list of flags of type open_flag; the flags signal what we intend to do with the file.

OCAML

type open_flag =
    Open_rdonly | Open_wronly | Open_append
  | Open_creat | Open_trunc | Open_excl
  | Open_binary | Open_text | Open_nonblock

Why do we need to specify in advance whether we are opening this file for reading or writing? Remember that we are not opening the file directly; we are asking the operating system to open the file for us. The answer to the question lies in the operating system's responsibility for mediation and multiplexing.

First, the file system may have some security restriction, for example allowing a process to read a file but not to write to it. The operating system is responsible for enforcing these restrictions, and it performs permission checks at the moment the file is opened.

Second, our process may not be the only one accessing the file. Two processes can read a file simultaneously, but two processes trying to write to the same file at the same time can result in a disaster. The operating system keeps track of all file handles, and whether they are for reading or writing, to avoid conflicts.

The second parameter of open_out_gen is an integer that represents the permissions to give the file if it is created. The value is usually written in octal, for example 0o775, the value we used in get_dir. The third parameter is the path to the file we want to open. The implementation of open_out passes the flags Open_wronly, Open_creat, Open_trunc, and Open_text, with 0o666 as the permission. This means that the file is opened for writing; if it does not exist, it is created; if it exists, it is truncated; and it is opened in text mode. The permission 0o666 requests read and write access for the owner, the group, and everyone else; on Unix-like systems the process's umask then removes some of those bits, so with the common umask of 022 the file ends up as 0o644.
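To see the flags at work, the sketch below calls open_out_gen directly and swaps Open_trunc for Open_append, so existing content is kept. The helper name append_line and the 0o644 permission are assumptions chosen for illustration, not something taken from open_out itself.

```ocaml
(* Hypothetical helper: appends [line] and a newline to the file at [path],
   creating the file (mode 0o644 before the umask is applied) if needed. *)
let append_line path line =
  let flags = [Open_wronly; Open_creat; Open_append; Open_text] in
  let oc = open_out_gen flags 0o644 path in
  output_string oc line;
  output_char oc '\n';
  close_out oc

let () =
  let file = Filename.temp_file "log" ".txt" in
  append_line file "first";
  append_line file "second";
  (* Read both lines back to confirm that nothing was truncated. *)
  let ic = open_in file in
  let l1 = input_line ic in
  let l2 = input_line ic in
  close_in ic;
  Printf.printf "%s, %s\n" l1 l2
```

Replacing Open_append with Open_trunc in the flag list would make the second call discard what the first one wrote, which is the behavior open_out gives us.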

The result of the open_out function is an out_channel, which is an abstract type that represents an output channel, which we named oc. output_string needs to know the out_channel to write to.

Under the hood, on Linux, open_out_gen relies on the open system call⁴. open returns a file descriptor: an integer that the operating system assigns to identify the file we opened. We can compare it to a person's CPF (the Brazilian national identification number), which uniquely identifies a person; a file descriptor uniquely identifies an open file within our process. Every subsequent system call that belongs to this interaction with the file must be given that number, and the out_channel keeps it so that we do not have to pass it around ourselves.

Writing a message to the console is a form of I/O, and it also involves an out_channel. In utop we can call print_string:

OCAML

print_string "hello";;

The print_string function is a specialization of the more general function called output_string, which not coincidentally is the same function we used to write to the file. print_string is defined in terms of output_string and stdout.

OCAML

val print_string : string -> unit
let print_string s = output_string stdout s

What is stdout? When the operating system starts a process, it sets up a few standard places for the process to read from and write to. The standard output stream is one of these, and stdout is a channel to it.

OCAML

let stdout = open_descriptor_out 1

Each process has its own stdout. What happens when a process writes to the standard output stream? It depends on the context. We generally think of it as "how you print messages to the terminal," because if we run a program at the command prompt, that is what will happen.
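Because stdout is an ordinary out_channel, code written against a channel does not need to know where the bytes end up. A small sketch of that idea, where greet is a hypothetical function name introduced only for this example:

```ocaml
(* Hypothetical function: writes a greeting to whichever channel it is given. *)
let greet oc name =
  output_string oc ("Hello, " ^ name ^ "!\n")

let () =
  (* Same function, two destinations: first the standard output stream... *)
  greet stdout "terminal";
  (* ...then a file opened for writing. *)
  let file = Filename.temp_file "greet" ".txt" in
  let oc = open_out file in
  greet oc "file";
  close_out oc
```

The function body is identical in both calls; only the channel passed in decides whether the text reaches the terminal or the file.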

Suppose that main.ml has the following content:

OCAML

let () =
  print_string "Greeting!"

When you run the program, you will see that the message is printed to the terminal.

Bash

$ dune exec blog
Greeting!

But remember, a process never does anything by itself; it asks the operating system to do something on its behalf. Every I/O action is mediated by the operating system, and stdout is no exception. If we start a process in a context where the output is redirected to a file, for example, then what is printed to stdout will not be written to the terminal. Here is the same print_string call with its output going to a file:

Bash

$ dune exec blog > greet.txt

Bash

$ cat greet.txt
Greeting!

Processes that run in the background, such as servers, frequently write their logs to stdout, expecting whatever launched them (a service manager, for example) to capture that output and store it in a log.

Closing a channel: once we are done writing, we use close_out, which flushes any data still sitting in the channel's buffer and tells the operating system that we no longer need the handle. In a small program that exits right after writing, forgetting to close is usually harmless, because the operating system closes every handle a process owns when the process terminates. In a larger, longer-running program it is important to close handles that are no longer needed: the operating system has to keep track of every open handle, it limits how many a process may hold at once, and a process that keeps opening files without closing them will eventually be refused new ones.
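In longer-running code, one common way to make sure a channel is closed even when writing fails is to pair open_out with Fun.protect, which is available in the standard library since OCaml 4.08. The sketch below assumes that approach; with_out_file is a hypothetical helper name, not a standard function.

```ocaml
(* Hypothetical helper: opens [path] for writing, passes the channel to [f],
   and closes the channel afterwards even if [f] raises an exception.
   close_out_noerr ignores errors raised while flushing or closing. *)
let with_out_file path f =
  let oc = open_out path in
  Fun.protect
    ~finally:(fun () -> close_out_noerr oc)
    (fun () -> f oc)

let () =
  let file = Filename.temp_file "greeting" ".txt" in
  with_out_file file (fun oc -> output_string oc "Hello, world!")
```

The caller never touches close_out, so there is no code path on which the handle can be forgotten. The trade-off in this sketch is that an error during the final flush is silently dropped.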

Conclusion and Future Perspectives

As we conclude this introduction to OCaml, we have established the essential foundations for programming in the language, from setting up the environment to the initial exploration of input and output operations. This journey has provided us with a comprehensive view of the interaction between OCaml and the operating system, as well as the fundamental I/O capabilities, setting the stage for future explorations.

This is just the beginning. In the upcoming chapters, we will dive into more advanced concepts and explore more of what OCaml has to offer, with the aim of building complex and efficient applications. Until then, I invite you to experiment, explore, and become more familiar with what we have covered so far.

Footnotes

¹ utop is an interactive environment for OCaml that surpasses the standard ocaml REPL in terms of functionality and usability. For more details, see the official utop documentation.
² A process is, in essence, an abstraction conceived by the operating system, representing the execution of a program. At any point in time, we can summarize a process by taking an inventory of the different parts of the system that it accesses or affects during the course of its execution. To understand what constitutes a process, we need to understand its machine state: what a program can read or write while it is running. Which components of the machine are important for the execution of the program? The instructions reside in memory, as does the data the program manipulates, so the memory the process can address (its address space) is an important component of the machine state. Registers are also part of the machine state, since many instructions explicitly read and write registers. Certain special registers are especially important: the program counter, which points to the next instruction to be executed, and the stack pointer and frame pointer, which are used to manage the stack for function call parameters, local variables, and return addresses. Finally, the program frequently interacts with persistent storage devices. This I/O information may include a list of the files that the process currently has open.
³ A handle is an abstract reference, intended to serve as a pointer to a specific resource. This resource can range from a block of memory to an object managed by another system, such as an operating system or a database. In short, the handle facilitates access to and manipulation of these resources without requiring the programmer to have detailed knowledge of the internal implementation of the resource in question.
⁴ Strictly speaking, this binding depends on the operating system being used.