Experience of running hybrid CHERI userspace on seL4
In this article, I share the experience of getting to a point where we could run userspace applications in seL4, which had CHERI capabilities enabled(Hybrid mode). CHERI  is an architectural extension that adds HW capabilities to modern processors. Morello  is the ARM implementation of CHERI. SeL4  is a formally verified microkernel widely deployed in security-critical applications. Since seL4 is formally verified, it might not benefit from CHERI capabilities, but the user application running on seL4 will still benefit.
The model CheriBSD  provides for implementing an operating system on CHERI is not well matched to seL4, as CheriBSD is a monolithic kernel and seL4 is a microkernel kernel. In particular, CheriBSD’s monolithic architecture does not require passing capabilities between processes, while seL4’s microkernel architecture makes passing capabilities a hard requirement.
We designed a three-stage effort to port seL4 to Morello.
- Run a hybrid userspace application. We do this because it requires minimal change to the userspace, but it still requires kernel changes.
- Run a purecap userspace application which will require porting of the userspace libraries.
- Explore the mapping of CHERI caps to seL4 caps. We decided that this was not the right thing to do.
To date, we have completed the first stage. We successfully use CHERI capabilities within the seL4 userspace and identify challenges that must be met to complete Stages 2 and 3.
This work lies at the intersection of capability-based microkernels and capability in hardware. In this section, we describe seL4 – a capability-based microkernel, and CHERI – a hardware platform with capability-enabled instructions.
SeL4 is a formally verified capability-based microkernel. In a microkernel, most functionality provided by a typical monolithic kernel (such as file system, networking, etc) is provided by user-space servers. In a capability-based system, access to system resources is granted by unforgeable tokens called capabilities.
CHERI (Capability Hardware Enabled RISC Instruction) is a hardware instruction set extension that provides hardware capabilities. CHERI capabilities are 128 bits in size. They contain a virtual address, the size of the accessible region, and permission with which that region can be accessed. CHERI protects its pointers by enforcing three properties -– provenance, integrity, and monotonicity.
- Provenance means that capabilities can only be derived from other valid capabilities and cannot be forged from an address.
- Integrity means that capabilities stored in memory cannot be modified. This is achieved by using tagged memory.
- Monotonicity requires that, when a capability is stored in a register, it is only possible to reduce its bounds and permissions, e.g., a read-only capability cannot be turned into a read-write one.
| Sealed Capabilities: CHERI capabilities have a sealed bit in the 128 bits. When this bit is set, the capability can be neither sealed nor changed.
EXPERIENCE AND LESSONS
Our port of seL4 to Morello proceeded in three stages. Next, we describe the three stages and the lesson learned in each stage.
Hybrid userspace does not need changes in user applications but still requires changes in the kernel. Regardless of the compilation mode of the application, the operating system must be changed to ensure that the new registers are saved. Also, memory must be configured so that it does not cause failure in capability loads and stores. More details are presented in the appendix. CheriBSD is an adequate guide for this.
To take advantage of the spatial memory safety provided by CHERI’s end goal is to compile the application in Purecap mode. In purecap mode, every pointer is a CHERI capability so, the application gets the benefits of high-level languages like Rust/Python with the performance of C/C++. Unlike the hybrid mode, CheriBSD did not provide an adequate reference for the seL4 port. CheriBSD allows the kernel to pass capabilities to the user space and vice versa via a system call. However, since passing pointers via IPC is not essential to the operation of a monolithic kernel, CheriBSD does not support it. It is also not a good idea for two user processes to exchange CHERI capabilities. Since CHERI capabilities are just virtual addresses and not tied to a particular address space, this would mean that two colluding pro- cesses could grow the range of virtual addresses they have access to. IPC is necessary for a microkernel, as most of the system call like features are implemented in user processes. See Figure 1
|Fig 1: Purecap: Pointers are CHERI Caps||Fig 2: New Endpoint Permissions|
So, we plan to extend the IPC mechanism in seL4. We will add new permissions to the endpoint capability in seL4 to allow/prevent sharing of CHERI capabilities via IPC. See Figure 2 and Figure 3.
Sealed capabilities are CHERI capabilities that cannot be dereferenced. A CHERI capability (say, a pointer to an array) can be sealed with another CHERI capability (called a sealing capability). Once sealed, the capability to the array cannot be dereferenced unless it is unsealed by the original sealing capability. CheriBSD splits its sealing cap range into only two ranges since there is only a user and a kernel. But in an uKernel system, we have the kernel, system services, and user applications, so we need to split the sealed caps into at least three ranges. Figure 4 When the CPU boots up, a root capability is put in a well-known register. This root capability has access to the entire virtual address range with all permissions. CheriBSD keeps this root capability in the kernel and gives a limited capability to the userspace. But we give the root CHERI cap to root-task to mimic what seL4 does with seL4 caps. In seL4, the root-task gets the capabilities to any memory not used by the kernel. The kernel does not make any caps to the memory it uses during boot. Therefore, the root task has all the available seL4 capabilities in the system. In that spirit, we hand over the root CHERI capability to the root-task.
|Fig 3: Purecap: Restricted Endpoints||Fig 4:Fine Grained Sealing Cap ranges|
CHERI Vs. seL4 capabilities
|In simple terms,||A fat pointer to a Virtual Address range||A token to do an operation on a system resource|
|Propagation||Trivial: Register Load/Store||Trivial: Mint via kernel|
|Restriction||Trivial: Register operation||Trivial: Retype via kernel|
|Revocation||Hard: Scan all of the memory||Trivial: Scan all CDT|
|Accessibility||Hard: Scan all of the memory||Moderate: Scan all cap lists|
|Protection Domain||All Caps in accessible memory||All caps in the CSpace|
Table 1: Comparing seL4 and CHERI caps
Capabilities in seL4 are tokens for calling methods on system
objects. They are implemented in the kernel using the system call
Revocation is an essential feature of the seL4 capability model,
and the kernel maintains a
Capailability Derivation Tree
(CDT) to assist in revocation.
CHERI capabilities do not maintain a derivation tree.
For example, two capabilities for the same array are independent, and
there is no single palace to revoke it short of scanning the entire memory range.
We summarize these differences in the table above. Due to these fundamental differences, the CHERI capabilities are not a drop-in replacement for the seL4 capabilities.
Having said that, some parts of the seL4 capability system could be augmented with CHERI capabilities. For instance:
- Sealed Capabilities instead of the C-slot identifiers.
- Using Sealed Capabilities for control-flow transfer from kernel to userspace.
However, it is unclear whether:
- There will be any performance improvements
- Would that make seL4 a Single Address Space OS(SASOS)? Especially if the user/kernel boundary gets fuzzy.
This work is a small step in getting C/C++ user applications on seL4 to use CHERI features and get additional security guarantees. We also looked at using CHERI capabilities as a replacement for seL4 capabilities and concluded that they are fundamentally different.
Since we have done QEMU development work, which is not cycle-accurate, we did not collect performance numbers. Instead, we calculate the number of extra instructions required for each operation and the extra memory usage due to larger registers.
So far, we have spent approximately four months of full-time development effort between two developers. We spent most of the initial time understanding the details of seL4 boot-up, context-switch, and process startup code. Similarly, time was spent understanding the intricacies of the CHERI capabilities and setting up the toolchain to compile seL4 with CHERI. So far, we have modified about 500 lines of code. For context, the kernel is about 10K LoC, and the userspace is 50K LoC. About 30% of the diff is in the ARM assembly, and the rest in C.
Impact on Formal Verification
We added our changes to the
aarch64 platform-specific code of
seL4, invalidating the formal verification of seL4.
We have tried to keep the changes in the kernel to the most essential.
But do not have a good sense of how much effort is needed to reverify the kernel.
Trying it out
A readme is available here
Continue the conversation on the
#seL4 channel on the CHERI Slack.
A list of kernel and userspace changes for Hybrid Userspace
- Enable Morello instructions, without which every new Morello instruction will fail.
- Enable the loading and storage of CHERI capabilities in memory.
- Update the context-switch code to save and restore new registers.
- Update system calls to handle new registers.
- Setup Program Counter Capability(
PCC) and Default Data Capability(
DDC) when starting a new process.
- Pass part of the
MAX_CAPto the user process.
This work was done during my internship at ARM Research in the summer of 2022. During this internship, I was mentored by Chris Haster and Eric VH. I also received constant guidance and help from Jessica Clark, John Baldwin, and Robert Watson from the CHERI community.
The video of the final presentation is available.
 The kernel(bulk of changes): diff
 The boot loader (early boot LC/SC register setting): diff
 seL4_libs bringing over some utils like strfcap: diff
 How to setup a CHERI process(basically setup pcc/ddc): link
 Internship Presentation Video
 CHERI Page
 seL4 Microkernel
 CHERI BSD