Experience of running hybrid CHERI userspace on seL4

01 Jan 2023

In this article, I share the experience of getting to a point where we could run userspace applications in seL4, which had CHERI capabilities enabled(Hybrid mode). CHERI [6] is an architectural extension that adds HW capabilities to modern processors. Morello [7] is the ARM implementation of CHERI. SeL4 [8] is a formally verified microkernel widely deployed in security-critical applications. Since seL4 is formally verified, it might not benefit from CHERI capabilities, but the user application running on seL4 will still benefit.

The model CheriBSD [9] provides for implementing an operating system on CHERI is not well matched to seL4, as CheriBSD is a monolithic kernel and seL4 is a microkernel kernel. In particular, CheriBSD’s monolithic architecture does not require passing capabilities between processes, while seL4’s microkernel architecture makes passing capabilities a hard requirement.

We designed a three-stage effort to port seL4 to Morello.

Run a hybrid userspace application. We do this because it requires minimal change to the userspace, but it still requires kernel changes.
Run a purecap userspace application which will require porting of the userspace libraries.
Explore the mapping of CHERI caps to seL4 caps. We decided that this was not the right thing to do.

To date, we have completed the first stage. We successfully use CHERI capabilities within the seL4 userspace and identify challenges that must be met to complete Stages 2 and 3.

BACKGROUND

This work lies at the intersection of capability-based microkernels and capability in hardware. In this section, we describe seL4 – a capability-based microkernel, and CHERI – a hardware platform with capability-enabled instructions.

seL4

SeL4 is a formally verified capability-based microkernel. In a microkernel, most functionality provided by a typical monolithic kernel (such as file system, networking, etc) is provided by user-space servers. In a capability-based system, access to system resources is granted by unforgeable tokens called capabilities.

CHERI

CHERI (Capability Hardware Enabled RISC Instruction) is a hardware instruction set extension that provides hardware capabilities. CHERI capabilities are 128 bits in size. They contain a virtual address, the size of the accessible region, and permission with which that region can be accessed. CHERI protects its pointers by enforcing three properties -– provenance, integrity, and monotonicity.

Provenance means that capabilities can only be derived from other valid capabilities and cannot be forged from an address.
Integrity means that capabilities stored in memory cannot be modified. This is achieved by using tagged memory.
Monotonicity requires that, when a capability is stored in a register, it is only possible to reduce its bounds and permissions, e.g., a read-only capability cannot be turned into a read-write one.

| Sealed Capabilities: CHERI capabilities have a sealed bit in the 128 bits. When this bit is set, the capability can be neither sealed nor changed.

EXPERIENCE AND LESSONS

Our port of seL4 to Morello proceeded in three stages. Next, we describe the three stages and the lesson learned in each stage.

Hybrid Userspace

Hybrid userspace does not need changes in user applications but still requires changes in the kernel. Regardless of the compilation mode of the application, the operating system must be changed to ensure that the new registers are saved. Also, memory must be configured so that it does not cause failure in capability loads and stores. More details are presented in the appendix. CheriBSD is an adequate guide for this.

Purecap Userspace

To take advantage of the spatial memory safety provided by CHERI’s end goal is to compile the application in Purecap mode. In purecap mode, every pointer is a CHERI capability so, the application gets the benefits of high-level languages like Rust/Python with the performance of C/C++. Unlike the hybrid mode, CheriBSD did not provide an adequate reference for the seL4 port. CheriBSD allows the kernel to pass capabilities to the user space and vice versa via a system call. However, since passing pointers via IPC is not essential to the operation of a monolithic kernel, CheriBSD does not support it. It is also not a good idea for two user processes to exchange CHERI capabilities. Since CHERI capabilities are just virtual addresses and not tied to a particular address space, this would mean that two colluding pro- cesses could grow the range of virtual addresses they have access to. IPC is necessary for a microkernel, as most of the system call like features are implemented in user processes. See Figure 1

Fig 1: Purecap: Pointers are CHERI Caps	Fig 2: New Endpoint Permissions

So, we plan to extend the IPC mechanism in seL4. We will add new permissions to the endpoint capability in seL4 to allow/prevent sharing of CHERI capabilities via IPC. See Figure 2 and Figure 3.

Sealed capabilities are CHERI capabilities that cannot be dereferenced. A CHERI capability (say, a pointer to an array) can be sealed with another CHERI capability (called a sealing capability). Once sealed, the capability to the array cannot be dereferenced unless it is unsealed by the original sealing capability. CheriBSD splits its sealing cap range into only two ranges since there is only a user and a kernel. But in an uKernel system, we have the kernel, system services, and user applications, so we need to split the sealed caps into at least three ranges. Figure 4 When the CPU boots up, a root capability is put in a well-known register. This root capability has access to the entire virtual address range with all permissions. CheriBSD keeps this root capability in the kernel and gives a limited capability to the userspace. But we give the root CHERI cap to root-task to mimic what seL4 does with seL4 caps. In seL4, the root-task gets the capabilities to any memory not used by the kernel. The kernel does not make any caps to the memory it uses during boot. Therefore, the root task has all the available seL4 capabilities in the system. In that spirit, we hand over the root CHERI capability to the root-task.

Fig 3: Purecap: Restricted Endpoints	Fig 4:Fine Grained Sealing Cap ranges

CHERI Vs. seL4 capabilities

	CHERI	seL4
In simple terms,	A fat pointer to a Virtual Address range	A token to do an operation on a system resource
Implementation Layer	Hardware	Kernel
Propagation	Trivial: Register Load/Store	Trivial: Mint via kernel
Restriction	Trivial: Register operation	Trivial: Retype via kernel
Revocation	Hard: Scan all of the memory	Trivial: Scan all CDT
Accessibility	Hard: Scan all of the memory	Moderate: Scan all cap lists
Protection Domain	All Caps in accessible memory	All caps in the CSpace

Table 1: Comparing seL4 and CHERI caps

Capabilities in seL4 are tokens for calling methods on system objects. They are implemented in the kernel using the system call mechanism. Revocation is an essential feature of the seL4 capability model, and the kernel maintains a Capailability Derivation Tree (CDT) to assist in revocation. CHERI capabilities do not maintain a derivation tree. For example, two capabilities for the same array are independent, and there is no single palace to revoke it short of scanning the entire memory range.

We summarize these differences in the table above. Due to these fundamental differences, the CHERI capabilities are not a drop-in replacement for the seL4 capabilities.

Having said that, some parts of the seL4 capability system could be augmented with CHERI capabilities. For instance:

Sealed Capabilities instead of the C-slot identifiers.
Using Sealed Capabilities for control-flow transfer from kernel to userspace.

However, it is unclear whether:

There will be any performance improvements
Would that make seL4 a Single Address Space OS(SASOS)? Especially if the user/kernel boundary gets fuzzy.

Closing remarks

This work is a small step in getting C/C++ user applications on seL4 to use CHERI features and get additional security guarantees. We also looked at using CHERI capabilities as a replacement for seL4 capabilities and concluded that they are fundamentally different.

Performance

Since we have done QEMU development work, which is not cycle-accurate, we did not collect performance numbers. Instead, we calculate the number of extra instructions required for each operation and the extra memory usage due to larger registers.

Developer Effort

So far, we have spent approximately four months of full-time development effort between two developers. We spent most of the initial time understanding the details of seL4 boot-up, context-switch, and process startup code. Similarly, time was spent understanding the intricacies of the CHERI capabilities and setting up the toolchain to compile seL4 with CHERI. So far, we have modified about 500 lines of code. For context, the kernel is about 10K LoC, and the userspace is 50K LoC. About 30% of the diff is in the ARM assembly, and the rest in C.

Impact on Formal Verification

We added our changes to the aarch64 platform-specific code of seL4, invalidating the formal verification of seL4. We have tried to keep the changes in the kernel to the most essential. But do not have a good sense of how much effort is needed to reverify the kernel.

Trying it out

A readme is available here

Continue the conversation on the #seL4 channel on the CHERI Slack.

APPENDIX

A list of kernel and userspace changes for Hybrid Userspace

Enable Morello instructions, without which every new Morello instruction will fail.
Enable the loading and storage of CHERI capabilities in memory.
Update the context-switch code to save and restore new registers.
Update system calls to handle new registers.
Setup Program Counter Capability(PCC) and Default Data Capability(DDC) when starting a new process.
Pass part of the MAX_CAP to the user process.

Acknowledgments

This work was done during my internship at ARM Research in the summer of 2022. During this internship, I was mentored by Chris Haster and Eric VH. I also received constant guidance and help from Jessica Clark, John Baldwin, and Robert Watson from the CHERI community.

The video[5] of the final presentation is available.

REFERENCES

[1] The kernel(bulk of changes): diff

[2] The boot loader (early boot LC/SC register setting): diff

[3] seL4_libs bringing over some utils like strfcap: diff

[4] How to setup a CHERI process(basically setup pcc/ddc): link

[5] Internship Presentation Video

[6] CHERI Page

[7] Morello

[8] seL4 Microkernel

[9] CHERI BSD