Securing TEE Apps: A Developer’s Guide

Original authors: prateek, roshan, siddhartha & linguine (Marlin), krane (Asula)

Compiled by: Shew, GodRealmX

Trusted Execution Environments (TEEs) have become increasingly popular since Apple announced Private Cloud Compute and NVIDIA added confidential computing to its GPUs. Their confidentiality guarantees help protect user data (which may include private keys), while their isolation guarantees ensure that programs deployed in them cannot be tampered with, whether by humans, other programs, or the operating system. It is therefore not surprising that the Crypto x AI field has been using TEEs to build products.

Like any new technology, TEEs are going through a period of optimistic experimentation. This article aims to give developers and general readers a basic conceptual guide to what TEEs are, their security model, common vulnerabilities, and best practices for using them safely. (Note: to keep the text easy to follow, we have deliberately replaced some TEE jargon with simpler equivalents.)

What is a TEE?

A TEE is an isolated environment in a processor or data center where programs can run without interference from the rest of the system. Maintaining that isolation requires a set of design measures, chiefly strict access control: the rest of the system cannot access the programs and data inside the TEE. TEEs are now ubiquitous in phones, servers, PCs, and cloud environments, so they are easy to access and affordable.

That may sound vague and abstract. In practice, different hardware and cloud vendors implement TEEs in different ways, but the fundamental goal is always to prevent other programs from interfering with the TEE.

Most readers log in to a device with biometric information, for example unlocking a phone with a fingerprint. But how can we be sure that malicious apps, websites, or a jailbroken operating system cannot access and steal that biometric data? Beyond encrypting the data, the circuitry of a TEE-equipped device simply does not allow any outside program to access the memory and processor regions occupied by the sensitive data.

Hardware wallets are another example of a TEE-style deployment. A hardware wallet connects to a computer and communicates with it through a sandboxed channel, but the computer can never directly access the mnemonic seed stored inside it. In both cases, users trust the device manufacturer to design the chip correctly and provide timely firmware updates so that the confidential data inside the TEE can never be exported or viewed.

Security Model

Unfortunately, there are many different TEE implementations (Intel SGX, Intel TDX, AMD SEV, AWS Nitro Enclaves, ARM TrustZone), and each requires its own security modeling and analysis. The rest of this article focuses on Intel SGX, Intel TDX, and AWS Nitro, because these systems have the most users and the most complete development tooling, and they are also the TEEs most commonly used in Web3.

In general, the workflow for an application deployed in a TEE is as follows:

  1. A "developer" writes some code, which may or may not be open source.
  2. The developer then packages the code into an Enclave Image File (EIF) that can run in the TEE.
  3. The EIF is hosted on a server with TEE hardware. In some cases, developers can host the EIF on a TEE-equipped personal computer and serve it directly.
  4. Users can interact with the application through a predefined interface.

Obviously, there are three potential risks here:

  • Developers: What does the code packaged into the EIF actually do? It may not match the business logic the project advertises, and it may steal users' private data.
  • Server: Is the server actually running the expected EIF, and is it really being executed inside a TEE? The server might also run other programs inside the TEE.
  • Vendor: Is the TEE hardware itself designed securely? Is there a backdoor that could leak all the data inside the TEE to the vendor?

Fortunately, TEEs now have mechanisms to eliminate the risks above, namely reproducible builds and remote attestation.

So what is a reproducible build? Modern software development often pulls in a large number of dependencies, such as external tools, libraries, or frameworks, and these dependencies can themselves be compromised. Package managers such as npm therefore record a hash of each dependency as its unique identifier; if a downloaded dependency does not match the recorded hash, it is treated as having been modified.
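
As a simplified illustration of this idea (not npm's actual implementation; the function below is hypothetical), the check boils down to hashing the downloaded artifact and comparing it against the value recorded in a lockfile:

```python
import hashlib

def verify_dependency(tarball_path: str, expected_sha256: str) -> bool:
    """Compare a downloaded dependency against the hash recorded in a lockfile."""
    h = hashlib.sha256()
    with open(tarball_path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    # Any mismatch means the artifact differs from what was originally recorded.
    return h.hexdigest() == expected_sha256
```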

Reproducible builds can be thought of as a set of standards: the goal is that building given code on any device, following a predefined process, always yields the same hash. In practice, the identifier does not have to be a hash, and we refer to it here as the code measurement.

Nix is a common tool for reproducible builds. When the source code of a program is public, anyone can inspect it to make sure the developer has not inserted anything unexpected, build it with Nix, and check whether the resulting code measurement matches that of the artifact the project deployed to production. But how do we learn the code measurement of the program actually running in the TEE? That is where a concept called remote attestation comes in.
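
A rough sketch of that comparison is below, assuming a hypothetical Nix flake output called enclave-image that builds a single image file, and treating the SHA-256 of that image as its measurement; real platforms compute measurements with their own tooling (e.g. MRENCLAVE for SGX, PCR values for AWS Nitro):

```python
import hashlib
import subprocess

def local_measurement() -> str:
    """Reproduce the enclave image locally and compute a simplified 'measurement'."""
    # Hypothetical flake output name; the real build target depends on the project.
    out = subprocess.run(
        ["nix", "build", ".#enclave-image", "--print-out-paths"],
        capture_output=True, text=True, check=True,
    )
    image_path = out.stdout.strip()  # assumes the derivation produces a single image file
    with open(image_path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def matches_deployment(attested_measurement: str) -> bool:
    # attested_measurement is taken from the remote attestation of the
    # enclave the project actually deployed to production.
    return local_measurement() == attested_measurement
```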

Remote attestation is a signed message from the TEE platform (a trusted party) containing, among other things, the program's code measurement and the TEE platform version. It lets an external observer confirm that a given program is executing inside a genuine TEE of a specific platform version, in a location no one else can access.
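
In simplified terms, checking an attestation comes down to three comparisons. The sketch below uses illustrative field names and takes the vendor-specific certificate-chain check as a caller-supplied function rather than relying on any particular SDK:

```python
from typing import Callable

def verify_attestation(
    att: dict,                         # parsed attestation document (fields are illustrative)
    expected_measurement: str,         # measurement of the reproducibly built image
    min_platform_version: int,         # oldest platform version still considered secure
    signature_is_valid: Callable[[dict], bool],  # vendor-specific certificate-chain check
) -> bool:
    # 1. The attestation must be signed by a key chaining up to the hardware
    #    vendor's root of trust (Intel, AMD, AWS, ...).
    if not signature_is_valid(att):
        return False
    # 2. The code measurement must equal the one we rebuilt and measured ourselves.
    if att["measurement"] != expected_measurement:
        return False
    # 3. The platform must not be running a version with known vulnerabilities.
    return att["platform_version"] >= min_platform_version
```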

Together, reproducible builds and remote attestation let any user learn exactly which code is running in the TEE and which platform version it runs on, preventing developers or server operators from cheating.

However, with TEEs you always have to trust the vendor. If the TEE vendor acts maliciously, remote attestations can simply be forged. So if you consider the vendor a possible attack vector, avoid relying on TEEs alone; it is better to combine them with ZK proofs or a consensus protocol.

The appeal of TEEs

In our view, TEEs are especially popular right now, particularly for deploying AI Agents, for the following reasons:

  • Performance: a TEE can run LLMs with performance and cost overhead comparable to ordinary servers, whereas zkML requires enormous compute to generate ZK proofs for an LLM.
  • GPU support: NVIDIA provides TEE computing support in its latest GPU series (Hopper, Blackwell, etc.)
  • Correctness: LLMs are non-deterministic; submitting the same prompt multiple times yields different results, so multiple nodes (including observers trying to construct fraud proofs) may never reach consensus on an LLM's output. In this setting we can instead trust that an LLM running inside a TEE cannot be manipulated by malicious actors and that the program always runs as written, which makes TEEs a better fit than opML or consensus for guaranteeing the reliability of LLM inference results.
  • Confidentiality: data inside the TEE is invisible to external programs, so private keys generated or received inside the TEE remain safe. This can be used to assure users that any message signed with such a key came from the program inside the TEE. Users can safely entrust a private key to the TEE together with signing conditions, and be confident that every signature produced by the TEE satisfies those pre-set conditions (see the sketch after this list).
  • Networking: With some tools, programs running in a TEE can securely access the internet (without revealing queries or responses to the server running the TEE, while still providing third parties with guarantees of correct data retrieval). This is useful for retrieving information from third-party APIs, and can be used to outsource computation to trusted but proprietary model providers.
  • Write access: In contrast to zk solutions, code running in the TEE can construct messages (whether tweets or transactions) and send them out through API and RPC network access.
  • Developer-friendly: TEE-related frameworks and SDKs allow people to write code in any language and easily deploy programs to TEEs just like in cloud servers.
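
To make the confidentiality and write-access points concrete, here is a minimal sketch using the Python cryptography library; the signing policy (a required message prefix) is a made-up example of a pre-agreed condition, not any project's actual rule:

```python
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Generated inside the enclave at startup; the private key never leaves TEE memory.
_SIGNING_KEY = Ed25519PrivateKey.generate()

# Published together with a remote attestation so users can bind it to this enclave.
ENCLAVE_PUBLIC_KEY = _SIGNING_KEY.public_key()

ALLOWED_PREFIX = b"agent-action:"  # hypothetical pre-agreed signing condition

def sign_if_allowed(message: bytes) -> bytes:
    """Sign only messages that satisfy the pre-set policy."""
    if not message.startswith(ALLOWED_PREFIX):
        raise ValueError("message violates the enclave's signing policy")
    return _SIGNING_KEY.sign(message)
```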

For better or worse, many current TEE use cases have no real alternative. We believe TEEs further expand the design space of on-chain applications and may drive the emergence of new use cases.

TEE is not a silver bullet

Programs running in TEEs are still vulnerable to a range of attacks and bugs, much like smart contracts. For simplicity, we group the possible vulnerabilities as follows:

  • Developer negligence
  • Runtime vulnerabilities
  • Architectural design flaws
  • Operational issues

Developer negligence

Intentionally or not, developers can weaken the security guarantees of a program running in a TEE through the code they ship. This includes:

  • Opaque code: the TEE security model relies on code being externally verifiable; code transparency is critical for third-party verification.
  • Unverified code measurements: even if the code is public, the guarantee is meaningless unless some third party actually rebuilds it and checks the resulting code measurement against the one reported in the remote attestation. This is like receiving a ZK proof but never verifying it.
  • Insecure code: even if keys are generated and managed correctly inside the TEE, the program logic may leak them through external calls, and the code may contain backdoors or vulnerabilities. TEE development demands a standard of engineering and auditing much closer to smart contract development than to traditional backend development.
  • Supply chain attacks: modern software development pulls in a lot of third-party code, and supply chain attacks are a significant threat to the integrity of TEE applications.

Runtime vulnerabilities

Even the most cautious developers can fall victim to runtime vulnerabilities. Developers must carefully consider whether any of the following affect the security guarantees of their project:

  • Dynamic code: It may not always be possible to keep all code transparent. Sometimes, the use case itself requires the dynamic execution of opaque code loaded into the TEE at runtime. Such code can easily leak secrets or break invariants, and great care must be taken to prevent this.
  • Dynamic data: most applications consume external APIs and other data sources during execution. These sources become part of the security model; they play the same role as oracles in DeFi, where incorrect or stale data can lead to disaster. For AI Agents, for example, over-reliance on an external LLM service such as Claude makes that service part of the trust assumptions.
  • Insecure and unstable communications: a TEE runs inside a host server, and from a security standpoint that host is a perfect man-in-the-middle (MitM) between the TEE and the outside world. The host can not only observe the TEE's outbound connections and see what is being sent, it can also censor specific IPs, throttle connections, and inject packets crafted to impersonate a legitimate party.

For example, a matching engine that processes crypto trades inside a TEE cannot by itself provide fair-ordering (MEV-resistance) guarantees, because routers, gateways, or the host can still drop, delay, or prioritize packets based on their source IP.

Architectural flaws

The technology stack of a TEE application should be chosen with care. Common pitfalls when building TEE applications include:

  • Applications with a large attack surface: the attack surface is the amount of code that must be fully secure. Code with a large attack surface is very hard to audit and can hide bugs or exploitable vulnerabilities, and minimizing it often conflicts with developer experience. For example, a TEE program that depends on Docker has a much larger attack surface than one that does not, and an enclave that relies on a full-featured operating system has a larger attack surface than one built on a minimal OS.
  • Portability and liveness: in Web3, applications must be censorship-resistant; anyone should be able to spin up a new TEE and take over from an inactive participant, which requires the application inside the TEE to be portable. The biggest challenge here is key portability: some TEE systems offer in-enclave key derivation, but keys derived inside one TEE cannot be reproduced on another machine, so the program ends up tied to a single machine, which is not enough for portability (see the sketch after this list).
  • Insecure root of trust: for example, when running an AI Agent in a TEE, how do you verify that a given address belongs to the Agent? Without careful design, the real root of trust can end up being an external third party or key custodian rather than the TEE itself.
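
As a rough illustration of the portability trade-off, the sketch below contrasts the two designs; all names are hypothetical stand-ins for platform-specific sealing/derivation primitives and an external key-management committee:

```python
def machine_bound_key(platform_derive, measurement: bytes) -> bytes:
    # Platform-local derivation: only an enclave with this measurement on this
    # specific machine can reproduce the key, so the application cannot migrate.
    return platform_derive(measurement)

def portable_key(mpc_client, attestation: bytes) -> bytes:
    # Portable design: any new enclave that presents a valid attestation for the
    # same measurement can retrieve the key from an external MPC committee.
    return mpc_client.release_key(attestation)
```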

Operational issues

Last but not least, there are some practical considerations about how to actually run a server that executes a TEE program:

  • Insecure platform versions: TEE platforms occasionally receive security updates, which are reflected in the platform version reported in remote attestation. If your TEE is not running a secure platform version, attackers can use known attack vectors to steal keys from it. Worse, a platform version that is secure today may be insecure tomorrow.
  • No physical security: despite your best efforts, TEEs can be vulnerable to side-channel attacks, which usually require physical access to and control of the host server. Physical security is therefore an important layer of defense in depth. A related concept is cloud attestation: proving that the TEE runs in a cloud data center whose operator provides physical security guarantees.

Building a secure TEE program

We divide our recommendations into the following points:

  • The safest solution
  • Necessary precautions to take
  • Recommendations that depend on the use case

1. The safest solution: no external dependencies

Building a highly secure application may mean eliminating external dependencies such as external inputs, APIs, or services, thereby reducing the attack surface. The application then runs in isolation, with no external interactions that could compromise its integrity or security. While this limits what the program can do, it provides extremely high security.

For most Crypto x AI use cases, this level of security is achievable if the model is run locally.

2. Necessary precautions to be taken

Regardless of whether your application has external dependencies or not, the following are a must!

Treat TEE applications as smart contracts rather than backend applications; keep update frequency low and test rigorously.

Building TEE programs should be done with the same rigor as writing, testing, and updating smart contracts. Like smart contracts, TEEs operate in highly sensitive and tamper-proof environments where errors or unexpected behavior can lead to severe consequences, including complete loss of funds. Thorough audits, extensive testing, and minimal, carefully audited updates are essential to ensure the integrity and reliability of TEE-based applications.

Audit code and inspect build pipelines

The security of an application depends not only on the code itself but also on the tools used in the build process. A secure build pipeline is critical for preventing vulnerabilities: the TEE only guarantees that the code it is given runs as expected; it cannot fix defects introduced during the build.

To reduce the risk, code must be rigorously tested and audited to eliminate errors and prevent unnecessary information leakage. In addition, reproducible builds play a vital role, especially when code is developed by one party and used by another: they allow anyone to verify that the program executing inside the TEE matches the original source code, ensuring transparency and trust. Without reproducible builds, it is almost impossible to determine exactly what is executing inside the TEE, which undermines the application's security.

For example, the source code for DeepWorm, a project that runs a worm brain simulation model in a TEE, is completely open. The execution program inside the TEE is built in a reproducible way using Nix pipelines.

Use audited or validated libraries

When handling sensitive data in TEE programs, use only audited libraries for key management and private data handling. Unaudited libraries may expose keys and compromise the security of your application. Prioritize well-reviewed, security-focused dependencies to maintain the confidentiality and integrity of your data.

Always verify attestations from the TEE

Users interacting with a TEE must verify the remote attestation it produces to ensure the interaction is secure and trustworthy. Without this check, the host server could manipulate responses, making it impossible to distinguish genuine TEE output from tampered data. Remote attestation is the key proof of the code and configuration running inside the TEE, and it lets us judge whether the program being executed matches expectations.

Attestations can be verified on-chain (Intel SGX, AWS Nitro), off-chain using ZK proofs (Intel SGX, AWS Nitro), or by users themselves or a hosted service (such as t16z or Marlin Hub).
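
A typical client-side flow, sketched below with an attestation-verification step supplied by the caller (on-chain, ZK, or a hosted verifier) and a hypothetical attestation field carrying the enclave's public key, verifies the attestation once and then checks that every response is signed by the attested key:

```python
from typing import Callable
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

def trusted_session(
    attestation: dict,
    verify_attestation: Callable[[dict], bool],  # on-chain, ZK, or hosted verifier
) -> Ed25519PublicKey:
    """Verify the attestation once and return the enclave key it commits to."""
    if not verify_attestation(attestation):
        raise RuntimeError("attestation verification failed")
    # Hypothetical field: the enclave embeds its public key in the attested user data.
    return Ed25519PublicKey.from_public_bytes(attestation["enclave_pubkey"])

def check_response(enclave_key: Ed25519PublicKey, payload: bytes, signature: bytes) -> bytes:
    # Raises InvalidSignature if the host tampered with the response in transit.
    enclave_key.verify(signature, payload)
    return payload
```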

3. Use case-dependent recommendations

Depending on the target use case of your application and its structure, the following tips may help make your application more secure.

Ensure that user interactions with TEE are always performed in a secure channel

The server hosting the TEE is inherently untrusted: it can intercept and modify communications. In some cases it may be acceptable for the server to read data but not change it; in others, even reading is undesirable. To mitigate these risks, it is critical to establish a secure end-to-end encrypted channel between the user and the TEE. At a minimum, ensure that messages carry a signature so their authenticity and origin can be verified. In addition, users should always check the remote attestation produced by the TEE to confirm they are communicating with the correct TEE. This preserves both the integrity and the confidentiality of the communication.
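
One common pattern, sketched below with the Python cryptography library, is to run a Diffie-Hellman exchange in which the enclave's public key is taken from its verified attestation, so the host relaying the traffic never learns the session key; how the enclave publishes that key inside the attestation is an assumption here:

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey, X25519PublicKey
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

def open_channel(enclave_pub_bytes: bytes):
    """Derive a session key shared only with the enclave whose key was attested."""
    client_priv = X25519PrivateKey.generate()
    shared = client_priv.exchange(X25519PublicKey.from_public_bytes(enclave_pub_bytes))
    session_key = HKDF(
        algorithm=hashes.SHA256(), length=32, salt=None, info=b"tee-channel",
    ).derive(shared)
    # The client's public key is sent to the enclave so it can derive the same key.
    return client_priv.public_key(), AESGCM(session_key)

def send(aead: AESGCM, plaintext: bytes) -> bytes:
    nonce = os.urandom(12)
    # The host can relay or drop this ciphertext, but cannot read or alter it undetected.
    return nonce + aead.encrypt(nonce, plaintext, None)
```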

For example, Oyster supports secure TLS certificate issuance by using CAA records and RFC 8657, and it also provides a TEE-native TLS protocol called Scallop that does not rely on WebPKI.

Know that TEE memory is transient

TEE memory is transient, meaning that when the TEE is shut down, its contents, including encryption keys, are lost. Without a secure mechanism to preserve this information, critical data could become permanently inaccessible, potentially putting funds or operations at risk.

Multi-party computation (MPC) networks with decentralized storage systems such as IPFS can be used as a solution to this problem. MPC networks split the key across multiple nodes, ensuring that no single node holds the complete key while allowing the network to reconstruct the key when needed. Data encrypted with this key can be securely stored on IPFS.

If necessary, the MPC network can provide keys to a new TEE server running the same image, provided that specific conditions are met. This approach ensures resiliency and strong security, keeping data accessible and confidential even in untrusted environments.
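
As a deliberately simplified illustration of the splitting step, the sketch below uses plain n-of-n XOR sharing rather than the threshold schemes a real MPC network would use; the enclave hands each node one share, and only the full set reconstructs the key:

```python
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def split_key(key: bytes, n_nodes: int) -> list[bytes]:
    """Split into n XOR shares; all n are needed, any fewer reveal nothing about the key."""
    shares = [secrets.token_bytes(len(key)) for _ in range(n_nodes - 1)]
    last = key
    for s in shares:
        last = xor_bytes(last, s)
    return shares + [last]

def reconstruct(shares: list[bytes]) -> bytes:
    key = shares[0]
    for s in shares[1:]:
        key = xor_bytes(key, s)
    return key

# A new TEE presenting a valid attestation for the same image could collect the
# shares from the MPC nodes and call reconstruct() to recover the key.
```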

Another option is for the TEE to submit the relevant transactions to the different MPC servers separately; the MPC servers sign them, the signatures are aggregated, and the transaction is finally posted on-chain. This approach is far less flexible and cannot be used to persist API keys, passwords, or arbitrary data, since there is no trusted third-party storage involved.

Reduce attack surface

For security-critical use cases, it is worth stripping out as many peripheral dependencies as possible, even at the expense of developer experience. For example, Dstack ships with a minimal Yocto-based kernel that includes only the modules Dstack needs. It may even be worth using older technology such as SGX (over TDX), because SGX does not require a bootloader or operating system to be part of the TEE.

Physical isolation

The security of TEEs can be further enhanced by physically isolating them from possible human intervention. Hosting TEE servers in data centers and with cloud providers lets us rely on the physical security those facilities provide, but projects like Spacecoin are exploring a rather more interesting alternative: space. The SpaceTEE paper relies on security measures such as measuring the satellite's moment of inertia after launch to verify that it was not tampered with on its way into orbit.

Multiple Provers

Just as Ethereum relies on multiple client implementations to reduce the risk of a single bug affecting the entire network, a multiprover setup uses multiple TEE implementations to improve security and resilience. By running the same computation across several TEE platforms, it ensures that a vulnerability in one TEE implementation does not compromise the entire application. The approach requires the computation to be deterministic, or a consensus rule between the different TEE implementations for non-deterministic cases, but it brings fault isolation, redundancy, and cross-validation, making it a good choice for applications that need strong reliability guarantees.
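
A minimal sketch of the cross-validation step is below; the attestation checks for the two platforms are supplied by the caller, the report fields are illustrative, and the computation is assumed to be deterministic so the outputs can be compared byte for byte:

```python
from typing import Callable

def multiprover_result(
    sgx_report: dict,
    nitro_report: dict,
    verify_sgx: Callable[[dict], bool],
    verify_nitro: Callable[[dict], bool],
) -> bytes:
    """Accept a result only if both independently attested TEEs agree on it."""
    if not (verify_sgx(sgx_report) and verify_nitro(nitro_report)):
        raise RuntimeError("at least one attestation failed verification")
    if sgx_report["output"] != nitro_report["output"]:
        raise RuntimeError("TEE implementations disagree; possible compromise or bug")
    return sgx_report["output"]
```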

Looking ahead

TEEs have clearly become a very exciting area to explore. As mentioned earlier, the ubiquity of AI and its constant access to sensitive user data mean that large technology companies such as Apple and NVIDIA are adopting TEEs in their products and offering them as part of their platforms.

On the other hand, the crypto community has always been deeply security-conscious. As developers try to scale on-chain applications and use cases, TEEs have gained popularity as a solution offering the right trade-off between functionality and trust assumptions. While TEEs are not as trust-minimized as fully ZK solutions, we expect them to be the first avenue through which products from Web3 companies and large tech companies gradually come together.