Infinite Git repos on Cloudflare Workers

We're building Gitlip - the collaborative devtool for the AI era. An all-in-one combination of Git-powered version control, collaborative coding and 1-click deployments. Our goal is to simplify the practical application of state-of-the-art AI models.

We're preparing to raise a seed round. Reach out to @nataliemarleny for more information.

In this post we will describe how we implemented infinite Git repos on Cloudflare using a new type of serverless database: a highly optimized WebAssembly Git server that runs on Cloudflare Workers and scales horizontally. It allows us to easily host an infinite number of repositories. Additionally, since it runs on Cloudflare, our Git server supports IPv6 by default. For comparison, GitHub doesn't yet support IPv6.

Currently we are leveraging this technology to build a coding platform. We're also considering creating a serverless Database as a Service (DBaaS) offering, which would allow anyone to create an arbitrary number of Git repositories in the cloud and use them in their own product. If you'd be interested in a DBaaS product like this, please reach out to @nataliemarleny!

Motivation

Originally, while working on a note-taking application for developers based on Git, we encountered the need to host Git repositories efficiently. Wanting to avoid managing the servers ourselves, we experimented with a serverless approach. After researching, we couldn't find anyone attempting something similar, so besides its potential usefulness, it also seemed like an interesting problem to solve.

We're big fans of Cloudflare and their Workers platform and we were aware of Durable Objects. In terms of the usage model, likely access patterns, and general philosophy, Durable Objects seemed like the perfect underlying storage for a project like this.

We consider Durable Objects a novel and revolutionary type of storage. It offers a key-value store that is transactional, strongly consistent, and persistent. It's tightly integrated with the Workers runtime and is suitable for all sorts of coordination and application-data use cases. Given its usefulness, we fully expect that other cloud providers will offer a comparable type of storage alongside their serverless offerings in the future.

When we started our research, we knew that Cloudflare had built D1 (their SQLite database offering) on Durable Objects. In addition to our early experiments with Durable Objects, this made us confident that what we intended to implement was feasible, so we made it our goal to host a Git repository within a Durable Object.

Git in Cloudflare Workers

Cloudflare Workers is a serverless platform based on the V8 JavaScript engine, which can also execute Wasm binaries, so when attempting to run Git in this environment, we had somewhat limited options. We tried a few different approaches, but in the end, there were only two legitimate candidates:

libgit2 - a cross-platform, linkable Git library written in C,
isomorphic-git - a pure JavaScript implementation of Git.

We judged that it would be easier to start with isomorphic-git, but that the initial up-front investment in making libgit2 work might pay off more significantly, since libgit2 is used much more widely and is more battle-hardened. Prior to our attempts, other developers had already made libgit2 work in Node.js and browsers (see wasm-git), which further encouraged us that we were on the right path.

We ended up compiling libgit2 with Emscripten and packaging it for Cloudflare Workers.

Git uses the filesystem as its underlying storage, so the next step was to implement a filesystem on top of Durable Objects. A big hurdle we encountered at this point was how I/O is handled in modern JavaScript (using promises or async/await) versus how filesystem I/O is expected to work in Emscripten (synchronous system calls). Emscripten offers two mechanisms for using asynchronous JavaScript function calls in synchronous C functions: JSPI and Asyncify. After extensive research, we rewrote significant parts of Emscripten to support asynchronous file system calls. We ended up creating our own Emscripten filesystem on top of Durable Objects, which we call DOFS. Having a filesystem on Durable Objects is very useful for running a Git server, but it also unlocks many other interesting possibilities (we will write about this in the future).
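To make the idea of a filesystem on top of a key-value store more concrete, here is a minimal sketch. It is not the actual DOFS: it assumes a hypothetical layout with one inode record per file and fixed-size blocks, keys named `N_<inode>` and `B_<inode>_<index>`, and an in-memory `MemoryKV` class standing in for Durable Object storage. All names in it are illustrative.

```typescript
// Minimal async key-value interface. An in-memory Map stands in for
// Durable Object storage so the sketch runs anywhere.
interface AsyncKV {
  get(key: string): Promise<Uint8Array | undefined>;
  put(key: string, value: Uint8Array): Promise<void>;
}

class MemoryKV implements AsyncKV {
  private data = new Map<string, Uint8Array>();
  reads = 0; // number of get() calls, to show how many keys a read touches

  async get(key: string): Promise<Uint8Array | undefined> {
    this.reads++;
    return this.data.get(key);
  }
  async put(key: string, value: Uint8Array): Promise<void> {
    this.data.set(key, value.slice()); // store a private copy
  }
}

const BLOCK_SIZE = 4; // unrealistically small so the example spans several blocks

// Hypothetical key scheme: N_<inode> holds the file size, B_<inode>_<i> holds block i.
const inodeKey = (ino: number): string => `N_${ino}`;
const blockKey = (ino: number, i: number): string => `B_${ino}_${i}`;

async function writeFile(kv: AsyncKV, ino: number, content: Uint8Array): Promise<void> {
  for (let i = 0; i * BLOCK_SIZE < content.length; i++) {
    await kv.put(blockKey(ino, i), content.subarray(i * BLOCK_SIZE, (i + 1) * BLOCK_SIZE));
  }
  const size = new Uint8Array(4);
  new DataView(size.buffer).setUint32(0, content.length, true); // little-endian file size
  await kv.put(inodeKey(ino), size);
}

// Reads `length` bytes starting at `offset`, touching only the blocks that overlap the range.
async function readRange(kv: AsyncKV, ino: number, offset: number, length: number): Promise<Uint8Array> {
  const inode = await kv.get(inodeKey(ino));
  if (!inode) throw new Error(`no such inode: ${ino}`);
  const size = new DataView(inode.buffer, inode.byteOffset, 4).getUint32(0, true);
  const end = Math.min(offset + length, size);
  const out = new Uint8Array(Math.max(0, end - offset));
  let pos = offset;
  while (pos < end) {
    const i = Math.floor(pos / BLOCK_SIZE);
    const block = await kv.get(blockKey(ino, i));
    if (!block) throw new Error(`missing block ${i} of inode ${ino}`);
    const start = pos - i * BLOCK_SIZE;
    const n = Math.min(block.length - start, end - pos);
    if (n <= 0) throw new Error(`block ${i} of inode ${ino} is shorter than expected`);
    out.set(block.subarray(start, start + n), pos - offset);
    pos += n;
  }
  return out;
}
```

Every storage call here is asynchronous, which is exactly what a synchronous C caller cannot consume directly; bridging that gap is the part that required the Emscripten changes described above.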

Figure: Simplified architecture of Git repos in Durable Objects (a diagram of Git in Cloudflare Workers)

Implementing a Git server

Compiling libgit2 to Wasm and implementing a filesystem on top of Durable Objects was a good start, but we needed to do more work to make the entire project useful. At this point, our Git implementation in Cloudflare Workers could store a repo in a Durable Object and communicate with the outside world via custom HTTP operations (read file, list branches etc.), but it couldn't communicate using the Git protocol, so we couldn't use the Git command-line tool to fetch or push.

libgit2 is an excellent Git library, but it only provides client functionality; server functionality is missing. We couldn't find any other implementations of Git server functionality on the web to use as a reference. While Git itself implements the server commands receive-pack and upload-pack, porting them to libgit2 proved impossible. The main reason was that the required server commands depended on numerous other source files, and the interfaces between these files appeared poorly defined. Edward Thomson, the maintainer of libgit2, has written an excellent article on the history of libgit2, detailing Git's issues in more depth.

We abandoned the porting efforts and ended up implementing the missing Git server functionality ourselves by leveraging libgit2's core functionality, studying all available documentation, and painstakingly investigating Git's behavior. We also created an extensive integration test suite to ensure the robustness and performance of our Git server.
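As a small illustration of what serving the Git protocol involves, upload-pack and receive-pack exchange data framed as pkt-lines: four hexadecimal digits giving the total length (including those four digits), followed by the payload, with `0000` acting as a flush packet. The sketch below is not our server code; the function names are made up for this example, and it works on ASCII strings for brevity, whereas a real server has to handle raw bytes.

```typescript
const FLUSH_PKT = "0000";
const MAX_PKT_LENGTH = 65520; // upper bound on a pkt-line, length prefix included

// Frames one payload as a pkt-line. Assumes an ASCII payload so that
// string length equals byte length.
function encodePktLine(payload: string): string {
  const length = payload.length + 4;
  if (length > MAX_PKT_LENGTH) throw new Error("payload too large for one pkt-line");
  return length.toString(16).padStart(4, "0") + payload;
}

// Splits a buffer of pkt-lines into payloads; a flush packet is returned as null.
function decodePktLines(data: string): (string | null)[] {
  const out: (string | null)[] = [];
  let pos = 0;
  while (pos < data.length) {
    const len = parseInt(data.slice(pos, pos + 4), 16);
    if (Number.isNaN(len)) throw new Error(`bad length prefix at offset ${pos}`);
    if (len === 0) {
      out.push(null); // flush-pkt
      pos += 4;
      continue;
    }
    // Lengths 1 to 3 are special packets in protocol v2; this sketch does not handle them.
    if (len < 4 || pos + len > data.length) throw new Error(`bad pkt-line at offset ${pos}`);
    out.push(data.slice(pos + 4, pos + len));
    pos += len;
  }
  return out;
}
```

For instance, `encodePktLine("# service=git-upload-pack\n")` yields `001e# service=git-upload-pack\n`, the first line of a smart HTTP ref advertisement.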

Reproducible builds

Compiling native libraries like libgit2 for a specific target platform requires a significant amount of preparation, especially when the library itself needs to be compiled with a modified version of the compiler. We found it cumbersome to maintain the repeatability of the entire build process by manually invoking build commands for each component. Luckily, amazing software built for this exact purpose already exists - enter Nix, a declarative, purely functional build system which enables reproducible builds.

We built our build system on Nix. This allowed us to reproducibly build patched Emscripten, patched libgit2, and the C implementation of our Git server from scratch by invoking just a single command. Not only that, but Nix has also enabled us to fine-tune this build process and make it configurable with flags passed to the build command. We can now build our Git server from scratch targeting the native platform (Linux or Mac), Node.js, or Cloudflare Workers. We can also easily configure whether we want the release or development build of the entire package tree. Additionally, Nix allows us to build only a subtree of packages, enabling us to intervene mid-build to make necessary modifications. The learning curve for Nix is steep, but well worth it. If you're interested in a similarly powerful tool that's easier to get started with, flox is an excellent option.

We discovered a beautiful, initially unintended consequence of building our package tree with Nix: our initial investment into creating our build system made it possible for us to compile a broad array of interesting native libraries to WebAssembly using our modified Emscripten compiler. So far, we've compiled zlib, libarchive, and libmagic to Wasm and statically linked them with our Git server. As a result, our Git server can create archives in many different formats (we currently only use zip and tar.gz) and easily detect MIME types for a vast array of stored files. Nixpkgs is full of Nix scripts for building various software packages, and having invested time in creating our build system, we could now compile the ones we needed to Wasm by making further modifications.

Finally, we also compiled QuickJS to Wasm using our Nix build system. We use our QuickJS-based service to run JavaScript files with full support for ES modules' import/export statements in Cloudflare Workers on-demand (more in the next section).

Composable capabilities

One of the core design principles in our codebase is to invest effort in building powerful, composable capabilities and curate them carefully in our repository. Having a strong set of these capabilities opens up interesting combinations, especially when using a platform like Cloudflare Workers, which makes composition easy.

Example #1: We developed our Git server with the intention of serving a single repository from a Durable Object. Once that worked, we could package the same HTTP endpoints and Wasm and expose them from a plain Worker instead of a Durable Object. This way, we gained the ability to run the exact same Git code in either a persistent context (clients connect to the same Durable Object) or an ephemeral context (clients connect to any Worker closest to them). There are use cases where executing Git functionality in an ephemeral context is useful, and where sending the request to a Durable Object would be the wrong choice. For example, when validating a branch or tag name in the Web UI, there's no need to reimplement these Git-specific rules in JavaScript if we can expose the exact upstream libgit2 behavior in the ephemeral Worker closest to the Web UI user.
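The split between the two contexts can be sketched as a small router. This is only an illustration: the routes, the `REPO` binding name, and `isValidBranchName` are hypothetical, and the interfaces are simplified stand-ins (the `idFromName` and `get` methods mirror the shape of a Durable Object namespace, but with plain strings instead of the real ID and response types) so that the sketch runs outside Workers.

```typescript
// Simplified stand-ins for a Durable Object stub and namespace.
interface RepoStub {
  fetch(path: string): Promise<string>;
}
interface RepoNamespace {
  idFromName(name: string): string;
  get(id: string): RepoStub;
}
// Hypothetical wrapper around the Git Wasm module loaded in the plain Worker.
interface GitWasm {
  isValidBranchName(name: string): boolean;
}
interface Env {
  REPO: RepoNamespace;
  git: GitWasm;
}

async function route(path: string, env: Env): Promise<string> {
  const [, head, ...rest] = path.split("/");
  if (head === "validate-branch") {
    // Ephemeral context: stateless, answered by whichever Worker received the request.
    return String(env.git.isValidBranchName(rest.join("/")));
  }
  if (head === "repos" && rest.length >= 2) {
    // Persistent context: every request for a repo reaches the same Durable Object.
    const [repo, ...operation] = rest;
    const stub = env.REPO.get(env.REPO.idFromName(repo));
    return stub.fetch("/" + operation.join("/"));
  }
  throw new Error(`no route for ${path}`);
}

// Fakes for trying the router locally.
const fakeEnv: Env = {
  REPO: {
    idFromName: (name) => `id:${name}`,
    get: (id) => ({ fetch: async (path) => `${id} handled ${path}` }),
  },
  // Placeholder rule only; the real check would come from the Git Wasm module.
  git: { isValidBranchName: (name) => !name.includes("..") },
};
```

With these fakes, `route("/repos/test-repo/branches", fakeEnv)` is forwarded to the object for `test-repo`, while `route("/validate-branch/feature/x", fakeEnv)` never touches a Durable Object.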

Example #2: If you visit https://gitlip.com/@nataliemarleny/test-repo/ref/HEAD/main.js, you'll see an option to execute this file by pressing "play" on the right. Alternatively, here's a quick video:

Video: the interaction of composable capabilities

Several of our composable capabilities work together to achieve the final result:

api receives the request to execute main.js from the HEAD of the repo.
api coordinates the services (using service bindings).
git-server service receives a request for an archive of the HEAD of the repo and streams the tar.gz snapshot of the HEAD back to the api.
api forwards the tar.gz stream to the js-run service (QuickJS-based).
js-run service unpacks the archive stream into memory.
js-run service runs the requested file from memory (note that main.js imports fizzbuzz.js!) and streams the response back to the api.
api streams the response back to the user.
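The steps above can be sketched roughly as follows. This is a simplified model, not our production code: the two services are represented by hypothetical interfaces (in Workers they would sit behind service bindings), streams are modelled as async iterables, and the fake archive format (one `path\0content` record per chunk) merely stands in for a real tar.gz stream.

```typescript
type ByteStream = AsyncIterable<Uint8Array>;

// Hypothetical service interfaces.
interface GitServerService {
  archive(repo: string, ref: string): ByteStream;
}
interface JsRunService {
  run(archive: ByteStream, entry: string): ByteStream;
}

// api: coordinates the services and hands streams through without buffering them itself.
function executeFile(
  git: GitServerService,
  jsRun: JsRunService,
  repo: string,
  ref: string,
  entry: string,
): ByteStream {
  const snapshot = git.archive(repo, ref); // git-server streams the snapshot
  return jsRun.run(snapshot, entry); // js-run consumes it and streams output back
}

const toBytes = (s: string): Uint8Array => Uint8Array.from(s, (c) => c.charCodeAt(0));
const toText = (b: Uint8Array): string => Array.from(b, (x) => String.fromCharCode(x)).join("");

// Fake git-server: emits two made-up files as separate chunks.
const fakeGitServer: GitServerService = {
  async *archive(_repo, _ref) {
    yield toBytes("fizzbuzz.js\0/* placeholder */");
    yield toBytes("main.js\0/* placeholder */");
  },
};

// Fake js-run: unpacks the stream into memory, then "runs" the entry file.
const fakeJsRun: JsRunService = {
  async *run(archive, entry) {
    const files = new Map<string, string>();
    for await (const chunk of archive) {
      const [path, content] = toText(chunk).split("\0");
      files.set(path, content);
    }
    if (!files.has(entry)) throw new Error(`entry not found: ${entry}`);
    yield toBytes(`ran ${entry} (${files.size} files unpacked)`);
  },
};

async function collectText(stream: ByteStream): Promise<string> {
  let out = "";
  for await (const chunk of stream) out += toText(chunk);
  return out;
}
```

Because `executeFile` only wires one stream into the other, the coordinating service never needs to hold the whole archive in memory; only the service that unpacks it does.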

Figure: Detailed interaction between composable capabilities

For now, executing JavaScript in an on-demand manner like this is just a showcase of what we can easily achieve with our stack of capabilities, but in the future, we plan to make this more powerful by adding support for importing NPM modules and more.

Optimizations

Achieving predictable performance from our Git server required applying several optimization techniques. We'll outline the most important ones.

Like any other serverless platform, Cloudflare Workers come with their own set of constraints. For the purposes of running a Git server, the most important ones are the Worker size (total size of the deployed code), memory, and CPU limit.

In September 2023, the Worker size limit on the Paid plan was increased from 1 MB to 10 MB, but Cloudflare still recommends keeping your entire deployment under 1 MB for best performance. Our Wasm Git server, along with libgit2, all other libraries, and JavaScript glue code, fits in just 800 kB, which we consider to be quite an achievement. We achieved this primarily by optimizing for code size during compilation and trimming the number of file formats our libmagic utility can detect.

The runtime limit of 128 MB of memory initially posed a challenge. libgit2 makes heavy use of memory mapping when reading or writing objects to Git packfiles. Unfortunately, in a Wasm application compiled with Emscripten, memory mapping requires a completely new copy of the file in memory (even if the file is already in memory). This meant our Git server would copy the entire Git packfile into memory when reading even the smallest objects, causing the server's performance to depend on the packfile size rather than the object size. We attempted to address this by modifying Emscripten, but it proved too difficult, so we opted to modify libgit2 instead. We removed all mmap equivalents and replaced them with read/write equivalents, and the results were incredible. We achieved performance independent of the packfile (repo) size. Note that memory mapping makes total sense in the typical settings for which libgit2 was designed.
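The difference between the two access styles can be shown with a toy model. The functions below are illustrative only (they are not libgit2 or Emscripten code): one mimics an emulated memory map by duplicating the whole file before slicing, the other copies just the requested range, and both report how many bytes they had to copy.

```typescript
interface ReadResult {
  data: Uint8Array;
  bytesCopied: number;
}

// Emulated mmap: the whole file is duplicated, even for a tiny read.
function readViaFullCopy(file: Uint8Array, offset: number, length: number): ReadResult {
  const copy = file.slice();
  return { data: copy.subarray(offset, offset + length), bytesCopied: copy.length };
}

// read-style access: only the requested range is copied.
function readViaRange(file: Uint8Array, offset: number, length: number): ReadResult {
  const data = file.slice(offset, offset + length);
  return { data, bytesCopied: data.length };
}

// A 1,000,000-byte stand-in for a packfile, and a 100-byte "object" inside it.
const packfile = new Uint8Array(1000000).map((_, i) => i % 251);
const viaCopy = readViaFullCopy(packfile, 500, 100);
const viaRange = readViaRange(packfile, 500, 100);
console.log(viaCopy.bytesCopied, viaRange.bytesCopied); // 1000000 100
```

The cost of the first approach grows with the packfile, while the cost of the second depends only on the object being read.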

Durable Objects are single-threaded, so one might think it would be difficult to efficiently serve concurrent requests to the same repository. Fortunately, the access patterns to a Git repository are well-suited for optimization with a cache. For this purpose we created a component called ConsistentCache, which wraps around Cloudflare's HTTP Cache API (available in every Worker and Durable Object) and adds the necessary consistency guarantees. This component also deduplicates calls to the Git Wasm program, issuing a single call and relaying the response to all requestors in parallel. Using this technique, a significant number of requests to the Git server are fulfilled directly from the cache, and any modification to the repository purges this cache consistently.
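The two ideas in this paragraph, deduplicating concurrent calls and purging on modification, can be sketched in a few lines. This is not ConsistentCache itself: the class name is made up, a plain `Map` stands in for the HTTP Cache API, and the generation counter is just one possible way to keep a result computed before a purge from being cached after it.

```typescript
class DedupCache<V> {
  private cache = new Map<string, V>();
  private inflight = new Map<string, Promise<V>>();
  private generation = 0; // bumped on every purge

  async get(key: string, compute: () => Promise<V>): Promise<V> {
    if (this.cache.has(key)) return this.cache.get(key) as V;

    // Join a call that is already running instead of starting another one.
    const pending = this.inflight.get(key);
    if (pending) return pending;

    const startedAt = this.generation;
    const p: Promise<V> = compute()
      .then((value) => {
        // Only cache the result if the repository was not modified in the meantime.
        if (this.generation === startedAt) this.cache.set(key, value);
        return value;
      })
      .finally(() => {
        if (this.inflight.get(key) === p) this.inflight.delete(key);
      });
    this.inflight.set(key, p);
    return p;
  }

  // Called after any modification to the repository.
  purge(): void {
    this.generation++;
    this.cache.clear();
    this.inflight.clear();
  }
}
```

Three concurrent `get("README.md", ...)` calls on a fresh instance trigger a single `compute`, a later call is served from the cache, and the first call after `purge()` computes again.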

Persistent storage in Durable Objects has its own built-in caching layer, which improves overall performance and provides additional consistency guarantees. Unfortunately, all reads that hit this built-in cache are billed the same as accessing the underlying storage. libgit2 specifically, and Git more generally, often need to read small chunks of a file incrementally, resulting in a large number of small reads, which became somewhat expensive during testing. We decided to implement our own storage cache, called StorageEngine, and completely disable the built-in cache. This way, we pay nothing for most of the operations our Git server performs on DOFS, only incurring costs for the occasional flush that writes all inodes and file blocks to storage and for occasional reads that populate the StorageEngine.
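A write-back cache of this kind can be sketched as follows. It is a simplified illustration rather than the real StorageEngine: the class and the `Backend` interface are hypothetical (loosely shaped like batched get and put calls on a key-value store), and eviction, size limits, and error recovery are left out.

```typescript
// Hypothetical batched key-value backend; every call to it is assumed to be billed.
interface Backend {
  get(keys: string[]): Promise<Map<string, Uint8Array>>;
  put(entries: Map<string, Uint8Array>): Promise<void>;
}

class WriteBackCache {
  private backend: Backend;
  private cache = new Map<string, Uint8Array>();
  private dirty = new Set<string>();

  constructor(backend: Backend) {
    this.backend = backend;
  }

  // Returns the values plus how many keys had to be fetched from the backend.
  async get(keys: string[]): Promise<{ values: Map<string, Uint8Array>; fetched: number }> {
    const missing = keys.filter((k) => !this.cache.has(k));
    if (missing.length > 0) {
      const loaded = await this.backend.get(missing);
      loaded.forEach((value, key) => this.cache.set(key, value));
    }
    const values = new Map<string, Uint8Array>();
    for (const k of keys) {
      const v = this.cache.get(k);
      if (v !== undefined) values.set(k, v);
    }
    return { values, fetched: missing.length };
  }

  // Writes stay in memory until the next flush.
  put(key: string, value: Uint8Array): void {
    this.cache.set(key, value);
    this.dirty.add(key);
  }

  // Writes all dirty entries in one batch; returns the number of entries written.
  async flush(): Promise<number> {
    if (this.dirty.size === 0) return 0;
    const batch = new Map<string, Uint8Array>();
    this.dirty.forEach((k) => batch.set(k, this.cache.get(k) as Uint8Array));
    this.dirty = new Set(); // writes made during the await go into the next flush
    await this.backend.put(batch);
    return batch.size;
  }
}
```

Calling `get` with a list of keys right after instantiation would also serve as a pre-warm step: later reads of those keys report `fetched: 0` and never reach the backend.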

Finally, we optimized the implementation of our bare Git repositories to always contain a very limited (mostly constant) number of directories. This allowed us to preload all directories and pack-index files from the persistent storage (between 20 and 60 depending on the repo, each up to a few kB in size) every time the Durable Object is instantiated, effectively pre-warming the repository for any Git command it might receive.

(log) StorageEngine.get (1/1): [HTTP_CACHE_LRU_MAP]
(log) StorageEngine.get (4/4): [NNID, ..., PRECACHE]
(log) StorageEngine.get (48/48): [N_1, ..., B_463_0]
(log) StorageEngine.get (0/1): [N_1]
(log) StorageEngine.get (0/1): [N_2]
(log) StorageEngine.get (0/1): [N_350]
(log) StorageEngine.get (0/1): [N_3]
(log) StorageEngine.get (0/1): [N_6]
(log) StorageEngine.get (0/1): [N_349]
(log) StorageEngine.get (0/1): [B_349_0]
(log) StorageEngine.get (0/1): [N_10]
(log) StorageEngine.get (0/1): [B_350_0]
(log) StorageEngine.get (0/1): [N_7]
(log) StorageEngine.get (0/1): [N_463]
(log) StorageEngine.get (0/1): [B_463_0]
(log) StorageEngine.get (0/1): [N_5]
(log) StorageEngine.get (0/1): [N_25]
(log) StorageEngine.get (0/1): [N_24]
(log) StorageEngine.get (0/1): [N_4]
(log) StorageEngine.get (0/1): [B_25_0]
... previous line repeated 263 more times
(log) StorageEngine.get (1/1): [B_24_0]
(log) StorageEngine.get (1/1): [B_24_15]
(log) StorageEngine.get (0/1): [B_25_0]
(log) StorageEngine.get (1/1): [B_24_9]
(log) StorageEngine.get (0/1): [B_24_9]
(log) StorageEngine.get (0/1): [B_25_0]
... previous line repeated 8 more times
(log) StorageEngine.get (1/1): [B_24_10]
(log) StorageEngine.get (0/1): [B_24_10]
(log) StorageEngine.get (0/1): [B_24_9]
... previous line repeated 10 more times
(log) StorageEngine.get (0/1): [B_24_10]
(log) StorageEngine.get (0/1): [B_25_0]
... previous line repeated 8 more times
(log) StorageEngine.get (1/1): [B_24_12]
(log) StorageEngine.get (0/1): [B_24_12]
... previous line repeated 30 more times
(log) StorageEngine._syncBufferFinal START
(log) StorageEngine.put (1): [HTTP_CACHE_LRU_MAP]
(log) StorageEngine._syncBufferFinal END
(log) StorageEngine._syncBufferFinal START
(log) StorageEngine._syncBatchPut (1): [HTTP_CACHE_LRU_MAP]
(log) StorageEngine._syncBufferFinal END

Log for a typical request that reads a file from the repo: lines 1-3 pre-warm the StorageEngine; from line 4 onward, the Git server reconstructs the file by reading 5 additional blocks from storage.

The optimizations above, along with a few others, ensure that the performance of our Git servers is both reasonable and reliable. Small read and write operations (think a typical README.md) over HTTP complete in under 150 ms, even without caching and regardless of the repository size.

Limitations

For now, our Git server is well-suited for repositories up to about 100 MB in size, which is more than enough for our specific use case. Beyond 100 MB, we encounter a few issues:

Single-threaded packing and unpacking of Git packfiles during clone, fetch, and push operations exceeds the time limit on requests to Workers if the packfiles are too large.
Fetch body streams are not full duplex, which unfortunately means that while we can theoretically clone and push any repository in our Git server, we may not be able to fetch from it. This is because the fetch operation in Git's smart protocol requires a full duplex channel for negotiating the optimal packfile to send. Fortunately, in repositories with up to 32 refs, this negotiation process never occurs.
Cloudflare Workers support only HTTPS for now, so we can't support cloning, fetching and pushing over SSH without meaningfully complicating our infrastructure.

We believe the above limitations are solvable in the long term, and that in the future, we could adjust our Git server to handle repositories of arbitrary size and support SSH.

Demo

To preview this in production, feel free to explore our Gitlip public profiles:

Please note that the current performance is constrained by the fact that our primary database is not hosted on Cloudflare, and calls to it dominate the latency of most requests. We expect to reduce the overall latency of most requests by 50% to 75% through further optimizations, which we'll write about in the future.

Future

Having a serverless and infinitely horizontally scalable Git server infrastructure opens many possibilities for products built on top of it. We believe Git is underutilized for storage purposes, given its versioning capabilities and the fact that it stores plain files, which can be in any format suitable for the application.

An additional benefit of achieving a performant Git server in JavaScript and Wasm is the fact that our server already mostly works directly in the browser itself. This opens up exciting possibilities: imagine having a lightweight Git client as part of a PWA which can shallowly clone a remote repository to allow local editing even when offline. We plan to explore this further down the line.

Conclusion

We're just getting started! Stay up to date with our journey of building Gitlip by following @nataliemarleny.

None of this would be possible without the amazing open-source software and the even more amazing communities and companies that produce it, most notably: libgit2, Emscripten, Nix, and the Cloudflare Workers platform. We're very grateful to work with such incredible tools.

Thanks to Edward Thomson (@ethomson), Sunil Pai (@threepointone), Chris Nicholas (@ctnicholasdev) and Tim Neutkens (@timneutkens) for reading drafts of this post.