Webspaces: Rebooting the 3D Web
tl;dr: Webspaces is a new way to create self-hosted 3D worlds on the web using just HTML. Visit webspaces.space, join the Discord, play around in an example on Glitch, or download a blank webspace to get started!
In The Secret Mozilla Hubs Master Plan, I laid out a possible minimum set of requirements that could lead to a web-based 3D metaverse growing as fast as the 2D web:
Real time, avatar based communication (voice + body language)
The ability to join the network independently
Addressable places
Dynamic mixed media in 3D space
Virtual environments
Today, I’m releasing my current best answer to satisfying these requirements, taking a new approach that tries to closely follow the initial conditions and goals of the web. I call it Webspaces.
A webspace is literally just a website, but one that has 3D worlds in addition to traditional 2D pages. By convention, webspaces live on .space domains. A webspace’s 3D worlds are made of plain HTML, no different than 2D web pages.
Webspaces can be created as files on your local computer and are deployed like a normal website, but can also be fully collaborative, multiplayer, and live-editable on the web, using WebRTC and y.js CRDTs. Here’s the simplest HTML 3D world you can start from to build out your webspace:
<html>
  <head>
    <script src="https://webspace.run"></script>
  </head>
  <body>
  </body>
</html>
That’s it. Adding a magical script tag turns it into an interactive 3D world.
Just save that HTML to a file in a new folder (or download this zip), open it in Chrome¹, follow the prompts, and you’ll spawn in as an avatar. Hit slash (/) to create objects, paste content or URLs with Ctrl-V, or drag and drop files into the world and they will spawn in. The tips in the bottom right show you all the controls. (You can hide them with the ? key.) As you edit the world, your changes will be saved back to the HTML file, which can then be deployed, as-is, to the web.
Once you deploy your webspace to the Internet, it can be updated collaboratively with others who visit it using WebRTC. Or, you can just use it to hang out and chat. It can deploy itself back to the origin using a technique I call web writeback, in the spirit of original vision of the browser as an authoring tool. Right now, writeback to GitHub repositories is supported, but it will be easy to add new types like WebDAV or other common static hosting origins like S3. (Feel free to open an issue if you have a new origin type you’d like to support.)
Webspaces are sovereign, self-hostable virtual spaces that don’t just run on the web but are built into the web. The very first webspace with documentation and demos is live at webspaces.space. It’s just a bunch of static files hosted on GitHub Pages. Webspaces can be hosted anywhere a static website can, including on the permaweb.
The software is fully open source, licensed under the MPL, and is a derivative work of Mozilla Hubs. It also includes more than a year’s worth of closed source additions from Jel that I’ve now made open source. It’s my gift to the global community of makers building the open metaverse.
This post will describe in more detail what led to the design of the Webspace Engine, the core library which powers Webspaces. It’s a few compressed megabytes of Javascript that “upgrades” your browser to fully support multiplayer 3D HTML worlds. If you just want to dive in, check out the project’s webspace or download an empty webspace, unzip it, and open up the HTML file in Chrome to start building.
The challenge of self-hosting virtual spaces
The Mozilla Hubs project (which Webspaces is built on) has done an amazing job of providing an open source, collaborative mixed media 3D environment built for the web. However, in order for the web emergent metaverse to follow in the footsteps of the 2D web, the ability to join the network independently is key. If it’s hard or impossible to self-host, a few centralized actors will end up hosting everything, presenting the grave danger of a few mediators one day becoming gatekeepers to the entire metaverse.
Hubs Cloud was an attempt to de-risk this by simplifying self-hosting of Mozilla Hubs. Aspirationally, the goal was to make self-hosting as easy as centralized hosting, using an insane amount of automation. If it worked, we felt it could lead to a global network of “hubs” growing similarly to the global network of early web servers.
While a solid attempt, most users found Hubs Cloud very challenging to set up. The many moving parts of the Hubs backend led to too much complexity which simply couldn’t be hidden away and often broke at the worst times. The net result was an excessive amount of effort, time, and cost needed to self-host Hubs, which was unfortunate.
To their immense credit, the team at Mozilla has continued to maintain Hubs Cloud and remains committed to supporting self-hosting of Hubs for technically savvy users. But the challenges of making self-hosting easy for everyone have led Mozilla to begin offering centralized subscription hosting of Hubs. I fully understand the reasons behind this pivot. Sadly though, it means a widely decentralized future for Hubs now seems as unlikely as ever.
Part of the reason I started working on Webspaces is in search of other possible paths to that future. I think Webspaces now offers the easiest way for anyone to get a fully self-hosted virtual space onto the Internet.
After leaving Mozilla, I spent about a year working on a project called Jel to create the “video game for work.” With the rise of remote work during the pandemic, this seemed like an important problem to work on. It was a new project with distinct goals from Hubs. I wrote a post on some of the principles that went into building Jel.
Although Jel was never intended to be an open metaverse project, and never got any real traction in the workplace, I’ve carried much of that work forward into Webspaces. I’ve shut down Jel since Webspaces offers, basically, a superset of Jel’s functionality while also being trivial for anyone to self-host. I’ve open sourced all of it under the MPL, since Webspaces is a true open metaverse project. The work from Jel includes collaborative text panes, audio-based avatar lip syncing, networked torus worlds, and collaborative voxel-based object modeling and world building tools. Fork it, pull parts of it into your own projects, or just read the code to learn how it works!
So, what led me from working on a partially closed source SaaS product like Jel, back to an open source, open metaverse project like Webspaces?
The permaweb metaverse
Some time after pausing work on Jel, I discovered Arweave and the idea of the permaweb: web pages that you could pay for today that would theoretically stay up forever by taking advantage of falling storage and hosting costs.
This led to a big idea:
Was it possible? Could you create an experience like Hubs or Jel, but delivered entirely through static HTML and Javascript, hosted forever as a static website, that could keep being visited long after you were gone?
The answer is: kind of. There are three limitations of the web platform that get in the way:
- Multiplayer peer-to-peer networking requires WebRTC signaling
- 3rd party content needs to be served through a CORS Proxy
- A small % of users will require a TURN server
Webspaces solves the first two of these problems by using a simple Cloudflare Worker, based on a library I developed called P2PCF. It is effortless to run and set up, and is dirt cheap (or, usually, free). Out of the box, Webspaces uses a public worker I’ve deployed, but you can deploy your own in a few clicks.
For TURN, Webspaces uses a free TURN server but you can specify your own TURN server as well. This is only necessary to support users behind certain NATs.
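Wiring up P2PCF looks roughly like this. This is a hedged sketch based on my reading of the P2PCF README (the constructor takes a client id, a room id, and options such as a worker URL); the class is passed in as a parameter here just so the sketch stands alone:

```javascript
// Illustrative sketch: join a room and react to peers. In a real page you'd
// `import P2PCF from 'p2pcf'` instead of injecting the class.
function joinRoom(P2PCF, clientId, roomId, workerUrl) {
  // workerUrl points at your own Cloudflare Worker; omit it to use a default.
  const p2pcf = new P2PCF(clientId, roomId, workerUrl ? { workerUrl } : {});

  p2pcf.on('peerconnect', (peer) => {
    // A WebRTC connection to a new peer is up; voice and data can flow.
  });

  p2pcf.on('msg', (peer, data) => {
    // Application-level messages (e.g. world edits) arrive here.
  });

  p2pcf.start(); // begin signaling via the Cloudflare Worker
  return p2pcf;
}
```

Check the P2PCF repository for the exact event names and options; the shape above captures the idea: one cheap, generic worker handles signaling, and everything else is peer-to-peer.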
It turns out, with just a simple Cloudflare Worker and some Javascript, you can deliver a statically hosted virtual space with functionality competitive with Hubs or Jel.
If you put it on the permaweb, it should be possible to visit and explore it forever. Multiplayer would keep working as long as these generic services remained available somewhere on the Internet for you to use.
This was a bit surprising, but it does mean that all of the difficulty and complexity of setting up the Hubs backend for self-hosting virtual spaces with Hubs Cloud just… goes away.
This is great news for the web emergent metaverse!
Going back to the requirements:
Real time, avatar based communication (voice + body language)
The ability to join the network independently
Addressable places
Dynamic mixed media in 3D space
Virtual environments
For real time, avatar based communication, Webspaces uses peer-to-peer WebRTC (using P2PCF for signaling) and the Jel avatar system, which provides expressive faces and real time audio lip syncing. WebXR support will eventually be added back in for VR/AR support with hand & head tracking.
The ability to join the network independently comes with the above approach of static hosting and a generic, cheap Cloudflare worker you don’t even have to run yourself.
Addressable places comes for free since worlds are addressed via URLs.
So, what about dynamic mixed media in 3D space, and virtual environments? Let’s cover each of those next.
Dynamic mixed media
A key feature for virtual spaces is pasting in URLs or uploading images, links, documents, and 3D models:
Supporting 3D mixed media like this allows virtual spaces to inherit the full endowment of content from the Internet. It unlocks sharing, memetics, AI generation, and more. It’s an essential part of the future metaverse.
Hubs and Jel store the positions, URLs, and other metadata about media in a space in a simple scene graph data structure. Hubs uses glTF for the scene graph, and Jel used a proprietary data structure designed for ShareDB.
One of the first things I added to Jel was support for a new kind of media: floating HTML pages, labels, and banners, powered by Quill:
This led to the need to store HTML as part of the scene graph, where these floating text panes would live as HTML nodes inside of the rest of the 3D-oriented data structure.
I eventually realized the entire scene graph could be represented as HTML. For example, the position of objects could be encoded as CSS 3D transforms, images could be represented as <img> tags, etc. Even physical CSS units could be used, a critical requirement for VR support!
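For example, a hypothetical fragment of such a world (not necessarily the engine’s exact dialect) might place an image using a CSS transform with physical units like centimeters:

```html
<img
  src="https://example.com/poster.png"
  style="transform: translate3d(200cm, 150cm, -300cm) rotateY(45deg);">
```

The whole scene graph stays valid, hackable HTML that any text editor or DOM API can read and modify.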
I quickly zeroed-in on a “dialect” of HTML for exporting and importing worlds in Jel. I imagined people passing around these HTML files to share and remix virtual spaces. This felt like a much more accessible, hackable approach than the glTF format of Hubs or the proprietary data structures of other projects.
But it wasn’t until I started thinking about Webspaces that I realized just how far you could take this concept.
Authoring for the web, through the browser
Before continuing, we have to take a quick detour back to the 1990s².
When the web was created, it was imagined first as a set of protocols for browsing Internet resources, focused on hypertext documents written in HTML, with the goal of unlocking global access to information. But what has been largely forgotten is that the early implementers didn’t just imagine ubiquitous access to documents, but also ubiquitous authorship of documents.
The original vision of the web called for writing and publishing web pages back to your personal web server directly from the browser. The web would offer two modes: web browsing, and “web editing.” From one of the summaries of the early implementor meetings:
Browser-based authorship of self-hosted web pages never really took off. Projects like Netscape Composer and WebDAV are examples of attempts to realize this goal during the early days of the web.
For Webspaces, I chose to revisit this goal. I already had proven that plain old HTML was viable for representing the scene graph of a 3D space with mixed media. Was it also possible today for a modern browser to offer a full authoring workflow for self-hosted HTML 3D worlds, without any browser extensions or custom downloads?
I saw four main requirements:
- Local editing: you should be able to open an HTML file for a world, edit it in the browser in 3D, and the browser should save the changes back to the file, like a WYSIWYG editor.
- Live editing: once a world is published to the Internet, changing it should also publish the updated HTML back to the origin server, like WebDAV.
- Multiplayer editing: World editing should be fully multiplayer. You shouldn’t need write access to the origin to make changes if you had permission from someone who did, since they could submit your changes for you.
- Document coherence: While in the world, the browser’s document DOM should reflect the scene graph of the world, and nothing else. For example, hitting Ctrl-S to save it as a new HTML file should create a new, identical copy of the world.
So I began some prototyping to see if these could all be achieved today in a modern browser.
The browser as an HTML world editor
The simplest way to make a self-hosted web page is to edit an HTML file on your computer and then, once it’s ready, publish it to a web server. Similarly, the first requirement for Webspaces of local editing is to be able to edit locally saved HTML worlds using the browser.
The File System Access API¹ makes this possible. Once permission is granted to the folder containing your webspace, file handles are serialized to IndexedDB. For security reasons, you do have to re-approve write access on each page load. (Using the Tab and Space keys to approve the dialog helps make this less annoying.)
From there, the browser will save changes back to your local HTML files, just like a WYSIWYG editor. Once you’re ready, your webspace can be deployed to the web as static files.
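The local save path can be sketched like this. It’s a hedged illustration of the File System Access API flow, not the Webspace Engine’s actual code; the function and file names are mine:

```javascript
// Hypothetical sketch: save the world's HTML back to the local folder.
// dirHandle would be a FileSystemDirectoryHandle restored from IndexedDB,
// where handles can be persisted between sessions.
async function saveWorld(dirHandle, html) {
  // Write permission must be re-approved on each page load.
  const status = await dirHandle.requestPermission({ mode: 'readwrite' });
  if (status !== 'granted') throw new Error('Write access denied');

  const fileHandle = await dirHandle.getFileHandle('index.html', { create: true });
  const writable = await fileHandle.createWritable();
  await writable.write(html);
  await writable.close(); // contents are flushed to disk on close
}
```

The original handle would come from `window.showDirectoryPicker()` the first time the user grants access to the webspace folder.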
The next requirement is live editing: modifying and saving the world once it is up on the web. This is done with web writeback: the ability for an HTML document to write changes to itself back to its origin. This is very similar in spirit to the “web editing” concept that was an original goal of the web.
Since webspaces usually will be served by static hosting services, I first investigated how popular hosting providers do deployments today. It turns out, most of them just deploy out of a GitHub repo. Conveniently, you can make commits to a GitHub repo right from the browser with the GitHub API, using a personal access token³. So if your site is deployed from GitHub, HTML pages can write back updated versions of themselves by storing a personal access token in local storage and landing commits in the repo, triggering a deploy.
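In sketch form, the writeback request targets GitHub’s “create or update file contents” REST endpoint. The helper and field names below are mine, not the engine’s:

```javascript
// Encode text as base64, which the GitHub contents API requires.
// Node and browsers diverge here, so cover both environments.
function toBase64(text) {
  return typeof Buffer !== 'undefined'
    ? Buffer.from(text, 'utf8').toString('base64')
    : btoa(unescape(encodeURIComponent(text)));
}

// Build the PUT request that commits the updated world HTML back to the repo.
function buildWritebackRequest({ owner, repo, path, branch, token, html, sha }) {
  return {
    url: `https://api.github.com/repos/${owner}/${repo}/contents/${path}`,
    options: {
      method: 'PUT',
      headers: {
        Authorization: `Bearer ${token}`, // the personal access token
        Accept: 'application/vnd.github+json',
      },
      body: JSON.stringify({
        message: 'Webspace writeback',
        content: toBase64(html), // API expects base64-encoded file contents
        branch,
        sha, // blob SHA of the existing file; required when updating
      }),
    },
  };
}
```

Landing the commit is then just `fetch(req.url, req.options)`, and the static host picks up the new deploy from there.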
GitHub writeback is fully implemented, but other kinds of origins are easy to add. For example, WebDAV support will make it possible to do writeback to static web servers like nginx via the WebDAV module.
One challenge is it usually takes some time after a change lands in git for it to become live on the web. Webspaces does a few tricks to help, like privately passing GitHub raw URLs among peers when new files are added, so they can be immediately accessed ahead of the actual deployment.
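The raw-URL trick relies on raw.githubusercontent.com serving a file’s contents as soon as the commit lands, well before a Pages deploy finishes. A hypothetical helper:

```javascript
// Build the raw content URL for a freshly committed file, so peers can
// fetch it immediately while the static host catches up with the deploy.
function rawUrl(owner, repo, branch, path) {
  return `https://raw.githubusercontent.com/${owner}/${repo}/${branch}/${path}`;
}
```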
So live updates work for those who can write to the origin, but what about multiplayer editing? You don’t want to have to give everyone who wants to collaborate in a world access tokens to your GitHub repo.
Here’s the solution to this in Webspaces:
- Changes to the world are synchronized between peers via networked-aframe and y.js.
- Those who can update the origin prove it by adding their public key to the HTML <meta> tags and advertising it in presence. This way, everyone knows if someone is present who can deploy changes, and the UI can update accordingly.⁵
- Of those who can, a designated leader then pushes updates to the HTML back to the origin on behalf of everyone else.
Once this logic is in place⁴, as long as one connected user has writeback access, changes to the world made by everyone will be saved.
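The leader-selection step can be as simple as a deterministic rule every peer applies independently. A purely illustrative sketch (the peer shape here is my own, not the engine’s):

```javascript
// Among peers that have proven writeback access (via the public keys in the
// document's <meta> tags), pick one leader to push commits for everyone.
function pickWriter(peers) {
  const writers = peers.filter((p) => p.canWrite);
  if (writers.length === 0) return null; // nobody connected can deploy
  // Lowest client id wins, so all peers agree without extra coordination.
  return writers.reduce((a, b) => (a.clientId < b.clientId ? a : b));
}
```

Because the rule is deterministic over shared presence data, every peer arrives at the same leader without any extra round of messaging.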
The last requirement is document coherence. Typically in a rich webapp like Hubs or Jel, the DOM contains all of the user interface elements, using a framework like React. For Webspaces, the DOM should reflect the world’s HTML scene graph, so it appears like the browser is “rendering” the document HTML itself as a 3D world.
The solution to this is to use the Shadow DOM. By creating a ShadowRoot under the <body> tag, the HTML DOM needed for the user interface can be tucked out of sight, freeing up the root document for the HTML scene graph. This means, for example, that if you open the Web Inspector, you can see the document’s HTML update as you change the world.
This also greatly simplifies web writeback, since the root document can just be serialized and committed as-is back to the origin.
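A minimal sketch of this arrangement (function names are mine, not the engine’s):

```javascript
// Illustrative only: the UI is mounted into a ShadowRoot on <body>, leaving
// the root document free to hold nothing but the world's scene graph.
function mountUi(doc, uiElement) {
  // <body> is one of the elements allowed to host a shadow root.
  const shadow = doc.body.attachShadow({ mode: 'open' });
  shadow.appendChild(uiElement);
  return shadow;
}

// Shadow DOM content never appears in outerHTML, so serializing the root
// document yields exactly the world HTML, ready for writeback or Ctrl-S.
function serializeWorld(doc) {
  return '<!DOCTYPE html>\n' + doc.documentElement.outerHTML;
}
```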
So, after several weekends of prototyping, I determined that local editing, live editing, multiplayer editing, and document coherence were all now possible in a modern browser¹. This unlocks the potential for the browser to (finally) become an authoring environment for self-hosted HTML documents in general⁶, and elegantly solves the requirement of supporting dynamic 3D mixed media in Webspaces. Back to those requirements:
Real time, avatar based communication (voice + body language)
The ability to join the network independently
Addressable places
Dynamic mixed media in 3D space
Virtual environments
The last requirement, up next, is virtual environments.
Virtual environments in Webspaces
In my post about Jel, I unpacked what I saw as the core purpose of environments in virtual spaces into several conceptual layers:
Structure — How is it physically laid out?
Purpose — Does it have a clear purpose?
Vibe — How does it feel to be there?
Detail — What are the fine details?
Allowing creation of custom virtual environments that address all of these concerns was the final essential requirement for Webspaces.
The way we approached this in Hubs was with a separate tool, Spoke, an “environment composer” intended for non-artists. First, you’d gather assets from sites like Sketchfab, and then compose them, in Spoke, into environments to use with Hubs.
Spoke is a great tool, but one of the challenges users ran into was avoiding importing excessively heavy assets. Many of the assets on Sketchfab are highly detailed, and aren’t intended to be used as small environment props, so they have large file sizes and are expensive to render. If you accidentally added a few of these large assets in Spoke, your environment would start failing to load. To fix it, you’d have to learn about things like “poly counts”, “texture maps”, and other technical details, leaving behind the dream of simple environment composition and thrusting you into the scary world of graphics optimization.
After this experience, I decided that custom environments in Jel needed to take an approach that would do a better job of shielding users from performance issues. Also, having a separate tool like Spoke always felt cumbersome. I thought the environment creation tools needed to be built into the same app, and should be fully multiplayer, just as they were for 3D mixed media.
I settled on voxel-based object modeling and built into Jel a fully multiplayer modeling tool similar to MagicaVoxel. Jel also has a hotkey based system, like Blender, to efficiently arrange these voxel objects into environments. (Albeit with a slight learning curve.) Here’s a demo of these features in an environment made entirely in Jel using imported voxelized assets from Synty Studios:
This voxel approach made it so the performance cost per object was much more predictable compared to arbitrary assets from sites like Sketchfab. Voxels are also much easier for non-artists to mess around with than full polygonal modeling tools like Blender, so I hoped it would lead to people choosing to model or remix their own objects in Jel or MagicaVoxel instead of downloading them from Sketchfab.
Overall, I was mostly satisfied with this approach, but the last problem was the classic boxy looking meshes I generated from the voxels:
While this isn’t that bad, it’s fairly off-putting as a universal aesthetic, and was unlikely to age well, which is kind of a big deal if we’re talking about deploying this to the permaweb. I intended to improve this using the transvoxel algorithm, but ultimately I found something way, way better.
Enter Smooth Voxels
Smooth Voxels is a hidden gem of a project I discovered that takes voxel based modeling to the next level. By adding an additional modifier pass over the voxels during meshing, Smooth Voxels can turn very simple voxel models into crisp 3D objects of all shapes and styles:
You can play with Smooth Voxels, by Samuel van Egmond, in the Smooth Voxel Playground. Another great feature of Smooth Voxels is that the file format is plain text, just like HTML, so it’s easy to hack on and fun to play with. Here’s that apple in SVOX:
size = 7
scale = 0.15 0.17 0.15
origin = -y
ao = 5 3
material lighting = smooth, roughness = 0.25, fade = true, deform = 3
colors = A:#C11 B:#F60 D:#000 C:#F93
material lighting = smooth, roughness = 1, fade = true, deform = 3
colors = E:#840 F:#000
voxels
------- --AAA-- --AAA-- --AAA-- ------- ------- -------
--AAA-- -BAAAA- -BAAAA- -BAAAA- -BAAAA- ------- -------
-BADAA- BAAAAAA BAAAAAA BAAAAAA -B---A- ------- -------
-BDDDA- BAAAAAA CAAAAAA CABAAAA -B-E-A- ---E--- ---FF--
-BBDBA- BAAAAAA CAAAAAA CABBAAA -C---A- ------- -------
--BBB-- -BAAAB- -CAAAB- -CAAAB- -CCBBB- ------- -------
------- --BBB-- --CCB-- --CCB-- ------- ------- -------
As you can see, SVOX is kind of like “markdown for 3D modeling”, with the voxels laid down in slick ASCII art. (!) I knew it was exactly what I needed to improve both the aesthetics and capabilities of the Jel voxel modeling system.
Once I started working on Webspaces, I took Samuel’s Smooth Voxels library and optimized it to yield a 5x speedup, and packaged it up on npm. In Webspaces, imported MagicaVoxel VOX files are always converted to SVOX, and all voxel objects are Smooth Voxel objects. Pasting plain SVOX (like the apple above) into the world works, spawning smooth 3D models.
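Because SVOX is plain text, even a few lines of JavaScript can pick it apart. Here’s a toy parser for the simple `key = value` header lines, just to show how hackable the format is. This is not the real smoothvoxels parser, which of course also handles materials and the voxel grid:

```javascript
// Toy SVOX header parser: handles only top-level "key = value" lines like
// "size = 7" or "scale = 0.15 0.17 0.15"; material and voxel sections
// (which don't match the key = value shape) are skipped.
function parseSvoxHeader(svox) {
  const header = {};
  for (const line of svox.split('\n')) {
    const m = line.match(/^(\w+)\s*=\s*(.+)$/);
    if (!m || m[1] === 'colors') continue; // colors belong to material blocks
    const nums = m[2].trim().split(/\s+/).map(Number);
    header[m[1]] = nums.some(Number.isNaN)
      ? m[2].trim()                          // non-numeric value, e.g. "origin = -y"
      : nums.length === 1 ? nums[0] : nums;  // single number or vector
  }
  return header;
}
```

Running it over the apple above would give you `size`, `scale`, `origin`, and `ao` as plain JavaScript values.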
The modeling tools were updated so that when you mouse over the object, you see the voxelized version to work on, and when you mouse away, you can see the smoothed out version that will actually be rendered:
But what about the aesthetics? Smooth Voxels allows a wide variety of styling through shaders, but I wanted a consistent look for objects in Webspaces. I decided to match the cel shaded style of the Jel avatars, which arguably delivers a truly timeless look. Below, you can see identical models shaded with the old, boxy meshing and the new, Smooth Voxel cel shading:
I think the cel shaded version looks great!
With Smooth Voxels fully integrated, Webspaces are made up of plain text HTML files (for the scene graph) and plain text SVOX files (for the models.) I think this is a beautiful combination that will make webspaces easy to generate, remix, share, and hack on. This approach nicely satisfies the requirement of being able to create virtual environments.
And with that, we’ve satisfied all of them:
Real time, avatar based communication (voice + body language)
The ability to join the network independently
Addressable places
Dynamic mixed media in 3D space
Virtual environments
What’s Next
Whew! That was a lot of stuff!
You can get started with webspaces at webspaces.space, join the Discord, follow on Twitter, or download the code. I hope it provides an easy and accessible new way for you to join the web emergent metaverse. I also hope some of the tricks I’ve outlined here can be inspirational for other projects building toward that same future. There are still many open issues I am planning on fixing over the coming weeks. Please get in touch if you do something cool with it — and thanks for reading!
¹ To edit files locally requires the File System Access API, which at the time of this writing is only available in Chrome.
² Weaving the Web provides an autobiographical account of the development of the World Wide Web by Tim Berners-Lee.
³ As luck would have it, GitHub very recently released fine-grained personal access tokens, so you can create a very narrowly scoped access token just for origin writeback of your webspace’s content, without exposing any risk to your other repositories.
⁴ At the time of this writing, fully implementing this packet filtering by writers is in progress, but the establishment of the chain-of-trust necessary to do it is implemented.
⁵ Peers also issue an encryption challenge to all peers over WebRTC for them to prove they actually control the private key for the public key they publish.
⁶ Using these techniques, someone should try to re-implement Netscape Composer or Microsoft Frontpage entirely in the browser.