Intro to AR Foundation

Code and workflows to integrate AR Foundation into your desktop build

What you'll develop in the AR Foundation section

GitHub branch link:

Functionalities included

  • Setting up AR Foundation + ARKit plug-in

  • Adding AR Foundation GameObjects to game

    • AR Session Origin

    • AR Session

  • Dynamically checking if deployed to AR platform and disabling AR functionality if not

    • Creating AR-specific systems that only run when AR enabled

  • Grabbing Pose Driver value and providing it to ECS to move player

    • Updating input response systems for updated movement controls

  • Pulling ECS data spawn position and updating Pose Driver to move location of AR Camera to behind player

    • Use ARSessionOrigin.MakeContentAppearAt() to update origin of AR session based on game play

  • Dynamically updating UI for AR instructions when deployed to AR platform

  • Updating PanelSettings to be responsive to both desktop and mobile platforms
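As a rough illustration of the MakeContentAppearAt() item in the list above, here is a minimal sketch. The class name ARContentPlacer, its serialized fields, and the PlaceContentAt method are hypothetical names chosen for this example; only ARSessionOrigin.MakeContentAppearAt() comes from AR Foundation. It assumes an AR Foundation 4.x project where ARSessionOrigin is the origin component.

```csharp
using UnityEngine;
using UnityEngine.XR.ARFoundation;

// Hypothetical helper: shifts the AR session origin so that `content`
// appears at a chosen position and rotation.
public class ARContentPlacer : MonoBehaviour
{
    [SerializeField] ARSessionOrigin sessionOrigin; // assign in the Inspector
    [SerializeField] Transform content;             // root of the game content

    public void PlaceContentAt(Vector3 position, Quaternion rotation)
    {
        // MakeContentAppearAt does not move `content` itself. It adjusts the
        // ARSessionOrigin's transform so the content appears at the target pose.
        sessionOrigin.MakeContentAppearAt(content, position, rotation);
    }
}
```

Because the call moves the session origin rather than the content, the game world's own coordinates stay untouched, which is convenient when those coordinates are owned by ECS.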

A macOS development machine is required to deploy to iOS

Unity cannot deploy an iOS app directly. Instead, Unity generates an Xcode project, which Xcode (Apple's integrated development environment for macOS, iOS, etc.) then compiles and deploys to the device.

A little bit about Augmented Reality (AR)

How AR technology works

Rather than publishing yet another explanation of AR on the internet ourselves (there are probably enough of those), we think it's better to just direct you to what we think is arguably the best explanation of how AR works in commodity hardware.

Matt Miesnieks wrote a post on Medium during his time as the Founder of (company acquired by Niantic) to describe Apple's then-new release of ARKit. If you are interested in building AR applications and want to learn more about the technology, please read the full blog post here:

What technology is ARKit built on?

"Technically ARKit is a Visual Inertial Odometry (VIO) system, with some simple 2D plane detection. VIO means that the software tracks your position in space (your 6dof pose) in real-time i.e. your pose is recalculated in-between every frame refresh on your display, about 30 or more times a second. These calculations are done twice, in parallel. Your pose is tracked via the Visual (camera) system, by matching a point in the real world to a pixel on the camera sensor each frame. Your pose is also tracked by the Inertial system (your accelerometer & gyroscope — together referred to as the Inertial Measurement Unit or IMU). The output of both of those systems are then combined via a Kalman Filter which determines which of the two systems is providing the best estimate of your “real” position (referred to as Ground Truth) and publishes that pose update via the ARKit SDK. Just like your odometer in your car tracks the distance the car has traveled, the VIO system tracks the distance that your iPhone has traveled in 6D space. 6D means 3D of xyz motion (translation), plus 3D of pitch/yaw/roll (rotation).

The big advantage that VIO brings is that IMU readings are made about 1000 times a second and are based on acceleration (user motion). Dead Reckoning is used to measure device movement in between IMU readings. Dead Reckoning is pretty much a guess (!) just like if I asked you to take a step and guess how many inches that step was, you’d be using dead reckoning to estimate the distance. I’ll cover later how that guess is made highly accurate. Errors in the inertial system accumulate over time, so the more time between IMU frames or the longer the Inertial system goes without getting a “reset” from the Visual System the more the tracking will drift away from Ground Truth.

Visual / Optical measurements are made at the camera frame rate, so usually 30fps, and are based on distance (changes of the scene in between frames). Optical systems usually accumulate errors over distance (and time to a lesser extent), so the further you travel, the larger the error."

-From Matt Miesnieks's "Why is ARKit better than the alternatives?"
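To make the drift idea in the quote concrete, here is a deliberately simplified one-dimensional sketch. It is not ARKit's implementation: the quote describes a Kalman filter, while this example uses a fixed blend factor, and it pretends the visual system reports the true position and velocity. The class name, the bias value, and the blend factor are all assumptions made up for illustration; only the 1000 Hz and 30 Hz rates come from the quote.

```csharp
using System;

// Simplified 1D illustration of inertial drift and visual correction.
// NOT ARKit's algorithm: a fixed-gain blend stands in for the Kalman filter.
public static class VioDriftSketch
{
    // Simulates a device at rest whose accelerometer has a small constant bias.
    // Returns the absolute position error (in meters) after `seconds`.
    public static double Simulate(double seconds, bool useVisualCorrection)
    {
        const double imuRate = 1000.0;   // IMU readings per second (from the quote)
        const double cameraRate = 30.0;  // camera frames per second (from the quote)
        const double accelBias = 0.05;   // m/s^2, hypothetical sensor bias
        const double blend = 0.5;        // hypothetical weight given to the visual fix

        double dt = 1.0 / imuRate;
        double position = 0.0;
        double velocity = 0.0;
        double nextCameraTime = 1.0 / cameraRate;

        int steps = (int)Math.Round(seconds * imuRate);
        for (int i = 1; i <= steps; i++)
        {
            double t = i * dt;

            // Dead reckoning: integrate the (biased) acceleration twice.
            velocity += accelBias * dt;
            position += velocity * dt;

            // At each camera frame, pull the estimate toward the visual
            // measurement. The device is at rest, so the truth is zero.
            if (useVisualCorrection && t >= nextCameraTime)
            {
                position += blend * (0.0 - position);
                velocity += blend * (0.0 - velocity);
                nextCameraTime += 1.0 / cameraRate;
            }
        }

        return Math.Abs(position);
    }
}
```

Calling Simulate(10.0, false) and Simulate(10.0, true) shows the pattern the quote describes: without the visual "reset" the error keeps growing with time, while periodic corrections keep it bounded.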

The core problem that must be solved when creating AR experiences is for the device to know exactly where it is in 6 degrees of freedom (translation + rotation). The device uses both visual measurements (camera) and inertial measurements (accelerometer/gyroscope) to figure out where it is in three-dimensional space.

The translation + rotation (6 degrees of freedom) of a device is also referred to as its "pose" in AR.

If you know where the device is (its pose), rendering a virtual object so that it appears fixed in physical space becomes easier. Just as moving a Unity camera to the right in a game causes the rendering engine to draw objects from a different perspective, moving the device in the physical world changes the perspective from which virtual objects are drawn.

AR Platforms

Now that we've learned a little more about how AR works (a highly accurate pose is determined from the device's sensor data), let's take a look at the platforms where you can build AR applications.

Currently, the leading players in the AR space are:

  • Apple's iOS (ARKit)

  • Google's Android (ARCore)

  • Microsoft's HoloLens (Mixed Reality Toolkit)

  • Magic Leap (Lumin)

There are some very interesting libraries to build web-based AR experiences like AR.js, which can run on a phone browser, but these web libraries are limited.

ARKit, ARCore, Mixed Reality Toolkit, and Lumin each provide special APIs to their specific device hardware to help developers create AR experiences.

A lot of these APIs are similar in that they provide the developer with similar AR data, like providing the "pose" of the device. Each hardware manufacturer has their own version of the "pose" API.

Wouldn't it be nice if there was one single way to communicate with all of the leading AR platforms without having to build 4 different implementations...? Introducing: Unity AR Foundation.

How Unity handles AR

Unity Mixed Reality ("XR") Tech Stack

We have been working to improve our multi-platform offering, enabling direct integrations through a unified plugin framework. The resulting tech stack consists of an API that exposes common functionalities across our supported platforms in a frictionless way for creators while enabling XR hardware and software providers to develop their own Unity plugins. This architecture offers the following benefits:

  • Multi-platform developer tools such as AR Foundation and the XR Interaction Toolkit

  • Faster partner updates from supported plugins via the Unity Package Manager

  • More platforms have access to an interface to leverage Unity’s XR rendering optimizations and developer tools

-From Unity's XR platform updates

AR Foundation

Unity's AR Foundation is an API that sits on top of all the major hardware AR SDKs mentioned earlier.

When "pose" is requested from the AR Foundation layer at runtime, AR Foundation automatically routes that request to the appropriate platform implementation.

AR Foundation does not implement any of these AR features itself; it is a translation layer. AR Foundation calls a platform-specific "plug-in" to get the necessary data from the hardware. So adding the "AR Foundation" package to our Unity project is not enough; we must also add the plug-in package for each AR platform we target. For this section, that additional package is the ARKit XR Plugin, which wraps Apple's ARKit.

It is important to note that not all plug-ins are made equal. Not all AR Foundation functionalities are available across all plug-ins. For example, both ARKit and ARCore now provide access to a depth API, but the HoloLens does not provide this data.

The AR Foundation documentation tracks, per platform plug-in, support for features including:

  • Device tracking

  • Plane tracking

  • Point clouds

  • Light estimation

  • Environment probes

  • Face tracking

  • 2D Image tracking

  • 3D Object tracking

  • 2D & 3D body tracking

  • Collaborative participants

  • Human segmentation

  • Pass-through video

  • Session management

Our Approach

In MainScene, we will check whether the app is running on an AR-capable platform. If it is, we will create an IsARPlayerComponent singleton.
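One way the check described above could look is sketched here. This is an assumption about the approach, not the code from the code-alongs: the class name ARAvailabilityCheck and the empty layout of IsARPlayerComponent are hypothetical, and the sketch assumes the ARSession component starts disabled in the scene and that a default ECS world exists. ARSession.CheckAvailability(), ARSession.state, and ARSessionState are AR Foundation APIs.

```csharp
using System.Collections;
using Unity.Entities;
using UnityEngine;
using UnityEngine.XR.ARFoundation;

// Assumed tag component; declared here only so the sketch compiles on its own.
public struct IsARPlayerComponent : IComponentData { }

// Hypothetical MonoBehaviour placed in MainScene.
public class ARAvailabilityCheck : MonoBehaviour
{
    [SerializeField] ARSession arSession; // assumed to start disabled

    IEnumerator Start()
    {
        // Ask the platform whether AR is available, if that is not known yet.
        if (ARSession.state == ARSessionState.None ||
            ARSession.state == ARSessionState.CheckingAvailability)
        {
            yield return ARSession.CheckAvailability();
        }

        if (ARSession.state == ARSessionState.Unsupported)
        {
            // Not an AR platform (for example, a desktop build): leave AR off.
            arSession.enabled = false;
            yield break;
        }

        // AR is available: start the session and flag this player as an AR player.
        arSession.enabled = true;
        EntityManager entityManager = World.DefaultGameObjectInjectionWorld.EntityManager;
        entityManager.CreateEntity(typeof(IsARPlayerComponent));
    }
}
```

Creating exactly one entity with the tag component is what makes it usable as a singleton by systems later on.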

We will then use RequireSingletonForUpdate<IsARPlayerComponent> for our AR-specific systems.
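A minimal sketch of such a system follows. The system name ARExampleSystem is hypothetical, and the tag component is redeclared only to keep the snippet self-contained (remove the duplicate if it already exists in your project). It assumes an Entities version that provides SystemBase and RequireSingletonForUpdate<T>().

```csharp
using Unity.Entities;

// Assumed tag component; declared here only so the sketch compiles on its own.
public struct IsARPlayerComponent : IComponentData { }

// Hypothetical AR-only system. `partial` is harmless on older Entities
// versions and expected by newer ones.
public partial class ARExampleSystem : SystemBase
{
    protected override void OnCreate()
    {
        // OnUpdate is skipped until exactly one IsARPlayerComponent exists,
        // so this system stays idle on non-AR platforms.
        RequireSingletonForUpdate<IsARPlayerComponent>();
    }

    protected override void OnUpdate()
    {
        // AR-specific work goes here.
    }
}
```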

We will create a new InputSystem for AR called ARInputSystem that takes in screen taps and translates them to "shoot" commands. We will also update our PlayerCommand to take in the AR pose. So our ARInputSystem will send "shoot" data from screen taps and an updated position from the "pose" that ARKit provides.
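The shape of that input system might resemble the sketch below. Everything project-specific here is an assumption: the field layout of ARPlayerPoseComponent, the ARInputSnapshot struct (a stand-in for the values that would be copied into PlayerCommand, whose real fields are not shown in this section), and the use of Unity's legacy Input class for touch. The system is named ARInputSketchSystem to avoid implying it is the actual ARInputSystem.

```csharp
using Unity.Entities;
using Unity.Mathematics;
using UnityEngine;

// Assumed components; declared here only so the sketch compiles on its own.
public struct IsARPlayerComponent : IComponentData { }

public struct ARPlayerPoseComponent : IComponentData
{
    public float3 Position;
    public quaternion Rotation;
}

// Hypothetical stand-in for the data that would go into PlayerCommand.
public struct ARInputSnapshot
{
    public bool Shoot;
    public float3 Position;
    public quaternion Rotation;
}

public partial class ARInputSketchSystem : SystemBase
{
    // Most recent input sample, exposed so other code can read it.
    public ARInputSnapshot Latest { get; private set; }

    protected override void OnCreate()
    {
        // Only run on AR platforms, and only once a pose has been published.
        RequireSingletonForUpdate<IsARPlayerComponent>();
        RequireSingletonForUpdate<ARPlayerPoseComponent>();
    }

    protected override void OnUpdate()
    {
        // A new touch this frame counts as a "shoot" tap (legacy Input API).
        bool tapped = Input.touchCount > 0 &&
                      Input.GetTouch(0).phase == TouchPhase.Began;

        ARPlayerPoseComponent pose = GetSingleton<ARPlayerPoseComponent>();

        Latest = new ARInputSnapshot
        {
            Shoot = tapped,
            Position = pose.Position,
            Rotation = pose.Rotation
        };
    }
}
```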

AR Foundation is written using MonoBehaviours, so we will create an ARPlayerPoseComponent and update it from a MonoBehaviour's Update(). The MonoBehaviour will use the EntityManager to update our ARPlayerPoseComponent, and our ARInputSystem will pull this data and add it to our PlayerCommand.
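The MonoBehaviour side of that bridge could be sketched as follows. The class name ARPoseBridge, the component's field layout, and the choice to read the AR camera's local pose (the values a pose driver writes relative to the session origin) are assumptions for illustration; the component is redeclared only to keep the snippet self-contained.

```csharp
using Unity.Entities;
using Unity.Mathematics;
using UnityEngine;

// Assumed field layout; declared here only so the sketch compiles on its own.
public struct ARPlayerPoseComponent : IComponentData
{
    public float3 Position;
    public quaternion Rotation;
}

// Hypothetical bridge that copies the AR camera pose into ECS every frame.
public class ARPoseBridge : MonoBehaviour
{
    [SerializeField] Transform arCamera; // the camera moved by the pose driver

    EntityManager entityManager;
    Entity poseEntity;

    void Start()
    {
        entityManager = World.DefaultGameObjectInjectionWorld.EntityManager;
        // A single entity holding the pose, so systems can read it as a singleton.
        poseEntity = entityManager.CreateEntity(typeof(ARPlayerPoseComponent));
    }

    void Update()
    {
        // Vector3 and Quaternion convert implicitly to float3 and quaternion.
        entityManager.SetComponentData(poseEntity, new ARPlayerPoseComponent
        {
            Position = arCamera.localPosition,
            Rotation = arCamera.localRotation
        });
    }
}
```

Writing from Update() keeps the ECS copy at most one frame behind the tracked pose, which is the trade-off of bridging MonoBehaviour data into systems this way.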

Unity resources for AR Foundation

Unity documentation for AR Foundation 4.2.3: Refer to this for more information.

Unity documentation for ARKit XR Plugin 4.1.3: Refer to this for more information.

To be best prepared for the code-alongs
