Shaders
To understand how we pulled this off, let’s start simple. Computers that can support dynamic lighting (about 97% of the computers running ROBLOX) do so by using the shader support built into their graphics card. So our first question was, “Can we use the graphics hardware in iPhones and iPads to run shaders?” Lucky for us, iPads, and iPhones from the iPhone 4 on, have shader support baked in.
A shader is a program that tells your computer how to “draw” things in 3D space. On PC, we can choose for each individual object whether it is drawn using a shader (for example, we do not use a shader to draw UI elements and skyboxes). On iOS it’s all or nothing: either everything you draw goes through a shader, or nothing does. To get our dynamic lighting system to play nice with iOS, we had to make sure that everything you can possibly draw or create on ROBLOX has a version with shaders enabled. This took a while.
Performance
iPad and iPhone hardware is significantly slower than your average personal computer, which was another development challenge. We worked around this by making our code faster (more on this later), and by carefully reducing the frequency of lighting updates for areas that are outside of your immediate point of view. This is a tradeoff between lighting lag (i.e. outdated lighting information) and gameplay lag (i.e. the inability to move the camera smoothly). If a place with a lot of moving lights begins slowing or lagging, we lower the frequency of lighting updates to speed the game up.
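To make the tradeoff concrete, here is a minimal sketch of how an update interval could be picked for one region of the world. This is not our actual scheduler: the function name `lightingUpdateInterval`, the 33 ms frame budget, and the starting intervals are all assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical tuning values; the real numbers are not given in this post.
constexpr double kTargetFrameMs = 33.0;  // roughly 30 frames per second
constexpr int kMaxInterval = 16;         // never wait longer than 16 frames

// Returns how many frames to wait between lighting updates for one region.
//   visible: true if the region is inside the camera's view
//   frameMs: how long the previous frame took, in milliseconds
int lightingUpdateInterval(bool visible, double frameMs) {
    assert(frameMs >= 0.0);
    // Regions in view refresh every frame; regions out of view start at every 4th.
    int interval = visible ? 1 : 4;
    // Each time the frame budget is exceeded, trade lighting freshness for speed.
    if (frameMs > kTargetFrameMs) interval *= 2;
    if (frameMs > 2.0 * kTargetFrameMs) interval *= 2;
    return std::min(interval, kMaxInterval);
}
```

With these assumed numbers, a visible region on a fast frame updates every frame, while an out-of-view region during a very slow frame waits the full 16 frames. The lighting there is briefly out of date, but the camera keeps moving.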
Leveraging CPU Architecture
Rendering lighting is resource intensive: we have to compute complex lighting interactions with a high number of voxels at an interactive framerate. To do this, we use many tricks that help keep our lighting system fast, and some of these tricks rely on the specific CPU architecture.
A very powerful and commonly used tool for developers is SIMD (single instruction, multiple data). On a basic level, this tool allows you to do several arithmetic operations in parallel without using multiple cores. Instead of having your processor add one pair of numbers together, this tool allows it to add up to 16 pairs of numbers together, at roughly the same cost!
However, nothing comes free: not every algorithm out there can use SIMD effectively. In addition, writing SIMD code is hacky and time consuming. This is why we only apply SIMD instructions to code that executes often enough for the optimization to make a huge difference performance-wise. Prior to the mobile update, we had optimized most of the lighting code using the SSE2 instruction set.
SSE2 can only be used on x86 processors (the Intel and AMD chips found in your average PC), and the processors in iPads and iPhones are different. iOS devices run on the ARM architecture, which has an entirely different, and incompatible, instruction set for SIMD called NEON. In order to fully leverage the processing power available in the portable devices, we had to rewrite the lighting code to use NEON instructions where appropriate.
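To give a feel for what this kind of port involves, here is a small illustrative sketch (not our actual lighting code) of one operation written three ways: with SSE2 intrinsics, with NEON intrinsics, and as a plain loop. The function name `addLight16` is hypothetical, and the choice of a saturating 8-bit add (where results clamp at 255 instead of wrapping around) is an assumption made for the example. The compiler macros `__SSE2__` and `__ARM_NEON` pick the right path at build time.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__SSE2__)
  #include <emmintrin.h>   // SSE2 intrinsics (x86)
#elif defined(__ARM_NEON)
  #include <arm_neon.h>    // NEON intrinsics (ARM)
#endif

// Adds 16 pairs of 8-bit values, clamping each result at 255.
// Hypothetical helper for illustration only.
void addLight16(const std::uint8_t* a, const std::uint8_t* b, std::uint8_t* out) {
    assert(a && b && out);
#if defined(__SSE2__)
    // One 128-bit register holds all 16 bytes; one instruction adds them.
    __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), _mm_adds_epu8(va, vb));
#elif defined(__ARM_NEON)
    // Same idea on ARM, but with entirely different types and function names.
    vst1q_u8(out, vqaddq_u8(vld1q_u8(a), vld1q_u8(b)));
#else
    // Portable fallback: sixteen separate additions.
    for (int i = 0; i < 16; ++i) {
        unsigned sum = static_cast<unsigned>(a[i]) + static_cast<unsigned>(b[i]);
        out[i] = static_cast<std::uint8_t>(sum > 255u ? 255u : sum);
    }
#endif
}
```

All three paths produce the same result, but the two SIMD paths share no code. Even for a three-line function the port is a rewrite, which is why moving a whole lighting system from SSE2 to NEON takes real effort.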
It takes quite a bit of time to optimize code using SIMD, and not every algorithm fits the restrictions that these powerful instructions have, but the gains make it worth the extra work. In the case of lighting, we were able to get our system running 4x faster as a whole, with some specific areas experiencing as much as a 10x speed up. We were able to achieve this because we carefully optimized the code to use the SIMD instructions that ARM provides.