3D Scanning 101

MOD Tech Labs
Sep 25, 2020 · 19 min read

This is a transcript taken from a series of lightning talks focused on modern content creation techniques during SIGGRAPH 2020. Enjoy!

ALEX: I’m Alex Porter, the CEO of MOD Tech Labs, and we wanted to give this talk about 3D scanning and give you an overview of what the applications are and what types of scanning, video, and photo capture capabilities are out there for capturing realistic content. We think that the capture part of content creation is wide open for folks to access, and we want to give away the keys to the kingdom on that side, because we think that there are far greater professionals out there than we can ever compete with — in regard to capture. There are many more high-end photographers, videographers, etc., who can access this tech.

ALEX: This is our third startup. We had a previous startup that this spun out of, called Underminer Studios, which included a very early iteration of this tech. We’ve really been working on this for three years. We were working in the B2B XR space, and the goal with this tool set that we started creating was to make realistic content more accessible.

Ultimately, what came out of that was this wonderful platform that we’ve created, that is much more open and broadly applicable than what we had previously. In the last four years, we’ve worked across entertainment, media, medical — you name it — and our focus has always been back-end tools creation. We want to enable other people to do their creative work and execute their vision easily with tools that are more affordable, work more effectively, and make their teams that much better.

We’re a venture-backed startup — based in Austin, Texas — and for the last three years, we’ve actually been recognized by Intel as top innovators, both Tim and myself. We are really enjoying working in a multitude of ways with these large corporations, but these days we’re angling a lot of our tech, and how we work with folks, toward smaller studios, because we think they’re the ones that have the most opportunity to implement this and change not only their bottom line, but their team dynamics, how they build their company, and how we can scale content creation across the entire market space. Last year, we were also awarded a City of Austin Innovation Award.

Now I’m going to let Tim introduce himself.

TIM: Yeah! I’m Tim Porter, CTO and Co-Founder of MOD Tech Labs. I’ve spent almost 20 years in the game and movie industry. Inside of video games, I was a technical artist — in movies, I was a pipeline technical director. The last movie I worked on was Alice in Wonderland: Through the Looking Glass. It was a lot of fun. I’ve worked at different places like Two Bit Circus, Gameloft, and RealNetworks… things like that.

A little bit about myself: I really like the concept of automation. I understand and pick up technology very quickly. But because of my position as a technical artist, I spent a lot of time with people who just didn’t necessarily have the same technological uptake and ability to access that technology. Even within our environment, there are a lot of companies that just cannot find technical people fast enough.

So, when we decided to create this, it was an equalizer. Like, how do we get to mid-tier and small studios and go, “Hey, here’s some new technology… here are automated tools, now how do we accelerate that?” And then for large studios, how do we keep up that quality look in this new real-time production environment, which involves scanning-heavy movies, and how do we make it to where we can continue doing that? So on one end, we’re trying to make VFX much more magical. Then on the other end, we’re trying to give accessibility — The Mandalorian in your backyard — and it’s definitely a possibility with what we’re doing.

ALEX: So, we’re going to give a brief overview of each of the types of capture, and as we go through, we’re going to do some use cases for each and the sort of topical understanding of where these are all applicable and how these can be accessed.

Much of our focus here is sharing the broad capture arena. I will say that some of the information that we’re sharing in regard to best practices is a little bit more specific to our processing solutions, but our processing solutions are really meant to be a very universal application.

So, we intake any kind of data — photogrammetry, scanning, volumetric video — and then output open file types: .fbx, .obj, etc. Our goal is to give you back assets that are implementation ready or can be polished for final implementation depending on the use case, the fidelity needs, and all of that stuff. We’re just excited to share with you. We’ve been in this space for nearly four years really digging into photogrammetry, scanning, and volumetric video, creating not only with our own rigs to test on, but ultimately creating this processing solution that can serve for a lot of people.

We’ll start with photogrammetry. Take it away, Tim.

TIM: Photogrammetry is something everybody in the industry has either played with, looked at, or used previously, but the biggest thing that you need to remember is that it’s using photos to capture spatial data. Using an aggregation of images, I can figure out — or assume — how far away a camera is from an object.

So what’s cool about that is, I can take these photos that go all the way around the subject, then fill in the points — the points of how far away the camera believes it is from the asset — and then you can end up making a three-dimensional mesh out of it. Of course, this is a very glossed-over overview of what that is.

You can do physical objects and environments. The “real” world is obviously what you’re trying to capture — you’re digitizing reality using cameras. You’re using this information, both photos and sets of electromagnetic imagery, to measure and interpret things. So you can then understand — using the aggregation of cameras and photos — what it looks like in three-dimensional space. Then you can view the asset from all different angles. So unlike using a single depth sensor or a single camera — where it points in one direction and you’re looking at it in two or two-and-a-half dimensions (if you add some depth information to it) — photogrammetry in concept gives you a fully three-dimensional object.
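For readers who want to see the core idea in code, here is a rough two-view sketch using OpenCV. The image paths and camera intrinsics are placeholder assumptions, and a real photogrammetry package layers bundle adjustment, dense reconstruction, and meshing on top of this, but the basic “how far away is the camera?” step looks roughly like this:

```python
import cv2
import numpy as np

# Two overlapping photos of the subject (placeholder file names).
img1 = cv2.imread("view_01.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view_02.jpg", cv2.IMREAD_GRAYSCALE)

# Approximate pinhole intrinsics; focal length and principal point are assumptions.
K = np.array([[2400.0, 0.0, 1920.0],
              [0.0, 2400.0, 1080.0],
              [0.0, 0.0, 1.0]])

# 1. Find and match features between the two views.
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)
matches = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.7 * n.distance]  # Lowe's ratio test

pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

# 2. Estimate the relative camera pose from the matched points.
E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
_, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)

# 3. Triangulate: intersect rays from the two camera positions to get sparse 3D points.
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])  # first camera at the origin
P2 = K @ np.hstack([R, t])                         # second camera, relative pose
pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
pts3d = (pts4d[:3] / pts4d[3]).T                   # homogeneous -> Euclidean

print(f"Recovered {len(pts3d)} sparse 3D points from two photos")
```

A full pipeline repeats this across dozens or hundreds of photos, refines all the camera poses together, and then densifies the sparse points into a textured mesh.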

ALEX: What I love thinking about is the origin of photogrammetry, which really came from around the 1900s. They started creating maps by going up in hot air balloons and drawing out or sketching out the area around them. Then they would travel a specific distance — say five miles — and would go up in the balloon again and draw the correlating objects that they could see. Then they would actually find those points of interest — that same oak tree that’s in the middle of those two distances — and could actually understand, topographically, what’s happening. So this is the advanced, digitized version of that. But it’s so fascinating to think of its origins and this being a technology and technique that has been used for so long.

TIM: Yeah, it’s a big game of trigonometry. If anybody likes making triangles, that’s what I do all day long!

ALEX: Yay, math! So, with regard to asset buildout, we talked about real-world objects. The goal here, with photogrammetry specifically, is working with still objects as the prime use and function. There are a lot of opportunities here in asset buildout for virtual production. There are many ways to capture a person’s body or frame — A-pose/T-pose — and then you’re bringing those in and creating more opportunity by rigging them, adding mocap, and all kinds of other things to add in more of that realistic movement. That’s typically the case here.

With scene buildout, there are a ton of folks out there using drone photography or point-and-shoot cameras to go and take scene captures. There are a massive number of interesting ways to use this, from previsualization all the way through to the final implementation of environments inside of things like major motion pictures, VR, you name it.

Then with character rigging — your FACS, those facial animation systems you’re looking at — you actually take that realism from the human face. And this is the most common way to do this now. Folks are getting into other techniques as well, but there’s a massive amount of function here that has been utilized across industries for realism. Anything to add, Tim?

TIM: I’d definitely say one of the biggest things to remember while you’re doing photogrammetry and working through these use cases is that there are ways to de-light any of your assets. A lot of people get themselves stuck, especially when they’re new to photogrammetry, and take an asset directly in with the lighting still baked into its textures. Unity has a de-lighter, and a lot of other tools can do that based off of normal mapping.
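As a toy illustration of that normal-map-based idea (Unity’s De-Lighting tool and production de-lighters are far more sophisticated), here is a sketch that assumes a single directional light and roughly uniform albedo: it fits a Lambertian shading term from the normal map and divides it out of the baked texture. The file names are placeholders.

```python
import cv2
import numpy as np

# A baked (lit) texture and its matching normal map (placeholder file names).
lit = cv2.imread("texture_lit.png").astype(np.float32) / 255.0
nrm = cv2.imread("normal_map.png").astype(np.float32) / 255.0

# Decode normals from [0, 1] color values into [-1, 1] vectors and normalize.
n = nrm * 2.0 - 1.0
n /= np.linalg.norm(n, axis=2, keepdims=True) + 1e-8

# Fit one directional light plus an ambient term to the luminance,
# assuming Lambertian shading and roughly uniform albedo.
lum = lit.mean(axis=2).reshape(-1)
A = np.hstack([n.reshape(-1, 3), np.ones((lum.size, 1))])
light, *_ = np.linalg.lstsq(A, lum, rcond=None)

# Reconstruct per-pixel shading and divide it out to approximate the unlit albedo.
shading = (A @ light).reshape(lit.shape[0], lit.shape[1], 1)
shading = np.clip(shading, 0.2, None)  # avoid blowing up dark regions
delit = np.clip(lit / shading, 0.0, 1.0)

cv2.imwrite("texture_delit.png", (delit * 255).astype(np.uint8))
```

Real de-lighters also handle shadows, bounce light, and varying albedo, but the principle is the same: use the geometry (normals) to explain the lighting, then remove it.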

The other thing is, if you haven’t done a whole bunch of work inside of game engines, a lot of the technology that you’re going to be walking into with photogrammetry — especially as you’re going into real-time production — really does lend itself toward spending some more time looking at what video game paradigms are.

You’re walking away from what is traditional waterfall-style production, and you’re going into more of an agile or cyclical methodology: the idea that I could shoot today and do my assets tomorrow, which is just insane. BUT, you can also shoot today, have the assets and a low-quality background version of them, and then have a higher-quality one so that it can be showcased.

It also doesn’t have to happen where it’s entirely separated — if you look at what they did with The Mandalorian, they said more than 50% of the shots were actually done in-camera. Now, I suspect they did things like sweetening it at the end with some color correction and maybe some grading… but what they said they did was: use the dome for lighting, have a background with a green screen, and then do the capture. Then the green screen was taken away with advanced machine learning algorithms and they were able to add these assets back in. So they can do all these changes and everything on-set.

And that can happen, here. It’s like a theatrical play. It’s no longer, “This goes to this… and this goes to this… and then it goes downstream… and then you end up rendering things and it comes out. But how does it come out? Who knows?” It can all interplay and you can do all of these things — now, tomorrow, yesterday, or all at the same time.

I think that’s kind of the biggest thing to remember here and the reason why photogrammetry is such a big thing. You know, you could even have a lot of fun with it and be the world’s biggest super-villain and take scans of absolutely every major artifact in the world and have them around you and they would look real because they were.

ALEX: Much less mystery… absolutely.

TIM: Oh no! I believe that those people would be full of intrigue and mystery. I don’t believe any of that here. :)

ALEX: With scanning, we’re still capturing spatial data — physical objects and environments — it’s really similar, but there are some other ways to capture versus using “standard” cameras, if you will.

So, this is looking at structured light, LiDAR, and lots of other miscellaneous scanners out there. There are a lot of benefits to this particular type, especially in regard to environments, as we see there with the drone in the photo (above). Drones can use photogrammetry or they can use scanners — it really just depends on their data capacity and what they’re doing with all the data they’re capturing. And then it depends on their functionality, right? A photogrammetry set is going to be very different in functionality than a LiDAR set.

TIM: Definitely. I think the biggest thing to remember is that new quadcopters are basically little planes, and so you can get even more out of them. I’ve seen some of our customers send through LiDAR scans with photogrammetry attached, all at the same time.

But then there is another way of doing it that’s a bunch easier: either having two drones, or doing plane LiDAR — which is actually not super expensive. You can run flights back and forth — find a crop duster guy — and you put a LiDAR unit on the bottom, and yeah… that’s how you do it if you want to do a Mayan temple or something like that. Or you could take that aerial LiDAR capture — which is black and white — and add, on top of that, photos from the ground. Then you combine all of that together and you get some really amazing things.

I think one of the biggest things that you want to remember in this situation is that there’s a little bit more interplay — a little bit more “Did we get that area?” — because it’s not just one scan, it’s multiple scans. Although LiDAR, on its own, does a really good job, as long as you’re not concerned with having colored point clouds. For those, you can capture RGBD depth using something like a RealSense camera or something similar, which does combine it all together, and then you end up getting a .bag file or something like that out of it. Then you can get some really good stuff. Obviously the quality is limited because it’s a technology that is more consumer-grade than professional-grade, but there are a lot of really good answers here.
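To give a flavor of that consumer-grade path, here is a sketch that reads one aligned color-and-depth frame from a recorded RealSense .bag file and turns it into a colored point cloud with Open3D. The file name is a placeholder, and these library APIs shift between versions, so treat it as a starting point rather than a finished tool.

```python
import numpy as np
import pyrealsense2 as rs
import open3d as o3d

# Play back a recorded RealSense capture (placeholder file name).
pipeline = rs.pipeline()
config = rs.config()
rs.config.enable_device_from_file(config, "capture.bag")
pipeline.start(config)

# Align the depth stream to the color stream so pixels correspond one-to-one.
align = rs.align(rs.stream.color)
frames = align.process(pipeline.wait_for_frames())
depth_frame = frames.get_depth_frame()
color_frame = frames.get_color_frame()

# Pull the intrinsics of the stream the depth was aligned to.
intr = depth_frame.profile.as_video_stream_profile().get_intrinsics()
pinhole = o3d.camera.PinholeCameraIntrinsic(
    intr.width, intr.height, intr.fx, intr.fy, intr.ppx, intr.ppy)

# Build an RGBD image and project it into a colored point cloud.
depth_img = o3d.geometry.Image(np.asanyarray(depth_frame.get_data()))
color_img = o3d.geometry.Image(np.asanyarray(color_frame.get_data()))
rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
    color_img, depth_img,
    depth_scale=1000.0,  # RealSense depth is usually stored in millimeters
    depth_trunc=3.0, convert_rgb_to_intensity=False)
pcd = o3d.geometry.PointCloud.create_from_rgbd_image(rgbd, pinhole)

o3d.io.write_point_cloud("frame.ply", pcd)
pipeline.stop()
```

Looping this over every frame in the recording gives you a colored point cloud sequence, which is the raw material that registration and fusion steps then stitch into a single asset.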

And there are a lot of even better answers coming, especially as we get better with things like view synthesis and with being able to combine LiDAR with photogrammetry. That combination has been around for a long time, but the ability of machine learning algorithms to sort out the two assets has gotten a lot better. So you don’t need a manual back-and-forth to get it going.

ALEX: As Tim mentioned, these sorts of examples are standalone scanning and photogrammetry examples, but the reality is there’s a lot of application, function, and amplification of the technologies when you combine them. We’ll talk about a couple of those going forward as well.

ALEX: So, we’re talking about your asset buildout, scene buildout, your character rigging again — very similar use cases. But ultimately, what you’re doing is creating even more depth, more information, and getting a lot more data to deal with. And overall it depends: do you have the money and the equipment to create scans? Or do you have the money and the equipment to create photogrammetry arrays? Are you focusing on capturing in-studio? Are you focusing on capturing outdoors? So there are a lot of different ways to slice and dice it to figure out the technology that actually serves best for the purpose you’re working toward.

TIM: Yeah, definitely. In each and every one of these instances, there are good answers. Are there great answers yet in this world? Not without a lot of manual intervention, which is what we’re working to change right now. Machine learning algorithms feeding back into LiDAR to increase the number of points it can capture is really going to help out, because LiDAR, at really high precision, is actually extremely slow.

So being able to fill in those places and still provide a smooth surface with sharp edges — combining that with photogrammetry — and doing the same exact concept where we fill in the places between the photos with view synthesis is really going to give us something great.

Something that we touched on a little bit was using depth data — RGBD and things like unstructured light. There are structured-light styles, things like the stuff from Occipital — the Structure Sensor, which uses a structured-light algorithm. Then there’s unstructured light, which is stuff like time-of-flight (ToF), which comes from Microsoft-style captures. And then RealSense, which is “a little column A and a little column B” — it depends on which generation you play with.

So, right now it’s kind of an arms race and it doesn’t appear that anyone’s coming out the victor — nor is anyone providing solutions that replace everything else. And you can get some amazing results out of photogrammetry alone that aren’t even possible with a combination of photogrammetry and scan. But it’s a “Do you have the time?” kind of thing — trying to, literally, get somebody going up on scaffolding to capture that castle in the middle… which would be quite entertaining, let alone months’ worth of work. So there’s always a trade-off.

ALEX: The other minor distinction here is that scanning is more effective for thin and ornate objects, intricate wardrobe pieces, those types of things. Photogrammetry is all about coverage. If you have a million photos of something, it will be great. If you missed a spot, then that’s where a lot of this sort of recreation of data comes in. But ultimately, scanning is a better use case for highly elaborate objects — wardrobe, people, and places — because there’s a lot more intricate detail in a castle versus a skyscraper.

ALEX: Last but not least, volumetric video. So, the goal of volumetric video is really to capture a person or object in motion. The incredibly functional part of volumetric video is capturing human faces. We have macro and micro expressions and things like flushing that are very easy for us to detect as humans, and trying to recreate that via other alternatives is really challenging.

So that’s when you get to that point where you’re watching that movie, and that CG character is grossing you out or freaking you out and you’re kind of weirded out by it. The “uncanny valley,” right? That’s kind of inevitable with traditional CG. And with the massive amount of VFX, humanoid, and virtual human work that is happening in the world (not just in movies anymore) across the spectrum of media, we’re really seeing that there’s a great opportunity for volumetric video to take that place and really help bring that humanistic side to characters.

We also think that it’s wonderful to drive facial animation, whether you’re overlaying with an animated character or whether you’re animating it for some other use case. It’s wonderful. And we think that this is really the future of capture. This is where it’s all going to angle toward. What do you have to add, Tim?

TIM: I have lots of things to add, as always!

So, I think the biggest thing, and the reason why Alex is talking about facial rigs, is that they’re a lot easier to control. In both instances, having a large rig like this (pictured above), where we’re showcasing 120 cameras and 26 computers, the technological need is extremely high. And the reason why we have so many computers on this has nothing to do with computational capability, but everything to do with getting the data back.

The amount of data that needs to transfer out of that many cameras is so high. We do — at 1920x1080 on all of those cameras — over 100 GB per minute. If you go up higher — 4K, 8K — you could easily burn through 10-second sequences that were two terabytes from very large camera arrays. It’s just insane.
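As a quick back-of-the-envelope check on those figures (the camera counts, frame rate, and compression ratio below are illustrative assumptions, not MOD’s actual pipeline numbers):

```python
# Rough data volumes for a multi-camera capture rig.
# Camera counts, frame rate, and compression ratio are illustrative assumptions.

def rig_data_gb(width, height, cameras, fps, seconds,
                bytes_per_pixel=3, compression=1.0):
    """Total captured data in gigabytes for a camera array."""
    frame_bytes = width * height * bytes_per_pixel
    return frame_bytes * fps * cameras * seconds / compression / 1e9

# 120 cameras at 1080p for one minute, assuming ~10:1 intra-frame compression.
print(f"1080p x 120 cams, 60 s: {rig_data_gb(1920, 1080, 120, 30, 60, compression=10):,.0f} GB")

# A hypothetical 8K array, uncompressed, for a 10-second take.
print(f"8K    x  70 cams, 10 s: {rig_data_gb(7680, 4320, 70, 30, 10):,.0f} GB")
```

With those assumptions, the 1080p rig lands around 134 GB per minute and the 8K take around 2 TB, which is why moving data off the cameras, rather than raw compute, becomes the bottleneck.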

And so, with volumetric video, if you reduce that down to just a person’s head, and combine it with the idea of adding motion capture and things like that, you can go back to a lot of really good technologies that are out there until the technology does catch up with it.

Volumetric video has a lot of really wonderful use cases as well. If you are trying to get background or gross movements — and if you have the space and the processing time — we’ve proved that you can save a whole bunch of money by doing it in volumetric video. The biggest thing is that price of entry. Doing 360° is higher quality, but the cost is exponential, and with volumetric it’s the same. Photogrammetry can be a small DSLR rig — while volumetric video is a very large DSLR rig that needs a massive amount of precision. Or you can use — in this specific case (pictured above) — things like webcams or anything else in between.

So, I love the technology. I think it’s great, and I definitely think that it is the future of video — it is fully three-dimensional. And technology, right now, is moving in leaps and bounds to meet the demand for what volumetric video can provide — because two-dimensional video provides so little information, let’s really be honest about that.

TIM: And it’s not just about actors and cameras and things like that. It provides the ability for the world to be you and you to be in the world and everything to match all together. So I really believe that as we continue feeding this technology — as you look at the character over on the left (pictured above), he’s as real as “real” gets, and that will continue to happen and we will continue to make those things.

ALEX: So our technology on the volumetric video side does not require a green screen, and the goal there is to just drop that bar — you don’t have to have an entire green screen studio available to you to create volumetric video. You don’t. We have tested everything from webcam rigs all the way up through very high-end rigs with DSLRs. We’ve done all kinds of processing for a wide variety of clients from their rigs, and often we don’t even know what the rig is, in and of itself. We’ll get some data that gives us some of the basic information, but the reality here is that if you have capture capability, then there’s no reason you can’t extend it to volumetric video if you’re not already doing it.

It’s great for digital doubles, facial animation, and as we mentioned earlier, perfect for the combination of technologies. You can capture a bust and a face and get that realism from the human. And then you combine that with photogrammetry: you do a photogrammetric capture of a body in an A-pose or T-pose and rig that. Then you can create a mocap process with that. There are a million things that you can do with it. Not only does that minimize the physical footprint for the rig space that you need, but it also minimizes your digital footprint, because the asset can be significantly smaller; volumetric video is a “hog”… it just is.

There’s a lot of functionality within the shoot itself. So drone matching without GPS — we’ve actually done that with a couple of clients. Match-moving crane shots, group and remote-location shots, stunt shots — all of these things can be improved, safety-wise and time-wise, and ultimately it gives back a lot of the creative freedom that the director has, because you have a full 3D person or object that you can manipulate the scene around. Or you can manipulate them in the scene, vice versa, and it’s going to minimize or just get rid of reshoots, massively. There are a lot of wonderful ways to work volumetric video into content creation for movies, games, you name it… really anything.

TIM: To give you an example, one movie that I worked on had a main actor, and we did a full digital double of that main actor. The full digital double was originally slated for a minute and a half and had a whole bunch of different things that they wanted to do with it. So, it was a full digital double. Hands, fingers, face, full FACS — you name it, all the way across the board. And an asset like that costs multiple millions of dollars — not just the capture, but the processing and putting it together.

Then it was pushed down to about 15 seconds, and that was literally just because it was a dangerous capture and they wanted to do a switch-out between the main actor and the stunt double. They ended up just using the body pose because they were like, “Oh, we have this digital double, we have to do something with it.”

So, they ended up putting it in, when, if they had just done it with volumetric video, they could have gotten it through for under $30,000 versus multiple millions of dollars. It just shows that there is a new style of technology and that there are really good use cases for it.

ALEX: We have another talk about best practices for capture, where we’ll get a little bit more in-depth about some of the strengths and weaknesses of capture in this style. We also have a capture guide that’s available on our website. You can scroll to the bottom of our homepage, plug in your info, and it will give you the PDF right there. And we have an Intel article that we published, which is a very technical look at the broad array of capture technologies, specific to volumetric video and photogrammetry.

You’re also welcome to reach out to us. We love talking to people about these awesome, cool technologies and helping enable others to do more. Email alex@modtechlabs or tim@modtechlabs. We also have a code, MODtalks, that you can use for $500 in free processing credits after you register through our website.

TIM: Thank you all very much and I definitely look forward to hearing and seeing more from each and every one of you.


MOD Tech Labs

Enabling production studios to bring immersive video content to life with fast and affordable SaaS processing. Learn more by visiting www.modtechlabs.com