Explaining How Gran Turismo 7's 'Sophy' AI Actually Works
Sony AI's new R&D Success Becomes a Racing Grandmaster
AI and Games is a YouTube series made possible thanks to crowdfunding on Patreon, as well as paid subscriptions right here on Substack.
Support the show to have your name in video credits, contribute to future episode topics, watch content in early access and receive exclusive supporters-only content.
Gran Turismo is the ultimate racing experience on the PlayStation platform, allowing racing enthusiasts to tweak their cars to automotive perfection, all the while staring at the shiny high-definition graphics. The most recent entry, Gran Turismo 7, came to the PlayStation 5 in early 2022 and introduced a suite of new features, with fantasy tracks, new cars, improved customisation, real-time ray tracing, and - most importantly for us - GT Sophy: a new AI driver capable of taking on even the very best of human racers in competition.
But this hasn't just emerged out of thin air. Far from it. GT Sophy is actually the result of AI research conducted by Sony's new AI division, and this new R&D team has been collaborating with academic researchers for the last couple of years to make it all happen. Let's take a look at how Sophy works and the steps taken to make it a reality: including all of the quirks, eccentricities and limitations that define Gran Turismo 7's brand-new AI racer.
What is Sony AI?
In order to tell the story of GT Sophy, we need to talk about where it comes from. Back in April 2020, Sony AI was founded as a division within the corporation dedicated to the research and development of AI technologies that could impact its portfolio of products, but with a particular focus on entertainment. While this could mean it impacts the likes of their music, television or movie labels, it unsurprisingly is going to influence their PlayStation division as well. Sony AI currently operates out of two main labs, based in Zurich and Tokyo.
While it's a big deal that Sony has created this R&D division, the recent successes and innovations found in deep learning have led many videogame studios and publishers to create their own AI research teams. Sony AI sits alongside the likes of Ubisoft's La Forge and Electronic Arts' SEED as research labs dedicated to supporting future game productions using groundbreaking AI techniques. Meanwhile, the likes of Xbox rely on Microsoft Research, which has existed within the corporation since the early 1990s.
Much like all of these other R&D labs, the research coming out of Sony AI is often a collaboration with university researchers, given they can tap into the expertise of teams all around the world who work in relevant disciplines. For academic researchers, this isn't just about being able to solve problems that impact game development, or the prospect of having their name appear in a Sony videogame; it's also about the potential for solutions to these types of problems to have an impact in areas outside of games.
One of the best examples of this is in fact racing games. As you'll no doubt be aware, research in autonomous vehicles, meaning the ability of AI to assume control of cars and other machines, is a big area of research and development at the moment. Unsurprisingly, AI for real-world driving is incredibly difficult, and even the very best autonomous drivers still suffer in many circumstances, to the point that drivers and pedestrians have been hurt and, in some instances, even killed. This is because the challenges that emerge when driving a car come in a myriad of forms, and once let loose in the real world, a lot of AI methods will struggle to find the right answer for that situation at that moment.
This isn't to say that solving AI for Gran Turismo would mean Teslas could now drive the Nürburgring in record time, but rather that there's still a lot of value in researching how to solve these kinds of problems in games, given they're not just a safe space within which to conduct this kind of work, but also some of the most accurate simulations available to work with. Gran Turismo, like many of its competitors, strives to create a physics simulation that embodies the challenge of motorsports: capturing several thousand parameters that can influence the performance of a car both internally and externally.
The physics system in Gran Turismo is so realistic that it has subsequently been approved by the governing body of real-world motorsports: the Fédération Internationale de l'Automobile, or FIA. This led to Gran Turismo Sport on the PlayStation 4 being one of the official games of the inaugural Olympic Virtual Series in 2021, in which players from around the world raced to be crowned champion of the Motorsport Event. The winner was Valerio Gallo of Italy, who grabbed a nifty Olympic award to go along with it.
What Does Sophy Try To Do?
Sophy is a deep learning AI system that was trained to learn how to race in Gran Turismo at peak performance: ensuring it clears tracks as fast as it possibly can, all the while operating each car at its limit. It's important to recognise that racing is essentially a complicated battle with the physics of a car that is hurtling around corners at incredibly high speeds, with frictional forces being applied to the body and tyres of a car at different levels. And it's a system that is in constant flux. Sophy's job is to try and manage all of that information and figure out how to follow the road.
But it's not as straightforward as driving as fast as possible. In order for Sophy to work, it has to be fed information not just about the race conditions, but the expectations of the designers working at Gran Turismo's studio Polyphony Digital. Given the franchise's focus on replicating the culture of Motorsport, Sophy was trained to encapsulate four distinct elements of player skill:
The first and most obvious is what we just talked about, the control of the vehicle itself: understanding how to balance the dynamics of a vehicle as it's flying around the track at high speed.
The second is tactics: motorsport requires a driver to know how to pass an opponent that is up ahead but also block other racers from trying to do the same to you.
These first two elements, as we'll see shortly, were the initial focus of the training process in building GT Sophy. The challenge is how you feed this information to a deep-learning AI so it can process and learn from it. And while these are very complicated problems, they are still quite attainable, because the information required is largely established: you can see the track, you can see the other cars, you can feel how the car is handling, and from there, you can figure out how to act. But this then leads to even bigger challenges.
The third element of motorsport is strategy. You need to get an idea of how your opponent drives, and based on what you observe, figure out how to counteract it using the tactics Sophy has already learned. This means you're often dealing with unknowns: you don't have an intimate knowledge of how other racers drive, so you base it on experience, on what you've seen happen before when racing in similar conditions.
The fourth and final element is the idea of racing etiquette. Motorsport racing, both in real life and in games, is meant to be conducted safely and professionally. There's an expectation that you race cleanly, not cutting corners, avoiding colliding with other cars, or even pushing them off the track. This is, for obvious reasons, super important to the real-world Motorsport community to ensure safety, and as such racers are given penalties in the event they break the rules. The same principle is then applied to Gran Turismo as well. But as we'll see in a moment, the rules of racing etiquette are hard to formally define, meaning it's actually quite hard to turn that into guidelines for how an AI car should drive without making it too cautious.
It’s worth mentioning that this isn't the only time we've talked about racing games on AI and Games. I have previously talked about the Drivatar AI system that exists in both the Forza Motorsport and Forza Horizon games. It's worth taking a moment to acknowledge that Gran Turismo's Sophy and Forza's Drivatar are not the same thing. Drivatars are AI drivers that learn how to race by using existing data captured from human players in-game. This means that the resulting driver will learn to race, but it's doing so in a way that is meant to approximate how you would drive in any given situation, complete with all your flaws and eccentricities. Sophy, on the other hand, is learning how to race entirely on its own. In fact, the closest equivalent to Sophy is actually in a different kind of motorsport: since 2019 the MotoGP series developed by Milestone has been creating AI racers using deep learning techniques. This system, known as ANNA, does much the same as Sophy by learning to race by feeding in relevant game information and from that learning how to race bikes in a variety of conditions.
And with that, it's time to go a little deeper and explain exactly how Sophy works, how Sony AI trained it, and the steps taken along the way in order for it to work as envisaged.
Sophy's 'Brain'
Gran Turismo Sophy is trained using a process known as 'mixed-scenario, model-free deep reinforcement learning using quantile regression soft actor-critic'. Okay, that's a lot of words, so let's unpack this a bit.
Deep Reinforcement Learning combines two ideas: deep learning, which means training a large and complex artificial neural network to solve a problem, and reinforcement learning, which means the system learns through experience by trying to play Gran Turismo and figuring out which things it does are right and wrong. In this instance, Sony AI actually designed a new variant of an existing RL algorithm, called Quantile Regression Soft Actor-Critic or QR-SAC, to train Sophy.
As stated, during development Sophy actually plays the game directly and learns from that experience, rather than by first building an internal predictive model of the game's physics to plan with; this is what is meant by model-free learning. But it cannot be left on its own to simply figure it out by itself. Unlike AI such as Google DeepMind's AlphaStar, which learned to play StarCraft 2 largely by playing against itself - a topic I covered previously - GT Sophy needed to get a stronger understanding of how humans would play in a variety of different situations. Hence, it uses a process of mixed-scenario training, where it learns from a number of competitive and cooperative situations, often with some of the racing scenarios handcrafted by expert Gran Turismo designers and players.
Sophy's 'brain' is a neural network that is fed a whole bunch of information about the race track. The neural network acts as what is known as the 'Policy': for any situation that arises in the game, it tells Sophy what action to take. Now at first, this Policy is going to be pretty dumb, and it's going to make a lot of mistakes. Once Sophy begins to find out which actions are good or bad to take in the game, the Policy, the network itself, gets updated (typically by changing internal weight values in the network). And that's how it learns: the more it plays the game, the more it remembers good strategies and discards bad ones. We'll talk about how the learning takes place in a minute.
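To make the idea of a 'Policy' a little more concrete, here's a minimal sketch in Python using PyTorch. Everything here is an assumption for illustration: the layer sizes, names and structure are not Sony AI's actual architecture, just the general shape of the idea of a network that maps race state to driving decisions.

```python
import torch
import torch.nn as nn

OBS_SIZE = 256   # assumed size of the flattened race-state input (illustrative only)
HIDDEN = 512     # assumed hidden-layer width (illustrative only)

class Policy(nn.Module):
    """A stand-in for Sophy's 'brain': race state in, driving decision out."""
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(OBS_SIZE, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, 2),   # two outputs: throttle/brake and steering
            nn.Tanh(),              # squashed into the range [-1, +1]
        )

    def forward(self, race_state: torch.Tensor) -> torch.Tensor:
        return self.layers(race_state)

policy = Policy()
race_state = torch.zeros(1, OBS_SIZE)     # placeholder input for the example
decision = policy(race_state)             # two values in [-1, +1]: pedals and steering
# 'Learning' means repeatedly adjusting the weights inside `policy.layers`
# so that, for the situations it has seen, the chosen actions earn more reward.
```

The real system is a fair bit fancier (it outputs a distribution over actions, as we'll see shortly), but the core idea is the same: a function from race state to driving decision whose internal weights are the thing that learns.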
Now in order for the network to be able to race, it is fed a ton of information about the race track and the current state of the vehicle it's driving. The amount of information fed into the network gradually increased during development, because it wasn't trained at first to simply learn how to play the complete game. Instead, the team focussed on distinct milestones that they wanted to accomplish. So for example, the first phase of training was done in single-player time trials, and eventually this got expanded to complete races, where it could also get information about the other racers.
The complete list of information it is given to race is as follows:
The car's velocity, modelled in three dimensions so as to capture vertical velocity as well as any lateral speed as it swings around corners.
The 3D angular velocity of the car (so it's also modelling turning speeds of the vehicle)
The car's acceleration also modelled in 3D
The load on each tyre
The tyre-slip angles: the difference between the direction each tyre is pointing and the actual direction the car is moving, which is used to help model drifting.
The progress the car had made along a particular track.
The local inclination of the surface of the road, so it could tell if it was going up or down a hill.
The car's orientation with respect to the course, modelled against the left, right and centre lines of the track.
A set of course points: 60 equally spaced 3D points along each of the left, right and centre lines. These were sampled at each timestep relative to the speed of the car, so it could effectively 'see' the track it would come up against over a period of several seconds in front of it.
A flag to tell the car if it was colliding with a barrier.
Another flag to tell the car if it was considered 'off course': which was determined by whether three or more tyres were out of bounds of the race track.
Now all of this is a fairly comprehensive amount of information for racing, but this was just to train it for time trials. To have it then race against other cars, it had even more data being passed into the network (a rough sketch of how all of this might be packed into a single input follows the list below):
The set of all cars that were within a certain range, both in front and behind the AI racer. These cars were modelled using their position (based on their centre of mass), alongside their velocity and acceleration, and were ordered based on their relative proximity to the AI racer.
A value to indicate if the car was experiencing slipstream effects from a car in front, as well as an approximation of its strength
And lastly, a flag to tell the car if it collided with another vehicle.
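As a rough illustration of how everything above might be packed into a single input vector for the network, here's a hedged sketch in Python. The field names, sizes, ordering and padding scheme are all my own assumptions for the example; the real encoding used by Sony AI will differ.

```python
import numpy as np

def build_observation(car, course_points, nearby_cars, max_opponents=4):
    """Flatten the race state described above into one feature vector (illustrative only)."""
    features = [
        car["velocity"],            # 3 values: 3D velocity
        car["angular_velocity"],    # 3 values: 3D angular velocity
        car["acceleration"],        # 3 values: 3D acceleration
        car["tyre_loads"],          # 4 values: load on each tyre
        car["tyre_slip_angles"],    # 4 values: slip angle per tyre
        [car["track_progress"], car["surface_incline"], car["centreline_angle"]],
        np.asarray(course_points).ravel(),   # 60 points x 3 lines x 3D coordinates
        [float(car["hit_barrier"]), float(car["off_course"])],
    ]
    # Opponent block: position, velocity and acceleration of the closest cars,
    # ordered by proximity and padded with zeros when fewer are in range.
    for i in range(max_opponents):
        if i < len(nearby_cars):
            other = nearby_cars[i]
            features.extend([other["position"], other["velocity"], other["acceleration"]])
        else:
            features.append(np.zeros(9))
    features.append([car["slipstream_strength"], float(car["hit_other_car"])])
    return np.concatenate([np.ravel(f) for f in features]).astype(np.float32)
```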
So all of this information is fed into the network, it processes it all, and then it provides an output. But the number of outputs it generates is actually incredibly small. Sophy can only control three elements of a car: the throttle, the brake and the steering. In fact, this was then squashed into only two outputs: one value for braking and throttle, scaled between -1 and +1, indicating how hard the foot should be on either pedal, plus a second output, again scaled between -1 and +1, indicating how hard it needs to turn left or right.
That's it! There are only two outputs. However, there is one hidden step before it commits to a final action. The neural network's raw output is actually a squashed normal distribution over possible actions. Meaning that when the network is fed all the input and processes it, it doesn't just give one answer; it produces a probability distribution over the actions it could take, and Sophy then selects the action it considers most likely to be the best one.
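A 'squashed normal distribution' sounds exotic, but in code the idea is quite compact. Below is an illustrative sketch using PyTorch's distribution utilities; the example mean and standard deviation values are made up, and how the real policy head is used at race time is Sony AI's design, so treat this purely as a sketch of the concept.

```python
import torch
from torch.distributions import Normal

# Pretend the network has produced these for the two actions
# (throttle/brake and steering); in reality they come from the policy head.
mean = torch.tensor([0.8, -0.1])    # assumed example values
std = torch.tensor([0.2, 0.05])     # assumed example values

dist = Normal(mean, std)            # a normal distribution over raw actions
raw_action = dist.sample()          # during training: explore by sampling
action = torch.tanh(raw_action)     # 'squash' the sample into [-1, +1] for the game

# When racing for real, a common choice is to skip the sampling and
# take the most likely action instead, i.e. the squashed mean:
best_guess = torch.tanh(mean)
```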
As stated, it only controls the pedals and steering, so Sophy as it currently stands doesn't change gears, nor can it influence the loadout of the car, so tweaking the ABS or traction control is completely outside of its capabilities. The cars it races with have been configured by a human, and it simply drives using what it is given.
So this is how the system interfaces with Gran Turismo, but as I said, when it starts playing, it doesn't really know how to race. So it's just going to do doughnuts or crash into barriers. So now, let's talk about how it actually learns to play the game.
Playing By Yourself
Given that Sophy is advertised as this big innovation as part of Gran Turismo 7, you might be surprised to learn that it wasn't developed using Polyphony's latest entry of the series for PlayStation 5. Instead, it was trained in Gran Turismo Sport on the PlayStation 4.
The reasons for this are actually pretty obvious - when a game like Gran Turismo 7 is in development, it's going to be in something of a broken state as new features are added, and bugs and unforeseen issues arise. As such, it's difficult to have a fully functional version of the game that is in a steady state that enables a machine learning algorithm to learn. And even if it did try to learn from GT7, every time the developers fixed a bug, or tweaked a part of the game, then the system would have to learn all over again, given what it learned before was a reflection of the game at that time. Hence it makes sense to use a more stable game, and Gran Turismo Sport was already out on store shelves when Sophy started development.
And if you thought that was weird: Sophy wasn't trained by plugging it into the game engine or by hooking it directly into a build of the game. It actually learned to race in Gran Turismo Sport by playing it on PlayStation Now.
For those not familiar, PlayStation Now is a cloud gaming infrastructure that allows users to play a variety of PlayStation games from the company's back catalogue by streaming the games to PS4 and PS5 consoles over the internet. While the paid-for subscription service of the same name was folded into PlayStation Plus in 2022, it's still a big part of the company's offerings, given at the time of writing it is still the only legitimate way to play PlayStation 3 games on modern consoles, as no official emulation software has ever been developed by Sony.
To train GT Sophy, a custom hardware architecture was built inside PlayStation Now that enabled 'trainer' units to be deployed that would connect to PlayStation 4 consoles and control race cars externally. In fact, the system could deploy trainers to control multiple race cars at once via multiple consoles, to the point that Sophy was learning by playing on up to 20 PlayStation 4s at any given time.
While it's controlling multiple cars at once, it uses the neural network I mentioned earlier to make decisions, but it also pays attention to the current situation each car is in, the action it decides to take and the reward it receives. The reward is essentially how Sophy determines whether what it's doing is considered good or bad at any given point, and we'll be talking about that next. Every time it captures this information for all of the different cars it's racing, it stores it in what is called an Experience Replay Buffer or ERB. This buffer stores all of the most recent events and the rewards received from them, and that information is then used to make changes to the Policy by updating the neural network.
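An Experience Replay Buffer is, at heart, just a big rolling log of (state, action, reward, next state) snapshots that the learner samples from. Here's a minimal, generic sketch in Python; the capacity, exactly what gets stored, and how Sophy samples from it are details I'm assuming for illustration rather than Sony AI's implementation.

```python
import random
from collections import deque

class ExperienceReplayBuffer:
    """Rolling store of recent gameplay moments, shared across all the cars."""
    def __init__(self, capacity=1_000_000):    # capacity is an assumed figure
        self.buffer = deque(maxlen=capacity)   # old experiences fall out automatically

    def add(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Random mini-batches break up correlations between consecutive frames,
        # which tends to make the neural network updates more stable.
        return random.sample(self.buffer, batch_size)

# Every car Sophy controls, on every console, feeds its moments into the
# buffer; the learner then samples batches from it to update the Policy.
```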
The input data I mentioned earlier, and the output which controlled the cars, were handled using a special programming interface created for GT Sport so that Sophy could read the game and play in it smoothly. This interface only allowed the AI to read and act 10 times a second. This, combined with the fact that it ultimately only controls the pedals and steering, means it's effectively playing without exploiting internal game engine knowledge that a human player could not figure out on their own.
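That 10-times-a-second restriction effectively turns Sophy into a fixed-rate controller. Here's a rough sketch of what such a loop looks like, with hypothetical function names standing in for the GT Sport interface (which isn't public); the structure is the point, not the specific calls.

```python
import time

DECISION_RATE_HZ = 10             # the interface allows 10 reads/actions per second
STEP = 1.0 / DECISION_RATE_HZ     # so one decision every 0.1 seconds

def drive(policy, game_interface, race_length_seconds):
    """Fixed-rate control loop: read the game, decide, act, wait for the next tick."""
    deadline = time.monotonic() + race_length_seconds
    while time.monotonic() < deadline:
        tick_start = time.monotonic()
        state = game_interface.read_state()        # hypothetical: fetch the inputs listed earlier
        pedal, steering = policy.act(state)        # two outputs in [-1, +1]
        game_interface.send_controls(pedal, steering)
        # Sleep off whatever is left of this 0.1 second window.
        time.sleep(max(0.0, STEP - (time.monotonic() - tick_start)))
```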
Rewarding Good Driving
As mentioned earlier, as the system controls multiple cars at once, it fills the Experience Replay Buffer with all of these different gameplay moments it has played. From this, it figures out the good and bad decisions it has made and updates the Policy, the neural network, so that it's smarter in the future. This is the Quantile Regression Soft Actor-Critic algorithm I mentioned earlier. I'll forego going into the technical detail and the mathematics of how the updates work, and instead, I want to focus on the rewards.
In order for the algorithm to update the network, it has to know at any given point whether what it's doing is a good idea or not. This is how reinforcement learning AI is designed: you give it the state of the problem, it takes action in that state and then it's told whether that action was 'good' or not based on the result. Over time, the positive rewards it receives will reinforce good habits, while the negative rewards discourage bad habits. So what did those rewards even look like?
Sophy actually receives 8 different types of rewards, each designed to address a specific part of racing performance, either to encourage good habits or to penalise it for making bad choices. The list in full is as follows (a rough sketch of how these might combine into a single number appears after the list):
A course progress bonus: how far has the car progressed along the track since the last interaction with the game? This is ultimately the main reward the car receives: is it moving along the track as intended? This was actually tailored in a research project conducted alongside researchers at the University of Zurich, and the car gets rewarded for each metre of the track it completes without going off course.
Off-course penalties: where it penalises the car for going off the track and is largely used to make it avoid taking shortcuts. This penalty is scaled relative to the speed the car is moving. So if it goes crashing off course at 100mph, that's a very big penalty.
Wall penalties: where time spent colliding with a wall is penalised, and again it’s proportional to speed when the car hits the wall.
Tyre-Slip penalties: meaning that when the car is going in one direction and the tyres are pointed in another (i.e., the car is skidding out of control), then a penalty is also applied.
These first four are just for learning to drive, and the next four handle both racing against others, and the need for the AI to learn a bit about sportsmanship.
First of all, there's a passing bonus for whenever the car gains ground on an opponent, and then it overtakes them. This was also a project conducted in collaboration with academic researchers at the University of Zurich and ETH Zurich. One of the biggest challenges faced in this work - and also the funniest - is that it had to be designed to prevent what is called 'positive-cycle reward loops'. If you give the AI a reward for successfully overtaking a car, you also need to make sure you punish it for having the same car overtake it at a later point. Otherwise, the AI learns to overtake, then fall back and overtake again. Because if it gets rewarded for overtaking a lot, then the easiest way to do it is to simply overtake the nearest car over and over again.
Secondly, there's a Collision Penalty, meaning that whenever an AI racer collides with any other car, even a car that the AI is also controlling, it considers that a bad move and punishes the car appropriately.
There's also a special penalty specifically for a collision that a car makes hitting the rear-end of another car. This is not just to discourage normal collisions, but specifically to stop Sophy from driving into the back of cars that it's chasing.
And lastly, there was the 'Unsporting Collision' penalty: this was the hardest penalty to get right, given what you're essentially asking the AI to do is avoid hitting other racers. But if you make that penalty too strong, it actively discourages Sophy from getting too close to other cars, which will impact its overall performance. Hence this is a very specific extra penalty that is only used in very specific situations. In fact, during development, it was restricted to only being used on the first and final chicane of the Circuit de la Sarthe.
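To show how those eight signals might combine into the single number the learning algorithm sees every tick, here's a hedged Python sketch. The weights, scalings and exact conditions below are entirely made up for illustration; Sony AI's actual reward shaping is only described at a high level in their papers.

```python
def reward(prev, curr, weights=None):
    """Sum the eight reward components for one step (all numbers are placeholders).

    `prev` and `curr` are snapshots of the car's situation before and after the step.
    """
    w = weights or {
        "progress": 1.0, "off_course": 1.0, "wall": 1.0, "slip": 0.5,
        "passing": 1.0, "collision": 2.0, "rear_end": 2.0, "unsporting": 5.0,
    }
    r = 0.0
    # 1. Course progress: metres gained along the track since the last step.
    r += w["progress"] * (curr["track_progress_m"] - prev["track_progress_m"])
    # 2-3. Off-course and wall penalties, both scaled by how fast the car was going.
    if curr["off_course"]:
        r -= w["off_course"] * curr["speed"]
    if curr["touching_wall"]:
        r -= w["wall"] * curr["speed"]
    # 4. Tyre-slip penalty: skidding is discouraged.
    r -= w["slip"] * curr["total_tyre_slip"]
    # 5. Passing bonus, kept symmetric: gaining ground on an opponent is rewarded
    #    and losing that ground is penalised, which closes the 'overtake, drop back,
    #    overtake again' loop described above.
    r += w["passing"] * (prev["gap_to_opponent_m"] - curr["gap_to_opponent_m"])
    # 6-8. Collision penalties: any contact is punished, with extra penalties for
    #      rear-ending and for 'unsporting' contact in designated track sections.
    if curr["collided"]:
        r -= w["collision"]
        if curr["hit_rear_of_other_car"]:
            r -= w["rear_end"]
        if curr["unsporting_zone"]:
            r -= w["unsporting"]
    return r
```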
Building a Competitor
Now that we know all of the technical details, let's move this to a close by talking about the story of how it all came together.
As mentioned already, Sophy emerged as a result of a process of iterative development. The course progress bonus emerged from research conducted prior to the final driver being created. Plus the whole process of overtaking was also conducted as a separate research project.
Meanwhile, the etiquette issue took a lot longer to resolve than the earlier description suggests. For all you motorsport fans, you might have noticed that etiquette is built into Sophy to penalise on contact with another car, rather than when the time penalty is applied - which typically happens on fixed areas of the track. This is because Sophy needs to learn why it is penalised, rather than receiving the time penalty a few minutes later without any real insight as to why. While the unsporting collision penalty is quite simple, the devs experimented with more complex systems to evaluate the severity of a penalty based on the situation. However, this caused a problem: some penalties were considered more severe than others, and by giving it the freedom to treat some penalties as less important, Sophy was considered too aggressive by many of the test drivers it went up against. Hence, as discussed already, it's now a lot more conservative, given it is penalised for pretty much any collision it takes, but also receives the extra penalties on tracks like Sarthe which are, in the eyes of human players, known problem spots.
The original testing process was conducted purely in time trials, with Sophy playing Gran Turismo Sport for around eight days. Despite this significant amount of time, it only actually took around two days before it was already surpassing around 95% of time trial data that the research team collected from Kudos Prime: a popular website for players to share their time trial scores for others to compete against. The training only stopped once trial times were no longer improving.
With the time trials performance looking ready to go, the next step was to get ready for human competition. To test out how well Sophy would work against the very best of human players, it then trained again for around 12 days, this time focused on head-to-head racing. Given that Sophy is trained by allowing the system to play on multiple PlayStation consoles at once, and often with it controlling multiple cars in the same race, Sony AI estimates that it clocked up 45,000 hours of driving time during training. That's just over 5 years. This means that Sophy has officially played more Gran Turismo Sport than any human possibly could. Even if a human played the game for 24 hours a day without rest, the game hasn't even been available to players for 5 years at the time of publication.
All of this led to the main event: a competition between Sophy and top human players. In July of 2021, Polyphony invited a handful of professional players from around the world to participate in time trials and in head-to-head team races against Sophy.
The competition ran on three tracks:
Dragon Tail Seaside: a fictional track based in Croatia that is just over 3 miles in length.
Lago Maggiore GP: another fictional track based in Italy, totalling around 3.6 miles.
And lastly, Circuit de la Sarthe: an 8.5-mile-long track, famous for being the venue of the 24 Hours of Le Mans race.
The time trials ran entirely online, during which three top players - Emily Jones, Valerio Gallo and Igor Fraga - were tasked with beating the time trial records Sophy had set on these tracks, being able to see Sophy's ghost at all times. On all three tracks, Sophy retained its record, with the human team unable to beat the times set by the AI player. Quite often it was found that on specific tight corners and chicanes, Sophy was capable of handling itself at a much higher level than its human competitors. To quote Emily Jones in a follow-up interview:
“It was really interesting seeing the lines where the AI would go, there were certain corners where I was going out wide and then cutting back in, and the AI was going in all the way around, so I learned a lot about the lines. And also knowing what to prioritize. Going into turn 1 for example, I was braking later than the AI, but the AI would get a much better exit than me and beat me to the next corner. I didn’t notice that until I saw the AI and was like ‘Okay, I should do that instead.’”
Emily Jones
Meanwhile, the head-to-head competition took place in the Polyphony Digital offices but was limited to top Japanese players given the restrictions brought on by the COVID pandemic. Two teams of four were devised, with Sophy controlling all four cars for its team, while the human team was composed of Takuma Miyazono, Tomoaki Yamanaka, Ryota Kokubun, and Shotaro Ryu. In each case, points were awarded based on the final positions of individual drivers on the same three tracks as the time trials, with the final race on Circuit de la Sarthe counting double. The human team succeeded, scoring a total of 86 points over the three races, compared to Sophy's 70.
But, far from accepting defeat, the Sony AI team went back and spent another three months improving the system. How? Well, that's not exactly clear. What little Sony mentioned was that the neural network's internal structure got bigger, and they made some modifications to rewards, training features and other aspects. But afterwards, they returned to run the same head-to-head competition again in October of 2021, during which Sophy won with a total of 104 points to 52.
While this success achieved the goals that Sony AI sought, they are the first to acknowledge the system still has limitations. Right now it still lacks a lot of strategic decision-making, and it is often punished for it. A good example noted by the developers is that it will often take the earliest opportunity it can to slipstream on a long straight, rather than simply wait until it's impossible for its opponent to do the same thing in return. But also, despite the efforts to curtail it, the AI racer is still a little too aggressive. One issue is that it's prone to passing opponents in risky situations where it might get away unscathed, but it could force the other car to receive a penalty - which of course goes against the spirit of racing etiquette. So while it has largely solved the 'drive fast, time good' part of the problem, the more nuanced elements of head-to-head competition are still an open challenge.
Closing
Gran Turismo Sophy is one of the most high-profile applications of deep learning in video games to date. It's been a slow process, but machine learning is having a big impact on many facets of game production, and here we're seeing it front and centre, highlighting the potential to change racing AI in a big and meaningful way. It sounds like there's still more to come from Sony AI and the development of Sophy, and you can be sure I'll be following along as it happens.
References
"Neural Networks Overtake Humans in Gran Turismo Racing Game" [Nature]
"Outracing Champion Gran Turismo Drivers with Deep Reinforcement Learning" [Nature]
"Superhuman Performance in Gran Turismo Sport using Deep Reinforcement Learning" [IEEE Robotics and Automation (ICRA)]
"Autonomous Overtaking in Gran Turismo Sport Using Curriculum Reinforcement Learning" [IEEE Robotics and Automation (ICRA)]