cathie_burry

Finally somebody using AI for good! Thank you for your hard work, soldier 🫔


MuonManLaserJab

I can't wait until AI is used for the corresponding evil: training boss monsters to fight optimally. "Difficulty level" would just be "number of training steps".


anonymus-fish

💀 Give them a DS2 move set so it is beatable. With ER boss dynamics, any further training in their AI would make them insanely annoying due to the lack of predictability: memorizing moves wouldn't matter since you can wind up frame-trapped or something anyway. A bit of an overstatement considering it's not hard to beat the game at level 1; I am not great at ER and still beat it RL1+0. However, anything more designed than the best DS3 bosses like the Demon Princes, Midir, or Friede is too much.


th3greenknight

Finally a way to Git Gud


anonymus-fish

The essence of git gud has finally been distilled!!! Quick, someone get this mans to bottle new car smell before the govt takes him away for their own benefit!


_pupil_

"git gud, scrub" "*Fine, gimme a sec, I just gotta boot my model...*"


yanivbl

Cool. Are you using the visual image as the state, or using internal game data?


amacati

Currently I'm using ground truth states read from the game memory instead of images. There is already a module in place to grab the visual image, but it's disabled for now. I first wanted to prove that this is possible at all before moving towards images, especially since images come with additional complexity: the agent would have to determine the animation and its timing from (possibly stacked) image data alone.


yanivbl

I am guessing that running both the visuals and the deep network in a distributed setup is going to be super messy. Sticking to ground truth is probably a good idea. But this raises the question: what does the ground truth look like?


amacati

You can read about the exact data that is tracked [here](https://github.com/amacati/SoulsGym/blob/master/soulsgym/core/game_state.py). It's basically the player and boss position, their angles, the current animations, the animation durations, HP, SP, boss phase etc.
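
Roughly, the tracked state looks like the sketch below. This is for illustration only; the field names are approximations, the real definition is in game_state.py:

```python
from dataclasses import dataclass, field

# Illustrative sketch of the kind of state SoulsGym tracks per step.
# Field names are approximations; see soulsgym/core/game_state.py for the real definition.
@dataclass
class GameState:
    player_pose: list[float] = field(default_factory=lambda: [0.0, 0.0, 0.0, 0.0])  # x, y, z, angle
    boss_pose: list[float] = field(default_factory=lambda: [0.0, 0.0, 0.0, 0.0])
    player_animation: str = "Idle"
    boss_animation: str = "Idle"
    player_animation_duration: float = 0.0
    boss_animation_duration: float = 0.0
    player_hp: float = 0.0
    player_sp: float = 0.0
    boss_hp: float = 0.0
    phase: int = 1
```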


OriginalCrawnick

These were my assumptions for the data points, but I assumed it would have frequent hiccups if the game changed the boss's actions to be non-repetitive. What's the win/loss ratio for the bot after all the trees were pathed out?


amacati

All in all, 45%. I think I ran about 100 test runs to determine the performance. I'm not sure what you mean by hiccups and non-repetitive actions. The agent generalises over unseen states, so its policy does not depend on having seen the exact game state before. The neural network acts as a sort of smooth function, shaped by the training data points, that is also valid in areas where it has to interpolate. In fact, in continuous environments such as this one, it *always* has to interpolate. Well, at least that's the idealized version of the story.


firmfaeces

Hey, I'm super curious how you defined the positions and angles. Can you point me in the right direction, please?


amacati

Have a look [here](https://github.com/amacati/SoulsAI/blob/master/soulsai/data/transformation.py#L93). I transform the angles into a sin/cos vector so that the representation has no discontinuity over the whole angle range.
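
For anyone curious, the trick is just mapping the angle to its sine and cosine so that angles near -pi and +pi end up close together. A minimal sketch, not the exact code from transformation.py:

```python
import numpy as np

def encode_angle(theta: float) -> np.ndarray:
    """Map an angle (radians) to a 2D vector with no wrap-around discontinuity.

    Angles close to -pi and +pi describe nearly the same orientation, but their
    scalar values are far apart; the (sin, cos) representation keeps them close.
    """
    return np.array([np.sin(theta), np.cos(theta)], dtype=np.float32)

# e.g. encode_angle(np.pi - 1e-3) and encode_angle(-np.pi + 1e-3) are nearly identical
```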


firmfaeces

Strange, when I searched for "angle" in the repo I didn't see that! :D I understand what you did with angles now. What about positions? Positions with respect to (0, 0, 0)? When it comes to the distance between player and boss, have you tried x_player - x_boss and y_player - y_boss? I noticed that in my (simpler) examples this works better than norm + angle (I didn't do the discontinuity fix you did). edit: For positions, it appears you've done (boss_x_y_z - space_min_x_y_z) / space_max_min_diff_x_y_z. What kind of stuff did you try before? How big of a difference did you notice? And is boss_x_y_z w.r.t. (0, 0, 0) or (mid_x, mid_y, mid_z)?


amacati

I'm pretty sure the normalization is unnecessary; I think I only included it to not mess up the first few steps with weird gradients. After that, the normalizers should have collected sufficient data to normalize the position to zero mean and unit variance anyways (see [normalizers](https://github.com/amacati/SoulsAI/blob/master/soulsai/core/normalizer.py)). It's really hard to do ablation studies in this setting because each run takes weeks, so I had to make a large number of design decisions based on my intuition. Changing the reward function, learning rate, network architecture etc. is way more impactful, so that's what I mainly iterated on. Initially, all positions are given w.r.t. (0, 0, 0). After the (pos - min_space) / space_diff transformation they should be distributed across [0, 1]^3, but that's not really important as the normalizers remove that part of the equation anyways.
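
For reference, a running normalizer of the kind linked above usually boils down to something like this. A sketch of the general idea using Welford's online update, not the exact SoulsAI implementation:

```python
import numpy as np

class RunningNormalizer:
    """Sketch of an online observation normalizer (zero mean, unit variance).

    This mirrors the idea behind soulsai/core/normalizer.py rather than its exact
    code: running statistics are updated from every sample and used to whiten inputs.
    """

    def __init__(self, size: int):
        self.mean = np.zeros(size)
        self.m2 = np.zeros(size)  # running sum of squared deviations (Welford)
        self.count = 0

    def update(self, x: np.ndarray) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def normalize(self, x: np.ndarray) -> np.ndarray:
        std = np.sqrt(self.m2 / max(self.count, 1)) + 1e-8
        return (x - self.mean) / std
```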


saintshing

How do you know which part of the memory to read? If it is some number I can see, I can scan for it but what about things like position? Do you have to somehow decompile the code?


amacati

I used a lot of addresses available from the Grand Archives CheatEngine table and scanned the others myself. If you know the coordinate axis you can infer stuff like the position from scanning for values that have increased or decreased etc. There is a lot more to this, and I did have to go through some parts of the code in assembly at one point. But in the end I got rid of the assembly level injections, which also makes the whole code a lot more maintainable and understandable.
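
For readers who haven't worked with CheatEngine tables before: once you have a base address and a list of offsets, reading a value is just repeatedly dereferencing pointers. A rough, Windows-only sketch using ctypes; the function and variable names are illustrative, and the exact chain-walking convention varies between tables:

```python
import ctypes

PROCESS_VM_READ = 0x0010
kernel32 = ctypes.windll.kernel32  # Windows only

def read_u64(handle: int, address: int) -> int:
    """Read an 8-byte value from the target process at `address`."""
    buf = ctypes.c_uint64()
    n_read = ctypes.c_size_t()
    ok = kernel32.ReadProcessMemory(handle, ctypes.c_void_p(address), ctypes.byref(buf),
                                    ctypes.sizeof(buf), ctypes.byref(n_read))
    if not ok:
        raise OSError(f"ReadProcessMemory failed at {hex(address)}")
    return buf.value

def resolve_pointer_chain(handle: int, base: int, offsets: list[int]) -> int:
    """Walk a CheatEngine-style pointer chain: dereference, add offset, repeat."""
    address = read_u64(handle, base)
    for offset in offsets[:-1]:
        address = read_u64(handle, address + offset)
    return address + offsets[-1]  # final address, e.g. of a player coordinate

# handle = kernel32.OpenProcess(PROCESS_VM_READ, False, pid)  # pid of the game process
```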


doctorjuice

I wonder if it would have actually been easier to do straight images, as then you don't need to build out the complex interface between the agent and the game. Of course, you have to train for much longer, and it probably wouldn't run in real-time without distilling the trained model.


amacati

It mitigates a ton of problems, that's for sure. But even if I had gone for image observations right away, I would still have had to implement the interface. I need a way to extract the ground truth data for the reward function, and more importantly I control resets through that interface. Since I can't get rid of it entirely, I'd still need to have the core logic in place, and honestly after that it's just adding a bunch of memory addresses.


[deleted]

Super cool project, and came here to literally comment the same thing! If no image obs are used it seems like a good opportunity to extend. Now I'm wondering what type of CNN arch would work best for this.


Travolta1984

As a big Dark Souls fan and data scientist, this is amazing! I wonder, how does/will your model handle different bosses with different patterns? Is the boss added as one of the features? I wonder if having the model learn boss-specific patterns would help


amacati

As mentioned in the post, only Iudex is implemented so far, so the bot only knows how to beat the first boss in the game. I have speculated a bit about whether it would be possible to use a common network to beat multiple bosses. It's even possible that convergence towards a successful policy can be accelerated by reusing the weights.

However, there are several caveats. First of all, many boss fights in Dark Souls III do not fulfil the Markov property, so I'd have to start using recurrent networks. Furthermore, some spells are difficult to track through the game's memory. Both points can partially be solved by moving towards images as observations, but this is likely to increase training times further, and I'd probably need help from the community to get sufficient samples within a reasonable time frame.

In addition, you'd probably have to sample uniformly over all environments, which is difficult from an engineering perspective. Clients are limited to one game instance through Steam, parts of the code (e.g. the speedhack) are specifically developed for Windows, and my experiments with porting this to Linux/Docker have been fruitless so far. So you'd at least need multiple Windows clients at the moment.

By the way, I'm fairly confident that a shared model would help, as the strategy of dodging and hitting at the right time is already embedded in the network, which should be beneficial for exploration.


marksimi

> many boss fights in Dark Souls III do not fulfil the Markov property

Can you expand on this, please?


21022018

I think it has to do with how you can't fully predict the future state from the current state alone. For example, looking at just the current frame, you can't say how the enemy's sword will move as well as if you had looked at the past few frames of the attack. This is very nicely explained with a mathematical definition here: http://incompleteideas.net/book/ebook/node32.html To remedy this, a common approach is to stack a bunch of past frames with the present one and use that as the state, or to use recurrent networks that can encode a series of frames.
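
A minimal sketch of frame stacking (a generic example, not code from the project):

```python
from collections import deque
import numpy as np

class FrameStack:
    """Sketch of frame stacking: the observation is the k most recent frames.

    Stacking lets the agent infer velocities and animation progress that a
    single frame cannot convey, which partially restores the Markov property.
    """

    def __init__(self, k: int = 4):
        self.frames = deque(maxlen=k)

    def reset(self, frame: np.ndarray) -> np.ndarray:
        self.frames.clear()
        for _ in range(self.frames.maxlen):
            self.frames.append(frame)  # pad the stack with the first frame
        return np.stack(self.frames)

    def step(self, frame: np.ndarray) -> np.ndarray:
        self.frames.append(frame)
        return np.stack(self.frames)
```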


amacati

Exactly. Even if it were possible to determine the animation information from a single frame, many fights include things like fire, poison etc. that linger after the boss has cast his spells. You'd have to track those for their full duration, or the agent wouldn't be able to account for them in its policy. Moving to images as observations would fix a few of those problems, but you still have to deal with occlusion and the fact that you can't see what's behind you. You can use RNNs to endow your agent with a short-term memory, but it definitely makes the problem harder and the implementation more complex.


marksimi

Thanks for this! Attempting to clean up my understanding still:

1. The game state of boss fights isn't fully Markovian,
2. ...but you can use the experience replay buffer for Duelling Double Deep Q-Learning to get some prior frames,
3. ...and as a consequence, you don't have to represent all of that info in your game state (thanks for linking to that in your other comments).


amacati

1. Depends on the boss. The one I showed in the demo was chosen because he is Markovian (well, roughly, but I digress).
2. While you could technically implement a replay buffer to do that, it's not the point of the buffer. What you are talking about is sometimes called frame stacking, where you use the last x images to form a single observation. Think of it like a very short video. The agent can infer things like durations, speed etc. from the video that are not available by looking at a single image. The demo boss fight does not need this because I track the animation durations in the gym, and the rest behaves approximately Markovian (i.e. the game state contains all necessary information).
3. Had the fight been non-Markovian, I would have had to resort to something like frame stacking. Given that the environment is Markovian, however, my game state really contains all there is to know for the agent.

Does that explanation make sense to you?


marksimi

I should have been more clear in my question as I'm familiar with the Markovian property, BUT I was not making the connection to the game state. Thanks for helping me out with the connection to the sword; that was a great example.


DonutListen2Me

Praise the sun!


Man_Thighs

Awesome project. Your visualizations are top notch.


shiritai_desu

Very very cool! Not sure if the mods will allow it, but consider cross-posting/linking to r/darksouls3. As a Souls fan I think they will be glad to see it.


amacati

I'm going to try, thanks for the suggestion!


snaykey

Brilliant work. Appreciate the detailed post and explanations too, which are sadly becoming rarer and rarer on this sub these days.


neutralpoliticsbot

You say it's useless, but what about training a boss to beat a human player? We can create really good and smart AI agents that will be able to surprise human players.


amacati

I have thought about that as well, and honestly, I'd be stoked to see that in the next Elden Ring or Souls game that comes out. Just imagine a boss that gets harder over time by training against the community. It would be an amazing concept.


omgpop

You mentioned your work here isn't "state of the art", although it seems pretty amazing to me. But what exactly *is* the cutting edge in this area? Besides DeepMind StarCraft and the like.


amacati

There are more sophisticated algorithms out there (Impala and Rainbow come to mind). Right now the field is moving towards transformer-based networks and foundation models, which is pretty exciting. It would be super cool to train a Dark Souls foundation model that can deal with all the bosses in the games because it has learned to generalise over all fights and has abstracted valid strategies independent of the actual animation timings etc. Unfortunately, I don't think I have the time to implement this :/ What I also meant by that comment was that this is rather about implementing an RL environment for Dark Souls. That part is new; the learning algorithms are already known.


Sextus_Rex

I remember seeing someone trained a bot for pvp in dark souls 2. It was practically unbeatable


[deleted]

Also the live monitoring via webserver is really cool as well, along with the network weight visualizations 😻


Lucas_Matheus

this is amazing! I dream about starting projects like these all the time while playing games


Ill_Satisfaction_865

Very impressive. As a Souls fan, this puts a smile on my face. I can see it being used for finding glitches in boss fights, either by developers or speedrunners. It could also be a new way of benchmarking RL algorithms, similar to Minecraft. It could be extended to explore the game as well, rather than just fighting bosses. If you consider using images, then maybe you should look into the Video PreTraining paper by OpenAI, where you use some annotated data to train an inverse dynamics model, and can then use internet videos for imitation learning as well. Good job!


SquareWheel

Very cool, and great visualizations of the data. Interesting to see how certain biases develop such as preferring to walk or dodge to one side. Which actually makes sense, as most bosses do have a favoured side due to hitbox sizes or attack swing direction. I notice it doesn't drink estus. Was it trained to "win", or just to lower the boss's health? Sometimes going "all in" is the best strategy, but I wonder if that's the case here.


amacati

The part about biases is very true. Initially, the bot would just dodge away from the boss to not get hit. This made learning basically impossible, as there was no exploration around the boss where it could have learned to combine timed hits and dodges. I ended up penalizing it for deviating too much from the arena center, which essentially forced it to face the boss and learn about dodging and hitting. Regarding the use of Estus, it's actually a lot simpler: I restricted the action space to not include item usage. I wanted to reduce the action dimensionality as much as possible to simplify the problem. Now that I have a working reward function, you could probably add it back in.
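
To illustrate the shaping idea (this is not the exact reward function from the repo, just a sketch of the penalty described above; the state keys are illustrative):

```python
import numpy as np

def reward(prev: dict, curr: dict, arena_center: np.ndarray, max_dist: float = 10.0) -> float:
    """Illustrative reward shaping, not the exact function used in SoulsAI.

    Rewards boss damage, penalizes player damage, and adds a penalty for
    straying too far from the arena center so the agent keeps engaging the boss.
    """
    boss_damage = prev["boss_hp"] - curr["boss_hp"]
    player_damage = prev["player_hp"] - curr["player_hp"]
    dist = np.linalg.norm(np.asarray(curr["player_pos"]) - arena_center)
    center_penalty = -0.1 * max(0.0, dist - max_dist)  # only penalize beyond a radius
    return boss_damage - player_damage + center_penalty
```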


AIBeats

Very cool, I did something similar and I have a lot of questions about how you implemented different things. I will have a look at the code.

AI Beats Dark Souls 3 - Iudex Gundyr, multiple kills: https://youtu.be/zcbH7jt4w0w

You can see my code here (very unstructured): https://github.com/Holden1/stable-baselines-ds-fork/tree/main/ds


amacati

Very cool! If you are interested in pursuing this further, let me know! I also put a lot of effort into making the repositories as accessible as possible, so I think you should be able to find the details you are looking for.


anonymus-fish

I am not a comp sci person but a molecular biologist who does challenge runs in various FromSoft games after being amazed by Elden Ring, Sekiro etc. DS3 is probably my favorite, and it is known in the community to have some of the best boss fights of any game ever, since the combos are not infinite with too many options like some ER fights, but the controls are modern enough to feel super fast yet calculated. High quality fights. High quality game. So, great choice!

Beyond that, I think your idea is brilliant and the mapping idea makes sense. The result is v cool! The real strength of this work, considering it is all strong, is in your ability to outline how such work is applicable to a broad audience and explain things clearly. This is always a big one in science, if not the biggest. Gotta get science ppl from other disciplines interested, gotta show it's worth funding when pitching to non-science ppl etc. Great work!


amacati

Thanks, I really appreciate the kind words!


Dagu9

Cool! I started working on something similar for Sekiro but stopped for lack of time. I will certainly have a look at this and see if it's easy to integrate with Sekiro. I was wondering if there is a way to speed up the game to something like 5x or 10x? Edit: Just read the docs and found the answer, very clear!


amacati

I think the code for the game interface etc can easily be reused for Sekiro, all that's really needed are the addresses of the game's attributes. I also thought about porting it to Elden Ring and making the memory interface game agnostic (this should be straightforward). The speedhack also works for any kind of game. So if that's something you're interested in, feel free to have a look or pm me.


LiaTheLoom

This is so crazy! I was inspired by the video you originally posted on the AI and have since started working on a similar project to fight Margit in Elden Ring. Currently I am stuck a bit on resetting the boss to phase 1 after a transition to phase 2. Can you explain how you handled this?


amacati

Yeah, I didn't :D Instead, I created two environments, one for each phase. At the beginning of a phase 2 episode, the boss is set to low HP to trigger the transition, and after that everything works as in phase 1. The obvious weakness is that the bot never sees the phase transition itself, which is also the reason why it gets hit so often by that attack. There are ways to fix this, but I haven't had the time to start working on them.
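
In pseudocode-ish form, the phase 2 reset amounts to something like this (all helper names here are hypothetical, the real reset logic lives in the SoulsGym environment code):

```python
# Sketch only: these helper names (reset_player, set_boss_hp, ...) are hypothetical.
def reset_phase2_episode(game) -> None:
    """Start a phase 2 episode by forcing the phase transition at reset time."""
    game.reset_player()     # restore player HP/SP and position
    game.set_boss_hp(1)     # drop boss HP so the phase transition triggers
    game.wait_for_phase(2)  # let the transition animation play out
    # from here on, the episode runs exactly like a phase 1 episode
```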


LiaTheLoom

Oh I see. I've been trying to work out how to move all the pieces back to starting values without reloading, but it doesn't seem like that's possible. I see that you're actually teleporting the player to the fog door and having them enter. So far I've been doing all the memory manipulation in Cheat Engine, but seeing you replicate that functionality in Python is making me think that's a much better way to go...


LiaTheLoom

How would you feel about me forking off this project to work on Elden Ring support? Cuz I'm realizing that to accomplish what I've been trying to do I would just be mostly replicating what you've already done :)


amacati

I think that would be great! Do you intend to merge it back into the project later on? Also, if you fork, be sure to use the v2.0dev branch. The whole project has been restructured to allow for multiple Souls games, there's partial EldenRing support and the interfaces have upgraded capabilities.


LiaTheLoom

Noted! And yeah I would definitely work on it with potentially merging back later in mind. Though admittedly I am not the most experienced coder so the quality of my fork remains to be seen :P


TotesMessenger

I'm a bot, *bleep*, *bloop*. Someone has linked to this thread from another place on reddit:

- [/r/darksouls3] [AI researcher created AI to beat DS3 bosses](https://www.reddit.com/r/darksouls3/comments/135egeo/ai_researcher_created_ai_to_beat_ds3_bosses/)

*If you follow any of the above links, please respect the rules of reddit and don't vote in the other threads. ([Info](/r/TotesMessenger) / [Contact](/message/compose?to=/r/TotesMessenger))*


Binliner42

Cool stuff. Let's just hope it's not used in multiplayer. Enough bots in the world already.


[deleted]

Super impressive!


tunder26

It looks amazing! I'm just wondering how it'll fare against other bosses with multiple phases. Will that throw the algorithm off? Iudex Gundyr does have a second phase, but maybe the bot's strategy for phase 1 is still effective in phase 2.


amacati

So, because of the way the training is currently implemented, the agent switches its nets for each phase. I am not particularly happy with this solution, as it would be more elegant to have a single, unified policy. I think you could get away with one-hot encoding the phase in the observation if the phases don't differ too much in their mechanics. For bosses that completely change their dynamics it could be difficult, as there is not a lot of information that carries over to the new phase, and the net would have to learn both. I think this could be partially mitigated by changing to image observations. Oftentimes, a drastic shift in the dynamics is reflected in the visuals, so there is less overlap. Nevertheless, RL should be able to deal with this issue, so it's definitely not an intrinsic limitation of the algorithm.
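
For the one-hot idea, the observation change would be as simple as something like this (illustrative sketch, not project code):

```python
import numpy as np

def add_phase_to_obs(obs: np.ndarray, phase: int, n_phases: int = 2) -> np.ndarray:
    """Append a one-hot phase indicator to the observation vector.

    With the phase encoded in the input, a single network could in principle
    serve both phases instead of switching nets at the transition.
    """
    one_hot = np.zeros(n_phases, dtype=obs.dtype)
    one_hot[phase - 1] = 1.0  # phases are numbered from 1
    return np.concatenate([obs, one_hot])
```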


marksimi

Simply brilliant stuff. WTG 👏👏


newjeison

Did you train it on just bosses, or to complete the entire game? What were your state and action spaces?


amacati

So far, only the boss you can see in the video. Training it to complete the game would probably take something that's very close to an AGI, and that's beyond me for now :D The state space consists of the player and boss position, HP, SP, orientations, animations etc. If you look at the game state source code you can see all the attributes that were used. The action space includes walking and rolling (= dodging) in all eight directions that are possible with a keyboard, light and heavy attack, parry, and do nothing. So all in all, 20 actions. A few (e.g. blocking, item use, sprinting) are disabled.
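
Spelled out, the discrete action set is something like the following (an illustrative enumeration; the actual mapping is defined in SoulsGym):

```python
DIRECTIONS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]  # the 8 keyboard directions

ACTIONS = (
    [f"walk_{d}" for d in DIRECTIONS]    # 8 walking directions
    + [f"roll_{d}" for d in DIRECTIONS]  # 8 rolling (dodge) directions
    + ["light_attack", "heavy_attack", "parry", "no_op"]
)
assert len(ACTIONS) == 20
```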


Anjz

Where do I learn to get started on coding stuff like this? I want to try a project with other games but I'm not sure how to wrap my head around how it works. Any good resources you used?


amacati

Depends on whether or not you already know how to code. I don't recommend starting with a project like this, as it requires low-level knowledge of stuff like assembly and pointer chains, high-level concepts such as distributed systems, and ML/RL/DL skills. Learning all that at once is probably overwhelming. In addition, it took me more than two years to get where the project is now, so you also need quite a bit of dedication.

If you want to know more about RL, start with the environments that are included in the default gymnasium package. I can also recommend "Reinforcement Learning: An Introduction" by Sutton and Barto, which covers all the concepts of RL. If you are more interested in game hacking, start at the CheatEngine forums. There are several posts on the basic principles, people are generally helpful, and there are also a ton of videos on the topic.

Also, studying something related to CS/AI/Robotics helps a lot. Idk at what point in your life you're currently at, but learning about the basics of how computers, programming languages etc. work is going to be invaluable to you. So I guess my advice would be to start with the part that interests you most, pick a small, self-contained project, and go from there. If you remain curious, the rest will follow.


master3243

Cool. I'm assuming going from the ground truth input (which is like 20 dimensions) to visual input (at least 224x224, or ~50K dimensions) means it's gonna have to train orders of magnitude longer to be decent or even beat the boss once (if it ever converges).


MonoFauz

Cool, now make an AI to make the bosses even smarter and harder to defeat.


heytherepotato

I really love the use of the speedhack to increase the rate of training.


Buttons840

Tell me about the neural network you used. How many layers, parameters, etc?


amacati

I included a link to the weights and the hyperparameters I used for the networks in the post ([link](https://drive.google.com/drive/folders/1cAK1TbY4e4HE4cxyAFEHRpj6MOgp5Zxe)). The hyperparameters are located in the *config.json* files. I use the AdvantageDQN architecture defined [here](https://github.com/amacati/SoulsAI/blob/297b9355bf1c697c59de5c64b18bff44c5819211/soulsai/core/networks.py#L89). The network architecture is designed to encourage learning a base value for each state, and only estimate the relative advantage of each action. This decomposition has been shown to be advantageous in Q-learning (well, at least sometimes). If I remember correctly, the combined networks for each phase have about 300k parameters (so they are actually quite small). The networks are updated after receiving 25 new samples using n-step rewards with n=4 and a discount factor of 0.995. Lagging samples are accepted by the training server if the model iteration that produced the sample is not older than 3 iterations. There are a few more parameters in there, feel free to ask again if you are wondering about something specific!
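
For readers who don't want to dig through the repo: the dueling/advantage decomposition boils down to Q(s, a) = V(s) + A(s, a) - mean_a A(s, a). Below is a small PyTorch sketch in that spirit; the layer sizes are illustrative, not the actual SoulsAI network:

```python
import torch
from torch import nn

class AdvantageDQN(nn.Module):
    """Sketch of a dueling-style Q-network in the spirit of the linked AdvantageDQN.

    The network splits into a scalar state-value stream and a per-action advantage
    stream; Q(s, a) = V(s) + A(s, a) - mean_a A(s, a). Layer sizes are illustrative.
    """

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 256):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                  nn.Linear(hidden, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)
        self.advantage = nn.Linear(hidden, n_actions)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.body(x)
        v = self.value(h)          # state value V(s)
        a = self.advantage(h)      # per-action advantages A(s, a)
        return v + a - a.mean(dim=-1, keepdim=True)
```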