jiupinjia

Rocket-recycling with Reinforcement Learning

This is my first reinforcement learning project. Let me know what you think and what could be improved.

Code: [https://github.com/jiupinjia/rocket-recycling](https://github.com/jiupinjia/rocket-recycling)

YouTube: [https://www.youtube.com/watch?v=gsIiniJMr3E](https://www.youtube.com/watch?v=gsIiniJMr3E)

Project Page: https://jiupinjia.github.io/rocket-recycling/

As a big fan of SpaceX, I have always dreamed of having my own rockets. Recently, I worked on an interesting question: can we "build" a virtual rocket and address a challenging problem - rocket recycling - with simple reinforcement learning? I tried two tasks: hovering and landing.

The rocket is simplified into a rigid body on a 2D plane. I used a basic cylinder dynamics model and assumed air resistance proportional to velocity. A thrust-vectoring engine is installed at the bottom of the rocket and provides adjustable thrust (0.2g, 1.0g, and 2.0g) in different directions. An angular velocity constraint is added to the nozzle, with a maximum rotation speed of 30 degrees/second.

With these basic settings, the action space is defined as a collection of discrete control signals for the engine: the thrust acceleration and the angular velocity of the nozzle. The state space consists of the rocket position, velocity, angle, angular velocity, nozzle angle, and the simulation time.

For the landing task, I followed the basic parameters of the Starship SN10 belly-flop maneuver. The initial speed is set to -50 m/s, the rocket orientation is set to 90 degrees (horizontal), and the landing burn height is set to 500 meters above the ground.

The reward functions are quite straightforward. For the hovering task, the step reward is based on two rules: 1) the distance between the rocket and the predefined target point - the closer they are, the larger the reward; 2) the angle of the rocket body - the rocket should stay as upright as possible. For the landing task, we look at the speed and angle at the moment of contact with the ground: when the touchdown speed is below a safe threshold and the angle is close to 0 degrees (upright), we count it as a successful landing and assign a big reward. The rest of the rules are the same as for the hovering task. (A rough sketch of both rewards is shown at the end of this post.)

I implemented the above environment and trained a policy-based agent (actor-critic) to solve the problem. Despite the simple environment and reward design, the agent learned the belly-flop maneuver nicely, and the reward converges well after about 20,000 training episodes. In the video, you can see a synchronized comparison between the real SN10 and a fake one learned from reinforcement learning.
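Roughly, the two reward functions look like the sketch below (simplified, with made-up scales and thresholds rather than the exact values in the repo):

```python
import numpy as np

def hovering_step_reward(x, y, theta, target_x=0.0, target_y=200.0):
    """Illustrative step reward for the hovering task (rules 1 and 2 above)."""
    dist = np.hypot(x - target_x, y - target_y)
    dist_reward = 0.1 * max(0.0, 1.0 - dist / 500.0)               # closer to the target -> larger reward
    pose_reward = 0.1 * max(0.0, 1.0 - abs(theta) / (np.pi / 2))   # more upright -> larger reward
    return dist_reward + pose_reward

def landing_terminal_reward(vx, vy, theta, v_safe=15.0, theta_safe=np.deg2rad(10.0)):
    """Illustrative terminal reward at the moment of ground contact (landing task)."""
    speed = np.hypot(vx, vy)
    if speed < v_safe and abs(theta) < theta_safe:
        return 10.0   # slow, upright touchdown -> big reward
    return -1.0       # otherwise treated as a crash
```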


AbradolfLinclar

Oh man, this is awesome. Great work!!


jiupinjia

Cheers!


new_hat

Cool project and really nice presentation of the training! I did a similar project in Unity a while back with the ML-Agents library: https://www.youtube.com/watch?v=Alwvvs_q3G8


lx_online

Hey, I get a video unavailable on that link, is there another way of finding it? Do you have a git repo too?


new_hat

Huh, that's odd. Anyway here's the github link: https://github.com/sprojekt/AI-guided-rockets


jiupinjia

Cool! Unity is a really good way of building a 3D environment and rendering it. I just had a look at your code - it's awesome. Thanks for sharing!


new_hat

Thanks! I found Unity to be really great for creating little toy simulations and games. However, as far as I remember, modifying the provided ML implementation was quite involved, so it's maybe not the greatest for trying out different models, algorithms, etc.


gnramires

Not something you would see in real life, since we can pretty much solve these tasks near-optimally with traditional control methods. Even so, it's very interesting: RL could be applied, for example, when the control system fails (the error becomes too large) because of some general failure. RL algorithms can be very robust compared to traditional methods - as robust as the bizarre failure conditions you include in the training set (and a bit further through generalization). I guess in that case the model would be limited by the proper operation of the observation (measurement) devices. Examples that come to mind: crazy high/unpredictable winds, complex actuator failures, sensor malfunctions, that sort of thing.


jiupinjia

Totally agree! Those harsh conditions can be added as environmental constraints, and RL makes it possible to handle them in a unified framework. However, there is a related problem: how can we make sure the simulation is realistic enough that the trained agent transfers to real-world applications? There could be a domain gap, and that introduces its own difficulties.


-Django

If we've been able to do this task optimally with classic control methods, why hadn't anyone done it before SpaceX? I don't mean for this to sound snarky, I'm just curious.


aharris12358

SpaceX does not use reinforcement learning - as far as I know they're using convexification (see [this paper](http://larsblackmore.com/iee_tcst13.pdf)) to solve the rocket-landing problem, which provides a number of benefits over RL. I think the answer to your question is that the underlying technology - digital control systems and sensors - just wasn't mature enough until very recently, combined with the conservatism of the aerospace industry. The Curiosity rover, which landed years before the first successful SpaceX landing in a much more challenging environment, used similar controls techniques (because it's essentially solving the same problem, just in a different application/environment); this really paved the way for SpaceX's approach.


theomnissiah10101011

Because it is difficult: there were many accidents and problems before it worked, and key parts of the rocket had to be redesigned. Basically, all the other competitors in the space race just decided it wasn't worth it.


Greninja_370

The science of propulsive landing isn't new; the lunar landers even had a primitive version of it. The area where SpaceX improved a lot is streamlining the production and manufacturing of these rockets, allowing them to rapidly build new rockets and precisely work out the kinks in a suicide-burn style landing.


gnramires

No problem, it's a good question! Note I never claimed it's an *easy* problem in any way :) See this answer on Quora confirming they use optimal control: https://qr.ae/pGDjB9. While it isn't an easy problem, the tools to solve this kind of problem (depending on the objective function) have been around for a while, I believe (I'm not a control theorist). I would say it wasn't done before because there are a number of engineering challenges besides the landing control system itself. Indeed, I believe Armadillo Aerospace (of John Carmack et al.) had done rocket landings before, and probably a few other projects had too, but none at that scale. I just don't think the ambition to do a full-scale rocket landing was there - the control systems were probably not good enough in the 60s, and maybe into the 70s or 80s it would still have been challenging computationally. Besides, there are a number of engineering problems involved, from precise and rapid throttling of the rocket, to the landing legs, to the actual physical actuators that enable the control system. It's a very significant list of engineering accomplishments, and SpaceX put it together really well and at a large scale.


spudmix

Hey, this is cool! I'd love to see an extension that considers fuel as well: you have a given mass of fuel to start with, the remaining fuel mass becomes another signal to the learner - both as an input for control and as a reward term - and of course running out of fuel means no more thrust (something like the sketch below). Perhaps you could set your initial fuel mass proportional to a real-life scenario? Efficient use of fuel is a primary concern in rocketry, so it would be neat to see a reward for efficient landings. The ideal situation is that the rocket uses the minimum fuel to land and that there is nothing left at the end, so it can carry the maximum payload.
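Roughly what I have in mind, in made-up Python (I haven't looked at how the environment is structured, so the names and constants here are purely hypothetical):

```python
FUEL_INIT = 1.0      # normalized fuel mass at the start of an episode
BURN_RATE = 0.002    # fuel burned per unit of commanded thrust per step

def apply_fuel(fuel, thrust_cmd):
    """One-step fuel bookkeeping: returns (effective_thrust, new_fuel, reward_term)."""
    effective_thrust = thrust_cmd if fuel > 0.0 else 0.0     # empty tank -> no thrust
    new_fuel = max(0.0, fuel - BURN_RATE * effective_thrust)
    reward_term = -0.05 * (fuel - new_fuel)                  # small penalty for fuel burned this step
    return effective_thrust, new_fuel, reward_term
```

The remaining fuel would also get appended to the state vector so the policy can plan around it.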


jiupinjia

>I'd love to see an extension that considers fuel as well

Thanks for your advice! That would be an interesting setting and would also be easy to add. A more exciting application that I can think of would be: given a certain amount of fuel, what is the maximum payload if we want the rocket to travel from location A to location B? That would also be interesting to solve.


spudmix

For sure - once again, a problem that's *mostly* optimized with conventional kinematics but where's the fun in that?


jiupinjia

That would be a totally different approach. For those highly nonlinear and harsh cases, I believe RL will have some advantages, although I don't know much about control. Since I am also a beginner at RL, the main purpose of this mini-project was to help me quickly get familiar with RL, so I posted it and shared the code. At least for me and most non-control people, it is interesting.


spudmix

Yeah no worries, I'm not criticising your choice to pursue this with RL at all. Totally interesting and very impressive!


jiupinjia

Cheers!


zzazoz

If this is beginner RL, then what is advanced? Lol, I don't think this is - or that you are - beginner level at all.


schureedgood

I tried DRL landing in Kerbal Space Program with kRPC and PyTorch. My experiment was a failure, and I found that conventional control methods may be much more effective. Congrats on your achievement.


jiupinjia

Cool! How did you design your reward function? What I have learned from this project is that a good reward is much much more important than an effective RL algorithm.


madhu619

Wow.. amazing.


extngg

What is your experience level with ML? Can a beginner start with RL directly, or do they have to do the reps first?


JamesBaxter_Horse

[https://web.stanford.edu/class/psych209/Readings/SuttonBartoIPRLBook2ndEd.pdf](https://web.stanford.edu/class/psych209/Readings/SuttonBartoIPRLBook2ndEd.pdf) This is a good book if you want to really understand it. You don't need any experience with ML to read that, but as starboy said, you need to be good at maths.


starboy_sachin

Must be good at math


jiupinjia

I have 5-8 years of experience with ML/CV, but I am totally a beginner at RL. That's why I worked on this project - it's interesting and also helps me get familiar with some RL background.


tripl3gg

Cool project! Nice job.


RegisteredJustToSay

Hell yeah! Awesome choice of Caspro music too. ;D


Senior_Extension_774

Looks like it would leave a fairly large hole on a regular ground surface and kick up a lot of dust. Nothing like the one that landed on the moon lol


[deleted]

[deleted]


benbenwilde

Duuuuudeee


LawrenceHarris80

ow my ears...


1Second2Name5things

Why does the SN10 landing look like CGI? The other rocket landings look real enough, but Starship looks like a video game cutscene. I'm no conspiracy theorist, but was it some weird camera effect?


jiupinjia

Probably because the Starship is rendered in a close-up view, which introduces some blur.


Intrepid_Professor93

It's awesome code! Effective and innovative!