
Reinforcement Learning for Autonomous Exoplanetary Landing Systems


It's been a while since my last post, but I've been working on a pretty big project. Please note that this post is going to be different from usual. I won't be including any specific code from this completed project, since that would make this post far longer than I intend. Instead, I will simply be discussing the theory behind my project and the results I acquired. Since I'm mostly switching the focus of my projects from computer science to physics, expect most of my future posts to follow this format - more theory and less code.

As long-range autonomous space exploration becomes more prevalent, the need for efficient and reliable autonomous exoplanetary landing systems, especially for previously unmapped terrain, is becoming increasingly crucial. To address this need, I proposed a novel approach to training autonomous space vehicles to land on variable terrain using value-based reinforcement learning techniques. In my experiment, I generated terrain procedurally from a noise function and demonstrated the effectiveness of the proposed approach in allowing autonomous spacecraft to land in remote locations.

Compared to existing self-landing autonomous space vehicles, which primarily rely on pre-programmed trajectory planning, the proposed approach enables greater flexibility and adaptability in responding to unforeseen situations. 

The simulated lander is a robust single-stage spacecraft, specifically designed for autonomous exoplanetary landing. It has one main engine and 16 RCS thrusters mounted in 4 clusters. If you haven't noticed yet, it's meant to strongly resemble the LEM from the Apollo missions, but smaller and lighter. The specific design of the lander shouldn't matter too much, though, since the general theory is meant to be applicable to many different spacecraft.
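To make the control problem concrete, here is a minimal sketch of how that actuator layout could be exposed to an agent as an action vector. The names and the flattening scheme are my own illustrative assumptions, not the exact interface used in the project.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class LanderAction:
    """Illustrative action vector for the lander (names are hypothetical).

    main_throttle: 0.0-1.0 throttle for the single main engine.
    rcs:           firing state (0 or 1) for each of the 16 RCS thrusters,
                   grouped as 4 clusters of 4.
    """
    main_throttle: float
    rcs: np.ndarray  # shape (4, 4), one row per cluster

    def as_vector(self) -> np.ndarray:
        """Flatten to a 17-dimensional vector for the learning agent."""
        return np.concatenate(([self.main_throttle], self.rcs.ravel()))
```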

Unity, the Unity ML-Agents package, and PyTorch were used to train the lander. For procedural terrain, I used generative meshes and an iterative, octave-based Perlin noise algorithm, with a total simulated terrain size of 2500 square meters (see my previous post on procedural terrain). A new terrain mesh was generated each episode using a randomized seed. The agent was trained in a -1.62 m/s^2 gravitational field, simulating lunar gravity.
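For readers who haven't seen the earlier terrain post, here is a rough sketch of the octave-based (fractal) noise idea behind the heightmap. It uses the `noise` package's `pnoise2` for the base Perlin layer, and the parameter values are illustrative rather than the exact ones from my generator.

```python
import numpy as np
from noise import pnoise2  # pip install noise


def octave_height(x: float, z: float, octaves: int = 4,
                  base_freq: float = 0.01, persistence: float = 0.5,
                  lacunarity: float = 2.0, seed: int = 0) -> float:
    """Sum several Perlin octaves: each octave raises frequency (lacunarity)
    and lowers amplitude (persistence), giving broad hills plus fine detail."""
    height, amplitude, frequency = 0.0, 1.0, base_freq
    for _ in range(octaves):
        height += amplitude * pnoise2(x * frequency, z * frequency, base=seed)
        amplitude *= persistence
        frequency *= lacunarity
    return height


# Sample a small heightmap with a randomized seed, as is done each episode.
rng = np.random.default_rng()
seed = int(rng.integers(0, 1024))
heightmap = np.array([[octave_height(x, z, seed=seed)
                       for x in range(50)] for z in range(50)])
```

In the actual project the heights feed a generated Unity mesh, but the layering idea is the same.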

The reinforcement learning process is a Monte Carlo-style, value-based algorithm formulated as a Markov decision process. The policy

$$\pi : S \to A$$

maps each state to an action.
The reward for each episode is given by
$$r_i = r_L + r_F - r_X - r_V - r_B$$
where $r_L$ is the reward for a collision on the bottom of the landing legs, $r_F$ is a reward proportional to the fuel remaining after landing, $r_X$ is the penalty for collisions with the chassis, $r_V$ is the penalty for the velocity at touchdown, and $r_B$ is the penalty for leaving the bounds of the experiment.
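As a simplified sketch, the per-episode reward could be assembled roughly like this; the weighting constants and helper names are illustrative assumptions, not the values used in training.

```python
def episode_reward(landed_on_legs: bool, fuel_fraction: float,
                   chassis_collision: bool, landing_speed: float,
                   out_of_bounds: bool) -> float:
    """Combine the terms r_L + r_F - r_X - r_V - r_B.

    All constants here are placeholders; the real shaping was tuned empirically.
    """
    r_l = 10.0 if landed_on_legs else 0.0      # touchdown on the landing legs
    r_f = 5.0 * fuel_fraction                  # proportional to remaining fuel
    r_x = 10.0 if chassis_collision else 0.0   # chassis hit the terrain
    r_v = 2.0 * landing_speed                  # penalize hard landings
    r_b = 20.0 if out_of_bounds else 0.0       # left the experiment bounds
    return r_l + r_f - r_x - r_v - r_b
```

Keeping the bonus terms positive and the penalty terms subtracted matches the sign convention of the formula above.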
The state-value function is given by

$$V^\pi(s) = \mathbb{E}_\pi\!\left[\sum_{i=1}^{T} \gamma^{\,i-1} r_i\right] \quad \forall\, s \in S$$
and the Q-function is updated according to

$$Q^\pi_{\text{new}}(s,a) = (1-\alpha_L)\,Q^\pi_{\text{old}}(s,a) + \alpha_L\!\left[r_{i+1} + \gamma \max_{a'} Q^\pi_{\text{old}}(s',a')\right]$$

where $\alpha_L$ is the learning rate and $s'$ is the state reached after taking action $a$ in state $s$.
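To make the update rule concrete, here is a minimal tabular sketch of the same value-based update. The actual project approximates Q with a neural network through Unity ML-Agents and PyTorch, so treat this as an illustration of the math rather than the training code.

```python
import numpy as np


def q_learning_update(q_table: np.ndarray, state: int, action: int,
                      reward: float, next_state: int,
                      alpha: float = 0.1, gamma: float = 0.99) -> None:
    """Apply Q(s,a) <- (1-alpha)*Q(s,a) + alpha*[r + gamma * max_a' Q(s',a')]."""
    target = reward + gamma * np.max(q_table[next_state])
    q_table[state, action] = (1 - alpha) * q_table[state, action] + alpha * target
```

The greedy policy is then simply $\pi(s) = \arg\max_a Q(s,a)$, which ties the learned Q-values back to the state-to-action mapping defined above.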
It took a total of thirty-six hours, with time acceleration and parallel processing, for the model to train on my local server, with 1 million training episodes completed in total. I obtained the following results.

The final success rate was 94.6%, which is certainly less than ideal for a real-world space mission, but I believe this number could be increased with more training time, better hardware, and better optimization. Nevertheless, the average reward shows a strong positive trend over the course of training. In the future, I hope to continue this project by implementing imitation learning and by using a convolutional neural network to generate more realistic procedural terrain from existing data.

Thanks for reading!
