This article and its images were originally published by The Register on July 30, 2018 at 06:39PM.
Human hands are surprisingly dexterous. Difficult tasks like knitting or playing the piano may take some practice, but according to the latest findings we get the hang of them much more quickly than robots do.
Researchers at OpenAI trained a robot system, Dactyl, with about a hundred years of simulated experience to teach it how to rotate a cube into different orientations. Dactyl uses a Shadow Dexterous Hand, a robot hand with five fingers, force sensors, and 24 degrees of freedom – pretty close to the human hand’s 27 degrees of freedom.
Here’s a video of Dactyl in action. The cube has a different letter and colour on each of its six faces, and Dactyl has to figure out how to manipulate the cube so that it matches the target configuration it is given.
Over time, it discovers certain techniques often used by humans, like gripping the cube between the thumb and little finger and spinning the cube around with its other fingertips.
The perils of machine learning
What’s more interesting, perhaps, is the way Dactyl was trained. Deep learning algorithms have helped scale up the process of training in perfect simulated environments, but it’s often difficult to apply that to the messy real world.
Dactyl, however, was more robust. Despite being trained only in simulation, it was able to directly transfer what it learnt to a real robotic hand. The trick is to use a method dubbed “domain randomization.” The technique isn’t new; other researchers have been exploring it for a while as a way to close the simulation-to-reality gap in robotics.
There is still a bit of a gap, however. The robot performed better in simulation, with a median of 50 successes compared to 13 in the physical setup, according to results published in an arXiv paper.
The robot is trained on a range of simulated environments in which variables such as surface friction, the size of the object, lighting conditions, hand poses, textures and even the strength of gravity are changed randomly.
“Randomized values are a natural way to represent the uncertainties that we have about the physical system and also prevent overfitting to a single simulated environment. If a policy can accomplish the task across all of the simulated environments, it will more likely be able to accomplish it in the real world,” OpenAI explained in a blog post.
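The idea above can be sketched in a few lines: before each simulated episode, the physics and rendering parameters are resampled from broad ranges, so no single environment configuration can be overfit to. This is only an illustrative sketch – the parameter names and ranges below are our own assumptions, not OpenAI's actual values.

```python
import random
from dataclasses import dataclass


@dataclass
class SimParams:
    """Physics and rendering parameters sampled fresh for each episode."""
    friction: float         # surface friction multiplier
    cube_size: float        # object scale factor
    gravity: float          # vertical acceleration, m/s^2
    light_intensity: float  # rendering brightness multiplier


def sample_randomized_params(rng: random.Random) -> SimParams:
    """Domain randomization sketch: draw each parameter from a broad
    range around its nominal value (ranges here are made up)."""
    return SimParams(
        friction=rng.uniform(0.5, 1.5),
        cube_size=rng.uniform(0.95, 1.05),
        gravity=rng.uniform(-10.8, -8.8),   # varied around -9.8 m/s^2
        light_intensity=rng.uniform(0.5, 1.5),
    )


# Each new episode gets its own randomized world.
rng = random.Random(0)
params = sample_randomized_params(rng)
```

A policy that succeeds across all of these sampled worlds has, in effect, never been allowed to rely on any one simulator quirk, which is what makes the transfer to real hardware plausible.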
Dactyl was able to rack up so many hours of experience in such a short time by using Rapid, a system that trains 384 “worker machines,” each with 16 CPU cores, with the Proximal Policy Optimization (PPO) algorithm. Each worker machine runs a simulation of the Shadow Dexterous Hand drawn from a distribution of randomized simulations.
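The architecture can be miniaturized as follows: a fleet of workers, each running its own randomized simulation, feeds experience to one central optimizer. This is a toy sketch of the pattern, not Rapid itself – the dummy environment and the worker count are our assumptions.

```python
import random


def make_worker(seed: int):
    """Each 'worker' stands in for one randomized simulation instance
    (Rapid ran 384 machines x 16 cores; here, just a seeded closure)."""
    rng = random.Random(seed)

    def rollout(policy_param: float) -> float:
        # Dummy environment: reward peaks when the policy parameter
        # matches a randomized per-rollout target, standing in for
        # varied physics across workers.
        target = rng.uniform(-1.0, 1.0)
        return -(policy_param - target) ** 2

    return rollout


# The worker fleet, each with its own randomization seed.
workers = [make_worker(seed) for seed in range(16)]

# The central optimizer aggregates experience from every worker each
# step, as Rapid's PPO optimizer does with its rollout fleet.
policy_param = 0.0
avg_reward = sum(w(policy_param) for w in workers) / len(workers)
```

The key property is that the optimizer only ever sees the aggregate: it never learns which worker produced which rollout, so it cannot specialize to any one simulated world.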
A general training system
The system uses two different neural networks: one tracks the cube’s position from images, and the other predicts future rewards given the current state. PPO is a reinforcement learning algorithm, and Dactyl learns the best strategies to manipulate the cube by chasing points as it completes tasks, with a 5-point bonus for success and a 20-point penalty for failure.
Using Rapid and PPO together was similar to the setup used to train OpenAI’s Dota bots, but applied to a different architecture and environment with tweaked hyperparameters.
“After we saw the success of the Dota team with their 1v1 bot, we actually asked them to teach us the ways of Rapid, and we reached parity with our previous learning infrastructure – which we’d spent months building – after only a couple of weeks,” Jonas Schneider, member of the technical staff at OpenAI, told The Register.
“Still, we were pretty surprised to see that we can even use the exact same optimizer code, and treat Rapid as a black-box optimizer for a simulation problem that’s completely different from the Dota problem it was developed for.”
At the moment, Dactyl can’t do much beyond rotating objects. It can handle objects of different shapes, such as an octagonal prism, but it struggled more with spheres.