By Torie Wells

A wide-eyed, soft-spoken robot named Pepper motors around the Intelligent Systems Lab at Rensselaer Polytechnic Institute. One of the researchers tests Pepper, making various gestures as the robot accurately describes what he’s doing. When he crosses his arms, the robot identifies from his body language that something is off.

“Hey, be friendly to me,” Pepper says.

Pepper’s ability to pick up on non-verbal cues is a result of the enhanced “vision” the lab’s researchers are developing. Using advanced computer vision and artificial intelligence, the team is improving the ability of robots like this one to interact naturally with humans.

“What we have been doing so far is adding visual understanding capabilities to the robot, so it can perceive human action and can naturally interact with humans through these non-verbal behaviors, like body gestures, facial expressions, and body pose,” said Qiang Ji, professor of electrical, computer, and systems engineering, and the director of the Intelligent Systems Lab.

With the support of government funding over the years, researchers at Rensselaer have mapped the human face and body so that computers, with the help of cameras built into the robots and machine-learning technologies, can perceive non-verbal cues and identify human action and emotion.

Among other things, Pepper can count how many people are in a room, scan an area to look for a particular person, estimate an individual’s age, recognize facial expressions, and maintain eye contact during an interaction.

Another robot, named Zeno, looks more like a person and has motors in its face, making it capable of closely mirroring human expressions. The research team has been honing Zeno’s ability to mimic human facial communication in real time, right down to eyebrow and even eyeball movement.

Ji sees computer vision as the next step in developing technologies that people interact with in their homes every day. Currently, most popular AI-enabled virtual assistants rely almost entirely on vocal interactions.

“There’s no vision component. Basically, it’s an audio component only,” Ji said. “In the future, we think it’s going to be multimodal, with both verbal and nonverbal interaction with the robot.”

The team is working on other vision-centered developments, like technology that would be able to track eye movement. Tools like that could be applied to smartphones and tablets.

Ji said the research in his lab is currently supported by the National Science Foundation and the Defense Advanced Research Projects Agency. In addition, the Intelligent Systems Lab has received funding over the years from public and private sources including the U.S. Department of Defense, the U.S. Department of Transportation, and Honda.

What Ji’s team is developing could also be used to make roads safer, he said, by installing computer-vision systems into cars.

“We will be able to use this technology to ultimately detect if the driver is fatigued, or the driver is distracted,” he said. “The research that we’re doing is more human-centered AI. We want to develop AI, machine-learning technology, to extend not only humans’ physical capabilities, but also their cognitive capabilities.”

That’s where Pepper and Zeno come in. Ji envisions a time when robots could keep humans company and improve their lives. He said that is the ultimate goal.

“This robot could be a companion for humans in the future,” Ji said, pointing to Pepper. “It could listen to humans, understand human emotion, and respond through both verbal and non-verbal behaviors to meet humans’ needs.”