Buster - A Voice Controlled Raspberry Pi Robot Arm


Buster is a fully voice interactive robot built around a tabletop robotic arm. He acts on commands given in spoken English, such as: LIFT THE ARM TWO CENTIMETERS or CLOSE THE GRIPPER. He also answers questions about his status, such as HOW HIGH IS THE ARM or WHAT IS THE ANGLE OF THE ELBOW.

Buster accepts considerable variety in the syntax of the spoken commands, recognizing that LIFT THE ARM TWO CENTIMETERS is roughly equivalent to MOVE ARM TWENTY MILLIMETERS UP. Buster's responses also include a variety of messages about the system. For example, he will tell you if a command would put the arm outside its range of motion, and will decline to follow it. As a bonus, Buster will also engage in brief conversation. He can answer questions from a small knowledge database, and he can act as a talking calculator, giving the answers to some basic calculations.

As a project, Buster is both affordable and accessible. He is built around the Raspberry Pi and other inexpensive, garden variety hobbyist parts, and programmed using PocketSphinx and other open source tools and libraries. To that end, he represents not only a certain level of personal achievement, but also the truly exciting state of the robotics hobby. With the availability of increasingly capable Linux-based single board computers and mature libraries, robots with solid speech recognition and synthesis are now within every hobbyist's reach.

Buster in Action

Configuration Details


Electronics
  • Raspberry Pi 2 running the Raspbian Jessie operating system
  • Arduino (ATMEGA328P in a "barebones" setup)
  • Logitech C270 USB Webcam (currently only using the microphone)
  • Generic powered speaker

Mechanical Hardware

  • MeArm v 0.4


Software
  • C++, compiled with GCC 4.9 (GNU Compiler Collection)
  • PocketSphinx library for voice recognition
  • Flite library for speech synthesis

Natural Language Processing

Buster's natural language processing capabilities can be broken down into two major, distinct phases. The first phase is speech recognition, during which speech captured by the microphone is analyzed and decoded to text. The second phase is parsing, during which the decoded speech is further analyzed for its syntax in order to deduce its meaning.

For speech recognition, the system relies on the PocketSphinx decoder and related resources from the CMUSphinx speech recognition toolkit - a free and open source library from Carnegie Mellon University. Speech recognition technology remains a tricky business, and selecting a tool was not a cut-and-dried process. PocketSphinx seemed to strike a good balance among various priorities. I specifically wanted the project to be self-contained, which ruled out a whole range of web-based speech APIs where the actual decoding is handed off to a remote server (including Google's famous free tool). The Sphinx project has a long history - more than 20 years - and is under active development with a large user base. PocketSphinx is specifically designed to target smaller, embedded systems. It is very Linux-friendly, and a number of other successful Raspberry Pi projects demonstrate its use and effectiveness on the platform.

Although PocketSphinx is capable of recognizing any word by its phonetics, the implementation here is not based on an unlimited vocabulary. Rather, a corpus of phrases and sentences is developed around the anticipated domain in which the system will operate. With PocketSphinx, speed is related to the size and complexity of the corpus, and accuracy is related to its quality. Building and optimizing a corpus can become a big undertaking in itself.
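As an illustration of what such a corpus looks like (these lines are hypothetical examples in the spirit of Buster's commands, not his actual corpus), the input is simply a plain text file with one utterance per line. CMU's web-based Sphinx Knowledge Base Tool (lmtool) can turn a file like this into the language model (.lm) and pronunciation dictionary (.dic) files that PocketSphinx loads:

```
MOVE THE ARM UP TWO CENTIMETERS
LIFT THE ARM FIVE MILLIMETERS
ROTATE THE ARM TO THE LEFT TWENTY FIVE DEGREES
CLOSE THE GRIPPER
HOW HIGH IS THE ARM
WHAT IS THE ANGLE OF THE ELBOW
```

The resulting files are then handed to the decoder, for example with pocketsphinx_continuous -inmic yes -lm buster.lm -dict buster.dic (file names here are placeholders).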

Buster's speech parsing routines make up a large part of the project's uniqueness. The parsing relies heavily on the text analysis and manipulation capabilities of regular expressions - often abbreviated regex. Regular expressions are sometimes described as a language within a language, with many programming languages offering extensions and support for regex use. Regexes excel at teasing patterns from strings of text - a very common real-world example is validating that an e-mail address entered on a web form is in the proper format.

Within Buster's command parser, regexes are put to work sifting through the sequences of words, looking for syntax patterns and extracting key information for processing. A single regex can match a great variety of potential commands - both MOVE ARM UP FIVE CENTIMETERS and ROTATE THE ARM TO THE LEFT TWENTY FIVE DEGREES, for example. In addition to matching, the regular expression engine also captures the items grouped with parentheses. With the application of some additional logic, the incoming command can be translated to a structure that looks somewhat like this:

Object:   ARM
Quantity: 25
Units:    DEGREES

Speech Synthesis

Buster's speech synthesis is provided by the Flite library, which, like PocketSphinx, hails from Carnegie Mellon. Flite is a lightweight version of the more substantial Festival speech synthesis tool, intended for small, embedded systems. The library is very easy to install and use. While it is possible to dig deeper, speech synthesis can be accomplished simply by including Flite in your program and passing it text to be spoken. There are a handful of voices to choose from, with the possibility of adding more. While the translation from text to speech is excellent, the actual quality of the voices can be slightly disappointing - part of the trade-off for efficiency.
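As a minimal sketch of that "pass it text" workflow (assuming the Flite development libraries and the cmu_us_kal voice are installed; this is not Buster's actual speech code), a complete talking program is only a few lines of C:

```c
#include <flite/flite.h>

/* The kal voice library provides this registration function;
   it is conventionally declared by hand in user code. */
cst_voice *register_cmu_us_kal(const char *voxdir);

int main(void) {
    flite_init();                                 /* one-time library setup */
    cst_voice *voice = register_cmu_us_kal(NULL); /* load the default voice */
    /* "play" sends audio straight to the sound device; a filename
       such as "out.wav" would write a WAV file instead. */
    flite_text_to_speech("The arm is now ten centimeters high.", voice, "play");
    return 0;
}
```

On a Raspbian system the link line looks something like gcc speak.c -lflite_cmu_us_kal -lflite_usenglish -lflite_cmulex -lflite -lasound -lm, though the exact libraries vary with the Flite version and voices built.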

It's noteworthy that for both PocketSphinx and Flite, one of the most challenging aspects of the installation process has proven to be the integration with the actual sound system for microphone input and speaker output. In the course of several months I've installed the packages on Raspbian Wheezy, Raspbian Jessie, and my desktop Ubuntu, each install with slightly different time-consuming trip wires related to configuring ALSA (Advanced Linux Sound Architecture).

Interfacing The Arm

Buster's arm is built from the open source MeArm, ordered as a kit from the creator/manufacturer Phenoptix (now MeArm Robotics). The kit includes all of the pre-cut acrylic parts and four standard 9-gram hobby servos. To interface with the arm I use an Arduino - actually a bare ATMEGA328P microcontroller set up as an Arduino and programmed with the Arduino IDE. While there are a number of libraries that would enable the Raspberry Pi to drive servos directly through the GPIO pins, for the moment I'm taking the general strategy, common in Pi projects, of offloading GPIO tasks to a microcontroller. I'm linking the Arduino and Pi via SPI using a method I describe in some detail here: Raspberry Pi to Arduino SPI Communication.

It required some work to figure out a good approach to controlling the arm movements. Technically all of the possible locations of the gripper can be described as a position in X-Y-Z coordinate space. However, in giving a command to Buster, I (or any other user) wouldn't necessarily know that position. So I decided it made more sense to give the arm commands relative to the current position, such as MOVE THE ARM UP TWO CENTIMETERS. I also decided, somewhat arbitrarily, that while up/down and forward/backward operations should move in a straight line, left/right operations would move in the natural arc formed by rotating the arm on its base.

With all the links and pivot points, at first glance it looks as if working out the movements of the arm might be hopelessly complex. Fortunately, with the parallel nature of the links, once you begin decomposing the mechanism things become much simpler. One chain of links (connected by the triangle link at the top) serves only to keep the gripper parallel to the ground and for most purposes can be removed from the calculations. As the remaining links are also in parallel, when all is said and done the correct angles for the "shoulder" and "elbow" servos for any given height and extension of the arm can be calculated using a little trigonometry to solve just two triangles.


Hopefully this project write-up has given you some inspiration for your own projects. Buster remains in active development with a roadmap that includes such goodies as visual object recognition and the ability to take extensive multi-statement commands. Of course as with any tech roadmap, the individual tasks are easier said than done. In the meantime I'll be working over the coming months to add more pictures and videos, as well as some more detailed write-ups and related tutorials.

January 28, 2016

Links and Resources

Buster's Project Page at RobotRebels.org

CMU Sphinx including PocketSphinx


MeArm Robotics

About the Author: Ralph Heymsfeld is the founder and principal of Sully Station Solutions. His interests include artificial intelligence, machine learning, robotics and embedded systems. His writings on these and other diverse topics appear regularly here and across the Internet.

Other Posts

An Arduino Neural Network
An artificial neural network developed on an Arduino Uno. Includes tutorial and source code.

Haar LBP and HOG - Experiments in OpenCV Object Detection
I've spent some time lately coming up to speed and playing with OpenCV - especially the object detection routines. Three that caught my eye for further investigation were Haar Cascades, Local Binary Patterns (LBP), and Histogram of Oriented Gradients (HOG).

iCE40 and the IceStorm Open Source FPGA Workflow
Project IceStorm is the first, and currently only, fully open source workflow for FPGA programming. Here, the software and hardware are discussed and a small sample project implemented.

Back to Basics
After spending quite a while exploring various approaches to walking robots and other mechanical conundrums, I'm turning my attention to machine learning and building a simple but robust platform to experiment with neural networks.

Migrating to the 1284P
The ATMEGA1284P is one of the more capable microcontrollers available in the hobbyist and breadboard-friendly 40-pin PDIP package. Here I discuss migrating the neural network project to the 1284P to take advantage of its relatively generous 16K of RAM.

Getting Up and Running With a Tamiya Twin-Motor Gearbox
Tamiya makes a full line of small gearbox kits for different applications that are capable for their size and offer an easy, economical way to get a small to medium size wheeled robot project up and running.

Flexinol and other Nitinol Muscle Wires
With its unique ability to contract on demand, Muscle Wire (or more generically, shape memory actuator wire) presents many intriguing possibilities for robotics. Nitinol actuator wires are able to contract with significant force, and can be useful in many applications where a servo motor or solenoid might be considered.

Precision Flexinol Position Control Using Arduino
An approach to precision control of Flexinol contraction based on controlling the voltage in the circuit. In addition, taking advantage of the fact that the resistance of Flexinol drops predictably as it contracts, the mechanism described here uses the wire itself as a sensor in a feedback control loop.

LaunchPad MSP430 Assembly Language Tutorial
One of my more widely read tutorials. Uses the Texas Instruments LaunchPad with its included MSP430G2231 processor to introduce MSP430 assembly language programming.

K'nexabeast - A Theo Jansen Style Octopod Robot
K'nexabeast is an octopod robot built with K'nex. The electronics are built around a PICAXE microcontroller and it uses a leg structure inspired by Theo Jansen's innovative Strandbeests.