RoboAlan

The natural evolution of the Boosto project, I guess:

It’s a boosto of a friend of mine who happens to be a comedian too. So I really thought it would be fun to have a speaking boosto, which meant I first had to model the 3D face, and then learn some 3D modeling skills to accommodate a servo motor to move the mouth, a couple of LEDs for the eyes, and a speaker, all connected to a Raspberry Pi in the base, internet-controlled to speak with my friend’s voice.

The project has had its challenges, mainly fitting everything together and getting a very low-quality servo to properly move a poorly designed 3D-printed “mouthpiece”. It’s been a journey, but everything worked out in the end: this thing has not only been operating by mimicking my friend over the last few weeks, but it was also featured, and had some lines, in the last comedy show of the season. Incredible stuff.

Technically speaking, let’s go through the various steps of this little project, which took, on and off, about one month to complete.

I first 3D modelled the whole thing, starting with the AI-generated boosto head model. I then carved a hole inside and experimented with different designs to cut the mouth and have it driven by a servo motor. The latest design was this:

After designing it, multiple prototypes were needed to test the idea and make sure everything was working. Working with real stuff can be pretty tricky.

The head was designed to be open, so you could do maintenance and fit the speaker in the upper part.

After this, the base was designed to host the Raspberry Pi, with some holes to pass the cables up to the head for the speaker, the servo, and the LEDs for the eyes.

After too many tests, I finally started writing some software for the Raspberry Pi that would sit in the base. I wrote two simple services. The first listened to the audio card output (connected to the speaker) and opened the mouth roughly in sync with the audio – I actually experimented with multiple ways of doing it, in the end realizing I didn’t need all that complexity: simply opening and closing the mouth while some audio is playing was enough. The second polled a web server of mine, where, through another simple webapp, you could upload text or audio. If it was text, it would use espeak to TTS it in real time – fast and effective, but a low-quality, very robotic voice. If you uploaded audio, it would stream it and play it on the speaker directly. So I had a basic way of controlling it remotely, and it was working! But a new issue appeared:
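The “open the mouth while audio is playing” logic boils down to a loudness check on PCM frames. Here’s a minimal sketch of that idea in Python; the threshold, servo angles, and function names are my illustrative assumptions, not the original code, and the real service would feed frames from the sound card and drive the servo instead of just returning an angle.

```python
# Sketch of the mouth-sync decision: open the mouth whenever the current
# audio frame is louder than a silence floor. Assumes 16-bit little-endian
# mono PCM frames; all constants below are hypothetical values.
import struct

SILENCE_THRESHOLD = 500   # RMS amplitude below this counts as silence (assumption)
MOUTH_OPEN_ANGLE = 60     # hypothetical servo angle for "open"
MOUTH_CLOSED_ANGLE = 0    # hypothetical servo angle for "closed"

def rms(frame: bytes) -> float:
    """Root-mean-square amplitude of a 16-bit little-endian PCM frame."""
    samples = struct.unpack("<%dh" % (len(frame) // 2), frame)
    if not samples:
        return 0.0
    return (sum(s * s for s in samples) / len(samples)) ** 0.5

def mouth_angle(frame: bytes) -> int:
    """Decide the servo angle for one frame: open if loud, closed if quiet."""
    return MOUTH_OPEN_ANGLE if rms(frame) > SILENCE_THRESHOLD else MOUTH_CLOSED_ANGLE
```

In the actual service a loop would read frames from the audio output, call something like `mouth_angle`, and push the result to the servo’s PWM – crude, but as noted above, a plain open/close turned out to be convincing enough.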

The original Alan suggested RoboAlan could feature in his comedy show. Which was amazing, but it needed some more realtime-ish way of saying something. The current setup, with the polling and the streaming part, was quite slow, sometimes taking up to 8 seconds to actually play the thing after you clicked it in the app. So, the day before the comedy show, we spent a couple of late-night hours fixing this by introducing client-side caching of the audios (streaming was pretty costly, especially with the bad connectivity we were expecting at the comedy venue), and even some conversion from compressed audio like ogg/mp3 to uncompressed wav, trading disk space (we had plenty) for CPU cycles – a bit more constrained on the outdated Raspberry Pi I had sitting in my office.
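The caching-plus-conversion step can be sketched like this: derive a deterministic local filename for each audio, and convert it to wav once with ffmpeg so later plays hit the disk, not the network or the decoder. The cache directory, hash-based naming, and function names are my reconstruction for illustration, not the exact code we wrote that night.

```python
# Sketch of the client-side cache: one wav file per audio URL, converted
# once with ffmpeg. Paths and naming scheme are hypothetical assumptions.
import hashlib
import pathlib
import subprocess

CACHE_DIR = pathlib.Path("/var/cache/roboalan")  # hypothetical location

def cache_path(url: str) -> pathlib.Path:
    """Deterministic local wav filename for a remote audio URL."""
    digest = hashlib.sha256(url.encode()).hexdigest()[:16]
    return CACHE_DIR / (digest + ".wav")

def ensure_wav(src: pathlib.Path, dst: pathlib.Path) -> pathlib.Path:
    """Convert compressed audio (ogg/mp3) to uncompressed wav, only once."""
    if not dst.exists():
        dst.parent.mkdir(parents=True, exist_ok=True)
        # ffmpeg decodes whatever the input is and writes plain wav.
        subprocess.run(["ffmpeg", "-y", "-i", str(src), str(dst)], check=True)
    return dst
```

Trading a one-time ffmpeg decode (and some disk) for instant playback later is exactly the disk-space-for-CPU-cycles trade mentioned above: playing an uncompressed wav is essentially free even on an old Pi.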

Performance day was incredible: everything worked perfectly, even if with some delays still, probably due to the poor wifi setup we used.

I will try to update this post with a couple of videos when I can.