Amazon Alexa, meet Logitech Squeezebox

Posted November 2016


Alexa

Alexa is currently the hot topic (especially here, after the much-delayed UK launch), with Amazon’s voice assistant providing rich ground for consumer voice-driven AI in areas previously unthought of.

Their API, whilst a little smaller than I’d hoped, is growing fast, and imagination is often the bottleneck; hence, perhaps, their bribery, err, financial encouragement to developers to publish to the Skill Store.

Home automation

Having spent too goddamn much, err, a significant amount of my free time over the last few years automating almost everything I can at home, I jumped at the chance for a more futuristic interface. No matter how many nicer apps and interfaces I install, my internal UX voice weeps a little when witnessing the cognitive load needed to, say, turn on a light:

  1. switching on the phone
  2. authenticating
  3. avoiding distracting notifications
  4. finding and loading the app
  5. navigating an array of categories and switches
  6. interacting with the right one
  7. …and then verifying the results back in the real world (as it’s not quite reliable enough still).

That makes it ripe for a more natural UI, so how about voice control? Plenty of work is being done in the Smart Home skills area (a big enough topic for another post), but with music being my other passion, making it directly controllable was one of the first things I wanted to do. My home audio systems are all run using Squeezebox on Synology…

Squeezebox

Squeezebox Duet receiver

Logitech’s Squeezebox, now cruelly abandoned as a product, was (and is) a much-loved pioneer in the home network audio market. A lot of what attracted people to it (rather than, say, Sonos) is the active community and the open-source model. The Perl, not so much for me, but the server is very reliable and gets music metadata right far more often than most (think: multiple tags, i18n support, good classical music support such as composer and performer), plus reliable low-latency syncing.

In the past I’ve integrated this into Quod Libet, a project I’ve been involved in for some time, so I already had some spare code / knowledge of the Squeezebox TCP/IP API.
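For anyone who hasn’t met that API: it is a line-based text protocol, where each command is a player ID followed by space-separated terms, each percent-encoded, ending in a newline. Here is a minimal sketch of formatting and parsing such lines. The helper names `format_command` and `parse_response` are mine, purely for illustration, and the MAC address is made up.

```python
try:
    from urllib.parse import quote, unquote  # Python 3
except ImportError:  # Python 2.7, as used for the Lambda
    from urllib import quote, unquote


def format_command(player_id, *terms):
    """Build one CLI line: percent-encode each term, join with spaces, end with LF."""
    parts = [player_id] + [str(term) for term in terms]
    return " ".join(quote(part, safe="") for part in parts) + "\n"


def parse_response(line):
    """Split a response line on spaces and undo the percent-encoding."""
    return [unquote(part) for part in line.strip().split(" ")]


# Hypothetical player ID (a MAC address), turning shuffle on
line = format_command("00:04:20:12:34:56", "playlist", "shuffle", 1)
# line == "00%3A04%3A20%3A12%3A34%3A56 playlist shuffle 1\n"
```

The server echoes the command back in the same encoded form, so `parse_response` can be run on the reply to recover the player ID and terms.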

Writing the Skill

Platforms

I chose Python 2.7: Node is too, err, well, I don’t like plain JS very much at all (though ES6 is a lot nicer). Java 8 could be good, but the JVM startup hardly suits the concept of Lambda really. Hopefully Amazon will add Python 3.x as a runtime soon.

Going native: Alexa AudioPlayer interface

What’s not fun is trying to say “Alexa, ask Squeezebox to set shuffle on”. Try it! Go on. It’s almost tongue-twister territory. Aware this isn’t much better UX than my phone situation, I was fascinated to see whether there was scope for (ab)using the Alexa AudioPlayer interface to use “native” Alexa utterances. TL;DR: there is scope, I did, woohoo. It’s much nicer to just say “Alexa, pause”.
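To give a flavour of what that means in code, here is a rough sketch (not my actual skill) of a Lambda handler that maps Amazon’s built-in playback intents onto Squeezebox CLI terms. The intent names are Amazon’s standard ones; the `send` callback is a hypothetical stand-in for whatever actually talks to the server, and the spoken replies are invented.

```python
# Built-in Alexa playback intents mapped to Squeezebox CLI terms
INTENT_COMMANDS = {
    "AMAZON.PauseIntent": ["pause", "1"],
    "AMAZON.ResumeIntent": ["pause", "0"],
    "AMAZON.NextIntent": ["playlist", "index", "+1"],
    "AMAZON.PreviousIntent": ["playlist", "index", "-1"],
    "AMAZON.ShuffleOnIntent": ["playlist", "shuffle", "1"],
    "AMAZON.ShuffleOffIntent": ["playlist", "shuffle", "0"],
}


def build_response(text=None):
    """Minimal Alexa response; speech is optional so 'pause' can stay silent."""
    response = {"shouldEndSession": True}
    if text:
        response["outputSpeech"] = {"type": "PlainText", "text": text}
    return {"version": "1.0", "response": response}


def lambda_handler(event, context, send=None):
    """Entry point. `send` is a hypothetical hook taking a list of CLI terms."""
    request = event.get("request", {})
    if request.get("type") != "IntentRequest":
        return build_response("Squeezebox is ready.")
    command = INTENT_COMMANDS.get(request.get("intent", {}).get("name"))
    if command is None:
        return build_response("Sorry, I can't do that yet.")
    if send is not None:
        send(command)
    return build_response()
```

Keeping the mapping in a plain dictionary means adding another native utterance is a one-line change, and the handler can be exercised locally by passing a list’s `append` as `send`.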

Packaging Lambdas

As soon as the Lambda got bigger I realised using the web UI was a pain. You can upload a zip, but this is a bit too GUI for any “real developer”’s workflow. Enter Rackspace’s lambda-uploader (https://github.com/rackerlabs/lambda-uploader/), which is a great tool for automating this process, including support for pip and virtualenv.

However, it does tend to push everything including the kitchen sink (it pushes pip and the dependencies themselves): my project went from a few KB to a few MB. Forking it and making a small fix got me back to a fast turnaround, and in doing so I pushed a small bugfix PR for excluding files.
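The idea behind excluding files is simple enough to sketch with the standard library alone. To be clear, this is not lambda-uploader’s code or its configuration format, just an illustration of the technique; `build_zip` and the patterns are hypothetical.

```python
import os
import re
import zipfile


def build_zip(source_dir, zip_path, ignore_patterns=()):
    """Zip source_dir, skipping any relative path that matches an ignore regex."""
    ignore = [re.compile(pattern) for pattern in ignore_patterns]
    included = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(source_dir):
            for name in sorted(files):
                full = os.path.join(root, name)
                if os.path.abspath(full) == os.path.abspath(zip_path):
                    continue  # never pack the archive into itself
                rel = os.path.relpath(full, source_dir).replace(os.sep, "/")
                if any(rx.search(rel) for rx in ignore):
                    continue
                archive.write(full, rel)
                included.append(rel)
    return included
```

Calling it with patterns such as `r"\.pyc$"` and `r"^tests/"` keeps compiled files and tests out of the upload, which is where most of the bloat tends to come from.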

Security: hard, and not always fun

On the advice of a wise colleague, I chose the excellent but lesser-known stunnel to encrypt and sign the communication between the Lambda and my Squeezebox server. It’s now running as a service on my Synology, thanks to Majikshoe’s article on using Upstart on Synology, having first installed ipkg and then the stunnel package with it.

On the client side, I tried doing this at a higher level, but ended up needing to get down and dirty with SSL sockets directly in Python. Gnarly but fun, and I learned a lot in the process. This did mean I had to ditch any existing libraries for Squeezebox (pylms) as there would be too much hacking necessary to swap out the telnetlib dependency.
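For the curious, the client side boils down to something like the sketch below: build a TLS context that verifies the server and optionally presents a client certificate, wrap a plain socket, then send one line and read one line back. This is a simplified illustration rather than my actual code; the function names, file paths, host and port are all placeholders, and it needs Python 2.7.9+ or 3.x for `ssl.create_default_context`.

```python
import socket
import ssl


def make_tls_context(ca_file=None, cert_file=None, key_file=None):
    """Context that verifies the server; optionally presents a client certificate."""
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    if cert_file:
        # A client certificate lets the stunnel end check who is calling
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context


def send_command(host, port, line, context, timeout=5.0):
    """Send one newline-terminated command over TLS and return the reply line."""
    raw = socket.create_connection((host, port), timeout=timeout)
    tls = context.wrap_socket(raw, server_hostname=host)
    try:
        tls.sendall(line.encode("utf-8"))
        reply = b""
        while not reply.endswith(b"\n"):
            chunk = tls.recv(4096)
            if not chunk:  # server closed the connection
                break
            reply += chunk
        return reply.decode("utf-8").strip()
    finally:
        tls.close()
```

With a self-signed certificate on the home server, `ca_file` would point at that certificate, and the hostname used to connect has to match the one in it, since hostname checking is on by default.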

Overall this architecture feels like the right way to go for my situation; yours may of course vary. Others have approached, or are approaching, similar problems (connecting Sonos or Squeezebox to Alexa via The Cloud) in varying ways:

  • HTTP(S) Node Proxy to forward requests (probably among the best / easiest solutions)
  • Use Amazon SQS to push messages on a bus (interesting, but I worry about the latency, and hassle of setting this up)
  • Plugin for LMS that exposes a port for JSONRPC, having done some auth / validation.
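On that last option, it helps to know what a JSON-RPC request to LMS looks like. As far as I know, the stock server accepts a `slim.request` method whose parameters are the player ID and a list of CLI terms; whether such a plugin would keep exactly this shape is an assumption on my part, and `jsonrpc_payload` is just an illustrative helper.

```python
import json


def jsonrpc_payload(player_id, terms, request_id=1):
    """Serialise one LMS-style JSON-RPC call: player ID plus a list of CLI terms."""
    body = {
        "id": request_id,
        "method": "slim.request",
        "params": [player_id, list(terms)],
    }
    return json.dumps(body, sort_keys=True)


# Hypothetical player ID, pausing playback
payload = jsonrpc_payload("00:04:20:12:34:56", ["pause", "1"])
```

The appeal over the raw CLI is that the terms travel as a proper JSON list, so there is no percent-encoding to get wrong.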

Where now

Well, the plan is:

  • Push it all to GitHub, or Bitbucket (publicly)
  • Keep developing more Squeezebox integration including better voice commands (“Alexa, tell Squeezebox to play me some Jazz and Hiphop” might be nice)
  • Document how others might use it (this post I guess is a starting point).
  • Think about a publishable, multi-user skill. This would need to store user details, which makes me very uneasy, unless perhaps some kind of Alexa OAuth 2 flow in an LMS plugin could abstract this problem away. Or maybe not, I’m not sure:

    OAuth2 Flow

I get that it’s not a viable or user-friendly setup, but I’m confident the combination of security, low-ish latency, and native utterance support is an attractive proposition. But seeing it written down reminds me it was harder than I thought…