Voice OS

Live music for Voice

We were commissioned by the BBC to work on the BBC Skill for the Alexa smart speaker, producing numerous different concept pieces. Crafting a voice experience is about crafting conversation, but above all the design of a voice interface is designing how to deal with failure, specifically user failure.

The complexity grows with the number of options the user can be offered. In traditional interaction design, a designer can rely on some sort of mental model as a point of reference around which to apply and visualise hierarchy. When you’re constructing a conversation, it’s not about hierarchy, it’s about framing the conversation.


When users engage with a smart speaker they immediately use the wake word, and are then directed to what we would call a direct intent: “Ask the BBC to [x]”. This, of course, is after the user has decided to engage with the Alexa skill. The initial always-on invocation is “Alexa, open the BBC Skill”, and once they’ve reached that far, we introduce a sting: an intro clip outlining what the BBC Skill can do.
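To make that concrete, here is a minimal sketch of a launch handler in the Alexa Skills Kit SDK for Node.js that plays a sting and then frames the conversation. The clip URL and the prompt wording are placeholders, not the shipped BBC copy.

```typescript
import { HandlerInput, RequestHandler } from 'ask-sdk-core';
import { Response } from 'ask-sdk-model';

// Hypothetical handler: plays the intro sting, then frames the conversation.
const LaunchRequestHandler: RequestHandler = {
  canHandle(handlerInput: HandlerInput): boolean {
    return handlerInput.requestEnvelope.request.type === 'LaunchRequest';
  },
  handle(handlerInput: HandlerInput): Response {
    // Placeholder sting clip; in SSML, <audio> plays a short pre-recorded file.
    const sting = '<audio src="https://example.com/bbc-skill-sting.mp3"/>';
    const framing =
      "You can ask what's on at Glastonbury, hear a stage, or play an artist. " +
      'What would you like to do?';
    return handlerInput.responseBuilder
      .speak(`${sting} ${framing}`)
      .reprompt("You can say: what's on at Glastonbury.")
      .getResponse();
  },
};
```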


We set the terms of the conversation and direct the user forwards from there.

As an example, I’m going to use the concept skill we developed for live music events, focusing on BBC Glastonbury 2020.

Our research identified a commonality across music festivals: they all have wayfinding of some kind, with the stages as the key landmarks, and the talent is heavily marketed. So when users came to the event skill, we wanted to frame the conversation around stage or artist.
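A sketch of how that framing might appear in an Alexa interaction model; the intent names, slot types, sample utterances, and invocation spelling below are invented purely for illustration.

```typescript
// Illustrative interaction model framed around stage or artist.
// Intent and slot names are assumptions, not the shipped BBC model.
export const interactionModel = {
  languageModel: {
    invocationName: 'the b. b. c.',
    intents: [
      {
        name: 'PlayStageIntent',
        slots: [{ name: 'stage', type: 'STAGE_NAME' }], // hypothetical custom slot type
        samples: ["what's on at the {stage}", 'play the {stage} stage'],
      },
      {
        name: 'PlayArtistIntent',
        slots: [{ name: 'artist', type: 'AMAZON.MusicGroup' }],
        samples: ['play {artist}', 'is {artist} playing'],
      },
    ],
  },
};
```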

The complexity there was dealing with situations where no music was playing, and deciding what the system should respond with.


We tested the best-case hypothesis with a top-level user flow that always branched around stage choices, and tried variations of how the user might want to experience the stages in audio. The direct intent was “Alexa, ask the BBC, what’s on at Glastonbury”; the skill would play audio previews of what was currently playing, then prompt the user to ask whether they would like to play an artist or hear the stages again.

(“Ask the BBC” is the Amazon-required invocation phrase that activates a specific skill.)
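A hedged sketch of what a handler behind that flow could look like, again using the Alexa Skills Kit SDK for Node.js; the intent name, preview sources, and prompt copy are assumptions for illustration, not the production skill.

```typescript
import {
  HandlerInput,
  RequestHandler,
  getIntentName,
  getRequestType,
} from 'ask-sdk-core';
import { Response } from 'ask-sdk-model';

// Hypothetical "what's on" handler: plays short previews of each stage,
// then prompts the user to pick an artist or hear the stages again.
const WhatsOnIntentHandler: RequestHandler = {
  canHandle(handlerInput: HandlerInput): boolean {
    return (
      getRequestType(handlerInput.requestEnvelope) === 'IntentRequest' &&
      getIntentName(handlerInput.requestEnvelope) === 'WhatsOnIntent'
    );
  },
  handle(handlerInput: HandlerInput): Response {
    // Placeholder data; in practice this would come from a live schedule feed.
    const previews = [
      { stage: 'Pyramid Stage', clip: 'https://example.com/pyramid-preview.mp3' },
      { stage: 'Other Stage', clip: 'https://example.com/other-preview.mp3' },
    ];
    const ssml = previews
      .map(p => `On the ${p.stage}: <audio src="${p.clip}"/>`)
      .join(' ');
    const prompt = 'Would you like to play an artist, or hear the stages again?';
    return handlerInput.responseBuilder
      .speak(`${ssml} ${prompt}`)
      .reprompt(prompt)
      .getResponse();
  },
};
```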


It met the pass criteria with five of the six participants tested, using a simple Mechanical Turk setup: a playlist of sound recordings played based on the user’s choice, with additional prompts and questions to gauge whether the users understood what the system was doing.

We looked at dividing stage activity into Pre/Interval, Live, and Closed states, with different scenarios to cater to each one. The first iteration of the skill was as rough as a prototype could be: the team prototyped it in Dropbox Paper and co-created the script together to articulate the best-case journeys.
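As a rough illustration of that state-based branching (not the shipped logic), each stage state could map to a different response scenario; the copy below is placeholder wording, not the co-created script.

```typescript
// Stage activity states and the kind of response each scenario might get.
type StageState = 'pre' | 'interval' | 'live' | 'closed';

// Illustrative copy only.
function stageResponse(stage: string, state: StageState): string {
  switch (state) {
    case 'live':
      return `The ${stage} is live right now. Want to listen in?`;
    case 'pre':
    case 'interval':
      return `Nothing is playing on the ${stage} just yet. Shall I tell you who's on next?`;
    case 'closed':
      return `The ${stage} has finished for today. Would you like highlights from earlier?`;
  }
}
```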


With the proposition proven, we then did the diligence of modelling all the different scenarios in the branching logic and what happened in the specific contexts discussed previously. We also looked at the wider context and how that could apply to other music festivals.


All events typically have a pre-event phase and a post-event phase. Those phases were used as a framework to define the strategic purpose of each phase and what the user would expect from it.
The pre-event phase is affectionately known as the hype phase, where we frame the user’s expectations during this period of interaction with the skill and how the BBC meets them.

We would surface historical content and run-up content for headline acts, as well as past performances, lineup information, and entertainment news and rumours.
Finally, after the event, we would meet users’ needs and represent the BBC by re-living the event: revisiting key moments and maintaining the feeling through on-demand content.