064 - Jason Fields: Combining Voice and Visuals - Multimodal

Jason Fields, Chief Strategy Officer, Voicify

Jason Fields is Chief Strategy Officer at Voicify, a leading CMS (content management system) for designing voice experiences on Alexa and Google Assistant. Jason and Emily discussed what multimodal design means for voice assistants and why this kind of conversation design matters. How can brands create experiences that let customers interact with a voice assistant across devices with varying screen sizes, or no screen at all? It’s all about context.

Overall, the question becomes: How do we connect and organize a variety of communicable assets in a way that meets basic (and reasonable) audience expectations? Jason and Voicify have created a free downloadable guide about modality for brands.

Topics:

Amazon Echo Show 5 - an entry level multimodal smart speaker (voice + visual)

  • Multimodality in voice experiences

  • Johnnie Walker tasting Alexa skill - good example

  • Saucony is doing a nice job pairing audio responses with visual components, using emotive vs. instructive images at specific points in the conversation (a sign of sensitivity to multimodality)

  • Images should match the conversation tone (e.g. a dispassionate conversation about product features should be accompanied by a feature set image, not models wearing the product out in the world)

  • “How do I get to your store?” should show a map - seems obvious but isn’t being done often enough

  • Use case for multimodal experiences: a woman getting ready for a flight. The experience could present information to help her with packing, organizing, booking a car service, and checking flight time, traffic, terminal location, gate, and TSA status. Whatever information a device can display should take advantage of its screen space and contextual data such as location.

  • Voicify can detect the type of voice assistant device (such as an Echo Auto, smart TV, mobile phone, or smart watch) and respond appropriately based on context and device, even offering secondary information such as a gate update

  • Key: suss out what information is most useful to the user at that moment and how best to present it visually and with sound. The first step is mapping the user’s intention.

  • Brands have been assembling digital assets for twenty years; the libraries are vast, so assigning a framework to those assets should be simple

  • Jason’s podcast recommendation: Armchair Expert with Dax Shepard
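The device-aware response selection discussed above can be sketched in code. This is a hypothetical illustration, not Voicify’s actual API: all names (`DeviceContext`, `choose_response`, the device classes and flight fields) are invented for this example. The idea is that one piece of content is rendered differently depending on the device’s screen capability, with secondary information (such as a gate update) reserved for larger screens.

```python
# Hypothetical sketch of device-aware response selection for a
# multimodal voice experience. Names and device classes are
# illustrative only, not Voicify's actual API.
from dataclasses import dataclass


@dataclass
class DeviceContext:
    name: str
    has_screen: bool
    screen_size: str  # "none", "small", or "large"


def choose_response(context: DeviceContext, flight: dict) -> dict:
    """Pick which assets to return based on device capability."""
    # Every device gets the spoken response.
    response = {"speech": f"Your flight departs from gate {flight['gate']}."}
    if context.has_screen:
        # "How do I get there?" should show a map when a screen exists.
        response["image"] = "terminal_map.png"
        if context.screen_size == "large":
            # Larger screens get secondary info, e.g. a gate/TSA update.
            response["card"] = {
                "gate": flight["gate"],
                "tsa_wait": flight["tsa_wait"],
            }
    return response


echo_auto = DeviceContext("Echo Auto", has_screen=False, screen_size="none")
smart_tv = DeviceContext("Smart TV", has_screen=True, screen_size="large")
flight = {"gate": "B12", "tsa_wait": "15 min"}

print(choose_response(echo_auto, flight))  # speech only
print(choose_response(smart_tv, flight))   # speech + map image + info card
```

The design choice here mirrors the episode’s point: the content (the flight update) is authored once, and the framework decides per device how much of it to surface.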

Connect with Jason and Voicify: