Is Voice-Enabled Internet of Things Shifting From Smart Homes to Enterprises?

Contributed content / By Andrew Makarov / 31 May 2019

Voice-activated technology continues to grow in prominence and has become essential for businesses to stay relevant. Use these insights to learn how to integrate voice-enabled IoT devices in the office. 

The average person speaks about 15,000 words a day, and for a considerable percentage of the world’s population, it’s their favorite method of communication. 

Speech is convenient. There’s no need for spell check, and it’s personal. So, it should come as no surprise that according to a survey carried out by NPR and Edison Research, 53 million people in the U.S. now own at least one smart speaker.

About a fifth of the U.S. population (21%) uses voice-activated devices on a regular basis, and that’s not counting the other members of the household who also have access to the device. The number of smart speakers that people in the U.S. own has grown by 78% in just a year.

Figure: Growth in the number of smart speakers in the U.S.

Despite these stats, the world of enterprise is still reluctant to embrace a technology that, according to Amazon CTO Werner Vogels, is already a major game-changer.  

Enterprise Is Dragging Its Heels on Voice

In 2018, Globant carried out an in-depth survey of more than 600 senior-level decision-makers. The findings outlined in the 2018 Voice-Activated Technology Report demonstrate the gap between voice technology in the home and the workplace.

According to the report, 44% of senior employees use voice-activated technology in their personal lives on a daily basis, while 72% use it weekly. Meanwhile, only 31% of these same decision-makers use voice-enabled internet of things in their professional lives.

Clearly, there is a perceived difference between the use of voice activation in the home and using the same technology in the workplace.

Even more surprising is that, according to the same study, 73% of companies see voice as a technology initiative but have yet to take any meaningful steps to incorporate it into their internal and customer-facing processes.

Many will point to the investment required to adopt new technology in an enterprise company as a limiting factor. The company must spend on the technology while also training its employees to use it correctly.

However, the training aspect of this argument rings hollow as voice-enabled internet of things devices are often easy to use.

Although voice technology offers a new user experience, it’s already common in personal devices, meaning that most employees probably use it every day.

What is a Voice-Enabled Solution From a Technological Perspective?

In voice-enabled technology, there’s always “a start button” or trigger. 

Most solutions have actual buttons like “press and start talking” to tell the system when to start processing the incoming voice data. Newer voice-enabled services, however, “listen” to you all the time to catch the trigger word. We all know such triggers, like “Hey, Siri”, “Ok, Google”, or “Alexa.”
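A rough sketch of the always-listening trigger, under the simplifying assumption that the audio has already been turned into text fragments. Real devices usually detect the wake word on the audio itself. The function name and the wake-word tuple are illustrative and not part of any vendor API.

```python
from typing import Optional

# Illustrative wake phrases taken from the examples above.
WAKE_WORDS = ("hey siri", "ok google", "alexa")


def extract_command(utterance: str) -> Optional[str]:
    """Return the text that follows a wake word, or None if no trigger was heard."""
    # Lowercase, drop commas, and collapse whitespace so "Hey, Siri" matches "hey siri".
    normalized = " ".join(utterance.lower().replace(",", " ").split())
    for wake in WAKE_WORDS:
        if normalized == wake:
            return ""  # trigger heard, but no command followed it
        if normalized.startswith(wake + " "):
            return normalized[len(wake):].strip()
    return None  # keep listening; nothing is passed on for processing
```

Anything that does not begin with a trigger returns `None`, which mirrors the idea that the system only starts processing voice data once it has been told to.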

Next, we have voice recognition. This is handled by a neural network trained to turn audio into text. Often, it runs in the cloud because of the considerable computing power required.

Still, this doesn’t mean voice recognition can’t be done on an IoT device. Once we get the voice stream, we convert it to text to extract the gist. The system then matches it against known commands (pre-programmed skills).
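The matching step can be sketched with fuzzy string comparison from Python's standard library. This is only one possible approach, and production assistants typically use trained intent models instead. The command table and intent names are hypothetical.

```python
import difflib
from typing import Optional

# Hypothetical table of pre-programmed skills: spoken phrase -> internal intent name.
KNOWN_COMMANDS = {
    "turn on the light": "lights.on",
    "turn off the light": "lights.off",
    "what is the weather": "weather.report",
}


def match_command(text: str, cutoff: float = 0.8) -> Optional[str]:
    """Map recognized text to the closest known command, or None if nothing is close."""
    candidates = difflib.get_close_matches(
        text.lower().strip(), list(KNOWN_COMMANDS), n=1, cutoff=cutoff
    )
    return KNOWN_COMMANDS[candidates[0]] if candidates else None
```

The `cutoff` value trades tolerance for small recognition errors against the risk of firing the wrong command, so it would need tuning for a real vocabulary.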

Finally, there’s the execution stage.

If you ask your voice assistant to turn on the light, it will send the command to whatever system is in charge of the lights. This can be called "execution logic," and the device often has to interact with third-party software or middleware.

The logic could be elaborate and based on a bunch of different voice commands (sequences), like a chatbot, but with a voice.
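As a minimal sketch of such execution logic, a dispatch table can route each recognized intent to the system that owns it. `LightSystem` is a hypothetical stand-in for third-party software, and the intent names are assumptions made for illustration.

```python
from typing import Callable, Dict, List


class LightSystem:
    """Hypothetical stand-in for the external system that controls the lights."""

    def __init__(self) -> None:
        self.is_on = False

    def switch(self, on: bool) -> str:
        self.is_on = on
        return "light is on" if on else "light is off"


def build_handlers(lights: LightSystem) -> Dict[str, Callable[[], str]]:
    """Map intent names to the calls that carry them out."""
    return {
        "lights.on": lambda: lights.switch(True),
        "lights.off": lambda: lights.switch(False),
    }


def execute(intent: str, handlers: Dict[str, Callable[[], str]]) -> str:
    """Run one intent and return the spoken reply."""
    handler = handlers.get(intent)
    if handler is None:
        return "sorry, I cannot do that"
    return handler()


def execute_sequence(intents: List[str], handlers: Dict[str, Callable[[], str]]) -> List[str]:
    """Run several commands in order, as in a multi-step voice dialog."""
    return [execute(intent, handlers) for intent in intents]
```

Because the back end only receives named commands, new skills can be added by registering another handler without touching the recognition side.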

Based on all this, there’s no great challenge in integrating voice-enabling technologies into almost any existing software. Your back end receives commands to be executed, and there’s really no need to rebuild much of the existing architecture.

On the other hand, voice-enabled technologies are sensitive to hardware and external conditions. 

There’s often a stage zero for IoT: cleaning the audio stream of noise, which in some cases is quite a challenging task.
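One of the simplest forms of this "stage zero" is an energy-based noise gate, sketched below. It is an illustrative technique only. Real devices tend to rely on more capable methods such as microphone arrays or spectral filtering, and the frame size and threshold here are arbitrary assumptions.

```python
import math
from typing import List


def frame_rms(frame: List[float]) -> float:
    """Root-mean-square energy of one frame of audio samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def noise_gate(samples: List[float], frame_size: int = 4, threshold: float = 0.1) -> List[float]:
    """Silence every frame whose energy falls below the threshold."""
    cleaned: List[float] = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        if frame_rms(frame) < threshold:
            cleaned.extend([0.0] * len(frame))  # treat as background noise
        else:
            cleaned.extend(frame)  # loud enough to be speech, keep as-is
    return cleaned
```

A gate like this removes quiet background hiss between words but does nothing about noise that overlaps with speech, which is where the task becomes hard.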

Case Study: Creating a Smart Mirror, an IoT Solution With a Voice Assistant for Corporate Use

Using a Smart Mirror as an example, we illustrate the possible approaches and pitfalls when creating voice technology for an enterprise company.

Figure: Smart Mirror

There are two ways to create voice technology for your enterprise business: the "long way" and the "short way."

The "Long" Way

The more time-intensive method of integrating voice technology into the workplace involves building your own software and developing a solution specific to your business.

Creating a custom solution with an existing voice recognition core must start with the architecture and cloud/client structure. This can be challenging to complete. Use this "longer" method if you need software that is highly customized, runs on a private cloud, and keeps your data secure.

The "Short" Way

You can also contact Amazon and add customized Alexa commands to your existing software.

Some of the skills (voice commands) are predefined (like “Alexa, what’s the weather like now”). Others could be created based on a client’s needs. 

You can also share your custom skills with all other Alexa devices and users around the globe. For example, you can teach Alexa to shop from your e-commerce business, so any user can ask their Alexa to buy a pair of shoes from your store.

We integrated Alexa with an internal CRM to provide data about employee birthdays and company events. 

Employees were then able to use custom voice commands to operate the system. The Alexa app ran on a Raspberry Pi, and as a result we got a fully working Minimum Viable Product (MVP) with high recognition accuracy within a short period of time.
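To give a feel for what a custom skill back end might look like, here is a minimal sketch of a handler for the birthday use case. It assumes the skill is served by an AWS Lambda-style function, which the case study does not confirm. The intent name `BirthdayIntent`, the `employee` slot, and the in-memory CRM dictionary are all hypothetical. Only the outer request and response envelope follows the JSON format used by Alexa custom skills.

```python
from typing import Any, Dict

# Hypothetical stand-in for the internal CRM; a real integration would query the CRM itself.
CRM_BIRTHDAYS = {"anna": "12 March", "david": "30 October"}


def speak(text: str) -> Dict[str, Any]:
    """Wrap plain text in the JSON envelope an Alexa custom skill returns."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": True,
        },
    }


def lambda_handler(event: Dict[str, Any], context: Any = None) -> Dict[str, Any]:
    """Answer 'when is <employee>'s birthday' style requests."""
    request = event.get("request", {})
    if request.get("type") != "IntentRequest":
        return speak("Ask me about an employee birthday.")
    intent = request.get("intent", {})
    if intent.get("name") != "BirthdayIntent":  # hypothetical intent name
        return speak("Sorry, I do not know that command.")
    # "employee" is a hypothetical slot defined in the skill's interaction model.
    name = intent.get("slots", {}).get("employee", {}).get("value") or ""
    birthday = CRM_BIRTHDAYS.get(name.lower())
    if birthday is None:
        return speak(f"I could not find a birthday for {name or 'that person'}.")
    return speak(f"{name.title()}'s birthday is on {birthday}.")
```

The recognition itself stays on Amazon's side in this setup, which is why the "short" way reaches a working prototype quickly: the custom code only has to answer already-parsed intents.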

The Future of Voice in Enterprise

Google CEO Sundar Pichai announced during his Google I/O keynote that 20% of queries on its mobile app and on Android devices are voice searches. It’s safe to say that voice is playing an integral role in how we search and shop.  

More businesses are adapting to this trend: a Pindrop study found that 85% of companies plan to invest in voice technology for customer interaction.

The large majority of managers (88%) questioned in the survey felt that voice technology would give them a competitive advantage, while 57% believed that it would increase operational efficiencies.

Voice technology continues to grow in power in the workplace. 

The Neural Network “Heart” of Voice-Enabled Technology 

The magic of voice technology lies in converting speech to text. Voice data is fed into a neural network (often convolutional) that produces text as its output. It seems like a complex task, but it’s not all that difficult to create your own.

Three main aspects determine success: the length of the phrases to be recognized, the diversity of those phrases, and the required recognition accuracy.

To create a recognition model with a high accuracy level, you’ll need plenty of data to train it. The good news is that there are plenty of general datasets and pre-trained models that already exist.

However, if specific business phrases need to be recognized, you’ll have to improve the model with additional data.
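Accuracy has to be measured before it can be improved. A common metric for speech recognition is word error rate (WER), which is brought in here as an illustration and is not named in the text above. The sketch computes it as the word-level edit distance between the expected phrase and the recognized one, divided by the length of the expected phrase.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between reference and hypothesis, divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the ref words seen so far and the first j hyp words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # a reference word was dropped
            insertion = curr[j - 1] + 1  # an extra word was recognized
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Running this over a test set of business-specific phrases before and after adding training data gives a concrete way to check whether the extra data actually helped.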

Moving to the next step, your network should “understand” not only words but also intonation. Capturing these subtle differences in intonation can be key to the success of voice technology.

Distinguishing intonation can be done with separate networks integrated into one with a tree-like structure. 

In other words, there is one data entry point that spreads into different branches to be analyzed, and the branch results, which can influence each other, are then joined. This lets a network distinguish the meaning of the same words spoken with different intonations.
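The branching structure can be sketched as a forward pass in plain Python. This is a toy illustration with hand-picked, untrained weights and made-up layer sizes. It shows only the shape of the idea (one entry point, two branches, one join) and is not a working recognizer.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dense(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Fully connected layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def softmax(x: Vector) -> Vector:
    peak = max(x)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


# Untrained, hand-picked weights, present only to make the sketch executable.
TRUNK_W: Matrix = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
TRUNK_B: Vector = [0.0, 0.1]
WORD_W: Matrix = [[1.0, -1.0], [-1.0, 1.0]]  # branch 1: which words were said
TONE_W: Matrix = [[0.7, 0.2], [-0.3, 0.9]]   # branch 2: which intonation was used
JOIN_W: Matrix = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0], [1.0, 0.0, 0.0, 1.0]]


def forward(features: Vector) -> Vector:
    """One entry point, two branches, and a joining layer that scores three meanings."""
    shared = relu(dense(features, TRUNK_W, TRUNK_B))  # single data entry point
    word = relu(dense(shared, WORD_W, [0.0, 0.0]))
    tone = relu(dense(shared, TONE_W, [0.0, 0.0]))
    joined = word + tone  # list concatenation: both branches feed the final decision
    return softmax(dense(joined, JOIN_W, [0.0, 0.0, 0.0]))
```

Because the joining layer sees both branch outputs at once, the same word features can lead to a different final score when the intonation features change.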

By recognizing different and diverse phrases, your voice technology is set up to efficiently answer questions or carry out tasks. 

The Constraints of Voice Technology

Although there are obvious benefits to the adoption of voice-enabled technology, there are some apparent constraints, which may indicate why things are moving at such a slow pace in the enterprise world. 

Security and Privacy

First and foremost is the issue of security. While personal devices in the home are often used for turning on the lights and choosing music, devices and assistants in a business setting will be used for more technical and analytical processes. 

This means that the devices will be storing sensitive data that can be easily accessed by a user through a simple voice command.

While this data can be protected through a password, you can appreciate how insecure it would be to speak this password each time it’s required. Developers are attempting to deal with this issue using voice biometrics, which identify which person a voice belongs to.
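A minimal sketch of the matching step in voice biometrics, assuming each voice has already been reduced to a numeric "voiceprint" vector by some speaker model (that model is outside the scope of this sketch). The names, vectors, and threshold below are made up for illustration.

```python
import math
from typing import Dict, List, Optional


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_speaker(
    sample: List[float], enrolled: Dict[str, List[float]], threshold: float = 0.85
) -> Optional[str]:
    """Return the enrolled user whose voiceprint best matches, or None if no match is close enough."""
    best_user: Optional[str] = None
    best_score = threshold
    for user, voiceprint in enrolled.items():
        score = cosine_similarity(sample, voiceprint)
        if score >= best_score:
            best_user, best_score = user, score
    return best_user
```

Returning `None` for an unrecognized voice lets the system refuse a sensitive command instead of falling back to a spoken password.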

User Experience

According to the Voicebot Smart Speaker Consumer Adoption Report, almost 50% of smart speaker owners aren’t exploring new voice applications, despite over 80,000 being available on Alexa alone.

We have a strange situation whereby voice commands are incredibly easy to use and convenient, but people aren’t used to speaking to machines and are unwilling to experiment. As a result, fewer people use voice commands, and many remain uncomfortable talking to a device rather than typing the command.

Invest in Voice Technology

Voice technology is here to stay but it’s difficult for people to break from their habit of using their hands to control a machine as opposed to their voice, particularly in the world of enterprise.

Voice technology is entering our life and work environments, but not quite as quickly as many predicted. We, as humans, are still struggling with the idea of telling a machine what to do but once we cross that hurdle, the rewards will be incredible.

Companies that invest in voice technology need to consider it a mid-term investment. It will take time to condition and train the workforce to use voice technology on a daily basis, but the ROI is well worth the wait.

Consult with an IoT company for more information on how to integrate voice technology in your workplace. 
