Is voice over internet ready?

Some companies are now offering PBX and contact centre functionality where the voice (or video, or both) is delivered entirely over the internet.

Adherents of this approach maintain that only legacy providers think it isn’t ideal, and that the new, hip approach, where everything goes over the internet, is the future.

This includes some well-known companies: Microsoft use it in Office 365 Cloud PBX, as do Fusion, to name just two. And it’s easy to see the advantages – using the internet can be quicker and cheaper than installing private network connections to each customer.

It may well be the future, but is the current technology and infrastructure able to support this approach today?

Today’s voice platforms (telephone systems, unified communications applications or contact centres, whatever you call them) all use networks to deliver the audio and video. The quality of a voice call depends on three principal factors: packet loss, latency and jitter. Let me explain…

Loss, latency and jitter

Networks deliver information by taking that data and splitting it into many small parts, sending each fragment (or “packet”) separately and putting them together again at the far end. It’s a bit like taking a book, tearing each page out, and sending them individually in the post, relying on the page numbers to put it back together again at the other end if they arrive out of order.
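The “numbered pages” idea can be sketched in a few lines of Python. This is a toy model, not how any real network stack is written: the function names `packetise` and `reassemble` are hypothetical, and each packet is just a (sequence number, chunk of text) pair.

```python
import random

def packetise(message: str, size: int) -> list[tuple[int, str]]:
    """Split a message into (sequence_number, chunk) packets of at most `size` characters."""
    return [(seq, message[start:start + size])
            for seq, start in enumerate(range(0, len(message), size))]

def reassemble(packets: list[tuple[int, str]]) -> str:
    """Sort by sequence number (the 'page number') and join the chunks back together."""
    return "".join(chunk for _, chunk in sorted(packets))

message = "Networks send data as numbered packets and rebuild it at the far end."
packets = packetise(message, size=8)
random.shuffle(packets)                  # simulate packets arriving out of order
print(reassemble(packets) == message)    # True
```

However the packets are shuffled in transit, the sequence numbers are enough to rebuild the original.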

That works very well for a book that you’re going to read later, because if page 91 goes missing then you just ask the sender to give you another copy of that page. So for emails or web pages, if there’s a little bit missing, you ask for the missing bit again, and wait for it to arrive.
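For data nobody needs to “read” in real time, the receiver can simply fill the gaps by asking again. The sketch below assumes a hypothetical `resend` callback standing in for the “send me that page again” request; real protocols such as TCP do this with acknowledgements and timers rather than a direct function call.

```python
from typing import Callable

def receive_reliably(arrived: dict[int, str], total: int,
                     resend: Callable[[int], str]) -> str:
    """Fill every gap by re-requesting it: fine when waiting costs nothing."""
    pages = dict(arrived)
    for seq in range(total):
        if seq not in pages:
            pages[seq] = resend(seq)   # hypothetical "send that page again" request
    return "".join(pages[seq] for seq in range(total))

original = ["Dear ", "team, ", "lunch ", "is ", "at ", "noon."]
arrived = {0: "Dear ", 1: "team, ", 3: "is ", 5: "noon."}   # pages 2 and 4 went missing
requests = []

def resend(seq: int) -> str:
    requests.append(seq)               # record which pages we had to ask for
    return original[seq]

print(receive_reliably(arrived, len(original), resend))   # Dear team, lunch is at noon.
print(requests)                                           # [2, 4]
```

The email arrives complete, just a little later. That extra wait is exactly what a live conversation can’t afford.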

Receiving voice or video, however, is like trying to read the book as it arrives. If page three arrives some time after page 21, you can’t read past page two until it turns up: you either have to wait, or guess what it might have said and carry on regardless, hoping the plot will eventually make sense. Missing pages are known as “packet loss”, and anything above 2.5% is considered bad. Over 5% means you can’t really understand the other person.
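As a rough illustration, packet loss can be measured by counting which sequence numbers never turned up. This sketch applies the 2.5% and 5% rules of thumb above. The function names are hypothetical, the labels “bad” and “unintelligible” paraphrase the text, and the loss pattern (every 30th packet dropped) is made up for the example.

```python
def loss_rate(received_seqs: list[int], first: int, last: int) -> float:
    """Fraction of packets numbered first..last (inclusive) that never arrived.

    Duplicates are ignored. Real RTP sequence numbers are 16-bit and wrap around;
    this sketch assumes a window short enough that they don't.
    """
    expected = last - first + 1
    arrived = len({seq for seq in received_seqs if first <= seq <= last})
    return (expected - arrived) / expected

def rate_call(loss: float) -> str:
    """Apply the rules of thumb: over 2.5% is bad, over 5% is unintelligible."""
    if loss > 0.05:
        return "unintelligible"
    if loss > 0.025:
        return "bad"
    return "acceptable"

received = [seq for seq in range(1000) if seq % 30 != 0]   # every 30th packet lost
loss = loss_rate(received, 0, 999)
print(f"{loss:.1%}", rate_call(loss))                      # 3.4% bad
```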

In addition, the pages could take too long to get to you: that’s “latency”. If the latency in one direction is more than about 200 milliseconds, a voice conversation (which is supposed to be two-way) becomes difficult. If you’ve ever been on a conference call where everyone’s talking over each other, you’ve been a victim of latency (or rude colleagues!). When latency is high, the normal pause we subconsciously allow in order to give other people the opportunity to speak isn’t long enough, and participants start getting ratty.
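One crude way to check a connection against the 200 millisecond guideline is to halve the round-trip time that a tool such as ping reports. This is a minimal sketch under a loud assumption: the outbound and return paths take about the same time, which real networks often don’t. The sample figures are hypothetical.

```python
from statistics import median

ONE_WAY_LIMIT_MS = 200  # rough threshold for a comfortable two-way conversation

def one_way_estimate_ms(rtt_samples_ms: list[float]) -> float:
    """Estimate one-way latency as half the median round-trip time.

    Assumes a roughly symmetric path; treat the result as a rough guide only.
    """
    return median(rtt_samples_ms) / 2

def conversation_ok(rtt_samples_ms: list[float]) -> bool:
    """True if the estimated one-way latency is within the guideline."""
    return one_way_estimate_ms(rtt_samples_ms) <= ONE_WAY_LIMIT_MS

samples = [310, 420, 395, 450, 405]     # hypothetical round-trip times in ms
print(one_way_estimate_ms(samples))     # 202.5
print(conversation_ok(samples))         # False
```

The median is used rather than the mean so that a single slow sample doesn’t dominate the estimate.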

Finally, there’s jitter – that’s whether pages get delivered regularly or not. Going back to our book analogy, it’s as if you had to read each page aloud as it arrived, but sometimes you have to wait for a page to arrive, and sometimes a whole load arrive at the same time.
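Jitter can be put into numbers. The RTP standard (RFC 3550, section 6.4.1) defines a running “interarrival jitter” estimate. It compares how long consecutive packets spent in transit and smooths the difference with a factor of 1/16. The sketch below follows that formula but uses milliseconds instead of RTP timestamp units for readability. The send and arrival times are invented for the example.

```python
def interarrival_jitter(send_ms: list[float], arrive_ms: list[float]) -> float:
    """Running jitter estimate in the style of RFC 3550, section 6.4.1.

    For each pair of consecutive packets, D is the change in transit time;
    the estimate moves 1/16 of the way towards each new |D|.
    """
    jitter = 0.0
    for i in range(1, len(send_ms)):
        transit_prev = arrive_ms[i - 1] - send_ms[i - 1]
        transit_now = arrive_ms[i] - send_ms[i]
        d = abs(transit_now - transit_prev)
        jitter += (d - jitter) / 16
    return jitter

sent = [0, 20, 40, 60]                                  # one packet every 20 ms
print(interarrival_jitter(sent, [50, 70, 90, 110]))     # 0.0: pages arrive regularly
print(interarrival_jitter(sent, [50, 70, 130, 131]))    # 3.53125: a wait, then a bunch at once
```

A constant delay produces no jitter at all. Only variation between packets counts.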

At the mercy of the network

All of these things either always exist (latency) or can happen at any time (packet loss and jitter) on any network – whether it’s the network inside your office building or the public internet. Companies that make voice and video platforms have found ways to minimise the impact of the problem, but are still at the mercy of the network.

The only way to fix this problem reliably is to configure the network to guarantee what the industry calls “Quality of Service”, normally abbreviated to QoS.

Quality of service

QoS is a set of policies whereby all the packets (pages of the book) that have audio or video are specially marked as “urgent”, and get sent before anything else. Essentially, everything else gets a second-class stamp, whilst the voice or video gets a first-class stamp.
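On IP networks, the “first-class stamp” is usually a Differentiated Services Code Point (DSCP) in each packet’s header. Expedited Forwarding (DSCP 46, defined in RFC 3246) is the value conventionally used for voice. Here is a minimal sketch of how an application might request that marking using Python’s standard `socket` module (IPv4 only). Whether any router actually honours the mark is outside the application’s control.

```python
import socket

DSCP_EF = 46  # Expedited Forwarding: the "first-class stamp" conventionally used for voice

def tos_byte(dscp: int) -> int:
    """DSCP occupies the top six bits of the IPv4 TOS byte; the bottom two are ECN."""
    return dscp << 2

def mark_as_voice(sock: socket.socket) -> None:
    """Ask the OS to stamp every IPv4 packet this socket sends with DSCP EF."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos_byte(DSCP_EF))

print(tos_byte(DSCP_EF))   # 184 (0xB8)

with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
    try:
        mark_as_voice(sock)
    except OSError as exc:  # some platforms ignore or refuse IP_TOS
        print("could not set TOS:", exc)
```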

This solves the problem really well, and can provide absolutely perfect voice quality – unless of course one of the postmen along the way can’t tell the difference between first-class and second-class stamps. If that happens you’re back to packet loss, latency and jitter.

And on the internet, no-one looks at the type of stamp, so all traffic is treated as equally important (or equally unimportant).
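The difference between a postman who reads the stamps and one who doesn’t can be sketched as a single router queue. With marks honoured, it behaves as a strict-priority queue. Without them, it is plain first-in, first-out. This is a simplified model: one hop, all packets already queued. The labels and the `honour_marks` flag are hypothetical.

```python
import heapq
from itertools import count

def send_order(packets: list[tuple[str, str]], honour_marks: bool) -> list[str]:
    """Order in which one router hop would forward the queued packets.

    packets: (label, stamp) pairs in arrival order; stamp is "first" or "second".
    honour_marks=True gives strict priority; False gives plain FIFO.
    """
    arrival = count()   # tie-breaker that preserves arrival order within a class
    queue = []
    for label, stamp in packets:
        priority = 0 if (honour_marks and stamp == "first") else 1
        heapq.heappush(queue, (priority, next(arrival), label))
    return [heapq.heappop(queue)[2] for _ in range(len(queue))]

arrivals = [("email-1", "second"), ("voice-1", "first"),
            ("web-1", "second"), ("voice-2", "first")]
print(send_order(arrivals, honour_marks=True))    # ['voice-1', 'voice-2', 'email-1', 'web-1']
print(send_order(arrivals, honour_marks=False))   # ['email-1', 'voice-1', 'web-1', 'voice-2']
```

On the public internet, every hop behaves like the second call: voice waits its turn behind everything else.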

So how do voice and video companies get around these problems? They use quite a few techniques to minimise the impact. The most common is an advanced, adaptive “codec”: the format in which the voice is encoded before it’s sent down the wire.

How do you make it work better?

Amongst other techniques, adaptive codecs automatically vary the “bitrate” (making the pages of the book bigger or smaller based on the available bandwidth) and send a lower-resolution copy of the previous page along with the current one. That way, if a page goes missing, you can still read the story and get the gist. That’s what Skype does under the hood to make voice and video work where the network isn’t quite as good as it should be. It’s not perfect, but it does help.
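The “copy of the previous page” trick can be modelled as a toy. In this model each packet carries its own frame plus a low-resolution copy of the frame before it. Here `.lower()` merely stands in for “encoded at a lower bitrate”; a real codec would re-encode the audio. The function names are hypothetical, and this is not Skype’s actual implementation.

```python
def build_packets(frames: list[str]) -> list[dict]:
    """Each packet carries its own frame plus a low-res copy of the previous frame."""
    packets = []
    for n, frame in enumerate(frames):
        prev_lowres = frames[n - 1].lower() if n > 0 else None  # stand-in for a low-bitrate copy
        packets.append({"seq": n, "frame": frame, "prev_lowres": prev_lowres})
    return packets

def play_out(received: dict[int, dict], total: int) -> list[str]:
    """Rebuild the stream, falling back to the redundant copy when a packet is lost."""
    output = []
    for n in range(total):
        if n in received:
            output.append(received[n]["frame"])            # original, full quality
        elif n + 1 in received:
            output.append(received[n + 1]["prev_lowres"])  # recovered from the next packet
        else:
            output.append("...")                           # both copies lost: conceal
    return output

frames = ["HELLO", "CAN", "YOU", "HEAR", "ME"]
packets = build_packets(frames)

def deliver(lost: set[int]) -> dict[int, dict]:
    """Simulate the network dropping the packets numbered in `lost`."""
    return {p["seq"]: p for p in packets if p["seq"] not in lost}

print(play_out(deliver({1, 3}), len(frames)))   # ['HELLO', 'can', 'YOU', 'hear', 'ME']
print(play_out(deliver({3, 4}), len(frames)))   # ['HELLO', 'CAN', 'YOU', '...', '...']
```

Isolated losses are recovered at lower quality. Two losses in a row defeat the scheme, which is why it helps but isn’t perfect.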

When you’ve got only one person on the end of a network talking to someone somewhere else, it works quite well. It’s not so great, however, when you have ten or so people all trying to talk over the one connection.
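Some rough arithmetic shows why ten people sharing one connection is harder than one. This sketch assumes G.711 (64 kbps) with 20 ms packets, a common choice that isn’t specified in the text, and counts IPv4, UDP and RTP headers. Link-layer and VPN overhead are ignored, so real figures will be somewhat higher.

```python
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP, no options or link-layer framing

def call_kbps(codec_kbps: float, packet_ms: float) -> float:
    """One-direction IP bandwidth of a single call, including per-packet headers."""
    packets_per_second = 1000 / packet_ms
    payload_bytes = codec_kbps * 1000 / 8 / packets_per_second
    return (payload_bytes + IP_UDP_RTP_HEADER_BYTES) * 8 * packets_per_second / 1000

print(call_kbps(64, 20))        # 80.0  (G.711: 160-byte payload + 40 bytes of headers)
print(10 * call_kbps(64, 20))   # 800.0 for ten simultaneous calls, each way
print(call_kbps(8, 20))         # 24.0  (an 8 kbps codec still needs 24 kbps on the wire)
```

That 800 kbps has to be carried in each direction at the same time as the office’s email, web browsing and file transfers. Without QoS, all of that traffic competes on equal terms.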

The technical challenge that still needs to be overcome is how to deliver voice reliably to an office over the internet.

Here at Maintel we have our own unified communications cloud platform, called ICON. This uses the internet to deliver voice and video calls to remote workers and home workers, and it works really well – particularly as most people can make do with a mobile phone if their internet goes on the blink.

However, unless an office location is quite small we don’t normally use the internet to deliver audio or video to it, delivering them over a private connection from our core network instead.

And there’s a reason for that.