It is no secret that the market offers an abundance of solutions for video meetings, conferences, chats, etc. Unfortunately, the user experience of these applications is often sub-optimal: sudden disconnects, poor audio and video quality, and performance problems are common occurrences. However, it can be tricky to pin down the real reason why such poor experiences happen.

The issue could lie in your network’s performance, meaning supported bandwidth, latency, etc. Or the problem may lie with your service provider’s supporting infrastructure. But there is another potential problem that is easy to overlook. Creating a direct, peer-to-peer connection between clients requires a complicated set of operations from a network perspective. Thus, the quality of the peer-to-peer media exchange depends quite heavily on the user’s network setup. This involves the configuration of various network devices, firewalls, etc., which may be so restrictive that they lead to suboptimal experiences or prevent peer-to-peer connections altogether.

The task then is to distinguish between cases when poor quality is the result of application failures, and cases where this is the result of suboptimal network conditions. And, if it’s the latter, much can be gained by empowering application users to troubleshoot their own network conditions and identify the restrictions that are impairing their peer-to-peer connections. 

Here at Star, we have developed and adopted techniques and guidelines that can be used effectively to troubleshoot network conditions with regard to video meeting facilitation. In this two-part article series, we will review these tips. In the first part, we will explore the intricacies of the peer-to-peer connectivity process. This will allow us to see which connections need to be established and maintained during the lifetime of a media exchange session. In the second article, we will take a closer look at the diagnostics process and its execution in practice.

WebRTC protocol overview

In order to start talking about the connectivity process, we first need to familiarize ourselves with the key technology used in web-based media exchange: WebRTC. From the technical perspective, WebRTC is a collection of APIs, protocols and standards, and the bulk of the framework is encapsulated within its browser-side API. The interface most heavily involved in maintaining the connection is called RTCPeerConnection. This powerful abstraction does a lot of behind-the-scenes work to facilitate peer-to-peer interaction, and a large part of creating a WebRTC connection lies in creating and properly configuring an instance of this interface. However, proper interaction with the JavaScript API is only part of the story from the engineering perspective. A much bigger picture must be considered in order to develop troubleshooting techniques, specifically how clients connect to the supporting infrastructure.

The first thing to realize here is why all the complexity exists in the first place. The main reason is Network Address Translation, or NAT. NAT is a process in which an internal network address is translated into one or more external addresses, which is what gives local hosts access to the internet. It usually happens on the firewall or the router. There are different ways of mapping an internal source to an external address, which in turn affects how peers need to traverse a NAT in order to connect. The most challenging scheme to traverse is a symmetric NAT, which assigns a different external address and port mapping for each destination the host contacts. As a result of being situated behind a NAT, peers usually cannot be reached directly by other peers.
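To make the difference between mapping schemes concrete, here is a toy model, not a real NAT implementation. The class name, the starting port and the addresses (taken from documentation ranges) are assumptions made purely for illustration. It contrasts a mapping that reuses one external port for every destination with a symmetric mapping that allocates a new port per destination.

```javascript
// Toy model of NAT port mapping. Illustration only: real NATs also track
// protocols, timeouts and filtering rules.
class ToyNat {
  constructor(externalIp, symmetric) {
    this.externalIp = externalIp;
    this.symmetric = symmetric;
    this.nextPort = 40000; // arbitrary starting port for the sketch
    this.mappings = new Map();
  }

  // Returns the external "ip:port" that the destination would see.
  translate(internalAddr, destinationAddr) {
    // A symmetric NAT keys the mapping on source AND destination,
    // so every new destination gets a different external port.
    const key = this.symmetric
      ? `${internalAddr}->${destinationAddr}`
      : internalAddr;
    if (!this.mappings.has(key)) {
      this.mappings.set(key, `${this.externalIp}:${this.nextPort++}`);
    }
    return this.mappings.get(key);
  }
}

const client = "192.168.1.10:5000";
const stunServer = "198.51.100.1:3478";
const remotePeer = "203.0.113.7:6000";

const coneNat = new ToyNat("192.0.2.1", false);
const symmetricNat = new ToyNat("192.0.2.1", true);

// Address learned by asking an external server vs. address a remote peer sees.
const coneViaStun = coneNat.translate(client, stunServer);
const coneViaPeer = coneNat.translate(client, remotePeer);
const symViaStun = symmetricNat.translate(client, stunServer);
const symViaPeer = symmetricNat.translate(client, remotePeer);

console.log(coneViaStun === coneViaPeer); // true: the learned address is reusable
console.log(symViaStun === symViaPeer);   // false: the learned address does not match
```

In the symmetric case, the address a host learns from one external server is not the address a remote peer would see, which is why this scheme is the hardest one to work around.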

In order to overcome the challenge of peers being situated behind NATs, WebRTC utilizes a protocol called ICE, short for Interactive Connectivity Establishment. The essence of the protocol is the following: prior to establishing the connection, peers need to acquire information on how other peers can connect with them. This is accomplished with the help of special external infrastructure entities called STUN and TURN servers. STUN stands for Session Traversal Utilities for NAT. It is a protocol that lets a peer discover its public address and determine any restrictions in the router that would prevent a direct connection.
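For a sense of how lightweight a STUN exchange is, the sketch below builds the 20-byte header of a STUN Binding Request as laid out in RFC 5389: a message type, a message length, a fixed magic cookie and a 12-byte transaction ID. The function name is hypothetical, and the transaction ID is filled with Math.random only to keep the example dependency-free. A real client would use a cryptographically strong source and would send the bytes over UDP, which is omitted here.

```javascript
// Builds the header of a STUN Binding Request (RFC 5389) with no attributes.
// buildStunBindingRequest is a hypothetical helper for this sketch.
function buildStunBindingRequest(transactionId) {
  const id = transactionId ?? Uint8Array.from(
    { length: 12 },
    () => Math.floor(Math.random() * 256) // NOT cryptographically strong
  );
  if (id.length !== 12) {
    throw new Error("STUN transaction ID must be 12 bytes");
  }
  const packet = new Uint8Array(20);
  const view = new DataView(packet.buffer);
  view.setUint16(0, 0x0001);     // message type: Binding Request
  view.setUint16(2, 0);          // message length: no attributes follow
  view.setUint32(4, 0x2112a442); // magic cookie, fixed by the RFC
  packet.set(id, 8);             // 96-bit transaction ID
  return packet;
}

const request = buildStunBindingRequest();
console.log(request.length); // 20
```

The server's response to such a request carries the address it saw the request arrive from, which is how a peer learns its public address.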

Traversal Using Relays around NAT (TURN) is meant to bypass the symmetric NAT restriction by opening a connection with a TURN server and relaying all information through that server. In order to gather their network information, connecting peers make requests to STUN and TURN servers, and the responses are turned into entities called ICE candidates in a process called ICE candidate gathering. ICE candidates, in simple terms, are network addresses (IP address and port pairs) that other peers can use in order to establish a connection.
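In the browser, the STUN and TURN servers are handed to RTCPeerConnection through its iceServers configuration, and gathered candidates arrive via the icecandidate event. A minimal sketch follows. The server URLs and credentials are placeholders, the helper name is hypothetical, and the browser-only part is guarded so the snippet also loads outside a browser.

```javascript
// Hypothetical helper: assembles an RTCPeerConnection configuration.
// All URLs and credentials below are placeholders, not real servers.
function buildIceConfig({ stunUrl, turnUrl, turnUsername, turnCredential }) {
  return {
    iceServers: [
      { urls: stunUrl },
      { urls: turnUrl, username: turnUsername, credential: turnCredential },
    ],
  };
}

const iceConfig = buildIceConfig({
  stunUrl: "stun:stun.example.org:3478",
  turnUrl: "turn:turn.example.org:3478?transport=udp",
  turnUsername: "demo-user",
  turnCredential: "demo-secret",
});

// RTCPeerConnection exists only in browsers, hence the guard.
if (typeof RTCPeerConnection !== "undefined") {
  const pc = new RTCPeerConnection(iceConfig);
  pc.onicecandidate = (event) => {
    // event.candidate is null once gathering has finished.
    console.log(event.candidate ? event.candidate.candidate : "gathering done");
  };
  // Gathering starts once a local description is set.
  pc.createDataChannel("probe");
  pc.createOffer().then((offer) => pc.setLocalDescription(offer));
}
```

Logging the candidates this way is often the quickest first look at which servers a given network actually lets the client reach.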

During the candidate gathering process, the following candidate types may be collected: 

Host: generated by the client by binding to its locally assigned IP addresses and ports.

Server reflexive: generated by sending STUN messages to a STUN/TURN server, which reports the public address it sees the request coming from.

Relay: generated by sending an allocation request to the TURN server, which reserves a relay address on the peer’s behalf. When this candidate is used, media is sent to and from the relay server rather than directly between peers.

Peer reflexive: discovered after the actual candidate gathering process, during the connectivity checks. This process may reveal a better way to connect. For example, in spite of the symmetric NAT, there may be a way for media to flow directly between peers.
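Each candidate is exchanged as a single line of text whose "typ" field carries one of the four types above, abbreviated as host, srflx, relay and prflx. The sketch below parses such a line. The function name is hypothetical, and the sample lines are hand-written with addresses from documentation ranges rather than captured from a real session.

```javascript
// Maps the "typ" token of a candidate line to the names used in this article.
const CANDIDATE_LABELS = new Map([
  ["host", "Host"],
  ["srflx", "Server reflexive"],
  ["relay", "Relay"],
  ["prflx", "Peer reflexive"],
]);

// Hypothetical helper: extracts the main fields of an ICE candidate line.
function parseCandidate(line) {
  const parts = line.trim().replace(/^a=/, "").replace(/^candidate:/, "").split(/\s+/);
  const [foundation, component, transport, priority, address, port, typKeyword, type] = parts;
  if (typKeyword !== "typ" || !CANDIDATE_LABELS.has(type)) {
    throw new Error(`Not a recognizable ICE candidate: ${line}`);
  }
  return {
    foundation,
    component: Number(component),
    transport: transport.toLowerCase(),
    priority: Number(priority),
    address,
    port: Number(port),
    type,
    label: CANDIDATE_LABELS.get(type),
  };
}

// Hand-written sample lines (not captured from a real session).
const sampleCandidates = [
  "candidate:1 1 udp 2122260223 192.168.1.10 54321 typ host",
  "candidate:2 1 udp 1686052607 203.0.113.5 54321 typ srflx raddr 192.168.1.10 rport 54321",
  "candidate:3 1 udp 41885439 198.51.100.20 50000 typ relay raddr 203.0.113.5 rport 54321",
];

const gatheredTypes = sampleCandidates.map((line) => parseCandidate(line).type);
console.log(gatheredTypes.join(", ")); // host, srflx, relay
```

Knowing which types were gathered is useful for troubleshooting: a missing type hints at which server or protocol the network is blocking.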

From here, we need to consider the negotiation process called Signaling. This process occurs after ICE candidates are gathered (depending on the implementation, all candidates or at least one) and the initiating peer is ready to start a WebRTC session. During the signaling process, peers exchange the information necessary to establish a connection. Since the direct peer-to-peer connection is not established yet, an intermediary infrastructure piece, called the Signaling Server, must be involved in this process. WebRTC does not define a standard for the server implementation. Because the negotiation process may recur numerous times during the lifetime of the connection, a persistent connection style for the signaling server API, e.g. WebSockets, may be a suitable solution.

When we combine the connection establishment within a single flow, it will look something like this:

Here, the Signaling Server serves as the intermediary between peers to deliver session descriptions in SDP (Session Description Protocol) format. These contain all the necessary media and network information for peers to connect to each other. Messages generated by the initiating peer are called offers, and responses from the other peer are called answers. ICE candidates may be sent as part of these offers and answers, or they may be sent via standalone requests. After ICE candidates are delivered to the answering peer, WebRTC starts performing connectivity checks to determine the best way to connect. During the lifetime of the WebRTC connection, additional negotiation may be required, for example if the network conditions change and new ICE candidates become available.
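Since WebRTC leaves the signaling server unspecified, any concrete message format is an application choice. The sketch below uses an in-memory stand-in for the server to show the order of the exchange: an offer, an answer, then a standalone candidate. The class, the peer names and the message shape are all assumptions, and the SDP payloads are placeholders rather than real session descriptions.

```javascript
// In-memory stand-in for a signaling server. A real one would sit behind
// WebSockets or a similar transport; the message shape is an assumption.
class ToySignalingServer {
  constructor() {
    this.handlers = new Map(); // peerId -> callback
  }

  register(peerId, onMessage) {
    this.handlers.set(peerId, onMessage);
  }

  // Relays a message to the addressed peer without inspecting the payload.
  send(from, to, type, payload) {
    const handler = this.handlers.get(to);
    if (!handler) throw new Error(`Unknown peer: ${to}`);
    handler({ from, type, payload });
  }
}

const signaling = new ToySignalingServer();
const signalingLog = [];

// The answering peer replies to an offer with an answer.
signaling.register("bob", (msg) => {
  signalingLog.push(`bob got ${msg.type}`);
  if (msg.type === "offer") {
    signaling.send("bob", msg.from, "answer", { sdp: "<answer SDP placeholder>" });
  }
});
signaling.register("alice", (msg) => signalingLog.push(`alice got ${msg.type}`));

// The initiating peer sends the offer, then a standalone ICE candidate.
signaling.send("alice", "bob", "offer", { sdp: "<offer SDP placeholder>" });
signaling.send("alice", "bob", "candidate", {
  candidate: "candidate:1 1 udp 2122260223 192.168.1.10 54321 typ host",
});

console.log(signalingLog.join(" | "));
// bob got offer | alice got answer | bob got candidate
```

In a browser application, the offer and answer payloads would come from createOffer and createAnswer on RTCPeerConnection and be applied with setLocalDescription and setRemoteDescription, while received candidates would be passed to addIceCandidate.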

Now that we’ve covered the general flow of ICE, it is useful to summarize all the connections made by clients in order to engage in a WebRTC-powered interaction:

  • connect to STUN to gather server reflex candidates (UDP)
  • connect to TURN to gather relay candidates or in order to use it as a relay (UDP or TCP)
  • connect to the signaling server to exchange SDP messages
  • connect to other peers via the paths discovered during the ICE process

Network constraints for WebRTC connections

From here, it is possible to come up with a list of network constraints that apply to the general case of a WebRTC connection. Some of these constraints must be met; others are advisable, because failing to meet them may result in a suboptimal experience but will not prevent users from interacting. The network constraints are as follows:

  • Accessibility of signaling server (required)
  • Accessibility of STUN and TURN servers (required) – the user does not have to connect to both servers. However, if neither STUN nor TURN can be reached, the WebRTC connection will fail.
  • Allowance of STUN/TURN protocols (required) – these may be disallowed completely for certain very restrictive networks
  • Non-Symmetrical NAT configuration to allow for server reflex candidates to be used (desirable)
  • Accessibility of outbound UDP on all ports (for outbound connections)
  • Accessibility of inbound UDP on all ports (for direct connections to other peers)
  • Accessibility of inbound UDP on ports 49152 – 65535 (to use TURN as the fallback)
  • Accessibility of TCP inbound (in order to use TCP relay as a last resort option)
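As a rough illustration of how these constraints could be turned into a first-pass check, the sketch below maps two observations, whether the signaling server was reachable and which candidate types were gathered, to a coarse verdict. The function name, the input shape and the verdict wording are assumptions, and the rules are only the rules of thumb from the list above, not a complete diagnostic.

```javascript
// Hypothetical helper: turns basic observations into a coarse verdict.
// candidateTypes uses the "typ" tokens from candidate lines: host, srflx, relay.
function assessConnectivity({ signalingReachable, candidateTypes }) {
  const types = new Set(candidateTypes);
  if (!signalingReachable) {
    return "blocked: signaling server unreachable";
  }
  if (!types.has("srflx") && !types.has("relay")) {
    return "blocked: neither STUN nor TURN produced a candidate";
  }
  if (!types.has("srflx")) {
    return "degraded: relay only, media will flow through TURN";
  }
  if (!types.has("relay")) {
    return "at risk: no TURN fallback if a direct path fails";
  }
  return "ok: server reflexive and relay candidates are both available";
}

console.log(assessConnectivity({
  signalingReachable: true,
  candidateTypes: ["host", "relay"],
}));
// degraded: relay only, media will flow through TURN
```

A check like this only narrows down where to look. Whether a server reflexive candidate is actually usable still depends on the NAT type, which is established by the connectivity checks themselves.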

Stay tuned for the second part of the article where we will discuss how to create efficient network diagnostics techniques.

About the author 

Andriy Shevchenko is a Senior Software Engineer at Star and has 8 years of experience developing custom software solutions for businesses. His main area of expertise is web application development based on JVM languages such as Java and Scala. Andriy has successfully delivered a number of products as a team leader.