WebRTC calls without using STUN/TURN or Signaling server

Updated 26 Mar 2022

Ever since I first heard that WebRTC enables web applications to connect to and stream data between browsers without requiring an intermediary, I have dreamt of using it for personal communication without relying on third parties. It didn’t take too long to realise that WebRTC is intimately tied to a few server-side dependencies. This adds complexity to self-hosting the entire stack and is not very appealing, so the plans never went anywhere.

Recently I got some time to play around with WebRTC and think about ways of minimising the server-side dependencies for setting up calls. This yielded a simple html/js page and nginx combo that, if certain constraints are satisfied, allows two users to catch up over WebRTC calls. The constraints are usually satisfied for smartphone users connected over Indian mobile networks¹. This post provides an overview of why WebRTC needs server-side dependencies, and of the workarounds used to sidestep them.

Why WebRTC needs server-side dependencies

Signaling²

Even though WebRTC is built to solve the problem of peer-to-peer communication, it is designed with the assumption that the peers already have a separate means of communicating with each other. This is a non-negotiable requirement: the communication channel needs to exist before the peer-to-peer connection is initiated. Peers on different networks use this separate channel to find and connect to one another. Details needed for setting up audio/video streams, such as supported codecs, are also communicated over this channel. Typically, both peers connect to a pre-agreed intermediary server to carry out these exchanges. This process of exchanging network/codec details is referred to as Signaling.

Exchanging candidates with metadata needed for setting up audio streams

NAT

Before a peer-to-peer connection is initiated via WebRTC, the peers need to exchange their publicly reachable IP address details, as mentioned in the section on Signaling. If a peer is on a private network connected to the public internet via NAT³, it cannot find out its publicly accessible IP address on its own. In this case, the peer needs to rely on a STUN server to find out the correct public IP to share. If a peer is behind a symmetric NAT, the public address and port it used to reach the STUN server won’t necessarily be reused when it tries to connect to a remote peer. In this case, WebRTC will fall back to a relay server (TURN server) for connecting the peers, if one is configured, instead of using a direct peer-to-peer connection.
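The three situations above show up as different candidate types in the metadata a browser generates: `host` for an address taken directly from a local interface, `srflx` (server reflexive) for an address learned from a STUN server, and `relay` for an address on a TURN server. The following is a minimal sketch for reading the type out of a candidate line. It is not code from the demo app; `parseCandidate` and `ParsedCandidate` are names made up for illustration, and the addresses in the usage note are documentation-range placeholders.

```typescript
// Shape of the fields this sketch extracts from a candidate line.
interface ParsedCandidate {
  address: string;
  port: number;
  type: string; // "host", "srflx", "prflx" or "relay"
}

// Candidate lines look like:
//   candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> ...
// Returns null if the line does not have that shape.
function parseCandidate(line: string): ParsedCandidate | null {
  // Inside an SDP blob the line carries an "a=" prefix; strip it if present.
  const fields = line.trim().replace(/^a=/, "").split(/\s+/);
  if (fields.length < 8 || !fields[0].startsWith("candidate:") || fields[6] !== "typ") {
    return null;
  }
  return { address: fields[4], port: Number(fields[5]), type: fields[7] };
}
```

For example, `parseCandidate("candidate:1 1 udp 1677729535 203.0.113.7 46154 typ srflx raddr 192.168.1.5 rport 46154")` yields an object whose `type` is `"srflx"`, which means the address came from a STUN lookup.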

Workarounds for these server-side dependencies

NAT

The peers need to find and exchange their public IP address details over the Signaling channel as a first step towards initiating a WebRTC connection. The process of finding the public IP address is complicated by the presence of NAT. The simplest way to make sure we avoid NAT is for the peers to connect to the internet via an IPv6-enabled network.

For users who are on an IPv6 network, instead of using a STUN server we can set up an nginx route that returns the client IP address when called. To make sure that clients use IPv6 while calling nginx, I set up a subdomain that is only configured with an AAAA record and is thus not accessible from IPv4-only networks⁴. Since we are forgoing the use of STUN servers here, the browser will not be aware of the public IP address of the peer when generating the metadata to be exchanged during signaling. So the web app establishing the WebRTC connection has to modify the metadata to use the public IPv6 address obtained from nginx before sending it over to the remote peer.
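One way to picture the metadata modification is as a rewrite of the address field in the `host` candidate lines of the session description. The sketch below is an assumption about how this could be done, not the demo app's actual code: `withPublicAddress` and `rewriteSdpCandidates` are hypothetical helpers, and `publicIp` stands for whatever address the nginx route returned. The address field being replaced may hold a local IP, or an mDNS name in browsers that hide local addresses.

```typescript
// Replace the address field of a single host candidate line.
// Lines that are not host candidates are returned unchanged.
function withPublicAddress(candidateLine: string, publicIp: string): string {
  const fields = candidateLine.trim().split(/\s+/);
  const isHost = fields.length >= 8 && fields[6] === "typ" && fields[7] === "host";
  if (!isHost) return candidateLine;
  fields[4] = publicIp; // fifth field is the connection address
  return fields.join(" ");
}

// Apply the rewrite to every "a=candidate:" line of an SDP blob,
// leaving all other lines untouched. SDP lines are CRLF-terminated.
function rewriteSdpCandidates(sdp: string, publicIp: string): string {
  return sdp
    .split(/\r?\n/)
    .map((line) => (line.startsWith("a=candidate:") ? withPublicAddress(line, publicIp) : line))
    .join("\r\n");
}
```

Restricting the rewrite to `host` candidates is a design choice of this sketch. Without a STUN or TURN server configured there should be no other candidate types to begin with, so the guard mainly protects against rewriting something unexpected.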

This is the primary reason I approached the problem from the point of view of peers connecting via mobile phones. IPv6 adoption is very high among mobile network providers in India, and the user base is also quite large. Being on a mobile phone has another crucial benefit, explored below.

Signaling

As mentioned earlier, we need a means of exchanging signaling information between the peers before a WebRTC connection can be established. If both peers are on mobile phones, they can use SMS for exchanging the information.

The problem with using SMS as a signaling channel is that the peers need to transfer signaling data even after the initial connection is set up. This would become quite cumbersome. As a workaround, we make use of WebRTC’s capability to set up data channels between peers. Basically, we can use SMS for exchanging just enough information to set up a peer-to-peer data channel, and then use that channel as the signaling channel going forward.
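Once the data channel is open, later signaling traffic can travel over it as small JSON messages. The following is a rough sketch of that idea under my own assumptions: the `Signal` message shapes and the `sendSignal` / `receiveSignal` helpers are hypothetical, and the demo app may structure this differently. The channel is typed structurally so the code does not depend on browser types. A browser `RTCDataChannel` fits, since it has a `send` method that accepts strings.

```typescript
// Hypothetical message shapes for signaling over an already-open data channel.
type Signal =
  | { kind: "description"; type: "offer" | "answer"; sdp: string }
  | { kind: "candidate"; candidate: string };

// Anything with a send(string) method can act as the channel.
interface SignalChannel {
  send(data: string): void;
}

// Callbacks the app supplies to act on incoming signaling messages.
interface SignalHandlers {
  onDescription(type: "offer" | "answer", sdp: string): void;
  onCandidate(candidate: string): void;
}

function sendSignal(channel: SignalChannel, signal: Signal): void {
  channel.send(JSON.stringify(signal));
}

// Validates and dispatches one incoming message.
// Returns true if the message was recognised, false otherwise.
function receiveSignal(raw: string, handlers: SignalHandlers): boolean {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return false; // not JSON, ignore
  }
  if (typeof parsed !== "object" || parsed === null) return false;
  const msg = parsed as Record<string, unknown>;
  const kind = msg["kind"];
  const type = msg["type"];
  const sdp = msg["sdp"];
  const candidate = msg["candidate"];
  if (kind === "description" && (type === "offer" || type === "answer") && typeof sdp === "string") {
    handlers.onDescription(type, sdp);
    return true;
  }
  if (kind === "candidate" && typeof candidate === "string") {
    handlers.onCandidate(candidate);
    return true;
  }
  return false;
}
```

In a browser, the receiving side would call `receiveSignal(event.data, handlers)` from the data channel's `message` event handler, with the handlers passing the description or candidate on to the peer connection.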

Demo App

You can try out a simple demo app built along these lines here.

The requirements for it to work as expected are:

Users need to be connected to an IPv6 network.
Users should be able to exchange text messages with their peer over a separate channel.

These requirements are likely to be met if you access the demo on a smartphone connected to an Indian mobile service provider. You can also open the demo in two separate tabs if there is no one around to test things with.

Conclusion

As it turns out, we can use WebRTC-enabled communication without having to set up a STUN/TURN or signaling server, as long as both peers are on an IPv6 network and are willing to put up with a below-par initial connection experience. There is also the drawback that this only considers one-on-one communication. Even considering all this, it turned out much better than I expected after the initial explorations, especially since the demo app also works on laptops tethered to a mobile connection.

¹ Tested on mobiles connected to Airtel and Jio. Also tested on a tethered laptop running Fedora Linux and on macOS after enabling IPv6 in Network preferences.
² MDN seems to be exclusively using the American spelling for signalling, so I too will be sticking with it.
³ Wiki entry for NAT
⁴ AAAA is a DNS record type that maps a hostname to an IPv6 address.