Using the FreeSwitch SIP server to build video conferencing software like callab.me, compared with Kurento Media Server

Aug 19, 2020

This article describes our experience using the FreeSwitch (FS) SIP server for the video conferencing SaaS Callab.me, where it replaced Kurento Media Server for our professional users.

Callab.me uses the FreeSwitch technology to provide HD-conferencing and live streaming for its customers

First of all, FS is much more complicated to set up and understand than Kurento, so I want to say a few words about the Kurento Media Server (KMS) first. It is a European open source project from Spain, still maintained and very easy to use. It comes with several tutorials that let you create a simple video conferencing application within a very short time. See more here: https://doc-kurento.readthedocs.io/en/6.14.0/tutorials/java/tutorial-groupcall.html

Kurento supports NodeJS and Java in the backend and also provides a JS library for the frontend (https://github.com/Kurento/kurento-utils-js). As it is an older library that relies on outdated and deprecated WebRTC methods, we refactored it and made the results open source. At the time of writing, the PR had not been merged yet. I recommend using our updated version, as it works on all major browsers including mobile browsers: https://github.com/Kurento/kurento-utils-js/pull/34

Kurento allows you to pass the participants' video streams through to each other. This lets you build a simple video conference, similar to Jitsi or other solutions that pass the video stream through. It also allows you to use a MUX, which in the Kurento context is called a composite.

A MUX combines the video streams of many users into one stream, giving each a position in the mux. This reduces the performance required on each client and also increases the audio quality. The reason: imagine you are in a conference with 10 other people, and everyone sends their video to everyone else. As a result, you receive 10 video + 10 audio streams and send 1 audio + 1 video stream, so 22 streams in total. With a MUX and the same participants, you receive 1 audio + 1 video stream and send 1 audio + 1 video stream. That makes 4 instead of 22. This is good for the client, which does not have to mix the audio streams for your speakers and does not have to handle 22 parallel streams. It is good for the network as well. On the other hand, the server hosting the media server has more work to do to generate the mux, combining all video and audio streams. So using a mux makes sense for bigger conferences, but it requires CPU power.
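
The arithmetic above can be written down as a small sketch. It assumes that each client receives one audio and one video stream per remote participant in passthrough mode and always sends one of each; the class and method names are made up for illustration.

```java
/** Counts media streams per client; a back-of-the-envelope sketch, not a measurement. */
final class StreamCount {

    /** Passthrough: one audio + one video stream per remote participant, plus your own pair. */
    static int passthrough(int remoteParticipants) {
        int received = 2 * remoteParticipants; // audio + video from every remote participant
        int sent = 2;                          // own audio + own video
        return received + sent;
    }

    /** Mux: one mixed audio + one composited video in, own audio + video out. */
    static int mux() {
        return 2 + 2;
    }

    public static void main(String[] args) {
        for (int remote : new int[] {2, 10, 25}) {
            System.out.printf("%d remote participants: passthrough=%d, mux=%d%n",
                    remote, passthrough(remote), mux());
        }
    }
}
```

The passthrough count grows linearly with the number of participants, while the mux count stays constant on the client and moves the cost to the server.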

The reason we changed to FreeSwitch was the customizability of the video mux. While Kurento allows almost no manipulation of position or colors, FreeSwitch lets you design your mux layouts very flexibly and change colors and even background images. Another reason is the ability to stream the mux. While in Kurento it is theoretically possible to stream the mux, e.g. to YouTube or Facebook using ffmpeg, FreeSwitch supports streaming via RTMP by default (note: RTMPS is not yet supported). And finally, FS is a SIP server, the kind usually used by telephony providers, which lets you offer users a dial-in via phone very easily.

A big mark against FS is the documentation. FS is much more complex than Kurento, as it is very open and has a bunch of useful modules, and in some places the documentation is not that well maintained. A good way to get started is to learn the basic methodology from books or online courses.

In this article I will give you a short introduction to the basic setup and usage for a video conference.

In FreeSwitch, the following keywords are important:

· Directory: This is the list of users allowed to log in to the FreeSwitch server and register themselves. Registering with a SIP server is basically what your SIP phone does after you enter the credentials. It tells the server: “I am here, waiting for calls”. After registering, the user can receive calls or make outgoing calls, if permitted.

· Dialplan: This is a complex list of actions that the server performs when you call a number. There are plenty of examples in the demo installation, like voicemail, ordering food, valet parking and more, that show what is possible.

· XML_CURL: This is a module for FS (https://freeswitch.org/confluence/display/FREESWITCH/mod_xml_curl) that lets FS request configuration, which is usually stored as XML files on your server, from your WebAPI instead. The WebAPI could be connected to a database. We will do this for the directory and the dialplan.

· ESL (Event Socket Library): The ESL is an awesome feature that allows us to connect any application (for example a backend Java application) to the FreeSwitch server and perform actions. Once FreeSwitch is installed on your server, it offers the fs_cli command to perform actions, and ESL is used in exactly the same way as fs_cli. You can receive all events happening on the server and perform actions. There are client implementations for Java and NodeJS (and also for Python, Ruby and more).

Set up the WebAPI

Clone our demo repository (https://github.com/mxk1011/fs_demo) on your webserver and make it reachable from the FreeSwitch server. FreeSwitch will connect to it whenever it needs the dialplan or the directory. It should not be available to the public!

Let's take a deeper look at both, starting with the dialplan:

The dialplan

<?php
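    // FreeSwitch passes the caller's number as a request parameter.
    // Fall back to an empty string so a missing parameter does not raise a notice.
    $caller = $_GET['Caller-Caller-ID-Number'] ?? '';

    // Users 1000 and 1001 share the "internal" room; everyone else goes to "external"
    if (in_array($caller, ['1000', '1001'], true)) {
        $conferenceName = 'internal';
    } else {
        $conferenceName = 'external';
    }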
?><document type="freeswitch/xml">
    <section name="dialplan" description="RE Dial Plan For FreeSwitch">
        <context name="public">
            <extension name="Media Server">
                <condition field="destination_number" expression="^(999)$">
                    <action application="answer"/>
                    <action application="conference" data="<?php echo $conferenceName; ?>@myprofile"/>
                </condition>
            </extension>
        </context>
    </section>
</document>

We see that FreeSwitch passes parameters to this PHP script via the URL query. Try dumping all of them and take a look. There are many, including internal variables, the server name, the calling number and much more useful information that you can use for scripting in the WebAPI. In this simple example we just use the caller’s number to decide which conference room the user should be added to. Users 1000 and 1001 are added to the conference named “internal” and all other users to “external”. The dialplan used here is very simple:

1. Match all incoming calls whose destination number matches the regex ^(999)$, which means exactly 999 is being called.

2. Answer the call immediately.

3. Put the user into the conference with the name that we set in our PHP script.

The good thing is that FS requests this file every time someone starts a call. So there is no cache, and you can use it as a tool for dynamically routing users through your SIP application. The example dialplans from FS show many more possibilities: https://github.com/signalwire/freeswitch/tree/master/conf/sbc/dialplan
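
If your WebAPI is not written in PHP, the same routing decision is easy to express elsewhere. The following is a minimal Java sketch of the three steps above: it checks the destination number against the regex, picks the conference room from the caller's number and renders the conference action. The class and method names are hypothetical, and how the request parameters reach this code depends on your web framework.

```java
import java.util.Set;
import java.util.regex.Pattern;

/** Mirrors the PHP dialplan logic; names are illustrative only. */
final class DialplanSketch {
    // Same expression as in the dialplan condition: exactly "999"
    private static final Pattern CONFERENCE_NUMBER = Pattern.compile("^(999)$");
    private static final Set<String> INTERNAL_CALLERS = Set.of("1000", "1001");

    static boolean matchesConferenceNumber(String destinationNumber) {
        return destinationNumber != null && CONFERENCE_NUMBER.matcher(destinationNumber).find();
    }

    /** Users 1000 and 1001 go to "internal", everyone else to "external". */
    static String conferenceFor(String callerIdNumber) {
        boolean internal = callerIdNumber != null && INTERNAL_CALLERS.contains(callerIdNumber);
        return internal ? "internal" : "external";
    }

    /** Renders the conference action element used in the dialplan XML. */
    static String conferenceAction(String callerIdNumber) {
        return "<action application=\"conference\" data=\""
                + conferenceFor(callerIdNumber) + "@myprofile\"/>";
    }

    public static void main(String[] args) {
        System.out.println(matchesConferenceNumber("999"));
        System.out.println(conferenceAction("1000"));
        System.out.println(conferenceAction("1002"));
    }
}
```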

The directory

As said above, the directory is a list of users allowed to log in to the FS context and make or receive calls. We make use of XML_CURL, so we don’t store the users in a local XML file on the FS server, which is the default when you install FS from scratch. Take a look at a default directory from FS here: https://github.com/signalwire/freeswitch/tree/master/conf/testing/directory/default

In that default directory there is an XML file for each user, setting the name, password and user-specific settings. In our example, we define all of them in our PHP file and output the result as XML.

<?php

    $users = [
        [
            'number' => '1000',
            'password' => 'mypassword',
            'name' => 'User 1000',
            'myVar' => '1234',
        ],
        [
            'number' => '1001',
            'password' => 'mypassword',
            'name' => 'User 1001',
            'myVar' => '1234',
        ],
        [
            'number' => '1002',
            'password' => 'mypassword',
            'name' => 'User 1002',
            'myVar' => '1234',
        ]
    ];

?><document type="freeswitch/xml">
    <section name="directory">
        <domain name="example.com">
            <params>
                <param name="dial-string" value="{presence_id=${dialed_user}@${dialed_domain}}${sofia_contact(${dialed_user}@${dialed_domain})}"/>
            </params>
            <groups>
                <group name="default">
                    <users>
                        <?php
                            foreach($users as $user) {
                        ?>
                        <user id="<?php echo $user['number']; ?>">
                            <params>
                                <param name="password" value="<?php echo $user['password']; ?>"/>
                            </params>
                            <variables>
                                <variable name="accountcode" value="<?php echo $user['number']; ?>"/>
                                <variable name="user_context" value="public"/>
                                <variable name="effective_caller_id_name" value="<?php echo $user['name']; ?>"/>
                                <variable name="effective_caller_id_number" value="<?php echo $user['number']; ?>"/>
                                <variable name="outbound_caller_id_name" value="<?php echo $user['name']; ?>"/>
                                <variable name="outbound_caller_id_number" value="<?php echo $user['number']; ?>"/>
                                <variable name="mySuperVariable" value="<?php echo $user['myVar']; ?>" />
                            </variables>
                        </user>
                        <?php } ?>
                    </users>
                </group>
            </groups>
        </domain>
    </section>
</document>

You can do it as in the example and output all existing users, or you can read from the URL parameters which user is trying to log in and output just that one. This decreases the necessary traffic. Here too, the file is not cached by FS and is requested every time someone tries to log in. To protect the WebAPI against DDoS, I recommend using Fail2Ban on the FS server (which may be added to this tutorial soon) to ban all users with e.g. 5 failed login attempts.

Set up the FreeSwitch Server
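
The second option, returning only the user who is trying to log in, could look roughly like the Java sketch below. It is an illustration under assumptions: the user table is an in-memory map standing in for your database, the requested user ID is passed in as a plain argument (reading it from the request is left out), and all class and method names are hypothetical. Values are XML-escaped before they are written into attributes.

```java
import java.util.Map;
import java.util.Optional;

/** Renders the <user> element for a single directory user; illustrative only. */
final class DirectorySketch {

    /** Hypothetical user record mirroring the PHP array above. */
    static final class User {
        final String number;
        final String password;
        final String name;

        User(String number, String password, String name) {
            this.number = number;
            this.password = password;
            this.name = name;
        }
    }

    // Stand-in for a database lookup
    private static final Map<String, User> USERS = Map.of(
            "1000", new User("1000", "mypassword", "User 1000"),
            "1001", new User("1001", "mypassword", "User 1001"),
            "1002", new User("1002", "mypassword", "User 1002"));

    /** Escapes the characters that are unsafe inside a double-quoted XML attribute. */
    static String escapeXml(String value) {
        return value.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Returns the XML for the requested user, or empty if the user is unknown. */
    static Optional<String> userXml(String requestedUser) {
        if (requestedUser == null) {
            return Optional.empty();
        }
        User user = USERS.get(requestedUser);
        if (user == null) {
            return Optional.empty();
        }
        String xml = "<user id=\"" + escapeXml(user.number) + "\">\n"
                + "  <params>\n"
                + "    <param name=\"password\" value=\"" + escapeXml(user.password) + "\"/>\n"
                + "  </params>\n"
                + "  <variables>\n"
                + "    <variable name=\"user_context\" value=\"public\"/>\n"
                + "    <variable name=\"effective_caller_id_name\" value=\"" + escapeXml(user.name) + "\"/>\n"
                + "    <variable name=\"effective_caller_id_number\" value=\"" + escapeXml(user.number) + "\"/>\n"
                + "  </variables>\n"
                + "</user>";
        return Optional.of(xml);
    }

    public static void main(String[] args) {
        System.out.println(userXml("1000").orElse("user not found"));
    }
}
```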

You can of course read the docs and install FS directly on your server or build it manually from source (https://github.com/signalwire/freeswitch), but in this tutorial we will build the server with Docker. We prepared a Dockerfile (https://github.com/mxk1011/fs_demo/blob/master/mediaserver/freeswitch/Dockerfile) that you can easily run on any server that has Docker installed.

git clone xxx
cd mediaserver
vi .env

Edit the .env file and set MAINAPI_URL to the web address of your WebAPI.

docker-compose build
docker-compose up

This will run the FreeSwitch server and our example Java application. You can always run

docker-compose exec freeswitch fs_cli

to get into the FS cli application.

The JS Frontend

There are plenty of SIP implementations in JavaScript; we use the JsSIP library: https://github.com/versatica/JsSIP

You can use our mini example or this one from Harold Thetiot (https://github.com/cordova-rtc/cordova-plugin-iosrtc-sample/blob/master/www/js/index-jssip.js). It is very useful when taking the next step and developing an iOS or Android app with Apache Cordova. The iosrtc package is the missing link between native WebRTC rendering and hybrid apps built with Cordova. You can find it here: https://github.com/cordova-rtc/cordova-plugin-iosrtc

Simply enter the hostname of the FS server and one of the users we defined in the WebAPI. Then call 999 and you will be connected to the video conference. When you open index2.html and do the same in another browser or browser tab, you will see your own video twice in the conference and see the video mux in action.

The video mux

As said above, it IS possible with FreeSwitch to manipulate the video mux. Here are a few examples of what is possible. To control the mux, you need to use the ESL (see below) or the fs_cli (see above).

Let’s say you opened the fs_cli with docker-compose exec freeswitch fs_cli and, while connected to a conference with the JS frontend, enter:

conference list

It will output something similar to this:

freeswitch@748c8fabad44> conference list
+OK Conference b5a018c0-db16-11ea-ae02-a7ad2187db66 (1 member rate: 48000 flags: running|answered|enforce_min|dynamic|exit_sound|enter_sound|video_rfc4579|livearray_sync|transcode_video|video_muxing|minimize_video_encoding|manage_inbound_bitrate|json_status|video_required_for_canvas)
102;sofia/internal/529075307@veedeo.me;12505e57-c0c7-4464-8ef5-c5ab10410826;Veedeo.me Web;529075307;hear|speak|video|res-id:1;0;0;200

This gives us information about all active conferences and their members. Here we have a conference with the ID b5a018c0-db16-11ea-ae02-a7ad2187db66 and the member with ID 102 (dialed in with number 529075307), who is currently hearing, speaking, sending video and placed on reservation ID 1.

Now let’s do the following:
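
If your application server needs this information, the member line can be split on semicolons. The sketch below is a minimal Java parser for the fields explained above (member ID, channel, UUID, caller name, caller number, flags). It ignores the trailing numeric fields, and it assumes that none of the values contains a semicolon; the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Parses one member line of the "conference list" output; illustrative only. */
final class ConferenceMemberLine {
    final int memberId;
    final String channel;
    final String uuid;
    final String callerName;
    final String callerNumber;
    final List<String> flags;

    private ConferenceMemberLine(int memberId, String channel, String uuid,
                                 String callerName, String callerNumber, List<String> flags) {
        this.memberId = memberId;
        this.channel = channel;
        this.uuid = uuid;
        this.callerName = callerName;
        this.callerNumber = callerNumber;
        this.flags = flags;
    }

    static ConferenceMemberLine parse(String line) {
        String[] fields = line.split(";");
        if (fields.length < 6) {
            throw new IllegalArgumentException("Unexpected member line: " + line);
        }
        return new ConferenceMemberLine(
                Integer.parseInt(fields[0].trim()),
                fields[1], fields[2], fields[3], fields[4],
                Arrays.asList(fields[5].split("\\|"))); // flags are separated by "|"
    }

    boolean has(String flag) {
        return flags.contains(flag);
    }

    public static void main(String[] args) {
        ConferenceMemberLine member = parse(
                "102;sofia/internal/529075307@veedeo.me;12505e57-c0c7-4464-8ef5-c5ab10410826;"
                        + "Veedeo.me Web;529075307;hear|speak|video|res-id:1;0;0;200");
        System.out.println(member.memberId + " " + member.callerNumber + " " + member.flags);
    }
}
```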

conference b5a018c0-db16-11ea-ae02-a7ad2187db66 kick 102

And bam, you are gone, kicked out of your own conference. Let’s connect again and try some more commands:

conference b5a018c0-db16-11ea-ae02-a7ad2187db66 vmute 102

This will mute your video.

conference b5a018c0-db16-11ea-ae02-a7ad2187db66 vid-banner 102 {font_face=Rubik-Regular.ttf,font_scale=5,bg=#f5f7fa,fg=#514654,min_font_size=8,max_font_size=14}Hello

This will place a banner below your video stream with the text “Hello”, using the font Rubik-Regular (an open Google font), which is in the fonts folder.

Check out this page to find all available commands: https://freeswitch.org/confluence/display/FREESWITCH/mod_conference

Video Layouts

When you take a look at the configuration file conference_layouts.conf.xml, you will find a lot of predefined layouts that your application can use. For example:

<layout name="2x2" auto-3d-position="true">
    <image x="0" y="0" scale="180"/>
    <image x="180" y="0" scale="180"/>
    <image x="0" y="180" scale="180"/>
    <image x="180" y="180" scale="180"/>
</layout>

The logic of this is actually very simple:

By definition, the grid always has a width of 360 units (not pixels!) and a height of 360 units. You can give an image a width and height (both at once) with scale, or use hscale and scale together for different height and width. Images may also have the following attributes:

reservation_id: A unique ID for that image
zoom: If true, the video always fills the full image without black bars at the sides
overlap: If images are overlapping, let FS know and it will take care that the result does not flicker

To change the current layout, just run:

conference xyz vid-layout 2x2

You will find some layouts that have a reservation id, like this one:

<layout name="presenter-overlap-large-top-right">
    <image x="0" y="0" scale="360" floor-only="true"/>
    <image x="180" y="0" scale="180" overlap="true" reservation_id="presenter"/>
</layout>

The reservation ID is a way to give each image a unique ID and place a user at this specific position. To do that, pass the member ID and the reservation ID to the command:

conference xyz vid-res-id 102 presenter

Of course, the layout that contains the reservation_id presenter has to be active. A good way to manage the full positioning yourself is to give ALL images a reservation_id and manage the placement from your application server connected via ESL. In that case you always know the x, y, width and height of each user and can place a custom UI over the video canvas, as we do at Callab.me.
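
To place a UI over the canvas, the layout units have to be translated into pixels. The following Java sketch shows one way to do that. It assumes that the 360 units map linearly onto the canvas width and height and that an image only uses scale (so the same value applies to both axes). The canvas size of 1280x720 and all names are made up for the example.

```java
/** Converts layout units to pixels, assuming a linear 360-unit grid on each axis. */
final class LayoutGeometry {
    static final int GRID_UNITS = 360;

    /** Pixel rectangle of one layout image. */
    static final class Rect {
        final int x;
        final int y;
        final int width;
        final int height;

        Rect(int x, int y, int width, int height) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }

        @Override
        public String toString() {
            return "Rect[x=" + x + ", y=" + y + ", width=" + width + ", height=" + height + "]";
        }
    }

    /** Maps a value in layout units onto a canvas dimension given in pixels. */
    static int toPixels(int units, int canvasSize) {
        return units * canvasSize / GRID_UNITS;
    }

    /** Pixel rectangle for an image defined by x, y and scale. */
    static Rect imageRect(int x, int y, int scale, int canvasWidth, int canvasHeight) {
        return new Rect(
                toPixels(x, canvasWidth), toPixels(y, canvasHeight),
                toPixels(scale, canvasWidth), toPixels(scale, canvasHeight));
    }

    public static void main(String[] args) {
        // Top-right image of the 2x2 layout on a hypothetical 1280x720 canvas
        System.out.println(imageRect(180, 0, 180, 1280, 720));
    }
}
```

Under these assumptions, an application server that knows which member sits on which reservation_id can send these rectangles to the frontend, which then draws names or buttons on top of the video.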

Recording and streaming

Yes, FreeSwitch also allows you to record your sessions to a file and even stream them to Twitch, YouTube or any other RTMP-compatible service. Please note that RTMPS is not yet supported by FreeSwitch. Let’s see if the community integrates it soon.

To record the current conference to a file, simply do

conference xyz record /var/videorecordings/mysession.mp4

Or to stream it, do:

conference xyz record rtmp://my-service.com/?id=1234
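
All commands shown so far follow the pattern conference, conference name, subcommand, arguments. If you build them in your application, a small helper keeps the string handling in one place. This is a sketch with hypothetical names. It only covers subcommands used in this article and rejects values containing whitespace so that user input cannot smuggle in extra arguments.

```java
/** Builds fs_cli style conference commands; a sketch covering only a few subcommands. */
final class ConferenceCommands {
    private ConferenceCommands() {
    }

    /** Rejects empty values and values with whitespace, which would change the argument list. */
    private static String token(String value) {
        if (value == null || value.isEmpty() || value.chars().anyMatch(Character::isWhitespace)) {
            throw new IllegalArgumentException("Invalid command token: " + value);
        }
        return value;
    }

    static String kick(String conference, int memberId) {
        return "conference " + token(conference) + " kick " + memberId;
    }

    static String muteVideo(String conference, int memberId) {
        return "conference " + token(conference) + " vmute " + memberId;
    }

    static String layout(String conference, String layoutName) {
        return "conference " + token(conference) + " vid-layout " + token(layoutName);
    }

    /** The target is either a file path or an rtmp:// URL. */
    static String record(String conference, String target) {
        return "conference " + token(conference) + " record " + token(target);
    }

    public static void main(String[] args) {
        System.out.println(kick("xyz", 102));
        System.out.println(layout("xyz", "2x2"));
        System.out.println(record("xyz", "/var/videorecordings/mysession.mp4"));
    }
}
```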

The Event Socket Library

FS allows you to perform all actions and receive all events available in the fs_cli over a remote socket connection. In our example repository we have already opened port 8021 for internal socket connections. Attention: never expose this port publicly. An exposed ESL port can be very dangerous, as attackers could act as admins, especially if you don’t change the default password “ClueCon”.

For this example we will use the Java ESL library (https://github.com/esl-client/esl-client). There are also implementations for NodeJS, Python and other languages.

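// Package names as found in the esl-client repository; they may differ between versions.
import java.net.InetSocketAddress;

import org.freeswitch.esl.client.dptools.Execute;
import org.freeswitch.esl.client.inbound.Client;
import org.freeswitch.esl.client.internal.IModEslApi;
import org.freeswitch.esl.client.transport.event.EslEvent;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class FreeSwitchSocketService {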
    private static final Logger log = LoggerFactory.getLogger(FreeSwitchSocketService.class);

    protected Execute executor;

    public void start(String host, int port, String password) {
        final Client inboudClient = new Client();
        try {
            inboudClient.connect(new InetSocketAddress(host, port), password, 10);
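        } catch (Exception e) {
            // Without a connection there is nothing to subscribe to, so stop here
            log.error("Could not connect to FreeSwitch at {}:{}", host, port, e);
            return;
        }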

        inboudClient.addEventListener((ctx, event) -> {
            if (executor == null && event.getEventHeaders().containsKey("Unique-ID")) {
                executor = new Execute(ctx, event.getEventHeaders()
                        .get("Unique-ID"));
            }

            try {
                log.trace(event.getEventName());

                switch (event.getEventName()) {
                    case "CUSTOM":
                        if (event.getEventHeaders().containsKey("Action") && event.getEventHeaders().get("Action").equals("conference-create")) {
                            log.debug("Conference " + event.getEventHeaders().get("Conference-Name") + " with profile " + event.getEventHeaders().get("Conference-Profile-Name") + " created");
                            conferenceCreated(event);
                        } else if (event.getEventHeaders().containsKey("Action") && event.getEventHeaders().get("Action").equals("add-member")) {
                            memberAdded(event);
                        } else if (event.getEventHeaders().containsKey("Action") && event.getEventHeaders().get("Action").equals("start-talking")) {
                            log.trace("User " + event.getEventHeaders().get("Caller-Caller-ID-Number") + " start talking");
                            speakingEvent(event, true);
                        } else if (event.getEventHeaders().containsKey("Action") && event.getEventHeaders().get("Action").equals("stop-talking")) {
                            log.trace("User " + event.getEventHeaders().get("Caller-Caller-ID-Number") + " stop talking");
                            speakingEvent(event, false);
                        } else if (event.getEventHeaders().get("Action") != null) {
                            log.debug("Action: " + event.getEventHeaders().get("Action"));
                        }
                        break;
                    case "CHANNEL_HANGUP":
                        log.debug("User {} hangup, cause: {}",
                                event.getEventHeaders().get("variable_sip_from_user"),
                                event.getEventHeaders().get("Hangup-Cause"));
                        memberRemoved(event);
                        break;
                }
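            } catch (Exception e) {
                // Keep the listener alive if a single event cannot be handled
                log.error("Error while handling event {}", event.getEventName(), e);
            }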
        });
        inboudClient.setEventSubscriptions(IModEslApi.EventFormat.PLAIN, "all");

        log.info("FS Client connected");
    }

    public void clearAllReservationID(String room) {
        doAPICommand("conference", room + " vid-res-id all clear");
    }

    // The conference command needs the conference name before the subcommand
    public void applyLayout(String room, String layout) {
        doAPICommand("conference", room + " vid-layout " + layout);
    }

    public void removeAllMembers(String room) {
        doAPICommand("conference", room + " hup all");
    }

    public void doAPICommand(String command, String args) {
        if(executor == null) {
            log.warn("FS not ready yet, cant execute command " + command + " " + args);
            return;
        }

        log.debug("Executing " + command + " with args " + args);
        executor.BGApiCommand(command, args);
    }

    protected void speakingEvent(EslEvent event, boolean speaking) {
        log.debug("User {} {} speaking", event.getEventHeaders().get("Caller-Caller-ID-Number"), speaking ? "started" : "stopped");
    }

    protected void conferenceCreated(EslEvent event) {
        // Do something
    }

    protected void memberAdded(EslEvent event) {
        // Do something
    }

    protected void memberRemoved(EslEvent event) {
        // Do something
    }
}

When you also run your Java application in Docker (by defining it in docker-compose.yml), you can easily connect to your FS with:

FreeSwitchSocketService socketService = new FreeSwitchSocketService();
socketService.start("freeswitch", 8021, "ClueCon");

and perform the actions described above on your FS server from your Java application. In Java (or another language) you can develop a WebSocket service that is connected to your user frontend and manages chat, joining, admin permissions and all that basic stuff for you.
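
As the number of handled conference actions grows, the if/else chain in the event listener gets long. One alternative design is a lookup table from the Action header to a handler. The sketch below shows the idea in plain Java, independent of the ESL library: it works on a map of event headers, such as the one returned by event.getEventHeaders() in the listener above. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Dispatches conference events by their "Action" header; illustrative only. */
final class ConferenceEventDispatcher {
    private final Map<String, Consumer<Map<String, String>>> handlers = new HashMap<>();

    /** Registers a handler for one action, e.g. "add-member" or "start-talking". */
    ConferenceEventDispatcher on(String action, Consumer<Map<String, String>> handler) {
        handlers.put(action, handler);
        return this;
    }

    /** Returns true if a handler was registered for the event's Action header and was called. */
    boolean dispatch(Map<String, String> headers) {
        String action = headers.get("Action");
        Consumer<Map<String, String>> handler = action == null ? null : handlers.get(action);
        if (handler == null) {
            return false;
        }
        handler.accept(headers);
        return true;
    }

    public static void main(String[] args) {
        ConferenceEventDispatcher dispatcher = new ConferenceEventDispatcher()
                .on("add-member", h -> System.out.println(
                        "Member joined: " + h.get("Caller-Caller-ID-Number")))
                .on("start-talking", h -> System.out.println(
                        "Talking: " + h.get("Caller-Caller-ID-Number")));

        Map<String, String> headers = new HashMap<>();
        headers.put("Action", "add-member");
        headers.put("Caller-Caller-ID-Number", "1000");
        dispatcher.dispatch(headers);
    }
}
```

Inside the CUSTOM case of the listener, a single call to dispatch with the event headers would then replace the chain of comparisons.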

Dial In via Telephone

Yes, this is a SIP server, so you can join the conference by dialing a number on your telephone or mobile phone. You will need a SIP trunk provider and have to buy a telephone number, though. You will then get your own telephone number (which can be anywhere in the world) plus credentials for the SIP trunk.

You can put them here: https://github.com/mxk1011/fs_demo/blob/master/mediaserver/freeswitch/config/sip_profiles/external/mysip.xml and FS will automatically register itself with the SIP server of your provider. Basically it does the same as the JS client we developed: it registers with the server and waits for incoming calls. We need another dialplan for that, so you can use this one:

<extension name="public_did">
    <condition field="destination_number" expression="^(496131123456789)$">
        <action application="answer" />
        <action application="set" data="dtmf_type=rfc2833"/>
        <action application="start_dtmf" />
        <action application="set" data="sound_prefix=$${sounds_dir}/veedeo/en" />
        <action application="lua" data="veedeo_dialin.lua"/>
    </condition>
</extension>

This will answer the call, enable the session to receive DTMF commands (basically the digits you press on your dialpad while you are in a call) and then execute veedeo_dialin.lua, a script written in Lua. You can find it here (https://github.com/mxk1011/fs_demo/blob/master/mediaserver/freeswitch/scripts/veedeo_dialin.lua). The script asks the caller for the conference ID and the PIN. Then it makes a CURL request to check_pin.php, which lives in our WebAPI. Please change the URL in the Lua file to point to your WebAPI. As the check in our WebAPI is very simple:

<?php

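// Print the conference name so the caller of this script can read it from the response body
// (a top-level return would produce no output)
if (($_GET['pin'] ?? '') === '1234' && ($_GET['id'] ?? '') === '00000000') {
    echo 'my-conf-name';
}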

we can join the conference “my-conf-name” by first entering 00000000 and then 1234.
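
In a real WebAPI the hard-coded values would come from a database. The Java sketch below shows the same check against a lookup table, with the PIN compared in constant time via MessageDigest.isEqual. The table contents are the example values from above, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Map;
import java.util.Optional;

/** Looks up a conference by ID and verifies its PIN; illustrative only. */
final class PinCheck {

    /** Hypothetical room record: the PIN and the conference name to join. */
    static final class Room {
        final String pin;
        final String conferenceName;

        Room(String pin, String conferenceName) {
            this.pin = pin;
            this.conferenceName = conferenceName;
        }
    }

    // Stand-in for a database table keyed by conference ID
    private static final Map<String, Room> ROOMS =
            Map.of("00000000", new Room("1234", "my-conf-name"));

    /** Returns the conference name if ID and PIN match, otherwise empty. */
    static Optional<String> conferenceFor(String id, String pin) {
        if (id == null || pin == null) {
            return Optional.empty();
        }
        Room room = ROOMS.get(id);
        if (room == null) {
            return Optional.empty();
        }
        // Constant-time comparison so response timing does not leak PIN digits
        boolean pinMatches = MessageDigest.isEqual(
                room.pin.getBytes(StandardCharsets.UTF_8),
                pin.getBytes(StandardCharsets.UTF_8));
        return pinMatches ? Optional.of(room.conferenceName) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(conferenceFor("00000000", "1234").orElse("access denied"));
        System.out.println(conferenceFor("00000000", "9999").orElse("access denied"));
    }
}
```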

Summary

We created a FreeSwitch server and looked at how to write an application server that could use WebSockets and how to introduce a WebAPI. Of course the application server and the WebAPI can be the same application connected to the same database, but this should explain how you can separate the responsibilities when using FreeSwitch. Thanks for reading. I hope this tutorial helped you and that it helps grow the FreeSwitch developer and open source community by giving a high-level overview of the possibilities and the stack.

If you have any questions, feel free to contact me. Special thanks to Harold Thetiot for recommending FreeSwitch to us, for advising us on our questions, for providing useful sources like the Docker configuration and examples, and for his courage with the iosRTC open source project.