Welcome to the roadmap for BigBlueButton. This post is dedicated to helping people grow the market for online learning by having more effective conversations online. More importantly, the lack of access to quality education is a pressing issue, as outlined by the United Nations.
To begin with, you will learn about all the core features you would expect in any commercial web conferencing system, but under an open-source license.
Fred Dixon is our keynote speaker. He is the CEO of Blindside Networks, the company that started the BigBlueButton project to create the world's best web conferencing system for online learning.
Here are some highlights from the episode, an excerpt from a live YouTube stream that you can watch in its entirety here.
Let's Get Started With The Three Parts Of The Current BigBlueButton Architecture.
Current Architecture 1: RTP Topology
BigBlueButton uses a mixed RTP topology because it runs two different media servers, namely Kurento and FreeSwitch. Kurento serves as the Selective Forwarding Unit (SFU) for webcams, screen sharing, and listen-only audio, while FreeSwitch serves as the Multipoint Control Unit (MCU) for microphones.
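To make the SFU versus MCU difference concrete, here is a deliberately simplified sketch. The `MediaLoad` type and both functions are made up for illustration and are not part of BigBlueButton, Kurento, or FreeSwitch; they only count streams and ignore bitrate, codecs, and everything else a real media server deals with.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaLoad:
    """Rough load for one media type in one conference (illustrative only)."""
    downlink_streams_per_client: int  # streams each sending participant receives
    server_decodes: int               # streams the server has to decode and mix


def sfu_load(senders: int) -> MediaLoad:
    # An SFU forwards each sender's stream without decoding it, so a
    # participant who is also sending receives one stream per other sender.
    return MediaLoad(downlink_streams_per_client=max(senders - 1, 0), server_decodes=0)


def mcu_load(senders: int) -> MediaLoad:
    # An MCU decodes every incoming stream and mixes them into a single
    # outgoing stream per participant, which is where the CPU cost comes from.
    return MediaLoad(downlink_streams_per_client=1 if senders else 0, server_decodes=senders)
```

With ten senders, this model gives nine downlink streams per client and no decoding for the SFU, against one mixed stream per client and ten decodes for the MCU.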
Current Architecture 2: Media Signaling
Media signaling is how Session Description Protocol (SDP) messages are transported back and forth between the client and the server. In BigBlueButton, there are two different media signaling pathways: FreeSwitch and bbb-webrtc-sfu.
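For readers who have not seen an SDP message, the sketch below pulls the media sections (the `m=` lines) out of a hand-written example. The `MediaSection` type, the `media_sections` helper, and the sample offer are all illustrative assumptions; this is not how BigBlueButton parses SDP, only a way to see what travels over the signaling pathway.

```python
from typing import List, NamedTuple


class MediaSection(NamedTuple):
    kind: str           # "audio", "video", "application", ...
    port: int
    protocol: str
    formats: List[str]  # payload type numbers, kept as strings


def media_sections(sdp: str) -> List[MediaSection]:
    """Return one entry per "m=" line found in an SDP message."""
    sections = []
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            kind, port, protocol, *formats = line[2:].split()
            # The port field may be written as "port/count"; keep only the port.
            sections.append(MediaSection(kind, int(port.split("/")[0]), protocol, formats))
    return sections


# Hand-written sample for illustration, not captured from a real session.
EXAMPLE_OFFER = """v=0
o=- 0 0 IN IP4 127.0.0.1
s=-
t=0 0
m=audio 9 UDP/TLS/RTP/SAVPF 111
a=rtpmap:111 opus/48000/2
m=video 9 UDP/TLS/RTP/SAVPF 96
a=rtpmap:96 VP8/90000
"""
```

Calling `media_sections(EXAMPLE_OFFER)` yields one audio section and one video section, in that order.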
Current Architecture 3: HTML5 client for WebRTC
- Each media type (webcam, screen sharing, and audio) has its own decoupled implementation.
- SIP.js and kurento-utils.js are used as WebRTC wrappers.
CHALLENGES WITH THE ARCHITECTURE THAT WILL BE IMPROVED
Paulo Lanzarin, a software developer at MCONF, describes the challenges that come with the BigBlueButton architecture, grouped into the categories below.
1. User experience
Challenges include the microphone versus "listen-only" separation, which many clients complain about, and the negotiation time, that is, how many seconds it takes to join with a microphone, join the conference, or share a webcam. Media that goes through Kurento is usually fast enough in ideal scenarios, when Kurento is not overloaded, but FreeSwitch is a bit more complicated for several reasons.
2. Client peer structure
- It currently allows for one peer connection per media stream, meaning that the whole ICE lifecycle is run N times for a session. (N is the number of media streams in a session lifetime).
- It is also inefficient in terms of network bandwidth, sockets, memory, and CPU, both on the client and on the server side.
- Having two different client-facing media servers makes things harder.
- Also, undocumented client and server media components raise the barrier to entry for the code.
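The "N times" point above is easy to put into numbers. The sketch below compares the current one-connection-per-stream structure with a hypothetical structure in which several streams share one peer connection. Both function names are made up for illustration, and the shared variant is an assumption about what a restructured client could look like, not something the talk commits to.

```python
import math


def ice_runs_per_stream(media_streams: int) -> int:
    """Current structure: one peer connection, so one ICE lifecycle, per stream."""
    return media_streams


def ice_runs_shared(media_streams: int, streams_per_connection: int) -> int:
    """Hypothetical alternative: several streams share one peer connection."""
    if streams_per_connection < 1:
        raise ValueError("streams_per_connection must be at least 1")
    return math.ceil(media_streams / streams_per_connection)
```

For a session with 30 media streams, the first function returns 30 ICE lifecycles, while `ice_runs_shared(30, 30)` returns 1.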
3. Kurento challenge
The challenges with Kurento include:
- Stability (WebSocket stack locks, crashes, and memory fragmentation).
- Scalability, which is both vertical and horizontal; this is not ideal, since vertical scalability is preferred.
- Negotiation efficiency: the "30+ cameras at once" problem.
- Media processing efficiency.
- Feature set (WebRTC), which is lagging behind.
- REMB only, no simulcast.
4. FreeSwitch challenge
- Scalability is vertical, which is good. However, it is still an MCU, so it is very CPU intensive. This is also the reason the "listen-only" feature exists.
- Negotiation times are also a problem.
- Profile stickiness.
- Audio quality.
- The feature set is also lagging behind, just like Kurento's.
STEPS TO MITIGATE THE CHALLENGES
1. Finding an alternative to Kurento
Due to the limitations posed by Kurento, work is ongoing to solve the issue, and two options have been identified. One is to keep trying to improve Kurento (a process called prototyping), and the other is to look for something else in the open-source ecosystem.
In the process of transitioning, however, it is important to stick to basic principles such as feature parity, gradual rollout, and not burning bridges.
Available alternatives to Kurento are the following:
- Janus (using Video Room).
- MediaSoup (with no built-in recording).
- Jitsi Videobridge (protocol translation, flexibility).
- Pion (dabbled with webcams).
- OWT (which is Intel's counterpart to Kurento).
It's worth noting that various load tests are executed as the prototypes are developed. Current load tests cover 1450 webcam streams at 200 kbps, dynamic camera profiles, and media servers. However, Paulo Lanzarin focuses on media servers.
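To get a feel for the scale of that load test, here is a rough back-of-the-envelope sketch. It assumes the per-stream figure quoted above means 200 kilobits per second, and it only counts the streams arriving at the media server. What the server sends out depends on how many viewers subscribe to each webcam, which the post does not say. The function name is made up for illustration.

```python
def aggregate_mbps(streams: int, kbps_per_stream: float) -> float:
    """Total bitrate in megabits per second for a number of equal streams."""
    return streams * kbps_per_stream / 1000


# Assumption: 1450 webcam streams at 200 kbps each, as read from the post.
incoming = aggregate_mbps(1450, 200)  # 290.0 Mbps arriving at the media server
```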
LOAD TEST METRICS FOR MEDIA SERVERS
- Idle CPU.
The higher, the better, because more idle CPU means the media server itself uses less CPU. Kurento is the least efficient, with 31% idle CPU, and Janus follows with 52% idle CPU. MediaSoup is generally efficient, with MediaSoup (8W) at 72% idle CPU and MediaSoup (1W) being the most efficient at 82% idle CPU.
- Used Memory.
The lower, the better. Janus uses the least memory of all. MediaSoup and Kurento have similar memory usage, but while MediaSoup releases the memory afterwards, Kurento only releases a small fraction of it.
- Context Switches.
The lower, the better. Because Kurento's negotiation time problem is structural, context switching might be a problem too. Compared to the other media servers, Kurento incurs more context switches, making it less efficient.
- Interrupts.
The lower, the better. Interrupts are a metric that reflects the negotiation time and context switching problems Kurento has.
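The idle CPU figures quoted above can be turned into a quick comparison. This is only a sketch over the four numbers from the post; the dictionary and function names are made up for illustration, and no other metric is included because the post gives no figures for them.

```python
# Idle CPU percentages as quoted in the post.
IDLE_CPU_PERCENT = {
    "Kurento": 31,
    "Janus": 52,
    "MediaSoup (8W)": 72,
    "MediaSoup (1W)": 82,
}


def cpu_used(idle_percent: float) -> float:
    """CPU consumed during the test is whatever was not idle."""
    return 100 - idle_percent


def rank_by_idle_cpu(results: dict) -> list:
    """Most efficient first: a higher idle CPU means a lighter media server."""
    return sorted(results, key=results.get, reverse=True)
```

Ranking the four results puts MediaSoup (1W) first and Kurento last, with Kurento using 69% of the CPU against 18% for MediaSoup (1W).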
2. Client restructuring
Here are steps that need to be taken in client restructuring.
- Make client media bridges configurable and extensible. This means that you can choose whether to use Kurento, Janus, or another media server.
- Reduce the number of concurrent media signaling WebSockets.
- Replace kurento-utils.js.
- Address the client's peer structure problem, which is largely dependent on the chosen media server.
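One way to picture "configurable and extensible" media bridges is a small registry keyed by a configuration value. The sketch below is written in Python for brevity, even though the real client is HTML5/JavaScript, and every name in it (`MediaBridge`, `register_bridge`, `bridge_from_config`, the `mediaBridge` key) is a hypothetical placeholder rather than an actual BigBlueButton API.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict


class MediaBridge(ABC):
    """Hypothetical common interface every media bridge would implement."""

    @abstractmethod
    def join(self, media_type: str) -> str:
        """Start negotiating the given media type; returns a label for the demo."""


class KurentoBridge(MediaBridge):
    def join(self, media_type: str) -> str:
        return f"kurento:{media_type}"


class JanusBridge(MediaBridge):
    def join(self, media_type: str) -> str:
        return f"janus:{media_type}"


_BRIDGES: Dict[str, Callable[[], MediaBridge]] = {}


def register_bridge(name: str, factory: Callable[[], MediaBridge]) -> None:
    # Extensible: a new media server only needs to register a factory here.
    _BRIDGES[name] = factory


def bridge_from_config(config: dict) -> MediaBridge:
    # Configurable: the bridge is picked from a (hypothetical) settings key.
    name = config.get("mediaBridge", "kurento")
    try:
        return _BRIDGES[name]()
    except KeyError:
        raise ValueError(f"unknown media bridge: {name!r}") from None


register_bridge("kurento", KurentoBridge)
register_bridge("janus", JanusBridge)
```

With this shape, switching media servers is a configuration change, for example `bridge_from_config({"mediaBridge": "janus"})`, and the rest of the client only talks to the `MediaBridge` interface.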
3. Improving the client's feature set
This can be achieved in the following ways.
- Fix the screen sharing mirror effect, which has been a long-standing UX problem.
- Prioritize the camera quality of whoever has the floor.
- Add a webcam background/blur effect.
- Add simulcast (after a media server is chosen).
4. Single client-facing media server
This can be done through an SFU. It would mean lower maintenance overhead, a simpler architecture, better quality, and better negotiation times.
5. Work towards horizontal scaling
This will allow media servers to be scaled horizontally across separate servers.
6. FreeSwitch for dialing
FreeSwitch remains in place for dialing, but it will no longer face the client.
IMPROVING PRIVACY IN VIDEO CONFERENCING WITH BACKGROUND BLUR
- There is no support for background blur or virtual backgrounds in BigBlueButton.
- Other platforms support it, namely Jitsi, Zoom, Skype, and Microsoft Teams.
- There is demand for it.
- The open questions are where to begin and why it is needed.
How to tackle these challenges
Solving these problems is not expected to be very difficult, since other video conferencing platforms already have solutions in place. Hence, it is not necessary to reinvent anything or start from scratch.
All that needs to be done is to look at working implementations in other open-source projects like Jitsi or Volcomix, and to identify the technologies needed. TensorFlow, MediaPipe, and BodyPix are technologies worthy of consideration.
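The general idea behind a background blur is the same whichever of those technologies supplies the segmentation: a mask marks which pixels belong to the person, and the output keeps the original pixel there and a blurred pixel everywhere else. The sketch below shows that compositing step on a tiny grayscale grid in plain Python. It is an illustration under simplifying assumptions (hand-made mask, simple box blur, made-up function names), not the code used by BigBlueButton or by any of the projects mentioned.

```python
from typing import List

Image = List[List[float]]  # grayscale pixels, 0.0 to 1.0


def box_blur(image: Image, radius: int = 1) -> Image:
    """Average each pixel with its neighbours; the window is cropped at the edges."""
    height, width = len(image), len(image[0])
    blurred = []
    for y in range(height):
        row = []
        for x in range(width):
            ys = range(max(0, y - radius), min(height, y + radius + 1))
            xs = range(max(0, x - radius), min(width, x + radius + 1))
            window = [image[j][i] for j in ys for i in xs]
            row.append(sum(window) / len(window))
        blurred.append(row)
    return blurred


def blur_background(frame: Image, person_mask: Image, radius: int = 1) -> Image:
    """Keep the frame where the mask is 1 (person), use the blur where it is 0."""
    blurred = box_blur(frame, radius)
    return [
        [m * f + (1.0 - m) * b for f, b, m in zip(frame_row, blur_row, mask_row)]
        for frame_row, blur_row, mask_row in zip(frame, blurred, person_mask)
    ]
```

In a real client the mask would come from a segmentation model running on each video frame, and the blur would run on the GPU or in WebAssembly rather than in nested loops.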
How to integrate the solutions into BigBlueButton
- HTML5 Client
- TFLite Models and WebAssembly
Chart of the Flow
The chart above can be divided into two parts.
- The Create Virtual Background Service
- Start effect
However, these are not enough to show the manipulated output to the client.
Showing the result to the user
- Replace the user camera stream in the video preview.
- Save the "virtual background" state.
- Call the state using video service.
- Instantiate WebRTC Peer Connection
- Replace the camera stream during instantiation.
- Use the actions menu or the video preview component to toggle the effect.
- The feature works on multiple platforms
- The PR can be found on GitHub.
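The steps listed under "Showing the result to the user" can be sketched as a small state flow. Everything here is a hypothetical stand-in: `VideoService`, `PeerConnection`, `stream_for`, and `start_sharing` are illustrative names, and plain strings take the place of real media streams. The actual implementation lives in the HTML5 client and the PR mentioned above.

```python
from dataclasses import dataclass


@dataclass
class VideoService:
    """Hypothetical stand-in for the client's video service; holds the saved state."""
    virtual_background_enabled: bool = False

    def toggle_virtual_background(self) -> bool:
        # Called from the actions menu or the video preview component.
        self.virtual_background_enabled = not self.virtual_background_enabled
        return self.virtual_background_enabled


def stream_for(service: VideoService, camera_stream: str) -> str:
    # Used by the preview: show the processed stream when the effect is on.
    if service.virtual_background_enabled:
        return f"virtual-background({camera_stream})"
    return camera_stream


@dataclass
class PeerConnection:
    """Hypothetical stand-in for a WebRTC peer connection."""
    outgoing_stream: str


def start_sharing(service: VideoService, camera_stream: str) -> PeerConnection:
    # The saved state is read when the connection is instantiated, so the
    # processed stream replaces the raw camera stream before anything is sent.
    return PeerConnection(outgoing_stream=stream_for(service, camera_stream))
```

The point of saving the state in one place is that the preview and the peer connection both read it, so what the user sees in the preview is what other participants receive.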
This post is a general review of how BBB works and how to improve it.
You can also learn more in this video, originally from BigBlueButton.