Real Internet on an RTOS

Zephyr IoT

Our engineers here at Endian Technologies AB have decades of experience with real-time embedded systems. During the past couple of years we have become experts on Zephyr, a real-time operating system for secure IoT devices. In this article I’d like to highlight a major improvement in its networking drivers and what it means for our future projects.

Say you’re building an IoT device. How does your device get on the Internet? What technology you choose depends on your application. Maybe your application is coupled with a gateway and you can use 6LoWPAN via BLE. Perhaps you even need cabled Internet and you use an Ethernet controller.

These technologies are good, but they are not always right. Sometimes you need cellular or Wi-Fi. Suppose you decide to use a cellular modem with LTE. If your project uses a general purpose OS like Debian then things are pretty easy to get working.

What if your device is so small that it can’t run a general purpose OS? Generally this has meant that your project would be a second-class citizen when it comes to Internet access. You have had to rely on vendor-specific AT commands.

AT vendor commands for networking

To be very concrete, let us say your project is using a Nordic nRF52840 SoC and a cellular NB-IoT modem. Your firmware uses Zephyr and runs on the SoC, which communicates with the modem through a UART. How do you get on the Internet with this combination?

Let’s say you’re pretty green to this cellular modem thing. You would open up the documentation for your modem and find three things within easy reach:

  • Configure an APN (AT+CGDCONT) for the data connection
  • Vendor-specific AT commands for HTTP, FTP, etc
  • Vendor-specific AT commands to open and use TCP and UDP sockets

You do some experiments in minicom and the commands appear to work. With this in mind you get going designing how your device will communicate with the network. You wanted to use CoAP over DTLS but it turns out the modem doesn’t support that. You need data to be encrypted on the network, so you grudgingly decide to use classic HTTP and TLS. Besides, the server guys are already familiar with it, so everyone is happy.

This appears to be a good solution and you start writing the library that will talk with the AT vendor commands for HTTP. After working way too long on this, you finally get the right contact person for your cellular operator and they tell you that you shouldn’t use TCP on NB-IoT. They don’t recommend it at all. Oops!

Your mind is looking for a way out and finds it: Zephyr has an Internet stack and you can adapt it to work with the AT vendor commands for sockets. Luckily for you, you find that Zephyr already has drivers like these, called socket offloading drivers. You decide to use DTLS with a socket offloading driver. It quickly turns out that this combination is not supported since the modem doesn’t support DTLS. But we were going to use Zephyr’s network stack? Turns out the offloading drivers recently went through a change where they hook into the network stack in such a way that Zephyr’s own DTLS/TLS support is not used. Oops!

Real Internet protocols to the rescue

Things are looking pretty desperate for you. You need encryption, but the modem doesn’t support DTLS, which your RTOS does support, but your RTOS doesn’t have a driver that supports DTLS in combination with your modem. TCP is not recommended on NB-IoT, but you’ve meanwhile found out that even if it worked it would mean that your data is sent in clear text on the modem’s UART.

Why are things not this bad on Linux? If your application was using Linux then you wouldn’t have to go anywhere near AT vendor commands. Why not? Because you would use PPP. PPP is like having running water in your house. Networking with AT vendor commands is like bringing water in buckets.

Zephyr 2.0 (September 2019) added support for PPP. Zephyr 2.2 (March 2020) added a GSM modem driver that uses PPP. Zephyr 2.3, which is not yet released, adds GSM 07.10 multiplexing. Much of this work was done by Jukka Rissanen at Intel. It is difficult to convey how significant and important this work has been.

PPP in combination with GSM 07.10 lets Zephyr use cellular modems in just the same way that Linux uses them. The GSM 07.10 protocol provides multiple virtual UARTs over a single physical UART and makes it possible to use AT commands while PPP is up and running. PPP is a full duplex serial protocol with framing and checksumming. It has control protocols (NCPs) for the serial line itself (LCP) and protocols for negotiating Internet addresses and DNS (IPCP and IPV6CP).

Here is a comparison:

  • PPP: full duplex – AT commands: half duplex
  • PPP: framing – AT commands: line-oriented ASCII protocol
  • PPP: checksums – AT commands: no checksums
  • PPP: no waits when transmitting – AT commands: wait for a transmit prompt
  • PPP: DTLS/TLS handled in the RTOS – AT commands: modem handles TLS keys
  • PPP: no application plaintext on the UART – AT commands: plaintext on the UART
  • PPP: supports all Internet protocols – AT commands: support select protocols
  • PPP: standard protocol with RFCs – AT commands: vendor-specific protocol

PPP is better in every way. There is of course some place in the world for networking via AT commands. If you’re using a PIC processor that can’t run Zephyr then they might be your only option. But if you have any chance at all to use Zephyr, then you’re better off with PPP.

I’ve built applications in the past using AT vendor commands, but those days are gone now that Zephyr has PPP support.

AT commands and TCP

Before finishing I would like to point out one particularly bad combination of protocols. Here it is: TCP over AT commands.

TCP/IP has built-in checksums, retransmissions and flow control. TCP over AT commands is just not TCP. It might be used to simulate a remote serial port, but it can’t carry any serious amount of data or communicate reliably with a real Internet server.

Bit errors and dropped bytes on serial lines are a common occurrence. With AT commands, a dropped byte on the UART results in a dropped byte on the TCP connection. No applications are written to work when a byte is lost on a TCP connection. When using a real network stack those types of errors are corrected by the network layer before they reach the application. But with AT commands there is no chance to correct the error. By the time the byte has gone missing, the modem has already sent an acknowledgement to the server and there is no way to correct the error.

TLS over such a lossy TCP connection is just not viable. Any error on the UART results in a fatal error that breaks the TLS connection. There is some theoretical hope for DTLS over UDP over AT commands, which would work because DTLS does its own checksumming and handles lost packets. But TCP over AT? Don’t even bother trying.

Summary

In a contest between how we used to do it (AT commands) with how we’re doing it now (PPP and GSM 07.10), the new way wins every time. Where we’ve been using PPP, the amount of network problems experienced during development have not just diminished by some factor; they have completely disappeared.


If you find this article interesting you can find other posts by Endian here.