The State of MAVLink in 2021
It’s about the end of the year, a perfect time for a ranting post on the state of certain open-source projects.
MAVLink has been the standard protocol for drone communication and control for a long time now. It’s been updated and extended significantly for over a decade, and is currently in use by both the PX4 and Ardupilot open-source autopilot projects, as well as others. I want to make it clear that this post is not an attack or insult on MAVLink at all; it overall does its job well.
That being said, this post wouldn’t be here if there weren’t major issues surrounding the project (at least in my view). This post aims to summarize the current state of the protocol, pain points, and issues that I have run into while using the protocol.
This is a disclaimer that the contents of this article are based entirely on my own understanding and experience with MAVLink, and come with no guarantee of accuracy. With that out of the way, here is the actual content.
How the MAVLink Protocol Works
In order to understand some of the issues with MAVLink, we should first describe how the MAVLink protocol works.
A MAVLink network is composed of systems, which represent physical systems such as drones, ground stations, etc. Each device in a system is identified with a component ID, which has standard values describing the type of the component.
Each device will give itself a system and component ID, and publish heartbeats on the network (as in most protocols) to let other devices know it’s there. This heartbeat contains the type of the device (IMPORTANT FOR LATER!), autopilot info if applicable, and some other information.
Devices can publish messages (e.g. for telemetry). Messages have fixed message IDs, and other devices can listen for a certain message ID to subscribe to the message. Other devices can also read the system and component ID from the message to figure out where the message came from.
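As a sketch of that publish-subscribe pattern, here is a toy dispatcher (this is not pymavlink or any real library; the only real protocol value below is ATTITUDE's message ID):

```python
from collections import defaultdict

class MavlinkBus:
    """Toy pub-sub dispatcher keyed on message ID (illustrative only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, msg_id, callback):
        # Devices "subscribe" by listening for a fixed message ID.
        self._subscribers[msg_id].append(callback)

    def publish(self, sysid, compid, msg_id, payload):
        # Every message carries its sender's system and component ID,
        # so subscribers can tell where the message came from.
        for callback in self._subscribers[msg_id]:
            callback(sysid, compid, payload)

bus = MavlinkBus()
seen = []
ATTITUDE_MSG_ID = 30  # real MAVLink message ID for ATTITUDE telemetry
bus.subscribe(ATTITUDE_MSG_ID, lambda s, c, p: seen.append((s, c, p)))
bus.publish(1, 1, ATTITUDE_MSG_ID, {"roll": 0.1})  # drone (1, 1) publishes
```

The point of the sketch is the addressing: subscribers filter on the message ID, and the (system ID, component ID) pair rides along with every message.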
Although MAVLink is a publish-subscribe protocol, it is capable of emulating the traditional service-client (1 server, 1 caller) architecture. The most common example of this is the Command Protocol, which is used to execute commands such as arming, takeoff etc.
There is a whole collection of microservices supported by MAVLink, which involve exchanging a series of messages to support services such as writing parameters to a vehicle.
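The command protocol's request/acknowledge shape can be sketched as follows; the MAV_CMD and MAV_RESULT constants are real spec values, while the Autopilot class is purely illustrative (retries and timeouts omitted):

```python
# Real MAVLink spec constants:
MAV_CMD_COMPONENT_ARM_DISARM = 400
MAV_RESULT_ACCEPTED = 0
MAV_RESULT_DENIED = 2

class Autopilot:
    """Toy 'service' side of the command protocol."""

    def __init__(self):
        self.armed = False

    def handle_command_long(self, command, param1):
        # Execute the command, then reply with a COMMAND_ACK-style result.
        if command == MAV_CMD_COMPONENT_ARM_DISARM:
            self.armed = bool(param1)
            return {"command": command, "result": MAV_RESULT_ACCEPTED}
        return {"command": command, "result": MAV_RESULT_DENIED}

ap = Autopilot()
ack = ap.handle_command_long(MAV_CMD_COMPONENT_ARM_DISARM, 1)  # arm request
```

In the real protocol the caller also retries the COMMAND_LONG until an ack arrives, which is what makes the pub-sub transport behave like a service call.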
An Addendum on Routing
Routing is perhaps one of the more complicated parts of MAVLink, and given that even the standard protocol libraries don’t seem to implement it properly, it deserves a section here.
All devices on a MAVLink system have a system ID and a component ID. The system ID identifies a complete system, while the component ID identifies a component on the system; so for example, a drone’s autopilot might be (sysid: 1, compid: 1), while that drone’s gimbal might be (sysid: 1, compid: 154).
Realistically, the only commonly used compids are 1 (autopilot), 190 (ground control), and, more rarely, 154 (gimbal).
One important issue to note is that one connection != one system. Multiple drones/systems can exist over one UDP/serial/whatever connection. Strangely, all of the existing common libraries assume that there is only one drone per connection, something that will be ranted about down below.
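A library that respected this would demultiplex one connection into many devices, keyed on the (system ID, component ID) pair; a minimal sketch of the idea:

```python
class ConnectionDemux:
    """Toy demultiplexer: one connection, many MAVLink systems."""

    def __init__(self):
        self.devices = {}  # (sysid, compid) -> list of messages seen

    def on_message(self, sysid, compid, msg):
        # Track every distinct device instead of assuming one per connection.
        self.devices.setdefault((sysid, compid), []).append(msg)

demux = ConnectionDemux()
# Two drones and a ground station, all arriving on the same UDP port:
demux.on_message(1, 1, "HEARTBEAT")
demux.on_message(2, 1, "HEARTBEAT")
demux.on_message(255, 190, "HEARTBEAT")
```

Nothing about this is hard; it is simply a design decision that most of the libraries below never made.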
Speaking of rants:
The Ranting Section
Now that you have some background on the protocol, let’s talk about what’s wrong with it and the library ecosystem surrounding it.
Just Routing, like, the whole thing
It is very rare that you will find a system that uses two IDs to identify devices instead of one. That said, this is excusable: we want to know which system a device is part of, and encoding that into every message is a reliable way of doing so.
That’s a small nitpick, however. The more pressing issue is that none of the current largest MAVLink libraries implement the routing protocol properly. Technically, the raw C header library does, but it literally only sends and receives messages, so we can’t count it. Here’s a rundown:
- Pymavlink, the simple reference Python library, assigns a master.target_component based on the first heartbeat it reads. That said, you can also manually feed in your own system and component IDs, so it technically works, but like the C header library it is very low level.
- Dronekit still has not managed to implement it in master. The issue seems stale as of last year.
- MAVSDK technically has multi-drone support in C++ but, like DroneKit, it is one-system-one-connection. Its bindings require a (simple) workaround for multi-drone support — set the mavsdk_server port to different ports, and it’ll start separate servers, allowing multi-drone control; but again, it is still one system per connection. Its heavily abstracted design removes the concept of system and component IDs for the user for the most part.
MAVLink also follows the practice of hard-fixing meanings to component IDs — like having gimbals be 154, etc. Technically, this makes it easier to deconflict IDs, but it comes with a number of disadvantages:
- We’re assigning semantic meaning to the IDs, a role that’s already fulfilled by MAV_TYPE, and a decision that limits flexibility.
- Having two sources of component type information causes confusion. Some libraries may check the component ID instead of the MAV_TYPE, which is incorrect.
- Multiple of the same device will have the same default component ID and require deconfliction anyway.
Node deconfliction is a somewhat complex topic, but there are a number of easy ways in which MAVLink could implement it; a crude method is to simply listen for other nodes with the same ID for a period, and reassign.
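That crude method could look something like this sketch (the listen loop and timing are omitted, and deconflict_sysid is a hypothetical helper, not part of any MAVLink library):

```python
import random

def deconflict_sysid(candidate, heard_sysids, id_range=range(1, 250)):
    """Crude ID deconfliction: if another node already announces our
    candidate system ID, reassign to a free one (illustrative only)."""
    if candidate not in heard_sysids:
        return candidate  # no clash heard during the listen period
    free = [i for i in id_range if i not in heard_sysids]
    return random.choice(free)

# System IDs heard in heartbeats during the listen period:
heard = {1, 2, 255}
new_id = deconflict_sysid(1, heard)  # 1 is taken, so we get a fresh ID
```

A real implementation would also need to re-listen after reassigning, since two nodes could pick the same replacement, but the basic shape is this simple.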
Basically, the TL;DR of routing is that libraries don’t implement it properly, and it is not super flexible. The protocol itself is acceptable, but library support is meh.
Nobody Implements the Standard Consistently
MAVLink is a standard. That means the industry-standard open-source autopilots should implement it consistently, but we’ve seen that they often don’t.
For instance, take the simple MAV_CMD_NAV_TAKEOFF command, whose last parameter specifies the takeoff altitude. That altitude parameter is completely ignored by PX4, which instead only pays attention to the MIS_TAKEOFF_ALT parameter. Ardupilot, on the other hand, implements the altitude parameter.
Where is this documented? Nowhere, unless you dig through the MAVSDK implementation of takeoff.
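In practice, a portable client ends up special-casing the autopilot, something like this sketch (set_param and send_takeoff are hypothetical stand-ins for the real parameter and command messages; the MAV_AUTOPILOT constants are real spec values):

```python
# Real MAVLink MAV_AUTOPILOT enum values:
MAV_AUTOPILOT_ARDUPILOTMEGA = 3
MAV_AUTOPILOT_PX4 = 12

def takeoff(autopilot, altitude_m, set_param, send_takeoff):
    """Hypothetical portable takeoff helper papering over the divergence."""
    if autopilot == MAV_AUTOPILOT_PX4:
        # PX4 ignores the command's altitude field and reads a parameter.
        set_param("MIS_TAKEOFF_ALT", altitude_m)
        send_takeoff(altitude=0)
    else:
        # Ardupilot honours the altitude field directly.
        send_takeoff(altitude=altitude_m)

# Record what would be sent instead of using a real transport:
calls = []
takeoff(MAV_AUTOPILOT_PX4, 20.0,
        set_param=lambda name, value: calls.append(("param", name, value)),
        send_takeoff=lambda altitude: calls.append(("takeoff", altitude)))
```

This kind of per-autopilot branching is exactly what a standard is supposed to make unnecessary.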
Ardupilot and PX4 also implement different sets of modes, even though they are functionally the same thing. The standard MAV_CMD_DO_SET_MODE command takes, as its first parameter, a standard enum of modes (MAV_MODE).
PX4 and Ardupilot promptly ignore this first field entirely and instead implement custom modes that do the same things (Mission, Guided/Offboard, etc.). While custom modes are useful, it would benefit the community to have a standard set of modes for vehicles, instead of completely separate mode sets that largely share the same features.
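The result is that every portable client carries a per-autopilot mode table. A sketch of what that looks like, using commonly documented ArduCopter and PX4 values that should be treated as illustrative rather than authoritative:

```python
# ArduCopter uses flat custom_mode numbers; PX4 packs a main mode and a
# sub mode into separate bytes of custom_mode. Values are illustrative.
ARDUCOPTER_MODES = {"mission": 3, "guided": 4}  # AUTO, GUIDED
PX4_MAIN_AUTO, PX4_SUB_MISSION = 4, 4
PX4_MAIN_OFFBOARD = 6

def custom_mode_for(autopilot, mode):
    """Map a vehicle-agnostic mode name to an autopilot's custom_mode."""
    if autopilot == "arducopter":
        return ARDUCOPTER_MODES[mode]
    if autopilot == "px4":
        if mode == "mission":
            return (PX4_MAIN_AUTO << 16) | (PX4_SUB_MISSION << 24)
        if mode == "guided":  # PX4's closest equivalent is Offboard
            return PX4_MAIN_OFFBOARD << 16
    raise ValueError(f"no mapping for {autopilot}/{mode}")
```

If the standard MAV_MODE enum were actually honoured, none of this table would need to exist.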
Microservices
MAVLink has a large collection of microservices that implement certain functionality, such as sending mission descriptions, writing parameters, etc.
There are issues with some of the microservice protocols. Most of them boil down to two points:
- Scope creep
- Inconsistent or improper implementations
Camera Protocol
The MAVLink Camera Protocol is a protocol for generally exposing camera streams and image servers. It’s…okay, in that it fulfills the job of exposing the stream/image server, but it contains a lot of redundant information. Take, for example, the CAMERA_INFORMATION message, which reports vendor, model, and resolution details. This doesn’t seem terrible, except that all of that information is also encoded in the camera definition file. VIDEO_STREAM_INFORMATION then carries similar data, except that resolution, bitrate, etc. are usually all available via the RTSP/RTP/whatever stream itself.
The camera protocol is honestly not that bad of an offender; it is just rather complex: it requires a companion computer to function correctly, and preferably a dedicated radio (MAVLink FTP is very, very slow). There’s a reference implementation, but it is archived.
My major gripe with this protocol is that it depends on an external HTTP/FTP server + companion computer for the definition and stream anyway. MAVLink is clearly not the optimal protocol for this application; it would likely be better to standardize an HTTP-based API for requesting camera information, and use MAVLink only to point ground control software at the server.
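Such an API could be as simple as a JSON document served over HTTP, with MAVLink carrying only the URL. Everything below (field names, addresses, the endpoint itself) is a made-up illustration of the idea, not an existing standard:

```python
import json

def camera_info_document():
    """Build the JSON a hypothetical camera endpoint would serve,
    replacing CAMERA_INFORMATION / VIDEO_STREAM_INFORMATION traffic."""
    return json.dumps({
        "vendor": "ExampleCam",
        "model": "EC-1",
        "streams": [
            {"uri": "rtsp://192.168.1.10:8554/main", "type": "rtsp"},
        ],
        "definition_uri": "http://192.168.1.10/camera_definition.xml",
    })

# Ground control would GET this from e.g. http://192.168.1.10/api/camera
# (hypothetical path) after a MAVLink message pointed it at the server:
doc = json.loads(camera_info_document())
```

Chunking the same data over a telemetry radio via MAVLink FTP is the part this design would eliminate.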
Gimbal Protocol (v1 and v2)
The Gimbal Protocol was updated with a v2 a while back to address issues with the v1 protocol, mainly its ambiguous message set and performance issues. V2 made major changes, the most important of which is an explicit separation between the gimbal device and the gimbal manager.
The separation is a good idea; v1 struggled because both the autopilot and the ground station would often try to control the gimbal directly, causing conflicts. The gimbal manager aims to solve this; its primary job is to implement higher-level functionality and deconflict control.
It deconflicts control by assigning a primary and a secondary controller, each identified by a system and component ID; how their inputs are mixed is left undefined, which is a minor issue. The behavior of primary and secondary control is also not defined to be consistent across platforms, which could lead to portability issues, although this is solvable.
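One possible arbitration scheme, sketched here purely as an illustration of the behavior the spec leaves open:

```python
class GimbalManager:
    """Toy gimbal-manager arbitration: the primary controller always wins;
    the secondary is accepted only while the primary has not taken over.
    This exact policy is an assumption, not mandated by the spec."""

    def __init__(self, primary, secondary=None):
        self.primary = primary        # (sysid, compid) of primary controller
        self.secondary = secondary    # (sysid, compid) of secondary, if any
        self.setpoint = None
        self.setpoint_owner = None

    def set_attitude(self, sender, setpoint):
        if sender == self.primary:
            self.setpoint, self.setpoint_owner = setpoint, sender
            return True
        if sender == self.secondary and self.setpoint_owner != self.primary:
            self.setpoint, self.setpoint_owner = setpoint, sender
            return True
        return False  # everyone else is ignored

# Ground station as primary, autopilot as secondary:
gm = GimbalManager(primary=(255, 190), secondary=(1, 1))
```

Because platforms are free to pick a different policy than this one, two vehicles can respond differently to the same pair of controllers, which is the portability problem described above.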
There are also a large number of manager commands, some of which will not be supported by all systems. While GIMBAL_MANAGER_INFORMATION contains some facilities for reporting capabilities, I think more fine-grained capability reporting (e.g. can track ROI but not WPNEXT) would be useful.
There is also a somewhat confusing gimbal-manager-to-device relationship. The protocol calls for one gimbal manager per gimbal device; messages such as GIMBAL_MANAGER_INFORMATION use a gimbal device ID directly rather than calling it a manager ID, so the wording is confusing.
Additionally, gimbal managers have no component ID of their own. They are instead attached to another device (autopilot, gimbal device, etc.), but which device they are attached to is generally unclear.
It would be much clearer if gimbal managers were considered their own devices; it would simplify addressing gimbal managers (no need for a separate gimbal device ID in the control message!) and clear up confusion.
Better yet, remove the gimbal device from the MAVLink network entirely. It seems to me that moving the gimbal device out of the main MAVLink network and isolating it behind the gimbal manager would greatly reduce confusion; the gimbal manager becomes the representation of the gimbal on the main network. It also opens up fun possibilities such as chaining gimbal managers: a low-level gimbal manager implemented on a physical gimbal that can only control angles could be chained into an autopilot or companion-computer gimbal manager, allowing less advanced hardware to take advantage of autopilot sensors and systems.
Conclusion
MAVLink in 2021 is a fairly usable protocol that suffers from some incorrectly implemented libraries and feature creep. The core protocol does a great job of controlling drones; auxiliary protocols such as the gimbal and camera protocols may benefit from moving some functionality outside of MAVLink. Multi-drone control and correct routing are issues that libraries sorely need to address.
Here’s to hoping that 2022 will bring further improvements to the protocol and the ecosystem surrounding it.