Aqara E1: Writing new firmware

This is the fourth part of a five-part post:

Aqara E1 Switch Firmware Project

Sanity check - Blinky

For an initial sanity check, I modified the NXP SDK’s blinky example for the 4 LEDs of the Aqara QBKG41LM (DIO lines identified here) to make sure that the device would accept and run new firmware.

const uint8_t NUM_LED = 4;
const uint32_t LED_LEFT_RED_PIN = 14;
const uint32_t LED_LEFT_BLUE_PIN = 13;
const uint32_t LED_RIGHT_RED_PIN = 7;
const uint32_t LED_RIGHT_BLUE_PIN = 6;

uint32_t pins[] = {
    LED_LEFT_RED_PIN,
    LED_RIGHT_RED_PIN,
    LED_LEFT_BLUE_PIN,
    LED_RIGHT_BLUE_PIN,
};

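/* Initialise each LED and make sure it starts switched off. */
for (int i = 0; i < NUM_LED; i++) {
    LedInit(pins[i], 0);
    LedOff(pins[i]);
}

/* Blink each LED in turn: LED i flashes (i + 1) times, so each one can be told apart. */
while (1)
{
    for (int i = 0; i < NUM_LED; i++) {
        for (int j = 0; j < (i+1); j++) {
            LedOn(pins[i]);
            SysTick_DelayTicks(150U);
            LedOff(pins[i]);
            SysTick_DelayTicks(150U);
        }
        SysTick_DelayTicks(500U);   /* pause between LEDs */
    }
    SysTick_DelayTicks(1000U);      /* pause before repeating the cycle */
}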

Success!!!

NXP Zigbee SDK - first impressions

Tolerable…in the context of MCU vendor provided SDKs

The SDK is tolerable, but contains a lot of what one might term “gotchas”: problems you would not expect and can only learn about by developing with the samples, running into them, and then tracking down the cause by inspecting the source. The initial period was painful, but once I became familiar with how things were structured, progress became easier, more so than with the TI ZStack SDK.

On the plus side, everything is single threaded and runs in the same context, which simplifies debugging. On the other hand, a good chunk of the code is precompiled (including on-chip ROM) and comes without source. This is problematic when misconfiguration means that incoming messages never make it into the application or the accessible parts of the SDK, and there is no way to debug what is happening, or even determine that a message was received.

Not a tutorial

This article is not intended to provide a deep discussion of the SDK or development process. I wrote a separate article providing an introduction to development with the NXP Zigbee SDK here:

> Getting started with the NXP Zigbee SDK

Instead I’ll just highlight some challenging, novel, or particularly annoying aspects.

Where to start for the main firmware?

With a sample. The SDK includes the following two that seemed suitable:

JN-AN-1243
Base Device Behavior implementation for coordinator / router / end device.

JN-AN-1257
Window covering (i.e. Zigbee closure). This example shows step-by-step how to take JN-AN-1243 and add the functionality for a window covering endpoint.

I decided to base the application on JN-AN-1243, and add functionality from there, using JN-AN-1257 as a reference to understand the changes needed as endpoints and clusters are added.

These sample applications are available in the SDK package, which can be downloaded from the NXP site.

Implementing features

Momentary and toggle modes

The majority of switching requires a toggle action, where the target device is switched on and off alternately with each press. For some applications though, a momentary action can be desirable, such as holding down a button to raise a blind.

I implemented independent toggle/momentary settings for each tap combination. So, for example, 1-tap could toggle the local load, 2-taps could open a blind while held, and 3-taps could close a blind while held.

Toggle switch behavior

Momentary switch behavior

Multi-tap and annoying delays

One annoyance with some devices that implement multi-tap is the delay between tapping the button and having the light turn on/command sent etc. The reason for this is clear - the device can’t know if you intend to continue tapping the button and so must wait several hundred milliseconds to see if you do.

You may have a two-button switch and only intend to use single-tap on the left button and single/double-tap on the right. In this scenario, it makes no sense for the device not to respond immediately to the first tap on the left button. Similarly the device shouldn’t wait for a third tap on the right button.

To deal with this, I implemented independent max_taps settings for each button. I also incorporated configurable max_time_between_taps, since some people are happy to hammer the button, while others may prefer a more leisurely tap sequence.
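A minimal sketch of how these two settings could decide when a tap sequence is finished. The type and function names are hypothetical and not taken from the firmware, and the sketch deliberately ignores the menu sequences discussed in the next section.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-button settings; field names mirror the settings described above. */
typedef struct {
    uint8_t  max_taps;                  /* highest tap count this button uses */
    uint16_t max_time_between_taps_ms;  /* how long to wait for another tap */
} button_cfg_t;

/* Returns true when the tap sequence can be acted on.
 *   taps:         taps counted so far in this sequence (>= 1)
 *   ms_since_tap: time elapsed since the most recent tap */
static bool tap_sequence_done(const button_cfg_t *cfg, uint8_t taps,
                              uint16_t ms_since_tap)
{
    if (taps >= cfg->max_taps) {
        return true;  /* no further tap is meaningful: act immediately */
    }
    /* Otherwise wait to see whether another tap arrives in time. */
    return ms_since_tap >= cfg->max_time_between_taps_ms;
}
```

With `max_taps = 1` the first branch is always taken, so the button responds without any delay; with `max_taps = 2` a single tap is only acted on once the timeout has passed.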

Device menus complicate the situation

I didn’t want to rely only on Zigbee2MQTT (i.e. writing ZCL attributes over the network) for configuration. For some settings (e.g. max_time_between_taps), this is unavoidable, but for others, like max_taps, I wanted the ability to configure using the device buttons. Similarly, putting the device into the various possible binding modes should be possible locally.

To accomplish this, I created a menu system, whereby tapping the button n-times would enter the menu, which could then be navigated using taps and long presses. There are two menus: binding and device configuration, which require 4 and 5 consecutive taps to enter respectively. Additionally, there are two reset functions, which require 6 or 7-taps, with the final one being held.
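The tap counts above could be mapped to actions roughly as follows. This is only a sketch: the enum and function names are hypothetical, and it assumes (the post does not say) that the menu sequences require the final tap to be released rather than held.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    SEQ_NONE,          /* not a menu or reset sequence */
    SEQ_BINDING_MENU,  /* 4 taps */
    SEQ_CONFIG_MENU,   /* 5 taps */
    SEQ_RESET_6,       /* 6 taps, final tap held */
    SEQ_RESET_7        /* 7 taps, final tap held */
} special_seq_t;

/* Classify a completed tap sequence by its length and whether the
 * final tap was held down. */
static special_seq_t classify_sequence(uint8_t taps, bool last_tap_held)
{
    switch (taps) {
    case 4:  return last_tap_held ? SEQ_NONE : SEQ_BINDING_MENU;
    case 5:  return last_tap_held ? SEQ_NONE : SEQ_CONFIG_MENU;
    case 6:  return last_tap_held ? SEQ_RESET_6 : SEQ_NONE;
    case 7:  return last_tap_held ? SEQ_RESET_7 : SEQ_NONE;
    default: return SEQ_NONE;
    }
}
```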

This complicates the ability to remove the annoying delays mentioned above. The button may only ever be used to toggle the local load (i.e. max_taps = 1) and so it responds immediately on button down of the first tap, switching the load, but the device still needs to keep waiting to see if this is a 4/5/6/7-tap menu or reset combination.

This isn’t a problem in itself, but what should the device do when a second tap comes 300ms later? Is this the second tap in a 5-tap sequence to enter the configuration menu? Or is this the user wanting to turn off the light that they just turned on by mistake? We have two choices here:

Toggle the output on every tap, which means switching the light on/off five times in quick succession even though the user is just trying to enter the menu.
Toggle the output on the first tap only - i.e. wait max_time_between_taps before allowing a subsequent tap to switch the load.

The downside of this second approach is that after turning the light on/off, you need to wait max_time_between_taps before you can toggle it again. The downside of the first option is obvious.

I initially chose the second option, but was frustrated by exactly the problem described. I found myself sometimes tapping the wrong button on a 2/3-button switch, and getting annoyed when an immediate subsequent tap did nothing. It felt sluggish. I then modified the code to use the first option, but only in situations where max_taps == 1. It feels like the correct approach given the infrequent use of menus/reset once everything is set up.
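The rule that was finally chosen can be sketched as a single decision made on each button-down event. The function name and its parameters are hypothetical; the real firmware may structure this quite differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decide whether a button-down event should switch the local load right away.
 *   max_taps:  the button's configured max_taps
 *   tap_index: 1 for the first tap of the current sequence, 2 for the second, ... */
static bool press_toggles_load(uint8_t max_taps, uint8_t tap_index)
{
    if (max_taps != 1) {
        /* Multi-tap buttons cannot act on press: the tap count is not known yet. */
        return false;
    }
    /* First option: every tap toggles, even while a menu sequence is being counted.
     * The second option would instead return (tap_index == 1). */
    (void)tap_index;
    return true;
}
```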

Squeezing multi-tap functionality into standard Zigbee binding

Two core features I wanted to include were:

Multi-tap.
The ability to do direct binding.

By default, the ZCL is not well set up to handle this combination. The on/off cluster, which is what is used by (non-dimming) lights and switches, has no concept of multi-tap. The only primary commands are on, off, and toggle.

Many devices that incorporate multi-tap update a cluster attribute with the latest event, report these changes to Zigbee2MQTT or HA, and rely on that software to trigger actions.

This is fine for setups with an always-on hub like HA, but I prefer to bind things directly wherever possible.

The binding process

When devices are bound, two specific clusters are connected: one on a specific endpoint of each device. The clusters must be identical. So a light bulb that supports the On/off cluster as a server on one of its endpoints can only be bound with a device that has the On/off cluster as a client on one of its endpoints.

To differentiate between single, double, and triple taps, we need to use three different endpoints. To external devices, it will appear that there are three separate switches for each physical switch.

A lot of application endpoints are needed

EP	Description	Main clusters	Other clusters
Application endpoints
1/2/3	Relay 1/2/3	On/off (server)	Basic/Identify/Groups/Scenes
10/11/12	Button 1/2/3 (single-tap)	On/off (client), Window Covering (client)	Basic/Identify/Groups/Scenes, On/off switch config.*, Multistate input
20/21/22	Button 1/2/3 (double-tap)	On/off (client)	Basic/Identify/Groups/Scenes
30/31/32	Button 1/2/3 (triple-tap)	On/off (client)	Basic/Identify/Groups/Scenes
100	OTA / Config	OTA Update (client), Custom config.
Base endpoints
242	Green power
0	ZDO

* Custom attributes
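The endpoint numbering in the table follows a simple pattern, which can be written as a small helper. The function name is hypothetical; only the endpoint numbers come from the table.

```c
#include <stdint.h>

/* Returns the button endpoint for a 1-based button index (1..3) and a tap
 * count (1..3), following the table: 10/11/12, 20/21/22, 30/31/32.
 * Returns 0 for out-of-range input; endpoint 0 is the ZDO, never a button. */
static uint8_t button_endpoint(uint8_t button, uint8_t taps)
{
    if (button < 1 || button > 3 || taps < 1 || taps > 3) {
        return 0;
    }
    return (uint8_t)(taps * 10 + (button - 1));
}
```

For example, a double-tap on the second button maps to endpoint 21, so a bulb bound to endpoint 21 reacts only to that combination.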

More flexible remote control through scripts

Binding is useful for simple switch -> light bulb type setups, but more flexibility is needed to control unrelated devices and to choose the specific ZCL commands that are sent. This is where software like Zigbee2MQTT and HA typically comes in.

I wanted to take the firmware a bit further in that direction, and settled on allowing groups of preset ZCL commands to be sent on each switch/tap combination.

Setup is via Zigbee2MQTT, where for each single, double, or triple tap, one can enter a set of semicolon-delimited commands to be sent when that combination is tapped.

Calling them scripts is a bit of a stretch. There’s no program flow. They are just sequential commands. They are stored as string attributes in the on/off switch configuration cluster in each button endpoint.
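Since a script is just a semicolon-delimited list, executing it amounts to walking the string and handing each command to a handler in order. The sketch below shows only that splitting step; the names are hypothetical, the command syntax itself is not modeled, and the 64-byte limit is the one given later in this post.

```c
#include <stddef.h>
#include <string.h>

#define SCRIPT_MAX_LEN 64  /* current script length limit, in bytes */

/* Hypothetical handler type: receives one command (not NUL-terminated). */
typedef void (*command_fn)(const char *cmd, size_t len, void *ctx);

/* Calls on_cmd once for each non-empty command, in order.
 * Returns the number of commands found, or -1 if the script is too long. */
static int for_each_command(const char *script, command_fn on_cmd, void *ctx)
{
    if (strlen(script) > SCRIPT_MAX_LEN) {
        return -1;
    }
    int count = 0;
    const char *start = script;
    for (;;) {
        const char *end = strchr(start, ';');
        size_t len = end ? (size_t)(end - start) : strlen(start);
        if (len > 0) {            /* skip empty entries such as ";;" */
            if (on_cmd) {
                on_cmd(start, len, ctx);
            }
            count++;
        }
        if (!end) {
            break;                /* reached the end of the string */
        }
        start = end + 1;          /* continue after the delimiter */
    }
    return count;
}
```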

I implemented the commands I most often used, and also a generic command to send any command/payload combination.

The full command list and syntax can be found in the firmware instructions (coming soon).

Fragmented messages

I had initially set the maximum script length at 128 bytes, but this necessitated the use of fragmented messages to read/write them. This is because the maximum packet size on a Zigbee network is 127 bytes - a restriction of the 802.15.4 physical layer. Note that 127 bytes is the maximum packet size, not the payload size. By the time you add in the various headers for addressing, security, ZCL etc., you are left with about 65-70 bytes for the actual payload.

In principle fragmenting shouldn’t be a problem. The NXP stack can be configured to do it, but it proved relatively unreliable when I tried, and the higher level functions of the stack (report attribute etc.) are not set up for it and would need to be modified.

I’ll likely make the required changes and increase the limit at some point, but 64-bytes is sufficient for my current needs.

Reboot, factory reset, and emergency reset

Network reset versus factory reset

In the sample applications, and on other devices I’ve seen, leaving the network (either self-initiated or via a command from the coordinator) typically resets all persistent settings.

With so many configurable parameters, including script commands, this didn’t seem like the right approach here. The Zigbee spec only requires the Zigbee-related persistent data to be cleared, although it does sort of imply that the configuration should also be reset:

…and perform a reset such that the node is in much the same state as it was when it left the factory.

Zigbee Base Device Behavior Specification

There’s a lot of room in “much the same”.

I decided to keep the local configuration on a network leave/reset and only clear it during a locally initiated factory reset.

The local configuration is the group of settings configurable in the Zigbee2MQTT exposes tab, or via the device configuration menu (max taps, button timing, LED color etc.). All network settings such as the routing table, binding table etc. are cleared in all scenarios.

Factory reset

Factory reset is executed by tapping the left-most (or only) button 7 times, but keeping it held on the seventh tap. The device then enters a reset countdown, with the device LEDs all flashing red. The frequency of flashing increases over 10 seconds until it is solid and the device performs a factory reset. Releasing the button at any time prior to the reset aborts it. All persistent settings are removed and the configuration is restored to its default state.

Backup emergency factory reset method

I was concerned that an unforeseen bug could result in some specific state preventing the device resetting via the 7-tap sequence described above. In such a scenario, it may be impossible to recover without ISP access. To counter this possibility, I implemented a reset-at-power-up-via-button-hold function. This is a full factory reset identical to the 7-tap factory reset.

The code for this doesn’t rely on the button state machine, interrupts etc., and runs very early in the bootup process, with a simple GPIO check to see if the button is held and the hold timer using a busy spin rather than the systick interrupt.
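A boot-time check of this kind might look something like the sketch below. It is not the firmware's code: the function-pointer parameters stand in for a raw GPIO read and a busy-spin delay, and the hold duration is left to the caller because the post does not state it. The `sim_` helpers are host-side stand-ins so the logic can be tried off-target.

```c
#include <stdbool.h>
#include <stdint.h>

typedef bool (*read_button_fn)(void);        /* true while the button is pressed */
typedef void (*busy_wait_ms_fn)(uint32_t);   /* spins for roughly the given time */

/* Returns true only if the button stays pressed for the whole of hold_ms.
 * Uses polling and a busy wait only: no interrupts, timers or state machine. */
static bool button_held_at_boot(read_button_fn read_button,
                                busy_wait_ms_fn busy_wait_ms,
                                uint32_t hold_ms, uint32_t poll_ms)
{
    if (poll_ms == 0) {
        return false;              /* avoid an endless loop on bad input */
    }
    uint32_t elapsed = 0;
    while (elapsed < hold_ms) {
        if (!read_button()) {
            return false;          /* released early: carry on with normal boot */
        }
        busy_wait_ms(poll_ms);
        elapsed += poll_ms;
    }
    return read_button();
}

/* Host-side stand-ins (hypothetical): simulated clock and button release time. */
static uint32_t sim_time_ms = 0;
static uint32_t sim_release_at_ms = 0;
static bool sim_read_button(void) { return sim_time_ms < sim_release_at_ms; }
static void sim_busy_wait_ms(uint32_t ms) { sim_time_ms += ms; }
```

Keeping the check free of interrupts and shared state is the point of the design: it should still work even if a bug has corrupted whatever the normal button handling depends on.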

Simple reboot

The device only searches for a new network to join for 30 seconds on startup. In situations where you wish to restart the search, but not factory reset the device (i.e. wish to keep prior settings), this reboot is available. The process is the same as the 7-tap factory reset, except the button is held on the 6th tap, not the 7th, and the LEDs flash blue.
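The difference between the three operations comes down to which persistent data each one clears. A small sketch of that mapping, with hypothetical names, and assuming that a simple reboot leaves all persistent data untouched:

```c
#include <stdbool.h>

typedef enum {
    RESET_REBOOT,         /* 6 taps, last one held */
    RESET_NETWORK_LEAVE,  /* self-initiated or requested by the coordinator */
    RESET_FACTORY         /* 7 taps with hold, or button held at power-up */
} reset_kind_t;

typedef struct {
    bool clear_network_data;  /* routing table, binding table, ... */
    bool clear_local_config;  /* max taps, button timing, LED color, scripts, ... */
} reset_scope_t;

/* Returns which groups of persistent data a given operation clears. */
static reset_scope_t reset_scope(reset_kind_t kind)
{
    reset_scope_t scope = { false, false };
    switch (kind) {
    case RESET_REBOOT:
        break;                            /* assumption: nothing is cleared */
    case RESET_NETWORK_LEAVE:
        scope.clear_network_data = true;  /* local configuration is kept */
        break;
    case RESET_FACTORY:
        scope.clear_network_data = true;
        scope.clear_local_config = true;
        break;
    }
    return scope;
}
```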

Device configuration through ZCL attributes

The relays do not require any configuration*. They are on/off server endpoints, and are always available to be controlled remotely via binding, scenes, normal on/off commands from the coordinator or other devices, etc.

The buttons on the other hand must be configurable to achieve maximum flexibility, such as selecting the toggle/momentary mode discussed above or choosing which relay to switch when the button is tapped. There are also various configuration options that are not button specific, such as LED output, OTA settings and the like.

I wanted to provide some configuration capability via the device itself (menus accessed via button press combinations), but Zigbee2MQTT was intended to be the main UI for adjusting settings. The settings therefore needed to be ZCL attributes.

* Aside from the configurable off timer.

Extending the on/off switch config cluster

Since there is already a standard ZCL On/off switch config cluster (OOSC - 0x0007), I extended it with the following additional custom attributes:

OOSC cluster custom attributes

Attribute	ID	type	def.	min	max	unit / values
MAX_TAPS	`0xf000`	uint8	1	1	3	taps
TAP_1_SWITCHES_RELAY	`0xf001`	enum8	1	0	#channels + 1	0 = none 1 = left … #channels + 1 = all
TAP_2_SWITCHES_RELAY	`0xf002`	enum8	0	0	#channels + 1	0 = none 1 = left … #channels + 1 = all
TAP_3_SWITCHES_RELAY	`0xf003`	enum8	0	0	#channels + 1	0 = none 1 = left … #channels + 1 = all
TAP_1_IS_MOMENTARY	`0xf004`	bool	0	0	1	0 = toggle 1 = momentary
TAP_2_IS_MOMENTARY	`0xf005`	bool	0	0	1	0 = toggle 1 = momentary
TAP_3_IS_MOMENTARY	`0xf006`	bool	0	0	1	0 = toggle 1 = momentary
TURN_OFF_AFTER_N_SEC	`0xf007`	uint16	0	0	50000	seconds (0 = disabled)
TAP_1_SCRIPT	`0xf008`	cstring	""			0..64-char string
TAP_2_SCRIPT	`0xf009`	cstring	""			0..64-char string
TAP_3_SCRIPT	`0xf00a`	cstring	""			0..64-char string
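The numeric ranges in this table lend themselves to a simple check when an attribute write arrives. The sketch below is illustrative only: the function name is hypothetical, the attribute IDs and limits are copied from the table, and the string attributes are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if value is within the table's range for a numeric OOSC
 * custom attribute. num_channels is the number of relays on the device.
 * Unknown IDs and the string (script) attributes return false. */
static bool oosc_value_in_range(uint16_t attr_id, uint32_t value,
                                uint8_t num_channels)
{
    switch (attr_id) {
    case 0xf000:                     /* MAX_TAPS */
        return value >= 1 && value <= 3;
    case 0xf001:                     /* TAP_1_SWITCHES_RELAY */
    case 0xf002:                     /* TAP_2_SWITCHES_RELAY */
    case 0xf003:                     /* TAP_3_SWITCHES_RELAY */
        /* 0 = none, 1..num_channels = one relay, num_channels + 1 = all */
        return value <= (uint32_t)num_channels + 1u;
    case 0xf004:                     /* TAP_1_IS_MOMENTARY */
    case 0xf005:                     /* TAP_2_IS_MOMENTARY */
    case 0xf006:                     /* TAP_3_IS_MOMENTARY */
        return value <= 1;
    case 0xf007:                     /* TURN_OFF_AFTER_N_SEC */
        return value <= 50000;
    default:
        return false;
    }
}
```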

A new device config cluster

For switch-wide settings, I created a new custom device configuration cluster:

Custom configuration cluster (0xfc01) attributes

Attribute	ID	type	def.	min	max	unit / values
MAX_TIME_BETWEEN_TAPS_MS	`0xf000`	uint16	500	100	2000	milliseconds
MIN_TIME_FOR_HOLD_MS	`0xf001`	uint16	1100	50	2000	milliseconds
ONOFF_LED_TYPE	`0xf002`	uint8	1	0	2	0 = disabled 1 = on when on 2 = on when off
ONOFF_LED_COLOR	`0xf003`	uint8	1	0	1	0 = red 1 = blue
OTA_BLOCK_DELAY_MS	`0xf004`	uint16	250	50	10000	milliseconds
SHOW_ERR_LED_ON_NO_COORD	`0xf005`	bool	0	0	1	binary
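In C, these settings could be held in a single struct with the table's defaults and limits applied to it. This is a sketch with hypothetical names; clamping out-of-range values is an assumption made here, and the firmware might equally reject such writes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical in-memory form of the custom configuration cluster. */
typedef struct {
    uint16_t max_time_between_taps_ms;
    uint16_t min_time_for_hold_ms;
    uint8_t  onoff_led_type;    /* 0 = disabled, 1 = on when on, 2 = on when off */
    uint8_t  onoff_led_color;   /* 0 = red, 1 = blue */
    uint16_t ota_block_delay_ms;
    bool     show_err_led_on_no_coord;
} device_config_t;

/* Default values from the table. */
static const device_config_t DEVICE_CONFIG_DEFAULTS = {
    500, 1100, 1, 1, 250, false
};

static uint16_t clamp_u16(uint16_t v, uint16_t lo, uint16_t hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Force every numeric field into the min/max range given in the table. */
static void device_config_clamp(device_config_t *c)
{
    c->max_time_between_taps_ms = clamp_u16(c->max_time_between_taps_ms, 100, 2000);
    c->min_time_for_hold_ms     = clamp_u16(c->min_time_for_hold_ms, 50, 2000);
    c->ota_block_delay_ms       = clamp_u16(c->ota_block_delay_ms, 50, 10000);
    if (c->onoff_led_type > 2)  { c->onoff_led_type = 2; }
    if (c->onoff_led_color > 1) { c->onoff_led_color = 1; }
}
```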

And a final status cluster

Finally, I added an additional custom status cluster to output status information:

Custom status cluster (0xfc02) attributes

Attribute	ID	type	unit / description
UPTIME	`0xf000`	uint32	days
STATUS	`0xf001`	cstring	last error description

Full details of the application endpoints, clusters, etc. can be found in the technical details readme.

Zigbee2MQTT

Button actions

All button actions are sent to the coordinator. They are reported as value changes of the PresentValue attribute (0x0055) of the MultistateInput cluster (0x0012) on the single-tap endpoint for that button.

Coordinator software such as Zigbee2MQTT will report these actions via MQTT allowing for external software to react as required.

The switch events are:

0 tap_1_tap
1 tap_2_tap
2 tap_3_tap
3 tap_1_hold
4 tap_2_hold
5 tap_3_hold
6 tap_1_release
7 tap_2_release
8 tap_3_release

Note that for a specific number of taps, either tap_x_tap or tap_x_hold/tap_x_release will be reported, depending on the switch configuration for that number of taps.
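The nine event values are laid out in three groups of three, so the value for a given tap count and event kind can be computed rather than looked up. A small sketch, with hypothetical names, that reproduces the numbering in the list above:

```c
#include <stdint.h>

typedef enum {
    PRESS_TAP = 0,      /* tap_x_tap     -> values 0..2 */
    PRESS_HOLD = 1,     /* tap_x_hold    -> values 3..5 */
    PRESS_RELEASE = 2   /* tap_x_release -> values 6..8 */
} press_kind_t;

/* Returns the PresentValue to report for a tap count (1..3) and event kind,
 * or -1 if either argument is out of range. */
static int switch_event_value(uint8_t taps, press_kind_t kind)
{
    if (taps < 1 || taps > 3 || (int)kind < 0 || (int)kind > 2) {
        return -1;
    }
    return (int)kind * 3 + (taps - 1);
}
```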

External converter

Zigbee2MQTT features a very flexible extension system, allowing for non-standard ZCL clusters, commands, and attributes to be exposed via “converters”.

I ran into a couple of issues that needed patching to get things working fully, but nothing major.

OTA updates

Although it was likely going to be necessary to upload the initial firmware via ISP, ongoing updates should use the OTA mechanism. Getting OTA updates working took longer than expected. I wrote a separate article here:

> Zigbee2MQTT, the NXP Zigbee stack, and OTA updates

Possible future additions

Notifications

The idea of using light switch LED indicators as visual outputs for the broader smart home seems interesting. I can imagine scenarios like flashing all switches red during some emergency condition such as a water overflow sensor triggering, or overheating in a switch, or flashing some sequence when rain is expected etc.

The ZCL is not really set up for this. There’s the digital output cluster, and the usual on/off/color control etc., but they’re not really designed for visual status output. The alarms cluster is probably more suitable. Local configuration could set alarm 1 = flash red slowly, alarm 2 = flash red then blue etc.

Longer (fragmented) scripts

As described above, “scripts” are currently limited to 64-bytes. This may be increased in future using fragmented messages.

Duk.io

Apr 15, 2025
last updated: Jun 13, 2025
