Small Things Can Make a Huge Difference

Firmware Stability

If you read our blog update about our time at Global EMC doing our EMC and safety testing, you may recall we’d noticed something strange: our new firmware seemed a little unstable (it stopped “broadcasting”). We thought it was a software bug and have spent an ungodly amount of effort trying to trace what’s going on, a hunt made harder by the holiday and flu season.

To help track down the issue, Lorenzo wrote some neat code that implements a watchdog timer (a way of ‘rebooting’ our devices when they become unresponsive). When the watchdog fires, it also sets the LED to constant red as a quick visual indicator.
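For readers who haven't met a watchdog before, here is a minimal sketch of the idea in C. It is a host-side software model, not Lorenzo's firmware and not the nRF51 watchdog peripheral API; the names `watchdog_t`, `watchdog_feed` and `watchdog_tick` are made up for illustration. Healthy code "feeds" the watchdog regularly; if the feeds stop, the countdown reaches zero, the device restarts and the red LED flag is latched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of a watchdog (illustrative names only). */
typedef struct {
    uint32_t timeout_ticks; /* ticks allowed between feeds */
    uint32_t counter;       /* ticks left before the watchdog fires */
    uint32_t reset_count;   /* how many times the watchdog has fired */
    bool     led_red;       /* latched once the watchdog has fired */
} watchdog_t;

void watchdog_init(watchdog_t *wd, uint32_t timeout_ticks)
{
    assert(wd != 0 && timeout_ticks > 0);
    wd->timeout_ticks = timeout_ticks;
    wd->counter = timeout_ticks;
    wd->reset_count = 0;
    wd->led_red = false;
}

/* Called by healthy application code to show it is still responsive. */
void watchdog_feed(watchdog_t *wd)
{
    wd->counter = wd->timeout_ticks;
}

/* Called once per timer tick. Returns true if the watchdog fired on this tick. */
bool watchdog_tick(watchdog_t *wd)
{
    if (wd->counter > 0) {
        wd->counter--;
    }
    if (wd->counter == 0) {
        wd->reset_count++;
        wd->led_red = true;              /* quick visual indicator */
        wd->counter = wd->timeout_ticks; /* the "rebooted" device starts a fresh countdown */
        return true;
    }
    return false;
}
```

On real hardware the countdown runs in a dedicated peripheral, so it keeps going even when the application code has locked up, which is the whole point.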

Something I noticed last weekend was that I could consistently make this LED go red by rapidly changing the ambient temperature the device was exposed to. It was about -15°C here at the time and I wander around with a Wimoto Climate in my backpack all the time, so simply going from outside to inside was enough to trigger the issue.

We had been working with our silicon partner, Nordic Semiconductor, since before Christmas trying to figure out why the devices would randomly, spontaneously reset, and at least now we had a working theory as to what was causing the reset (rapid temperature changes). We set up some A-B test environments in the office (using our office fridge’s freezer and a ceramic heater), and we were able to prove consistently that fast temperature variations would trigger the watchdog. More interestingly, older versions of our firmware and even our development kits did not exhibit the issue, so the plot thickened.

Armed with the above information, we’ve been working with Nordic’s great support team (first their tech support in Norway, and now one of their Application Engineers in California) on a fix. It turns out that a constellation of factors (silicon revision, SDK version and Bluetooth stack version) led to the low-frequency clock becoming unstable. Because the low-frequency clock is on board the chip and is an RC (Resistor-Capacitor) oscillator, it’s temperature sensitive. Under normal circumstances, the main system clock (HFCLK) periodically wakes up and “recalibrates” the low-frequency clock. In newer versions of the SDK it does this based on temperature changes at the silicon die level: it takes 35 microseconds to measure the temperature of the die and 17 milliseconds to recalibrate the clock, so checking the temperature first and recalibrating only when it has changed is more power friendly.
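To make the power trade-off concrete, here is a rough sketch in C of temperature-triggered calibration. It is our own illustration of the idea, not Nordic's SDK code: the 35 µs and 17 ms figures are the ones quoted above, while the 0.5°C threshold, the quarter-degree units and every name (`lfclk_cal_t`, `lfclk_cal_on_wakeup`) are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TEMP_MEASURE_US   35u    /* die temperature measurement time (from the post) */
#define CALIBRATION_US    17000u /* full recalibration time, 17 ms (from the post) */
#define TEMP_THRESHOLD_QC 2      /* ASSUMPTION: recalibrate after a 0.5 C change (2 x 0.25 C) */

/* Hypothetical bookkeeping for the low-frequency clock calibration. */
typedef struct {
    int32_t  last_cal_temp_qc; /* die temperature at last calibration, in 0.25 C units */
    uint32_t active_us;        /* total time spent measuring and calibrating */
    uint32_t calibrations;     /* number of full recalibrations performed */
} lfclk_cal_t;

void lfclk_cal_init(lfclk_cal_t *c, int32_t temp_qc)
{
    assert(c != 0);
    c->last_cal_temp_qc = temp_qc;
    c->active_us = 0;
    c->calibrations = 0;
}

/* Called on each periodic wakeup. Returns true if a recalibration was run. */
bool lfclk_cal_on_wakeup(lfclk_cal_t *c, int32_t temp_qc)
{
    int32_t delta = temp_qc - c->last_cal_temp_qc;
    if (delta < 0) {
        delta = -delta;
    }

    c->active_us += TEMP_MEASURE_US; /* the cheap check always runs */
    if (delta < TEMP_THRESHOLD_QC) {
        return false;                /* temperature is stable: skip the expensive step */
    }

    c->active_us += CALIBRATION_US;  /* temperature moved: pay for a full recalibration */
    c->last_cal_temp_qc = temp_qc;
    c->calibrations++;
    return true;
}
```

With these numbers a wakeup at a stable temperature costs 35 µs instead of 17 ms, roughly a 500-fold saving, and a device carried from -15°C outdoors into a warm room still recalibrates as soon as the change is seen.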

Nordic’s initial advice was to use a later revision of the silicon, which isn’t really an option when you have thousands of Wimotos already made and thousands more chips sitting at your contract manufacturer, although we will be using a later silicon revision in later builds. Another option would be to use an external low-frequency clock, and we’ve already designed that into later hardware revisions, although the on-chip one is expected to work too.

We were going to start shipping last week but have suspended shipping anything until we get 100% clarity on this firmware issue. Why is it a big issue, even if it happens intermittently or randomly? Well, it upsets the data-logging capability of our devices and also currently clears any set alarms, although we can work around the alarm issue by persisting them to local Flash memory if we have to. Whilst the data-logger is not necessarily a feature everyone will use, it is an important feature and one we promised we’d deliver. It also uses the realtime clock, although we haven’t implemented the “this alarm happened at this date/time” feature in the apps yet.

We have a number of options as things stand today:

(1) Find a software workaround in the current firmware. This is our preferred option, of course, and we’re somewhat hopeful this can be done (with no major downsides). We’re a little bit at the mercy of Nordic in this respect as it’s their silicon and they have the expertise to fix this. However, our Application Engineer is working with us today to try and find a solution. I’m keeping my fingers crossed that either today or Monday, we’ll have a fix, and now that we know the nature of the problem, it’ll be very quick to test and then burn this firmware to boards and get some out of the door! This is “Plan A”.

(2) Ship our old firmware. This still works with our apps, but we lose what we call “concurrent broadcast and peripheral mode”. In simple terms, this means that we lose the ability to work with both an iPhone and a gateway (‘Mesh’ with a capital ‘M’) at the same time, and it also limits our next feature, meshing (‘mesh’ with a small ‘m’). We also have to test that our over-the-air firmware updates and boot loader will still function properly, so Lorenzo will do that today. This is currently “Plan B”.

(3) Throw away ~2,000 Climates and ~300 Sentrys and make new ones. This is actually an option we’ve considered, even though it’s incredibly expensive (tens of thousands of dollars). One of the challenges, though, is replacing the existing system-on-a-chip with a revision that has newer Nordic nRF51822 silicon in it. The semiconductor industry works very slowly and we’re looking at a 90-day “factory lead-time” as the SoC is built for us, which isn’t really a viable option. Time is actually a bigger deterrent to us at this point than money.

We’ll provide another update on Monday (or sooner) regarding this as it’s a critical path issue. We’ll also make a decision one way or the other on Monday the 19th.

Janet made up hundreds of Climate boxes that now anxiously await their contents 😦

Test Flight Feedback

We had fifteen or so people sign up to play with our iOS app via Test Flight, plus a couple of people who weren’t running iOS8 to whom we’ve manually gotten the app (our app runs on iOS7; Apple’s Test Flight is an iOS8-only feature, but we have a procedure to get the app to iOS7 folk for beta-testing purposes).

The biggest and loudest feedback was that the “hamburger” menus were confusing in Build 205, so in Build 206 we’ve:

– Removed the hamburger menu icons at the top of the screen
– Expanded the “wave” area at the bottom of the screen
– Built a full button bar in the “wave” area that has buttons for the drawer menus that contain the list of Wimotos you have (right drawer), and the settings and other information (left drawer)
– Added a quick shortcut to add additional Wimotos directly from the list of Wimotos in the right drawer menu
– Generally tightened-up the UI and UX

Refined “menu bar” in the grey wave area allows for one-handed operation, even on an iPhone 6/6+

Initial feedback from a couple of people has been much more favourable in terms of user experience.

Build 207 will be the first release in the App Store, which should happen next week (the approval counter unfortunately resets with every new build). Build 208 adds an app walkthrough, which we think is also important, and that should be the first “App Update” push if we can’t sneak it into the Apple review process.

Build 100 of the Android app should hit Google Play within the next few days. It’s still trailing the iOS app in terms of functionality, but it’s better that we get it out there and improve it rather than wait for it to reach parity and then release it. Build 200 will be the first build that mimics the iOS functionality and is currently 2-3 weeks away. Luckily there are no Google Play shenanigans to contend with for “appcessories” makers 😉

6 comments

  1. Thanks for the update guys. Sounds extremely frustrating and difficult to have diagnosed. I hope Nordic compensate in some way as a clock that floats that severely within its rated temperature is pretty poor. I wonder how many times they’ve had the same feedback. Obviously they’ve had some as their new silicon already fixed the problem.

    1. Frustrating is one word for it 🙂

  2. Gaston Paradis

    Thanks for the update and your honesty. What is the status of the other mote unit Water, Sentry and Grow? The last one being the reason I backed your project almost 2 years ago. Are they already manufactured?

    1. Hi Gaston, we’ve been able to do some quick work to ‘save’ Leak [Water], Grow and Thermo by adding a discrete low-frequency clock source (see today’s post) and trashing the old PCB design. Climate was already fully manufactured (a very large run), and Sentry was already partially run... but we’re still hopeful this can be fixed in firmware.

  3. David Smith

    Are there in Wimoto’s ready for delivery?

    1. Hi David, did you maybe mean “are they in Wimoto’s ready for delivery?” If so, the answer is “yes”. But we have a fix it seems.