Anatomy of a DJI Flyaway

ianwood

I had my first ever bona fide flyaway about 6 weeks ago. I had my theories, bought $1,000 in replacement parts, rebuilt and moved on. Then it happened again.

This time, however, I had a data logger attached to the Phantom, a custom made miniature "black box" that records 75+ different flight parameters 5 times per second. I've spent the past day analyzing the data and I've discovered some things that I didn't suspect before.

[Image: CAN-logger-small.jpg]

None of the sensors (IMU, compass, GPS, etc.) were to blame. The battery, ESCs and motors did exactly what they were supposed to do. There was no loss of connection, no GPS interference and no other external factor.

The NAZA locked up. It had intermittent freezes that stopped it from functioning normally. Here is a snapshot of the logged data from when the NAZA stopped working normally.

[Image: bendix-crash-table.jpg]

At 12 minutes, 45 seconds, the data shows 5 anomalies:
  • The Course Over Ground (COG) freezes. It had been fluid throughout the earlier part of the flight.
  • The speed (Total SPD) changes to 769,361.92 m/s. It had been normal throughout the earlier part of the flight.
  • The satellite count (SAT) jumps to 204, then returns to normal.
  • The ATTI pitch and roll values update erratically, at 2 Hz or slower, even though the logger samples at 5 Hz.
  • The motor speed values update erratically, also at 2 Hz or slower.
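A rough sketch of how a log like this could be scanned for the same symptoms. The field names (t, cog, speed, sats), the list-of-dicts layout and every threshold are assumptions made for illustration; they are not the logger's actual format.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical thresholds, chosen only for illustration.
MAX_PLAUSIBLE_SPEED = 50.0  # m/s, far beyond what a small quadcopter reaches
MAX_PLAUSIBLE_SATS = 40     # a satellite count in the hundreds is not credible
FREEZE_SAMPLES = 5          # 1 second of identical COG at 5 samples per second


def find_anomalies(rows: List[Dict[str, float]]) -> List[Tuple[float, str]]:
    """Return (timestamp, description) pairs for suspicious samples."""
    found = []
    unchanged = 0  # consecutive samples whose COG equals the previous one
    for i, row in enumerate(rows):
        if row["speed"] > MAX_PLAUSIBLE_SPEED:
            found.append((row["t"], "implausible speed"))
        if row["sats"] > MAX_PLAUSIBLE_SATS:
            found.append((row["t"], "implausible satellite count"))
        if i > 0 and row["cog"] == rows[i - 1]["cog"]:
            unchanged += 1
        else:
            unchanged = 0
        # Flag once per run, at the moment the run reaches the limit.
        if unchanged == FREEZE_SAMPLES:
            found.append((row["t"], "COG frozen"))
    return found


def effective_update_rate(values: Sequence[float], sample_hz: float = 5.0) -> float:
    """Estimate how often a logged value actually changes, in Hz."""
    if len(values) < 2:
        return 0.0
    changes = sum(1 for a, b in zip(values, values[1:]) if a != b)
    duration = (len(values) - 1) / sample_hz  # seconds covered by the samples
    return changes / duration
```

A frozen COG is only suspicious while the aircraft is moving, so a real check would also need to look at speed. A value that changes every sample gives 5 Hz from effective_update_rate; a value that only changes every third sample gives under 2 Hz, which is the kind of slowdown seen in the pitch, roll and motor columns.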
The cause is likely to be one of two things:
  • Defective hardware in the NAZA (e.g. a cold solder joint) that took several flights of light vibration to manifest.
  • A firmware defect triggered by an edge condition (a corner of the envelope) that rarely manifests.
I am working on recovering more of the video (file is corrupted due to loss of power).

I've included a more detailed analysis of the data in this PDF: http://www.ianwood.com/docs/anatomy-of-a-dji-flyaway-v1.pdf
 
Wow, this is very impressive. I would for sure send this on to DJI so you can get a replacement.
 
The most riveting post I've seen on this or any other UAV forum.

Does this not prove what we suspected right back in the days of the early P1s? That the NAZA freezes.

Ian, a muppet guide on how you gathered this data please.
 
Wow is right!

Sorry to hear about a second crash but glad you were prepared.
You were prepared, understatement of the year.
 
Great job Ian.

Edge conditions are not going to be easy to investigate... what about examining the NAZA for cold joints etc.? I remember seeing a report/YouTube clip somewhere of some guys who flew multicopters commercially having a problem which they traced (I think!) to a cold joint on one of the NAZA connectors. I wonder if that is still about.
 
So this was an issue with the NAZA on the P2 then? It seems odd that this would also happen on other models through various updates and firmware changes. Certainly sounds like a firmware bug.
 
Out of interest, why have you ruled out hardware? E.g. flawed manufacturing process/ cold joint aggravated by vibration and age. If this did happen twice in the same aircraft I would suspect hardware over firmware.
 
Happening twice in the same aircraft doesn't necessarily rule out hardware or software (or indeed suggest either was the cause). If anything, the fact that one aircraft could have it after just a few flights whilst another could fly hundreds of flights successfully does suggest hardware.

My reasoning for leaning more towards firmware is purely based on personal experience. I've just completed a 4-year degree in electronics engineering, and throughout my time these kinds of intermittent problems have usually pointed to software/firmware, particularly for the following reasons:
  • It's the NAZA flight controller which is causing the issue, and that is most likely just a number of microprocessors on a PCB. Microprocessors are held in place physically by solder joints, and these are highly unlikely to fail later if they work the first time. In contrast, if it was a sensor that was causing the issue, this would more likely point to a hardware issue, e.g. wear and tear of mechanical parts or failure of analogue components leading to strange readings.
  • In my experience, physical failures of microprocessors are highly uncommon and usually just result in total failure of the module, rather than erratic behaviour.
  • Firmware/software can be vastly more complex than the hardware. Particularly in a control module such as the NAZA, there is not much to go wrong mechanically, while there are potentially thousands of unaccounted-for issues in firmware. Testing the hardware would be a relatively simple task, whereas thoroughly testing the firmware might take hundreds of man-hours. Generally a "good enough" approach is taken, and while I'm sure DJI test their firmware well, bugs will always be present in any complex system such as this.
  • These things are incredibly complex, and the software of such a central device is less easily tested than the hardware. The hardware in the case of the NAZA likely revolves around the microprocessors, and those are often standard parts produced by someone like Texas Instruments, deployed worldwide in hundreds of different applications, with a much more rigorous testing phase than DJI's firmware gets.
If you're unconvinced, take a look at what happened to NASA's Mars Pathfinder lander in 1997:
http://research.microsoft.com/en-us/um/people/mbj/mars_pathfinder/mars_pathfinder.html
The hardware worked well, but despite their enormous budget, even NASA missed a multi-threading bug (a priority inversion that caused repeated system resets).
 
Is this the same NAZA from the first flyaway/crash?
What software are you running?
How much damage happened this time?
$1000 to rebuild?
Were you in the same place as the first crash?
Sorry bud, you're having some bad luck :(
 
What I'd like to know is whether the non-Lite Naza has the same known problem.
If it doesn't, a firmware re-flash to the non-Lite version would solve this, though that is controversial and possibly "piracy".

Also, I agree with Trumple. The PCBs and soldering inside the Phantom are pretty high quality. Even the PCB in the controller, where you wouldn't really expect it, has an ENIG finish. The solder joints that we have seen pictures of failing were down to the human factor: things that are not machine soldered, e.g. the motor to ESC wire. This is a connection a human solders.
We have also seen IC failures on the ESC, but these are probably due to high heat and vibration.

The Naza, although I've never taken it apart and have only seen pictures, seems to have fully machine-placed and machine-soldered parts, except for the board-to-board and pin connectors. A failure could happen there, although I doubt it would cause the deviations posted above.
Again, ENIG PCBs and even a clear lacquer coating ensure a very good part-to-board connection. To break this connection, the part would have to suffer very high heat, enough to weaken the solder and, in combination with vibration, cause disconnection. If this happened, it would most likely not be able to fly again.

And as mentioned by Trumple, it is virtually impossible to test firmware in all environments and conditions. This is probably why the Jhook problems took so long to solve. They never sent anyone to actually test the Jhook and did the firmware fix based on user data and the mathematics of GPS in relation to magnetic declination.
 
Many of your points have some validity. But there are hundreds of thousands of aircraft exercising the same or very similar firmware for many hours without issue, and a good number of the aircraft behaving like this seem to have done over 100 flights. Both facts could be significant.
 
Yep agreed. I think the fact that the issues seem to be limited to just a few aircraft indicates that it could be hardware related. If it was firmware, it should be more of an apparent problem. However, without any kind of statistics we can't really know one way or the other.
 
Since only some of the GPS data is corrupt, my best guess would be a firmware failure in the GPS chip. This may not even be the Naza's fault, but the Naza should have some kind of failsafe for corrupt data, e.g. when the speed from GPS goes over XY, stop and slowly land regardless of GPS data.
The corrupt data above makes every GPS function totally useless. Even RTH wouldn't be possible unless it ignored COG and speed and tried to return with some default values.
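A failsafe of that kind could be sketched roughly like this in Python. Everything here is hypothetical: the GpsFix type, the limits, the action names and the three-strikes rule are assumptions for illustration only and say nothing about how the Naza firmware actually works.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical limits; the "XY" threshold above is not specified in the thread.
MAX_SPEED = 30.0  # m/s
MAX_SATS = 40


@dataclass
class GpsFix:
    speed: float  # ground speed in m/s
    sats: int     # satellites in use
    cog: float    # course over ground in degrees


def gps_fix_is_plausible(fix: GpsFix) -> bool:
    """Reject values that no real flight could produce (NaN also fails)."""
    return (
        0.0 <= fix.speed <= MAX_SPEED
        and 0 <= fix.sats <= MAX_SATS
        and 0.0 <= fix.cog < 360.0
    )


def choose_action(fix: GpsFix, bad_count: int, bad_limit: int = 3) -> Tuple[str, int]:
    """Return (action, updated count of consecutive bad fixes)."""
    if gps_fix_is_plausible(fix):
        return "use_gps", 0
    bad_count += 1
    if bad_count >= bad_limit:
        # Stop trusting GPS entirely: stop and descend slowly.
        return "stop_and_land", bad_count
    # Tolerate a brief glitch by ignoring this fix.
    return "ignore_fix", bad_count
```

With the logged values from the first post (769,361.92 m/s and 204 satellites), every fix would be rejected, and after three in a row the sketch gives up on GPS and lands.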
 
I had a similar incident yesterday. I have a P2V+. I had calibrated the compass and had Homelock and GPS lock. I did a couple of previous flights at the same location with no problems. On the third flight I was hovering about 20 feet in the air with no wind and it was rock steady. All of a sudden the P2V+ took an extreme bank and then tried to correct itself, wobbling all the way toward the ground. I was able to apply FULL Throttle Up to slow the descent, so it just bumped the ground and flew back up. When it was back in the air I could not control it as normal. I had to give it full input opposite to the direction it started to go, just to keep it in the normal "Sticks Off" position. I landed it, changed my shorts and checked everything out, and it seemed OK except that the camera was now tilting due to the hard hop. I fixed that with no problems. I was in a park away from any sources of interference and I am unable to explain why the P2V+ banked so abruptly without any input from the sticks. This is the second time this has happened and I am getting very wary of flying the P2V+, knowing that at any second it could just have a mind of its own.
 
Ian

were you flying around in GPS mode when this happened or stationary?

kudos to you for having the skills to incorporate a data logger into your stock Phantom and for discovering this glitch. we discussed this a while ago at multirotorforums.com that the Phantom likely doesn't have the processing power to allow a user to fly around in a dynamic fashion while also engaged in GPS position hold mode. the data flow would be staggering as it attempts to calculate position hold info that is changing rapidly as the heli is flying around. even the APM 2.5 runs into a computational wall when an Okto is being asked to do too many things at the same time, hence the Pixhawk.

we've concluded the same thing as a lot of other users, that the Phantom is a great little helicopter for the money but the controller is not of the same caliber as other top shelf controllers (DJI A2, Mikrokopter FC 2.5, Pixhawk, etc.). Flying around while also having GPS position hold engaged just doesn't seem like a good idea and we've recommended against it.

Great work!

bart
multirotorforums.com
 
You should get a full time salary from DJI. They would have paid someone to do this anyway.

I have a feeling they may already know all about it :(
 
