One particularly irritating kind of bug in a microprocessor-controlled system is an unexpected reset of the microprocessor. An important tool for debugging this kind of problem is a list of possible causes. What could cause a microcontroller to reset unexpectedly?
Answer
On PIC and dsPIC chips, I have observed the following causes of unexpected reset.
Hardware:
- Reset pin driven low or floating. Check the obvious stuff first!
- ESD coupling into the reset pin. I've seen this happen when completely unrelated equipment on the same desk was switched on. Make sure there's enough capacitance on the reset pin; it may need as much as 1 uF.
- ESD coupling into other pins of the processor. Scope probes in particular can act as antennae, couple noise into the chip and cause odd resets. I've heard reports of "invalid opcode" reset codes.
- Bad solder joint or intermittent solder bridge. Either can intermittently open or short a power rail, on the processor or elsewhere on the board.
- Power rail glitch or noise. This could have any number of external causes, including a damaged regulator or a dip in the upstream supply. Make sure the power rails feeding the processor are stable. You may need more capacitance somewhere, perhaps decoupling capacitors placed directly at the processor.
- Some microcontrollers have a Vcap pin, which must not be connected to VDD and must have its own capacitor to common. Failure to connect this pin properly may have unpredictable results.
- Driving an analog input negative past a certain limit causes a reset that RCON reports as a brownout. The same may be true of digital inputs.
- Very high dV/dt in a nearby power converter can cause a brownout reset. (See this question.) I have seen this in two cases, and in one I was able to trace it to capacitive coupling. An IGBT was switching 100-200 amps, and at turn-off some feedback circuits saw a few microseconds of noise, swinging from 2 V to over 8 V on a 3.3 V processor. Increasing the filter capacitor on that feedback rail made the resets stop. Adding a dV/dt filter across the transistor might plausibly have had a similar effect.
Software:
- Watchdog timer. Make sure the watchdog timer is cleared often enough, especially in branches of your code that may take a long time to execute, such as EEPROM writes. Test for this by temporarily disabling the watchdog and checking whether the problem goes away.
- Divide-by-zero. If your code performs any division, make sure the divisor can never be zero; add a check before the division. Don't forget that this also applies to modulo operations.
- Stack overflow. Too many nested function calls can exhaust the RAM set aside for the stack, which can lead to crashes at seemingly random points in code execution.
- Stack underflow. If you are programming in assembler, you can accidentally execute more RETURNs than you executed CALLs.
- Non-existent interrupt routine. If an interrupt is enabled, but no interrupt routine is defined, the processor may reset.
- Non-existent trap routine. This is similar to the interrupt routine case, but different enough that I'm listing it separately. I've seen two separate projects using the dsPIC30F4013 that reset randomly, and the cause was traced to a trap that was triggered but had no handler defined. Of course, that raises the question of why the trap fired in the first place, which could be any number of things, including a silicon bug. But defining handlers for all traps is probably a good early step in diagnosing unexplained resets.
- Function pointer failure. If a function pointer does not point to a valid location, dereferencing it and calling the function it points to can cause a reset. One amusing case: I was initializing a structure with successive values of NULL (for a function pointer) and -1 (for an int). The comma between them was lost to a typo, so the function pointer was actually initialized to NULL-1. So don't assume that a value must be valid just because it's a CONST!
- Invalid/negative array index. Make sure you perform bounds checking on all array indices, both upper and lower bounds, if applicable.
- Creating a data array in program memory that's larger than the largest section of program memory. This may not even throw a compilation error.
- Casting the address of a struct to a pointer to another type, dereferencing that pointer, and using the result as an lvalue in an assignment can cause a crash. See this question. Presumably other kinds of undefined behavior can do the same.
On some dsPICs, the RCON register holds bits indicating the cause of a reset. This can be very helpful when debugging.
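For the watchdog item in the list above, the usual fix is to service the watchdog inside any loop that can run for a long time. This is a minimal sketch, not the author's code: `kick_watchdog()` is a hypothetical hook (on an XC16 target it would typically wrap `ClrWdt()`), `slow_write_word()` is a hypothetical stand-in for an EEPROM write, and `KICK_INTERVAL` is an arbitrary placeholder.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hook. On an XC16 target this would execute the CLRWDT
 * instruction (typically through ClrWdt()); here it only counts calls so
 * the sketch also builds and runs on a PC. */
static unsigned wdt_clear_count = 0;
static void kick_watchdog(void) { wdt_clear_count++; }

/* Hypothetical slow operation standing in for a single EEPROM word write. */
static void slow_write_word(uint16_t *dst, uint16_t value) { *dst = value; }

/* Assumption: KICK_INTERVAL writes always finish well within the watchdog
 * timeout. Derive the real value from the WDT period and the worst-case
 * write time given in the datasheet. */
#define KICK_INTERVAL 8u

/* Copy n words, servicing the watchdog periodically inside the long loop
 * and once more when the loop ends. */
void write_block(uint16_t *dst, const uint16_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        slow_write_word(&dst[i], src[i]);
        if ((i + 1) % KICK_INTERVAL == 0)
            kick_watchdog();
    }
    kick_watchdog();
}
```

Writing 20 words with this interval clears the watchdog after the 8th and 16th writes and once at the end.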
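The divide-by-zero item can be handled with a small guard that refuses to divide and returns a caller-chosen fallback instead. This is a minimal sketch with hypothetical names `safe_div()` and `safe_mod()`. It also rejects `INT32_MIN / -1`, which is likewise undefined in C because the result does not fit in `int32_t`.

```c
#include <stdint.h>

/* Divide a by b, or return fallback when the operation would be undefined:
 * b == 0, or INT32_MIN / -1, whose result overflows int32_t. */
int32_t safe_div(int32_t a, int32_t b, int32_t fallback)
{
    if (b == 0 || (a == INT32_MIN && b == -1))
        return fallback;
    return a / b;
}

/* The same guard applies to the remainder (modulo) operator. */
int32_t safe_mod(int32_t a, int32_t b, int32_t fallback)
{
    if (b == 0 || (a == INT32_MIN && b == -1))
        return fallback;
    return a % b;
}
```

What the fallback should be is application-specific, which is why the caller supplies it.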
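One way to check for the stack overflow item is stack painting: fill unused stack RAM with a known pattern at startup, then later count how much of it is still untouched. This sketch assumes a stack that grows toward higher addresses (as on PIC24 and dsPIC), a hypothetical `STACK_PAINT` pattern, and that the region's address and size come from your linker script or map file. It operates on an ordinary array so the logic can be tested on a PC.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed fill pattern; any value unlikely to appear in live stack data. */
#define STACK_PAINT 0xA5A5u

/* Fill the not-yet-used part of the stack with the pattern. On a real
 * target this runs once at startup, before the stack grows into the region. */
void stack_paint(uint16_t *region, size_t words)
{
    for (size_t i = 0; i < words; i++)
        region[i] = STACK_PAINT;
}

/* Count how many words at the top end of the region still hold the pattern.
 * Assumes the stack grows upward from region[0] toward region[words - 1],
 * so words the stack never reached sit at the top end. */
size_t stack_words_never_used(const uint16_t *region, size_t words)
{
    size_t n = 0;
    while (n < words && region[words - 1 - n] == STACK_PAINT)
        n++;
    return n;
}
```

If `stack_words_never_used()` ever reaches zero during testing, the stack has probably hit the end of its region and needs more room or shallower call chains.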
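For the function pointer item, note that a plain NULL check would not have caught the NULL-1 value. A stricter approach, sketched here with purely illustrative names, is to call through a pointer only if it matches one of a fixed set of known handlers.

```c
#include <stdbool.h>
#include <stddef.h>

typedef int (*handler_fn)(int);

static int handle_start(int arg) { return arg + 1; }
static int handle_stop(int arg)  { return arg - 1; }

/* Every function the dispatcher is allowed to call. */
static const handler_fn known_handlers[] = { handle_start, handle_stop };

struct command {
    handler_fn handler;  /* NULL means "no handler" */
    int        arg;
};

/* Note the comma between the two initializers; losing it is the slip
 * described in the list above. */
static const struct command idle_cmd = { NULL, -1 };

/* Call c->handler only if it matches a known handler. A NULL check alone
 * would not reject a corrupted value such as NULL-1. */
bool dispatch(const struct command *c, int *result)
{
    for (size_t i = 0; i < sizeof known_handlers / sizeof known_handlers[0]; i++) {
        if (c->handler == known_handlers[i]) {
            *result = c->handler(c->arg);
            return true;
        }
    }
    return false;  /* NULL or unknown pointer: refuse to call it */
}
```

A cheaper variant is to store a small index into a const table of handlers instead of a raw pointer, since an index can be bounds-checked with a single comparison.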
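For the array index item, checking both bounds matters whenever the index has a signed type, because a negative index would otherwise read or write below the start of the array. A minimal sketch with a hypothetical `table` and accessor names:

```c
#include <stdbool.h>

#define TABLE_LEN 8

static int table[TABLE_LEN];

/* Store value at table[idx] only if idx is in range. idx is signed, so
 * both the lower and the upper bound are checked. */
bool table_set(int idx, int value)
{
    if (idx < 0 || idx >= TABLE_LEN)
        return false;
    table[idx] = value;
    return true;
}

/* Read table[idx] into *out only if idx is in range. */
bool table_get(int idx, int *out)
{
    if (idx < 0 || idx >= TABLE_LEN)
        return false;
    *out = table[idx];
    return true;
}
```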
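For the oversized program-memory array item, one option is to make the build fail rather than rely on the toolchain to notice. This sketch uses a C11 `_Static_assert`. `MAX_PROG_SECTION_BYTES` and its value are placeholders, and how `sizeof` maps onto program-memory units on a given part is toolchain-specific, so check the linker map for the real figures.

```c
#include <stdint.h>

/* Hypothetical limit: replace with the size of the largest program-memory
 * section your part and linker script actually provide. */
#define MAX_PROG_SECTION_BYTES 32768u

/* On the target this table would be placed in program memory using the
 * compiler's own mechanism; here it is a plain const array. */
static const uint16_t lookup_table[1024] = { 0 };

/* Fail at compile time instead of producing an image that misbehaves. */
_Static_assert(sizeof lookup_table <= MAX_PROG_SECTION_BYTES,
               "lookup_table does not fit in one program-memory section");
```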
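For the struct-cast item, the usual well-defined alternative is to copy bytes with `memcpy` (or simply to assign the fields one at a time). The `struct config` layout and function name below are hypothetical; the commented-out line shows the risky pattern.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 4-byte configuration record. */
struct config {
    uint8_t  mode;
    uint8_t  flags;
    uint16_t period;
};

/* Layout assumption that the byte copy below relies on. */
_Static_assert(sizeof(struct config) == sizeof(uint32_t),
               "struct config is expected to be exactly 4 bytes");

/* Risky pattern (do not use):
 *     *(uint32_t *)&cfg = raw;
 * This writes a struct config object through a uint32_t lvalue, which C's
 * aliasing rules leave undefined, and &cfg is only guaranteed the alignment
 * of struct config, which may be less than a uint32_t needs. */

/* Well-defined alternative: copy the bytes. Which field receives which byte
 * still depends on the target's byte order. */
void config_from_raw(struct config *cfg, uint32_t raw)
{
    memcpy(cfg, &raw, sizeof *cfg);
}
```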
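Decoding RCON at startup can be wrapped in a small function. The bit positions below are assumptions based on the dsPIC30F/33F reset flags and must be checked against your part's datasheet; the enum and function names are hypothetical. `IOPUWR` corresponds to the "invalid opcode" resets mentioned in the hardware list, and `BOR` is the brownout flag.

```c
#include <stdint.h>

/* Assumed RCON bit positions; verify them against the reset chapter of your
 * device's datasheet or family reference manual. */
#define RCON_TRAPR  (1u << 15)  /* trap conflict reset */
#define RCON_IOPUWR (1u << 14)  /* illegal opcode or uninitialized W register */
#define RCON_EXTR   (1u << 7)   /* MCLR pin reset */
#define RCON_SWR    (1u << 6)   /* RESET instruction executed */
#define RCON_WDTO   (1u << 4)   /* watchdog timer time-out */
#define RCON_BOR    (1u << 1)   /* brown-out reset */
#define RCON_POR    (1u << 0)   /* power-on reset */

enum reset_cause {
    RESET_UNKNOWN, RESET_TRAP_CONFLICT, RESET_ILLEGAL_OPCODE, RESET_WATCHDOG,
    RESET_SOFTWARE, RESET_MCLR, RESET_POWER_ON, RESET_BROWN_OUT
};

/* Classify a saved copy of RCON. Several flags can be set at once (a
 * power-up, for example, can leave both POR and BOR set), so they are
 * checked in a fixed priority order. On the target, read RCON once early in
 * startup, keep the copy, then clear the flags: hardware sets them but only
 * software clears them, so stale bits would mislead the next check. */
enum reset_cause classify_reset(uint16_t rcon)
{
    if (rcon & RCON_TRAPR)  return RESET_TRAP_CONFLICT;
    if (rcon & RCON_IOPUWR) return RESET_ILLEGAL_OPCODE;
    if (rcon & RCON_WDTO)   return RESET_WATCHDOG;
    if (rcon & RCON_SWR)    return RESET_SOFTWARE;
    if (rcon & RCON_EXTR)   return RESET_MCLR;
    if (rcon & RCON_POR)    return RESET_POWER_ON;
    if (rcon & RCON_BOR)    return RESET_BROWN_OUT;
    return RESET_UNKNOWN;
}
```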