Saturday 8 July 2017

fpga - VHDL: receive module randomly fails when counting bits


Background


This is a personal project; it regards connecting an FPGA to a N64, the byte values that the FPGA receives are then sent through UART to my computer. It actually functions pretty well! At random times unfortunately, the device will fail, then recover. Through debugging, I've managed to find the problem, however I am stumped at how to fix it because I'm fairly incompetent with VHDL.


I've been toying with the VHDL for a couple days now and I may be incapable of resolving this.


The Problem


I have an oscilloscope measuring the N64 signal into the FPGA, and the other channel connects to the output of the FPGA. I also have digital pins recording the counter value.



Essentially, the N64 sends 9 data bits, including a STOP bit. The counter counts the data bits received and when I reach 9 bits, the FPGA starts transmitting via UART.


Here's correct behavior: enter image description here


The FPGA is the blue waveform and the orange waveform is the input of the N64. For the duration of receiving, my FPGA "echos" the signal of the input for debugging purposes. After the FPGA counts to 9, it begins transmitting the data through UART. Notice that the digital pins count to 9 and the FPGA output goes LOW immediately after the N64 is finished.


Here's an example of a failure:


enter image description here


Notice that the counter skips bits 2 and 7! The FPGA reaches the end, waiting for the next start bit from the N64 but nothing. So the FPGA times out and recovers.


This is the VHDL for the N64 receive module. It contains the counter: s_bitCount.


library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.STD_LOGIC_UNSIGNED.ALL;


entity N64RX is
port(
N64RXD : in STD_LOGIC; --Data input
clk25 : in STD_LOGIC;
clr : in STD_LOGIC;
tdre : in STD_LOGIC; --detects when UART is ready
transmit : out STD_LOGIC; --Signal to UART to transmit
sel : out STD_LOGIC;
echoSig : out STD_LOGIC;

bitcount : out STD_LOGIC_VECTOR(3 downto 0);
data : out STD_LOGIC_VECTOR(3 downto 0) --The significant nibble
);
end N64RX;

--}} End of automatically maintained section

architecture N64RX of N64RX is

type state_type is (start, delay2us, sigSample, waitForStop, waitForStart, timeout, count9bits, sendToUART);


signal state: state_type;
signal s_sel, s_echoSig, s_timeoutDetect : STD_LOGIC;
signal s_baudCount : STD_LOGIC_VECTOR(6 downto 0); --Counting variable for baud rate in delay
signal s_bitCount : STD_LOGIC_VECTOR(3 downto 0); --Counting variable for number of bits recieved
signal s_data : STD_LOGIC_VECTOR(8 downto 0); --Signal for data

constant delay : STD_LOGIC_VECTOR(6 downto 0) := "0110010"; --Provided 25MHz, 50 cycles is 2us
constant delayLong : STD_LOGIC_VECTOR(6 downto 0) := "1100100";


begin

n64RX: process(clk25, N64RXD, clr, tdre)
begin
if clr = '1' then
s_timeoutDetect <= '0';
s_echoSig <= '1';
s_sel <= '0';
state <= start;
s_data <= "000000000";

transmit <= '0';
s_bitCount <= "0000";
s_baudCount <= "0000000";
elsif (clk25'event and clk25 = '1') then --on rising edge of clock input
case state is
when start =>
--s_timeoutDetect <= '0';
s_sel <= '0';
transmit <= '0'; --Don't request UART to transfer
s_data <= "000000000";

s_bitCount <= X"0";
if N64RXD = '1' then
state <= start;
elsif N64RXD = '0' then --if Start bit detected
state <= delay2us;
end if;

when delay2us => --wait two microseconds to sample
--s_timeoutDetect <= '0';
s_sel <= '1';

s_echoSig <= '0';
if s_baudCount >= delay then
state <= sigSample;
else
s_baudCount <= s_baudCount + 1;
state <= delay2us;
end if;

when sigSample =>
--s_timeoutDetect <= '1';

s_echoSig <= N64RXD;
s_bitCount <= s_bitCount + 1;
s_baudcount <= "0000000";
s_data <= s_data(7 downto 0) & N64RXD;
state <= waitForStop;

when waitForStop =>
s_echoSig <= N64RXD;
if N64RXD = '0' then
state <= waitForStop;

elsif N64RXD = '1' then
state <= waitForStart;
end if;

when waitForStart =>
s_echoSig <= '1';
s_baudCount <= s_baudCount + 1;
if N64RXD = '0' then
s_baudCount <= "0000000";
state <= delay2us;

elsif N64RXD = '1' then
if s_baudCount >= delayLong then
state <= timeout;
elsif s_bitCount >= X"9" then
state <= count9bits;
else
state <= waitForStart;
end if;
end if;


when count9bits =>
s_sel <= '0';
if tdre = '0' then
state <= count9bits;
elsif tdre = '1' then
state <= sendToUART;
end if;

when sendToUART =>
transmit <= '1';

if tdre = '0' then
state <= start;
else
state <= sendToUART;
end if;

when timeout =>
--s_timeoutDetect <= '1';
state <= start;


end case;
end if;
end process n64RX;
--timeoutDetect <= s_timeoutDetect;
bitcount <= s_bitCount;
echoSig <= s_echoSig;
sel <= s_sel;
data <= s_data(4 downto 1);

end N64RX;


So, any ideas? Debugging tips? Tips on coding Finite State Machines?


In the meantime, I'll keep playing around with it (I'll have it eventually)! Help me Stack Exchange, you're my only hope!


Edit


A further discovery in my debugging, the states will jump from waitForStart back to waitForStop. I gave each state a value with waitForStart equal to '5' and waitForStop equal to '4'. See the image below: enter image description here



Answer



I don't see a synchronizer on the rx data line.


All asynchronous inputs must be synchronized to the sampling clock. There are a couple of reasons for this: metastability and routing. These are different problems but are inter-related.


It takes time for signals to propagate through the FPGA fabric. The clock network inside the FPGA is designed to compensate for these "travel" delays so that all flip flops within the FPGA see the clock at the exact same moment. The normal routing network does not have this, and instead relies on the rule that all signals must be stable for a little bit of time before the clock changes and remain stable for a little bit of time after the clock changes. These little bits of time are known as the setup and hold times for a given flip flop. The place and route component of the toolchain has a very good understanding of the routing delays for the specific device and makes a basic assumption that a signal does not violate the setup and hold times of the flip flops in the FPGA. With that assumption and knowledge (and a timing constraints file) it can properly place the logic within the FPGA and ensure that all the logic that looks at a given signal sees the same value at every clock tick.


When you have signals that are not synchronized to the sampling clock you can end up in the situation where one flip flop sees the "old" value of a signal since the new value has not had time to propagate over. Now you're in the undesirable situation where logic looking at the same signal sees two different values. This can cause wrong operation, crashed state machines and all kinds of hard to diagnose havoc.



The other reason why you must synchronize all your input signals is something called metastability. There are volumes written on this subject but in a nutshell, digital logic circuitry is at its most basic level an analog circuit. When your clock line rises the state of the input line is captured and if that input is not a stable high or low level at that time, an unknown "in-between" value can be captured by the sampling flip flop.


As you know, FPGAs are digital beasts and do not react well to a signal that is neither high nor low. Worse, if that indeterminate value makes its way past the sampling flip flop and into the FPGA it can cause all kinds of weirdness as larger portions of the logic now see an indeterminate value and try to make sense of it.


The solution is to synchronize the signal. At its most basic level this means you use a chain of flip flops to capture the input. Any metastable level that might have been captured by the first flip flop and managed to make it out gets another chance to be resolved before it hits your complex logic. Two flip flops are usually more than sufficient to synchronize inputs.


A basic synchronizer looks like this:


entity sync_2ff is
port (
async_in : in std_logic;
clk : in std_logic;
rst : in std_logic;
sync_out : out std_logic

);
end;

architecture a of sync_2ff is
begin

signal ff1, ff2: std_logic;

-- It's nice to let the synthesizer know what you're doing. Altera's way of doing it as follows:
ATTRIBUTE altera_attribute : string;

ATTRIBUTE altera_attribute OF ff1 : signal is "-name SYNCHRONIZER_IDENTIFICATION ""FORCED IF ASYNCHRONOUS""";
ATTRIBUTE altera_attribute OF a : architecture is "-name SDC_STATEMENT ""set_false_path -to *|sync_2ff:*|ff1 """;

-- also set the 'preserve' attribute to ff1 and ff2 so the synthesis tool doesn't optimize them away
ATTRIBUTE preserve: boolean;
ATTRIBUTE preserve OF ff1: signal IS true;
ATTRIBUTE preserve OF ff2: signal IS true;

synchronizer: process(clk, rst)
begin

if rst = '1' then
ff1 <= '0';
ff2 <= '0';
else if rising_edge(clk) then
ff1 <= async_in;
ff2 <= ff1;
sync_out <= ff2;
end if;
end process synchronizer;
end sync_2ff;


Connect the physical pin for the N64 controller's rx data line to the async_in input of the synchronizer, and connect the sync_out signal to your UART's rxd input.


Unsynchronized signals can cause weird issues. Make sure any input connected to an FPGA element that isn't synchronized to the clock of the process reading the signal is synchronized. This includes pushbuttons, UART 'rx' and 'cts' signals... anything that is not synchronized to the clock that the FPGA is using to sample the signal.


(An aside: I wrote the page at www.mixdown.ca/n64dev many years ago. I just realized that I broke the link when I last updated the site and will fix it in the morning when I'm back at a computer. I had no idea so many people used that page!)


No comments:

Post a Comment

arduino - Can I use TI&#39;s cc2541 BLE as micro controller to perform operations/ processing instead of ATmega328P AU to save cost?

I am using arduino pro mini (which contains Atmega328p AU ) along with cc2541(HM-10) to process and transfer data over BLE to smartphone. I...