Monday, 26 January 2015

microcontroller - Programming pattern to generate VGA signal with micro-controller?


I want to generate a VGA signal with a microcontroller (like a TI Tiva ARM, which runs at 90/120 MHz).


I'm not sure how to get accurate timings with a microcontroller. Which programming pattern do I need to use?


Do I need any inline assembler code? How do I use interrupts wisely?


It would be great if somebody could show some pseudocode for generating a VGA signal.


I successfully generated a VGA signal with an FPGA, but I just can't figure out how to do it with an MCU.



Answer



This answer is all based on standard VGA screens, which are 640 x 480 since that was what you referenced in your answer. Timing for SVGA (super VGA, 800 x 600), XGA (1024 x 768) and so on is going to be different. This page has a good list of virtually all the resolutions in use. But the pixel timing for these bigger screens is going to be so fast I doubt you could generate it using a micro.



So the standard VGA screen is 640 pixels horizontally and 480 lines vertically. To generate the timing, a 25.175 MHz pixel clock is used. (Theoretically, this means the micro would need to be able to generate pulses that are exactly 39.72 ns long, which would be rather difficult, but it seems you can get by with a 25 MHz clock and 40 ns pixels.)
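As a quick check on those numbers, the sketch below computes the pixel period and the resulting refresh rate for both clocks. It assumes the 800 x 524 total frame size used later in this answer, and the function names are made up for illustration.

```python
# Pixel period and refresh rate for a given pixel clock.
# Assumes the 800-clock line and 524-line frame described in this answer.

TOTAL_CLOCKS_PER_LINE = 800
TOTAL_LINES_PER_FRAME = 524


def pixel_period_ns(pixel_clock_hz):
    """Duration of one pixel in nanoseconds."""
    return 1e9 / pixel_clock_hz


def refresh_rate_hz(pixel_clock_hz):
    """Frames per second for the assumed 800 x 524 total frame."""
    return pixel_clock_hz / (TOTAL_CLOCKS_PER_LINE * TOTAL_LINES_PER_FRAME)


for clock in (25.175e6, 25.0e6):
    print(f"{clock / 1e6:.3f} MHz: {pixel_period_ns(clock):.2f} ns per pixel, "
          f"{refresh_rate_hz(clock):.2f} Hz refresh")
```

Both clocks land within about half a hertz of 60 Hz, which is consistent with monitors tolerating the rounder 25 MHz figure.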



The format of the VGA timing for each line dates back to the days of CRT computer screens, and before that to television sets. It looks like this for a 640x480 picture (just the horizontal shown here):


[Image: horizontal timing diagram for one 640x480 line]


There are two synchronization pulses: one per line (horizontal sync) and one per frame (vertical sync). The horizontal sync pulse is preceded by a "front porch" interval and followed by a "back porch" interval. The active video (640 pixels) falls between the back porch of one sync pulse and the front porch of the next. The total line, though, is 800 pixels wide.


Likewise, each frame has a vertical sync pulse at the bottom, surrounded by a vertical front porch and back porch, and is 524 lines high in total.


[Image: vertical timing diagram for one 640x480 frame]


Back when video was done using CRTs instead of LCD screens, there was an electron beam that scanned across the screen for each line, then moved down a bit, scanned the next line, and so on. At the bottom of the screen, it had to zip back to the top.


At the end of each line, the beam had to move back to the beginning of the next line, which took a little time. During this time the video was blanked. This is known as the horizontal retrace (blanking) time, and it is the sum of the front porch, sync, and back porch timing. Likewise for the vertical retrace period. That's why there is a total of 800 pixels across the screen and 524 lines vertically, even though you only see 640x480.


The timing then can be specified in terms of clock pulses:


[Image: table of horizontal and vertical timing values, in clocks and lines]


where HPX is the 640 active pixels, HFP is the horizontal front porch interval, HSP is the horizontal sync pulse, and HBP is the horizontal back porch interval. Likewise for the vertical. This is what you need to use for your timing in the microcontroller (assuming you have a 40 ns clock).



So each full line is 800 clocks, or 800 x 40 ns = 32 µs.


Note that the critical timing only occurs when writing pixels to the screen (12 bits every 40 ns). While writing out the pixels you're not going to have much time to do anything else. But during the front porch, sync pulse, and back porch you've got 635 ns, 3.81 µs, and 1.9 µs respectively for doing other stuff.
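The interval lengths quoted above follow from the standard 640x480 horizontal timing of 16, 96, and 48 pixel clocks for the front porch, sync pulse, and back porch. Those clock counts are not spelled out in the text of this answer, so treat them as an assumption taken from the commonly published VGA timing tables. The sketch converts them to time and to free instruction slots on a hypothetical 100 MHz processor that executes one instruction per cycle.

```python
# Horizontal timing for 640x480, in pixel clocks.
# The 16/96/48 split is the commonly published VGA timing (an assumption here,
# since the answer's text only gives the resulting durations).
H_TIMING_CLOCKS = {
    "active": 640,
    "front_porch": 16,
    "sync": 96,
    "back_porch": 48,
}


def interval_ns(clocks, pixel_clock_hz=25.175e6):
    """Duration of an interval of `clocks` pixel clocks, in nanoseconds."""
    return clocks * 1e9 / pixel_clock_hz


def free_instructions(clocks, cpu_hz=100e6, pixel_clock_hz=25.175e6):
    """Whole CPU instructions that fit in the interval (1 instruction/cycle)."""
    return int(interval_ns(clocks, pixel_clock_hz) * cpu_hz / 1e9)


for name, clocks in H_TIMING_CLOCKS.items():
    print(f"{name:12s} {clocks:4d} clocks  {interval_ns(clocks):9.1f} ns  "
          f"{free_instructions(clocks):5d} instructions at 100 MHz")
```

Passing `pixel_clock_hz=25e6` instead gives the slightly longer intervals you would get with a 40 ns clock.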


Note that if you don't have a processor fast enough to generate 40 ns pixels, you can run the pixel output at half speed (essentially 12.5 MHz, or an 80 ns clock) and send each line twice, which gives you 320 x 240 resolution. Each pixel is then twice as wide and twice as tall as far as the monitor is concerned. It still thinks you're sending VGA, because the sync timing is unchanged.
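One way to picture the half-rate scheme is as a mapping from the 640 x 480 positions the monitor scans to a smaller 320 x 240 framebuffer. The sketch below is a host-side model, not firmware, and it assumes each stored line is also sent twice so that the vertical resolution halves as well. `source_pixel` and the framebuffer layout are hypothetical.

```python
# Map a monitor position (x, y) on the 640x480 raster to a pixel in a
# 320x240 framebuffer: each stored pixel covers a 2x2 block on screen.

FB_WIDTH, FB_HEIGHT = 320, 240


def source_pixel(framebuffer, x, y):
    """Return the framebuffer value shown at monitor position (x, y)."""
    if not (0 <= x < 2 * FB_WIDTH and 0 <= y < 2 * FB_HEIGHT):
        raise ValueError("position is outside the 640x480 active area")
    return framebuffer[y // 2][x // 2]


# Tiny demonstration: a framebuffer whose values encode their own coordinates.
fb = [[(row, col) for col in range(FB_WIDTH)] for row in range(FB_HEIGHT)]
print(source_pixel(fb, 0, 0), source_pixel(fb, 1, 1), source_pixel(fb, 639, 479))
```

On real hardware the horizontal doubling comes for free from holding each port value for 80 ns; only the line repetition needs explicit handling.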



To generate the video signals for each pixel, you can make your own DAC (digital to analog converter) using resistors. If you allocate 4 bits per color (RGB) you will need 12 resistors in total, arranged so:


[Image: schematic of the 12-resistor DAC]


You could get by with fewer bits, for example 3 bits per color, or even 2, but the picture quality won't be as good.
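To get a feel for the levels such a resistor DAC produces, the sketch below models one 4-bit color channel as binary-weighted resistors driving the monitor's 75 ohm input termination. The resistor values (500 ohm to 4 kohm) and the 3.3 V logic level are assumptions chosen for illustration, not necessarily the values in this answer's schematic. A real design would pick the nearest standard values and check them against the roughly 0.7 V full-scale level that VGA expects.

```python
# Node voltage of a binary-weighted resistor DAC driving a 75-ohm VGA input.
# Pins driven high source current through their resistor; pins driven low
# pull the node toward ground through theirs, in parallel with the load.

VCC = 3.3                                          # assumed GPIO high level, volts
LOAD_OHMS = 75.0                                   # VGA input termination
RESISTORS_OHMS = [500.0, 1000.0, 2000.0, 4000.0]   # MSB first (assumed values)


def dac_output_volts(code):
    """Output voltage for a 4-bit code (0..15), MSB on the smallest resistor."""
    if not 0 <= code <= 15:
        raise ValueError("code must fit in 4 bits")
    total_conductance = 1.0 / LOAD_OHMS + sum(1.0 / r for r in RESISTORS_OHMS)
    current_in = 0.0
    for bit, r in zip((8, 4, 2, 1), RESISTORS_OHMS):
        if code & bit:
            current_in += VCC / r
    return current_in / total_conductance


for code in (0, 1, 8, 15):
    print(f"code {code:2d} -> {dac_output_volts(code):.3f} V")
```

With these assumed values the full-scale output comes out slightly above 0.7 V, and each step is the same size because the resistors are binary weighted.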



Assume we have a 100 MHz 32-bit processor (10 ns per instruction). Also assume we are using 12-bit pixels, stored two pixels to a 32-bit word (so we waste 8 bits, unfortunately).


Before starting, assume r0 is loaded with the address of the first 32-bit word containing pixels for this line, r1 is loaded with the address of a memory-mapped I/O port, of which the lower 12 bits D0-D11 are brought out, and r2 is a temporary register holding two pixels.



I'm using a made-up RISC-like instruction set, but it should be pretty obvious what is going on. Assume each instruction takes one cycle, or 10 ns.


ld   r2,[r0]     ; load indirect through register r0, 32 bits (2 pixels)
andi r2,0xFFF    ; and immediate, keep lower 12 bits (first pixel)
st   r2,[r1]     ; store pixel to I/O port
ld   r2,[r0]     ; load the same word again
rsh  r2,16       ; right shift 16 bits to get upper pixel
andi r2,0xFFF    ; and immediate, keep lower 12 bits (second pixel)
st   r2,[r1]     ; store pixel to I/O port
addi r0,4        ; increment memory address (assume byte addressing)
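The same unpacking can be modelled on a host machine to check the bit manipulation. This Python sketch is not the answer's code. It assumes the layout implied by the instructions above (first pixel in bits 0-11, second pixel in bits 16-27 of each 32-bit word), and `pack_pair` and `port_writes` are hypothetical helper names.

```python
# Host-side model of the inner loop: each 32-bit word holds two 12-bit pixels,
# the first in bits 0-11 and the second in bits 16-27.

PIXEL_MASK = 0xFFF


def pack_pair(first, second):
    """Pack two 12-bit pixels into one 32-bit word."""
    return (first & PIXEL_MASK) | ((second & PIXEL_MASK) << 16)


def port_writes(words):
    """Yield the 12-bit values in the order the loop stores them to the port."""
    for word in words:
        yield word & PIXEL_MASK            # andi r2,0xFFF ; st r2,[r1]
        yield (word >> 16) & PIXEL_MASK    # rsh r2,16 ; andi r2,0xFFF ; st r2,[r1]


line = [pack_pair(0x123, 0x456), pack_pair(0xFFF, 0x000)]
print([hex(p) for p in port_writes(line)])
```

A full line is 320 such words, which yields the 640 port writes the monitor expects.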


If the processor is faster than 100 MHz, you will need to add nops so the entire sequence still takes 80 ns for two pixels.
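The amount of padding follows directly from the clock rate. A small sketch, assuming one instruction per cycle and the eight-instruction sequence above (the function name is made up):

```python
# Number of nops needed so that the eight-instruction sequence for two pixels
# still takes 80 ns (two 40 ns pixels) on a faster single-cycle CPU.

INSTRUCTIONS_PER_PAIR = 8
PAIR_TIME_NS = 80


def nops_per_pair(cpu_mhz):
    """Return the nop count per two-pixel sequence for an integer MHz clock."""
    scaled_cycles = cpu_mhz * PAIR_TIME_NS      # cycles x 1000
    if scaled_cycles % 1000 != 0:
        raise ValueError("80 ns is not a whole number of CPU cycles")
    cycles_available = scaled_cycles // 1000
    if cycles_available < INSTRUCTIONS_PER_PAIR:
        raise ValueError("CPU is too slow for full-rate pixels")
    return cycles_available - INSTRUCTIONS_PER_PAIR


for mhz in (100, 125, 200):
    print(f"{mhz} MHz: {nops_per_pair(mhz)} nops per two pixels")
```

The nops should be split evenly between the two halves of the sequence so that the two stores stay 40 ns apart. A clock such as 120 MHz does not divide 80 ns into whole cycles, so exact pixel timing is not possible with this approach at that speed.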


You repeat this sequence of eight instructions 320 times inline. At the end, set up an interrupt for 635 ns in the future (end of front porch) and return from the interrupt level (entered when you started sending your pixels). In the meantime you've got 63 instructions free for the base level.


At the interrupt (end of front porch), generate the beginning of the horizontal sync pulse (goes low), set another interrupt, this time 3.81 µs in the future (end of horizontal sync), and exit from the interrupt. You can execute about 380 instructions this time.


At the interrupt (end of horizontal sync), complete the horizontal sync (goes back high), set an interrupt for 1.9 µs in the future (end of back porch) and return from the interrupt. 190 or so more instructions are available.


At the final interrupt (end of back porch), start outputting pixel data again (all in the interrupt routine).
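The interrupt chain above amounts to a four-state machine per line. The following is a host-side model rather than MCU code: it steps through one line and reports the hsync level and how long each state lasts at a 40 ns pixel clock. The state names and the 16/96/48 clock split are assumptions based on the common 640x480 timing, in which the sync pulse sits between the front porch and the back porch.

```python
# One scan line as a sequence of (state, hsync_level, duration) steps.
# On a microcontroller each step would end with a timer interrupt that
# performs the transition to the next state.

PIXEL_NS = 40  # assumed 25 MHz pixel clock

LINE_STATES = [
    # (state,        hsync level, pixel clocks)
    ("active",       1,           640),  # ISR streams 640 pixels to the port
    ("front_porch",  1,           16),   # video blanked, CPU free
    ("sync",         0,           96),   # hsync driven low, CPU free
    ("back_porch",   1,           48),   # hsync high again, CPU free
]


def scan_line():
    """Yield (state, hsync_level, start_ns, duration_ns) for one line."""
    t = 0
    for state, hsync, clocks in LINE_STATES:
        duration = clocks * PIXEL_NS
        yield state, hsync, t, duration
        t += duration


for state, hsync, start, duration in scan_line():
    print(f"t={start:6d} ns  {state:12s} hsync={hsync}  lasts {duration} ns")
```

Keeping the timing in one table like this makes it easier to reuse the same handler for the half-rate mode or for a different resolution, since only the table changes.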


Similar code handles the end of the frame (vertical sync, etc.).


Except for the horizontal retrace time, all of the processor's time goes into copying the pixels from RAM to the screen. There would be an additional 32 µs * 44 lines, or 1.408 ms, available during the vertical retrace, or 140,800 instructions for other stuff. But this, added to the extra instructions available during the horizontal retrace (about 304,000 per frame), would still not be enough to generate a full frame of video for the next refresh at the full 640x480 resolution. You would really need at least a 200 MHz processor to both generate video and copy it to the VGA output.
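Putting the figures from this answer together gives the per-frame budget. The sketch simply redoes the arithmetic, assuming the 100 MHz single-cycle processor and the 63, 380, and 190 free instructions per line quoted earlier. Dividing by the number of visible pixels shows how little is left for drawing.

```python
# Free instructions per frame on the assumed 100 MHz, 1-instruction-per-cycle CPU.

CPU_HZ = 100_000_000
LINE_TIME_S = 32e-6               # 800 clocks x 40 ns
VISIBLE_LINES = 480
RETRACE_LINES = 44                # 524 total lines minus 480 visible
FREE_PER_LINE = 63 + 380 + 190    # front porch + sync + back porch

horizontal_free = FREE_PER_LINE * VISIBLE_LINES
vertical_free = round(RETRACE_LINES * LINE_TIME_S * CPU_HZ)
total_free = horizontal_free + vertical_free
per_pixel = total_free / (640 * VISIBLE_LINES)

print(f"horizontal retrace: {horizontal_free} instructions per frame")
print(f"vertical retrace:   {vertical_free} instructions per frame")
print(f"total:              {total_free} ({per_pixel:.2f} per visible pixel)")
```

That works out to roughly one and a half instructions per visible pixel, which is not enough to compute a new color for every pixel of every frame.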


It's no wonder PCs have come with specialized graphics hardware from the very beginning, which writes the contents of a video RAM out to a video port (VGA or whatever) without the assistance of the main processor.

