flash - Why is NOR memory considered XIP, and NAND memory is not?

Thursday, 5 March 2015

I've been reading up on NOR and NAND memories in search of an answer as to why exactly NOR is considered XIP (eXecute In Place) and NAND is not. While both sources I found were extremely informative and well-written, neither explicitly satisfied my curiosity.

I think I understand the basic pros and cons of both kinds. The way I see it, NOR memory is suitable for program storage rather than data storage, because it has fast read times and horrendously slow write/erase times. Clear so far. Also, NOR offers true random access (due to the NOR gates, which are essentially parallel and not serial, if I understood it correctly), as opposed to NAND, which typically works serially on larger units of data rather than on individual bits.

This made me think that the XIP part is somehow due to the ability to access every individual bit, since NOR is random-access. I am, however, not fully convinced this is so. Is that right, or am I spouting nonsense?

A follow-up, if that is indeed so: I've gotten the impression that NAND actually can be implemented in a way that achieves random access (although there is no real reason to do this, AFAIK). Could XIP then theoretically also be achieved with such NAND memory?

Answer

XIP requires random access -- the CPU needs to read the instruction stream as it executes it. If there is a loop, it needs to re-read the same instructions again (unless they are cached). If there is a branch, it needs to follow it without a delay.

NAND flash interfaces are usually designed to allow blockwise access only (in practice, a read transfers a whole page rather than a single word). If you want to execute code from that memory, you either read the entire block for every fetch and discard everything but the instruction you are interested in (which would be horribly slow), or you add a caching mechanism.
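To put rough numbers on "horribly slow", here is a toy model that only counts bytes moved over the memory interface for a small loop. It is a sketch under stated assumptions, not a model of any real part: `PAGE_SIZE`, `INSTR_SIZE`, the function names and the address trace are all made up for illustration, and timing is ignored entirely.

```python
PAGE_SIZE = 2048  # bytes per NAND page (assumed; real parts vary)
INSTR_SIZE = 4    # bytes per instruction (assumed fixed-width instruction set)


def nor_bytes_transferred(trace):
    """Random-access device: each fetch transfers only the instruction itself."""
    return len(trace) * INSTR_SIZE


def nand_uncached_bytes_transferred(trace):
    """Blockwise device, no cache: every fetch transfers a whole page."""
    return len(trace) * PAGE_SIZE


def nand_one_page_buffer_bytes_transferred(trace):
    """Blockwise device with a buffer that keeps the most recently read page."""
    transferred = 0
    buffered_page = None
    for addr in trace:
        page = addr // PAGE_SIZE
        if page != buffered_page:
            # Buffer miss: pull the whole page in, replacing the old one.
            transferred += PAGE_SIZE
            buffered_page = page
    return transferred


# Hypothetical trace: a 10-instruction loop body at address 0, run 100 times.
loop_trace = [i * INSTR_SIZE for i in range(10)] * 100

print(nor_bytes_transferred(loop_trace))                   # 4000
print(nand_uncached_bytes_transferred(loop_trace))         # 2048000
print(nand_one_page_buffer_bytes_transferred(loop_trace))  # 2048
```

In this model the uncached blockwise device moves 512 times more data than the random-access one for the same loop, while even a single-page buffer brings the traffic down to one page read, which is the point of adding a cache.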

There are implementations that provide XIP on NAND by catching a page fault, reading the block into a RAM cache and then mapping that cached copy into the virtual address space. With programs optimized for it (there are compilers that understand paged architectures and know to align code so that functions calling each other end up on the same page) you don't lose much performance.
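The fault-and-cache scheme can be imitated in a few lines. This is a minimal simulation, assuming an LRU replacement policy and instruction fetches that never straddle a page boundary; `PagedCodeCache`, `fake_nand_read_page` and `count_faults` are hypothetical names, and a real system would do this work in the MMU and the fault handler rather than in a lookup table.

```python
from collections import OrderedDict

PAGE_SIZE = 2048  # bytes per page (assumed)


class PagedCodeCache:
    """RAM cache of code pages, filled on demand from a page-read callback."""

    def __init__(self, read_page, capacity):
        self.read_page = read_page   # callable: page number -> bytes of that page
        self.capacity = capacity     # number of pages that fit in RAM
        self.frames = OrderedDict()  # page number -> page contents, in LRU order
        self.faults = 0

    def fetch(self, addr, size=4):
        page, offset = divmod(addr, PAGE_SIZE)
        if page in self.frames:
            self.frames.move_to_end(page)  # mark as most recently used
        else:
            # "Page fault": read the whole page from the slow device.
            self.faults += 1
            if len(self.frames) >= self.capacity:
                self.frames.popitem(last=False)  # evict least recently used
            self.frames[page] = self.read_page(page)
        return self.frames[page][offset:offset + size]


def fake_nand_read_page(page):
    """Stand-in for a NAND page read; fills the page with its own number."""
    return bytes([page % 256]) * PAGE_SIZE


def count_faults(addr_a, addr_b, calls=100, capacity=1):
    """Alternate fetches between two code addresses and count page faults."""
    cache = PagedCodeCache(fake_nand_read_page, capacity)
    for _ in range(calls):
        cache.fetch(addr_a)
        cache.fetch(addr_b)
    return cache.faults


print(count_faults(0, 64))                      # same page: 1
print(count_faults(0, PAGE_SIZE))               # different pages, 1 frame: 200
print(count_faults(0, PAGE_SIZE, capacity=2))   # different pages, 2 frames: 2
```

The middle case shows why code placement matters: two routines that call each other from different pages thrash a one-page cache, whereas placing them on the same page (or having room for both pages) reduces the cost to the initial faults.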

Note that for anything but large production runs, all these considerations are pretty much obsolete. RAM is cheap enough that for a few thousand units the difference in hardware cost is still smaller than the difference in engineering effort.
