Compiler magic
Just seen this: inside template-heavy code, Visual C++ 8 was able to unroll a loop and insert a prefetchnta
instruction to prefetch data - first time I’ve seen this with VC++ 8. Well done, folks!
Update: This even seems to work when the location of the prefetched byte is computed via a static function call, because the compiler adds a prefetch inside code where all access happens via matrix (row, column);
(which in turn calls a small function to compute the exact memory location). Now that is cool!
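
For illustration, here is a minimal sketch of the kind of code this is about. It is not the original code: the names Matrix, offset and sum are made up for this example, and the row-major layout is an assumption. The point is only that every element access goes through operator(), which calls a small static helper to compute the memory location, so the compiler has to inline both before it can see the access pattern of the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical matrix class; names and layout are assumptions for this sketch.
template <typename T>
class Matrix {
public:
    Matrix(std::size_t rows, std::size_t columns)
        : rows_(rows), columns_(columns), data_(rows * columns) {}

    // All element access goes through matrix(row, column).
    T& operator()(std::size_t row, std::size_t column) {
        assert(row < rows_ && column < columns_);
        return data_[offset(row, column, columns_)];
    }

    const T& operator()(std::size_t row, std::size_t column) const {
        assert(row < rows_ && column < columns_);
        return data_[offset(row, column, columns_)];
    }

    std::size_t rows() const { return rows_; }
    std::size_t columns() const { return columns_; }

private:
    // Small static helper that computes the exact location (row-major, assumed).
    static std::size_t offset(std::size_t row, std::size_t column,
                              std::size_t columns) {
        return row * columns + column;
    }

    std::size_t rows_;
    std::size_t columns_;
    std::vector<T> data_;
};

// A plain traversal. Once operator() and offset() are inlined, the inner loop
// is a linear walk over memory, which an optimizer may unroll. Whether a
// prefetch gets emitted depends on the compiler, its version and its flags.
template <typename T>
T sum(const Matrix<T>& m) {
    T total = T();
    for (std::size_t row = 0; row < m.rows(); ++row) {
        for (std::size_t column = 0; column < m.columns(); ++column) {
            total += m(row, column);
        }
    }
    return total;
}
```

Nothing in this sketch asks for a prefetch explicitly. To see what a given compiler does with it, look at the generated assembly of an optimized build (with Visual C++, the /FA switch writes an assembly listing) and search for prefetchnta.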