Anteru's blog

Cache-aware programming

June 24, 2007
  • Optimisation
  • Programming

I’ve been working on a project today, and after the first implementation session I ran it through a profiler to check for obvious performance bottlenecks. There turned out to be none, but while looking through the code I saw an opportunity to reduce the working set size a bit and to partition the data so that the CPU works on a smaller part of it at a time. It took quite a while, but I got down to less than 0.000x misses per instruction (the x is there because the profiler only displays 0.000) for L1, L2 and TLB alike, which gives a 0.001-0.002% performance penalty for the L1 data misses and 0.000-0.001% for the L2 misses. Some more tuning improved the branch prediction hit rate to 99.39% (originally it was slightly below 99% due to the partitioning overhead), making my program 50% faster overall. Note that I didn’t change the underlying algorithms; I only changed how the data is presented to the algorithmic kernel! So even on modern CPUs with large caches, and with rather small working sets (just a few times bigger than the cache), cache-aware code is still a win.
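To make the partitioning idea concrete, here is a minimal C++ sketch. It is not the code from my project: the two-pass kernel (scale, then clamp), the function names `process_whole` and `process_blocked`, and the `block_size` parameter are all made up for illustration. The point is only that the arithmetic per element stays the same, while the blocked version finishes both passes on one cache-sized chunk before touching the next.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical two-pass kernel: scale every value, then clamp it.
// Baseline: each pass streams over the whole array. If the array is larger
// than the cache, the data touched first is likely evicted before the
// second pass reaches it.
void process_whole(std::vector<float>& data, float scale, float limit) {
    for (float& v : data) v *= scale;
    for (float& v : data) v = std::min(v, limit);
}

// Partitioned variant: the same two passes, applied block by block.
// block_size is an assumption; pick it so one block fits in the cache
// level you are targeting.
void process_blocked(std::vector<float>& data, float scale, float limit,
                     std::size_t block_size) {
    assert(block_size > 0);
    for (std::size_t begin = 0; begin < data.size(); begin += block_size) {
        // The last block may be shorter than block_size.
        const std::size_t end = std::min(begin + block_size, data.size());
        for (std::size_t i = begin; i < end; ++i) data[i] *= scale;
        for (std::size_t i = begin; i < end; ++i) data[i] = std::min(data[i], limit);
    }
}
```

Both functions produce identical results for the same input, so the blocked one can be swapped in without touching the algorithm. In a toy case like this you would simply fuse the two loops; blocking pays off when the passes cannot be merged that easily. Whether it is actually faster depends on the working set and the machine, so measure with a profiler before and after.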

Contents © 2005-2025
Anteru
Last updated February 03, 2019