Often underestimated, memory allocations and copying can be a serious performance problem. Especially when using STL containers, which often allocate memory "behind your back", the memory usage can become difficult to predict, resulting in non-obvious performance problems.
There are many reasons for this, at which we'll take a look now. I assume you are writing an application with C++, though most of this applies to C, too. The first problem only applies to C++ and the STL containers. Be careful to always pass by reference. A very simple mean, which can have dramatic effects, especially if you often copy small containers with just a few elements. While it can be tolerable for vectors, for all other containers you surely want to avoid copying at all costs, as it usually results in a lot of tiny allocations.
Tiny allocations are also a major problem. A standard list of integers is probably the worst you can do, as it will fragment the memory extremely. If your application becomes multi-threaded, this problem is multiplied, as the system allocators are often protected by a global lock, which can easily serialise all your threads. Even if you append to a container, you can fragment the memory. For example, a vector will double (or so, the Microsoft STL multiplies by 1.5) its memory each time it runs out of it. On each reallocation, the following happens:
- A new memory block, large enough to store the data, is created.
- The old data is copy-constructed into the new memory, and then destroyed (until we get move construction with C++0x).
- The remaining data is initialised by running the default constructor.
- The old block is deallocated.
The order of this operation may be different, but it should be obvious that his cannot be cheap. If possible, try to reserve the required amount of memory up front, instead of relying on automatic resizing. In most cases, you can either use a range-based insert (which in theory should call reserve first), or manually call reserve/resize and insert the elements.
Beware of non-efficient implementations of STL containers. I'm very sad that something like the EASTL is not coming anytime soon (if you are interested to work on a free implementation, contact me). In the meantime, different STL vendors have sometimes extremely inefficient implementations. For example, Microsoft has added range checks to the index-based operator  in their STL (unless you compile all your code with _SECURE_SCL disabled, but there is hope Microsoft will disable it for VC10), which make the access approximately half the speed of direct pointer access.
Containers reloaded: Use vectors if possible. Vectors store their memory in a contiguous array, and have thus the best cache performance. The Loki library contains a convenient vector-based map implementation, which will be much faster for small maps than the standard map (those use often a red-black tree, and jump around the memory like hell). It's no use to have a list if you have only 5 elements, as you can be pretty sure that the vector will outperform it in every way.
So much for now, I'm pretty sure I forgot half the stuff I wanted to blog about, so stay tuned for a part two (or simply use the comment function).