16 February, 2010

Catching memory leaks, redux

In this post I'll try to cover the things I missed last time.

Reading bug reports (or: how do I fix a mem-leak?)

Well, we have already talked about this before. I thought I had said everything there was to say, but I think it's worth mentioning once more, this time focusing on mem-leaks. That's because many people seem to miss the whole point of mem-leak reports.

Okay, so the first mistake newbies make is to immediately (without any second thoughts or doubts) open the code that the call stack points to, and then just sit there, staring at the line, trying to figure out what went wrong with this code. The problem is: a memory leak report does not point to the problem.

For example, a report about a leaked String will (probably) point to some trivial code like this:
S := IntToStr(I);
And what do you expect to find here? :)

And this isn't a limitation of some particular tool: these tools just can't read your mind (well, not yet, you know? :D ). Let's think for a second: what is a leak? A leak is... well, it's when we allocate something and don't release/free it. So a mem-leak report can (and, actually, will) contain that "something" (a resource), and it contains the "allocation", i.e. the call stack to the line of code that allocated the resource. But where is our problem? The actual problem sits at the "release/free" moment! A tool cannot know where you (your code) planned to release the resource. That's why the report contains information about the allocation only. There is no direct information about the problem in the report.

What does "the problem is in the release" mean? It means that either we lost the pointer to the resource, or we still have the pointer, but our release routine wasn't called for some reason. Those are the points you should look at.

So, what should you do with a mem-leak report? Well, first you follow the call stack and find the code. But the next step is different from what you do with an exception report: you don't need to analyze this line. Instead, you need to:
  1. Note what resource was allocated here (object, string, array, memory block, etc.).
  2. Find where this resource should be released "by plan" (destructor call, going out of scope, explicit free call, etc.).
  3. Find the reason why the resource wasn't released at that location.
As I already said, there can be two reasons for item 3: we lost the reference, or we missed the call.
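To make the three steps concrete, here is a small, deliberately constructed console program (a hypothetical sketch, not taken from a real report). If FreeItems is replaced with a bare Items.Free, the leak report points at the TStringList.Create line inside AddItem, yet that line is perfectly fine. The objects were supposed to die together with the list, and the reason they don't is that a plain TList stores raw pointers and never frees them. So the fix belongs at the release point, not at the allocation point.

```pascal
program ReleasePointDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

var
  Items: TList; // a plain TList stores pointers only; it never frees them

procedure AddItem;
begin
  // A leak report would point here: this is where the object is allocated.
  Items.Add(TStringList.Create);
end;

// The planned release point. Freeing Items alone would leak every stored
// object, so each item is freed explicitly before the list itself.
procedure FreeItems;
var
  I: Integer;
begin
  for I := 0 to Items.Count - 1 do
    TObject(Items[I]).Free;
  Items.Free;
  Items := nil;
end;

begin
  ReportMemoryLeaksOnShutdown := True;
  Items := TList.Create;
  AddItem;
  AddItem;
  Writeln('Items stored: ', Items.Count);
  // Replacing this call with "Items.Free" reproduces the leak, and the
  // report would still blame AddItem rather than this line.
  FreeItems;
end.
```

Using a TObjectList (which owns and frees its items) instead of TList would be another way to put the release where it belongs.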

For some reason, newbies like to post their mem-leak reports on forums, saying: "Help me! I can't understand what's wrong here!". I hope you now see how strange this looks: nobody except the author of the code can know that! Only the author can tell where he planned to release the resources. This looks especially stunning when no code is posted except the allocation line (which, as we've already figured out, has absolutely nothing to do with the leak). I guess they assume that everybody has some kind of telepathic powers :)

That's why it's more correct to post at least your code along with the report. Better yet, study the situation yourself before posting (that's what we're discussing now). Or just ask a different question, for example: what are the possible reasons for a leak in this code? You may get a few guesses in reply, and you can check whether they apply to your case, thus solving the leak.

Well, it's (more or less) clear with raw pointers: you forgot to call the release function, you re-assigned a pointer without releasing the old value, etc. But leaks of auto-finalized types (strings, interfaces) often stun people: how is that even possible? Isn't automatic management supposed to solve exactly this kind of problem?

Well, certainly, yes, but that doesn't mean you can't mess things up here ;)

Here are a few examples (note: all examples below are specially crafted; nevertheless, they illustrate real-life problems that you can encounter in practice. Another note: different Delphi versions can generate different machine code, so you may need to adjust the examples a little; the examples below use Delphi 14.0.3513.24210).

Example #1: memory corruption.
There are absolutely no problems with resource management in this example. But there is a different problem elsewhere: you overwrite memory (a dreaded buffer overflow!), possibly by using low-level routines, thus erasing the pointer to a resource. This leads to a forgotten resource and, therefore, a leak:
// Warning: BAD code ahead

{$O-} // disable optimization to force compiler to accept stupid code 

procedure TForm1.FormCreate(Sender: TObject);

  procedure Test;
  const
    ArrLen = 5;
  var
    A: array[1..ArrLen - 1] of Integer;
    S: String;
  begin
    // Some actions here
    S := IntToStr(5); // <- mem-leak here!
    // Some actions here
    FillChar(A, ArrLen * SizeOf(Integer), 0); // BUG: A holds only ArrLen - 1 items
    // Some actions here
  end; 

begin
  Test;
end; 

initialization
  ReportMemoryLeaksOnShutdown := True;
end.
The code in this example will produce a memory leak. The call stack will point to the marked line, which (obviously) has no problem. Even more: all work with S has no problem either! The real reason lies in the work with the A array: the FillChar call clears one element more than the array contains (quite a real problem, BTW; here it happens because of the array's unusual range). Since the S variable sits right after the array in memory, FillChar clears it. Ooops! We just lost our pointer to the string!
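One way to avoid writing this particular bug (a hedged sketch, not the only fix) is to let the compiler compute the size. SizeOf(A) always equals the memory the array actually occupies, so the call cannot run past its end, however unusual the declared range is. The function is turned into a console program returning a value only so the result can be checked.

```pascal
program FillCharFix;

{$APPTYPE CONSOLE}

uses
  SysUtils;

function Test: String;
const
  ArrLen = 5;
var
  A: array[1..ArrLen - 1] of Integer; // 4 elements, not 5
  S: String;
begin
  S := IntToStr(5);
  // SizeOf(A) = 4 * SizeOf(Integer) here: exactly the memory A occupies,
  // so the neighbouring local S can no longer be overwritten.
  FillChar(A, SizeOf(A), 0);
  Result := S + IntToStr(A[Low(A)]);
end;

begin
  ReportMemoryLeaksOnShutdown := True;
  Writeln(Test); // prints 50
end.
```

For element-by-element work, looping from Low(A) to High(A) gives the same guarantee.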

Example #2: stack corruption.
Well, strictly speaking, this is memory corruption too. But the first example can happen with any memory, not necessarily the stack, while this one is specific to the stack:
// Warning: BAD code ahead

{$O-} // disable optimization 

procedure SomeProc(I1: Pointer; I2: Integer); stdcall;
begin
  // Some actions here
end; 

procedure TForm1.FormCreate(Sender: TObject);

  procedure Test;
  var
    P: procedure(I: Pointer); stdcall;
    D: Pointer;
    S: Pointer;
  begin
    P := @SomeProc; // '@' yields an untyped pointer, so the signature mismatch goes unnoticed
    D := nil;
    // Some actions here
    GetMem(S, 512); // <- mem-leak here!
    P(S);
    SomeProc(nil, 0);
    FreeMem(S);
    // Some actions here
  end; 

begin
  Test;
end; 

initialization
  ReportMemoryLeaksOnShutdown := True;
end.
The problem in this example is the mismatch between the prototypes (declarations) of P and SomeProc. P is declared with one parameter, so the caller pushes one argument; but SomeProc is stdcall with two parameters, so on return it pops two items from the stack, one more than was pushed. This leaves the stack data "shifted", so any further work (the call of SomeProc right after P's call) damages (in our example, erases) the S pointer. How can an application survive stack corruption, you ask? Well, it happens. So we lose the pointer in this example too.
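A hedged sketch of one way to avoid writing this bug in the first place: describe the signature once as a named procedural type and assign the routine without '@'. Without '@', the compiler checks that the routine matches the variable's type and rejects a mismatch; '@SomeProc' is an untyped pointer, which is why the original code compiled. TSomeProc and LastValue are names invented for this illustration.

```pascal
program ProcTypeCheck;

{$APPTYPE CONSOLE}

type
  // One named type states the expected signature in a single place.
  TSomeProc = procedure(I1: Pointer; I2: Integer); stdcall;

var
  LastValue: Integer = 0;

procedure SomeProc(I1: Pointer; I2: Integer); stdcall;
begin
  LastValue := I2;
end;

procedure Test;
var
  P: TSomeProc;
begin
  // No '@' here: the compiler checks that SomeProc matches TSomeProc and
  // refuses to compile if the parameter lists or calling conventions differ.
  P := SomeProc;
  P(nil, 42);
end;

begin
  Test;
  Writeln(LastValue); // prints 42
end.
```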

How to fight such problems is a topic for next time.

Leaks of other resource types

You can catch leaks of memory that was allocated through Delphi's memory manager, i.e. by using the methods and tools explained in the previous article. You can't monitor any other type of resource with this approach (replacing/using Delphi's memory manager). Sometimes people forget that their tool is not an "all-mighty-seeing-eye", and when their application keeps growing while the tool reports nothing, they just don't know what to think.

For example, GDI resources are allocated and deleted through the corresponding WinAPI calls, which (obviously) have nothing to do with Delphi's memory manager; therefore, they aren't counted by your memory leak tool. It's worth noting that all work with a low-level resource is often wrapped into a class. For example, Delphi applications usually use the TBitmap class to manage HBITMAP objects: TBitmap is Delphi's wrapper for the HBITMAP system resource. In this case you have a one-to-one correspondence (between an HBITMAP handle and its wrapper, a TBitmap object). This means that a leak of one kind of resource automatically means a leak of the other, and vice versa. So you can catch a resource leak by catching the leak of the corresponding memory. It's not a direct catch, but still a very good option. Sometimes there are exceptions, though. In the TBitmap example: it has a ReleaseHandle method, which lets you "release" the HBITMAP handle, detaching it from the wrapper object and thus breaking the one-to-one correspondence.
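Here is a minimal sketch of the ReleaseHandle case (DetachBitmapHandle is a hypothetical helper). After ReleaseHandle the TBitmap object no longer owns the handle, so freeing the object is correct from the memory manager's point of view, and the HBITMAP becomes the caller's responsibility. If the final DeleteObject call is forgotten, the GDI bitmap leaks while a memory leak report stays empty.

```pascal
program ReleaseHandleDemo;

{$APPTYPE CONSOLE}

uses
  Windows, Graphics;

// Creates a bitmap and detaches its HBITMAP from the TBitmap wrapper.
// The caller becomes the owner of the returned handle.
function DetachBitmapHandle: HBITMAP;
var
  Bmp: TBitmap;
begin
  Bmp := TBitmap.Create;
  try
    Bmp.Width := 16;
    Bmp.Height := 16;
    Result := Bmp.ReleaseHandle;
  finally
    Bmp.Free; // frees the Delphi object (memory) only, not the HBITMAP
  end;
end;

var
  H: HBITMAP;
begin
  ReportMemoryLeaksOnShutdown := True;
  H := DetachBitmapHandle;
  try
    // ... use H with WinAPI calls ...
  finally
    // Without this call the GDI bitmap leaks, and no memory leak report
    // mentions it: the TBitmap object itself was freed correctly.
    DeleteObject(H);
  end;
end.
```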

Second example: not resources, but memory. This time, however, it is memory allocated through an allocator other than Delphi's memory manager. This usually includes memory allocated for WinAPI or 3rd party DLL calls. For example, if you change the String type to WideString in example #1 above, no leak will be reported. Why? Well, the leak is still there, but it is hidden now: it's a leak of a different kind. WideString is a wrapper for the BSTR system type, which has a certain requirement: all its memory requests must be served by a specific system memory manager. This means that Delphi's memory manager is never called, so we have no chance of finding the leak.
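The difference can be observed directly with a rough sketch that counts the blocks held by Delphi's memory manager, using the System unit's GetMemoryManagerState (available since Delphi 2006, assuming the built-in memory manager is in use; AllocatedBlockCount is a hypothetical helper). A String allocation shows up in the count; a WideString allocation does not, because its memory comes from the OS (SysAllocStringLen).

```pascal
program WideStringVisibility;

{$APPTYPE CONSOLE}

// Counts blocks currently allocated through Delphi's memory manager.
function AllocatedBlockCount: Int64;
var
  State: TMemoryManagerState;
  I: Integer;
begin
  GetMemoryManagerState(State);
  Result := State.AllocatedMediumBlockCount + State.AllocatedLargeBlockCount;
  for I := Low(State.SmallBlockTypeStates) to High(State.SmallBlockTypeStates) do
    Inc(Result, State.SmallBlockTypeStates[I].AllocatedBlockCount);
end;

var
  S: String;
  W: WideString;
  Before, DeltaString, DeltaWideString: Int64;
begin
  Before := AllocatedBlockCount;
  SetLength(S, 100); // String: allocated by Delphi's memory manager
  DeltaString := AllocatedBlockCount - Before;

  Before := AllocatedBlockCount;
  SetLength(W, 100); // WideString (BSTR): allocated by the OS, invisible here
  DeltaWideString := AllocatedBlockCount - Before;

  Writeln('String blocks: ', DeltaString, ', WideString blocks: ', DeltaWideString);
end.
```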

Note that you can analyze these operations, but it is a more complex task than catching leaks in Delphi's memory manager. That's because there is no single central manager here, just a bunch of different functions (besides, there is no official way to add your code to these routines, only hacky hooks). I have already mentioned AQ tools as an example of tools that can analyze these situations. But, like FastMM, you can use them only at the testing/development stage on a developer machine; you can't deploy them to client machines (as you can with EurekaLog).
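Without such tools, a very coarse check is still possible: Windows can report how many GDI objects a process currently holds, through the GetGuiResources function in user32.dll. The sketch below declares it explicitly (rather than relying on a particular Windows.pas version) and watches the counter grow. Note what it cannot do: it tells you that GDI objects are accumulating, not where they were allocated, which is exactly the "no central management" problem described above.

```pascal
program GdiCounter;

{$APPTYPE CONSOLE}

uses
  Windows;

const
  GR_GDIOBJECTS = 0; // value from the Windows SDK (winuser.h)

function GetGuiResources(hProcess: THandle; uiFlags: DWORD): DWORD; stdcall;
  external 'user32.dll' name 'GetGuiResources';

// Number of GDI objects currently owned by this process.
function GdiObjectCount: DWORD;
begin
  Result := GetGuiResources(GetCurrentProcess, GR_GDIOBJECTS);
end;

var
  Before, After: DWORD;
  Brushes: array[0..9] of HBRUSH;
  I: Integer;
begin
  Before := GdiObjectCount;
  for I := Low(Brushes) to High(Brushes) do
    Brushes[I] := CreateSolidBrush(RGB(I, 0, 0));
  After := GdiObjectCount;
  Writeln('GDI objects: ', Before, ' -> ', After);
  for I := Low(Brushes) to High(Brushes) do
    DeleteObject(Brushes[I]);
end.
```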

In fact, these examples show the importance of class wrappers. Don't use low-level functions directly all over your code: write a class wrapper instead. You can test it once and then use it throughout your code. By gathering all related code in one place (inside the class wrapper), you can significantly simplify your code and, therefore, the search for a leak.
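A minimal wrapper of this kind could look like the sketch below (TGdiBrush is a hypothetical class, not part of the VCL). Each object owns exactly one brush handle and deletes it in its destructor, so a forgotten brush always shows up as a leaked TGdiBrush object in an ordinary memory leak report.

```pascal
program BrushWrapper;

{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

type
  // One object owns exactly one GDI brush handle.
  TGdiBrush = class
  private
    FHandle: HBRUSH;
  public
    constructor Create(AColor: COLORREF);
    destructor Destroy; override;
    property Handle: HBRUSH read FHandle;
  end;

constructor TGdiBrush.Create(AColor: COLORREF);
begin
  inherited Create;
  FHandle := CreateSolidBrush(AColor);
  if FHandle = 0 then
    RaiseLastOSError; // the destructor then runs with FHandle = 0
end;

destructor TGdiBrush.Destroy;
begin
  if FHandle <> 0 then
    DeleteObject(FHandle);
  inherited Destroy;
end;

var
  Brush: TGdiBrush;
begin
  ReportMemoryLeaksOnShutdown := True;
  Brush := TGdiBrush.Create(RGB(255, 0, 0));
  try
    Writeln('Brush handle: ', Brush.Handle);
  finally
    Brush.Free; // forget this and the report lists a TGdiBrush leak
  end;
end.
```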

How can I find a mem-leak without exiting the application?

Any mem-leak report can only be created at the application's shutdown; you can't create one "on demand". This is not a limitation of some method or tool, just common sense.

What is a leak? A leak is when you allocate a resource (memory) and don't release it. How can you find it? You need to find all unreleased resources, of course. And how do you know whether a particular resource was (or will be) released? All resources that still exist at the application's exit are (definitely) unreleased. If they had been released, they wouldn't be there in the first place.

But how can you apply the same logic while the program is still running? You still need to know whether this resource was or will be released. How can you know that? You can't. There is no way to know how many references to this resource exist and (even if some do) whether a release call will be made for any of them.

Simple example:
function GetWorkFilePath: String;
begin
  // a lot of action
end;
You want to know: does GetWorkFilePath have a leak? So, you write something like this:
// Warning: BAD code ahead

function GetWorkFilePath: String;
begin
  MemState := RememberAllAllocatedMemory;
  try
    // a lot of action
  finally
    CreateMemLeakReport(MemState);
  end;
end;
Here, the RememberAllAllocatedMemory routine takes a snapshot of all memory allocations, and the CreateMemLeakReport routine compares the current snapshot with the saved one (passed as an argument). CreateMemLeakReport creates a mem-leak report for each mismatch between these two snapshots.

Your idea is: I have a solid block of code, and all memory should be released, right? So "memory before" and "memory after" should be the same, and any difference indicates a leak.

Unfortunately, this is not that simple.

In the example above, the GetWorkFilePath routine allocates and returns a string. This string will look like a leak (judging by your logic), because it still exists after the function returns. But that's not because the string leaked: the function passes the string's ownership to the caller, making the caller responsible for the string's deletion.

Even more: strings (and not only strings) have a reference counter. This means that you can't apply any logic based on matching memory manager calls, because most operations with these kinds of resources don't call the memory manager at all: they just increment/decrement the reference counter! The example above may or may not allocate memory for its result. The latter happens when the routine just returns an already allocated string (a global variable, for example) or even a constant.
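Both pitfalls can be reproduced with a deliberately crude stand-in for the snapshot idea. In this sketch, AllocatedBlockCount is a hypothetical helper that only counts live blocks via the System unit's GetMemoryManagerState (Delphi 2006 and later, built-in memory manager assumed), and the two GetWorkFilePath variants have hypothetical bodies. The first variant makes the count grow although nothing leaks; the second leaves the count unchanged although the caller now holds a string.

```pascal
program SnapshotPitfalls;

{$APPTYPE CONSOLE}

var
  CachedPath: String;

// Crude "snapshot": the number of live blocks in Delphi's memory manager.
function AllocatedBlockCount: Int64;
var
  State: TMemoryManagerState;
  I: Integer;
begin
  GetMemoryManagerState(State);
  Result := State.AllocatedMediumBlockCount + State.AllocatedLargeBlockCount;
  for I := Low(State.SmallBlockTypeStates) to High(State.SmallBlockTypeStates) do
    Inc(Result, State.SmallBlockTypeStates[I].AllocatedBlockCount);
end;

// Returns a new string: ownership passes to the caller.
function GetWorkFilePathNew: String;
begin
  Result := StringOfChar('x', 20);
end;

// Returns an existing string: only its reference count changes.
function GetWorkFilePathCached: String;
begin
  Result := CachedPath;
end;

function MeasureNew: Int64;
var
  Path: String;
  Before: Int64;
begin
  Before := AllocatedBlockCount;
  Path := GetWorkFilePathNew;
  Result := AllocatedBlockCount - Before; // > 0, yet nothing has leaked
end;

function MeasureCached: Int64;
var
  Path: String;
  Before: Int64;
begin
  Before := AllocatedBlockCount;
  Path := GetWorkFilePathCached;
  Result := AllocatedBlockCount - Before; // 0, yet Path holds a string
end;

begin
  CachedPath := StringOfChar('y', 20);
  Writeln('new: ', MeasureNew, ', cached: ', MeasureCached);
end.
```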

Searching for hidden leaks

As a matter of fact, when somebody asks a question like the one in the previous section, he doesn't really want an answer to it. What he's really asking is how to find hidden mem-leaks.

What is a hidden mem-leak? Take this example: your code creates many objects while your application runs. You want to trade memory for speed, so you decide to reuse created objects by putting them into some kind of cache: a TObjectList that owns the inserted objects. When you need the same object again, you don't recreate it, you just pick it from the cache.

And suppose your code has a bug: one specific kind of object is never deleted from the cache. I.e. your application grows and grows as time passes: the usual "visual signs" of a memory leak. Normally this growth is limited by the maximum size of the cache. But since some of your objects are never deleted from the cache, the cache grows without limit. However, this is not a strict mem-leak, because you didn't lose the references to these "lost" objects: they are still there, in the cache. And when the application shuts down, the cache is deleted and, with it, all objects in the cache (including the forgotten ones). Strictly speaking, you have no mem-leak.
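The scenario can be simulated in a few lines (a hypothetical sketch: TCachedItem, UseItem, TrimCache and SimulateWork are invented names). Items of kind 3 are never evicted, so the cache keeps growing while the program runs, yet the owning TObjectList frees everything at shutdown and ReportMemoryLeaksOnShutdown reports nothing.

```pascal
program HiddenLeakDemo;

{$APPTYPE CONSOLE}

uses
  Contnrs;

type
  TCachedItem = class
  public
    Kind: Integer;
  end;

var
  Cache: TObjectList; // OwnsObjects = True: items are freed with the list

procedure UseItem(AKind: Integer);
var
  Item: TCachedItem;
begin
  Item := TCachedItem.Create;
  Item.Kind := AKind;
  Cache.Add(Item);
end;

// Evicts items that are no longer needed.
// Deliberate bug: items of kind 3 are never evicted.
procedure TrimCache;
var
  I: Integer;
begin
  for I := Cache.Count - 1 downto 0 do
    if TCachedItem(Cache[I]).Kind <> 3 then
      Cache.Delete(I); // also frees the item, since the list owns it
end;

function SimulateWork(Iterations: Integer): Integer;
var
  I: Integer;
begin
  for I := 0 to Iterations - 1 do
  begin
    UseItem(I mod 4);
    TrimCache;
  end;
  Result := Cache.Count;
end;

begin
  ReportMemoryLeaksOnShutdown := True;
  Cache := TObjectList.Create;
  try
    Writeln('Items stuck in cache: ', SimulateWork(1000)); // prints 250
  finally
    Cache.Free; // frees everything, so no leak is reported at shutdown
  end;
end.
```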

That's what I mean by a "hidden mem-leak". And people ask all sorts of questions trying to find these leaks.

In fact, you don't need some "mem-leak on demand" here. There is a trick for this: run your application and let it work for a few hours. Let it eat as much memory as it wants. Now pick any random address in your process. What will you see there? A hidden leak.

Why is that? Suppose that the real working data of your application takes 50 Mb of memory, and your process eats 60 Mb. Of these 60 Mb, 50 Mb are real working data and 10 Mb are "leaked" objects in your cache. Now let your process eat 2 Gb of memory. Of these 2 Gb, 50 Mb are still your real data (okay, maybe 70 Mb, but that doesn't matter). All the other space (2 Gb - 50 Mb) is taken by your forgotten objects. So if you pick any random address, you have a (2048 - 50) / 2048 ≈ 98% chance of hitting a forgotten object. All you need to do is get its call stack (you run your application in debug mode, right?) and proceed as with a usual mem-leak. The more memory the process eats, the more garbage there is and, therefore, the higher your chances. Even a 50% chance is not bad (the amount of garbage equals the actual data: 50 + 50 = 100 Mb): you just need to check a few locations instead of one.
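The arithmetic above can be written out, assuming (as an idealization) that each probe is an independent, uniformly random address within the memory the process has allocated. With M the total memory in use and W the real working data, the chance p of a single probe hitting garbage, and the chance of at least one hit in k probes, are:

```latex
p = \frac{M - W}{M}, \qquad
p_{M=2048} = \frac{2048 - 50}{2048} \approx 0.976, \qquad
p_{M=100} = \frac{100 - 50}{100} = 0.5

P(\text{at least one hit in } k \text{ probes}) = 1 - (1 - p)^{k}, \qquad
p = 0.5,\ k = 3:\quad 1 - 0.5^{3} = 0.875
```

So even in the 50% case, three probes find a forgotten object with roughly 88% probability.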

For these tasks you can use FastMM and its LogAllocatedBlocksToFile routine. You may also find other routines useful (like FastGetHeapStatus, GetMemoryManagerState, GetMemoryManagerUsageSummary and GetMemoryMap): just walk through the interface section of the FastMM4.pas unit. And if you don't find a suitable routine, there is always the option of writing a simple stub memory manager for your case.
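Before hunting for individual blocks, it can help to confirm that memory really keeps growing. The sketch below sums the bytes handed out by Delphi's memory manager using the System unit's GetMemoryManagerState (Delphi 2006 and later; it assumes the built-in memory manager, and FastMM4.pas offers a routine with the same name for its own state). AllocatedBytes is a hypothetical helper; in a real application you would log it periodically (from a timer, say) rather than around a single GetMem call, which here merely stands in for "let the application work for a while".

```pascal
program MemoryGrowthProbe;

{$APPTYPE CONSOLE}

// Approximate number of bytes currently allocated through Delphi's
// memory manager (small blocks counted by their usable size).
function AllocatedBytes: Int64;
var
  State: TMemoryManagerState;
  I: Integer;
begin
  GetMemoryManagerState(State);
  Result := Int64(State.TotalAllocatedMediumBlockSize) +
    Int64(State.TotalAllocatedLargeBlockSize);
  for I := Low(State.SmallBlockTypeStates) to High(State.SmallBlockTypeStates) do
    Inc(Result, Int64(State.SmallBlockTypeStates[I].UseableBlockSize) *
      Int64(State.SmallBlockTypeStates[I].AllocatedBlockCount));
end;

var
  Before, After: Int64;
  P: Pointer;
begin
  Before := AllocatedBytes;
  GetMem(P, 1024 * 1024); // stands in for "let the application work for a while"
  After := AllocatedBytes;
  Writeln('Growth: ', After - Before, ' bytes');
  FreeMem(P);
end.
```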

Why are mem-leaks bad, and do I always need to release all memory?

Generally speaking, a mem-leak often causes no visible problem for the user: the application still works. Mem-leaks? So what? The program still does all the tasks I need it to do. This is especially true for client applications, because they run for a limited amount of time. So a mem-leak isn't scary there: all memory is reclaimed at the application's exit (see Jeffrey Richter's book on native code for more info), and so all leaks are removed too. No, I don't mean that you shouldn't fight mem-leaks here: a mem-leak is always bad. It's just that mem-leaks aren't that fatal. Of course, this doesn't apply to server applications. Server applications run for long periods of time, so even a minor leak can be deadly.

Another question, closely related to the above, is: "If all memory is reclaimed at the application's shutdown, can I skip cleanup for global variables? They will be deleted anyway by the OS's automatic cleanup!"

Well, the formal answer is: "you can". That's correct, and you really can do it. But "can" doesn't mean "should". Obviously, there will be no technical problems with this approach. So why is it bad?

Because you can't find real mem-leaks if you do this. If you don't pedantically clean up all of your resources, you will get a bunch of mem-leak reports: leaks "by design". Technically each of them is a leak (since the resource wasn't released), but not a logical one. Since you know those reports aren't about real mem-leaks, you will ignore them. And the problem is that when a report about a real leak appears, you may just miss it.

That's why it is a common "good rule" to always clean up your resources. Unfortunately, there can be cases when you can't. Those are very rare, but they do happen. Still, the general rule is: always clean up your resources if you can. Don't rely on the system's cleanup to throw out the garbage for you. This will greatly simplify your life in the future.
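For a global object, the usual place for that cleanup is the finalization section of the unit that creates it. A minimal sketch (AppSettings and Settings are hypothetical names; the unit would be listed in a program's uses clause):

```pascal
unit AppSettings;

interface

uses
  Classes;

var
  // Global object that lives for the whole run of the application.
  Settings: TStringList;

implementation

uses
  SysUtils;

initialization
  Settings := TStringList.Create;

finalization
  // Freeing it explicitly keeps the shutdown leak report empty,
  // so any report that does appear points to a real leak.
  FreeAndNil(Settings);

end.
```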

Delphi's bugs

Well, I guess we can't ignore them :)

Before doing anything, make sure that the problem really exists: run your application normally, without the debugger. This eliminates any false mem-leaks caused by the IDE itself.

Aside from IDE bugs, there can be bugs in the RTL/VCL too. These can be genuine bugs (with a chance of being fixed in the next Delphi version), or things that just can't be fixed. Either way, both introduce mem-leaks into your application, and your code has nothing to do with them.

So what can you do here? Patching aside, the only thing you can do is ignore them (since you can't fix them). Use the capabilities of your tool: see if it has a routine like RegisterExpectedMemoryLeak. You can call this routine at the application's start and pass it a pointer; when the tool builds a mem-leak report, it will ignore any leak that matches one of these "registered" pointers. Yes, this is a workaround. You don't fix the problem, you just hide it, so you can concentrate on the problems you can fix. The main danger here is overusing such routines: do NOT register all mem-leaks as "expected"; don't forget that this won't fix the problem!
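With Delphi's built-in memory manager, the System unit provides RegisterExpectedMemoryLeak (Delphi 2006 and later). A minimal sketch, where TUnfreeableState is a hypothetical object that for some external reason cannot be freed. The object deliberately has no managed or heap-allocated fields: any other block it owned would still be reported, since only the registered pointer is ignored.

```pascal
program ExpectedLeakDemo;

{$APPTYPE CONSOLE}

type
  // Hypothetical object that, for some external reason, is never freed.
  TUnfreeableState = class
  public
    Counter: Integer;
  end;

var
  State: TUnfreeableState;
  Registered: Boolean;
begin
  ReportMemoryLeaksOnShutdown := True;
  State := TUnfreeableState.Create;
  // Register this one known block so the shutdown report ignores it.
  // Register specific pointers only: registering everything hides real leaks.
  Registered := RegisterExpectedMemoryLeak(State);
  Writeln('Registered: ', Registered);
end.
```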