EurekaLog-enabled application starts up 2x slower on Windows 2016

We were contacted by a customer who claimed that adding EurekaLog to his application doubled its startup time, but only on PCs running Windows 2016. In other words, startup took about 6 seconds on Windows 2016 and only 3 seconds on any other OS.

The customer used the Process Monitor tool and observed that the EurekaLog-enabled application creates a lot of *.tmp files. He wondered why, and whether this could be the source of the issue.

Generally speaking, using a file in Windows is not slower than using memory, because file operations are cached in memory. Writing, say, 1 MB to a file does not mean that your HDD spins up to write 1 MB of data and your code continues only after the HDD has finished its work. When you write data to a file, it is written to a memory buffer and flushed to the disk in the background. In other words, if your application runs slower when creating or using a file, it is not because it is using a file, but because the data is too large to fit in memory. Replacing the file with memory would have the same end effect: the memory would be paged out to disk because there is not enough RAM.

(There is one more aspect to working with files: if you read or write data in very small chunks, you end up calling the kernel's file functions many times. Calls into the kernel are relatively expensive, and that may be what causes the performance loss: it is about kernel calls, not about using a file. External factors may also play a role: for example, anti-virus software could slow down your file operations.)
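To illustrate the kernel-call point, here is a minimal Delphi sketch (not EurekaLog code) that writes 1 MB to a temp file one byte at a time: first through TFileStream, where each Write turns into its own WriteFile call, and then through TBufferedFileStream (System.Classes, available in recent Delphi versions), which collects the bytes in a 32 KB user-mode buffer before handing them to the kernel. The sizes are arbitrary assumptions, and absolute timings depend on the machine and on any anti-virus in the way, so no expected output is shown.

```delphi
program SmallWritesDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.IOUtils, System.Diagnostics;

const
  TotalBytes = 1024 * 1024; // 1 MB in total (arbitrary)

// Writes TotalBytes bytes to AStream, one byte per Write call
procedure WriteByteByByte(AStream: TStream);
var
  I: Integer;
  B: Byte;
begin
  B := $AA;
  for I := 1 to TotalBytes do
    AStream.WriteBuffer(B, SizeOf(B));
end;

// Returns how many milliseconds WriteByteByByte took on AStream
function TimeIt(AStream: TStream): Int64;
var
  SW: TStopwatch;
begin
  SW := TStopwatch.StartNew;
  WriteByteByByte(AStream);
  Result := SW.ElapsedMilliseconds;
end;

var
  FileName: string;
  Stream: TStream;
  Unbuffered, Buffered: Int64;
begin
  FileName := TPath.GetTempFileName; // creates an empty *.tmp file
  try
    // TFileStream: every 1-byte Write is a separate call into the kernel
    Stream := TFileStream.Create(FileName, fmCreate);
    try
      Unbuffered := TimeIt(Stream);
    finally
      Stream.Free;
    end;

    // TBufferedFileStream: bytes are collected in a 32 KB buffer, so the
    // kernel sees about 32 writes instead of about one million
    Stream := TBufferedFileStream.Create(FileName, fmCreate);
    try
      Buffered := TimeIt(Stream);
    finally
      Stream.Free; // flushes the remaining buffered bytes
    end;

    Writeln(Format('1-byte writes, unbuffered: %d ms', [Unbuffered]));
    Writeln(Format('1-byte writes, buffered:   %d ms', [Buffered]));
  finally
    TFile.Delete(FileName);
  end;
end.
```

Both runs write exactly the same data to the same kind of file; the only difference is how many kernel calls are made, which is the overhead described above.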

EurekaLog uses temp files to offload large chunks of data from your address space, so your application has more free address space to run. For example, a typical process has many DLLs loaded into it. EurekaLog has to provide debug information for each DLL in order to build reports that include these DLLs. Naturally, this information has to be stored somewhere in a ready-to-use form. If we stored it in memory, your application would have much less memory left to run, as debug information tends to be very large. That is why EurekaLog stores debug information in temp files. However, this has nothing to do with performance. If we switched from files to memory, the end performance would be almost the same, but your own code would have less memory to work with.
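A minimal sketch of the general technique, assuming nothing about EurekaLog's actual implementation: a large buffer is written to a temp file created with FILE_ATTRIBUTE_TEMPORARY (a Windows hint to keep the data in the file cache rather than flush it to disk when enough memory is available) and FILE_FLAG_DELETE_ON_CLOSE (the file is removed when its handle is closed), after which the buffer is released from the address space and only the needed part is read back. CreateScratchFile is a hypothetical helper, and the 16 MB size is an arbitrary stand-in for debug information.

```delphi
program TempFileOffloadDemo;

{$APPTYPE CONSOLE}

uses
  Winapi.Windows, System.SysUtils, System.Classes, System.IOUtils;

// Hypothetical helper: opens a cache-friendly, self-deleting temp file
function CreateScratchFile: THandle;
var
  FileName: string;
  Err: DWORD;
begin
  FileName := TPath.GetTempFileName; // creates an empty *.tmp file
  Result := CreateFile(PChar(FileName), GENERIC_READ or GENERIC_WRITE, 0, nil,
    CREATE_ALWAYS, FILE_ATTRIBUTE_TEMPORARY or FILE_FLAG_DELETE_ON_CLOSE, 0);
  if Result = INVALID_HANDLE_VALUE then
  begin
    Err := GetLastError;      // save the error before DeleteFile overwrites it
    DeleteFile(FileName);     // do not leave the empty *.tmp file behind
    RaiseLastOSError(Err);
  end;
end;

var
  H: THandle;
  Stream: THandleStream;
  Data, Check: TBytes;
begin
  SetLength(Data, 16 * 1024 * 1024); // 16 MB of stand-in "debug info"
  FillChar(Data[0], Length(Data), $5A);

  H := CreateScratchFile;
  try
    Stream := THandleStream.Create(H);
    try
      Stream.WriteBuffer(Data[0], Length(Data));
      Data := nil; // the 16 MB no longer occupies our address space

      // Later: read back only the part that is actually needed
      SetLength(Check, 4096);
      Stream.Position := 0;
      Stream.ReadBuffer(Check[0], Length(Check));
      Writeln('First byte read back: $', IntToHex(Check[0], 2));
    finally
      Stream.Free; // THandleStream does not close the handle itself
    end;
  finally
    CloseHandle(H); // FILE_FLAG_DELETE_ON_CLOSE deletes the file here
  end;
end.
```

When run, it prints `First byte read back: $5A`. Whether the written bytes ever reach the physical disk is up to the Windows cache manager, which is the point made earlier: using a file does not by itself mean waiting for the disk.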

We have a guide on troubleshooting performance issues in EurekaLog, so we naturally asked the customer to walk through it and check whether startup on Windows 2016 differs somehow from startup on other operating systems: for example, perhaps additional exceptions are thrown when running on Windows 2016. However, the customer reported that the guide did not help.

We asked the customer to capture a run-time log from EurekaLog on both the "slow" PC and the "fast" PC, so we could compare the two runs. Basically, this means compiling your app with the debug version of the precompiled units (*.dcu), which is already the default in modern IDEs, and passing the --el_debug command-line switch to the app. An el_debug.csl file will then be created in the app's folder. This file contains everything EurekaLog does in the app. See the article mentioned above for more details on how to enable and use run-time logging.

The produced *.csl files are compatible with the CodeSite file format, so you can view them with the CodeSite File Viewer tool, which is part of the freeware CodeSite Tools package (the download link for the tools is at the bottom of its page). Now you can open both *.csl files. This may take a while: the CodeSite log format is mostly textual, so the CodeSite File Viewer needs some time to parse it. Typical EurekaLog logs for application startup range from 2 MB to 10 MB, depending on what your application does on startup. Opening a 10 MB text file in the CodeSite File Viewer can easily take a few minutes.

Once the log is open, use the "View" / "Select Columns" menu item to display the time offsets:

Configuring the "Time" column

Since any Windows application runs multiple threads (even if your own code does not spawn any, the system creates a few), it is a good idea to organize messages by thread, so you can inspect the main thread in isolation from background threads:

Organizing log messages

Organizing messages can also take some time, so be patient. Once messages are organized by thread, switch to the main thread's tab:

Thread tabs appear after organizing messages

Finally, we need to collapse all functions to make the log easier to navigate. Right-click any message in the log view and select "Collapse all" from the pop-up menu:

Collapsing all functions

This could take some time too.

Now we are ready to analyze the logs. Let's compare two startups:

Startup on the "Fast" PC

Startup on the "Slow" PC

As you can see, EurekaLog completed startup in about 5 seconds on the "Fast" PC and in about 11 seconds on the "Slow" PC. These times are longer than those reported by the customer because logging itself takes time. Nevertheless, you can still observe the 2x difference.

Let's open (expand) the "EurekaLog.Initialization" function to see what exactly it is doing:

Startup on the "Fast" PC

Startup on the "Slow" PC

Now we see something interesting: the "Slow" PC is actually running slightly faster than the "Fast" PC! Notice how everything completes in 0.343 seconds on the "Slow" PC, compared to 0.575 seconds on the "Fast" PC. However, something on the "Slow" PC still causes the major delay. We need to dig deeper.

We need to expand the "ExceptionLog7.Init" function, then the "EurekaLogInitialization" function, and so on. We expand many functions until we arrive at the bottleneck code:

No bottleneck on the "Fast" PC

Bottleneck on the "Slow" PC

Again, you can see that the "Slow" PC is slightly faster than the "Fast" PC, except inside the bottleneck code. The bottleneck is the TELDebugInfoExports.GetExportList function from the EDebugExportBase unit. It takes about 0.5 seconds on the "Fast" PC and almost 4 seconds on the "Slow" PC: a major difference of almost 8x.

As you can see, this code loads debug information from the rtl290.bpl package. In this case, EurekaLog creates new debug information from the DLL (BPL) exports table, since this package does not have any other debug information. If you scroll further, you will find a similar picture for the vcl290.bpl package. More BPLs and DLLs are listed, with similar timings.

The EDebugExportBase.TELDebugInfoExports.GetExportList function is essentially a copy loop:

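  Count := ExportDir.NumberOfNames;
  SetLength(FNames.FNames, Count); // pre-allocate one entry per named export
  for I := 0 to Count - 1 do
  begin
    // Name ordinal -> function RVA, via the AddressOfFunctions table
    Address := PDWORD(PtrUInt(Functions) + NameOrdinals^ * SizeOf(DWORD))^;
    // Name RVA -> pointer to the zero-terminated export name
    UTF8Name := PAnsiChar(ABorImage.RvaToVa(Names^));
    // Decode the name as UTF-8; fall back to an ANSI conversion if that fails
    if not TryUTF8ToString(UTF8Name, ExportName) then
      ExportName := String(UTF8Name);
    // Add the (address, name) pair to FNames
    FNames.Add(ConvertAddress(Pointer(PAddress(Module) + Address)), ExportName);
    // Move to the next entries of the parallel ordinal and name tables
    Inc(NameOrdinals);
    Inc(Names);
  end;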

This loop copies function names from the BPL/DLL exports table into a temp file buffer.
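For readers unfamiliar with PE export tables, the following self-contained Delphi sketch walks the same three parallel tables that GetExportList uses (name RVAs, name ordinals, function RVAs), but only for a module already loaded into the process, where an RVA converts to an address simply as module base + RVA. It is not EurekaLog's code: ListExports and TExportDirectory are hypothetical names, the record mirrors IMAGE_EXPORT_DIRECTORY from the PE/COFF specification, and unlike ABorImage.RvaToVa the sketch does not handle files that are not mapped as images.

```delphi
program ListModuleExports;

{$APPTYPE CONSOLE}

uses
  Winapi.Windows, System.SysUtils, System.Classes;

type
  // IMAGE_EXPORT_DIRECTORY as laid out in the PE/COFF specification
  TExportDirectory = packed record
    Characteristics: DWORD;
    TimeDateStamp: DWORD;
    MajorVersion: Word;
    MinorVersion: Word;
    Name: DWORD;                  // RVA of the module name
    Base: DWORD;                  // first ordinal number
    NumberOfFunctions: DWORD;
    NumberOfNames: DWORD;
    AddressOfFunctions: DWORD;    // RVA of DWORD[NumberOfFunctions]: function RVAs
    AddressOfNames: DWORD;        // RVA of DWORD[NumberOfNames]: name RVAs
    AddressOfNameOrdinals: DWORD; // RVA of Word[NumberOfNames]: indexes into functions
  end;
  PExportDirectory = ^TExportDirectory;

// Hypothetical helper: adds "address  name" lines for every named export of a
// module loaded into this process; returns the number of named exports
function ListExports(AModule: HMODULE; AList: TStrings): Integer;
var
  Base: PByte;
  Dos: PImageDosHeader;
  Nt: PImageNtHeaders;
  Dir: PExportDirectory;
  DirRVA, NameRVA, FuncRVA: DWORD;
  Ordinal: Word;
  I: Integer;
begin
  Result := 0;
  Base := PByte(AModule);
  if Base = nil then
    Exit;
  Dos := PImageDosHeader(Base);
  if Dos.e_magic <> IMAGE_DOS_SIGNATURE then
    Exit;
  Nt := PImageNtHeaders(Base + Dos._lfanew);
  if Nt.Signature <> IMAGE_NT_SIGNATURE then
    Exit;
  DirRVA := Nt.OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress;
  if DirRVA = 0 then
    Exit; // the module has no export table
  Dir := PExportDirectory(Base + DirRVA); // mapped image: address = Base + RVA
  for I := 0 to Integer(Dir.NumberOfNames) - 1 do
  begin
    NameRVA := PDWORD(Base + Dir.AddressOfNames + I * SizeOf(DWORD))^;
    Ordinal := PWord(Base + Dir.AddressOfNameOrdinals + I * SizeOf(Word))^;
    // For forwarded exports this RVA points to a "DLL.Function" string, not code
    FuncRVA := PDWORD(Base + Dir.AddressOfFunctions + Ordinal * SizeOf(DWORD))^;
    AList.Add(Format('%p  %s', [Pointer(Base + FuncRVA),
      UTF8ToString(PAnsiChar(Base + NameRVA))]));
  end;
  Result := Integer(Dir.NumberOfNames);
end;

var
  List: TStringList;
  Count: Integer;
begin
  List := TStringList.Create;
  try
    // kernel32.dll is loaded into every Win32/Win64 process
    Count := ListExports(GetModuleHandle('kernel32.dll'), List);
    Writeln('kernel32.dll: ', Count, ' named exports');
    if Count > 0 then
      Writeln('First entry: ', List[0]);
  finally
    List.Free;
  end;
end.
```

The exact count and addresses depend on the Windows version, so no output is shown.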

Assuming the rtl290.bpl package is the same on both the "Slow" and the "Fast" PC, it is not obvious how this code could produce such a large difference (almost 8x). We asked the customer to rule out possible external factors, such as anti-virus software, but the customer reported that he did not find anything.

One option is to pursue the issue further: launch a proper performance profiler to see what is taking so long in the code above. But there is another way: this bottleneck code is called only because the package does not have prepared debug information, so EurekaLog has to create it from the exports table. As you know, the fastest code is the code that is never called, and the way to never call this code is to supply debug information for the package. Thankfully, modern IDEs come with a *.jdbg file for each *.bpl file. A *.jdbg file contains debug information in the JCL (JEDI) format, which EurekaLog can read if you enable the corresponding debug information provider in EurekaLog's project options.

Once the customer deployed the *.jdbg files to the Windows 2016 PC, the performance problem disappeared: EurekaLog no longer needs to analyze the packages and can use the prepared debug information from the *.jdbg files.
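A hypothetical deployment check, assuming that the JCL debug information provider looks for a *.jdbg file with the same base name next to the module (for example, rtl290.jdbg beside rtl290.bpl). HasJdbgFile is not an EurekaLog or JCL function, and the package list simply repeats the two packages from the logs above. A plain console program does not load these BPLs, so the loop is meant to be pasted into the application itself, built with runtime packages; otherwise every package is reported as not loaded.

```delphi
program CheckJdbgDeployment;

{$APPTYPE CONSOLE}

uses
  Winapi.Windows, System.SysUtils;

// Hypothetical helper: True when a *.jdbg file with the same base name
// sits next to the module's *.bpl/*.dll file
function HasJdbgFile(AModule: HMODULE; out AJdbgName: string): Boolean;
begin
  AJdbgName := ChangeFileExt(GetModuleName(AModule), '.jdbg');
  Result := FileExists(AJdbgName);
end;

const
  // Packages seen in the logs above; extend with the packages your app loads
  Packages: array[0..1] of string = ('rtl290.bpl', 'vcl290.bpl');

var
  PackageName, JdbgName: string;
  Module: HMODULE;
begin
  for PackageName in Packages do
  begin
    Module := GetModuleHandle(PChar(PackageName));
    if Module = 0 then
      Writeln(PackageName, ': not loaded')
    else if HasJdbgFile(Module, JdbgName) then
      Writeln(PackageName, ': found ', JdbgName)
    else
      Writeln(PackageName, ': MISSING ', JdbgName);
  end;
end.
```

Running such a check on the target PC makes it easy to confirm that the *.jdbg files were actually deployed alongside the packages.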

08 July, 2025
