If your program is running slower than you expected, or is using more memory or disk space than you expected, you should first examine the approach you used in your Ada source code. Can you use better data structures, or implement faster algorithms? For example, a bubble sort is an easy way to sort relatively small amounts of data, but a quick sort is faster on thousands or millions of pieces of data.
In large programs, the subprograms causing the biggest bottlenecks may not be obvious. Experimenting with different test data and timing the results can often narrow down the problem areas. You could also try the gprof profiling tool, which gives you statistics on where your program spends its time and shows whether you are on the right track. Why spend hours or days improving a section of your program that isn't causing the problem? This is especially important in a business environment: focus your time on the sections that will give the greatest improvements.
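As a rough illustration of timing a suspect section by hand, the following sketch wraps a call in two readings of the standard Ada.Calendar clock. Suspect_Work and its loop are hypothetical stand-ins for whatever subprogram you want to measure; they are not part of any program discussed in this document.

```ada
with Ada.Text_IO;
with Ada.Calendar; use Ada.Calendar;

procedure Time_Demo is
   Start_Time : Time;
   Elapsed    : Duration;

   -- Hypothetical workload: replace with the subprogram you suspect.
   procedure Suspect_Work is
      Total : long_integer := 0;
   begin
      for i in 1..10_000_000 loop
         Total := Total + long_integer( i mod 7 );
      end loop;
      -- Print the result so the optimizer cannot discard the loop.
      Ada.Text_IO.Put_Line( "result:" & long_integer'image( Total ) );
   end Suspect_Work;

begin
   Start_Time := Clock;              -- wall-clock time before the call
   Suspect_Work;
   Elapsed := Clock - Start_Time;    -- "-" on two Time values yields a Duration
   Ada.Text_IO.Put_Line( "elapsed seconds:" & Duration'image( Elapsed ) );
end Time_Demo;
```

Ada.Calendar measures wall-clock time, so other activity on the machine affects the figure. Run the test several times and compare the results rather than trusting a single reading.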
Some optimizations can be done automatically by the Gnat compiler. There are both compiler switches and language pragmas for fine tuning your programs.
The -O switch tells the compiler how much time it should spend optimizing the program:
When using floating point numbers, you may experience rounding errors if you don't use the -ffloat-store switch as discussed in 8.5.
Inlining is also affected by two other switches:
These switches both require a -O switch for inlining to take effect.
The -gnatp switch turns off all non-essential error checking such as constraint and range checks. This is the same as using pragma Suppress( All_Checks ) on every file in the entire program, making the program smaller and faster.
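Checks can also be suppressed more selectively than -gnatp allows. The sketch below, which assumes you only want to drop range checking inside one small function, places the standard pragma Suppress in that function's declarative part. The names Suppress_Demo, Percent and Halve are invented for the example.

```ada
with Ada.Text_IO;

procedure Suppress_Demo is
   type Percent is range 0..100;

   function Halve( p : Percent ) return Percent is
      -- Permission to omit range checks applies only inside Halve;
      -- the rest of the program keeps its checks.
      pragma Suppress( Range_Check );
   begin
      return p / 2;
   end Halve;

begin
   Ada.Text_IO.Put_Line( Percent'image( Halve( 80 ) ) );
end Suppress_Demo;
```

Suppressing a check only gives the compiler permission to leave it out. A program that would have failed the check is erroneous without it, so this is best reserved for code you have already tested with the checks enabled.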
There are some other gcc optimization switches which can sometimes be used:
-ffast-math - gcc will ignore certain ANSI & IEEE math rules. For example, it will not check for negative numbers before invoking the sqrt function. This improves math performance but can cause side effects for libraries expecting the ANSI/IEEE rules to be honoured.
-fomit-frame-pointer - gcc will free the register usually dedicated to hold the stack frame pointer. This improves performance, but makes debugging difficult--many debugging utilities require the frame pointer.
IDE: TIA sets the proper switches for you based on your selections in the project parameters window.
Ada Pragma | Description | C Equivalent |
pragma Pack( Aggregate ); | Use minimum space for the aggregate. | - |
pragma Optimize( Space / Time / Off ); | How you want your statements optimized. | - |
pragma Inline( Subprogram ); | Inline the subprogram | inline |
pragma Inline_Always( Subprogram ); | Inline the subprogram | - |
pragma Discard_Names( type ); | Don't include ASCII identifiers in executable. | - |
Pragma Pack compresses an array, record or tagged record so that it uses the minimum space possible. For example, a packed boolean array takes up one bit for each boolean. Pack only packs the aggregate, not any aggregate items that might make up the aggregate: if you have an array of records, you'll need to both pack the array and the records to use the minimum space possible. Packing aggregates usually slows down the execution of your program.
  type CustomerProfile is record
    Preferred : boolean;
    PreordersAllowed : boolean;
    SalesToDate : float;
  end record;
  pragma Pack( CustomerProfile );

Gnat can perform close packing, that is, packing right down to individual bits, for array elements or records of 64 bits or smaller.
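To see the point about packing both the array and its records, the following sketch prints the 'size attribute (in bits) of packed and unpacked versions of a small record and of an array of that record. The type names are made up for the example, and the figures printed depend on your compiler and target, so none are quoted here.

```ada
with Ada.Text_IO;

procedure Pack_Demo is
   -- Unpacked record and array for comparison.
   type Profile is record
      Preferred        : boolean;
      PreordersAllowed : boolean;
   end record;
   type ProfileList is array( 1..100 ) of Profile;

   -- Packed record: each boolean may shrink to a single bit.
   type PackedProfile is record
      Preferred        : boolean;
      PreordersAllowed : boolean;
   end record;
   pragma Pack( PackedProfile );

   -- Both the record and the array must be packed for the
   -- array to use the minimum space.
   type PackedProfileList is array( 1..100 ) of PackedProfile;
   pragma Pack( PackedProfileList );

begin
   Ada.Text_IO.Put_Line( "record bits:       " & integer'image( Profile'size ) );
   Ada.Text_IO.Put_Line( "packed record bits:" & integer'image( PackedProfile'size ) );
   Ada.Text_IO.Put_Line( "array bits:        " & integer'image( ProfileList'size ) );
   Ada.Text_IO.Put_Line( "packed array bits: " & integer'image( PackedProfileList'size ) );
end Pack_Demo;
```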
Pragma Optimize specifies how you want your statements to be optimized: to run as fast as possible (time), to be as small as possible (space), or no optimization at all. Optimize does not affect data structures.
  pragma Optimize( space );
  package body AccountsPayable is

Pragma Inline makes Ada inline the subprogram whenever possible. That is, it physically inserts the body of the subprogram wherever it's called instead of making a call, in order to make your program run faster. This uses up a lot of space and is only practical for small procedures and functions.
  procedure Increment( x : in out integer ) is
  begin
    x := x + 1;  -- x must be an "in out" parameter to be assigned to
  end Increment;
  pragma Inline( Increment );
An optimization switch (-O1 or higher) must be used or pragma Inline is ignored. -O3 will also automatically inline short subprograms for you.
Pragma Inline_Always forces inlining between packages (like -gnatn) regardless of whether or not -gnatn or -gnatN has been used.
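In practice the inlining pragma usually sits in a package specification, right after the declaration of the subprogram it names. The sketch below uses a nested package so that it fits in one file. Counters and Inline_Demo are invented names, and in a real project the spec and body would normally be separate files, which is where the cross-package switches mentioned above come into play.

```ada
with Ada.Text_IO;

procedure Inline_Demo is
   Total : integer := 0;

   package Counters is
      procedure Increment( x : in out integer );
      pragma Inline( Increment );   -- request inlining of every call
   end Counters;

   package body Counters is
      procedure Increment( x : in out integer ) is
      begin
         x := x + 1;
      end Increment;
   end Counters;

begin
   for i in 1..10 loop
      Counters.Increment( Total );  -- candidate for inlining when optimizing
   end loop;
   Ada.Text_IO.Put_Line( integer'image( Total ) );
end Inline_Demo;
```

The program behaves the same whether or not the call is inlined; only the generated code differs. Viewing the assembly output is one way to confirm what the compiler actually did.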
Pragma Discard_Names frees up space by discarding the ASCII images (names) of identifiers. For example, if you have a big enumerated type, Ada normally maintains strings for the names of each of the enumerated items in case you want to use the 'img attribute. You can discard these names if you never intend to use 'img.
type aDogBreed is (Unknown, Boxer, Shepherd, MixedBreed ); pragma Discard_Names( aDogBreed );
When you discard names, 'img is still available. Instead of returning the enumerated value's image, 'img returns the position of the enumerated value within the type (for example, 0, 1, 2 and so forth).
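A small test program makes the effect easy to check on your own compiler. This sketch uses the standard 'image attribute on the type rather than the Gnat-specific 'img, and the procedure name Discard_Demo is made up. What is printed once names are discarded is implementation-defined, so treat the position-number behaviour described above as what Gnat is expected to do rather than a guarantee.

```ada
with Ada.Text_IO;

procedure Discard_Demo is
   type aDogBreed is (Unknown, Boxer, Shepherd, MixedBreed );
   pragma Discard_Names( aDogBreed );
begin
   -- Without the pragma this prints the literal's name (SHEPHERD);
   -- with it, the compiler may print something else, such as the position.
   Ada.Text_IO.Put_Line( aDogBreed'image( Shepherd ) );
end Discard_Demo;
```

Comment out the pragma and recompile to compare the two outputs and the sizes of the two executables.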
Fun Fact: The ASCII images of your variable names are stored as C strings at the end of your executable file. You can view them using the less (or strings) shell command.
-mno-486 - optimize for 80386.
-m486 - optimize for 80486. These programs will still run on an 80386.
Future versions of Gnat built for GCC 3.x or later will probably support:
There are currently no switches for newer CPUs such as Pentiums. Under GCC 2.8.1 (and Gnat), the GCC FAQ recommends the following switches for reasonable Pentium performance: "-m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -fno-strength-reduce".
There are other switches that may or may not be helpful, depending on your program: read the gcc FAQ for full details.
Let's put all these flags together. Suppose you are trying to
develop a program for the Intel Pentium CPU with an emphasis on
speed. During development, the Gnatmake switches would be "-O1"
since this setting suppresses pragma optimize warnings. For the
final release, the Gnatmake switches should be "-m486 -O3
-malign-loops=2 -malign-jumps=2 -malign-functions=2
-fno-strength-reduce -gnatp" for maximum performance on a Pentium
processor.
In the previous sections, we saw GCC compiler switches and Ada pragmas that affect the speed and size of your finished application. But how much of a difference does optimization make? And are there any problems caused by optimization?
The optimization switches and pragmas affect different applications differently. Some will give better results for certain kinds of applications, while others may actually have a negative effect. The following table summarizes the results of optimizing the Hartstone Ada benchmark program. Hartstone is a multithreading mathematics test freely available on the Internet at http://ftp.sunet.se/pub4/benchmark/hartstone/.
Table: Hartstone 1.1 Benchmark Summary
Gnat switches | Ada pragmas | CPU Time | File Size | Task Set Util |
-gnatE -gnato -g | - | 0.13s | 294265 | 0.41% |
-gnatE -gnato | - | 0.13s | 147433 | 0.41% |
-gnatE | - | 0.10s | 138679 | 0.32% |
no switches | - | 0.10s | 138679 | 0.29% |
-O | - | 0.07s | 113076 | 0.22% |
-O2 | - | 0.07s | 113324 | 0.22% |
-O3 | - | 0.07s | 118790 | 0.20% |
-O3 -gnatp | - | 0.08s | 104290 | 0.37% |
-O3 -gnatp Pent | - | 0.08s | 105042 | 0.15% |
Max | - | 0.05s | 105714 | 0.15% |
Max | Optimize( Space ) | 0.05s | 105714 | 0.15% |
Max | Optimize( Time ) | 0.05s | 105714 | 0.15% |
Max | Pack arrays | 0.11s | 105712 | 0.15% |
Pent - GCC Pentium optimization switches
Max - Pent + -ffast-math + -fomit-frame-pointer
This test was conducted with a Pentium II 350, 64 Megs RAM and ALT Gnat 3.12p-9. As they say, your mileage may vary (and probably will).
By optimizing the application, Hartstone shrinks from 138679 bytes with no switches to 105714 bytes (about a quarter smaller) and its CPU time drops from 0.10s to 0.05s. However, if we pack the arrays in Hartstone, we save two bytes but CPU time rises to 0.11s, losing all the improvements in speed. Sometimes smaller programs are not faster.
Let's try optimizing a convoluted program that uses integers, arrays, functions and mathematics and see what effect the optimization techniques have.
  procedure bench is
    --Simple benchmark program to test optimization
    pragma optimize( time );
    type bench_integer is new long_integer range long_integer'range;
    type small_integer is new long_integer range 0..9;

    function p( param : bench_integer ) return bench_integer is
      divideby : constant bench_integer := 4;
    begin
      return param / divideby;
    end p;
    pragma inline( p );

    j : bench_integer := bench_integer'last;
    -- deliberate error in main program for j * 2
    type atype is array(0..9) of small_integer;
    --pragma pack( atype);
    a : atype;
  begin
    for i in 1..100_000_000 loop
      j := abs( p( bench_integer( i ) ) - (j * 2) );
      a( integer( j mod 10 ) ) := small_integer( j mod bench_integer( small_integer'last) );
    end loop;
  end bench;
Notice that j is assigned the largest bench_integer possible. This will force an overflow error the first time around the for loop, when j is multiplied by two. The following chart shows the effect of the different switches and pragmas, and indicates when gnat caught the overflow error. The test was conducted on a Pentium II 350 with 64 Megs of RAM using the gnat 3.11 NYU binaries and was timed with the time command.
Gnatmake Switches | Pragmas | CPU Time | Size | Error Caught? |
gnatmake -gnato -gnatE | - | - | 118162 | YES |
gnatmake -gnato | - | - | 118162 | YES |
gnatmake -gnatE | - | 40.3 s | 118162 | No |
gnatmake | - | 40.3 s | 117634 | No |
gnatmake -O | - | 10.8 s | 117426 | No |
gnatmake -O2 | - | 10.8 s | 117426 | No |
gnatmake -O3 | - | 10.8 s | 117426 | No |
gnatmake -O3 -gnatp | - | 9.6 s | 117410 | No |
gnatmake -O3 -gnatp Pent | - | 9.6 s | 117410 | No |
gnatmake -O3 -gnatp Pent | Optimize( Space ) | 9.6 s | 117410 | No |
gnatmake -O3 -gnatp Pent | Optimize( Time ) | 9.6 s | 117410 | No |
gnatmake -O3 -gnatp Pent | Pack atype | 4.4 s | 117326 | No |
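The "Error Caught?" column comes down to whether the overflow check is compiled into the program. A minimal sketch of the same situation, reduced to a single multiplication, is shown below. Overflow_Demo is an invented name, and which message you see depends on your compiler version and on switches such as -gnato and -gnatp.

```ada
with Ada.Text_IO;

procedure Overflow_Demo is
   j : long_integer := long_integer'last;
begin
   -- This multiplication overflows. Constraint_Error is raised only
   -- if the overflow check was compiled into the program.
   j := j * 2;
   Ada.Text_IO.Put_Line( "no error caught, j =" & long_integer'image( j ) );
exception
   when Constraint_Error =>
      Ada.Text_IO.Put_Line( "overflow caught" );
end Overflow_Demo;
```

Building this once with overflow checking enabled and once with checks suppressed shows the trade-off in the table directly: the faster builds are the ones that let the bad value through.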
We can compare the results to the equivalent C program:
  #include <stdlib.h>  /* declares abs() */

  int p( int param ) {
    return param / 4;
  }

  int i;
  int j = 2147483647;
  int a[10];

  int main() {
    for (i=1; i<=100000000; i++) {
      j = abs( p(i)-(j*2));
      a[ j%10 ] = j%10;
    }
    return 0;
  }
GCC Switches | Pragmas | CPU Time | Size | Error Caught? |
gcc -Wall | - | 12.8 s | 24541 | No |
gcc -O3 Pent | - | 8.6 s | 24541 | No |
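In this case, notice that C never detected the overflow error. Secondly, notice that the fastest Ada build (4.4 s, with the packed array) ran about twice as fast as the optimized C program (8.6 s); without packing, the optimized Ada builds took 9.6 s.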
In theory, an Ada compiler can take advantage of the typing information and the optimization hints provided by the pragmas. The C compiler has less information and this can hinder the optimization process. (I've never investigated whether or not Gnat does this or how much of an effect it has.)
The optimization techniques will affect different programs differently. You need to choose the best approach for your particular project.
The Linux assembler is called gas (the GNU assembler). Like GNAT and C++, gas works through gcc. To assemble an assembly language source file, simply run gcc. The compiler will recognize the assembly language file and will assemble it using gas.
If you want to view the assembly source code of your Ada program, use the "-c -S -fverbose-asm" options when compiling. GNAT will create a file with a ".s" suffix containing the assembly source. You can view it, or even edit it and assemble afterwards. Improving the instructions produced by the compiler and then assembling afterwards is known as hand optimizing. This technique is typically used for high performance applications such as games, where the programmer needs to get the maximum performance from the hardware.
The following is the stderr.s file for the stderr.adb program described elsewhere in this document.
.file"stderr.adb"
See Chapter 19 for a discussion of embedding assembly language into an Ada program.