8.10 Problem Solving: Searching and Sorting an Array

In this section, we will discuss two common problems in processing arrays: searching an array to determine the location of a desired value and sorting an array to rearrange the array elements in sequence. As an example of an array search, we might wish to search the array of exam scores read in by procedure ReadScores (see Section 8.8, Program 8.12) to determine which student, if any, got a particular score. An example of an array sort would be rearranging the array elements so that they are in increasing (or decreasing) order by score.

Searching an Array

We repeat the type definitions used in Section 8.8 for convenience here:

    MaxSize : CONSTANT Positive := 250;
    MaxScore : CONSTANT Positive := 100;
    SUBTYPE ClassIndex  IS Positive RANGE 1..MaxSize;
    SUBTYPE ClassRange  IS Natural RANGE 0..MaxSize;
    SUBTYPE ScoreRange  IS Natural RANGE 0..MaxScore;
    TYPE ScoreArray IS ARRAY (ClassIndex) OF ScoreRange;
    
    Scores : ScoreArray;
    ClassSize : ClassRange;

We can search an array for a particular score (called the search target) by examining each array element and testing to see whether it matches the target score. If a match occurs, we have found the target and can return its subscript as the search result. If a match does not occur, we should continue searching until either we get a match or we test all array elements. The input data requirements for a search function are:

    Scores : ScoreArray      -- the array to be searched
    ClassSize : ClassRange   -- the number of elements in Scores
    Target : ScoreRange      -- the score being searched for

The function result is the subscript of the first element containing Target, or zero if Target was not found. The algorithm for searching the array is:

Algorithm for Search

1. Start with the first array element.

2. WHILE the current element does not match the target

AND the current element is not the last element LOOP

3. Advance to the next element.

END LOOP;

4. IF the current element matches the target THEN

5. Return its subscript.

ELSE

6. Return zero.

END IF;

The WHILE loop in step 2 compares each array element to the target. Loop exit occurs if there is a match or if the element being tested is the last element. After loop exit, the IF statement defines the function result by repeating the last comparison. Program 8.14 shows function Search. The function uses a local variable CurrentScore to control the WHILE loop.

Program 8.14
Function Search

FUNCTION Search (Scores: ScoreArray; ClassSize: ClassRange; 
                 Target: ScoreRange) RETURN ClassRange IS
   -- Searches for Target in array Scores
   -- Pre : ClassSize and subarray Scores(1..ClassSize) are defined
   -- Post: Returns the subscript of Target if found;
   --    otherwise, returns 0

   CurrentScore: ClassIndex;       -- array subscript   


BEGIN  -- Search   

  -- Compare each value in Scores to Target until done  
  CurrentScore := 1;            -- Start with the first record   
  WHILE (Scores(CurrentScore) /= Target) 
  AND (CurrentScore <= ClassSize) LOOP
  -- invariant:
  -- CurrentScore <= ClassSize + 1 and
  -- no prior array element was Target

      CurrentScore := CurrentScore + 1; -- advance to next score

  END LOOP;
  -- assertion: Target is found or last element is reached.    

  -- Define the function result.   
  IF Scores(CurrentScore) = Target THEN
    RETURN CurrentScore;
  ELSE
    RETURN 0;
  END IF;

END Search;

The WHILE loop condition compares the array element selected by CurrentScore to the target. If they are equal, loop exit occurs. If they are unequal and the last element has not been reached, CurrentScore advances to the next array element.

Loop exit occurs when the target is found or the last element is reached. After loop exit, the IF statement returns the subscript (CurrentScore) of the current element if Target was found; otherwise, it returns zero.

Sorting an Array

In Section 6.6 we discussed a simple sort operation involving three numbers ( Program 6.9). We performed the sort by examining pairs of numbers and exchanging them if they were out of order. There are many times when we would like to sort the elements in an array, for example, to display a grade report in alphabetical order or in order by score.

This section discusses a fairly intuitive (but not very fast) algorithm called the selection sort. To perform a selection sort of an array with N elements (subscripts 1..N), we locate the largest element in the array and then switch the largest element with the element at subscript N, thereby placing the largest element at position N. Then we locate the largest element remaining in the subarray with subscripts 1..N-1, and switch it with the element at subscript N-1, thereby placing the second largest element at position N-1. Then we locate the largest element remaining in subarray 1..N-2 and switch it with the element at subscript N-2, and so on.

Figure 8.11 traces the operation of the selection sort algorithm. The diagram on the left shows the original array. Each subsequent diagram shows the array after the next largest element is moved to its final position in the array. The subarray in the darker color represents the portion of the array that is sorted after each exchange occurs. Note that it will require, at most, N-1 exchanges to place the largest element of an array with N elements. The algorithm follows.

Figure 8.11 Trace of Selection Sort

Figure 8.11

Selection Sort Algorithm

1. FOR PositionToFill IN 1..N-1 LOOP

2. Find the largest element in subarray PositionToFill+1..N.

3. IF the largest element is not at subscript PositionToFill THEN

Switch the largest element with the one at subscript PositionToFill.

END IF;

END LOOP;

The refinement of step 2 also contains a FOR loop and is shown next.

Step 2 Refinement

2.1. Save PositionToFill as the position of the largest so far in the subarray

2.2. FOR ItemToCompare IN PositionToFill+1..N LOOP

2.3. IF the element at ItemToCompare is bigger than largest so far

THEN

Save ItemToCompare as the position of the largest so far.

END IF;

END LOOP;

Procedure SelectSort in Program 8.15 implements the selection sort algorithm.

Program 8.15
Selection Sort Procedure

PROCEDURE SelectSort
  (Scores: IN OUT ScoreArray; ClassSize: IN ClassRange) IS
  -- Pre:  Scores and ClassSize are defined
  -- Post: The first ClassSize elements of Scores 
  --   are sorted in descending order

  IndexOfMax: ClassRange;

BEGIN

  FOR PositionToFill IN 1..ClassSize - 1 LOOP

    -- Find the element in subarray PositionToFill..ClassSize
    -- with largest Score
    IndexOfMax := PositionToFill;
    FOR ItemToCompare IN PositionToFill .. ClassSize LOOP
      IF Scores(ItemToCompare) > Scores(IndexOfMax) THEN
        IndexOfMax := ItemToCompare;
      END IF;
    END LOOP;
    -- assert: element at IndexOfMax is largest in subarray

    IF IndexOfMax /= PositionToFill THEN
      Exchange(Scores(PositionToFill),Scores(IndexOfMax));
    END IF;

  END LOOP;

END SelectSort;

Local variable IndexOfMax holds the location of the largest exam score found so far in the current subarray. After each execution of the inner FOR loop, procedure Switch is called to exchange the elements with subscripts IndexOfMax and PositionToFill, provided that the element at PositionToFill is not the next largest element. After the execution of SelectSort, the elements of the array Scores are in decreasing order.

Analysis of Search and Sort: Big-O Notation

There are many algorithms for searching and sorting arrays. Because arrays can have a very large number of elements, the time required to process all the elements of an array can become significant, so it is important to have some idea of the relative efficiency of different algorithms. It is difficult to get a precise measure of the performance of an algorithm or program. For this reason, we normally try to approximate the effect on an algorithm of a change in the number of items, N, that it processes. In this way, we can see how an algorithm's execution time increases with N, so we can compare two algorithms by examining their growth rates.

For example, if we determine that the expression

2N² + N - 5

expresses the relationship between processing time and N, we say that the algorithm is an O(N²) algorithm where O is an abbreviation for Order of Magnitude. (This notation is called Big-O Notation.) The reason that this is an O(N²) algorithm instead of an O(2N²) algorithm or an O(N² + N - 5) is that we are interested in only the fastest-growing term (the one with the largest exponent) and we ignore constants.

To search an array of N elements for a target, we have to examine all N elements when the target is not present in the array. If the target is in the array, we only have to search until we find it. However, it could be anywhere in the array, and it is equally likely to be at the beginning of the array as at the end. So on average, we have to examine N/2 array elements to locate a target value that is in an array. This means that an array search is an O(N) process, so the growth rate is linear.

To determine the growth rate of a sorting algorithm, we normally focus on the number of array element comparisons that it requires. To perform a selection sort on an array with N elements requires N - 1 comparisons during the first pass through the array, N - 2 comparisons during the second pass, and so on. Therefore, the total number of comparisons is represented by the series

1 + 2 + 3 + ... + (N - 2) + (N - 1)

The value of this series is expressed in closed form as

 N × ( N - 1)    N²   N
------------- = --- - -
     2           2    2

Therefore, selection sort is an O(N²) process and the growth rate is quadratic (proportional to the square of the number of elements).

What difference does it make whether an algorithm is an O(N) process or an O(N²) process? Table 8.5 evaluates N and N² for different values of N. A doubling of N causes N² to increase by a factor of 4. Since N increases much more slowly with N, the performance of an O(N) algorithm is not as adversely affected by an increase in N as is an O(N²) algorithm.

Table 8.5
Table of Values of N and N²

N N²

Since in this book we will be using relatively small arrays in our programming, algorithm performance is not a major concern. Analyzing the performance of algorithms is, however, an important subject about which you will study a great deal as you progress in your education, because knowing how to compute the expected Big O of an algorithm can, for large arrays and other data structures, make the difference between writing a program whose running time is acceptable and one that may run for weeks or months.

Case Study: Sorting an Array of Records

Our study of arrays began with a statement that the element type of an array can be any type, including a structured type like a record. In this section we consider how to sort an array of records.

Declaring an Array of Records

We begin with a set of declarations similar to the ones in the previous section. The new declarations define a subtype StudentName as a string of (exactly) 20 characters, a type ScoreRecord as a record containing a student's name and a test score, and a type ScoreArray as an array of these records.

    MaxSize : CONSTANT Positive := 250;
    MaxScore : CONSTANT Positive := 100;

    SUBTYPE StudentName IS String(1..20);
    SUBTYPE ClassIndex  IS Positive RANGE 1..MaxSize;
    SUBTYPE ClassRange  IS Natural RANGE 0..MaxSize;
    SUBTYPE ScoreRange  IS Natural RANGE 0..MaxScore;

    TYPE ScoreRecord IS RECORD
    	Name:  StudentName;
    	Score: ScoreRange;
    END RECORD;

    TYPE ScoreArray IS ARRAY (ClassIndex) OF ScoreRecord;

    Scores : ScoreArray;
    ClassSize : ClassRange;

These declarations mean that each element of Scores is a record with two fields. We can store values in an element of the array by combining subscripting with field selection:

    Scores(27).Name  := "Jones, Mary         ";
    Scores(27).Score := 79;

Note that the string representing the name must be exactly 20 characters long, and therefore we have included the extra blanks as required. Figure 8.12 shows a diagram of this array structure, with the first three records occupied.

Figure 8.12
An Array of Records

Figure 8.12

Reading Records from a File

In fact, arrays of records are more often filled by reading fields from an external file. Program 8.16 is a revised version of Program 8.15. The procedure GetRecords assumes that a file F of type Ada.Text_IO.File_Type has been created, either by another Ada program or by a human using a text editor. In this file, each line consists of a student name and a score. In creating such a file, one must be careful to provide exactly 20 characters in the student name. Assuming that the person who created this file was careful and that all the data in the file is valid, the two fields in each student record are read from the file by

    Ada.Text_IO.Get(File => F, Item => Scores(WhichStudent).Name);
    Ada.Integer_Text_IO.Get(File => F, Item => Scores(WhichStudent).Score);

This procedure is similar to Program 5.5 in its use of file operations from Ada.Text_IO.

Sorting the File of Records

Given the SelectSort procedure from Program 8.15, sorting the file of records is easy. Instead of using the "whole" array element as the key, or basis of comparison, the score field of each array element is used. In the sort procedure itself, only a single line needs to be modified: the line which compares two array elements needs to be changed to

    IF Scores(ItemToCompare).Score > Scores(IndexOfMax).Score THEN

After the execution of procedure SelectSort, the student records will be ordered by exam score (record with smallest score first).

The revised version of SelectSort appears as a procedure in Program 8.16, with the necessary changes made to accommodate the fact that we are using a file of score records instead of an array of scores.

Program 8.16
Sorting a File of Test Score Records

WITH Ada.Text_IO;
WITH Ada.Integer_Text_IO;
WITH Simple_Dates;
PROCEDURE Sort_Score_File IS
------------------------------------------------------------------------
--| Sorts and displays an array of test score records
--| The records are read from a file scorfile.dat
--| Author: Michael B. Feldman, The George Washington University 
--| Last Modified: July 1995                                     
------------------------------------------------------------------------

  MaxSize : CONSTANT Positive := 250;
  MaxScore : CONSTANT Positive := 100;

  SUBTYPE StudentName IS String(1..20);
  SUBTYPE ClassIndex IS Positive RANGE 1..MaxSize;
  SUBTYPE ClassRange IS Natural  RANGE 0..MaxSize;
  SUBTYPE ScoreRange IS Natural  RANGE 0..MaxScore;

  TYPE ScoreRecord IS RECORD
     Name:  StudentName;
     Score: ScoreRange;
  END RECORD;

  TYPE ScoreArray IS ARRAY (ClassIndex) OF ScoreRecord;

  Scores : ScoreArray;
  ClassSize : ClassRange;

  PROCEDURE GetRecords
    (Scores: OUT ScoreArray; ClassSize: OUT ClassRange) IS
  -- Pre:  None
  -- Post: Scores contains records read from file; ClassSize
  --   indicates the number of records read
    
    TestScores: Ada.Text_IO.File_Type; -- variable naming the input file
    TempSize:   ClassRange;
    TempRecord: ScoreRecord;

  BEGIN  -- GetRecords 

    -- Open the file and associate it with the file variable name
    Ada.Text_IO.Open
      (File => TestScores, Mode => Ada.Text_IO.In_File, 
       Name => "scorfile.dat");

    -- Read each data item
    -- and store it in the appropriate element of Scores

    TempSize := 0;                         -- initial class size   
    -- Read each array element until done.   
    WHILE (NOT Ada.Text_IO.End_Of_File(TestScores)) AND 
          (TempSize < MaxSize) LOOP
      -- invariant:
      --   Records remain in the file and
      --   TempSize <= MaxSize
        
      Ada.Text_IO.Get(File => TestScores, Item => TempRecord.Name);
      Ada.Integer_Text_IO.Get
        (File => TestScores, Item => TempRecord.Score);

      TempSize := TempSize + 1;
      Scores(TempSize) := TempRecord;     -- Save the score   

    END LOOP;      
    -- assert:
    --   End of file reached or
    --   TempSize is MaxSize
      
    IF TempSize = MaxSize THEN
      Ada.Text_IO.Put(Item => "Array is filled.");  
      Ada.Text_IO.New_Line;
    END IF;

    ClassSize := TempSize;

  END GetRecords;

  PROCEDURE Exchange(Student1, Student2: IN OUT ScoreRecord) IS
  -- Pre:  Student1 and Student2 are defined
  -- Post: Student1 and Student2 are interchanged

    TempRecord: ScoreRecord;

  BEGIN

    TempRecord := Student1;
    Student1 :=   Student2;
    Student2 :=   TempRecord;

  END Exchange;

  PROCEDURE SelectSort
    (Scores: IN OUT ScoreArray; ClassSize: IN ClassRange) IS
  -- Pre:  Scores and ClassSize are defined
  -- Post: The records in the first ClassSize elements of Scores 
  --   are sorted in descending order by score

    IndexOfMax: ClassRange;

  BEGIN

    FOR PositionToFill IN 1..ClassSize - 1 LOOP

      -- Find the element in subarray 1..PositionToFill 
      -- with largest Score
      IndexOfMax := PositionToFill;
      FOR ItemToCompare IN PositionToFill + 1 .. ClassSize LOOP
        IF Scores(ItemToCompare).Score > Scores(IndexOfMax).Score THEN
          IndexOfMax := ItemToCompare;
        END IF;
      END LOOP;
      -- assert: element at IndexOfMax is largest in subarry

      IF IndexOfMax /= PositionToFill THEN
        Exchange(Scores(PositionToFill),Scores(IndexOfMax));
      END IF;

    END LOOP;

  END SelectSort;

  PROCEDURE DisplayScores
    (Scores: IN ScoreArray; ClassSize: IN ClassRange) IS
  -- Pre:  Scores and ClassSize are defined
  -- Post: dislays the first ClassSize records in Scores

  BEGIN

    FOR I IN 1..ClassSize LOOP
      Ada.Integer_Text_IO.Put(Item => I, Width => 3);
      Ada.Text_IO.Put(Item => "  ");
      Ada.Text_IO.Put(Item => Scores(I).Name);
      Ada.Integer_Text_IO.Put(Item => Scores(I).Score, Width => 4);
      Ada.Text_IO.New_Line;
    END LOOP;

  END DisplayScores;

BEGIN -- Sort_Score_File

  Ada.Text_IO.Put(Item => "Today is ");
  Simple_Dates.Put(Item => Simple_Dates.Today);
  Ada.Text_IO.New_Line;

  GetRecords(Scores => Scores, ClassSize => ClassSize);
  Ada.Text_IO.Put(Item => "Original Test File:");
  Ada.Text_IO.New_Line;
  Ada.Text_IO.New_Line;
  DisplayScores(Scores => Scores, ClassSize => ClassSize);

  SelectSort(Scores => Scores, ClassSize => ClassSize);
  Ada.Text_IO.New_Line;
  Ada.Text_IO.Put(Item => "Sorted Test File:");
  Ada.Text_IO.New_Line;
  Ada.Text_IO.New_Line;
  DisplayScores(Scores => Scores, ClassSize => ClassSize);
  Ada.Text_IO.New_Line;

END Sort_Score_File;

Sample Run

Today is AUG 24 1995
Original Test File:

  1  Jones, Mary           75
  2  Hubbard, Kathy        99
  3  Andersen, Lars        80
  4  Gribben, George       21
  5  Rogers, Roy           34
  6  Evans, Dale           76
  7  VanDoren, Charles    100

Sorted Test File:

  1  VanDoren, Charles    100
  2  Hubbard, Kathy        99
  3  Andersen, Lars        80
  4  Evans, Dale           76
  5  Jones, Mary           75
  6  Rogers, Roy           34
  7  Gribben, George       21

Exercises for Section 8.10

Self-Check

What happens in function Search if the last student score matches the target? Why can't we use the condition CurScore <= ClassSize in the WHILE condition?
Another technique for searching an array is to introduce a program flag, say, Found, that is initially False and is set to True inside a search loop if the target value is found. Loop repetition continues as long as Found is still False and all elements have not been tested. After loop exit, the value of Found determines whether the current subscript or zero is returned as the function result. Write the procedure body.
Trace the execution of the selection sort on the list below. Show the array after each exchange occurs. How many exchanges are required? How many comparisons?
```
10 55 34 56 76 5
```
How could you get the scores in descending order (largest score first)? What changes would be needed to sort the array Class by student name instead of score?
When looking for the largest element in subarray Scores(1..i-1), explain why the program will be a bit faster if we start the search with the last element (i.e., IndexOfMax := i-1) rather than the first element.

Programming

Write a procedure to count the number of students with a passing grade on the exam (60 or higher).
Another method of performing the selection sort is to place the smallest value in position 1, the next smallest in position 2, and so on. Write this version.
Modify Program 8.16 so that the array is sorted by the students' names instead of by the test scores.
Combine procedures SelectSort and ReadScores, and function Search, into a program to read a set of scores from the terminal, sort the array, and then determine whether any student got a score of 75 on the test.