ERROR SEVERITY AND DEBUGGING

by Stanley H. Kremen, CDP

It is extremely improbable that any large computer system has ever been created completely error-free. Complex programs are more likely to contain errors than simple ones.

The primary purpose of a software system is ease of use. Therefore, the ratio of function to conceptual complexity is the test of good system design: neither function nor simplicity alone defines a good system. It follows that users can tolerate a number of errors in a software system, provided that the system remains functional.

The four most common causes of programming errors are: (1)

Furthermore, types of errors can be classified under the following general headings:

During testing, it is often discovered that many apparent errors are actually due to incorrect specifications, which arise when the systems analysis was not performed properly. The most persistent errors, and the most difficult to correct, are those arising from incorrect assumptions made during the analysis phase.

The production of a large software product can be scheduled using the following rule of thumb (excluding documentation): (2)

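1/3 - Planning
1/6 - Coding
1/4 - Component test and early system test
1/4 - System test, all components in hand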

Clearly, testing is the largest component of a software development project. It is also the phase most often mis-scheduled. Because testing occurs toward the end of a programming project, failure to allow sufficient time for it results either in late delivery or in the delivery of software that does not work.

Generally, testing proceeds in five phases: (3)

UNIT TEST - The programmer tests the program or module on a stand-alone basis with ad hoc test data. Theoretically, testing and debugging should be repeated until all errors are corrected and the programmer can find no additional errors.

INTEGRATION TEST - After unit testing is complete, the programmers responsible for the various modules of a system test their programs together, with data passing from module to module, to determine whether the programs work well together. Many errors are often uncovered during this phase.

SYSTEM TEST - The completed software is tested by a special group of programming technicians according to a specific test plan. The objective of this phase is to ensure that the software performs according to user requirements.

PARALLEL TEST - The user runs the new software system simultaneously with another system that produces verifiable results, and the outputs of the two systems are compared. Discrepancies usually indicate that errors exist in the new system.

ACCEPTANCE TEST - This is a joint user/developer test of the software according to a pre-defined set of test conditions and criteria.
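
The comparison step of the parallel test described above can be sketched in Python: the same inputs are fed to the existing system and to the new one, and every input on which their outputs differ is recorded for investigation. The `parallel_test` function and the two payroll routines are hypothetical stand-ins for real systems; in this example the new routine rounds a withholding amount that the existing one truncates.

```python
from typing import Any, Callable, Iterable, List, Tuple


def parallel_test(
    existing: Callable[[Any], Any],
    new: Callable[[Any], Any],
    inputs: Iterable[Any],
) -> List[Tuple[Any, Any, Any]]:
    """Run both systems on identical inputs and collect every mismatch
    as (input, existing_output, new_output)."""
    discrepancies = []
    for item in inputs:
        expected = existing(item)
        actual = new(item)
        if expected != actual:
            discrepancies.append((item, expected, actual))
    return discrepancies


# Hypothetical example: net pay in cents after 15% withholding.
def existing_net_pay(cents: int) -> int:
    return cents - cents * 15 // 100        # withholding truncated to whole cents


def new_net_pay(cents: int) -> int:
    return cents - round(cents * 15 / 100)  # withholding rounded with round() (ties go to even)


if __name__ == "__main__":
    for item, old_out, new_out in parallel_test(existing_net_pay, new_net_pay, [100, 999, 1000]):
        print(f"input {item}: existing system {old_out}, new system {new_out}")
```

Here only the input 999 is reported (850 from the existing system, 849 from the new one). A discrepancy only flags a difference; deciding which system is right still requires investigation.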

Initially, the number of errors discovered during testing is very high. It decreases exponentially with time and seems to approach some asymptotic value. Then, after a reasonable period of use, the number of errors discovered suddenly increases, peaks, and again decreases with time. This is due to the increased sophistication of users after they have used the system for a long while.

Obviously, some minor software errors can be tolerated by users if the software is functional. Severe errors cannot be tolerated.

Some programmers define the severity of an error by the nature of what must be done to correct it. Others define it by the amount of time it would take to correct the error; for example, some programmers consider a programming error to be minor if it can be fixed within two hours. Such definitions of error severity are not acceptable in industry, because they result in the delivery of poor-quality software.

Companies performing software development must look at error severity from the user's point of view. AT&T and the BELL System have developed a very useful scheme for classifying errors, which they use in their computerized M.R. system. An M.R., or modification request, is entered into the computer by anyone who discovers an error in any AT&T software. The severity of the error is expressed as a number from one to four, defined as follows:

1 - An error which causes a program or system interrupt or which causes program execution to abort. AT&T and BELL System personnel refer to this type of error as a "show stopper". This error has the highest severity rating.

2 - A severe error which causes a program not to perform properly or to produce unreliable results. Normally, the user cannot find an appropriate "workaround" for this type of error.

3 - An error that is not minor, but for which a "workaround" solution can be found for the user.

4 - A minor error, a cosmetic change, or an enhancement.
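
The four definitions above can be expressed as a small decision procedure. The Python sketch below assigns a severity from what the user experiences (does the program abort, does it misbehave, is there a workaround) rather than from how long the fix would take. The `ModificationRequest` fields and the `classify` function are hypothetical illustrations, not AT&T's actual M.R. system, and the sketch places anything that neither aborts nor malfunctions, including cosmetic changes and enhancements, in severity 4.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Severity(IntEnum):
    """Severity levels as defined above; 1 is the most severe."""
    SHOW_STOPPER = 1  # system interrupt or program abort
    SEVERE = 2        # improper or unreliable results, no workaround
    WORKAROUND = 3    # not minor, but the user has a workaround
    MINOR = 4         # minor error, cosmetic change, or enhancement


@dataclass
class ModificationRequest:
    """Hypothetical M.R. record; the field names are illustrative only."""
    description: str
    aborts: bool = False               # causes an interrupt or aborts execution
    malfunctions: bool = False         # performs improperly or gives unreliable results
    workaround: Optional[str] = None   # user-facing workaround, if one is known


def classify(mr: ModificationRequest) -> Severity:
    """Rate an error by its effect on the user, not by the effort to fix it."""
    if mr.aborts:
        return Severity.SHOW_STOPPER
    if mr.malfunctions:
        return Severity.SEVERE if mr.workaround is None else Severity.WORKAROUND
    return Severity.MINOR
```

For instance, a report with wrong totals is rated `Severity.SEVERE` until someone documents a workaround such as recomputing the totals by hand, at which point it becomes `Severity.WORKAROUND`. Note that an abort stays a show stopper even if a workaround is known.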

It is the policy of AT&T and the BELL System that, once a severity "1" error is discovered, programmers and developers must work around the clock until the error is corrected. Software may never be released to customers with any severity "1" or "2" errors. Occasionally, a software release with some severity "3" errors will be permitted; however, these errors are documented along with the actions users should take when they encounter them. Normally, severity "4" errors do not delay a software release.
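
The release policy just described amounts to a simple gate over the list of open M.R.s. A minimal Python sketch, assuming each open error is recorded with its severity number and a flag saying whether the user workaround has been documented (the `OpenError` record and `release_allowed` function are hypothetical): severity 1 and 2 errors block a release outright, severity 3 errors block it unless documented, and severity 4 errors never block it.

```python
from typing import Iterable, NamedTuple


class OpenError(NamedTuple):
    """Hypothetical record of an unresolved M.R."""
    severity: int                 # 1 (most severe) to 4
    workaround_documented: bool   # are user actions for this error documented?


def release_allowed(open_errors: Iterable[OpenError]) -> bool:
    """Return False if any open error violates the release policy."""
    for err in open_errors:
        if err.severity not in (1, 2, 3, 4):
            raise ValueError(f"invalid severity: {err.severity}")
        if err.severity in (1, 2):
            return False          # never release with severity 1 or 2 errors
        if err.severity == 3 and not err.workaround_documented:
            return False          # severity 3 only with documented workarounds
    return True                   # severity 4 errors do not delay a release
```

The gate encodes only the hard rules. Whether a particular release with documented severity 3 errors is actually permitted ("occasionally," in the policy's own words) remains a management decision that the code does not capture.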

It is important that software developers define software error severity from the user's point of view. Failure to do so degrades the quality of software produced by the computer industry. The user must expect, and must receive, good, usable software. Otherwise, not only the computer industry but all of American business will suffer the consequences.

NOTES

1. Wooldridge, Susan, Systems and Programming Standards, Petrocelli/Charter, New York, NY, 1977, ISBN: 0-88405-425-X, Library of Congress Catalog No. 72-8535

2. Brooks, Frederick P., Jr., The Mythical Man-Month, Addison-Wesley, Philippines, 1972, 1975, Library of Congress Catalog No. 74-4714

3. Metzger, Phillip W., Managing a Programming Project, Prentice-Hall, Englewood Cliffs, NJ, 1973, ISBN: 0-13-550756-1, Library of Congress Catalog No. 72-8515