SoftRel: The Simulation Technique
Overview
The software reliability process simulator SoftRel does not assume
particular staffing, resource, or schedule models; instead, it accepts
them as inputs in the form of schedule quintuples. The simulator also
captures the effects of interrelationships among activities, and
characterizes all events as piecewise-Poisson Markov processes with
explicitly defined event-rate functions, as explained in Chapter 16 of
this Handbook.
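To make the piecewise-Poisson characterization concrete, the following Python sketch assumes that an event-rate function is held constant over each simulation step of length dt. Under that assumption, the number of events in a step is Poisson-distributed with mean equal to the rate times dt. The names rate_fn, poisson_count, and simulate_events are hypothetical, and the sketch is an illustration of the technique, not SoftRel's own code.

```python
import math
import random


def poisson_count(mean, rng):
    """Draw a Poisson-distributed count with the given mean (Knuth's method)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_events(rate_fn, t_end, dt, seed=0):
    """Simulate a piecewise-Poisson event process (hypothetical helper).

    rate_fn(t) is the event rate, in events per day, at time t. It is held
    constant over each step [t, t + dt), so the number of events in that step
    is Poisson with mean rate_fn(t) * dt. Returns (time, cumulative count)
    pairs, one per step.
    """
    rng = random.Random(seed)
    t, total, history = 0.0, 0, []
    while t < t_end:
        total += poisson_count(rate_fn(t) * dt, rng)
        t += dt
        history.append((t, total))
    return history
```

Knuth's method is adequate only for small per-step means, which is the usual situation when dt is short; for large means, exp(-mean) underflows and a different Poisson sampler would be needed.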
The set of adjustable parameters input to SoftRel is called the "model";
the set of event status monitors that describes the evolving process at any
given time is called the set of "facts". The "model" and "facts"
structures are defined so as to accommodate multiple categories, or
classes, of events in the subprocesses of the overall reliability process,
with each "model"-"facts" pair representing a separate class of events.
Because event processes are usually assumed to be independent, the same
simulation technique could be applied to all classes simultaneously, with
a separate processor running the same algorithms for each class. On a
single processor, the algorithms could instead be applied to each class in
turn, interleaved in time, or run entirely separately. In entirely separate
executions, the sets of results would later be merged into a proper time
sequence.
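When the classes are run entirely separately, the merge step amounts to a k-way merge of time-sorted event logs. The Python sketch below assumes that each run produces a list of (time, class, event) records already sorted by time; the record layout, the class labels, and the name merge_runs are illustrative assumptions.

```python
import heapq


def merge_runs(*runs):
    """Merge per-class event logs into one time-ordered sequence.

    Each run is a list of (time, event_class, event) records already sorted
    by time. heapq.merge performs a lazy k-way merge; records with equal
    times keep the order in which their runs were passed.
    """
    return list(heapq.merge(*runs, key=lambda record: record[0]))


# Two hypothetical single-class runs, merged into one timeline.
run_a = [(1.0, "A", "failure"), (3.0, "A", "repair")]
run_b = [(2.0, "B", "failure"), (3.0, "B", "failure")]
timeline = merge_runs(run_a, run_b)
```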
For simplicity, in its initial form, the simulator reported here only
accommodates a single category of events for each of the reliability
subprocesses. Separate runs using different "model" parameters can be
later merged to simulate performance of a single process that has multiple
failure categories, if desired. Extension of SoftRel to accommodate the
more general case is not conceptually difficult, but has not yet been
undertaken. Later versions may include multiple failure categories,
should this feature prove beneficial.
SoftRel simulates two types of failure events, namely, defects in
specification documents and faults in code, both considered to be in
the same seriousness category, as reflected by the single set of "model"
parameters. As an aside, we note that the seriousness category is often
indicated by the probabilities of observation and outage, and the lengths
of outages: a process with these quantities high will have highly visible
and abortive failures, whereas when these probabilities are low, the
process will have rarely noticed, inconsequential failures.
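As a rough illustration of how these three quantities distinguish seriousness categories, the sketch below computes expected consequences for a given number of failures. It treats prob_observation and prob_outage as independent per-failure probabilities, which is an assumption of this sketch rather than a statement of how SoftRel combines them; the function name failure_impact is hypothetical.

```python
def failure_impact(failures, prob_observation, prob_outage, outage_time_per_failure):
    """Expected consequences of a number of failures (hypothetical helper).

    Returns (expected observed failures, expected outages, expected outage
    days), treating each failure independently with the given probabilities.
    """
    observed = failures * prob_observation
    outages = failures * prob_outage
    outage_days = outages * outage_time_per_failure
    return observed, outages, outage_days


# A "highly visible, abortive" category versus a "rarely noticed" one.
severe = failure_impact(100, 0.95, 0.5, 2.0)
benign = failure_impact(100, 0.05, 0.01, 0.1)
```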
The "documentation" currently simulated by SoftRel consists only
of requirements, design, interface specifications, and other entities whose
absence or defective nature can beget faults in subsequently produced code.
Integration and test procedures, management plans, and other ancillary
documentation, when deemed not to correlate directly with fault generation,
are excluded. The presumption is that the likelihood of a fault
at any given time increases in proportion to the amount of documentation
missing or in error.
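One way to express this presumption numerically uses the "model" parameters faults_per_unit and miss_doc_fault_rate together with the "facts" DU_t and DU. In the sketch below, the missing-documentation fraction is assumed to be 1 - DU_t/DU, and documentation that is present but defective is not modeled; both the formula and the function name fault_generation_rate are assumptions made for illustration.

```python
def fault_generation_rate(code_units_per_day, DU_t, DU,
                          faults_per_unit, miss_doc_fault_rate):
    """Expected faults generated per day of coding (hypothetical helper).

    The baseline faults_per_unit is raised in proportion to the fraction of
    the documentation goal DU not yet built (DU_t units built so far).
    Defective-but-present documentation is ignored in this sketch.
    """
    missing_fraction = max(0.0, 1.0 - DU_t / DU) if DU > 0 else 0.0
    faults_per_code_unit = faults_per_unit + miss_doc_fault_rate * missing_fraction
    return code_units_per_day * faults_per_code_unit
```

Setting miss_doc_fault_rate to zero yields a fault rate that ignores documentation status entirely.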
SoftRel does not currently simulate the propagation of missing and
defective requirements into missing and defective design and interface
specifications; both requirements analysis and design activities are
currently combined in the document construction and integration
phases. All defects arise in proportion to one of the following: the
amount of new and reused documentation; the amount that was changed,
deleted, or added; or the number of defects that were reworked.
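The proportionalities just listed can be summarized as an expected-defect tally over the corresponding "model" rates and "facts" quantities. The sketch below holds both in plain dictionaries keyed by the names used in this section, and takes the "facts" count d (defects corrected) as the number reworked. It computes a cumulative expectation rather than the per-step random draws a simulator would make; the function name and the sample values are hypothetical.

```python
def expected_doc_defects(facts, model):
    """Expected documentation defects inserted so far (hypothetical helper)."""
    return (model["defects_per_unit"] * facts["DU_n"]        # new documentation
            + model["reuse_defect_rate"] * facts["DU_r"]     # indigenous, reused
            + model["del_defect_rate"] * facts["DU_rd"]      # reused units deleted
            + model["add_defect_rate"] * facts["DU_ra"]      # units added to reuse
            + model["chg_defect_rate"] * facts["DU_rc"]      # reused units changed
            + model["new_defects_per_fix"] * facts["d"])     # corrections made


# Illustrative values only.
facts = {"DU_n": 100, "DU_r": 200, "DU_rd": 10, "DU_ra": 20, "DU_rc": 30, "d": 5}
model = {"defects_per_unit": 0.1, "reuse_defect_rate": 0.01,
         "del_defect_rate": 0.05, "add_defect_rate": 0.1,
         "chg_defect_rate": 0.1, "new_defects_per_fix": 0.2}
total = expected_doc_defects(facts, model)   # 10 + 2 + 0.5 + 2 + 3 + 1
```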
The Simulation Input Parameters
The reliability process in SoftRel is fairly comprehensive with
respect to what really transpires during software development.
Mirroring that process in a simulator therefore requires a large number
of parameters relating to the ways in which people and processes interact.
The SoftRel "model" parameters are the following:
Model parameters (fixed per execution):
dt simulation time increment, days
workday_fraction average calendar days per day worked
doc_new_size new documentation units
doc_reuse_base reused documentation units
doc_reuse_deleted reused documentation units deleted
doc_reuse_added documentation units added to reuse base
doc_reuse_changed documentation units changed in reuse
doc_build_rate new documentation units/workday
doc_reuse_acq_rate reused documentation acquisition units/workday
doc_reuse_del_rate reused documentation deletion units/workday
doc_reuse_add_rate reused documentation addition units/workday
doc_reuse_chg_rate reused documentation changed units/workday
defects_per_unit defects generated/new documentation unit
reuse_defect_rate reused documentation indigenous defects/unit
del_defect_rate defects inserted/deleted reused unit
add_defect_rate defects inserted/addition to reused unit
chg_defect_rate defects inserted/changed reused unit
hazard_per_defect documentation hazard units added or removed per defect
new_doc_inspect_frac fraction of new documentation inspected
reuse_doc_inspect_frac fraction of reuse documentation inspected
insp_doc_units_per_workday inspected documentation units/workday
inspection_limit relative number of defects that can be removed by inspection
find_rate_per_defect rate of defect discovery per hazard unit per documentation unit
defect_fix_rate corrected documentation defects/workday
defect_fix_adequacy true documentation fixes/correction
new_defects_per_fix defects created/correction
doc_del_per_defect documentation units deleted/correction
doc_add_per_defect documentation units added/correction
doc_chg_per_defect documentation units changed/correction
code_new_size new code units
code_reuse_base reused code units
code_reuse_deleted reused code units deleted
code_reuse_added code units added to reuse base
code_reuse_changed code units otherwise changed in reuse base
code_build_rate new code units/workday
code_reuse_acq_rate reused code acquired, units/workday
code_reuse_del_rate reused code deletions, units/workday
code_reuse_add_rate reused code additions, units/workday
code_reuse_chg_rate reused code changed, units/workday
faults_per_unit faults generated/code unit
reuse_fault_rate indigenous reused code faults/code unit
del_fault_rate faults inserted/deleted code unit
add_fault_rate faults inserted/added code unit
chg_fault_rate faults inserted/changed code unit
faults_per_defect number of code faults/defect
miss_doc_fault_rate faults/code unit generated per missing documentation fraction
hazard_per_fault code hazard units added or removed per fault
new_code_inspect_frac fraction of new code inspected
reuse_code_inspect_frac fraction of reused code inspected
insp_code_units_per_workday inspected code units/workday
find_rate_per_fault fraction of faults detected per inspected unit
fault_fix_rate code faults "corrected"/workday
fault_fix_adequacy true fault fixes/"correction"
new_faults_per_fix faults created/"correction"
code_del_per_fault code units deleted per fault "correction"
code_add_per_fault code units added per fault "correction"
code_chg_per_fault code units changed per fault "correction"
tests_gen_per_workday test cases/workday
tests_used_per_unit test cases used/resource unit
failure_rate_per_fault failures/resource unit/fault density
miss_code_fail_rate failures per resource unit per missing code fraction
prob_observation probability that failure is observed
prob_outage probability that a failure causes outage
outage_time_per_failure delay caused by failure, days
analysis_rate failures analyzed/workday
analysis_adequacy faults recognized/fault analyzed
repair_rate fault "repairs"/workday
repair_adequacy true repairs/"repair"
new_faults_per_repair faults created/"repair"
validation_rate "repairs" validated/workday
find_rate_per_fix detected bad repairs/repair validation
retest_rate retested faults/resource unit
retest_adequacy detected bad repairs/retest/unrepaired fault
schedule list of schedule_item packets: (t_begin, t_end, event, staff, resources)*
When the work effort expended by an activity is needed, it may be
computed by using the instantaneous staffing, or work force,
function s(alpha, t) defined for each such activity alpha over the time
periods of applicability. The corresponding work effort w(alpha, T)
over a time interval (0, T), for example, is
$$w(\alpha, T) = \int_0^T s(\alpha, t)\,dt$$
In SoftRel, s(alpha, t) is coded as "staffing(A, p, M)", where "A" is
the activity, "p" points to a "facts" structure, and "M" points to a
"model".
Similarly, if CPU time, or another computer resource, is
required in calculating the event-rate functions above, it is found
through the conversion function q(alpha, t), which is defined for each
activity alpha as the CPU or resource utilization per wall-clock day.
The CPU resource usage over the time interval (0, T), for activity
alpha, for example, is
$$T_{\mathrm{cpu}}(\alpha, T) = \int_0^T q(\alpha, t)\,dt$$
The function q(alpha, t) in SoftRel appears as "resource(A, p, M)",
with the same arguments as "staffing", above.
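Because SoftRel advances time in steps of dt, both integrals above can be approximated by sums over those steps. The sketch below uses a left-endpoint Riemann sum; the staffing and resource profiles are made-up examples, and the names accumulate, example_staffing, and example_resource are hypothetical stand-ins rather than SoftRel's "staffing" and "resource" functions.

```python
def accumulate(rate_fn, T, dt):
    """Left-endpoint Riemann sum of rate_fn(t) over (0, T) in steps of dt.

    Assumes T is (close to) a whole multiple of dt.
    """
    steps = round(T / dt)
    return sum(rate_fn(k * dt) for k in range(steps)) * dt


def example_staffing(t):
    """Hypothetical s(alpha, t): 2 persons for the first 5 days, then 6."""
    return 2.0 if t < 5.0 else 6.0


def example_resource(t):
    """Hypothetical q(alpha, t): a constant 0.25 resource units per day."""
    return 0.25


work_effort = accumulate(example_staffing, 10.0, 1.0)   # w(alpha, 10), person-days
cpu_usage = accumulate(example_resource, 8.0, 1.0)      # T_cpu(alpha, 8)
```

A smaller dt tracks abrupt staffing changes more closely, at the cost of more steps per simulated day.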
The number of wall-clock days may be interpreted either as
literal calendar days, or as actual workdays. These alternatives are
selected by setting the "model" parameter "workday_fraction".
A value of unity signifies that time and effort accumulate on the basis
of one workday effort per schedule day per individual. A value of
5/7 means that work effort and resource utilization accumulate on the
average only during 5 of the 7 days of the week. A value of 230/365
denotes that 230 actual workdays span 365 calendar days. These
compensations are made in the "staffing" and "resource" functions, above.
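The arithmetic behind these compensations is a single multiplication. The sketch below (function name hypothetical) scales a constant staff level by workday_fraction to obtain effort in person-workdays, and reproduces the 5/7 and 230/365 cases described above.

```python
def effort_workdays(staff, calendar_days, workday_fraction):
    """Person-workdays accrued by `staff` persons over `calendar_days` days.

    workday_fraction = 1 counts every schedule day as a workday;
    5/7 counts weekdays only, on average; 230/365 counts 230 workdays per year.
    """
    return staff * calendar_days * workday_fraction


annual = effort_workdays(10, 365, 230 / 365)   # about 2300 person-workdays
weekly = effort_workdays(1, 7, 5 / 7)          # about 5 person-workdays
literal = effort_workdays(3, 20, 1.0)          # 60 person-workdays
```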
Activities of the life cycle are controlled by the staffing function.
No progress in an activity takes place unless it has an allocated work
force. When staffing is nonzero, event rates involve s(alpha, t) wherever
they depend on work effort, and q(alpha, t) wherever they depend on CPU
usage.
Staffing and computer resource allocations in the "model"
are made via the "schedule" list of "schedule item" packets,
each of which contains
activity    index of the work activity
t_begin     beginning time of the activity, days
t_end       ending time of the activity, days
staffing    staff level of the activity, persons
cpu         resources available, units per day
next        pointer to next "schedule item" packet
The entire list is merged by the staffing and resource-utilization
functions, s and q, or "staffing" and "cpu" in the program, to provide
scheduled workforce and computer resources at each instant of time
throughout the process. Both "staffing" and "cpu" express resource
units per project day. If the schedule quintuples include weekends,
holidays, and vacations, then staff and resource values must be
compensated so that the integrated staff and resources over the project
schedule equal the allocated total effort and resource values. This is
done via the parameter "workday_fraction" discussed above.
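The sketch below mirrors the "schedule item" packet as a Python dataclass linked through next, and shows one plausible way to merge the list into s and q: sum the staffing or cpu values of every packet for the requested activity whose interval [t_begin, t_end) contains t, then scale by workday_fraction. Summing overlapping packets, the half-open interval, and the (activity, t, schedule, workday_fraction) argument order are assumptions of this sketch; SoftRel's own functions take (A, p, M) instead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScheduleItem:
    """One "schedule item" packet, linked through `next`."""
    activity: int       # index of the work activity
    t_begin: float      # beginning time of the activity, days
    t_end: float        # ending time of the activity, days
    staffing: float     # staff level, persons
    cpu: float          # resources available, units per day
    next: Optional["ScheduleItem"] = None


def _scheduled(activity, t, schedule, field, workday_fraction):
    """Sum `field` over packets for `activity` whose interval contains t."""
    total = 0.0
    item = schedule
    while item is not None:
        if item.activity == activity and item.t_begin <= t < item.t_end:
            total += getattr(item, field)
        item = item.next
    return total * workday_fraction


def staffing(activity, t, schedule, workday_fraction):
    """Hypothetical s(alpha, t): compensated staff level at time t."""
    return _scheduled(activity, t, schedule, "staffing", workday_fraction)


def cpu(activity, t, schedule, workday_fraction):
    """Hypothetical q(alpha, t): compensated resource rate at time t."""
    return _scheduled(activity, t, schedule, "cpu", workday_fraction)


# Activity 0: 3 persons on days 0-10, plus 2 more persons on days 5-20.
plan = ScheduleItem(0, 0.0, 10.0, 3.0, 1.0,
                    ScheduleItem(0, 5.0, 20.0, 2.0, 0.5))
```

For example, staffing(0, 7.0, plan, 1.0) returns 5.0, since both packets cover day 7, while any time outside every packet yields zero staff and therefore no progress in that activity.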
Event Status Monitors: Output
The event status indicators of interest during the reliability process,
or "facts", are the following time-dependent values:
Project Status ("facts" output for each dt iteration):
active Project is active if true, else completed
DU Total documentation units goal
DU_t Total number of documentation units built
DU_n New documentation units
DU_r Acquired reused documentation units
DU_rd Reused documentation deleted units
DU_ra Reused documentation additional units
DU_rc Reused documentation changed units
E_d Human errors putting defects in all documentation
E_dn Human errors putting defects in new documentation
E_dr Human errors putting defects in reused documentation
DH Total documentation hazard
DH_n Hazard in new documentation
DH_r Hazard in reused documentation
DI_t Inspected portion of all documentation
DI_n Inspected portion of new documentation
DI_r Inspected portion of reused documentation
D Documentation defects detected
d Documentation defects corrected
CU Total code units goal
CU_t Total code units built
CU_n New code units
CU_r Acquired reused code units
CU_rd Reused code deleted units
CU_ra Reused code additional units
CU_rc Reused code changed units
E_f Human errors putting faults in all code
E_fn Human errors putting faults in new code
E_fr Human errors putting faults in reused code
CH Total code hazard
CH_n New code hazard
CH_r Reused code hazard
CI_t Inspected portion of all code
CI_n Inspected portion of new code
CI_r Inspected portion of reused code
e Code faults detected in inspection
h Code inspection faults corrected (healed)
C Test cases prepared
c Test cases expended
F Failures encountered during testing
A Failures analyzed for faults
f Faults isolated by testing
w Faults needing rework, revalidation, etc.
u Number of faulty repairs
R Number of fault repairs undertaken
V Validations conducted of fault repairs
RT Retests conducted
r Faults actually repaired
rr Faults re-repaired
outage Total outage time due to failure
t Current accumulated time
T[N] Time by activity array
W[N] Work effort by activity array
cpu[N] CPU/resource usage by activity array
"Documentation units" and "code units" are typically counted in
pages of specifications and lines of source code, but other conventions
are acceptable, provided that rate functions and parameters of the
"model" are consistently defined.
The general status "facts" in the list above are defined as follows:
t         Current time
T[i]      Cumulative time consumed by activity i
W[i]      Cumulative work effort consumed by activity i
cpu[i]    Cumulative CPU or other computer resource consumed by activity i
outage    Total outage time due to failure
active    Boolean indicator, true if the process has not yet terminated
Note that the time-related facts above, which measure time in days, are
expressed as elapsed wall-clock time. Conversions to effort in workdays
and to CPU (or other) computer resource utilization in resource-days are
"model"-related and were addressed previously.