Software Development
What are the high-level concepts that people
think about when they are developing software?
Software Engineering
Software Engineering is concerned with constructing software systems that are:
- Correct, meaning they do exactly what they were intended to do;
- Reliable, meaning they don't crash;
- Robust, meaning they can handle unforeseen or error conditions by logging alerts, cleaning up, and proceeding if possible;
- Predictable, meaning their behavior is never shocking (without good reason);
- Efficient, meaning they do not take too long to run, nor consume more memory or system resources than is reasonable;
- Understandable, meaning that you can look at their source code and tell what they do;
- Reusable, meaning the components from which they are built can function unchanged in other applications;
- Scalable, meaning that new features can be added without excessive restructuring of existing code, and without impacting its run-time performance;
- Maintainable, meaning that bugs can easily be isolated and fixed;
- Appropriate, meaning they do what their users want them to do; and
- Economical, meaning they are produced on-time and under-budget, and are fairly priced.
Exercise: Think up a couple more criteria.
Software engineering is fundamentally different from other engineering
disciplines, argues Jack Reeves in these
three
excellent essays.
Exercise: Write a review of these essays, arguing whether you agree or disagree with his thesis "the code is the design".
Why Study Software Engineering?
According to Booch,
"industrial-strength" software is inherently complex.
Steve McConnell calls it a
wicked problem.
No single person can understand all the subtleties of the design of a
large software system. Why?
- Requirements are under- or over-specified, contradictory,
too unintelligible, always changing.
- Development process involves too many people,
different machines, excessive documentation.
- Software has unlimited flexibility. (A carpenter wouldn't truck
in a freshly cut tree and set up a lumber mill on your front
lawn to make you a new front door, but many programmers are
guilty of writing their own linked-list classes.)
- Discrete systems are hard to characterize (combinatorial
explosion in number of states).
To construct complex software we should try to understand
the nature of complex systems in general, and see how we
deal with them.
Complex Systems
An empirical study of various complex systems (e.g., matter,
personal computers, plants and animals, social institutions)
reveals:
- Hierarchy: different levels of abstraction are built
upon each other.
- Strict separation of concerns between levels.
- Little or no centralization: the high level functionality
appears due to a cooperation of agents
(so concurrency is fundamental).
- Strong intracomponent linkages (strong cohesion) and
weak intercomponent linkages (loose coupling).
- Economy of Expression: similar building blocks
in different levels (e.g. cells, transistors, quarks/leptons,
people).
- As systems evolve, objects that were once considered
complex become the primitive components of the next
generation system.
How Humans Deal With Complexity
We manage complexity using abstraction, classification, and hierarchy.
- ABSTRACTION
Recognition of fundamental concepts, structures and
behaviors, without concern for implementation details
- CLASSIFICATION
Recognition that every object is an instance of
some class
- HIERARCHY
Distillation of essential similarities and differences
- Generalization / Specialization:
("is-a", "kind-of")
- Composition:
("has-a", "integral-part-of")
- Aggregation:
("member-of")
Abstraction
Abstraction is probably the single most important principle in
Computer Science. It is the primary way humans deal with complexity,
and software systems are humankind's most complex creations.
You have to be able to view software components in an abstract way,
that is, you have to be able to describe what they do, without relying
on describing how they do it.
Examples
- Driving a car: you don't have to know how internal combustion,
fuel cells, or batteries work in order to drive.
- Using a microwave: you don't have to know the physics to cook;
- Talking on a phone: you don't have to know how your voice
is encoded or how calls are routed to communicate.
- Setting a thermostat: you don't have to know what
gets the AC or heater to fire up in order to set a temperature.
- Playing videos: you don't need to know how
the content is downloaded, buffered, and turned into frames for viewing.
Exercise: Give a few more examples.
Primary types of abstraction in programming
- Procedural Abstraction: You call a function with a
known interface but you do not know or care how it runs
(what code implements it).
- Data Abstraction: You declare objects of a known
type, but you do not know or care how those objects
are laid out in memory, nor how the operations that
manipulate them work.
Examples of Data Abstraction
- Integers can be represented in 1's-complement, 2's-complement,
BCD, ...
- There are many different formats for floating point values
- There are different file systems, e.g. NTFS, xfs, e2fs,
and zillions more
- Lists can use array or linked structures
Classification
Classification is identifying that a number of objects have similar
structure and behavior and giving a name to that class of objects.
For example "Dog" is a class and your particular dog is an object
of that class.
Hierarchy
Hierarchy is how levels of abstraction are organized.
Is-a (Kind-of) Hierarchy
Shows classes and subclasses. Moving up is called
generalization (recognizing that different classes
share some similarities), moving down is called
specialization (factoring a class into subclasses
which are different from each other in some ways).
Exercise: Give some more examples.
Has-a (Part-of) Hierarchy
Shows classes in a containment hierarchy. Moving up is called
composition (combining parts to form larger objects),
moving down is called decomposition (breaking a larger
structure down into components). In a composition relationship,
contained objects are completely owned by the container: if the
containing object goes away, so does the containee.
Member-of (Aggregation) Hierarchy
Shows classes related by groups and subgroups. This
is very similar to composition except that the members of
a group continue to exist even if the group goes away.
Software Development Methodologies
Many software development methodologies, or
processes,
have been created.
Usually they fall into a spectrum from adaptive to
plan-driven. Many of the more adaptive are known as
agile
methods. The most plan-driven method is probably the
waterfall model.
This is pretty much despised as a way to build software, because it
doesn't work. It does work in heavy manufacturing and similar industries,
though.
Some methodologies:
- Rational Unified Process (RUP)
- Extreme Programming (XP)
- Scrum
- Agile Unified Process (AUP)
- Open Unified Process (OpenUP)
- DSDM
Boehm and Turner
give a great characterization of "home grounds" for adaptive and
plan-driven methods (loosely summarized here):
Adaptive | Plan-driven
|
---|
Goal is to respond quickly to change | Goals include predictability and stability
|
Smaller teams, more senior developers | Larger teams, more juniors
|
Tacit interpersonal knowledge | Explicit documentation
|
User stories | Explicit, formal, detailed, requirements
|
Simple designs | Detailed and extensive designs
|
Culture "thriving on chaos" | Culture "thriving on order"
|
Exercise: Write a three to five page paper on Agile software development.
Phases, Iterations, and Workflows
Software Development is an incremental and iterative process (waterfall
doesn't work for software). You
iterate because coding might show part of the design was infeasible,
maintenance requires recoding, the customer will change requirements
just when the product is about to be shipped, etc.
The major elements of a development cycle are phases,
iterations and workflows. This diagram shows
the four major phases (inception, elaboration, construction
and transition). Within each phase you do a number of iterations.
An iteration results in the development of a complete,
executable subset of the system. The diagram (from
the RUP) shows how much effort within a given workflow you put into
an iteration.
Current Research
- Physical Distribution
- Concurrency
- Replication
- Security
- Load Balancing
- Fault Tolerance
- Grid Computing
- Data Science (including Analytics)
- Search and Data Mining
- Robotics
Technologies that Help
- Components
- Visual Programming
- Patterns
- Frameworks
- Application Containers
- Aspect Orientation
Producing Efficient Software
An efficient algorithm minimizes cost which is one or
more of:
- Time, which is affected by algorithmic issues like the
could be number of operations, or physical issues like
communication delays between network nodes, between
processors, between the CPU and a graphics chip, between the CPU
and its cache or memory, etc. (In more technical terms we want
to minimize network hops, page faults, context switches, and
so on).
- Space, which refers to the amount of memory or disk space required
to "hold intermediate results".
- Coding time, which goes way up if you're not experienced
or are unaware of existing libraries and solutions that
you can adapt.
- Verification and debugging time, which goes up when the code
is hard to read or understand)
- System integration effort, which is too high when there
are artificially too many little pieces of code written
by too many different people.
Caskey's Law of Software Development
"A good system must first and foremost be easy to modify
and extend." —Caskey Dickson