Introduction to Distributed Programming

Remember that old phrase “The Network Is The Computer”?

Definitions

Distributed Computing is computing on a distributed system.

A Distributed System is a system of computers communicating via messages over a network so as to cooperate on a task or tasks. There’s no physical shared memory in a distributed system, though algorithms can simulate such a thing.
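
As a minimal sketch of this definition, the following Python program runs two “nodes” that share no variables and cooperate purely by exchanging messages. Several things here are assumptions made for illustration and are not part of these notes: the nodes are threads on one machine joined by socket.socketpair() rather than separate computers, the messages are newline-delimited JSON, and the names worker_node and coordinator_node are made up.

```python
import json
import socket
import threading


def send_msg(sock, payload):
    """Serialize a dict as one line of JSON and send it."""
    sock.sendall(json.dumps(payload).encode("utf-8") + b"\n")


def recv_msg(reader):
    """Read one line of JSON from a file-like socket reader."""
    line = reader.readline()
    return json.loads(line) if line else None


def worker_node(sock):
    """Hypothetical worker: answers 'sum' requests until told to stop."""
    with sock, sock.makefile("rb") as reader:
        while True:
            msg = recv_msg(reader)
            if msg is None or msg["type"] == "stop":
                break
            send_msg(sock, {"type": "result", "value": sum(msg["numbers"])})


def coordinator_node(chunks):
    """Hypothetical coordinator: farms out chunks, then combines replies."""
    mine, theirs = socket.socketpair()
    worker = threading.Thread(target=worker_node, args=(theirs,))
    worker.start()
    total = 0
    with mine, mine.makefile("rb") as reader:
        for chunk in chunks:
            send_msg(mine, {"type": "sum", "numbers": chunk})
            total += recv_msg(reader)["value"]
        send_msg(mine, {"type": "stop"})
    worker.join()
    return total


if __name__ == "__main__":
    print(coordinator_node([[1, 2, 3], [4, 5], [6]]))
```

The coordinator never reads the worker’s variables; everything it learns arrives in a message, which is the property that separates a distributed system from a shared-memory one.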

Areas of Study

Here are a few areas we’ll cover:

Networks and Internets
Distributed Algorithms and Transactions
Paradigms
Enterprise Computing
Grid Computing
Programming Languages

Networks and Internets

This is a big topic, so there’s a separate page of notes.

Distributed Algorithms

Distributed algorithms are designed for programming distributed systems. Unlike a centralized algorithm, a process running a distributed algorithm has no access to a global state or a global time frame; it knows only its own local state and whatever it learns from messages. When designing such algorithms we have to consider:

Modeling: transition systems, state diagrams, temporal logic
Communication, Timing, and Synchronization
Routing
Virtual Circuits vs. Packet Switching
Paradigms: wave algorithms, traversal algorithms, election algorithms, snapshot algorithms
Transactions
Distributed Termination Detection
Distributed Deadlock Detection
Distributed Failure Detection
Fault Tolerance
Stabilization
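
Because no process can read a global clock, distributed algorithms commonly order events with logical time instead. As one illustration, here is a minimal sketch of Lamport’s logical clocks. The technique is not named in the list above, and the class and method names are assumptions chosen for this example.

```python
class LamportClock:
    """A per-process logical clock (Lamport timestamps)."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Advance the clock for a local event and return the new time."""
        self.time += 1
        return self.time

    def send(self):
        """Sending is an event too: tick, then stamp the message."""
        return self.tick()

    def receive(self, message_time):
        """Merge the sender's timestamp, then count the receive event."""
        self.time = max(self.time, message_time) + 1
        return self.time


# Two processes, each with only its own local clock.
p, q = LamportClock(), LamportClock()

p.tick()                    # local event at p: p.time == 1
stamp = p.send()            # p sends a message stamped 2
q.tick()                    # unrelated local event at q: q.time == 1
arrival = q.receive(stamp)  # q jumps past the sender's stamp: q.time == 3
```

A receive is always stamped later than the matching send, so causally related events are ordered consistently even though neither process ever sees the other’s clock.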

Distributed Computing Paradigms

Paradigms are the different approaches to structuring and programming distributed systems. Here are a few:

Client-server
Multi-tier
Peer-to-peer
Publish/subscribe
RPC
Distributed Objects
Object Spaces
Mobile Agents
Network Services
Groupware
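
To make one of these paradigms concrete, here is a minimal RPC sketch using Python’s standard xmlrpc modules. Server and client run in a single process on the loopback interface only so that the example is self-contained; the add procedure and the helper names start_server and call_remote_add are illustrative assumptions.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def add(a, b):
    """The procedure the server exposes to remote callers."""
    return a + b


def start_server():
    """Start an XML-RPC server on a free local port; return (server, url)."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(add, "add")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    return server, f"http://{host}:{port}/"


def call_remote_add(a, b):
    """Client side: the remote call looks like an ordinary method call."""
    server, url = start_server()
    try:
        with ServerProxy(url) as proxy:
            return proxy.add(a, b)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(call_remote_add(2, 3))
```

The point of the paradigm is visible in proxy.add(a, b): the caller writes what looks like a local call, while the arguments and result actually travel as messages over HTTP.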

Exercise: Research these paradigms. Write a survey paper covering all of these, and any more you find. Provide examples, comparisons, and lots of references. Make the paper of publishable quality.

Enterprise Computing

Enterprise applications are applications that run on large servers with multiple (simultaneous) users communicating over a network via clients like web browsers, PDAs, cell phones, or desktop applications. These applications generally read from and write to big databases.

Some people say enterprise applications are only for business functions (accounting, customer management, product tracking, etc.); some say any big distributed application counts as “enterprise”.

Enterprise Computing Platforms

In the old days, “big systems” like IMS and CICS were run on big mainframes, often programmed in COBOL. In the 1990s and early 2000s a couple of new big players emerged: Java EE and .NET, with these features:

Java EE	.NET
Runs on a JVM	Runs on the CLR (Common Language Runtime)
From Sun	From Microsoft
Fully implemented on many operating systems	Fully implemented on Windows; partially implemented on other operating systems
Maintained and enhanced by the Java Community Process (comprised of hundreds of companies and organizations)	Microsoft-maintained and enhanced
Source code for the entire framework freely available	Some source code is proprietary
Mature	Mature
Kind of a standard	Kind of a marketing strategy; however, some "components" are official standards (e.g. C#)

Enterprise Architectures

In the old days, and today for the most trivial of applications, we see client-server organizations.

Two-tier architectures are almost always way too fragile. They soon gave way to three-tier architectures, in which a middle tier sits between the front end and the database.

The idea here is that any one of the three layers can be completely re-implemented without affecting the others.

The middle layer completely isolates the front end from any knowledge of the database. The UI doesn’t even know what the data source is. It just makes calls like fetchCustomerById(24337).
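
The isolation described above can be sketched in a few lines of Python. Everything except the idea of a fetch-customer-by-id call is an assumption made for illustration: the class names, the in-memory stand-in for the database, and the sample customer record are all hypothetical.

```python
from abc import ABC, abstractmethod


class CustomerStore(ABC):
    """Data tier interface: the only thing the middle tier depends on."""

    @abstractmethod
    def load(self, customer_id):
        """Return a customer record as a dict, or None if absent."""


class InMemoryStore(CustomerStore):
    """Stand-in data tier; a real one might wrap a SQL database."""

    def __init__(self, rows):
        self._rows = dict(rows)

    def load(self, customer_id):
        return self._rows.get(customer_id)


class CustomerService:
    """Middle tier: business rules live here, not in the UI."""

    def __init__(self, store):
        self._store = store

    def fetch_customer_by_id(self, customer_id):
        record = self._store.load(customer_id)
        if record is None:
            raise LookupError(f"no customer with id {customer_id}")
        return record


def render_customer(service, customer_id):
    """Presentation tier: knows the service, never the data source."""
    customer = service.fetch_customer_by_id(customer_id)
    return f"Customer #{customer_id}: {customer['name']}"


# Hypothetical sample data, just to exercise the three tiers.
service = CustomerService(InMemoryStore({24337: {"name": "Ada Example"}}))
print(render_customer(service, 24337))
```

Swapping InMemoryStore for a database-backed class would change neither CustomerService nor render_customer, which is the re-implementation freedom the three-tier design is after.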

Software running in the middle tier is called middleware. Middleware products are also called containers, since they host and manage the business objects. They can manage lifecycles, transactions, memory, authentication, concurrency, distribution, security, sessions, resource pooling, logging and lots of other “system-level plumbing things” so developers only have to concentrate on business logic.
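
As a rough sketch of what “the container handles the plumbing” means, the decorator below wraps a business function with logging and transaction demarcation. This is not the API of any real middleware product: FakeTransaction only records the calls made to it, and managed and transfer are hypothetical names.

```python
import functools
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("container")


class FakeTransaction:
    """Hypothetical transaction that just records what happened to it."""

    def __init__(self):
        self.events = []

    def begin(self):
        self.events.append("begin")

    def commit(self):
        self.events.append("commit")

    def rollback(self):
        self.events.append("rollback")


def managed(func):
    """Container-style wrapper: logging plus transaction demarcation."""

    @functools.wraps(func)
    def wrapper(tx, *args, **kwargs):
        log.info("calling %s", func.__name__)
        tx.begin()
        try:
            result = func(tx, *args, **kwargs)
        except Exception:
            tx.rollback()
            raise
        tx.commit()
        return result

    return wrapper


@managed
def transfer(tx, accounts, source, target, amount):
    """Business logic only: no logging or transaction code in sight."""
    if accounts[source] < amount:
        raise ValueError("insufficient funds")
    accounts[source] -= amount
    accounts[target] += amount
    return accounts
```

A call such as transfer(FakeTransaction(), {"a": 100, "b": 0}, "a", "b", 30) commits on success, while a failing call rolls back and re-raises, all without the business function mentioning transactions.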

There’s no need to stop at three tiers. You’ll often hear the term n-tier.

Sometimes applications are classified by the complexity of the client:

Thick Client	Thin Client
Customized client application	Client probably just a web browser
Probably a rich GUI	Probably a weak GUI, but new technologies (e.g. Ajax) helping a lot!
Runs on a desktop (but could be delivered via WebStart)	Can make use of a web container’s database pooling and other helpful offerings
In two-tier architecture, may have embedded database calls	In two-tier architecture, might have database calls embedded in a web page
In two-tier architecture, has too much business logic	

Grid Computing

The term grid computing refers to running highly compute-intensive jobs (protein folding, SETI, earthquake simulation, climate modeling) on many computers spread across administrative domains. Most of the computers run similar code; they’re all contributing bits toward the overall solution.
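
The “similar code, partial results” pattern can be sketched as follows. This is only an illustration under loud assumptions: threads in a ThreadPoolExecutor stand in for grid nodes (a real grid would ship work units to machines in different organizations), and the prime-counting workload and all function names are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def is_prime(n):
    """Trial division; slow on purpose, it stands in for heavy computation."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def count_primes(lo, hi):
    """The code every node runs: count primes in the range [lo, hi)."""
    return sum(1 for n in range(lo, hi) if is_prime(n))


def split(limit, parts):
    """Cut [0, limit) into at most `parts` contiguous work units."""
    step = max(1, -(-limit // parts))  # ceiling division, never zero
    return [(lo, min(lo + step, limit)) for lo in range(0, limit, step)]


def grid_count_primes(limit, nodes=4):
    """Hand the work units to the nodes and add up the partial counts."""
    units = split(limit, nodes)
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        partials = pool.map(lambda unit: count_primes(*unit), units)
        return sum(partials)


if __name__ == "__main__":
    print(grid_count_primes(100))
```

Each work unit is independent of the others, so the only coordination needed is handing out ranges and summing the partial counts at the end.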

More at Wikipedia’s article on Grid Computing.

Exercise: Research and write about the differences between grid computing and cluster-based computing. In particular, explain what makes a cluster different from a grid.

Language-Specific Distributed Programming

This 1989 paper by Henri Bal, Jennifer Steiner, and Andrew Tanenbaum, “Programming Languages for Distributed Computing Systems,” covers dozens of languages. A good read!

Here is how they categorize the covered languages:

TODO TABLE

Naturally, a lot of new languages have arrived since then!

Exercise: Read the paper, then answer whether you think any new categories for language classification have arisen since 1989. If so, what are they? Then for languages created since that time that you know, try to fit them into the original (or your modified) classification scheme.

Mainstream languages in wide use tend to provide support for distributed programming both within the language itself and in libraries. Here is where to look:

Language	Distributed Programming Support
Erlang	Built-in support for distributed programming, fault tolerance, and concurrency. TODO links
Ada	Has built-in support for distributed systems, including tasking and real-time features. TODO links - distributed programming annex
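Perl	Has modules like Parallel::ForkManager (process-level parallelism on one machine) and POE (event-driven networking) that can serve as building blocks for distributed programs. TODO links - e.g., perlnet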
Go	Built-in support for concurrency and networking, making it suitable for distributed applications. TODO links
Scala	Often used with frameworks like Akka for building distributed systems. TODO links
Elixir	Built on the Erlang VM, it inherits Erlang's capabilities for distributed programming. TODO links
Java	With libraries like RMI (Remote Method Invocation) and frameworks like Spring, it supports distributed programming. TODO links
Python	Has libraries like Pyro and Dask for distributed computing. TODO links
Rust	Has libraries like Tokio and Actix for building distributed systems. TODO links
C	While not inherently distributed, it can be used with libraries like MPI (Message Passing Interface) for distributed computing. TODO links
C++	Can be used with libraries like Boost.Asio and ZeroMQ for distributed applications. TODO links
Ruby	Has libraries like DRb (Distributed Ruby) for building distributed applications. TODO links
JavaScript	With Node.js and libraries like Socket.io, it supports real-time distributed applications. TODO links
C#	With libraries like WCF (Windows Communication Foundation) and ASP.NET Core, it supports distributed programming. TODO links
PHP	Can be used with libraries like Ratchet for real-time distributed applications. TODO links
Swift	With libraries like SwiftNIO, it supports building distributed applications. TODO links
Kotlin	With libraries like Ktor and coroutines, it supports building distributed applications. TODO links
Haskell	With libraries like Cloud Haskell, it supports distributed programming. TODO links
Julia	With libraries like Distributed.jl, it supports parallel and distributed computing. TODO links

Summary

We’ve covered:

What distributed computing is
Areas of study
Concerns of distributed algorithms
Paradigms for distributed computing
Enterprise computing
Grid computing
Distributed programming in various languages