Goal: Provide a layer of abstraction for process-to-process communication that enables a process on one system to invoke a function on another system without dealing with the problems of formatting data and parsing messages.
Therefore never send to know for whom the bell tolls; it tolls for thee.
– John Donne, Meditation XVII.
Introduction
Sockets are a core part of client-server networking. They provide a mechanism for a program to establish a connection to another program on a remote or local machine and send messages back and forth. Sockets are also the standard kernel interface for communication across machines.
However, sockets force us to design distributed applications using a read/write interface, which is not how we generally think about application design.
In centralized applications, the procedure call is the standard interface model. It allows for the creation of components with functional interfaces. If we want to make distributed computing look like centralized computing, I/O-based communication is not the way to accomplish this.
In 1984, Andrew Birrell and Bruce Nelson at Xerox PARC devised a mechanism to allow programs to call procedures on other machines. A process on machine A can call a procedure on machine B. When it does so, the process on A is suspended and execution continues on B. When B returns, the return value is passed back to A and A continues execution.
This mechanism is called the remote procedure call, or RPC. To the programmer, it appears as if a normal procedure call is taking place. Obviously, a remote procedure call is different from a local one in the underlying implementation, but the programming model hides those differences.
How Local Procedure Calls Work
Consider how local procedure calls are implemented. Every processor provides some form of a call instruction, which pushes the address of the next instruction onto the stack and transfers control to the specified address. When the called procedure finishes, it issues a return instruction, which pops the address from the stack and transfers control there.
The compiler generates code to evaluate each parameter, push its value onto the stack, and then issue the call. In the called function, the compiler ensures that any registers that may be overwritten are saved, allocates stack space for local variables, and restores registers and the stack pointer before returning.
None of this makes sense if we want to call a procedure loaded on a remote machine. The compiler must do something different to provide the illusion of calling a remote procedure.
Most programming languages do not provide a way to tag a function as being remote. Instead, remote procedure calls are usually implemented by user-space libraries and toolchains (often with generated stubs). The key point is that, unlike sockets, which are an operating system construct provided by the kernel, RPC is implemented at the user level rather than by the kernel.
RPCs provide the illusion of calling a procedure on a remote machine. During this time, execution of the local thread stops until results are returned. The programmer is freed from packaging data, sending and receiving messages, and parsing results.
Steps in a Remote Procedure Call
The illusion of a remote procedure call is accomplished by generating stub functions. On the client side, the stub (often called a proxy) is a function with the same interface as the desired remote procedure. Its job is to take the parameters, package them into a network message, send them to the server, await a reply, and then extract the results and return them to the caller.
The client feels like it’s calling a local procedure because it is: it’s just calling the proxy.
On the server side, the stub (known as a skeleton) acts as the main program: it registers the service and awaits incoming requests. It extracts the data from each request, calls the user’s procedure, and packages the results into a network message sent back to the client.
The user’s procedure on the server isn’t aware that it’s being called by the skeleton instead of the client application.
The process of packaging data into a network message is called marshalling. Marshalling requires serializing all data elements into a flat byte array suitable for transmission. The terms marshalling and serialization are often used interchangeably in the RPC context, though serialization more broadly refers to any conversion of data to bytes, while marshalling specifically implies preparing data for a remote procedure call, which may include additional metadata like function identifiers or version numbers.
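To make the stub’s job concrete, here is a minimal hand-written client stub in Java. Everything about it is illustrative: the procedure ID, the wire format (a function number followed by two integers), and the connection-per-call design are assumptions for this sketch, not how any particular framework works.

import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.Socket;

// Hypothetical client stub for a remote add(a, b) procedure.
// The caller sees an ordinary method; all networking is hidden inside.
public class CalculatorStub {
    private final String host;
    private final int port;

    public CalculatorStub(String host, int port) {
        this.host = host;
        this.port = port;
    }

    public int add(int a, int b) throws IOException {
        try (Socket socket = new Socket(host, port);
             DataOutputStream out = new DataOutputStream(socket.getOutputStream());
             DataInputStream in = new DataInputStream(socket.getInputStream())) {
            // Marshal: a function identifier followed by the two parameters.
            out.writeInt(1);   // assumed ID for the "add" procedure
            out.writeInt(a);
            out.writeInt(b);
            out.flush();
            // Block until the reply arrives, then unmarshal the result.
            return in.readInt();
        }
    }
}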
Stub functions can be generated in two ways:
- Traditional RPC systems use static stub generation, where an RPC compiler reads an interface definition and generates code for stubs before the program is compiled.
- Some languages use dynamic stub generation, where the language runtime creates proxy objects at runtime. This relies on reflection, a language feature that allows a program to examine its own structure (such as what methods an interface defines) and create new behavior at runtime. Java RMI[2] (since version 1.5, using java.lang.reflect.Proxy) and Python frameworks like RPyC use dynamic approaches, eliminating the need for a separate compilation step. Ruby, JavaScript, and C#/.NET also support this. Languages like Rust, Go, C, and C++ do not.
The steps in executing a remote procedure call are:
- The client calls a local procedure, the client stub. To the client process, this appears to be the actual procedure. The client stub marshals the parameters into one or more network messages.
- Network messages are sent by the client stub to the remote system via a system call to the local kernel using socket interfaces.
- Network messages are transferred by the kernel to the remote system via some protocol, either connectionless or connection-oriented.
- A server stub receives the messages on the server. It unmarshals the arguments from the messages and, if necessary, converts them from a standard network format to a machine-specific form.
- The server stub calls the server function, passing it the arguments received from the client.
- When the server function finishes, it returns to the server stub with its return values.
- The server stub takes the return values and marshals them into one or more network messages to send to the client stub.
- Messages are sent back across the network to the client stub.
- The client stub reads the messages from the local kernel.
- The client stub returns the results to the client function, converting them from network representation to local format if necessary.
The client code then continues execution.
Note that basic RPC is synchronous: the client blocks, waiting for the server to respond before continuing. This matches the behavior of local procedure calls but can be problematic when calling slow services.
Some RPC frameworks offer asynchronous variants where the client continues execution immediately and handles the response later through callbacks or futures. However, the synchronous model remains the default because it is simpler to reason about.
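As an illustrative sketch, an asynchronous variant can be layered on top of a synchronous stub by running the blocking call on another thread and returning a future. The CalculatorStub class here is the hypothetical stub sketched earlier, not part of any real framework.

import java.util.concurrent.CompletableFuture;

// Minimal sketch of an asynchronous variant: the call returns immediately
// with a future; the actual blocking RPC runs on a background thread.
public class AsyncCalculator {
    private final CalculatorStub stub;

    public AsyncCalculator(CalculatorStub stub) {
        this.stub = stub;
    }

    public CompletableFuture<Integer> addAsync(int a, int b) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                return stub.add(a, b);   // blocking call, off the caller's thread
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });
    }
}

A caller can then continue working and attach a callback, for example addAsync(2, 3).thenAccept(sum -> System.out.println(sum)).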
Why Use RPC?
The major benefits of RPC are twofold. First, the programmer can use procedure call semantics to invoke remote functions and get responses. Second, writing distributed applications is simplified because RPC hides all network code inside stub functions. Application programs do not have to worry about sockets, port numbers, and data conversion.
Additional advantages include:
Dynamic port allocation: In many RPC systems, you do not need to pick a unique port number. The server can bind to any available port and register that port with a discovery mechanism. The client uses this mechanism to find the port for a given program. All this can be invisible to the programmer.
Service discovery: Applications on the client can locate services without hard-coding network addresses. Various discovery mechanisms exist, from simple name servers to DNS-based discovery to sophisticated service mesh infrastructure.
Choice of transport protocol: Some RPC systems allow you to choose whether communications will take place over TCP or UDP and may allow you to select the protocol at runtime. It’s just a matter of generating stubs for each protocol.
Procedural interface: The function call model can be used instead of the send/receive interface provided by sockets. Users do not have to deal with marshalling parameters and parsing them out on the other side.
Note that not all RPC systems provide all these benefits. Some frameworks are tightly bound to specific transport protocols (HTTP only, for example), and some require explicit configuration of ports and addresses. The specific features depend on the RPC framework being used.
Challenges in Implementing RPC
There are several hurdles to overcome in implementing remote procedure calls.
Parameter Passing
Most parameters in programs are passed by value. That is easy to do remotely: just send the data in a network message. Some parameters, however, are passed by reference. A reference is a memory address. The problem is that memory is local, and a memory address passed from a client to a server will refer to memory on the server rather than the contents on the client.
This is particularly problematic for pointer-based data structures. Consider a linked list, a tree, or a graph. These structures use pointers to connect nodes[1]. You cannot simply send the pointer values because they are meaningless on the remote machine. Instead, the entire data structure must be traversed, serialized into a pointerless format, transmitted, and reconstructed on the remote side.
There is no elegant solution to the pass-by-reference problem. The common approach is copy-restore (also called copy-in/copy-out): send the referenced data to the remote side, where it will be placed in temporary memory. A local reference can then be passed to the server function. Because the contents might have been modified by the function, the data must be sent back to the calling client and copied back to its original location.
The cost of copy-restore can be substantial. It might not be much for small strings, but for a large data structure, such as a tree with millions of nodes, the entire structure must be serialized, transmitted, and deserialized, even if the remote function examines only a small portion of it. This overhead is one reason why RPC interfaces are often designed differently from local interfaces, favoring smaller, self-contained parameters over large pointer-based structures.
Data Representation
All data sent needs to be represented as a series of bytes placed into network messages. Not only must any data structure be sent in a serialized format with no pointers, but the format must be standardized between the client and server so that the server can make sense of the data it receives.
Different processors and languages may use different conventions for integer sizes, floating point sizes and formats, byte ordering, and alignment of data.
Byte ordering refers to how multi-byte values are stored in memory: big-endian systems store the most significant byte first, while little-endian systems store the least significant byte first. The number 0x1234 would be stored as bytes 12 34 on a big-endian system but as 34 12 on a little-endian system.
Intel/AMD processors use little-endian; network protocols traditionally use big-endian (sometimes called “network byte order”). ARM processors can be configured at boot time for big-endian or little-endian operation.
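A short Java demonstration of the two byte orders, using ByteBuffer, which defaults to big-endian (network byte order):

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndianDemo {
    public static void main(String[] args) {
        // Big-endian (network byte order): most significant byte first.
        ByteBuffer big = ByteBuffer.allocate(2).order(ByteOrder.BIG_ENDIAN);
        big.putShort((short) 0x1234);
        System.out.printf("big-endian:    %02x %02x%n", big.get(0), big.get(1));       // 12 34

        // Little-endian (Intel/AMD): least significant byte first.
        ByteBuffer little = ByteBuffer.allocate(2).order(ByteOrder.LITTLE_ENDIAN);
        little.putShort((short) 0x1234);
        System.out.printf("little-endian: %02x %02x%n", little.get(0), little.get(1)); // 34 12
    }
}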
Generating Stubs
Since most languages do not support remote procedure calls natively, something has to generate client and server stubs.
Traditional RPC systems use a standalone program known as an RPC compiler or protocol compiler. The RPC compiler takes an interface specification as input and generates client-side stubs (proxies) and a server-side skeleton. The interface specification is written in an interface definition language (IDL) and defines remote classes, methods, and data structures.
Service Discovery
A client process needs to find out how to set up a network connection to the appropriate service: what host and port to use. Several mechanisms exist for service discovery:
Name servers: A dedicated service that maps service names or identifiers to network addresses. Examples include Sun’s portmapper (rpcbind on Linux and macOS) and Java’s rmiregistry. When a service starts, it registers with the name server. Clients query the name server to find services.
DNS-based discovery: Services can be registered in DNS, either using special record types or by naming convention. This leverages existing DNS infrastructure but updates propagate slowly due to caching.
DNS-based discovery is particularly common in container orchestration systems. In Kubernetes, for example, each Service automatically gets a DNS name following the pattern service-name.namespace.svc.cluster.local. Code can connect to a stable name such as payment.production.svc.cluster.local, and the platform routes the request to one of the running service instances. The application code needs only the service name; all routing happens transparently through DNS.
Configuration: In simpler deployments, service addresses may be specified in configuration files or environment variables. This is straightforward but requires manual updates when services move.
Service meshes: In some cloud environments, a service mesh is infrastructure that handles communication between services. The application simply connects to a logical service name; the mesh handles finding an actual server, load balancing across multiple instances, retrying failed requests, and collecting metrics. Examples include Istio and Linkerd. The application code does not need to implement any of these concerns.
Security
RPC calls often traverse networks where messages could be intercepted or modified. Two concerns arise:
Authentication: How does the server know the client is who it claims to be? And how does the client know it is talking to the legitimate server? Solutions range from shared secrets to certificate-based authentication.
Encryption: How do we prevent eavesdroppers from reading message contents? Instead of defining custom encryption and authentication frameworks, modern RPC systems typically use TLS (Transport Layer Security, the same protocol used to secure HTTP traffic in HTTPS) to encrypt all traffic. TLS also provides authentication through certificates.
Most modern RPC frameworks either require or strongly encourage TLS. In internal networks, service meshes often handle TLS automatically, encrypting all service-to-service traffic without application changes.
Dealing with Failures
Distributed systems must handle partial failure. Local procedure calls have no notion of partial failure: if the machine crashes, the entire process dies, and if the procedure has a bug and crashes, the process dies as well.
With remote calls, however, problems can arise. The server can stop working or network connectivity may break or experience unexpected delays, preventing or delaying requests from reaching the server or responses from reaching the client.
Timeouts and Deadlines
Every RPC call should have a limit on how long to wait for a response. Without limits, a hung server or network partition can cause clients to wait indefinitely, consuming resources, hurting usability, and potentially causing cascading failures.
A timeout specifies a duration: “wait at most 5 seconds.”
A deadline specifies an absolute point in time: “this request must complete by 10:30:00 UTC.” Both limit waiting, but deadlines have an advantage in systems with layers of services.
Consider a request that passes through services A, B, and C. If each service sets a 5-second timeout, the total wait could be 15 seconds. Worse, if Service A spends 4 seconds, Service B might still wait its full 5 seconds even though the original caller’s patience has likely expired.
With deadline propagation, the original deadline is passed along with each call. Service A might receive a deadline of “10:30:05.” After spending 2 seconds on its work, it calls Service B with the same deadline. Service B knows it has only 3 seconds remaining. If Service B’s work takes 2.5 seconds, it can call Service C with only 0.5 seconds remaining. Service C, seeing insufficient time, can immediately return an error rather than starting work it cannot complete. This “fail fast” behavior avoids wasted effort.
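Here is a minimal sketch of deadline checking in Java: the handler computes the time remaining from an absolute deadline and fails fast if none is left. The method names and the downstream call are illustrative, not from any particular framework.

import java.time.Duration;
import java.time.Instant;

// Illustrative deadline propagation: each service checks the time remaining
// before doing work or making a downstream call.
public class DeadlineExample {
    static String handleRequest(Instant deadline) {
        Duration remaining = Duration.between(Instant.now(), deadline);
        if (remaining.isNegative() || remaining.isZero()) {
            // Fail fast: not enough time left to do useful work.
            throw new IllegalStateException("deadline exceeded");
        }
        // ... do local work, then pass the SAME absolute deadline downstream
        // so each hop knows how much time is truly left.
        return callDownstream(deadline);
    }

    static String callDownstream(Instant deadline) {
        return "ok";  // placeholder for a real RPC that forwards the deadline
    }
}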
Keep in mind that each service must handle the possibility of a remote procedure failing. If services A and B involve making updates that assume service C succeeds, they will have to undo those changes if C fails. We will discuss this topic when we explore distributed transactions.
Cancellation
When a deadline expires or a client gives up, in-progress work should ideally stop. Cancellation signals to the server that results are no longer needed. Without cancellation, servers may continue expensive computations for abandoned requests, wasting resources.
Implementing cancellation requires cooperation: the server must periodically check whether the request has been cancelled and abort gracefully. Many RPC frameworks provide built-in cancellation support, passing cancellation signals alongside the request so servers can check them.
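A sketch of cooperative cancellation, assuming the framework exposes a flag the handler can poll between units of work (the flag-passing mechanism here is illustrative):

import java.util.concurrent.atomic.AtomicBoolean;

// The RPC framework would set this flag when the client cancels or the
// deadline expires; the handler checks it between chunks of work.
public class CancellableHandler {
    static long expensiveComputation(int iterations, AtomicBoolean cancelled) {
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            if (cancelled.get()) {
                // Abort gracefully instead of finishing work nobody wants.
                throw new RuntimeException("request cancelled");
            }
            total += i;  // stand-in for a unit of real work
        }
        return total;
    }
}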
RPC Semantics
To deal with failures, RPC libraries may attempt to retransmit requests if a response is not received in time. This may have the side effect of invoking a procedure more than once if the network is slow.
Functions that can be run multiple times without undesirable side effects are called idempotent functions. Examples include retrieving the contents of a shopping cart or setting a user’s name to a specific value.
Functions with undesirable side effects if run multiple times are called non-idempotent functions. An example is transferring $500 from checking to savings or adding an item to a shopping cart.
Most RPC systems offer one of two semantics:
- At-least-once semantics: A remote procedure will be executed one or more times if there are network delays. The RPC library may resend the request if it does not receive a timely response.
- At-most-once semantics: The RPC system tries to ensure that the server executes the procedure no more than once, typically by tagging requests with unique IDs and suppressing duplicates. Retransmissions may still occur, but duplicates should not be re-executed unless the server loses its deduplication state (for example, after a crash). The procedure will execute zero or one times.
Local procedure calls have exactly-once semantics. Achieving exactly-once for remote calls is extremely difficult because you cannot distinguish “the server never received the request” from “the server processed it, but the response was lost.”
Retries and Backoff
Transient failures are common in distributed systems. Retries allow clients to survive temporary problems by resending the same request.
However, immediate aggressive retries can make problems worse by increasing load on struggling services. Exponential backoff increases the wait time between subsequent retries (e.g., 100ms, 200ms, 400ms, 800ms), keeping load on the backend manageable.
Jitter adds randomness to backoff times, preventing synchronized retry storms where many clients retry at exactly the same time after a shared failure.
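A sketch of retrying with exponential backoff and full jitter (each sleep is a random duration up to the current backoff); the base delay, cap, and attempt limit are arbitrary choices:

import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

public class RetryWithBackoff {
    // Retries a call up to maxAttempts times, doubling the backoff after each
    // failure and sleeping a random duration up to that backoff ("full jitter").
    static <T> T callWithRetry(Callable<T> call, int maxAttempts) throws Exception {
        long backoffMs = 100;  // 100ms, 200ms, 400ms, 800ms, ...
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts) throw e;
                long sleep = ThreadLocalRandom.current().nextLong(backoffMs + 1);
                Thread.sleep(sleep);
                backoffMs = Math.min(backoffMs * 2, 10_000);  // cap the backoff
            }
        }
    }
}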
Idempotency Keys
For non-idempotent operations that must be retryable, idempotency keys provide a solution. The client generates a unique identifier (typically a UUID) for each logical request. If a retry occurs, the same key is sent.
The server stores results keyed by this identifier and returns the cached result for duplicate requests, avoiding re-executing the operation.
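A minimal sketch of the server side, using an in-memory map as the result cache; the map and the Supplier-based operation are illustrative simplifications:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Sketch: cache results by idempotency key so a retried request returns
// the original result instead of re-executing the operation.
public class IdempotentServer {
    private final Map<String, Object> results = new ConcurrentHashMap<>();

    public Object execute(String idempotencyKey, Supplier<Object> operation) {
        // computeIfAbsent runs the operation only for keys not seen before;
        // duplicate requests (retries) get the cached result.
        return results.computeIfAbsent(idempotencyKey, k -> operation.get());
    }
}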
Keep in mind the design challenges in implementing this. For example:
- How many past keys and results do you store, realizing that some might be large?
- How do you ensure the identifiers are never reused, even across systems?
- Can you recognize that a request is old even if it’s no longer stored? If so, do you fail it or execute it again?
- Do you cache keys and results in memory? What if the system crashes and reboots?
Circuit Breakers
When a service fails repeatedly, continuing to send requests wastes resources and may delay recovery. A circuit breaker monitors failure rates and “trips” when failures exceed a threshold, immediately failing subsequent requests without attempting the call.
After a cooling-off period, the circuit breaker allows test requests through. If these succeed, normal operation resumes.
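A minimal circuit-breaker sketch with the usual three states (closed, open, half-open) follows; the failure threshold and cooling-off period are arbitrary, and production libraries add considerably more nuance:

import java.util.concurrent.Callable;

// Minimal circuit breaker: trips OPEN after consecutive failures, and after
// a cooling-off period lets one trial request through (HALF_OPEN).
public class CircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0;
    private final int failureThreshold = 5;
    private final long coolOffMs = 30_000;

    public synchronized <T> T call(Callable<T> request) throws Exception {
        if (state == State.OPEN) {
            if (System.currentTimeMillis() - openedAt >= coolOffMs) {
                state = State.HALF_OPEN;          // allow a trial request
            } else {
                throw new IllegalStateException("circuit open: failing fast");
            }
        }
        try {
            T result = request.call();
            consecutiveFailures = 0;              // success: close the circuit
            state = State.CLOSED;
            return result;
        } catch (Exception e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;               // trip (or re-trip) the breaker
                openedAt = System.currentTimeMillis();
            }
            throw e;
        }
    }
}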
Observability
Debugging failures in distributed systems is challenging because a single request may traverse multiple services. Observability refers to the ability to understand system behavior through external outputs: logs, metrics, and traces.
A key technique is request IDs (also called correlation IDs or trace IDs). When a request enters the system, a unique identifier is generated and propagated through all subsequent service calls. This allows for reconstructing the complete path of a request when investigating problems.
We will discuss observability more in the Web Services and gRPC notes.
Early RPC Systems
Understanding early RPC systems provides context for why modern systems evolved as they did. These systems solved important design problems and established the fundamental patterns while revealing limitations that later systems addressed.
ONC RPC
Sun’s RPC, formally called Open Network Computing (ONC) RPC, was one of the first RPC systems to achieve widespread use, thanks to the early popularity of Sun workstations and the Network File System (NFS). It remains in use on UNIX-derived systems, including Linux, macOS, and various flavors of BSD.
ONC RPC introduced several concepts that became standard:
- Interface definition: The rpcgen compiler takes an IDL file and generates client and server stubs
- Program numbers: A 32-bit identifier for each service interface, registered with a name server called the portmapper (rpcbind on Linux and macOS systems)
- Versioning: Different versions of an interface can coexist, allowing gradual client migration
- XDR (eXternal Data Representation) encoding: A canonical binary format for data representation using big-endian byte order and 4-byte alignment
The main limitations were the need to manually choose unique program numbers (risking collisions) and the single canonical data format (requiring conversion even when communicating between identical architectures).
DCE RPC
The Distributed Computing Environment RPC, defined by the Open Group, addressed some ONC RPC limitations:
- UUIDs: 128-bit Universal Unique Identifiers replaced manually-chosen program numbers, eliminating collision risk. UUIDs can be generated independently by any system with negligible probability of collision[3].
- Cells: Administrative groupings of machines with a cell directory server, enabling location transparency. Clients need not know which machine hosts a service; they can contact a cell directory server to find out.
- Receiver-makes-right data representation: NDR (Network Data Representation) allows the sender to transmit data in its native format. The receiver converts only if necessary, avoiding unnecessary conversion between machines with identical architectures.
Microsoft DCOM/COM+
DCE RPC became the foundation for Microsoft’s RPC implementation. Microsoft licensed DCE RPC in the early 1990s and adapted it for Windows. This formed the basis for DCOM (Distributed Component Object Model), which extended Microsoft’s COM (Component Object Model) to work across networks.
DCOM later evolved into COM+, which added services like transactions, security, and object pooling. We discuss DCOM’s approach to distributed objects in more detail later, as it introduced important concepts for managing object identity and lifecycle in distributed systems.
Data Serialization
Serialization is the process of converting data into a pointerless format: an array of bytes suitable for transmission over a network or storage in a file. The serialized format must be standardized between sender and receiver so both can make sense of the data.
Remote machines may have different byte ordering, different sizes for integers and other types, different floating-point representations, different character sets, and different alignment requirements. Network protocols standardized on big-endian byte ordering, but that only covers protocol headers, not application data. RPC systems need more general solutions for serializing arbitrary data structures.
There are two approaches to encoding data:
Implicit typing: Only values are transmitted, not data types or parameter information. The remote side must know the precise expected sequence of parameters. This can be highly efficient but does not easily allow for optional parameters or parameters that may hold different types (polymorphism).
Explicit typing: Type information is transmitted with each value. JSON and XML formats are examples of this. Self-describing data is larger but more flexible.
Schemas
A schema is a formal description of data structure. It specifies what fields exist, what type each field has (integer, string, array, etc.), whether fields are required or optional, and any constraints on values. Think of a schema as a blueprint or contract that defines the shape of your data.
For example, a schema for a person record might specify:
- A required string field called “name”
- A required integer field called “id”
- An optional string field called “email”
- A repeated field called “phone_numbers” containing strings
Schemas serve several purposes in remote procedure calls (and distributed systems in general):
Validation: Before processing data, you can check whether it conforms to the expected schema. If a client sends a message missing a required field or with the wrong type, the error can be caught immediately rather than causing mysterious failures later.
Documentation: A schema formally documents what data a service expects and produces. Developers can read the schema to understand the API without examining source code.
Code generation: Tools can read schemas and automatically generate code for serializing and deserializing data. This eliminates hand-written parsing code and ensures type safety at compile time.
Evolution: When schemas change over time (adding new fields, deprecating old ones), well-designed schema systems provide rules for maintaining compatibility between old and new versions. This is called schema evolution.
An IDL (interface definition language) defines the procedures, interfaces, and types for RPC services, while a schema defines only data structures and their field encodings. RPC systems embed a schema within the IDL.
Versioning and Evolution
In real-world systems, interfaces change over time. New features require new parameters; old features become obsolete. However, you cannot assume all clients will upgrade simultaneously. At any given time, clients running different software versions may be calling the same server, or updated clients may need to work with older servers.
Versioning addresses this by explicitly identifying interface versions. A server might support versions 1, 2, and 3 of an API, handling requests appropriately based on the version the client specifies. This allows gradual migration: new clients can use version 3 features while old clients continue using version 1.
Schema evolution provides finer-grained compatibility. Well-designed serialization formats like Protocol Buffers allow adding new optional fields without breaking existing clients. The key goals are:
- Backward compatibility: New code can read data written by old code
- Forward compatibility: Old code can read data written by new code (ignoring unknown fields)
Achieving both typically requires that new fields be optional and have default values, and that field identifiers (like Protocol Buffers tags) are never reused.
For remote procedure calls, it’s far more likely that a service will be upgraded while some clients will run old versions. Thus, it’s important for a service to handle old versions of requests.
In the general case, since schemas refer to data formats, they are also used to store data (e.g., archive it in a file system or object store). In this case, it’s possible that updated clients will retrieve old formats of data.
Text-Based Serialization Formats
XML (eXtensible Markup Language)
XML is a verbose but human-readable format using tags to describe data structure. It is self-describing, meaning the data includes field names. XML can be paired with XML Schema Definition (XSD) files that formally specify the expected structure, enabling automatic validation. However, XML is extremely verbose and slow to parse, making it unsuitable for high-performance applications.
<ShoppingCart>
  <Items>
    <Item>
      <ItemID>00120</ItemID>
      <Name>Back Scratcher</Name>
      <Price>5.99</Price>
    </Item>
  </Items>
</ShoppingCart>
JSON (JavaScript Object Notation)
JSON emerged as a lightweight alternative to XML. It is human-readable, self-describing, language-independent, and easier to parse than XML. JSON has become the dominant format for web APIs due to its simplicity and wide support.
{
  "items": [
    {
      "item_id": 120,
      "name": "Back Scratcher",
      "price": 5.99
    }
  ]
}
JSON’s limitations include no distinction between integers and floating-point numbers (both are just “numbers”), no native support for binary data (binary must be encoded as text, typically using Base64, which represents binary bytes as printable ASCII characters), and no built-in schema enforcement. Any JSON parser will accept any valid JSON document, regardless of whether it matches what your application expects. External tools like JSON Schema can add validation, but this is not part of JSON itself.
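For instance, shipping binary data in a JSON document requires an extra encoding step, shown here with Java’s standard Base64 support (the "payload" field name is arbitrary):

import java.util.Base64;

public class BinaryInJson {
    public static void main(String[] args) {
        byte[] binary = {(byte) 0x00, (byte) 0xff, (byte) 0x7f};
        // Base64 turns arbitrary bytes into printable ASCII,
        // growing the data by roughly a third.
        String encoded = Base64.getEncoder().encodeToString(binary);
        System.out.println("{\"payload\": \"" + encoded + "\"}");
        // The receiver decodes the text field back into the original bytes.
        byte[] decoded = Base64.getDecoder().decode(encoded);
        System.out.println(decoded.length + " bytes recovered");
    }
}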
Binary Serialization Formats
For high-performance applications, binary serialization provides significant advantages in size and parsing speed. While early RPC systems used formats like XDR and NDR (discussed earlier), modern systems use more sophisticated formats designed for schema evolution.
Protocol Buffers
Protocol Buffers (often called protobuf) is a binary serialization format originally developed at Google and now widely used. It uses a schema definition language to define message types, and a compiler generates data access classes for the target programming language.
syntax = "proto3";

message Person {
  string name = 1;
  int32 id = 2;
  string email = 3;

  message PhoneNumber {
    string number = 1;
    PhoneType type = 2;
  }

  repeated PhoneNumber phones = 4;
}

enum PhoneType {
  MOBILE = 0;
  HOME = 1;
  WORK = 2;
}
Each field has a unique numeric tag (the numbers after the equals sign) that identifies it in the binary format. In proto3 (the current version), fields may be omitted and will take default values; the repeated keyword indicates a field can appear multiple times (like an array). If you need presence tracking for scalar fields, proto3 supports an explicit optional qualifier. This design enables schema evolution: new fields can be added with new tags, and old code simply ignores fields it does not recognize.
Compared with XML or JSON, Protocol Buffers is usually more compact on the wire and faster to parse for large volumes of structured data. Benchmarks typically show it to be 3 to 10 times smaller than equivalent JSON and 10 to 100 times faster to parse, though the exact numbers depend on the specific workload and implementation.
Apache Avro
Avro was developed within the Apache Hadoop project as an alternative with different tradeoffs. Avro schemas are defined in JSON, and the schema is stored alongside the data, making the format self-describing.
{
  "type": "record",
  "name": "Person",
  "fields": [
    {"name": "userName", "type": "string"},
    {"name": "favouriteNumber", "type": ["null", "long"]},
    {"name": "interests", "type": {"type": "array", "items": "string"}}
  ]
}
Some key differences between Avro and Protocol Buffers:
- Avro matches fields by name during schema resolution, while Protocol Buffers matches by numeric tag. This means Avro field names matter for compatibility, while Protocol Buffers field names are just for code readability.
- The schema must be present to read Avro data. This enables tools to process Avro files without pre-compiled code, since the schema tells them how to interpret the bytes. Protocol Buffers requires generating code from the schema before you can read data.
- Avro is widely used in big data ecosystems (Hadoop, Kafka, Spark) where data is stored long-term and schemas evolve over time.
Protocol Buffers is generally preferred for RPC due to slightly better performance and simpler tooling, while Avro is preferred for data storage and streaming platforms.
Cap’n Proto and Flatbuffers
While Protocol Buffers is regarded as among the most efficient formats, its creator went on to design a far more efficient format called Cap’n Proto. It doesn’t appear to have gained much adoption yet, but it is used in Sandstorm.io and Cloudflare Workers, and it is supported in a decent number of languages, including C++, C#, Erlang, Go, Haskell, JavaScript, Python, and Rust. You probably won’t use it, but it’s good to know it exists if you find yourself searching for maximum performance.
Another competing high-performance serialization format is FlatBuffers, created at Google and open sourced. Facebook switched from JSON to FlatBuffers for disk storage and communication with Facebook servers.
There are many other formats. As with any software framework, you will have to find one that not only performs well but also supports your languages and systems and is actively maintained.
Distributed Objects
As object-oriented programming became dominant, traditional RPC systems proved inadequate for some object-oriented constructs. Objects have identity, state, and lifecycle, all of which create challenges in distributed systems.
The Object Problem
A remote object is fundamentally different from a remote procedure:
Identity: Each object instance is distinct. When you call a method on object A, you expect it to affect A, not some other object B of the same class. The RPC system must track which specific object instance a call targets.
State: Objects maintain state between method calls. If you instantiate an account object and call account.deposit(100) followed by account.getBalance(), you expect the balance to reflect the deposit. This requires the server to maintain per-object state and route calls to the correct instance.
Lifecycle: Objects are created and destroyed. Unlike stateless procedures that simply exist, objects must be instantiated on demand and cleaned up when no longer needed.
Systems like Microsoft’s DCOM/COM+ and Java RMI addressed these challenges with mechanisms for remote object identity, stateful interaction, and distributed garbage collection.
Microsoft DCOM and COM+
Microsoft’s Distributed Component Object Model (DCOM), released in 1996, extended Microsoft’s Component Object Model (COM) to support remote objects. As mentioned earlier, DCOM built on DCE RPC but added the infrastructure needed for distributed objects.
COM defines a binary interface standard for component interfaces, allowing components written in different languages (C++, Visual Basic, Delphi) to interoperate. DCOM extended this across network boundaries, allowing remote method invocation on objects located on other machines.
DCOM tracked object instances using interface pointer identifiers. When a client obtained a reference to a remote object, the identifier ensured that subsequent method calls targeted that specific instance. The server maintained state for each object instance and routed calls accordingly.
COM introduced the concept of surrogate processes (dllhost.exe), which allowed components to run in a standard Windows-provided host. When a client requested a remote object, Windows could automatically start the surrogate and load the component’s code, eliminating the need for developers to write custom server processes. This provided fault isolation since component crashes affected only the surrogate, not the client.
DCOM faced significant real-world deployment challenges:
- Configuration complexity: Setting up DCOM across firewalls and different Windows security domains required extensive configuration of registry settings, DCOM permissions, and network protocols
- Platform dependence: Despite theoretical cross-platform support, DCOM remained primarily a Windows technology
- Stateful objects: Maintaining server-side object state complicated load balancing and fault tolerance since state was tied to specific server processes
COM+ (introduced with Windows 2000) added enterprise features like transaction support, object pooling, and queued components. However, DCOM’s complexity led Microsoft to embrace web services and eventually .NET technologies. DCOM remains primarily as legacy technology in enterprise Windows environments.
Java RMI
Java Remote Method Invocation (RMI) was introduced in 1997 with Java 1.1 as Java’s native approach to distributed objects. Unlike CORBA or DCOM which attempted cross-language communication, RMI was designed specifically for Java-to-Java communication, allowing it to leverage Java’s type system and garbage collection.
In Java RMI, a remote object is created on the server and implements a remote interface that extends java.rmi.Remote. The server registers this object with an RMI registry. Clients look up the remote object by name in the registry and receive a stub (proxy) that implements the same interface. When the client calls methods on the stub, RMI transparently marshals the arguments, sends them to the server, and returns the results.
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.server.UnicastRemoteObject;

// Remote interface: every remote method must declare RemoteException
public interface Calculator extends Remote {
    int add(int a, int b) throws RemoteException;
}

// Server implementation
public class CalculatorImpl extends UnicastRemoteObject implements Calculator {
    // The superclass constructor exports the object and may throw RemoteException
    public CalculatorImpl() throws RemoteException {
    }

    public int add(int a, int b) throws RemoteException {
        return a + b;
    }
}
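A sketch of how the pieces connect at runtime: the server binds the object in an RMI registry and the client looks it up by name. For brevity, this sketch runs both sides in one process; a real client would call LocateRegistry.getRegistry(host) to reach a remote registry. The service name "CalculatorService" is an arbitrary choice.

import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;

public class RmiDemo {
    public static void main(String[] args) throws Exception {
        // Server side: create a registry and bind the remote object by name.
        Registry registry = LocateRegistry.createRegistry(1099);
        registry.rebind("CalculatorService", new CalculatorImpl());

        // Client side: look up the stub by name and call it like a local object.
        Calculator calc = (Calculator) registry.lookup("CalculatorService");
        System.out.println(calc.add(2, 3));  // RMI marshals the call behind the scenes
    }
}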
RMI uses dynamic stub generation through Java’s reflection capabilities. Starting with Java 1.5, stubs are generated at runtime by the JVM rather than requiring a separate compilation step with the rmic tool. This simplifies development but requires the Java runtime’s reflection support.
RMI handles distributed garbage collection through leases and dirty messages, which we discuss in the next section. Objects are serialized using Java’s native serialization mechanism, which can handle complex object graphs but is Java-specific and relatively inefficient compared to modern binary formats.
Java RMI influenced later distributed object systems but its Java-only nature limited adoption in heterogeneous environments. It remains relevant primarily in legacy enterprise Java applications. Modern Java microservices typically use gRPC or REST instead.
Distributed Garbage Collection
The lifecycle problem is particularly challenging. In a local program, the garbage collector tracks object references and frees memory when objects become unreachable. In a distributed system, references span machine boundaries, and the garbage collector on one machine cannot see references held by another.
Reference counting seems like a solution: the server tracks how many clients hold references to each object, deleting objects when the count reaches zero. Clients send “add reference” messages when they obtain a reference and “release reference” messages when they are done.
This fails for several reasons:
- If a client crashes without sending “release,” the count never reaches zero and the object leaks
- Network messages can be lost, corrupting the count
- Network partitions can make it impossible for clients to send releases
Leases address the crash problem. Instead of permanent references, clients obtain time-limited leases on objects. To keep an object alive, the client must periodically renew the lease by sending a heartbeat or “dirty” message. If the client crashes or loses connectivity, it stops renewing, and the lease eventually expires. The server can then safely delete the object.
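A minimal sketch of lease bookkeeping on the server, assuming a fixed lease duration and a periodic sweep; real systems vary in how leases are granted and renewed:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: the server records a lease expiry per object; clients renew by
// calling renew(), and a periodic sweep deletes objects whose lease lapsed.
public class LeaseTable {
    private static final long LEASE_MS = 60_000;
    private final Map<String, Long> expiry = new ConcurrentHashMap<>();

    public void renew(String objectId) {
        expiry.put(objectId, System.currentTimeMillis() + LEASE_MS);
    }

    public void sweep() {
        long now = System.currentTimeMillis();
        // A crashed client stops renewing, so its objects expire here.
        expiry.values().removeIf(deadline -> deadline < now);
    }
}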
Keep-alive pinging (heartbeats) is a related approach where clients periodically send a list of all objects they are using. The server treats silence as abandonment. Microsoft’s COM+ uses this approach, with clients sending periodic “ping sets” listing active objects.
The concepts are similar but the distinction is that leases use time bounds while keep-alive pinging requires continuous pings. If several consecutive pings are not received, the server assumes the client is dead:
- Lease: “you own this reference until time T, unless you renew”
- Pinging: “I will assume you’re dead if I stop hearing from you”
Java RMI uses lease-based collection with explicit dirty/clean messages. When the first reference to a remote object is made, the client JVM sends a dirty message. As long as local references exist, the client periodically sends dirty messages to renew the lease. When the client’s garbage collector detects no more local references, it sends a clean message.
Microsoft COM+/DCOM uses pinging. The client periodically sends ping messages that contain a list of all the objects (interface IDs) that the client still holds. If the server does not receive pings for several intervals, it assumes the client is gone and releases those references.
Both approaches accept that objects might occasionally be deleted prematurely (if network delays prevent timely renewal) or kept alive slightly longer than necessary. These tradeoffs are generally acceptable given the alternative of memory leaks from crashed clients.
Modern RPC Frameworks
The early RPC systems were designed for local networks with homogeneous systems. As distributed computing moved to the internet with heterogeneous systems, cloud environments, and microservices, new challenges emerged:
- Heterogeneous (polyglot) environments: Services written in different languages need to communicate
- High performance: Microservice architectures multiply the cost of serialization and network overhead
- Streaming: Request-response is insufficient for real-time data feeds or large transfers
- Observability: Complex call graphs require sophisticated debugging tools
Several modern RPC frameworks address these challenges. Unlike REST APIs designed for public web consumption, these frameworks focus on efficient internal service-to-service communication.
gRPC
Developed at Google and released in 2015, gRPC uses Protocol Buffers for serialization and HTTP/2 for transport. HTTP/2 improves on HTTP/1.1 in several ways relevant to RPC: it allows multiple requests and responses to flow over a single TCP connection simultaneously (rather than waiting for each to complete before starting the next), uses a compact binary format for headers, and supports server-initiated messages. These features reduce connection overhead and latency.
gRPC supports four communication patterns:
- Unary (traditional request-response)
- Server streaming (one request, multiple responses)
- Client streaming (multiple requests, one response)
- Bidirectional streaming (multiple requests and responses interleaved)
It also includes built-in support for deadlines, cancellation, and metadata propagation. gRPC has become the dominant choice for internal microservice communication.
Apache Thrift
Originally developed at Facebook, Thrift emphasizes cross-language support with code generators for many languages. Unlike gRPC’s fixed use of HTTP/2 and Protocol Buffers, Thrift supports multiple transports (TCP, HTTP, pipes) and protocols (binary, compact, JSON). This flexibility is useful when different parts of a system have different requirements.
Smithy
Smithy is Amazon’s interface definition language for service APIs, used internally and for AWS service definitions. Rather than defining a wire protocol, Smithy focuses on API modeling: describing operations, inputs, outputs, and errors in a way that can generate clients for multiple protocols (REST, RPC, etc.).
Finagle
Developed at Twitter, Finagle is a protocol-agnostic RPC system for the JVM. It emphasizes composability: services are built as chains of filters that can add concerns like retries, timeouts, tracing, and load balancing. This modular approach allows sophisticated behavior without modifying application code.
These modern systems share common themes: strong typing through IDLs, efficient binary serialization, support for streaming, and built-in observability. They are primarily designed for internal service-to-service communication rather than public web APIs, where REST over HTTP typically remains the standard due to its simplicity and universal client support.
Summary
Remote procedure calls transform network communication from a low-level message-passing model to a familiar procedural interface. The key concepts include:
- Stubs and skeletons provide the illusion of local function calls, generated either statically from IDL or dynamically through reflection
- Marshalling and serialization convert data for network transmission, handling differences in data representation (including byte ordering) across systems
- Schemas define data structure and enable validation, code generation, and evolution
- Versioning allows interfaces to change while maintaining compatibility with existing clients
- Service discovery enables clients to locate services through name servers, DNS, configuration, or service meshes
- Security through TLS provides encryption and authentication for RPC traffic
- Synchronous execution is the default, though asynchronous variants exist for non-blocking calls
- Failure handling through timeouts, deadlines, cancellation, and retries is essential for reliability
- Idempotency determines whether operations are safe to retry
- Distributed garbage collection through leases and heartbeats manages remote object lifecycles
- Observability through request IDs and tracing enables debugging of distributed systems
- Modern frameworks like gRPC add streaming, efficient binary protocols, and built-in operational features
Understanding these mechanisms and their trade-offs is essential for building reliable distributed systems, whether using traditional RPC frameworks or modern systems like gRPC.
[1] Note that even if a language doesn’t support pointers, they may be used behind the scenes. For example, references to Java objects are simply pointers.
[2] Java RMI stands for Remote Method Invocation, Java’s name for their RPC framework.
[3] UUIDs are created from a link-layer network address, timestamp, and sequence number; see RFC 9562.