--- /dev/null
+<html>
+ <body>
+ <h1>libvirt RPC infrastructure</h1>
+
+ <ul id="toc"></ul>
+
+ <p>
+ libvirt includes a basic protocol and code to implement
+ an extensible, secure client/server RPC service. This was
+ originally designed for communication between the libvirt
+ client library and the libvirtd daemon, but the code is
+ now isolated to allow reuse in other areas of libvirt code.
+ This document provides an overview of the protocol and
+ structure / operation of the internal RPC library APIs.
+ </p>
+
+
+ <h2><a name="protocol">RPC protocol</a></h2>
+
+ <p>
+ libvirt uses a simple, variable length, packet based RPC protocol.
+ All structured data within packets is encoded using the
+ <a href="http://en.wikipedia.org/wiki/External_Data_Representation">XDR standard</a>
+ as currently defined by <a href="https://tools.ietf.org/html/rfc4506">RFC 4506</a>.
+ On any connection running the RPC protocol, there can be multiple
+ programs active, each supporting one or more versions. A program
+ defines a set of procedures that it supports. The procedures can
+ support call+reply method invocation, asynchronous events,
+ and generic data streams. Method invocations can be overlapped,
+ so waiting for a reply to one will not block the receipt of the
+ reply to another outstanding method. The protocol was loosely
+ inspired by the design of SunRPC. The definition of the RPC
+ protocol is in the file <code>src/rpc/virnetprotocol.x</code>
+ in the libvirt source tree.
+ </p>
+
+ <h3><a href="protocolframing">Packet framing</a></h3>
+
+ <p>
+ On the wire, there is no explicit packet framing marker. Instead
+ each packet is preceded by an unsigned 32-bit integer giving
+ the total length of the packet in bytes. This length includes
+ the 4-bytes of the length word itself. Conceptually the framing
+ looks like this:
+ </p>
+
+<pre>
+|~~~ Packet 1 ~~~|~~~ Packet 2 ~~~|~~~ Packet 3 ~~~|~~~
+
++-------+------------+-------+------------+-------+------------+...
+| n=U32 | (n-4) * U8 | n=U32 | (n-4) * U8 | n=U32 | (n-4) * U8 |
++-------+------------+-------+------------+-------+------------+...
+
+|~ Len ~|~ Data ~|~ Len ~|~ Data ~|~ Len ~|~ Data ~|~
+
+</pre>
+
+ <h3><a href="protocoldata">Packet data</a></h3>
+
+ <p>
+ The data in each packet is split into two parts, a short
+ fixed length header, followed by a variable length payload.
+ So a packet from the illustration above is more correctly
+ shown as
+ </p>
+
+<pre>
+
++-------+-------------+---------------....---+
+| n=U32 | 6*U32 | (n-(7*4))*U8 |
++-------+-------------+---------------....---+
+
+|~ Len ~|~ Header ~|~ Payload .... ~|
+</pre>
+
+
+ <h3><a href="protocolheader">Packet header</a></h3>
+ <p>
+ The header contains 6 fields, encoded as signed/unsigned 32-bit
+ integers.
+ </p>
+
+ <pre>
++---------------+
+| program=U32 |
++---------------+
+| version=U32 |
++---------------+
+| procedure=S32 |
++---------------+
+| type=S32 |
++---------------+
+| serial=U32 |
++---------------+
+| status=S32 |
++---------------+
+ </pre>
+
+ <dl>
+ <dt><code>program</code></dt>
+ <dd>
+ This is an arbitrarily chosen number that will uniquely
+ identify the "service" running over the stream.
+ </dd>
+ <dt><code>version</code></dt>
+ <dd>
+ This is the version number of the program, by convention
+ starting from '1'. When an incompatible change is made
+ to a program, the version number is incremented. Ideally
+ both versions will then be supported on the wire in
+ parallel for backwards compatibility.
+ </dd>
+ <dt><code>procedure</code></dt>
+ <dd>
+ This is an arbitrarily chosen number that will uniquely
+ identify the method call, or event associated with the
+ packet. By convention, procedure numbers start from 1
+ and are assigned monotonically thereafter.
+ </dd>
+ <dt><code>type</code></dt>
+ <dd>
+ <p>
+ This can be one of the following enumeration values
+ </p>
+ <ol>
+ <li>call: invocation of a method call</li>
+ <li>reply: completion of a method call</li>
+ <li>event: an asynchronous event</li>
+ <li>stream: control info or data from a stream</li>
+ </ol>
+ </dd>
+ <dt><code>serial</code></dt>
+ <dd>
+ This is an number that starts from 1 and increases
+ each time a method call packet is sent. A reply or
+ stream packet will have a serial number matching the
+ original method call packet serial. Events always
+ have the serial number set to 0.
+ </dd>
+ <dt><code>status</code></dt>
+ <dd>
+ <p>
+ This can one of the following enumeration values
+ </p>
+ <ol>
+ <li>ok: a normal packet. this is always set for method calls or events.
+ For replies it indicates successful completion of the method. For
+ streams it indicates confirmation of the end of file on the stream.</li>
+ <li>error: for replies this indicates that the method call failed
+ and error information is being returned. For streams this indicates
+ that not all data was sent and the stream has aborted</li>
+ <li>continue: for streams this indicates that further data packets
+ will be following</li>
+ </ol>
+ </dl>
+
+ <h3><a href="protocolpayload">Packet payload</a></h3>
+
+ <p>
+ The payload of a packet will vary depending on the <code>type</code>
+ and <code>status</code> fields from the header.
+ </p>
+
+ <ul>
+ <li>type=call: the in parameters for the method call, XDR encoded</li>
+ <li>type=reply+status=ok: the return value and/or out parameters for the method call, XDR encoded</li>
+ <li>type=reply+status=error: the error information for the method, a virErrorPtr XDR encoded</li>
+ <li>type=event: the parameters for the event, XDR encoded</li>
+ <li>type=stream+status=ok: no payload</li>
+ <li>type=stream+status=error: the error information for the method, a virErrorPtr XDR encoded</li>
+ <li>type=stream+status=continue: the raw bytes of data for the stream. No XDR encoding</li>
+ </ul>
+
+ <p>
+ For the exact payload information for each procedure, consult the XDR protocol
+ definition for the program+version in question
+ </p>
+
+ <h3><a name="wireexamples">Wire examples</a></h3>
+
+ <p>
+ The following diagrams illustrate some example packet exchanges
+ between a client and server
+ </p>
+
+ <h4><a name="wireexamplescall">Method call</a></h4>
+
+ <p>
+ A single method call and successful
+ reply, for a program=8, version=1, procedure=3, which 10 bytes worth
+ of input args, and 4 bytes worth of return values. The overall input
+ packet length is 4 + 24 + 10 == 38, and output packet length 32
+ </p>
+
+ <pre>
+ +--+-----------------------+-----------+
+ C --> |38| 8 | 1 | 3 | 0 | 1 | 0 | .o.oOo.o. | --> S (call)
+ +--+-----------------------+-----------+
+
+ +--+-----------------------+--------+
+ C <-- |32| 8 | 1 | 3 | 1 | 1 | 0 | .o.oOo | <-- S (reply)
+ +--+-----------------------+--------+
+ </pre>
+
+ <h4><a name="wireexamplescallerr">Method call with error</a></h4>
+
+ <p>
+ An unsuccessful method call will instead return an error object
+ </p>
+
+ <pre>
+ +--+-----------------------+-----------+
+ C --> |38| 8 | 1 | 3 | 0 | 1 | 0 | .o.oOo.o. | --> S (call)
+ +--+-----------------------+-----------+
+
+ +--+-----------------------+--------------------------+
+ C <-- |48| 8 | 1 | 3 | 2 | 1 | 0 | .o.oOo.o.oOo.o.oOo.o.oOo | <-- S (error)
+ +--+-----------------------+--------------------------+
+ </pre>
+
+ <h4><a name="wireexamplescallup">Method call with upload stream</a></h4>
+
+ <p>
+ A method call which also involves uploading some data over
+ a stream will result in
+ </p>
+
+ <pre>
+ +--+-----------------------+-----------+
+ C --> |38| 8 | 1 | 3 | 0 | 1 | 0 | .o.oOo.o. | --> S (call)
+ +--+-----------------------+-----------+
+
+ +--+-----------------------+--------+
+ C <-- |32| 8 | 1 | 3 | 1 | 1 | 0 | .o.oOo | <-- S (reply)
+ +--+-----------------------+--------+
+
+ +--+-----------------------+-------------....-------+
+ C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S (stream data up)
+ +--+-----------------------+-------------....-------+
+ +--+-----------------------+-------------....-------+
+ C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S (stream data up)
+ +--+-----------------------+-------------....-------+
+ +--+-----------------------+-------------....-------+
+ C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S (stream data up)
+ +--+-----------------------+-------------....-------+
+ ...
+ +--+-----------------------+-------------....-------+
+ C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S (stream data up)
+ +--+-----------------------+-------------....-------+
+ +--+-----------------------+
+ C --> |24| 8 | 1 | 3 | 3 | 1 | 0 | --> S (stream finish)
+ +--+-----------------------+
+ +--+-----------------------+
+ C <-- |24| 8 | 1 | 3 | 3 | 1 | 0 | <-- S (stream finish)
+ +--+-----------------------+
+ </pre>
+
+ <h4><a name="wireexamplescallbi">Method call bidirectional stream</a></h4>
+
+ <p>
+ A method call which also involves a bi-directional stream will
+ result in
+ </p>
+
+ <pre>
+ +--+-----------------------+-----------+
+ C --> |38| 8 | 1 | 3 | 0 | 1 | 0 | .o.oOo.o. | --> S (call)
+ +--+-----------------------+-----------+
+
+ +--+-----------------------+--------+
+ C <-- |32| 8 | 1 | 3 | 1 | 1 | 0 | .o.oOo | <-- S (reply)
+ +--+-----------------------+--------+
+
+ +--+-----------------------+-------------....-------+
+ C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S (stream data up)
+ +--+-----------------------+-------------....-------+
+ +--+-----------------------+-------------....-------+
+ C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S (stream data up)
+ +--+-----------------------+-------------....-------+
+ +--+-----------------------+-------------....-------+
+ C <-- |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | <-- S (stream data down)
+ +--+-----------------------+-------------....-------+
+ +--+-----------------------+-------------....-------+
+ C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S (stream data up)
+ +--+-----------------------+-------------....-------+
+ +--+-----------------------+-------------....-------+
+ C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S (stream data up)
+ +--+-----------------------+-------------....-------+
+ +--+-----------------------+-------------....-------+
+ C <-- |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | <-- S (stream data down)
+ +--+-----------------------+-------------....-------+
+ +--+-----------------------+-------------....-------+
+ C <-- |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | <-- S (stream data down)
+ +--+-----------------------+-------------....-------+
+ +--+-----------------------+-------------....-------+
+ C <-- |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | <-- S (stream data down)
+ +--+-----------------------+-------------....-------+
+ +--+-----------------------+-------------....-------+
+ C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S (stream data up)
+ +--+-----------------------+-------------....-------+
+ ..
+ +--+-----------------------+-------------....-------+
+ C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S (stream data up)
+ +--+-----------------------+-------------....-------+
+ +--+-----------------------+
+ C --> |24| 8 | 1 | 3 | 3 | 1 | 0 | --> S (stream finish)
+ +--+-----------------------+
+ +--+-----------------------+
+ C <-- |24| 8 | 1 | 3 | 3 | 1 | 0 | <-- S (stream finish)
+ +--+-----------------------+
+ </pre>
+
+
+ <h4><a name="wireexamplescallmany">Method calls overlapping</a></h4>
+ <pre>
+ +--+-----------------------+-----------+
+ C --> |38| 8 | 1 | 3 | 0 | 1 | 0 | .o.oOo.o. | --> S (call 1)
+ +--+-----------------------+-----------+
+ +--+-----------------------+-----------+
+ C --> |38| 8 | 1 | 3 | 0 | 2 | 0 | .o.oOo.o. | --> S (call 2)
+ +--+-----------------------+-----------+
+ +--+-----------------------+--------+
+ C <-- |32| 8 | 1 | 3 | 1 | 2 | 0 | .o.oOo | <-- S (reply 2)
+ +--+-----------------------+--------+
+ +--+-----------------------+-----------+
+ C --> |38| 8 | 1 | 3 | 0 | 3 | 0 | .o.oOo.o. | --> S (call 3)
+ +--+-----------------------+-----------+
+ +--+-----------------------+--------+
+ C <-- |32| 8 | 1 | 3 | 1 | 3 | 0 | .o.oOo | <-- S (reply 3)
+ +--+-----------------------+--------+
+ +--+-----------------------+-----------+
+ C --> |38| 8 | 1 | 3 | 0 | 4 | 0 | .o.oOo.o. | --> S (call 4)
+ +--+-----------------------+-----------+
+ +--+-----------------------+--------+
+ C <-- |32| 8 | 1 | 3 | 1 | 1 | 0 | .o.oOo | <-- S (reply 1)
+ +--+-----------------------+--------+
+ +--+-----------------------+--------+
+ C <-- |32| 8 | 1 | 3 | 1 | 4 | 0 | .o.oOo | <-- S (reply 4)
+ +--+-----------------------+--------+
+ </pre>
+
+
+ <h2><a name="security">RPC security</a></h2>
+
+ <p>
+ There are various things to consider to ensure an implementation
+ of the RPC protocol can be satisfactorily secured
+ </p>
+
+ <h3><a name="securitytls">Authentication/encryption</a></h3>
+
+ <p>
+ The basic RPC protocol does not define or require any specific
+ authentication/encryption capabilities. A generic solution to
+ providing encryption for the protocol is to run the protocol
+ over a TLS encrypted data stream. x509 certificate checks can
+ be done to form a crude authentication mechanism. It is also
+ possible for an RPC program to negotiate an encryption /
+ authentication capability, such as SASL, which may then also
+ provide per-packet data encryption. Finally the protocol data
+ stream can of course be tunnelled over transports such as SSH.
+ </p>
+
+ <h3><a name="securitylimits">Data limits</a></h3>
+
+ <p>
+ Although the protocol itself defines many arbitrary sized data values in the
+ payloads, to avoid denial of service attack there are a number of size limit
+ checks prior to encoding or decoding data. There is a limit on the maximum
+ size of a single RPC message, limit on the maximum string length, and limits
+ on any other parameter which uses a variable length array. These limits can
+ be raised, subject to agreement between client/server, without otherwise
+ breaking compatibility of the RPC data on the wire.
+ </p>
+
+ <h3><a name="securityvalidate">Data validation</a></h3>
+
+ <p>
+ It is important that all data be fully validated before performing
+ any actions based on the data. When reading an RPC packet, the
+ first four bytes must be read and the max packet size limit validated,
+ before any attempt is made to read the variable length packet data.
+ After a complete packet has been read, the header must be decoded
+ and all 6 fields fully validated, before attempting to dispatch
+ the payload. Once dispatched, the payload can be decoded and passed
+ onto the appropriate API for execution. The RPC code must not take
+ any action based on the payload, since it has no way to validate
+ the semantics of the payload data. It must delegate this to the
+ execution API (e.g. corresponding libvirt public API).
+ </p>
+
+ <h2><a name="internals">RPC internal APIs</a></h2>
+
+ <p>
+ The generic internal RPC library code lives in the <code>src/rpc/</code>
+ directory of the libvirt source tree. Unless otherwise noted, the
+ objects are all threadsafe. The core object types and their
+ purposes are:
+ </p>
+
+ <h3><a name="apioverview">Overview of RPC objects</a></h3>
+
+ <p>
+ The following is a high level overview of the role of each
+ of the main RPC objects
+ </p>
+
+ <dl>
+ <dt><code>virNetSASLContextPtr</code> (virnetsaslcontext.h)</dt>
+ <dd>The virNetSASLContext APIs maintain SASL state for a network
+ service (server or client). This is primarily used on the server
+ to provide a whitelist of allowed SASL usernames for clients.
+ </dd>
+
+ <dt><code>virNetSASLSessionPtr</code> (virnetsaslcontext.h)</dt>
+ <dd>The virNetSASLSession APIs maintain SASL state for a single
+ network connection (socket). This is used to perform the multi-step
+ SASL handshake and perform encryption/decryption of data once
+ authenticated, via integration with virNetSocket.
+ </dd>
+
+ <dt><code>virNetTLSContextPtr</code> (virnettlscontext.h)</dt>
+ <dd>The virNetTLSContext APIs maintain TLS state for a network
+ service (server or client). This is primarily used on the server
+ to provide a whitelist of allowed x509 distinguished names, as
+ well as diffie-hellman keys. It can also do validation of
+ x509 certificates prior to initiating a connection, in order
+ to improve detection of configuration errors.
+ </dd>
+
+ <dt><code>virNetTLSSessionPtr</code> (virnettlscontext.h)</dt>
+ <dd>The virNetTLSSession APIs maintain TLS state for a single
+ network connection (socket). This is used to perform the multi-step
+ TLS handshake and perform encryption/decryption of data once
+ authenticated, via integration with virNetSocket.
+ </dd>
+
+ <dt><code>virNetSocketPtr</code> (virnetsocket.h)</dt>
+ <dd>The virNetSocket APIs provide a higher level wrapper around
+ the raw BSD sockets and getaddrinfo APIs. They allow for creation
+ of both server and client sockets. Data transports supported are
+ TCP, UNIX, SSH tunnel or external command tunnel. Internally the
+ TCP socket impl uses the getaddrinfo info APIs to ensure correct
+ protocol-independent behaviour, thus supporting both IPv4 and IPv6.
+ The socket APIs can be associated with a virNetSASLSessionPtr or
+ virNetTLSSessionPtr object to allow seamless encryption/decryption
+ of all writes and reads. For UNIX sockets it is possible to obtain
+ the remote client user ID and process ID. Integration with the
+ libvirt event loop also allows use of callbacks for notification
+ of various I/O conditions
+ </dd>
+
+ <dt><code>virNetMessagePtr</code> (virnetmessage.h)</dt>
+ <dd>The virNetMessage APIs provide a wrapper around the libxdr
+ API calls, to facilitate processing and creation of RPC
+ packets. There are convenience APIs for encoding/encoding the
+ packet headers, encoding/decoding the payload using an XDR
+ filter, encoding/decoding a raw payload (for streams), and
+ encoding a virErrorPtr object. There is also a means to
+ add to/serve from a linked-list queue of messages.</dd>
+
+ <dt><code>virNetClientPtr</code> (virnetclient.h)</dt>
+ <dd>The virNetClient APIs provide a way to connect to a
+ remote server and run one or more RPC protocols over
+ the connection. Connections can be made over TCP, UNIX
+ sockets, SSH tunnels, or external command tunnels. There
+ is support for both TLS and SASL session encryption.
+ The client also supports management of multiple data streams
+ over each connection. Each client object can be used from
+ multiple threads concurrently, with method calls/replies
+ being interleaved on the wire as required.
+ </dd>
+
+ <dt><code>virNetClientProgramPtr</code> (virnetclientprogram.h)</dt>
+ <dd>The virNetClientProgram APIs are used to register a
+ program+version with the connection. This then enables
+ invocation of method calls, receipt of asynchronous
+ events and use of data streams, within that program+version.
+ When created a set of callbacks must be supplied to take
+ care of dispatching any incoming asynchronous events.
+ </dd>
+
+ <dt><code>virNetClientStreamPtr</code> (virnetclientstream.h)</dt>
+ <dd>The virNetClientStream APIs are used to control transmission and
+ receipt of data over a stream active on a client. Streams provide
+ a low latency, unlimited length, bi-directional raw data exchange
+ mechanism layered over the RPC connection
+ </dd>
+
+ <dt><code>virNetServerPtr</code> (virnetserver.h)</dt>
+ <dd>The virNetServer APIs are used to manage a network server. A
+ server exposed one or more programs, over one or more services.
+ It manages multiple client connections invoking multiple RPC
+ calls in parallel, with dispatch across multiple worker threads.
+ </dd>
+
+ <dt><code>virNetServerMDNSPtr</code> (virnetservermdns.h)</dt>
+ <dd>The virNetServerMDNS APIs are used to advertise a server
+ across the local network, enabling clients to automatically
+ detect the existence of remote services. This is done by
+ interfacing with the Avahi mDNS advertisement service.
+ </dd>
+
+ <dt><code>virNetServerClientPtr</code> (virnetserverclient.h)</dt>
+ <dd>The virNetServerClient APIs are used to manage I/O related
+ to a single client network connection. It handles initial
+ validation and routing of incoming RPC packets, and transmission
+ of outgoing packets.
+ </dd>
+
+ <dt><code>virNetServerProgramPtr</code> (virnetserverprogram.h)</dt>
+ <dd>The virNetServerProgram APIs are used to provide the implementation
+ of a single program/version set. Primarily this includes a set of
+ callbacks used to actually invoke the APIs corresponding to
+ program procedure numbers. It is responsible for all the serialization
+ of payloads to/from XDR.</dd>
+
+ <dt><code>virNetServerServicePtr</code> (virnetserverservice.h)</dt>
+ <dd>The virNetServerService APIs are used to connect the server to
+ one or more network protocols. A single service may involve multiple
+ sockets (ie both IPv4 and IPv6). A service also has an associated
+ authentication policy for incoming clients.
+ </dd>
+ </dl>
+
+ <h3><a name="apiclientdispatch">Client RPC dispatch</a></h3>
+
+ <p>
+ The client RPC code must allow for multiple overlapping RPC method
+ calls to be invoked, transmission and receipt of data for multiple
+ streams and receipt of asynchronous events. Understandably this
+ involves coordination of multiple threads.
+ </p>
+
+ <p>
+ The core requirement in the client dispatch code is that only
+ one thread is allowed to be performing I/O on the socket at
+ any time. This thread is said to be "holding the buck". When
+ any other thread comes along and needs to do I/O it must place
+ its packets on a queue and delegate processing of them to the
+ thread that has the buck. This thread will send out the method
+ call, and if it sees a reply will pass it back to the waiting
+ thread. If the other thread's reply hasn't arrived, by the time
+ the main thread has got its own reply, then it will transfer
+ responsibility for I/O to the thread that has been waiting the
+ longest. It is said to be "passing the buck" for I/O.
+ </p>
+
+ <p>
+ When no thread is performing any RPC method call, or sending
+ stream data there is still a need to monitor the socket for
+ incoming I/O related to asynchronous events, or stream data
+ receipt. For this task, a watch is registered with the event
+ loop which triggers whenever the socket is readable. This
+ watch is automatically disabled whenever any other thread
+ grabs the buck, and re-enabled when the buck is released.
+ </p>
+
+ <h4><a name="apiclientdispatchex1">Example with buck passing</a></h4>
+
+ <p>
+ In the first example, a second thread issues a API call
+ while the first thread holds the buck. The reply to the
+ first call arrives first, so the buck is passed to the
+ second thread.
+ </p>
+
+ <pre>
+ Thread-1
+ |
+ V
+ Call API1()
+ |
+ V
+ Grab Buck
+ | Thread-2
+ V |
+ Send method1 V
+ | Call API2()
+ V |
+ Wait I/O V
+ |<--------Queue method2
+ V |
+ Send method2 V
+ | Wait for buck
+ V |
+ Wait I/O |
+ | |
+ V |
+ Recv reply1 |
+ | |
+ V |
+ Pass the buck----->|
+ | V
+ V Wait I/O
+ Return API1() |
+ V
+ Recv reply2
+ |
+ V
+ Release the buck
+ |
+ V
+ Return API2()
+ </pre>
+
+ <h4><a name="apiclientdispatchex2">Example without buck passing</a></h4>
+
+ <p>
+ In this second example, a second thread issues an API call
+ which is sent and replied to, before the first thread's
+ API call has completed. The first thread thus notifies
+ the second that its reply is ready, and there is no need
+ to pass the buck
+ </p>
+
+ <pre>
+ Thread-1
+ |
+ V
+ Call API1()
+ |
+ V
+ Grab Buck
+ | Thread-2
+ V |
+ Send method1 V
+ | Call API2()
+ V |
+ Wait I/O V
+ |<--------Queue method2
+ V |
+ Send method2 V
+ | Wait for buck
+ V |
+ Wait I/O |
+ | |
+ V |
+ Recv reply2 |
+ | |
+ V |
+ Notify reply2------>|
+ | V
+ V Return API2()
+ Wait I/O
+ |
+ V
+ Recv reply1
+ |
+ V
+ Release the buck
+ |
+ V
+ Return API1()
+ </pre>
+
+ <h4><a name="apiclientdispatchex3">Example with async events</a></h4>
+
+ <p>
+ In this example, only one thread is present and it has to
+ deal with some async events arriving. The events are actually
+ dispatched to the application from the event loop thread
+ </p>
+
+ <pre>
+ Thread-1
+ |
+ V
+ Call API1()
+ |
+ V
+ Grab Buck
+ |
+ V
+ Send method1
+ |
+ V
+ Wait I/O
+ | Event thread
+ V ...
+ Recv event1 |
+ | V
+ V Wait for timer/fd
+ Queue event1 |
+ | V
+ V Timer fires
+ Wait I/O |
+ | V
+ V Emit event1
+ Recv reply1 |
+ | V
+ V Wait for timer/fd
+ Return API1() |
+ ...
+ </pre>
+
+ <h3><a name="apiserverdispatch">Server RPC dispatch</a></h3>
+
+ <p>
+ The RPC server code must support receipt of incoming RPC requests from
+ multiple client connections, and parallel processing of all RPC
+ requests, even many from a single client. This goal is achieved through
+ a combination of event driven I/O, and multiple processing threads.
+ </p>
+
+ <p>
+ The main libvirt event loop thread is responsible for performing all
+ socket I/O. It will read incoming packets from clients and willl
+ transmit outgoing packets to clients. It will handle the I/O to/from
+ streams associated with client API calls. When doing client I/O it
+ will also pass the data through any applicable encryption layer
+ (through use of the virNetSocket / virNetTLSSession and virNetSASLSession
+ integration). What is paramount is that the event loop thread never
+ do any task that can take a non-trivial amount of time.
+ </p>
+
+ <p>
+ When reading packets, the event loop will first read the 4 byte length
+ word. This is validated to make sure it does not exceed the maximum
+ permissible packet size, and the client is set to allow receipt of the
+ rest of the packet data. Once a complete packet has been received, the
+ next step is to decode the RPC header. The header is validated to
+ ensure the request is sensible, ie the server should not receive a
+ method reply from a client. If the client has not yet authenticated,
+ a security check is also applied to make sure the procedure is on the
+ whitelist of those allowed prior to auth. If the packet is a method
+ call, it will be placed on a global processing queue. The event loop
+ thread is now done with the packet for the time being.
+ </p>
+
+ <p>
+ The server has a pool of worker threads, which wait for method call
+ packets to be queued. One of them will grab the new method call off
+ the queue for processing. The first step is to decode the payload of
+ the packet to extract the method call arguments. The worker does not
+ attempt to do any semantic validation of the arguments, except to make
+ sure the size of any variable length fields is below defined limits.
+ </p>
+
+ <p>
+ The worker now invokes the libvirt API call that corresponds to the
+ procedure number in the packet header. The worker is thus kept busy
+ until the API call completes. The implementation of the API call
+ is responsible for doing semantic validation of parameters and any
+ MAC security checks on the objects affected.
+ </p>
+
+ <p>
+ Once the API call has completed, the worker thread will take the
+ return value and output parameters, or error object and encode
+ them into a reply packet. Again it does not attempt to do any
+ semantic validation of output data, aside from variable length
+ field limit checks. The worker thread puts the reply packet onto
+ the transmission queue for the client. The worker is now finished
+ and goes back to wait for another incoming method call.
+ </p>
+
+ <p>
+ The main event loop is back in charge and when the client socket
+ becomes writable, it will start sending the method reply packet
+ back to the client.
+ </p>
+
+ <p>
+ At any time the libvirt connection object can emit asynchronous
+ events. These are handled by callbacks in the main event thread.
+ The callback will simply encode the event parameters into a new
+ data packet and place the packet on the client transmission
+ queue.
+ </p>
+
+ <p>
+ Incoming and outgoing stream packets are also directly handled
+ by the main event thread. When an incoming stream packet is
+ received, instead of placing it in the global dispatch queue
+ for the worker threads, it is sidetracked into a per-stream
+ processing queue. When the stream becomes writable, queued
+ incoming stream packets will be processed, passing their data
+ payload onto the stream. Conversely when the stream becomes
+ readable, chunks of data will be read from it, encoded into
+ new outgoing packets, and placed on the client's transmit
+ queue
+ </p>
+
+ <h4><a name="apiserverdispatchex1">Example with overlapping methods</a></h4>
+
+ <p>
+ This example illustrates processing of two incoming methods with
+ overlapping execution
+ </p>
+
+ <pre>
+ Event thread Worker 1 Worker 2
+ | | |
+ V V V
+ Wait I/O Wait Job Wait Job
+ | | |
+ V | |
+ Recv method1 | |
+ | | |
+ V | |
+ Queue method1 V |
+ | Serve method1 |
+ V | |
+ Wait I/O V |
+ | Call API1() |
+ V | |
+ Recv method2 | |
+ | | |
+ V | |
+ Queue method2 | V
+ | | Serve method2
+ V V |
+ Wait I/O Return API1() V
+ | | Call API2()
+ | V |
+ V Queue reply1 |
+ Send reply1 | |
+ | V V
+ V Wait Job Return API2()
+ Wait I/O | |
+ | ... V
+ V Queue reply2
+ Send reply2 |
+ | V
+ V Wait Job
+ Wait I/O |
+ | ...
+ ...
+ </pre>
+
+ <h4><a name="apiserverdispatchex2">Example with stream data</a></h4>
+
+ <p>
+ This example illustrates processing of stream data
+ </p>
+
+ <pre>
+ Event thread
+ |
+ V
+ Wait I/O
+ |
+ V
+ Recv stream1
+ |
+ V
+ Queue stream1
+ |
+ V
+ Wait I/O
+ |
+ V
+ Recv stream2
+ |
+ V
+ Queue stream2
+ |
+ V
+ Wait I/O
+ |
+ V
+ Write stream1
+ |
+ V
+ Write stream2
+ |
+ V
+ Wait I/O
+ |
+ ...
+ </pre>
+
+ </body>
+</html>