Srivatsa S. Bhat [Tue, 29 Nov 2011 09:07:38 +0000 (17:07 +0800)]
Implement the core API to suspend/resume the host
Add the core functions that implement the functionality of the API.
Suspend is done by using an asynchronous mechanism so that we can return
the status to the caller before the host gets suspended. This asynchronous
operation is achieved by suspending the host in a separate thread of
execution. However, returning the status to the caller is only best-effort,
but not guaranteed.
To resume the host, an RTC alarm is set up (based on how long we want to
suspend) before suspending the host. When this alarm fires, the host
gets woken up.
Suspend-to-RAM operation on a host running Linux can take upto more than 20
seconds, depending on the load of the system. (Freezing of tasks, an operation
preceding any suspend operation, is given up after a 20 second timeout).
And Suspend-to-Disk can take even more time, considering the time required
for compaction, creating the memory image and writing it to disk etc.
So, we do not allow the user to specify a suspend duration of less than 60
seconds, to be on the safer side, since we don't want to prematurely declare
failure when we only had to wait for some more time.
Srivatsa S. Bhat [Tue, 29 Nov 2011 06:56:10 +0000 (14:56 +0800)]
Add 'Hybrid-Suspend' power management discovery for the host
Some systems support a feature known as 'Hybrid-Suspend', apart from the
usual system-wide sleep states such as Suspend-to-RAM (S3) or Suspend-to-Disk
(S4). Add the functionality to discover this power management feature and
export it in the capabilities XML under the <power_management> tag.
Jiri Denemark [Mon, 28 Nov 2011 16:41:25 +0000 (17:41 +0100)]
rpc: Really send non-blocking calls while waiting for another call
When another thread was dispatching while we wanted to send a
non-blocking call, we correctly queued the call and woke up the thread
but the thread just threw the call away since it forgot to recheck if
its socket was writable.
Michal Privoznik [Mon, 28 Nov 2011 16:43:51 +0000 (17:43 +0100)]
virsh: Don't traverse childless nodes in vshNodeIsSuperset
If both nodes do not have any children, we pass zero to
virBitmapAlloc which returns NULL. In turn we report OOM error
and return false (meaning nodes are different). This is not true.
Christian Franke [Mon, 28 Nov 2011 12:15:25 +0000 (13:15 +0100)]
virnetsocket: pass XAUTORITY for ssh connection
When spawning an ssh connection, the environment variables
DISPLAY, SSH_ASKPASS, ... are passed. However XAUTHORITY,
which is necessary if the .Xauthority is in a non default
place, was not passed.
Signed-off-by: Christian Franke <nobody@nowhere.ws>
Lorin Hochstein [Mon, 28 Nov 2011 14:26:57 +0000 (09:26 -0500)]
conf: make virt-xml-validate work with vbox domains
virt-xml-validate fails when run on a domain XML file of type 'vbox'.
For failing test case, see https://bugzilla.redhat.com/show_bug.cgi?id=757097
This patch updates the XML schema to accept all valid hypervisor
types, as well as dropping hypervisor types that are not in use
by the current code base.
While LXC does not have the concept of VCPUS, so we can't do
per-VCPU pCPU placement, we can support the VM level CPU
placement. Todo this simply set the CPU affinity of the LXC
controller at startup. All child processes will inherit this
affinity.
Michal Privoznik [Fri, 25 Nov 2011 12:25:19 +0000 (13:25 +0100)]
storage: Refetch file status after open
This partly reverts my previous patch f88de3eb. We need to
get file status after open, as given path could have been symlink,
so fstat() will operate on different file than lstat().
Michal Privoznik [Thu, 24 Nov 2011 15:53:30 +0000 (16:53 +0100)]
conf: Don't drop console definition on domain restart
One of my latest patches 2e37bf42d28d8bb5d045b206587c64643c64d02a
copy serial console definition. On domain shutdown we save this
info into state XML. However, later on the daemon start we simply
drop this info and since we are not re-reading qemu log,
vm->def->consoles[0] does not get populated with copy. Therefore
we need to avoid dropping console definition if it is just alias
for serial console.
If a connection to destination host is lost during peer-to-peer
migration (because keepalive protocol timed out), we won't be able to
finish the migration and it doesn't make sense to wait for qemu to
transmit all data. This patch automatically cancels such migration
without waiting for virDomainAbortJob to be called.
This API can be used to check if the socket associated with
virConnectPtr is still open or it was closed (probably because keepalive
protocol timed out). If there the connection is local (i.e., no socket
is associated with the connection, it is trivially always alive.
virConnectSetKeepAlive public API can be used by a client connecting to
remote server to start using keepalive protocol. The API is handled
directly by remote driver and not transmitted over the wire to the
server.
The keepalive program has two procedures: PING, and PONG.
Both are used only in asynchronous messages and the sender doesn't wait
for any reply. However, the party which receives PING messages is
supposed to react by sending PONG message the other party, but no
explicit binding between PING and PONG messages is made. For backward
compatibility neither server nor client are allowed to send keepalive
messages before checking that remote party supports them.
Jiri Denemark [Tue, 22 Nov 2011 13:11:02 +0000 (14:11 +0100)]
rpc: Fix handling of non-blocking calls that could not be sent
When virNetClientIOEventLoop is called for a non-blocking call and not
even a single byte can be sent from this call without blocking, we
properly reported that to the caller which properly frees the call. But
we never removed the call from a call queue.
Peter Krempa [Wed, 23 Nov 2011 14:51:28 +0000 (15:51 +0100)]
qemu: Avoid dereference of NULL pointer
If something fails while initializing qemu job object in
qemuDomainObjPrivateAlloc(), memory to the private pointer is freed, but
after that, the pointer is still dereferenced, which may result in a
segfault.
Eric Blake [Wed, 23 Nov 2011 14:26:32 +0000 (07:26 -0700)]
qemu: fix a const-correctness issue
Generally, functions which return malloc'd strings should be typed
as 'char *', not 'const char *', to make it obvious that the caller
is responsible to free things. free(const char *) fails to compile,
and although we have a cast embedded in VIR_FREE to work around poor
code that frees const char *, it's better to not rely on that hack.
Eric Blake [Wed, 23 Nov 2011 00:15:43 +0000 (17:15 -0700)]
API: prefer 'disk' over 'block' or 'path'
Given that we can now handle the target's disk shorthand, in addition
to an absolute path to the file or block device used on the host,
the term 'disk' fits a bit better as the parameter name than 'path'.
Eric Blake [Tue, 22 Nov 2011 22:55:30 +0000 (15:55 -0700)]
blockstats: support lookup by path in blockstats
Commit 89b6284f made it possible to pass either a source name or
the target device to most API demanding a disk designation, but
forgot to update the documentation. It also failed to update
virDomainBlockStats to take both forms. This patch fixes both the
documentation and the remaining function.
Xen continues to use just device shorthand (that is, I did not
implement path lookup there, since xen does not track a domain_conf
to quickly tie a path back to the device shorthand).
Up to now users have to give a full XML description on input when
device-detaching. If they omitted something it lead to unclear
error messages (like generated MAC wasn't found, etc.).
With this patch users can specify only those information which
specify one device sufficiently precise. Remaining information is
completed from domain.
Stefan Berger [Wed, 23 Nov 2011 00:05:45 +0000 (19:05 -0500)]
Enable detection of multiple IP addresses
In preparation of DHCP Snooping and the detection of multiple IP
addresses per interface:
The hash table that is used to collect the detected IP address of an
interface can so far only handle one IP address per interface. With
this patch we extend this to allow it to handle a list of IP addresses.
Above changes the returned variable type of virNWFilterGetIpAddrForIfname()
from char * to virNWFilterVarValuePtr; adapt all existing functions calling
this function.
It will show this error message:
'no connection driver available for No connection for URI jj#j'
Actually,we expect this message below:
Malformed 'uri_aliases' config entry 'jj#j=qemu+ssh://root@127.0.0.1/system', aliases may only contain 'a-Z, 0-9, _, -'
Stefan Berger [Tue, 22 Nov 2011 20:12:03 +0000 (15:12 -0500)]
Add support for STP filtering
This patch adds support for filtering of STP (spanning tree protocol) traffic
to the parser and makes us of the ebtables support for STP filtering. This code
now enables the filtering of traffic in chains with prefix 'stp'.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Stefan Berger [Tue, 22 Nov 2011 20:12:03 +0000 (15:12 -0500)]
Add a 'mac' chain
With hunks borrowed from one of David Steven's previous patches, we now
add the capability of having a 'mac' chain which is useful to filter
for multiple valid MAC addresses.
Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Chang Liu [Tue, 22 Nov 2011 07:24:25 +0000 (15:24 +0800)]
storage: Fallback to use lvchange first if lvremove fails
virStorageBackendLogicalDeleteVol() could not remove the lv with error
"could not remove open logical volume" sometimes. Generally it's caused
by the volume is still active, even if lvremove tries to remove it with
option "--force".
This patch is to fix it by disbale the lv first using "lvchange -aln"
and "lvremove -f" afterwards if the direct "lvremove -f" failed.
Srivatsa S. Bhat [Tue, 22 Nov 2011 03:31:22 +0000 (11:31 +0800)]
Export KVM Host Power Management capabilities
This patch exports KVM Host Power Management capabilities as XML so that
higher-level systems management software can make use of these features
available in the host.
The script "pm-is-supported" (from pm-utils package) is run to discover if
Suspend-to-RAM (S3) or Suspend-to-Disk (S4) is supported by the host.
If either of them are supported, then a new tag "<power_management>" is
introduced in the XML under the <host> tag.
However in case the query to check for power management features succeeded,
but the host does not support any such feature, then the XML will contain
an empty <power_management/> tag. In the event that the PM query itself
failed, the XML will not contain any "power_management" tag.
To use this, new APIs could be implemented in libvirt to exploit power
management features such as S3/S4.
Eric Blake [Fri, 18 Nov 2011 18:27:24 +0000 (11:27 -0700)]
conf: don't modify cpu set string during parsing
None of the callers cared if str was updated to point to the next
byte after the parsed cpuset; simplifying this results in quite
a few code simplifications. Additionally, virCPUDefParseXML was
strdup()'ing a malloc()'d string; avoiding a memory copy resulted
in less code.
Roopa Prabhu [Thu, 17 Nov 2011 02:34:32 +0000 (18:34 -0800)]
qemu: don't release network actual device twice
For direct attach devices, in qemuBuildCommandLine, we seem to be freeing
actual device on error path (with networkReleaseActualDevice). But the actual
device is not deleted.
qemuProcessStop eventually deletes the direct attach device and releases
actual device. But by the time qemuProcessStop is called qemuBuildCommandLine
has already freed actual device, leaving stray macvtap devices behind on error.
So the simplest fix is to remove the networkReleaseActualDevice in
qemuBuildCommandLine. This patch does just that.
Michal Privoznik [Tue, 15 Nov 2011 08:01:31 +0000 (09:01 +0100)]
qemu: Copy console definition from serial
Now, when we support multiple consoles per domain,
the vm->def->console[0] can still remain an alias
for vm->def->serial[0]; However, we need to copy
it's source definition as well otherwise we'll regress
on virDomainOpenConsole.
Osier Yang [Fri, 18 Nov 2011 11:15:10 +0000 (19:15 +0800)]
storage: Skips backingStore of virtual snapshot lv
lvs outputs "[$lvname_vorigin]" for the virtual snapshot lv
(created with "--virtualsize"), and the original device pointed
by "$lvname_vorigin" is just for lvm internal use, one should
never use it.
Per lvm's nameing rules, "[" is not valid as part of the vg/lv name.
(man 8 lvm).
<quote>
VALID NAMES
The following characters are valid for VG and LV names: a-z A-Z 0-9 + _
. -
VG and LV names cannot begin with a hyphen. There are also various
reserved names that are used internally by lvm that can not be used as
LV or VG names. A VG cannot be called anything that exists in /dev/ at
the time of creation, nor can it be called '.' or '..'. A LV cannot be
called '.' '..' 'snapshot' or 'pvmove'. The LV name may also not con‐
tain the strings '_mlog' or '_mimage'
</quote>
So we can skip the set the lv's backingStore by checking if the name
begins with a "[".
Stefan Berger [Sat, 19 Nov 2011 12:26:56 +0000 (07:26 -0500)]
Add support for VLAN filtering
This patch adds support for filtering of VLAN (802.1Q) traffic to the
parser and makes us of the ebtables support for VLAN filtering. This code
now enables the filtering of traffic in chains with prefix 'vlan'.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Jim Fehlig [Fri, 18 Nov 2011 21:54:38 +0000 (14:54 -0700)]
Don't copy sexpr node value that is an empty string
Xen4.1 initializes some unspecified sexpr config items to an empty
string, unlike previous Xen versions that would leave the item unset.
E.g. the kernel item for an HVM guest (non-direct kernel boot):
Xen4.0 and earlier
...
(image
(hvm
(kernel )
...
Xen4.1
...
(image
(hvm
(kernel '')
...
The empty string for kernel causes some grief in subsequent parsing
where existence of specified kernel is checked, e.g.
if (!def->os.kernel)
...
This patch solves the problem in sexpr_node_copy() by not copying
a node containing an empty string.
Eric Blake [Fri, 18 Nov 2011 21:21:54 +0000 (14:21 -0700)]
tests: avoid xend ABRT crash report
I installed the xen development packages on my non-Xen F16 machine
in order to compile-test xen code and ensure we don't break things
on that front, but being a non-xen machine, /usr/sbin/xend is
obviously not running. Unfortunately, xen-4.1.2-1.fc16 has a bug
where merely trying to probe xend status on a non-xen kernel causes
xend to issue an ABRT crash report:
Even though libvirt (correctly) skips the test, the xend crash report
is unnecessary noise. Fix this by first filtering out non-xen
kernels even before attempting to probe xend. The test still runs
and passes on a RHEL 5 xen kernel after this patch.
Hu Tao [Thu, 17 Nov 2011 09:44:12 +0000 (17:44 +0800)]
enable cgroup cpuset by default
This prepares for subsequent patches which introduce dependence
on cgroup cpuset. Enable cgroup cpuset by default so users don't
have to modify configuration file before encountering a cpuset
error.
Eric Blake [Fri, 18 Nov 2011 17:36:35 +0000 (10:36 -0700)]
build: fix accidental POTFILES.in regression
The original patch for commit 4789fb2 considered renaming a file,
then backed out the name change, but forgot to back out the POTFILES.in
change, resulting in 'make syntax-check' failure.
Stefan Berger [Fri, 18 Nov 2011 16:58:18 +0000 (11:58 -0500)]
Extend NWFilter parameter parser to cope with lists of values
This patch modifies the NWFilter parameter parser to support multiple
elements with the same name and to internally build a list of items.
An example of the XML looks like this:
Stefan Berger [Fri, 18 Nov 2011 16:58:18 +0000 (11:58 -0500)]
Create rules for each member of a list
This patch extends the NWFilter driver for Linux (ebiptables) to create
rules for each member of a previously introduced list. If for example
an attribute value (internally) looks like this:
IP = [10.0.0.1, 10.0.0.2, 10.0.0.3]
then 3 rules will be generated for a rule accessing the variable 'IP',
one for each member of the list. The effect of this is that this now
allows for filtering for multiple values in one field. This can then be
used to support for filtering/allowing of multiple IP addresses per
interface.
An iterator is introduced that extracts each member of a list and
puts it into a hash table which then is passed to the function creating
a rule. For the above example the iterator would cause 3 loops.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
The internal representation currently is so that a name is stored as a
string and the value as well. This patch now addresses the value part of it
and introduces a data structure for storing a value either as a simple
value or as an array for later support of lists.
This patch adjusts all code that was handling the values in hash tables
and makes it use the new data type.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Stefan Berger [Fri, 18 Nov 2011 16:58:18 +0000 (11:58 -0500)]
Documentation about chains' priorities, lists of elements etc.
This patch adds several aspects of documentation about the network filtering
system:
- chains, chains' priorities and chains' default priorities
- talks about lists of elements, i.e., a variable assigned multiple values
(part of already ACK-ed series)
- already mentions the vlan, stp and mac chains added later on
(https://www.redhat.com/archives/libvir-list/2011-October/msg01238.html)
- mentions limitations of vlan filtering (when sent by VM) on Linux systems
Stefan Berger [Fri, 18 Nov 2011 16:58:18 +0000 (11:58 -0500)]
Interleave jumping into chains with filtering rules in 'root' table
The previous patch extends the priority of filtering rules into negative
numbers. We now use this possibility to interleave the jumping into
chains with filtering rules to for example create the 'root' table of
an interface with the following sequence of rules:
The '-p ARP -j ACCEPT' rule now appears between the jumps.
Since the 'arp' chain has been assigned priority -700 and the 'rarp'
chain -600, the above ordering can now be achieved with the following
rule:
Stefan Berger [Fri, 18 Nov 2011 16:58:18 +0000 (11:58 -0500)]
Extend rule priorities into negative numbers
So far rules' priorities have only been valid in the range [0,1000].
Now I am extending their priority into the range [-1000, 1000] for subsequently
being able to sort rules and the access of (jumps into) chains following
priorities.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Stefan Berger [Fri, 18 Nov 2011 16:58:18 +0000 (11:58 -0500)]
Enable chains with names having a known prefix
This patch enables chains that have a known prefix in their name.
Known prefixes are: 'ipv4', 'ipv6', 'arp', 'rarp'. All prefixes
are also protocols that can be evaluated on the ebtables level.
Following the prefix they will be automatically connected to an interface's
'root' chain and jumped into following the protocol they evaluate, i.e.,
a table 'arp-xyz' will be accessed from the root table using
The permitted values for priorities are [-1000, 1000].
By setting the priority of a chain the order in which it is accessed
from the interface root chain can be influenced.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Stefan Berger [Fri, 18 Nov 2011 16:58:17 +0000 (11:58 -0500)]
Use the actual names of chains in data structure
Use the name of the chain rather than its type index (enum).
This pushes the later enablement of chains with user-given names
into the XML parser. For now we still only allow those names that
are well known ('root', 'arp', 'rarp', 'ipv4' and 'ipv6').
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Stefan Berger [Fri, 18 Nov 2011 16:58:17 +0000 (11:58 -0500)]
Use scripting for cleaning and renaming of chains
Use scripts for the renaming and cleaning up of chains. This allows us to get
rid of some of the code that is only capable of renaming and removing chains
whose names are hardcoded.
A shell function 'collect_chains' is introduced that is given the name
of an ebtables chain and then recursively determines the names of all
chains that are accessed from this chain and its sub-chains using 'jumps'.
The resulting list of chain names is then used to delete all the found
chains by first flushing and then deleting them.
The same function is also used for renaming temporary filters to their final
names.
I tested this with the bash and dash as script interpreters.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Stefan Berger [Fri, 18 Nov 2011 16:58:17 +0000 (11:58 -0500)]
Make filter creation in root table more flexible
Use the previously introduced chain priorities to sort the chains for access
from an interface's 'root' table and have them created in the proper order.
This gets rid of a lot of code that was previously creating the chains in a
more hardcoded way.
To determine what protocol a filter is used for evaluation do prefix-
matching, i.e., the filter 'arp' is used to filter for the 'arp' protocol,
'ipv4' for the 'ipv4' protocol and 'arp-xyz' will also be used to filter
for the 'arp' protocol following the prefix 'arp' in its name.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Stefan Berger [Fri, 18 Nov 2011 16:58:17 +0000 (11:58 -0500)]
Introduce an internal priority for chains
For better handling of the sorting of chains introduce an internally used
priority. Use a lookup table to store the priorities. For now their actual
values do not matter just that the values cause the chains to be properly
sorted through changes in the following patches. However, the values are
chosen as negative so that once they are sorted along with filtering rules
(whose priority may only be positive for now) they will always be instantiated
before them (lower values cause instantiation before higher values). This
is done to maintain backwards compatibility.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
This patch adds support for a systemd init service for libvirtd
and libvirt-guests. The libvirtd.service is *not* written to use
socket activation, since we want libvirtd to start on boot so it
can do guest auto-start.
The libvirt-guests.service is pretty lame, just exec'ing the
original init script for now. Ideally we would factor out the
functionality, into some shared tool.
Instead of
./configure --with-init-script=redhat
You can now do
./configure --with-init-script=systemd
Or better still:
./configure --with-init-script=systemd+redhat
We can also now support install of the upstart init script
* configure.ac: Add systemd, and systemd+redhat options to
--with-init-script option
* daemon/Makefile.am: Install systemd services
* daemon/libvirtd.sysconf: Add note about unused env variable
with systemd
* daemon/libvirtd.service.in: libvirtd systemd service unit
* libvirt.spec.in: Add scripts to installing systemd services
and migrating from legacy init scripts
* tools/Makefile.am: Install systemd services
* tools/libvirt-guests.init.sh: Rename to tools/libvirt-guests.init.in
* tools/libvirt-guests.service.in: systemd service unit
Add support for interfaces with type=direct to LXC
Support creation of macvlan devices for LXC containers. Do not
allow setting of bandwidth controls or vport profiles due to the
complication that there is no host side visible device to work
with.
* src/lxc/lxc_driver.c: Support type=direct interfaces
Update virNetDevMacVLanCreateWithVPortProfile to allow creation
of plain macvlan devices, as well as macvtap devices. The former
is useful for LXC containers
* src/qemu/qemu_command.c: Explicitly request a macvtap device
* src/util/virnetdevmacvlan.c, src/util/virnetdevmacvlan.h: Add
new flag to allow switching between macvlan and macvtap
creation