Oliver Kurth [Fri, 9 Nov 2018 22:59:39 +0000 (14:59 -0800)]
Adding the libappmonitor source to the open-vm-tools distribution.
In response to customer requests that the libappmonitor library from
the VMware Guest SDK be open-sourced, the libappmonitor code has been
bundled in open-vm-tools.
Oliver Kurth [Fri, 2 Nov 2018 22:28:26 +0000 (15:28 -0700)]
Hide static function declaration for Linux from FreeBSD and Solaris OVT
A recent change introduced a new static function
HgfsInvalidateParentsChildren() in bora-vmsoft/hgfs/fuse/cache.c. That
function is defined and called only in Linux builds.
The static function declaration should not be visible on FreeBSD OVT
builds.
Oliver Kurth [Fri, 2 Nov 2018 22:28:25 +0000 (15:28 -0700)]
GuestMapper: Detailed data fixes
The guestInfo detailed data for Photon was being reported incorrectly,
sometimes adding trailing whitespace when not needed.
The problem was how the release file was processed. It was being
processed (open/read/code/close) multiple times, and wasn't separating
each of the fields as it should. Fixed this.
Also added logging of what is sent by the guest and how it is mapped.
Oliver Kurth [Fri, 2 Nov 2018 22:28:25 +0000 (15:28 -0700)]
VIGOR and RPCI definitions for the tools hang detector events.
Define an array of ToolsHealthEvent to record the last N tools hang
events. By keeping a list of historical events since power-on,
we make it easier to troubleshoot guest/toolsd issues.
Use the first array element for the latest event. This simplifies
implementation as the DynArray can be easily capped by setting the
array count. This requires us to add a PushFront function to add
a new element to the front of the DynArray.
Added the RPCI handler for the tools hang detector RPCI messages.
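A minimal sketch of the capped, newest-first history described in this
entry. It uses a fixed-size array instead of the real DynArray, and the
names HealthEvent, EventHistory, EventHistory_PushFront and MAX_EVENTS
are hypothetical, chosen only for illustration:

```c
#include <assert.h>
#include <string.h>

#define MAX_EVENTS 4          /* hypothetical cap N on the history */

typedef struct {
   int type;                  /* hypothetical event code */
   long timestamp;            /* hypothetical event time */
} HealthEvent;

typedef struct {
   HealthEvent items[MAX_EVENTS];   /* items[0] is always the latest event */
   size_t count;
} EventHistory;

/*
 * Insert ev at index 0 and shift older events back by one slot.
 * When the history is full, the oldest event falls off the end.
 */
static void
EventHistory_PushFront(EventHistory *h, HealthEvent ev)
{
   size_t keep;

   assert(h != NULL);
   keep = h->count < MAX_EVENTS ? h->count : MAX_EVENTS - 1;
   memmove(&h->items[1], &h->items[0], keep * sizeof h->items[0]);
   h->items[0] = ev;
   h->count = keep + 1;
}
```

Because the newest event always sits at index 0, capping the history
only requires limiting the count; no search for the oldest entry is
needed.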
Oliver Kurth [Fri, 2 Nov 2018 22:28:20 +0000 (15:28 -0700)]
Common source file changes not directly applicable to open-vm-tools.
On Windows, the CDR tmp files under folder "VMwareDnD" will be kept
permanently in some cases after drag & drop.
Root cause: The existing strategy is to delete the folder before next reboot
of the machine, which is implemented through writing Windows HKLM registry.
However, a non-administrator user has no permission to write the
registry, so the temp files are not removed.
Solution: The temp files will be removed when the user disconnects the remote
desktop/app. The details are:
1. Client will remove the temp folder when remote desktop/app is disconnected
(rmks exits). Server will remove the temp folder when mksvchServer plugin
gets "Not Ready" notification which means mksvchanServer is disconnected
from the mksvchanClient.
2. Use prefix "Horizon_xxxx(pid)-" to distinguish if the temp folder is
being used by DnD or not. For Client, the pid is the rmks process id,
for Server, the pid is Clipboard pid.
Oliver Kurth [Fri, 2 Nov 2018 22:28:20 +0000 (15:28 -0700)]
Issue: Sometimes, there is a message "The system cannot find the file
specified" popped up when drag and drop over multiple remote apps.
Root cause:
When dragging over multiple remote apps, multiple pairs of dragEnter and
dragLeave messages are sent. The DnD state or the dropSource/dataObject
gets corrupted when the 2nd DragEnter is handled at the same time as the
1st DragLeave.
Solution:
Besides fixing this issue, also fixed several other issues to improve the
handling of multiple pairs of messages at the same time:
1. In DnDController layer, avoid resetting the DnD state when handling the
message responded from agent for previous sessions.
2. In agent, only set the DnD state to be Ready when the previous
DoDragDrop is really cancelled by OLE.
3. In agent, only respond to the button event when the drop is notified.
4. In agent, add a 2s timeout check to the cancelling process to avoid
conflicting with another DnDThread creating the dropSource.
5. In agent, add a 2s timeout check to the dragBegin process to avoid
conflicting with previous dragBegin processing.
6. Add virtual prefix to the method "GetData".
Oliver Kurth [Fri, 2 Nov 2018 22:28:19 +0000 (15:28 -0700)]
Tools hang detector now handles slow guest.
The tools hang detector can check its own past running history and figure
out whether it is itself running slowly due to resource contention inside
the guest.
The tools hang detector shall report a different event to the VMX if the
main thread is not checking in and the tools hang detector itself is
not running frequently enough, indicating that the guest is running slowly
due to possible resource contention.
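One way to picture the "own running history" check is a scan of the
detector's recent wake-up times for gaps well beyond its period. This is
only a sketch under assumptions: the entry does not say how the history
is stored or evaluated, and DetectorRanSlow, periodMs and slackMs are
hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if any gap between consecutive wake-ups exceeds the
 * expected period plus some slack, i.e. the detector itself was not
 * scheduled often enough. Timestamps are assumed to be monotonic
 * milliseconds, oldest first.
 */
static bool
DetectorRanSlow(const uint64_t *wakeupsMs, size_t count,
                uint64_t periodMs, uint64_t slackMs)
{
   assert(wakeupsMs != NULL || count == 0);
   for (size_t i = 1; i < count; i++) {
      if (wakeupsMs[i] - wakeupsMs[i - 1] > periodMs + slackMs) {
         return true;
      }
   }
   return false;
}
```

If the main thread missed its check-in and this function also returns
true, the detector would report the "slow guest" event rather than a
plain hang event.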
Oliver Kurth [Fri, 2 Nov 2018 22:28:18 +0000 (15:28 -0700)]
More logging improvements
vSECR doesn't want usernames going to the VMX, so remove them.
Dump cert details when xmlsec fails to add the cert to its keystore.
This can occur when the cert chain in the token has a bad cert,
or one that isn't signed by the root cert in the token's chain.
This can occur if a user has mis-configured an SSO server.
Oliver Kurth [Fri, 2 Nov 2018 22:28:18 +0000 (15:28 -0700)]
Solaris: Synchronize between vmxnet3_tx and vmxnet3
Vmxnet3 driver on Solaris is not properly synchronized
between vmxnet3_tx and vmxnet3_stop. When the driver
receives a stop event from the device, it doesn't
synchronize with the TX function invoked from the
networking stack before it releases the TXQ resources.
Thus, when the TX function vmxnet3_tx() is executed,
and a stop event/interrupt comes in, the TXQ may
suddenly disappear while vmxnet3_tx is still accessing
the descriptors, thus the guest OS crashes.
Oliver Kurth [Fri, 2 Nov 2018 22:28:18 +0000 (15:28 -0700)]
Hgfs FUSE Client: fix attribute caching of folders
When a directory is invalidated from the cache due to any change such as
a rename then any cached children of that parent folder should be
invalidated also. Otherwise the cache holds stale information leading
to incorrect behavior and failing applications.
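The invalidation rule can be sketched with a flat table of cached paths.
This is an illustration only: the real Hgfs FUSE cache is not shown in
this log, and CacheEntry, IsChildOf and Cache_InvalidateTree are
hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef struct {
   const char *path;   /* absolute path of the cached file or folder */
   bool valid;         /* false once the cached attributes are stale */
} CacheEntry;

/* True if path lies somewhere below parent, e.g. "/a/b" below "/a". */
static bool
IsChildOf(const char *path, const char *parent)
{
   size_t len = strlen(parent);

   return strncmp(path, parent, len) == 0 && path[len] == '/';
}

/*
 * Invalidate dir and every cached entry below it.
 * Returns the number of entries that were invalidated.
 */
static size_t
Cache_InvalidateTree(CacheEntry *entries, size_t count, const char *dir)
{
   size_t invalidated = 0;

   assert(entries != NULL && dir != NULL);
   for (size_t i = 0; i < count; i++) {
      if (!entries[i].valid) {
         continue;
      }
      if (strcmp(entries[i].path, dir) == 0 ||
          IsChildOf(entries[i].path, dir)) {
         entries[i].valid = false;
         invalidated++;
      }
   }
   return invalidated;
}
```

Note the '/' check in IsChildOf(): without it, invalidating "/a" would
also wrongly invalidate a sibling such as "/ab".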
Oliver Kurth [Fri, 26 Oct 2018 17:45:00 +0000 (10:45 -0700)]
Clear DnDPluginResetTimer on receiving SIGUSR1.
DnDPluginResetTimer callback handler DnDPluginResetSent() accesses
the RPC channel managed by the main loop. However, mainloop destroys
this channel on receiving SIGUSR1. So, we need to destroy this timer
as well when SIGUSR1 is received.
There are at least 2 ways to do it:
1. Add Linux specific code in vmCopyPasteDnDWrapper.cpp and any other
places where we need to do similar cleanup.
2. Have main loop generate a signal and do the necessary cleanup when
handling that signal.
In order to keep the code clean and also let other places/future
changes leverage the same solution, approach #2 is used here to
define and generate a new signal TOOLS_CORE_SIG_NO_RPC. Also,
implement a handler for the same in the test plugin (for testing)
and dndcp for this bug.
While there, also fixed the log domain for a few files that are supposed
to log under the "dndcp" domain.
Oliver Kurth [Fri, 26 Oct 2018 17:44:59 +0000 (10:44 -0700)]
Develop log APIs to fix security holes in the tools log messages.
Security artifacts such as command args on the host should not be logged
in the VMX log files on the host. This change creates APIs so that
different log messages can be used for host and guest.
Refactored the log plumbing to minimize code duplication when calling
the different implementations of logging to the VMX and logging in the guest.
Fixed one instance of security issue in vmbackup, to show how to use
the new APIs.
Oliver Kurth [Fri, 26 Oct 2018 17:44:59 +0000 (10:44 -0700)]
Clear channel restart timer when RPC channel is destroyed.
Vmusr main loop destroys the RPC channel when it receives SIGUSR1.
However, it may have left the restart timer around that would end
up accessing the destroyed channel structure. So, we destroy the
timer along with the RPC channel to be safe.
Oliver Kurth [Fri, 26 Oct 2018 17:44:59 +0000 (10:44 -0700)]
Tools Windows Drivers: split header for reporting versions
The rpcvmx.h header file has been split to allow some tools drivers to
report their versions to the VMX while limiting the number of RPC APIs
that are exposed to the drivers like VMCI and Csock.
Oliver Kurth [Fri, 26 Oct 2018 17:44:58 +0000 (10:44 -0700)]
Implement tools hang detection logic
Create a dedicated detector thread. The thread sits in a loop and
wakes up periodically to decrement an atomic counter. It also schedules
a check-in timer with the main loop to reset the counter periodically.
If the counter ever drops to or below zero, a tools hang is detected
and a tools hang event is generated. Otherwise, if there was a hang
but the counter has come back up to a positive value, a tools recovery
event is generated.
In order to properly create hang and recovery events, the previous state
needs to be tracked.
In order to properly handle shutdown, we need a condition variable so
that the detector thread can wake up on it while sleeping. This is
because the toolsd calls the thread pool shutdown function which
in turn calls each thread's terminate function and waits for the threads
to quit. Therefore, our terminate function shall wake up the detector
thread and make it quit. Otherwise, the toolsd shutdown shall hang.
Next change shall implement the new RPCI command to send the
hang/recovery event to VMX.
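The counter and state tracking described in this entry can be sketched
as a small state machine. This is a sketch under assumptions, not the
toolsd implementation: the thread, its sleep and the condition variable
used for shutdown are left out, and Detector, Detector_CheckIn,
Detector_Tick and COUNTER_RESET are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define COUNTER_RESET 3   /* hypothetical: detector ticks allowed per check-in */

typedef enum { EVENT_NONE, EVENT_HANG, EVENT_RECOVERY } DetectorEvent;

typedef struct {
   atomic_int counter;    /* reset by the main loop, decremented by detector */
   bool hung;             /* previous state, so each event is emitted once */
} Detector;

static void
Detector_Init(Detector *d)
{
   assert(d != NULL);
   atomic_init(&d->counter, COUNTER_RESET);
   d->hung = false;
}

/* Called from the main loop's check-in timer. */
static void
Detector_CheckIn(Detector *d)
{
   atomic_store(&d->counter, COUNTER_RESET);
}

/* Called each time the detector thread wakes up. */
static DetectorEvent
Detector_Tick(Detector *d)
{
   /* atomic_fetch_sub returns the old value; subtract 1 for the new one. */
   int value = atomic_fetch_sub(&d->counter, 1) - 1;

   if (value <= 0 && !d->hung) {
      d->hung = true;
      return EVENT_HANG;
   }
   if (value > 0 && d->hung) {
      d->hung = false;
      return EVENT_RECOVERY;
   }
   return EVENT_NONE;
}
```

Tracking the previous state in "hung" is what keeps a long hang from
producing a new event on every tick. A real detector would also wait
between ticks on a condition variable with a timeout, so that the
terminate function can wake it up immediately at shutdown.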
Oliver Kurth [Fri, 26 Oct 2018 17:44:57 +0000 (10:44 -0700)]
Hgfs Server Linux: fix the share permissions on a file rename or delete
Coverity found that the arguments to obtain the share permissions
on a file rename were swapped. This is currently harmless because both
values are tested together in a single if statement, but it could become
an issue in the future.
Swap the arguments so the share read and write permissions are correct.
Oliver Kurth [Fri, 26 Oct 2018 17:44:57 +0000 (10:44 -0700)]
Handle Linux kernel /proc FS uint32 type stat overflow when calculating tools rate stats.
On both 32-bit and 64-bit Linux, tools always parses Linux kernel /proc FS
stats as uint64 values. For rate stats, current - previous can handle uint64
type stat overflow, but not uint32 type.
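The overflow problem can be illustrated with a delta helper. The
heuristic below (treat a backwards step as a 32-bit wrap when the
previous sample fits in 32 bits) is an assumption made for this sketch;
the entry does not say how the actual fix identifies uint32 counters,
and RateDelta is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Difference between two samples of a monotonically increasing
 * counter that were both parsed as uint64.
 */
static uint64_t
RateDelta(uint64_t previous, uint64_t current)
{
   if (current >= previous) {
      return current - previous;           /* no wrap */
   }
   if (previous <= UINT32_MAX) {
      /* Assumed 32-bit kernel counter: subtract modulo 2^32. */
      return (uint32_t)((uint32_t)current - (uint32_t)previous);
   }
   /* 64-bit counter wrap: unsigned arithmetic is already modulo 2^64. */
   return current - previous;
}
```

Without the 32-bit case, a wrapped uint32 counter going from 4294967290
to 5 would yield a delta close to 2^64 instead of 11, which would show
up as an absurdly large rate.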
Oliver Kurth [Fri, 26 Oct 2018 17:44:57 +0000 (10:44 -0700)]
Accommodate kmem_malloc() and kmem_free() changes in FreeBSD 12
The kmem_malloc() and kmem_free() APIs have been changed in the
upcoming FreeBSD 12.0 release. The change was to drop the now
unused arena parameter from both functions.
This fix defines and uses several macros whose definitions are
specific to FreeBSD version 10, 11 and 12 kernel memory API changes.
Github open-vm-tools pull request from Josh Paetzel.
https://github.com/vmware/open-vm-tools/pull/286
Oliver Kurth [Fri, 26 Oct 2018 17:44:56 +0000 (10:44 -0700)]
Fix Guest RPC Channel clean up memory leak on the guest side.
RpcChannel_Destroy() leaks the memory allocated for the outLock.
Refactored the code so that RpcChannel_Shutdown() matches
RpcChannel_Setup() if invoked, and RpcChannel_Destroy() just calls
RpcChannel_Shutdown().
Oliver Kurth [Fri, 26 Oct 2018 17:44:55 +0000 (10:44 -0700)]
ToolsCore_GetVmusrLimit(): use app name from ToolsServiceState struct
The vmusr process on Windows calls ToolsCore_GetVmusrLimit() early
in the process, before ToolsCore_Setup() is called to initialize the
ctx member. ToolsCore_GetVmusrLimit() now passes "state->name" to
VMTools_ConfigGetInteger instead of "state->ctx.name".
Oliver Kurth [Fri, 26 Oct 2018 17:44:55 +0000 (10:44 -0700)]
Allow only a single instance of vmusr when multiple users are logged into a VM
When a vmusr process gets the "channel conflict" error while attempting
to open the toolbox-dnd channel, a channel reset is triggered. The
reset restarts the channel, which leads to another conflict and another
reset, repeating every second until the channel becomes available.
For *nix guests:
The fix makes use of the repetitive channel resets, where the only
RpcIn message received is a "reset", to detect this "permanently"
unavailable channel state. If other RpcIn messages are received, the
channel is considered to be working and the cumulative error count is
cleared.
lib/rpcin/rpcin.c:
- struct RpcIn: Added error status boolean and callback function to
notify the dependent layer that a channel error has been
resolved.
- RpcInLoop(): If a non "reset" message is received, clear any channel
error status. This will also notify the dependent layer
that the channel is functioning.
- RpcIn_start(): Added additional argument for new callback; NULL if
not needed.
lib/rpcChannel/rpcChannel.c:
- struct rpcChannelInt:
- Renamed "rpcErrorCount" to "rpcResetErrorCount" since it is actually
a count of the consecutive channel reset failures and not a count
of RpcChannel errors.
- Added counter "rpcFailureCount" for the cumulative channel errors.
- Added "rpcFailureCb" for optional callback to notify the app of a
"permanent" channel failure.
- New function RpcChannelClearError() for RpcIn to notify when the
channel is working; to clear the rpcFailureCount.
- RpcChannel_Setup() - added two arguments for (1) an optional function
to be called when there is a channel failure
and (2) a failure count threshold.
These optional values are stored in the
RpcChannel structure being created.
- RpcChannelError(): Added logic to notify the calling app, if a
callback was provided, once the error threshold
has been reached. A zero threshold signifies
the single vmusr limit should not be enforced.
(fix disable switch).
services/vmtoolsd/mainLoop.c:
- New function ToolsCore_GetVmusrLimit() to retrieve the channel error
threshold default or over-ride from tools.conf.
services/vmtoolsd/toolsRpc.c:
- Added ToolsCoreAppChannelFail(): Callback for "permanent" channel
connection failure. A warning is logged based on whether another
"vmtoolsd -n vmusr" is running or not and the process is terminated.
On Mac OS, the process is terminated with exit(1) as an indication
to launchd that the vmusr process should not automatically be
restarted.
The current implementation uses the error callback only for the vmusr
server on Linux (*nix).
The default channel error limit is 5 (approx. 5 seconds), but is user
configurable in tools.conf.
[vmusr]
maxChannelAttempts = n # where allowed n = 0, 3-15
When "maxChannelAttempts = 0" is used, the restriction to a single
running vmusr process is not enforced. The existing behavior is
restored with all the accompanying VMX log spew. This is essentially
a user configurable feature disablement switch.
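The threshold logic can be sketched as follows. This is an illustration
under assumptions, not the rpcChannel.c code: ChannelErrors,
NormalizeLimit, ChannelErrors_OnFailure, ChannelErrors_Clear and
CountingCallback are hypothetical names, and falling back to the default
for out-of-range values is an assumption, since the entry only lists the
allowed values.

```c
#include <assert.h>
#include <stddef.h>

#define DEFAULT_MAX_ATTEMPTS 5   /* default limit stated in this entry */

typedef void (*FailureCb)(void *ctx);

typedef struct {
   unsigned failureCount;   /* cumulative channel errors since last clear */
   unsigned maxFailures;    /* 0 means the limit is not enforced */
   FailureCb onFailure;     /* optional; may be NULL */
   void *ctx;
} ChannelErrors;

/* Accept 0 or 3-15; anything else falls back to the default (assumption). */
static unsigned
NormalizeLimit(int n)
{
   if (n == 0 || (n >= 3 && n <= 15)) {
      return (unsigned)n;
   }
   return DEFAULT_MAX_ATTEMPTS;
}

/* Record one channel error; fire the callback once the limit is reached. */
static void
ChannelErrors_OnFailure(ChannelErrors *c)
{
   assert(c != NULL);
   if (c->maxFailures == 0) {
      return;                         /* feature disabled */
   }
   if (++c->failureCount >= c->maxFailures && c->onFailure != NULL) {
      c->onFailure(c->ctx);
   }
}

/* A non-"reset" message arrived: the channel works, start counting afresh. */
static void
ChannelErrors_Clear(ChannelErrors *c)
{
   c->failureCount = 0;
}

/* Example callback that just counts how often it was invoked. */
static void
CountingCallback(void *ctx)
{
   ++*(int *)ctx;
}
```

In the change described above, the callback logs a warning and
terminates the process, so it would fire at most once per vmusr
instance; the counting callback here only exists to make the behavior
observable.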