WCP service causes multiple workflow failures; unable to put a host into maintenance mode.

weimoo 4 months ago 160

Products

VMware Cloud Foundation
VMware vCenter Server

Issue/Introduction

"com.vmware.vapi.std.errors.unauthenticated" and "vapi.security.authentication.invalid" errors for the WCP service causing multiple workflow failures (broadcom.com)

Repair the WCP service with a new registration and solution user certificate so that it authenticates correctly with vCenter services.

Symptoms:

Unable to put a host in maintenance mode.

vpxd.log shows errors similar to:

 

2020-12-21T12:43:56.848-08:00 info vpxd[10034] [Originator@6876 sub=MoHost opID=opId-18b14-105289-d9] WCP exitMaintenanceMode vAPI returns error: Error:
-->    com.vmware.vapi.std.errors.unauthenticated
--> Messages:
-->    vapi.security.authentication.invalid<Unable to authenticate user>
-->  
2020-12-21T12:43:56.851-08:00 error vpxd[10034] [Originator@6876 sub=MoHost opID=opId-18b14-105289-d9] [Delete] Failed to delete vAPI session. Error:
--> Error:
-->    com.vmware.vapi.std.errors.unauthenticated
--> Messages:
-->    vapi.security.authentication.invalid<Unable to authenticate user>
..
..
..
2020-12-21T12:43:56.860-08:00 info vpxd[10034] [Originator@6876 sub=Default opID=opId-18b14-105289-d9] [VpxLRO] -- ERROR task-6215 -- host-9421 -- vim.HostSystem.enterMaintenanceMode: vim.fault.InvalidState:
--> Result:
--> (vim.fault.InvalidState) {
-->    faultCause = (vmodl.MethodFault) null, 
-->    faultMessage = (vmodl.LocalizableMessage) [
-->       (vmodl.LocalizableMessage) {
-->          key = "com.vmware.cdrs.maintenancemode.wcp.entermaintenancemode", 
-->          arg = <unset>, 
-->          message = <unset>


SDDC Manager workflows fail with the error:

FAILED_TO_GET_WCP_CLUSTER_STATUS
Failed to get Workload Management cluster status for vCenter <VC_FQDN>
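
To confirm this signature on an affected vCenter, the log can be searched for the authentication error string. The following is a minimal sketch, not an official tool: the function name count_wcp_auth_errors is illustrative, and the default path /var/log/vmware/vpxd/vpxd.log is an assumption about where vpxd.log lives on the appliance (pass a different path if yours differs).

```shell
#!/usr/bin/env bash
# Count log lines that carry the WCP authentication failure signature.
# Usage: count_wcp_auth_errors [logfile]
# The default log path below is an assumption; override it with an argument.
count_wcp_auth_errors() {
    local log="${1:-/var/log/vmware/vpxd/vpxd.log}"
    local count
    # grep -c prints 0 and exits with status 1 when nothing matches;
    # treat that as success and only fail on real errors (status 2).
    count=$(grep -c -F 'vapi.security.authentication.invalid' -- "$log") \
        || [ $? -eq 1 ] || return 2
    echo "$count"
}
```

A non-zero count together with the enterMaintenanceMode failure shown above suggests this article applies; a count of 0 suggests looking elsewhere.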

 

Environment

VMware vCenter Server 7.0.x
VMware Cloud Foundation 4.x

Cause

This issue can be caused by either of the following:
  1. Duplicate solution user certificate for the WCP service in a vCenter linked mode setup. Replacing the solution user certificate via the certificate-manager utility causes this issue; it is fixed in VMware vCenter Server 7.0 Update 2 and later.
  2. Expired WCP solution user certificate. By default, all solution user certificates are valid for 10 years except the one for the 'wcp' solution user, whose validity was set to 2 years, so it expires well before the others. This is fixed in vCenter Server 7.0 U3 and later, where the default validity of the 'wcp' solution user certificate is 10 years.
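
To check for the second cause, the expiry date of the current WCP certificate can be compared with today's date. The sketch below reads the text output of the vecs-cli getcert command used later in this article and reports the days remaining. It assumes that the output contains an openssl-style "Not After : <date>" line and that GNU date is available; the function name cert_days_left and its optional "now" argument are illustrative.

```shell
#!/usr/bin/env bash
# Print the number of whole days until the certificate's "Not After" date.
# Reads certificate text from stdin, for example:
#   /usr/lib/vmware-vmafd/bin/vecs-cli entry getcert --store wcp --alias wcp --text | cert_days_left
# Optional argument: "now" as a Unix timestamp (defaults to the current time).
# Assumes a line such as "Not After : Dec 21 20:43:56 2022 GMT" and GNU date (-d).
cert_days_left() {
    local now="${1:-$(date -u +%s)}" not_after end
    # Strip everything up to and including "Not After :" on the first such line.
    not_after=$(sed -n 's/^[[:space:]]*Not After[[:space:]]*:[[:space:]]*//p' | head -n 1)
    [ -n "$not_after" ] || return 2          # no expiry line found
    end=$(date -u -d "$not_after" +%s) || return 2
    echo $(( (end - now) / 86400 ))
}
```

A result of zero or less means the certificate has already expired, which matches cause 2.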

Resolution

0. Take offline snapshots of all vCenters in the SSO domain.
1. SSH to the vCenter in question where the WCP service needs to be repaired.
2. Get the unique Machine ID and hostname:
/usr/lib/vmware-vmafd/bin/vmafd-cli get-machine-id --server-name localhost

hostname -f
3. Generate WCP solution user key:
/usr/lib/vmware-vmca/bin/certool --server localhost --genkey --privkey=/tmp/wcp.key --pubkey=/tmp/wcp.pub
4. Generate WCP solution user certificate:
/usr/lib/vmware-vmca/bin/certool --server=localhost --genCIScert --privkey=/tmp/wcp.key --cert=/tmp/wcp.crt --Name=wcp --Hostname=<VC_FQDN>
5. Get the WCP service name using dir-cli (default name: wcp-<machine id>):
/usr/lib/vmware-vmafd/bin/dir-cli service list
6. Update the WCP service with the new WCP certificate:
/usr/lib/vmware-vmafd/bin/dir-cli service update --name <insert wcp service name from the service list> --cert /tmp/wcp.crt
7. Delete the WCP solution user entry from VECS store:
/usr/lib/vmware-vmafd/bin/vecs-cli entry delete --store wcp --alias wcp -y
/usr/lib/vmware-vmafd/bin/vecs-cli force-refresh

8. Add the new WCP solution user certificate to the VECS store:

/usr/lib/vmware-vmafd/bin/vecs-cli entry create --store wcp --alias wcp --cert /tmp/wcp.crt --key /tmp/wcp.key

9. Verify that the WCP certificate is updated. The Subject should contain the unique CN as updated in wcp.cfg, along with a new issue date and expiration date:

/usr/lib/vmware-vmafd/bin/vecs-cli entry getcert --store wcp --alias wcp --text

10. Restart services on the vCenter:

service-control --stop --all && service-control --start --all

11. Retry the workflow that previously failed due to WCP errors.
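
Steps 3 to 9 can be chained in a small script so that the sequence can be reviewed before anything changes. The following is a sketch under stated assumptions, not a supported tool: DRY_RUN, run, wcp_service_name and repair_wcp_cert are illustrative names, the commands are the ones listed above, and the pattern used to pick the service name out of the dir-cli listing assumes the default wcp-<machine id> naming. With DRY_RUN=1 (the default) the commands are only printed.

```shell
#!/usr/bin/env bash
# Sketch of Resolution steps 3-9. Illustrative helper names throughout.
VMAFD=/usr/lib/vmware-vmafd/bin
VMCA=/usr/lib/vmware-vmca/bin

# Print the command when DRY_RUN=1 (default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Read `dir-cli service list` output on stdin and print the first
# wcp-<machine id> token. The output layout is an assumption.
wcp_service_name() {
    grep -o 'wcp-[A-Za-z0-9-]*' | head -n 1
}

# Usage: repair_wcp_cert <VC_FQDN> <wcp service name>
repair_wcp_cert() {
    local fqdn="${1:-}" svc="${2:-}"
    if [ -z "$fqdn" ] || [ -z "$svc" ]; then
        echo "usage: repair_wcp_cert <VC_FQDN> <wcp service name>" >&2
        return 2
    fi
    # Steps 3-4: new key pair and solution user certificate.
    run "$VMCA/certool" --server localhost --genkey --privkey=/tmp/wcp.key --pubkey=/tmp/wcp.pub &&
    run "$VMCA/certool" --server=localhost --genCIScert --privkey=/tmp/wcp.key --cert=/tmp/wcp.crt --Name=wcp --Hostname="$fqdn" &&
    # Step 6: point the service registration at the new certificate.
    run "$VMAFD/dir-cli" service update --name "$svc" --cert /tmp/wcp.crt &&
    # Steps 7-8: replace the entry in the VECS store.
    run "$VMAFD/vecs-cli" entry delete --store wcp --alias wcp -y &&
    run "$VMAFD/vecs-cli" force-refresh &&
    run "$VMAFD/vecs-cli" entry create --store wcp --alias wcp --cert /tmp/wcp.crt --key /tmp/wcp.key &&
    # Step 9: show the stored certificate for verification.
    run "$VMAFD/vecs-cli" entry getcert --store wcp --alias wcp --text
}
```

For example, `repair_wcp_cert <VC_FQDN> <wcp service name>` prints the seven commands for review, and `DRY_RUN=0 repair_wcp_cert <VC_FQDN> <wcp service name>` executes them, stopping at the first failure. The snapshots in step 0, the service restart in step 10, and the retry in step 11 are deliberately left as manual steps.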

Additional Information

Impact/Risks:
WARNING: This process changes the solution user registration and certificate, which are stored in VMDIR. It is highly recommended to take offline snapshots of all vCenters in the SSO domain before attempting the steps in the Resolution.