Introduction

The other day we updated our UsageMeter environment to 4.8. We run a couple of appliances, and all of them, including this one, worked fine after the upgrade. After a couple of days I received a message that the Usage Meter consumption data was no longer being uploaded to the VMware VCP environment. There, the data is translated into usage for each of the products you use as a Cloud Provider. This part is essential so that VMware knows what you are using and can bill accordingly.

Troubleshooting

Let’s start troubleshooting! The UsageMeter UI did not tell me much, apart from an error message that it could not upload to the Cloud Partner Navigator portal. Going to Settings -> Send Update To Cloud Partner Navigator and clicking “Send Update to CPN” didn’t actually do anything, and it seemed to hang the entire appliance. After a while you receive a dozen red banner messages. You can also see this in the UI on the notifications tab, as shown below:

UsageMeter 4.8 Detected health issues for services Uploader ThreadsWatcher

There is not much more you can do from the UI, so let’s dive a bit deeper into the logging. This led me to the following log entries:

dss_error.log:
2024-03-13 00:46:55.974 ERROR --- [ProcessWatcherThread] c.v.um.commoncomp.procwatch.ProcessInfo  : Not found 1 thread instances with name mask uploaderThread
2024-03-13 00:46:55.975 ERROR --- [ProcessWatcherThread] c.v.um.common.health.UmHealthReporter    : reportFatalErorr call with errorCode 'ERR_THREAD_WATCHER' and errorMessage 'Issue detected for UploaderThreadsWatcher'
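
When the logs get long, it can help to pull out only the ERROR entries. The two entries above appear to share one layout (timestamp, level, thread, logger, message). Below is a minimal Python sketch that parses that layout. The regular expression is inferred from the sample lines in this post only and is not an official format description; `LogEntry`, `parse_line` and `errors_only` are names I made up for illustration.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Layout inferred from the sample lines in this post; treat it as an assumption.
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+---\s+"
    r"\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<logger>\S+)\s*:\s"
    r"(?P<message>.*)$"
)


class LogEntry(NamedTuple):
    timestamp: str
    level: str
    thread: str
    logger: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Return a LogEntry, or None for lines that do not match (e.g. stack frames)."""
    match = LOG_LINE.match(line.rstrip("\n"))
    if match is None:
        return None
    return LogEntry(**match.groupdict())


def errors_only(lines: Iterable[str]) -> List[LogEntry]:
    """Keep only the entries logged at ERROR level."""
    parsed = (parse_line(line) for line in lines)
    return [entry for entry in parsed if entry is not None and entry.level == "ERROR"]
```

You could feed it an open log file, for example `errors_only(open(path))`, where `path` points at wherever your appliance keeps dss_error.log.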


gw_error.log:
2024-03-12 10:47:47.260 ERROR --- [nginx-clojure-worker-1] com.vmware.um.umconnection.api.Journal   : Unable to retrieve the Journal logs.
com.vmware.um.common.err.UmException: Unable to call service at https://localhost:8051/api/v2/journal/search-results: 500 - {"timestamp":"2024-03-12T09:47:47.256+00:00","status":500,"error":"Internal Server Error","path":"/api/v2/journal/search-results"}, body was okhttp3.RequestBody$Companion$toRequestBody$2@40147e57
	at com.vmware.um.common.platform.UmPlatformClient.callPlatformService(UmPlatformClient.java:420)
	at com.vmware.um.common.platform.UmPlatformClient.searchJournal(UmPlatformClient.java:477)
	at com.vmware.um.umconnection.api.Journal.read(Journal.java:166)
	at com.vmware.um.gw.handler.UmJournalR.processRequest(UmJournalR.java:77)
	at com.vmware.um.gw.RestApiHandler.processRestAPI(RestApiHandler.java:122)
	at com.vmware.um.gw.ApiEndPoint.invoke(ApiEndPoint.java:243)
	at nginx.clojure.java.NginxJavaHandler.process(NginxJavaHandler.java:125)
	at nginx.clojure.NginxSimpleHandler.handleRequest(NginxSimpleHandler.java:217)
	at nginx.clojure.NginxSimpleHandler.lambda$execute$0(NginxSimpleHandler.java:181)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)

dss_main.log:
2024-03-13 06:14:54.957  WARN --- [dss_Runner] c.v.um.common.http.LoggingInterceptor    : POST /api/v2/journal failed with Failed to connect to localhost/127.0.0.1:8051
java.net.ConnectException: Failed to connect to localhost/127.0.0.1:8051
	at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.kt:297)
	at okhttp3.internal.connection.RealConnection.connect(RealConnection.kt:207)
	at okhttp3.internal.connection.ExchangeFinder.findConnection(ExchangeFinder.kt:226)
	at okhttp3.internal.connection.ExchangeFinder.findHealthyConnection(ExchangeFinder.kt:106)
	at okhttp3.internal.connection.ExchangeFinder.find(ExchangeFinder.kt:74)
	at okhttp3.internal.connection.RealCall.initExchange$okhttp(RealCall.kt:255)
	at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.kt:32)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
	at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.kt:95)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
	at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.kt:83)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
	at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.kt:76)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
	at com.vmware.um.common.http.LoggingInterceptor.intercept(LoggingInterceptor.java:26)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
	at com.vmware.um.common.http.RetryInterceptor.intercept(RetryInterceptor.java:65)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
	at okhttp3.internal.connection.RealCall.getResponseWithInterceptorChain$okhttp(RealCall.kt:201)
	at okhttp3.internal.connection.RealCall.execute(RealCall.kt:154)
	at com.vmware.um.common.platform.UmPlatformClient.callPlatformService(UmPlatformClient.java:391)
	at com.vmware.um.common.platform.UmPlatformClient.createJournalEntry(UmPlatformClient.java:446)
	at com.vmware.um.umconnection.api.Journal.checkParamsAndCreateJournalPayload(Journal.java:202)
	at com.vmware.um.umconnection.api.Journal.create_async(Journal.java:97)
	at com.vmware.um.uploader.UploaderComponent.notifyOnlineMode(UploaderComponent.java:571)
	at com.vmware.um.uploader.UploaderComponent.afterStart(UploaderComponent.java:313)
	at com.vmware.um.umcomponent.ComponentManager.afterStart(ComponentManager.java:248)
	at com.vmware.um.umcomponent.Runner.start(Runner.java:166)
	at com.vmware.um.umcomponent.Runner.main(Runner.java:251)
Caused by: java.net.ConnectException: Connection timed out
	at java.base/sun.nio.ch.Net.pollConnect(Native Method)
	at java.base/sun.nio.ch.Net.pollConnectNow(Unknown Source)
	at java.base/sun.nio.ch.NioSocketImpl.timedFinishConnect(Unknown Source)
	at java.base/sun.nio.ch.NioSocketImpl.connect(Unknown Source)
	at java.base/java.net.SocksSocketImpl.connect(Unknown Source)
	at java.base/java.net.Socket.connect(Unknown Source)
	at okhttp3.internal.platform.Platform.connectSocket(Platform.kt:128)
	at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.kt:295)
	... 28 common frames omitted
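
Both the gw_error.log and dss_main.log entries point at the local service on port 8051. A quick way to see whether anything is accepting connections there is a plain TCP connect test. This is a minimal sketch of my own, not something VMware GSS suggested: it assumes Python is available on the machine you run it from, and `port_is_open` is a hypothetical helper name.

```python
import socket


def port_is_open(host: str, port: int, timeout_s: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False
```

Calling `port_is_open("localhost", 8051)` on the appliance only tells you whether the port accepts connections. It says nothing about whether the service behind it is healthy, so a `True` here does not rule out the 500 errors shown in gw_error.log.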

Because this did not ring any bells, I had to create a support ticket with VMware GSS. They eventually told me it was due to a network connection timeout to the CPN, possibly because our consumption data is rather large. This specific UsageMeter monitors a really large number of environments and VMs, so that might also be a factor. The issue can be fixed by doing the following.

  1. Create a backup of the dss_proces.conf file at /opt/vmware/cloudusagemetering/conf.
  2. Go to the line with -componentName processWatcher and change the -timeoutMs value from 15000 to 350000.
  3. Go to the line with -componentParams { and add the following lines:
     -readTimeoutSeconds 300
     -writeTimeoutSeconds 300
  4. Save the file and reboot the UsageMeter appliance with a Guest Restart.
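
If you have to repeat this on several appliances, the backup and edit steps above can be sketched in Python. Treat this as an illustration under assumptions, not a supported tool. I assume that -timeoutMs sits on the same line as -componentName processWatcher, and that the two new lines go directly after the first line containing -componentParams {. If your file has more than one such block, check which one is meant before running anything. `patch_conf_text` and `patch_conf_file` are hypothetical names.

```python
import re
import shutil
from pathlib import Path

NEW_PARAMS = ["-readTimeoutSeconds 300", "-writeTimeoutSeconds 300"]


def patch_conf_text(text: str) -> str:
    """Apply steps 2 and 3 to the config text (file layout is an assumption)."""
    already_has_params = "-readTimeoutSeconds" in text
    inserted = False
    out = []
    for line in text.splitlines():
        if "-componentName processWatcher" in line:
            # Step 2: raise the watcher timeout from 15000 ms to 350000 ms.
            line = re.sub(r"(-timeoutMs\s+)15000\b", r"\g<1>350000", line)
        out.append(line)
        if not already_has_params and not inserted and "-componentParams {" in line:
            # Step 3: add the read/write timeouts after the first matching line.
            out.extend(NEW_PARAMS)
            inserted = True
    return "\n".join(out) + "\n"


def patch_conf_file(path: Path) -> Path:
    """Step 1 (backup) followed by steps 2 and 3; returns the backup path."""
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)
    path.write_text(patch_conf_text(path.read_text()))
    return backup
```

Pass the full path of the file from step 1 to `patch_conf_file`, then diff the result against the `.bak` copy before rebooting. As for the values themselves: 350000 ms is 350 seconds, just above the 300 second read and write timeouts, so presumably the process watcher no longer flags the uploader thread while a long upload is still running.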

After this, re-test the upload by going to Settings -> Send Update To Cloud Partner Navigator and clicking “Send Update to CPN”. This should now give you the following result:

UsageMeter 4.8 successful upload to Cloud Partner Navigator

There you have it: your UsageMeter will start re-uploading the consumption data to VMware CPN, and you should receive an e-mail more or less instantly confirming that everything is working fine again.


Bryan van Eeden

Bryan is an ambitious and seasoned IT professional with almost a decade of experience in designing, building and operating complex (virtual) IT environments. In his current role he tackles customers, complex issues and design questions on a daily basis. Bryan holds several certifications such as VCIX-DCV, VCAP-DCA, VCAP-DCD, V(T)SP and vSAN and vCloud Specialist badges.
