Introduction
The other day we upgraded our UsageMeter environment to version 4.8. We run several appliances, and all of them, including this one, worked fine after the upgrade. A couple of days later, I received a message that the Usage Meter consumption data was no longer being uploaded to the VMware VCP environment. VCP translates this data into usage for each of the products you use as a Cloud Provider in your environment. This step is essential, because it is how VMware knows what you are using and can bill you accordingly.
Troubleshooting
Let’s start troubleshooting! The UsageMeter UI did not give me much to go on, apart from an error message stating that it could not upload to the Cloud Partner Navigator portal. When I went to Settings -> Send Update To Cloud Partner Navigator and clicked “Send Update to CPN”, nothing actually happened, and the entire appliance seemed to hang. After a while, a dozen red banner messages appeared. You can also see these in the Notifications tab of the UI, as shown below:
There is not much more you can do from the UI, so let’s dive a bit deeper into the logs. Doing so led me to the following log entries:
dss_error.log:
2024-03-13 00:46:55.974 ERROR --- [ProcessWatcherThread] c.v.um.commoncomp.procwatch.ProcessInfo : Not found 1 thread instances with name mask uploaderThread
2024-03-13 00:46:55.975 ERROR --- [ProcessWatcherThread] c.v.um.common.health.UmHealthReporter : reportFatalErorr call with errorCode 'ERR_THREAD_WATCHER' and errorMessage 'Issue detected for UploaderThreadsWatcher'

gw_error.log:
2024-03-12 10:47:47.260 ERROR --- [nginx-clojure-worker-1] com.vmware.um.umconnection.api.Journal : Unable to retrieve the Journal logs.
com.vmware.um.common.err.UmException: Unable to call service at https://localhost:8051/api/v2/journal/search-results: 500 - {"timestamp":"2024-03-12T09:47:47.256+00:00","status":500,"error":"Internal Server Error","path":"/api/v2/journal/search-results"}, body was okhttp3.RequestBody$Companion$toRequestBody$2@40147e57
    at com.vmware.um.common.platform.UmPlatformClient.callPlatformService(UmPlatformClient.java:420)
    at com.vmware.um.common.platform.UmPlatformClient.searchJournal(UmPlatformClient.java:477)
    at com.vmware.um.umconnection.api.Journal.read(Journal.java:166)
    at com.vmware.um.gw.handler.UmJournalR.processRequest(UmJournalR.java:77)
    at com.vmware.um.gw.RestApiHandler.processRestAPI(RestApiHandler.java:122)
    at com.vmware.um.gw.ApiEndPoint.invoke(ApiEndPoint.java:243)
    at nginx.clojure.java.NginxJavaHandler.process(NginxJavaHandler.java:125)
    at nginx.clojure.NginxSimpleHandler.handleRequest(NginxSimpleHandler.java:217)
    at nginx.clojure.NginxSimpleHandler.lambda$execute$0(NginxSimpleHandler.java:181)
    at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)

dss_main.log:
2024-03-13 06:14:54.957 WARN --- [dss_Runner] c.v.um.common.http.LoggingInterceptor : POST /api/v2/journal failed with Failed to connect to localhost/127.0.0.1:8051
java.net.ConnectException: Failed to connect to localhost/127.0.0.1:8051
    at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.kt:297)
    at okhttp3.internal.connection.RealConnection.connect(RealConnection.kt:207)
    at okhttp3.internal.connection.ExchangeFinder.findConnection(ExchangeFinder.kt:226)
    at okhttp3.internal.connection.ExchangeFinder.findHealthyConnection(ExchangeFinder.kt:106)
    at okhttp3.internal.connection.ExchangeFinder.find(ExchangeFinder.kt:74)
    at okhttp3.internal.connection.RealCall.initExchange$okhttp(RealCall.kt:255)
    at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.kt:32)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
    at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.kt:95)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
    at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.kt:83)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
    at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.kt:76)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
    at com.vmware.um.common.http.LoggingInterceptor.intercept(LoggingInterceptor.java:26)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
    at com.vmware.um.common.http.RetryInterceptor.intercept(RetryInterceptor.java:65)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
    at okhttp3.internal.connection.RealCall.getResponseWithInterceptorChain$okhttp(RealCall.kt:201)
    at okhttp3.internal.connection.RealCall.execute(RealCall.kt:154)
    at com.vmware.um.common.platform.UmPlatformClient.callPlatformService(UmPlatformClient.java:391)
    at com.vmware.um.common.platform.UmPlatformClient.createJournalEntry(UmPlatformClient.java:446)
    at com.vmware.um.umconnection.api.Journal.checkParamsAndCreateJournalPayload(Journal.java:202)
    at com.vmware.um.umconnection.api.Journal.create_async(Journal.java:97)
    at com.vmware.um.uploader.UploaderComponent.notifyOnlineMode(UploaderComponent.java:571)
    at com.vmware.um.uploader.UploaderComponent.afterStart(UploaderComponent.java:313)
    at com.vmware.um.umcomponent.ComponentManager.afterStart(ComponentManager.java:248)
    at com.vmware.um.umcomponent.Runner.start(Runner.java:166)
    at com.vmware.um.umcomponent.Runner.main(Runner.java:251)
Caused by: java.net.ConnectException: Connection timed out
    at java.base/sun.nio.ch.Net.pollConnect(Native Method)
    at java.base/sun.nio.ch.Net.pollConnectNow(Unknown Source)
    at java.base/sun.nio.ch.NioSocketImpl.timedFinishConnect(Unknown Source)
    at java.base/sun.nio.ch.NioSocketImpl.connect(Unknown Source)
    at java.base/java.net.SocksSocketImpl.connect(Unknown Source)
    at java.base/java.net.Socket.connect(Unknown Source)
    at okhttp3.internal.platform.Platform.connectSocket(Platform.kt:128)
    at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.kt:295)
    ... 28 common frames omitted
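If you have to go through several of these log files, a small script can pull out the ERROR and WARN entries so they don't get lost between the stack traces. Below is a minimal Python sketch, assuming every entry starts with the same "date time LEVEL --- [thread] logger : message" layout as the lines quoted above. The function name find_problem_entries is hypothetical, and I don't cover where the log files live on the appliance, so you pass their paths as arguments.

```python
import re
import sys
from dataclasses import dataclass
from typing import Iterable, List

# Assumed entry layout, based on the log lines quoted above:
# "<date> <time> <LEVEL> --- [<thread>] <logger> : <message>"
ENTRY_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>ERROR|WARN)\s+---\s+"
    r"\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<logger>\S+)\s*:\s*(?P<message>.*)$"
)


@dataclass
class LogEntry:
    timestamp: str
    level: str
    thread: str
    logger: str
    message: str


def find_problem_entries(lines: Iterable[str]) -> List[LogEntry]:
    """Return ERROR/WARN entries; stack-trace and 'Caused by' lines are skipped."""
    entries = []
    for line in lines:
        match = ENTRY_RE.match(line.rstrip("\n"))
        if match:
            entries.append(
                LogEntry(
                    match["ts"],
                    match["level"],
                    match["thread"],
                    match["logger"],
                    match["message"],
                )
            )
    return entries


def main(paths: List[str]) -> None:
    for path in paths:
        # errors="replace" keeps the scan going if a log contains odd bytes.
        with open(path, encoding="utf-8", errors="replace") as handle:
            for entry in find_problem_entries(handle):
                print(f"{path}: {entry.timestamp} {entry.level} [{entry.thread}] {entry.message}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run it wherever you have Python 3 and a copy of the logs, for example python3 scan_um_logs.py dss_error.log gw_error.log dss_main.log (the script name is hypothetical). Stack frames are skipped on purpose; jump to the reported timestamp in the file to read the full trace.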
Because none of this rang any specific bells, I created a support ticket with VMware GSS. They eventually told me it was caused by a specific network connection timeout to CPN, possibly because our consumption data is rather large. This particular UsageMeter monitors a very large number of environments and VMs, which might also be a factor. The issue can be fixed as follows:
- Create a backup of the dss_proces.conf file in /opt/vmware/cloudusagemetering/conf.
- Go to the line with -componentName processWatcher and change the -timeoutMs value from 15000 to 350000.
- Go to the line with -componentParams { and add the following lines:
    -readTimeoutSeconds 300
    -writeTimeoutSeconds 300
- Save the file and reboot the UsageMeter appliance with a Guest Restart.
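If you manage several appliances, you may want to script these edits. The Python sketch below is a hypothetical helper, not a VMware tool. It assumes each setting appears in the file literally as written in the steps above (-timeoutMs 15000 on the processWatcher line, and a single -componentParams { line), and it refuses to touch the file if it finds anything else. It does not restart the appliance; you still do the Guest Restart yourself.

```python
import re
import shutil
import time
from pathlib import Path

# Directory and file name copied from the steps above; the layout inside the
# file ("-key value" tokens on the lines named in the steps) is an assumption.
CONF = Path("/opt/vmware/cloudusagemetering/conf/dss_proces.conf")

OLD_TIMEOUT_RE = re.compile(r"(-timeoutMs\s+)15000\b")
NEW_TIMEOUT_RE = re.compile(r"-timeoutMs\s+350000\b")
NEW_PARAMS = ["-readTimeoutSeconds 300", "-writeTimeoutSeconds 300"]


def apply_fix(text: str) -> str:
    """Return the config text with both changes applied.

    Raises ValueError unless exactly one processWatcher line and one
    componentParams line exist, so an unexpected layout is never guessed at.
    Running it on already-fixed text returns the text unchanged.
    """
    lines = text.splitlines(keepends=True)
    watcher = [i for i, line in enumerate(lines) if "-componentName processWatcher" in line]
    params = [i for i, line in enumerate(lines) if "-componentParams {" in line]
    if len(watcher) != 1 or len(params) != 1:
        raise ValueError(
            f"expected 1 processWatcher and 1 componentParams line, "
            f"found {len(watcher)} and {len(params)}"
        )

    # Raise -timeoutMs from 15000 to 350000 on the processWatcher line.
    w = watcher[0]
    lines[w], count = OLD_TIMEOUT_RE.subn(r"\g<1>350000", lines[w])
    if count == 0 and not NEW_TIMEOUT_RE.search(lines[w]):
        raise ValueError("processWatcher line has no '-timeoutMs 15000' to change")

    # Add the read/write timeouts directly after "-componentParams {", once only.
    p = params[0]
    if not any("-readTimeoutSeconds" in line for line in lines):
        if not lines[p].endswith("\n"):
            lines[p] += "\n"
        indent = lines[p][: len(lines[p]) - len(lines[p].lstrip())] + "    "
        lines[p + 1 : p + 1] = [indent + param + "\n" for param in NEW_PARAMS]

    return "".join(lines)


def main() -> None:
    original = CONF.read_text()
    updated = apply_fix(original)  # validate before touching anything on disk
    # Step 1: timestamped backup next to the original file.
    backup = CONF.with_name(CONF.name + time.strftime(".%Y%m%d-%H%M%S.bak"))
    shutil.copy2(CONF, backup)
    CONF.write_text(updated)
    print(f"Backup written to {backup}. Now do a Guest Restart of the appliance.")


if __name__ == "__main__":
    main()
```

Failing loudly on an unexpected layout is deliberate: the file controls the appliance's own process watcher, so a wrong guess is worse than no change. Because the edit is idempotent, running the script twice leaves the file as it was after the first run.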
After the reboot, re-test the upload by going to Settings -> Send Update To Cloud Partner Navigator and clicking “Send Update to CPN”. This should now give you the following result:
There you have it: your UsageMeter will start re-uploading the consumption data to VMware CPN, and you should receive an e-mail more or less instantly confirming that everything is working again.