Today we have another UsageMeter issue I would like to discuss with everybody. I’ve been using UsageMeter since the old days, so I’ve seen pretty much everything there is, but this time I was surprised to learn something new. Last month, while reporting our UsageMeter metered licenses in the VMware Commerce Portal, I stumbled upon the fact that we only had about 10% of the VCPP points we would normally have. After troubleshooting, there didn’t seem to be anything wrong with the environment: all connected environments were healthy and green. On top of that, the upload button on the “Send update to Usage Insight” page also worked fine and all data was uploaded, as can be seen below:
Going through the log files, there also wasn’t much to be found as to why there wouldn’t be enough metered data. The only thing we found was the following:
vccol_error.log:54109:java.lang.OutOfMemoryError: Java heap space
vccol_error.log:54111:java.lang.OutOfMemoryError: Java heap space
vccol_error.log:54121:2022-06-17 07:37:30.098 ERROR --- [vCenter collector thread] c.vmware.um.collector.CollectionHelper : OutOfMemoryError collecting from server 17.
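Entries like these are easy to pull out of the log in one go. A quick sketch, assuming the collector error log is in the current directory (the file name matches the excerpt above; the full path on your appliance may differ):

```shell
# Count and show OutOfMemoryError entries in the collector error log.
# LOG path is an assumption - adjust to wherever your appliance keeps it.
LOG="${LOG:-vccol_error.log}"
if [ -f "$LOG" ]; then
    grep -n 'OutOfMemoryError' "$LOG" | tail -n 5      # last few occurrences
    echo "total: $(grep -c 'OutOfMemoryError' "$LOG")"
else
    echo "log file $LOG not found"
fi
```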
Since there wasn’t much else to look at or do, I contacted VMware GSS to discuss this issue. After a couple of sessions it appeared that the UsageMeter appliance was running low on memory, so we upgraded it from the regular 8 GB of RAM to 12 GB. We also upgraded the vCPUs from 2 to 6, just to be sure the appliance had enough resources to collect everything it needed, even though this was not strictly necessary if you take a look at the UsageMeter maximums at the following page, which we are not hitting. Unfortunately, this did not help either.
It turns out we also had to adjust the memory configuration of two processes within UsageMeter: the “VC Collector” service, which is responsible for collecting the data from the vCenter Servers, and the “DSS” service, which is responsible for uploading metered data to vCloud Usage Insight. You can do this by following the next couple of steps:
- Log in to the UsageMeter appliance through SSH.
- Browse to the following script and open it:
- Make sure to make a back-up of the file with:
cp common_utils.sh common_utils.sh.bak
- Go to the lines that look like this:
if [[ -z $DEBUG ]]; then
    # Command lines to start every process on the Agent side
    DSS_START_CMD="$JAVA_COMMAND $ARGS -Xms128M -Xmx2048M com.vmware.um.umcomponent.Runner @conf/common.conf @conf/dss_process.conf @conf/products/ @conf/dev.conf @conf/local.conf"
    VCCOL_START_CMD="$JAVA_COMMAND $ARGS -Xms256M -Xmx8192M com.vmware.um.umcomponent.Runner @conf/common.conf @conf/vccollector_process.conf @conf/products/ @conf/dev.conf @conf/local.conf"
else
    # Command lines to start every process on the Agent side in debug mode
    # NGINX closure listens on debug port 8400
    # platform process (DSS and Core) listen on 840X
    # Collector listen on ports greater than 8410
    DSS_START_CMD="$JAVA_COMMAND $DEBUG,address=*:8401 $ARGS -Xms128M -Xmx2048M com.vmware.um.umcomponent.Runner @conf/common.conf @conf/dss_process.conf @conf/products/ @conf/dev.conf @conf/local.conf"
    VCCOL_START_CMD="$JAVA_COMMAND $DEBUG,address=*:8411 $ARGS -Xms256M -Xmx8192M com.vmware.um.umcomponent.Runner @conf/common.conf @conf/vccollector_process.conf @conf/products/ @conf/dev.conf @conf/local.conf"
fi
- Now we have to change the four lines starting with “DSS_START” and “VCCOL_START”, in both the IF and the ELSE part.
- Change the existing values to the following (on our appliance, DSS_START was already at 2 GB and VCCOL_START was at 4 GB):
- DSS_START -Xmx2048M
- VCCOL_START -Xmx8192M
- Save the file and reboot the appliance. The changes should now be in effect.
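The manual edit above can also be scripted with sed. Below is a hedged sketch, demonstrated on a stand-in file (`common_utils_demo.sh` is hypothetical and its contents are abbreviated with “...”; the real file lives on the appliance and its lines look like the excerpt above):

```shell
# Stand-in for common_utils.sh with just the two start-command lines
# (hypothetical demo file; contents abbreviated with "...").
cat > common_utils_demo.sh <<'EOF'
DSS_START_CMD="$JAVA_COMMAND $ARGS -Xms128M -Xmx1024M com.vmware.um.umcomponent.Runner ..."
VCCOL_START_CMD="$JAVA_COMMAND $ARGS -Xms256M -Xmx4096M com.vmware.um.umcomponent.Runner ..."
EOF

cp common_utils_demo.sh common_utils_demo.sh.bak          # keep a backup, as above
# Raise the heap caps: DSS to 2048M, VC Collector to 8192M
sed -i -e '/DSS_START_CMD=/s/-Xmx[0-9]*M/-Xmx2048M/' \
       -e '/VCCOL_START_CMD=/s/-Xmx[0-9]*M/-Xmx8192M/' common_utils_demo.sh
grep -o 'Xmx[0-9]*M' common_utils_demo.sh                 # verify the new values
```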
We can have a look at the logging again to see whether the collection errors are still occurring. In our environment they stopped and everything started working again, including correct metering. Now, there are a couple of things about this solution that I find less than ideal and that I hope get fixed in a next release. The first is that there is no way to know about this problem unless you notice fewer metered points or dig through the logging each day (or have it automated). There should be an indication within the UsageMeter interface that you are hitting some limit or that the metering isn’t working correctly.
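Until the interface surfaces this, a small daily check is easy to automate. A sketch under the assumption that the collector log is reachable from where the script runs (the function name, file names, and the alerting hook are all mine, not anything shipped with UsageMeter):

```shell
# Daily sanity check: warn when the collector log contains OutOfMemoryError
# entries. check_um_log is a hypothetical helper; the log path is an assumption.
check_um_log() {
    log="$1"
    count=$(grep -c 'OutOfMemoryError' "$log" 2>/dev/null) || count=0
    if [ "${count:-0}" -gt 0 ]; then
        echo "WARNING: $count OutOfMemoryError entries in $log - metering may be incomplete"
        return 1
    fi
    echo "OK: no OutOfMemoryError entries in $log"
}

# Wire the non-zero return into whatever notification you already use.
check_um_log vccol_error.log || echo "time to investigate the appliance"
```

Scheduling this from cron on the appliance (or from a monitoring host that pulls the log) turns the non-zero return into a daily alert instead of a monthly surprise.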
The second thing is that this ‘tweak’ is not permanent: it will have to be redone after every UsageMeter upgrade. VMware GSS pointed out that, for now, they do not plan to include this in the appliance by default, since almost nobody is hitting it. Our environment is nowhere near the declared maximums for the application, yet we are still hitting something.
Lastly, I want to point out the reason that we were in fact still seeing some points within vCloud Usage Insight. It turns out that if UsageMeter meters at least the first day of the month, it can choose (if no other data comes in) to use the state of the VMs and environment for the entire interval when metering the used licenses. So if a VM is powered on on the first of the month, UsageMeter reports that the VM has been on for the entire month, which essentially means you don’t get the benefit of powered-down VMs. But if UsageMeter does not receive anything on the first day of the month, the interval will start on whatever day it receives the first update.
So, for example, if UsageMeter receives a VM for the first time on the 20th of a month, the last 10 days or so of that month form the entire interval. I think you can now understand why our environment did not have as many metered points as it normally would: our UsageMeter instance was broken during the first weeks of the month, so no intervals were registered for that period.
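My reading of that interval behavior can be sketched as a toy calculation (this models my understanding of the description above, not UsageMeter’s actual code):

```shell
# Toy model of the interval behavior described above (not UsageMeter code):
# the metered interval runs from the day of the first successful collection
# through the end of the month, counted inclusively.
metered_days() {
    first_seen=$1        # day of month of the first collection for the VM
    days_in_month=$2
    echo $(( days_in_month - first_seen + 1 ))
}

echo "first seen on day 1:  $(metered_days 1 30) of 30 days metered"
echo "first seen on day 20: $(metered_days 20 30) of 30 days metered"
```

With a broken collector for the first weeks of the month, every VM’s interval starts late, which lines up with the roughly 10% of the usual points we saw.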