Bugzilla – Bug 4788
VMWare Fusion Host: OSOL Guest: Time-of-day chip unresponsive; dead batteries?
Last modified: 2009-04-03 10:50:27 UTC
MacBook Pro (Mac OS X 10.5.5) running VMware Fusion 2.0 as host, with OpenSolaris 2008.05 or 2008.11 as guest. I have had this problem with both 2008.05 and 2008.11 (101a rc1b). OpenSolaris comes up thinking the date is December 26, 1986. The /var/adm/messages file contains the line:

WARNING: Time-of-day chip unresponsive; dead batteries?

I got this failure today even on a brand-new virtual machine. It is not clear whether this is a VMware or an OpenSolaris issue, though I haven't seen anyone else discussing it, for example on indiana-discuss. This makes it impossible for me to use OpenSolaris as my default work environment on my MacBook Pro.
I believe this is a VMware issue. I've seen it from time to time but it seems inconsistent. Can the Reporter set their clock back to the present time or does it keep going back to the epoch?
This happens for me on a full VMware ESX 3.5 server, and after googling, many other people seem to have seen this too. The general consensus is that Solaris 10U3 and 10U5 work fine, as does OpenSolaris 2008.05. However, builds from snv_91 onwards have been reported as showing this problem: http://forums.sun.com/thread.jspa?threadID=5310441 http://jp.opensolaris.org/jive/thread.jspa?threadID=70642&tstart=0
*** Bug 5358 has been marked as a duplicate of this bug. ***
*** Bug 5587 has been marked as a duplicate of this bug. ***
I should perhaps have mentioned that this is quite a problem if you're running OpenSolaris as a storage server. We use the CIFS service to serve files to our domain from an OpenSolaris server running within VMware ESX. Because of this clock problem, however, Solaris gets the wrong time every time it reboots, which breaks Kerberos authentication and makes the files inaccessible. We have to connect to the server manually, run ntpdate, and restart the services to get it running again, as described here: http://www.opensolaris.org/jive/thread.jspa?threadID=80139&tstart=0
*** Bug 5690 has been marked as a duplicate of this bug. ***
(In reply to comment #1)
> I believe this is a VMware issue. I've seen it from time to time but it seems
> inconsistent.
>
> Can the Reporter set their clock back to the present time or does it keep going
> back to the epoch?

No, I cannot persistently set the clock to the present time. I can set it manually, but each time I reboot it is back to Dec 27, 1969. Not exactly the epoch, but pretty close.

By the way, David, Bug 5358 does not look to me like a duplicate of this bug:
a) it refers to VirtualBox, not VMware;
b) the time is off by minutes, not by decades.
Could you clarify why you think the two bugs are related?

Is anyone looking into this issue?
In my case the time gets reset to the epoch on every reboot. UFS supplies a last-mounted time that can be used to initialize the TOD clock in case the TOD chip cannot be read. AFAIK ZFS does not provide this, so the clock gets reset.
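A hypothetical C sketch of the fallback described in the comment above: trust the TOD chip if it answered, otherwise use a file-system last-mounted time if one exists, otherwise start at the epoch. The names (`struct boot_time_sources`, `choose_boot_time`) are illustrative only, not Solaris kernel interfaces, and the ordering is an assumption drawn from this comment rather than from the actual `clkset()` code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical summary of what the kernel knows at boot (not a Solaris API). */
struct boot_time_sources {
    bool   tod_valid;          /* TOD chip answered with a trustworthy time */
    time_t tod_time;
    bool   fs_has_mount_time;  /* e.g. UFS records a last-mounted time */
    time_t fs_mount_time;
};

/* Pick the initial system time: TOD first, then the root file system's
 * last-mounted time, otherwise the epoch (time_t 0). */
time_t choose_boot_time(const struct boot_time_sources *s)
{
    assert(s != NULL);
    if (s->tod_valid)
        return s->tod_time;
    if (s->fs_has_mount_time)
        return s->fs_mount_time;
    return (time_t)0;  /* e.g. a root file system with no timestamp and no usable TOD */
}
```

When neither source is usable the sketch returns 0, the epoch; in time zones west of UTC that displays as December 31, 1969.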
Peter, I believe Rob is using VMware in 5358 - there's just an additional comment about VirtualBox in the bug report but the Summary and initial comment mention VMware. I'll work on getting someone assigned to look at this.
My suspicion is that, starting in build 88, zfs_mountroot() began calling clkset(-1), and that this reacts badly when Solaris runs as a guest under VMware.
I've tested under VMware Server on Ubuntu and see the problem on every boot. The problem is that Solaris believes it cannot read the RTC: reading CMOS address 0xd (register D) returns 0, which does not have RTC_VRT set, meaning the virtual RTC didn't respond with "I have valid RAM and time". I don't know why this would have changed recently. There have been changes to the TOD support, but none that seem likely to affect this basic functionality.
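The strict check described above can be sketched in C against a simulated CMOS image, so it runs without I/O-port access. `RTC_VRT` is the name used in the comment; its value 0x80 is bit 7 of register D on MC146818-compatible RTCs. The helper names and the image size are hypothetical, and this is not the Solaris source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CMOS_SIZE  128   /* size of the simulated CMOS image */
#define CMOS_REG_D 0x0D  /* RTC status register D */
#define RTC_VRT    0x80  /* register D bit 7: Valid RAM and Time */

/* Read a byte from a simulated CMOS image. On real x86 hardware the index
 * is written to I/O port 0x70 and the value read from port 0x71. */
static uint8_t cmos_read(const uint8_t cmos[CMOS_SIZE], size_t index)
{
    assert(index < CMOS_SIZE);
    return cmos[index];
}

/* Strict check: trust the RTC only if register D reports VRT. */
bool rtc_reports_valid(const uint8_t cmos[CMOS_SIZE])
{
    return (cmos_read(cmos, CMOS_REG_D) & RTC_VRT) != 0;
}
```

A register D value of 0, as observed under VMware Server, fails this check even though the rest of the clock may be fine.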
While running "bootadm update-archive -v", I see that /etc/rtc_config is missing. Is that relevant?
So, somewhere between VMWare Workstation 5.5.8 and VMWare Server 6.5, this behavior changed: the CMOS Register D, which has bit 7 which says "the RTC is valid and working", now reads as 0, which makes Solaris give up on the RTC. The code in Solaris that validates that register is old, and the behavior has clearly changed in VMWare. Presumably other OSes don't demand that bit be set. So, a workaround could be coded, but I'd like to try to ask VMWare if they know about this issue first.
I'm using VMware Server 2.0.0, not 6.5.
As an experiment, I patched the 'unix' binary so that the absence of bit 7 is ignored; the kernel then comes up without complaint, and the time is correct. So this is merely a missing bit in VMware's RTC emulation, apparently introduced fairly recently (later than Workstation 5.5.8, anyway).
(In reply to comment #13)
> So, somewhere between VMWare Workstation 5.5.8 and VMWare Server 6.5, this
> behavior changed: the CMOS Register D, which has bit 7 which says "the RTC
> is valid and working", now reads as 0, which makes Solaris give up on the RTC.
>
> The code in Solaris that validates that register is old, and the behavior
> has clearly changed in VMWare. Presumably other OSes don't demand that bit
> be set. So, a workaround could be coded, but I'd like to try to ask VMWare
> if they know about this issue first.

Yes, we already know about this issue, and a vmm patch is in the works. Stay tuned.
Is it known which versions of the various VMs are affected?
VMware is planning to fix this bug in the next Fusion release, v2.0.2. For ESX, it will be fixed in the next version of ESX and patched for ESX 3.5. For Workstation, the fix is planned for WS 6.5.1.
Andrei, thanks for the update - it's much appreciated. Is there any known workaround on the VMware-side that's available?
I am afraid that there is none.
1. I found a blog post saying that deleting the nvram file before starting the VM sometimes helps with OpenSolaris. I tried it once and it worked.
2. Sometimes, after a clean shutdown and restart of OpenSolaris, VMware Player 2.5.1 reports that the nvram is corrupted and that default values will be used instead. My feeling is that this happens when OpenSolaris stops monitoring the time of day because the clock was changed in a large step. (And yes, VMware Tools are not installed.)
Andrei, is there a "bugid" or VMware equivalent that's being used to track this?
(In reply to comment #22)
> Andrei, is there a "bugid" or VMware equivalent that's being used to track
> this?

ESX: 358798
Fusion: 357796
Workstation: 354249
(In reply to comment #13)
> So, somewhere between VMWare Workstation 5.5.8 and VMWare Server 6.5, this
> behavior changed: the CMOS Register D, which has bit 7 which says "the RTC
> is valid and working", now reads as 0, which makes Solaris give up on the RTC.
>
> The code in Solaris that validates that register is old, and the behavior
> has clearly changed in VMWare. Presumably other OSes don't demand that bit
> be set. So, a workaround could be coded, but I'd like to try to ask VMWare
> if they know about this issue first.

The VMware behavior hasn't changed; it's just that the bug is a bit more subtle than what you describe above. The VMware behavior has always been that register D bit 7 is initialized to 1, but if the guest writes to this register, all 8 bits are written, so the guest can change that 1 to a zero. We were unaware until recently that this behavior is wrong and bit 7 should be hardwired to read back as 1 no matter what. We've fixed that internally and are trying to get it backported to the various branches that we still periodically release patches for.

At some point between Solaris 10 and the current version, Solaris changed. Previously it did not write to register D, but now it writes a 0 there. That tickled the VMware bug.

The write to register D comes about because the low-order bits of the register are (at least on some systems, VMware's included) the optional day-of-month match field for the CMOS alarm clock interrupt. The ACPI FADT table gives the address of this field. At some point Solaris must have been educated about this optional field. Solaris 10 wrote the hour, minute, and second alarm fields, but not the day field.
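The difference between the old and corrected emulation, as Tim describes it, can be modeled in a few lines of C. This is a hypothetical sketch for illustration (`struct emulated_reg_d`, `reg_d_write`, and `reg_d_read` are invented names), not VMware's code: the old model stores and returns all 8 bits, while the corrected one forces bit 7 to read back as 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RTC_VRT 0x80  /* register D bit 7 */

/* Hypothetical model of an emulated CMOS register D. */
struct emulated_reg_d {
    uint8_t stored;        /* byte as last written (or the nvram default) */
    bool    hardwire_vrt;  /* true: corrected behavior, bit 7 always reads 1 */
};

/* Guest write: all 8 bits are stored, as described for the old emulation. */
void reg_d_write(struct emulated_reg_d *r, uint8_t value)
{
    assert(r != NULL);
    r->stored = value;
}

/* Guest read: the corrected emulation forces VRT on; the old one returns
 * whatever was last written, so a write of 0 clears VRT. */
uint8_t reg_d_read(const struct emulated_reg_d *r)
{
    assert(r != NULL);
    return r->hardwire_vrt ? (uint8_t)(r->stored | RTC_VRT) : r->stored;
}
```

In the corrected model the low-order bits (where some systems keep the day-of-month alarm) still round-trip; only bit 7 is pinned.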
p.s. To clarify one point: when I said CMOS register D bit 7 is "initialized" to 1, that initialization occurs when the VM's nvram file (which holds the persistent CMOS contents) is created. The nvram file is created and populated with default values the first time you power on a new VM. Thereafter, changes made to the CMOS by the guest are persisted by being written back to the file. However, you can power off a VM, delete its nvram file, and then power it on, and that will cause a fresh nvram to be created with default values again.
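The nvram lifecycle above, and why deleting the file works around the bug, can be sketched as a small C model. Everything here is hypothetical and simplified: the default image is all zeros except register D, and guest changes are persisted as a single copy at power-off, whereas the thread does not say exactly when VMware writes the file back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CMOS_SIZE  128
#define CMOS_REG_D 0x0D

/* Hypothetical model of a VM's nvram file holding persistent CMOS contents. */
struct vm_nvram {
    bool    exists;
    uint8_t cmos[CMOS_SIZE];
};

/* Simplified defaults: all zero except register D, whose bit 7 starts as 1. */
static void cmos_defaults(uint8_t cmos[CMOS_SIZE])
{
    memset(cmos, 0, CMOS_SIZE);
    cmos[CMOS_REG_D] = 0x80;
}

/* Power on: create nvram with defaults on first boot, then load it. */
void vm_power_on(struct vm_nvram *nv, uint8_t live[CMOS_SIZE])
{
    assert(nv != NULL && live != NULL);
    if (!nv->exists) {
        cmos_defaults(nv->cmos);
        nv->exists = true;
    }
    memcpy(live, nv->cmos, CMOS_SIZE);
}

/* Persist guest changes (modeled as a single copy at power-off). */
void vm_power_off(struct vm_nvram *nv, const uint8_t live[CMOS_SIZE])
{
    assert(nv != NULL && live != NULL);
    memcpy(nv->cmos, live, CMOS_SIZE);
}

/* The workaround: delete nvram so the next power-on recreates defaults. */
void vm_delete_nvram(struct vm_nvram *nv)
{
    assert(nv != NULL);
    nv->exists = false;
}
```

Once a guest clears bit 7, the cleared value survives every later power cycle in this model; only deleting the nvram restores the default of 0x80, and a guest that writes register D again will clear it again.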
Awesome explanation, Tim, thanks... And yes, Solaris relatively recently started writing the alarm fields (which, as you say, are addressed through CMOS offsets taken from the FADT rather than directly, which is why I missed those updates at first) for suspend/resume and time-based wakeup. The low-level worker routines always write all the fields, whether or not a wakeup timer is set. Doubtless that's what tickled the bug.
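A hypothetical C sketch of the "always write all the fields" pattern described above, showing how it clears register D when the FADT points the day-of-month alarm at index 0x0D. The function and struct names are invented for illustration, alarm value encoding (BCD vs binary) is ignored, and this is not the Solaris implementation. The seconds, minutes, and hours alarm indices (0x01, 0x03, 0x05) are the standard MC146818 ones; per the ACPI specification, a FADT `DAY_ALRM` of 0 means the day alarm is not supported.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CMOS_SIZE       128
#define CMOS_SEC_ALARM  0x01  /* MC146818 seconds alarm */
#define CMOS_MIN_ALARM  0x03  /* minutes alarm */
#define CMOS_HOUR_ALARM 0x05  /* hours alarm */

/* Hypothetical alarm values; encoding (BCD vs binary) is ignored here. */
struct alarm_fields {
    uint8_t sec, min, hour, day;
};

/* Write every alarm field, whether or not a wakeup is armed. fadt_day_alrm is
 * the CMOS index from the FADT's DAY_ALRM field (0 means "not supported"). */
void write_alarm_fields(uint8_t cmos[CMOS_SIZE], uint8_t fadt_day_alrm,
                        const struct alarm_fields *a)
{
    assert(cmos != NULL && a != NULL);
    cmos[CMOS_SEC_ALARM]  = a->sec;
    cmos[CMOS_MIN_ALARM]  = a->min;
    cmos[CMOS_HOUR_ALARM] = a->hour;
    if (fadt_day_alrm != 0) {
        assert(fadt_day_alrm < CMOS_SIZE);
        /* Full-byte store: if DAY_ALRM is 0x0D (register D), an emulation that
         * keeps all 8 bits loses VRT here. */
        cmos[fadt_day_alrm] = a->day;
    }
}
```

Passing 0 for `fadt_day_alrm` roughly mirrors the older behavior Tim describes (hour, minute, and second only); passing 0x0D reproduces the write that cleared VRT. On real MC146818-compatible hardware bit 7 of register D is read-only, so the same write is harmless there.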
(In reply to comment #23)
> ESX: 358798
> Fusion: 357796
> Workstation: 354249

Server 2.0: 359176
(In reply to comment #19)
> Andrei, thanks for the update - it's much appreciated.
>
> Is there any known workaround on the VMware-side that's available?

I posted Bug 5358, and it was duped to this one. The "WORKaround" for 6.0.5 is:
1. Right-click your Windows clock and get the address of your time server.
2. Open [System][Admin][T&D] on OS 2009.06 and set the time server.
3. Disable the time server, then do a one-time sync.
4. Enable the time server and close the pane.
5. Go back to WinXP and press CTRL-ALT-DEL to open Windows Task Manager.
6. Choose the [Processes] tab and click the "CPU" header to sort by CPU usage.
7. Locate vmware-vmx.exe and vmware.exe and set them to different processor affinities (if you have a multi-core CPU).
8. Your time should now be "better" (it loses a couple of minutes over several hours). That is unacceptable, but much better.

Rob
Hmm, looking at bug 5358, I don't claim to understand it, but it doesn't look anything like bug 4788. Certainly the workaround offered for 5358 isn't relevant for 4788.
Tim, the reason it was closed as a duplicate was because the system in question had booted up with VMware and clearly the time-of-day was incorrect at the start of the installation. It seems very likely this is due to this bug - at least, that's what I've seen with installs under Fusion. But perhaps I misunderstood what the Reporter was intending with bug 5358. Rob, could you clarify the issue with bug 5358? Do you see this issue when VMware is not being used?
I have encountered this clock problem with OpenSolaris 2008.11 running on VMware 3.5. Are there any updates regarding the fix? I understand a VMware patch will be available at some point.
I filed an RTI for the Solaris fix/workaround this morning and hope to put it back soon for inclusion in snv_110 (the next OpenSolaris release).
Great news, thanks for the update. Good to know that we can rely on Sun to get these things fixed, even when VMware are dragging their feet :-) Is snv_110 still due around early March? Am I right in thinking we should be able to download the ISOs around mid-March?
(In reply to comment #33)
> Good to know that we can rely on Sun to get these things fixed, even when
> VMware are dragging their feet :-)

This bug was fixed in the ESX 3.5 codebase, and I'm trying to find out for you whether the patch has been released yet. I will update this bug as soon as I have that information.

This bug was fixed in Fusion 2.0.2, which is available for download now.
I certainly don't mean to say that VMWare is responding slowly at all. It just struck me that fixing it from both ends still makes sense, and there are people who will benefit immediately from this fix that will have less trouble getting the new Solaris than getting the new VMware, plus it enables people who aren't aware of the problem to never become aware. :) As far as exactly when this will hit the streets: don't know the schedules, but I have in fact put it into snv_110. Mid-March probably isn't far off.
(In reply to comment #34)
> This bug was fixed in ESX3.5 codebase, and I'm trying to find out for you
> if the patch was released yet. Will update this bug as soon as I have that
> info. This bug was fixed in Fusion 2.0.2 which is available for download now.

This issue will be fixed in ESX 3.5 U4.
I don't suppose anybody has a rough ETA for when VMware is likely to release 3.5 U4?
(In reply to comment #35)
> I certainly don't mean to say that VMWare is responding slowly at all. It just
> struck me that fixing it from both ends still makes sense, and there are people
> who will benefit immediately from this fix that will have less trouble getting
> the new Solaris than getting the new VMware, plus it enables people who aren't
> aware of the problem to never become aware. :)
>
> As far as exactly when this will hit the streets: don't know the schedules, but
> I have in fact put it into snv_110. Mid-March probably isn't far off.

I was experiencing this bug on OpenSolaris running in ESX 3.5 Update 3. I installed the OpenSolaris snv_110 build and no longer see this bug, but I am now seeing issues with HAL not starting:

[ Mar 27 16:08:45 Executing start method ("/lib/svc/method/svc-hal start"). ]
hal failed to start: error 2

Perhaps it has something to do with bug 6792302:
http://bugs.opensolaris.org/view_bug.do?bug_id=6792302

If I ssh in and do a svc clean restart on hal, it starts up immediately, the graphical display comes up, and I can log in. Could my new issue be related to the fix for this bug?
*** Bug 7815 has been marked as a duplicate of this bug. ***
VMware has just released ESX/ESXi Update 4. Can we check that this issue is fixed there and that it works for OpenSolaris 2008.11 installed from the stock CD? A fix on the ESX/ESXi platform is critical for us!
(In reply to comment #38)
> I installed the opensolaris svn 110 build and no longer see this bug, but am now
> seeing issues with HAL not starting:
> [ Mar 27 16:08:45 Executing start method ("/lib/svc/method/svc-hal start"). ]
> hal failed to start: error 2
> Perhaps it has something to do with bug: 6792302
> http://bugs.opensolaris.org/view_bug.do?bug_id=6792302

Do you have NTP enabled for your OpenSolaris snv_110 guest? I think the root cause of 6792302 will be fixed in build 111; see bug 6811294, "APIs like nanosleep() wakeup prematurely when system time is changed".
(In reply to comment #40)
ESX/ESXi 3.5.0 Update 4 fixed the problem I reported in duplicate bug 7815, with no changes to the guest system.
It looks like the issue is fixed in the new refresh of all VMware products, which share the same updated virtualization core:
- ESX 3.5 and ESXi Update 4 (build 153875)
- VMware Workstation Version: 6.5.2 | 156735 - 03/31/09
- VMware Server 2 Version 2.0.1 | 156745 - 03/31/09
- VMware Fusion 2 for Mac Version 2.0.3 | 156731 - 04/02/09
I checked it on VMware Workstation 6.5.2 and see the correct time after reboot.
Closing this bug since this is now fixed in all VMware products.