-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

===========================================================================
             AUSCERT External Security Bulletin Redistribution

                        ESB-2008.0295 -- [Solaris]
     System Management Services (SMS) Patches 124319-01 or Later and
        120648-05 or Later may Cause Multiple Domains to Dstop
                               20 March 2008

===========================================================================

        AusCERT Security Bulletin Summary
        ---------------------------------

Product:              System Management Services
Publisher:            Sun Microsystems
Operating System:     Solaris
Impact:               Denial of Service
Access:               Existing Account

Original Bulletin:
  http://sunsolve.sun.com/search/printfriendly.do?assetkey=1-66-201249-1

- --------------------------BEGIN INCLUDED TEXT--------------------

Solution Type: Sun Alert

Solution 201249 : System Management Services (SMS) Patches 124319-01 or
Later and 120648-05 or Later may Cause Multiple Domains to Dstop

Previously Published As: 103091

Product
  System Management Services 1.5 Software
  System Management Services 1.6 Software

Bug ID: 6592200, 6602960

Date of Workaround Release: 27-SEP-2007

SA Document Body
CCKGHM1 Internal ID use only.

Impact

System Management Services (SMS) patches 124319-01 or later and 120648-05 or later may cause multiple domains to Dstop with timeout errors due to invalid casm allocations.

Contributing Factors

This issue can occur in the following releases:

  * System Management Services 1.5 with patch 120648-05 or later
  * System Management Services 1.6 with patch 124319-01 through 124319-03 and without patch 124319-04

The described issue will only be seen if either of the following conditions occur:

  1. Three or more domains are brought up simultaneously.
  2. Two domains are brought up simultaneously when at least one existing domain is already up.
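
The contributing factors above can be expressed as a small check. The following Python sketch is illustrative only and is not part of the Sun Alert: the function names and the input format (a list of patch strings such as "124319-02") are assumptions made for the example. It treats SMS 1.5 as affected at any revision of 120648 from -05 onward, because this alert lists a fix only for SMS 1.6.

```python
def highest_revision(installed_patches, base):
    """Return the highest installed revision of patch `base`, or 0 if absent."""
    revisions = []
    for patch in installed_patches:
        prefix, _, rev = patch.partition("-")
        if prefix == base and rev.isdigit():
            revisions.append(int(rev))
    return max(revisions, default=0)


def release_is_affected(sms_version, installed_patches):
    """Apply the 'Contributing Factors' release rules from the alert."""
    if sms_version == "1.5":
        # SMS 1.5: affected with 120648-05 or later.
        return highest_revision(installed_patches, "120648") >= 5
    if sms_version == "1.6":
        # SMS 1.6: affected with 124319-01 through -03, fixed by -04 or later.
        return 1 <= highest_revision(installed_patches, "124319") <= 3
    return False


def bring_up_can_trigger(domains_starting, domains_already_up):
    """Apply the two bring-up conditions under which the issue may be seen."""
    if domains_starting >= 3:
        return True
    return domains_starting == 2 and domains_already_up >= 1


if __name__ == "__main__":
    # Hypothetical inputs for illustration.
    print(release_is_affected("1.6", ["124319-02"]))
    print(bring_up_can_trigger(2, 1))
```

Both checks must hold for a platform to be exposed: the release must be affected and the domains must be brought up in one of the two patterns listed.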

To determine the version of SMS on a system, the following command can be run:

    # /opt/SUNWSMS/bin/smsversion -t
    1.6

To determine the version of SMS on a system from an explorer, the following command can be run:

    # cat <explo_dir>/sf15k/smsversion_-t.out
    1.6

Note: In most cases the error will correct itself with no impact to the platform, however in a small number of cases it may result in a domain stop for multiple domains.

Symptoms

If the described issue occurs, domains will Dstop shortly after a state transition from down to post is completed for a separate domain. The platform will record messages similar to the following:

    Aug 5 15:36:18 2007 sc1 dsmd-B(): [2527 519395944837524 NOTICE Domain.cc 1366] Domain B stop occurred, restarting domain.
    Aug 5 15:36:18 2007 sc1 dsmd-D(): [2516 519395283444257 ERR Domain.cc 505] Domain stop has been detected in domain D

Messages similar to the following may be found in the corresponding Dstop:

    Timeout on head of CI queue Coherent Pending Queue Safari timeout ( CPQ_TO ) Timeout on WATransID Command Pool Timeout

Workaround

To work around the described issue, do the following:

  1. Bring all active domains to the "ok" prompt.
  2. Add "no_casm_resort" to the ".postrc" for all domains (or to the platform ".postrc" if that is in use).
  3. Setkeyswitch the domains to "standby".
  4. Setkeyswitch the domains to "on".

The casm sorting routine is used to minimize the chance that a kernel cage will be located on a split expander. The "no_casm_resort" entry will cause the casm sorting routines to be skipped.

Example of a ".postrc" entry:

    no_casm_resort
    #Workaround for Sunalert xxxxx. Remove the above entry when the patch for this issue is installed.

Note 1: Just adding the "no_casm_resort" and rebooting a domain could cause a second domain to come down due to BugID 6602960.

Note 2: Applying this workaround on platforms using AXQ versions less than 6.3 and split expanders could expose the system to BugID 6324819: "Presence of the caged kernel on a split-EXB shows a higher propensity for Dstops during domain boot".

Use the cfgadm(1M) command to see where the caged kernel is located:

    <domain># cfgadm -av | grep SB | grep permanent
    SB17::memory connected configured ok base address 0x200000000, 8388608 KBytes total, 997016 KBytes permanent

If the caged kernel is located on a split expander, use dynamic reconfiguration to remove and re-add the system board to the domain.

Any FRU that is CHS-disabled due to the domain stop should be re-enabled using the "setchs" command:

    setchs -s ok -r "Service Request #" -c <component>

Resolution

This issue is addressed in the following release:

  * System Management Services 1.6 with patch 124319-04 or later

Modification History
QGFHFWN Internal ID use only.

Date: 26-NOV-2007
  * Updated Contributing Factors and Resolution section

Internal Pending Patches
  120648-07

Attachments
  This solution has no attachment

- --------------------------END INCLUDED TEXT--------------------

You have received this e-mail bulletin as a result of your organisation's registration with AusCERT. The mailing list you are subscribed to is maintained within your organisation, so if you do not wish to continue receiving these bulletins you should contact your local IT manager.
If you do not know who that is, please send an email to auscert@auscert.org.au and we will forward your request to the appropriate person.

NOTE: Third Party Rights
This security bulletin is provided as a service to AusCERT's members. As AusCERT did not write the document quoted above, AusCERT has had no control over its content. The decision to follow or act on information or advice contained in this security bulletin is the responsibility of each user or organisation, and should be considered in accordance with your organisation's site policies and procedures. AusCERT takes no responsibility for consequences which may arise from following or acting on information or advice contained in this security bulletin.

NOTE: This is only the original release of the security bulletin. It may not be updated when updates to the original are made. If downloading at a later date, it is recommended that the bulletin is retrieved directly from the author's website to ensure that the information is still current.

Contact information for the authors of the original document is included in the Security Bulletin above. If you have any questions or need further information, please contact them directly.

Previous advisories and external security bulletins can be retrieved from:

        http://www.auscert.org.au/render.html?cid=1980

If you believe that your computer system has been compromised or attacked in any way, we encourage you to let us know by completing the secure National IT Incident Reporting Form at:

        http://www.auscert.org.au/render.html?it=3192

===========================================================================
Australian Computer Emergency Response Team
The University of Queensland
Brisbane
Qld 4072

Internet Email: auscert@auscert.org.au
Facsimile:      (07) 3365 7031
Telephone:      (07) 3365 4417 (International: +61 7 3365 4417)
                AusCERT personnel answer during Queensland business hours
                which are GMT+10:00 (AEST).
                On call after hours for member emergencies only.
===========================================================================

-----BEGIN PGP SIGNATURE-----
Comment: http://www.auscert.org.au/render.html?it=1967

iQCVAwUBR+G4kyh9+71yA2DNAQJtQgP/UFSauWe0AHp9eFYCBBT/ym9Pu0ru0+D5
ddw8JnL9fTQB01MDQ6FZead20G1xsujoG/GyB4iulnqJYr9Z3U6dzcQ7VUBCMDqW
PYDjeFOHpy++8KYBTGJYd6fMIPYxjfqBpoy44HnFPV5A1WzWsfdrElTSuoKtArZz
gju1Qc4IOww=
=1keU
-----END PGP SIGNATURE-----