             AUSCERT External Security Bulletin Redistribution

                        ESB-2008.0295 -- [Solaris]
 System Management Services (SMS) Patches 124319-01 or Later and 120648-05
               or Later may Cause Multiple Domains to Dstop
                               20 March 2008


        AusCERT Security Bulletin Summary

Product:              System Management Services
Publisher:            Sun Microsystems
Operating System:     Solaris
Impact:               Denial of Service
Access:               Existing Account

Original Bulletin:    

--------------------------BEGIN INCLUDED TEXT--------------------

Solution Type: Sun Alert
   Solution 201249: System Management Services (SMS) Patches
   124319-01 or Later and 120648-05 or Later may Cause Multiple Domains
   to Dstop
   Previously Published As: 103091

   System Management Services 1.5 Software
   System Management Services 1.6 Software

   Bug ID: 6592200, 6602960

   Date of Workaround Release: 27-SEP-2007

   SA Document Body

   System Management Services (SMS) patches 124319-01 or later and
   120648-05 or later may cause multiple domains to Dstop with timeout
   errors due to invalid casm allocations.

   Contributing Factors

   This issue can occur in the following releases:
     * System Management Services 1.5 with patch 120648-05 or later
     * System Management Services 1.6 with patch 124319-01 through
       124319-03 and without patch 124319-04
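
   As a rough illustration, the release and patch conditions above can be
   expressed as a small shell check. The function name and its arguments
   are hypothetical; the SMS version and the installed patch ID must be
   supplied by the caller.

```shell
# Hypothetical helper: succeeds (exit status 0) when the given SMS
# version and patch ID fall in the affected range listed in this alert.
# Usage: sms_patch_affected <sms_version> <patch_id>
sms_patch_affected() (
    version=$1
    base=${2%-*}      # patch number, e.g. 124319
    rev=${2##*-}      # patch revision, e.g. 03
    rev=${rev#0}      # drop one leading zero so 05 compares as 5
    case "$version:$base" in
        1.5:120648) [ "$rev" -ge 5 ] ;;                      # -05 or later
        1.6:124319) [ "$rev" -ge 1 ] && [ "$rev" -le 3 ] ;;  # -01 through -03
        *) false ;;
    esac
)
```

   For example, "sms_patch_affected 1.6 124319-03" succeeds, while the
   same call with 124319-04 fails.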

   The described issue is seen only if either of the following
   conditions occurs:
    1. Three or more domains are brought up simultaneously.
    2. Two domains are brought up simultaneously when at least one
       existing domain is already up.
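
   The two triggering conditions can be sketched as a pre-check before a
   planned bring-up. This is an illustration only; the function name and
   its two numeric arguments are hypothetical.

```shell
# Hypothetical helper: succeeds when a planned bring-up matches either
# triggering condition described above.
# Usage: bringup_at_risk <domains_starting_together> <domains_already_up>
bringup_at_risk() (
    starting=$1
    already_up=$2
    # Condition 1: three or more domains brought up simultaneously.
    [ "$starting" -ge 3 ] && exit 0
    # Condition 2: two brought up together while at least one is already up.
    [ "$starting" -eq 2 ] && [ "$already_up" -ge 1 ] && exit 0
    exit 1
)
```

   Bringing up a single domain at a time, or two domains on an otherwise
   idle platform, falls outside both conditions.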

   To determine the version of SMS on a system, the following command can
   be run:
    # /opt/SUNWSMS/bin/smsversion -t

   To determine the version of SMS on a system from an explorer, the
   following command can be run:
    # cat <explo_dir>/sf15k/smsversion_-t.out

   Note: In most cases the error will correct itself with no impact to
   the platform; however, in a small number of cases it may result in a
   domain stop for multiple domains.

   If the described issue occurs, domains will Dstop shortly after a
   state transition from down to post is completed for a separate domain.

   The platform will record messages similar to the following:
    Aug  5 15:36:18 2007 sc1 dsmd-B(): [2527 519395944837524 NOTICE 
   Domain.cc 1366] Domain B stop occurred, restarting domain.
    Aug  5 15:36:18 2007 sc1 dsmd-D(): [2516 519395283444257 ERR 
   Domain.cc 505] Domain stop has been detected in domain D
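
   A minimal sketch for listing which domains logged a Dstop, based only
   on the two message forms shown above. It assumes each message occupies
   a single line in the log (they are wrapped here for mail) and reads
   the log text from standard input; the function name is hypothetical.

```shell
# Hypothetical helper: print the letter of every domain that has a
# "stop occurred" or "stop has been detected" message on standard input.
dstop_domains() {
    sed -n \
        -e 's/.*Domain \([A-Z]\) stop occurred.*/\1/p' \
        -e 's/.*Domain stop has been detected in domain \([A-Z]\).*/\1/p' |
        sort -u
}
```

   More than one letter in the output for the same time window would be
   consistent with the multiple-domain Dstop described in this alert.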

   Messages similar to the following may be found in the corresponding
    Timeout on head of CI queue
    Coherent Pending Queue Safari timeout ( CPQ_TO )
    Timeout on WATransID
    Command Pool Timeout
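
   The four timeout signatures above can be searched for with a short
   filter. This is a sketch only: it reads whatever text is supplied on
   standard input, since the file to search is not named above, and the
   function name is hypothetical.

```shell
# Hypothetical helper: count input lines that contain any of the four
# timeout signatures listed in this alert.
count_timeout_signatures() (
    count=0
    while IFS= read -r line; do
        case $line in
            *"Timeout on head of CI queue"* | \
            *"Coherent Pending Queue Safari timeout"* | \
            *"Timeout on WATransID"* | \
            *"Command Pool Timeout"*)
                count=$((count + 1)) ;;
        esac
    done
    echo "$count"
)
```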


   To work around the described issue, do the following:
    1. Bring all active domains to the "ok" prompt.
    2. Add "no_casm_resort" to the ".postrc" for all domains (or to the
       platform ".postrc" if that is in use).
    3. Setkeyswitch the domains to "standby".
    4. Setkeyswitch the domains to "on".
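
   Steps 3 and 4 can be laid out as a reviewable command list before
   anything is run. The sketch below only prints commands; it does not
   execute them. The "setkeyswitch -d <domain> <position>" form is an
   assumption here and should be checked against setkeyswitch(1M) for
   the installed SMS release. The function name is hypothetical.

```shell
# Hypothetical helper: print (not run) the keyswitch commands for the
# given domains, all to "standby" first and then all to "on".
# Usage: print_keyswitch_plan A B ...
print_keyswitch_plan() {
    for d in "$@"; do
        echo "setkeyswitch -d $d standby"
    done
    for d in "$@"; do
        echo "setkeyswitch -d $d on"
    done
}
```

   Steps 1 and 2 still have to be completed for every domain before any
   of the printed commands are used.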

   The casm sorting routine is used to minimize the chance that a kernel
   cage will be located on a split expander. The "no_casm_resort" entry
   will cause the casm sorting routines to be skipped.

   Example of a ".postrc" entry:
    no_casm_resort #Workaround for Sunalert xxxxx.

   Remove the above entry when the patch for this issue is installed.
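
   Adding and later removing the entry can be sketched as two small
   functions. The path to the ".postrc" file is passed in by the caller
   and is not assumed here; both function names are hypothetical, and
   the entry text is copied from the example above.

```shell
# Hypothetical helper: append the workaround entry unless a line
# starting with no_casm_resort is already present.
add_no_casm_resort() (
    postrc=$1
    grep '^no_casm_resort' "$postrc" >/dev/null 2>&1 ||
        echo 'no_casm_resort #Workaround for Sunalert xxxxx.' >> "$postrc"
)

# Hypothetical helper: delete the workaround entry once the fixing
# patch is installed, leaving all other lines untouched.
remove_no_casm_resort() (
    postrc=$1
    tmp=$postrc.tmp.$$
    sed '/^no_casm_resort/d' "$postrc" > "$tmp" &&
        cat "$tmp" > "$postrc" &&
        rm -f "$tmp"
)
```

   Calling add_no_casm_resort twice on the same file leaves a single
   entry, so the step can be repeated safely across domains.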

   Note 1: Merely adding the "no_casm_resort" entry and rebooting a
   domain could cause a second domain to come down due to BugID 6602960.

   Note 2: Applying this workaround on platforms using AXQ versions
   earlier than 6.3 and split expanders could expose the system to BugID
   6324819: "Presence of the caged kernel on a split-EXB shows a higher
   propensity for Dstops during domain boot".

   Use the cfgadm(1M) command to see where the caged kernel is located:
    <domain># cfgadm -av | grep SB | grep permanent
    SB17::memory  connected  configured  ok base address 0x200000000, 
   8388608 KBytes total, 997016 KBytes permanent
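
   The board name can be pulled out of that output with a short filter.
   This sketch assumes the information for each attachment point appears
   on one line, as in the example above (wrapped here for mail), and
   reads "cfgadm -av" output from standard input. The function name is
   hypothetical.

```shell
# Hypothetical helper: print the system board (SBnn) of every memory
# attachment point that reports permanent (caged kernel) memory.
caged_kernel_boards() {
    sed -n 's/^ *\(SB[0-9][0-9]*\)::memory.* KBytes permanent.*/\1/p'
}
```

   For the example line above, the filter prints SB17. Whether that
   board sits on a split expander still has to be determined separately.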

   If the caged kernel is located on a split expander then utilize
   dynamic reconfiguration to remove and re-add the system board to the

   Any FRU that is CHS-disabled due to the domain stop should be
   re-enabled using the "setchs" command:
    setchs -s ok -r "Service Request #" -c <component>


   This issue is addressed in the following release:
     * System Management Services 1.6 with patch 124319-04 or later

   Modification History

   Date: 26-NOV-2007
     * Updated Contributing Factors and Resolution section

--------------------------END INCLUDED TEXT--------------------

You have received this e-mail bulletin as a result of your organisation's
registration with AusCERT. The mailing list you are subscribed to is
maintained within your organisation, so if you do not wish to continue
receiving these bulletins you should contact your local IT manager. If
you do not know who that is, please send an email to auscert@auscert.org.au
and we will forward your request to the appropriate person.

NOTE: Third Party Rights
This security bulletin is provided as a service to AusCERT's members.  As
AusCERT did not write the document quoted above, AusCERT has had no control
over its content. The decision to follow or act on information or advice
contained in this security bulletin is the responsibility of each user or
organisation, and should be considered in accordance with your organisation's
site policies and procedures. AusCERT takes no responsibility for consequences
which may arise from following or acting on information or advice contained in
this security bulletin.

NOTE: This is only the original release of the security bulletin.  It may
not be updated when updates to the original are made.  If downloading at
a later date, it is recommended that the bulletin is retrieved directly
from the author's website to ensure that the information is still current.

Contact information for the authors of the original document is included
in the Security Bulletin above.  If you have any questions or need further
information, please contact them directly.

Previous advisories and external security bulletins can be retrieved from:


If you believe that your computer system has been compromised or attacked in 
any way, we encourage you to let us know by completing the secure National IT 
Incident Reporting Form at:


Australian Computer Emergency Response Team
The University of Queensland
Qld 4072

Internet Email: auscert@auscert.org.au
Facsimile:      (07) 3365 7031
Telephone:      (07) 3365 4417 (International: +61 7 3365 4417)
                AusCERT personnel answer during Queensland business hours
                which are GMT+10:00 (AEST).
                On call after hours for member emergencies only.
