A recent comment highlighted that I was missing a step during Step 8: UPGRADE THE FABRIC INTERCONNECTS AND I/O MODULES. The missing step was to manually fail the primary role over to the recently upgraded Fabric Interconnect so that your environment remains accessible while the second FI is being upgraded. I've updated that step to include the manual failover commands on the FI. It is a step I always follow when performing upgrades, but I can't think of why I didn't originally put it into the post.
Recently I was tasked with performing an upgrade to our UCS environment to support some new B200 M4 blades; the current firmware version only supported the B200 M3 blades. As part of the process I performed the steps below to complete the upgrade, splitting the work into a planning phase and an implementation/upgrade phase. Upgrading the firmware of any system can lead to potential outages. Things have definitely improved on this front over the past decade, but it's imperative that you plan correctly so that the implementation goes without any hitches. With UCS firmware there is a specific order to follow during the upgrade process:
- Upgrade UCS Manager
- Upgrade Fabric Interconnect B (subordinate) & Fabric B I/O modules
- Upgrade Fabric Interconnect A (primary) & Fabric A I/O modules
- Upgrade blade firmware by placing each ESXi host into Maintenance Mode in turn
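Before starting, it's also worth confirming the current state of the cluster and firmware from the CLI. As a rough sketch (prompts and output will vary by environment), SSH to the cluster IP address and run:

UCS-A# connect local-mgmt
UCS-A(local-mgmt)# show cluster state
UCS-A(local-mgmt)# show version

The cluster state output should report HA READY, with one FI as PRIMARY and the other as SUBORDINATE, before you begin any of the steps below.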
During the upgrade process, and particularly during the Fabric Interconnect and I/O module upgrades, you will see a swarm of alerts coming from UCSM. This is expected, as some of the links will be unavailable as part of the upgrade process, so wait until all the steps have been completed before raising an issue with support. As a caveat, however, if there is anything that really stands out as not being right, open a call with support and get it fixed before proceeding to the next step. During my upgrade process some of the blades showed critical alerts that did not clear, and the blades needed to be decommissioned and re-acknowledged to resolve it. This problem stood out and it wasn't hard to recognise that it was more serious than the other alerts.
You will need to check the release requirements from Cisco regarding the upgrade path from your current firmware version to the desired version and also the capabilities that the desired firmware version contains. The upgrade process takes some time so it’s best to review everything in advance and not have to do this on the day of the upgrade itself. For instance during this upgrade process I needed to ensure capability to support B200 M4 blades and VIC 1340 modules as they had been purchased as part of an order for a new project. The steps to carry out in the planning phase are:
Step 1: Verify the Upgrade requirements
Check the release notes on Cisco's website for the version you want to upgrade to. For this example we are upgrading to version 2.2.
Check the required minimum software version, as this is the version you should update to. It depends on what new infrastructure you are adding, such as a new blade type or VIC module.
Also check the Capability Catalog support to ensure that the CPU and RAM are supported. In this instance we are using the UCS-CPU-E52670D CPU.
Check the VIC type support also; we are using the latest VIC 1340, which is supported in release 2.2(3a). You may notice that that version also supports the UCS B200 M4 with the CPU listed above, so we could go to 2.2(3a), but it's best to move to the minimum suggested software release from Cisco.
Review the resolved caveats to see if any issues from a previous release (such as 2.2(3a)) have been resolved in the most recent release, in this instance 2.2(3d).
Step 2: Verify Interoperability
Check the interoperability matrix site to see if there are any specifics relating to your version of ESXi.
Note 19: Cluster-on-Die feature is not supported. For VMware, please see http://kb.vmware.com/kb/2087032
Note 37: B200 M4 and C220/C240 M4(SFF/LFF) requires ESX 5.1 U2 patch P05 Cisco Custom ISO
Read the VMware guide to see if there are any extra items required to get the system working:
Step 3: Review Faults and Open Case with Cisco TAC
Launch a browser and log into UCSM with your Administrative account. Once you login you will be presented with the UCS Manager screen. By default it will open on the Equipment tab from where you can expand each component to get a more in-depth view of the infrastructure
Review the faults on each UCS Chassis or just click on one of the links in fault summary in the top left. This will open up the list of faults for the fabric.
To view the full alerts go to Admin -> Faults, Events and Alerts. It is important to clear critical and major alerts before starting the UCS upgrade. In this instance the critical alert is due to the grace license period having expired on UCS Central. For critical issues it is best to open a support call with TAC, advising that a firmware upgrade is going to be performed, to ensure that no recommended steps are missing. In this instance the error for UCS Central could safely be ignored.
Open a support call with TAC and ask them to quickly review the upgrade procedure you are going to follow; it's good to confirm that there are no other issues to be faced. It also gives you a support case to reference when you perform the upgrade, in case there are problems and you need to escalate quickly.
The major alert is due to the Keyring certificate being invalid. I showed how to resolve this issue in a separate blog post.
Another alert that appeared is for FCoE membership down, which has been shown to be a bug, but it's worth checking with TAC whether this is the case. More information on the bug can be found here:
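If you prefer the CLI, the current fault list can also be pulled via SSH; as a sketch:

UCS-A# show fault

Any critical or major entries in the output should be investigated (with TAC if necessary) before proceeding to the next step.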
STEP 4: DOWNLOAD FIRMWARE
Download the Cisco UCS Firmware from cisco.com. You will need a valid Cisco Support login to download this firmware.
1. Open a Web browser to software.cisco.com. Select Upgrade and Update.
2. In the Products window select Servers –Unified Computing, then select Cisco UCS B-Series Blade Server Software
3. In the Download Software screen select Unified Computing System (UCS) Server Software Bundle
4. A new window will appear displaying the versions of software for the Cisco UCS B-Series Blade Server Software. Select the highest version of suggested firmware, which in this example is 2.2(3d).
5. There are three binaries to download: an Infrastructure Software Bundle (ucs-k9-bundle-infra.2.2.3d.A.bin), software for the UCS B-Series blade server products (ucs-k9-bundle-b-series.2.2.3d.B.bin) and software for the UCS C-Series server products (ucs-k9-bundle-c-series.2.2.3d.C.bin). Click Download for each binary file.
6. Accept the End User License Agreement
STEP 5: UPLOAD FIRMWARE BINARIES to UCS MANAGER
1. Select the Equipment tab -> select the Equipment tree -> then select the Firmware Management tab.
2. Select the Download Tasks tab and press the Download Firmware button.
3. Select location of the file as Local File System.
4. Press the Browse button and locate the three binary files downloaded earlier.
5. Select the first binary (A.bin) and press the OK button.
6. Click OK to view the Transfer State.
7. Click OK on the Download Task dialog to continue.
8. Under Download Tasks you can now see that the new file has been uploaded. Somewhat confusingly, its Transfer State is labelled as Downloaded.
9. Repeat the above steps to import the B and C .bin files.
10. Next go to Firmware Management -> select the Packages tab. The three binaries should now be available
11. Select the Installed Firmware tab. Expand all the items to view the Running, Startup and Backup Firmware versions
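The uploaded bundles can also be confirmed from the CLI; as a sketch:

UCS-A# scope firmware
UCS-A /firmware # show package

All three bundle files should appear in the output before you move on to the backup.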
STEP 6: TAKE BACKUP
1. Take a backup of the UCS configuration before continuing. Select the Admin tab -> select the General tab -> then select Backup Configuration.
2. Select Download to Local System, select type as Full State and ensure the file extension on the local file system is set to .xml.
3. Select Browse, go to C:\Temp on your local drive and enter the file name. Click OK.
4. Click OK to complete the backup.
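The same Full State backup can be taken from the CLI if preferred. As a rough sketch, assuming an SCP server reachable at backup-server (the URL, path and credentials here are placeholders to adjust for your environment):

UCS-A# scope system
UCS-A /system # create backup scp://user@backup-server/backups/ucs-full-state.tgz full-state enabled
UCS-A /system # commit-buffer

The backup only runs once the commit-buffer command is issued.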
STEP 7: UPGRADE UCS MANAGER
1. Select the Equipment tab -> select the Equipment tree -> select Firmware Management tab -> then select the Installed Firmware tab.
2. Select UCS Manager at the top and click the Activate Firmware button.
3. In the Activate Firmware dialog select the new version of firmware from the Version to Be Activated drop-down menu and click ok (Note: The screenshot below shows 2.2(1d) but the version selected to activate was 2.2(3d))
4. UCS Manager will now upgrade. You will lose access to UCSM during the upgrade. Sit tight for a few minutes and it will be available again.
5. Log in again to UCSM and accept any Java warnings.
6. Select the Equipment tab -> select the Equipment tree -> select Firmware Management tab -> then select the Installed Firmware tab and verify that the Running Version has changed (2.2(3d))
Note: You may see some new alerts and warnings after you upgrade UCSM firmware versions. I’d recommend investigating them to ensure there are no obvious issues before continuing with the FI upgrades. Some issues I encountered involved CIMC errors on the blade and also critical alerts for ethlanflowmon. You can find out more on both issues here:
Cisco UCS – CIMC did not detect storage controller error
Cisco UCS – FSM:FAILED: Ethernet traffic flow monitoring configuration error
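You can also double-check the running version from the CLI after the UCSM upgrade; as a sketch:

UCS-A# connect local-mgmt
UCS-A(local-mgmt)# show version

The version reported should now match the activated release, in this example 2.2(3d).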
STEP 8: UPGRADE THE FABRIC INTERCONNECTS AND I/O MODULES
1. Select the Equipment tab -> select the Equipment tree -> select Fabric Interconnects
2. Select Fabric Interconnect A (primary) -> select High Availability Details on General Tab to expand
3. Verify that HA settings are as follows: Ready: Yes, State: Up, Leadership: Primary, Cluster Link State: Full.
4. Select Fabric Interconnect B (subordinate) -> click Activate Firmware in Actions.
5. From the drop down menus select the desired Kernel and System Versions and then click OK.
6. Fabric Interconnect B will now upgrade, including its I/O modules. This process will take approximately 20 to 25 minutes.
7. Use the FSM tab to monitor the progress of the firmware upgrade. Once the progress reaches 100% you can continue with Fabric Interconnect A.
8. Connect via SSH directly to the Fabric Interconnect IP address. Once connected, run the below commands to make the newly upgraded Fabric Interconnect B the cluster lead. This keeps your environment accessible while Fabric Interconnect A is being upgraded. The changeover will be pretty much instantaneous.
UCS-A# connect local-mgmt
UCS-A(local-mgmt)# cluster lead b
UCS-A(local-mgmt)#
9. Now that Fabric Interconnect B has taken over the primary role, back in the UI select Fabric Interconnect A (primary) -> click Activate Firmware in Actions.
10. From the drop down menus select the desired Kernel and System Versions and then click OK.
11. Fabric Interconnect A will now upgrade including the I/O modules. This process will take approximately 20 to 25 minutes. You can monitor the progress of the firmware upgrade from the FSM tab. You may lose connectivity to UCSM during this upgrade process as it’s the primary Fabric Interconnect that you are connected to and it will need to reboot. If this occurs just log into UCSM again and it will connect to the remaining Fabric Interconnect.
12. Select the Equipment tab -> select the Equipment tree -> select Firmware Management tab -> then select the Installed Firmware tab.
13. Check that the Fabric Interconnects and I/O Modules have a Running Version of 2.2(3d).
14. Now that both Fabric Interconnects have been upgraded you can return the primary role to Fabric Interconnect A. Please note that this is an optional step. Connect via SSH directly to the Fabric Interconnect B IP address and run the below commands to make Fabric Interconnect A the cluster lead again. The changeover will be pretty much instantaneous.
UCS-B# connect local-mgmt
UCS-B(local-mgmt)# cluster lead a
UCS-B(local-mgmt)#
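After the failback you can confirm that Fabric Interconnect A holds the primary role again; as a sketch:

UCS-A# connect local-mgmt
UCS-A(local-mgmt)# show cluster state

The output should show Fabric Interconnect A as PRIMARY, Fabric Interconnect B as SUBORDINATE, and the cluster as HA READY before you continue to the blade upgrades.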
STEP 9: UPGRADE BLADE SERVERS
1. Select the Servers tab -> select the Policies tree -> expand Host Firmware Packages.
2. Right-click Host Firmware Packages and create a new package. Enter the Name as per your company's naming convention, e.g. FWP2.2-3d. Enter a description and select Simple for "How would you like to configure the Host Firmware Package?"
3. From the drop down menus select the desired firmware version for Blade and Rack packages and then click ok.
4. Select Maintenance Policies and verify that the Reboot Policy is set to User Ack. Otherwise the ESXi hosts will reboot automatically as soon as the firmware upgrade process begins.
5. Now you are ready to start upgrading the Blade firmware of your ESXi hosts. Go into vCenter and select the ESXi host that you want to upgrade. Place the ESXi host into Maintenance Mode
6. For Service Profiles that are bound to a Service Profile Template the change can be made across the board quite easily. Select the Servers tab -> select Service Profile Templates -> select root -> select Sub-organizations -> then select the Service Profile Template of the associated ESXi hosts that you want to update.
7. Select the Policies tab and then expand Firmware Policies. From the drop down menu select the newly created Host Firmware policy and click Save Changes.
8. Select the General tab and press Reboot Now in the Pending Activities section. You can select multiple hosts to reboot.
9. As part of the blade upgrade the BIOS, CIMC Controller and Board Controller will be upgraded. This can take between 35 and 60 minutes per blade.
10. Use the FSM tab to monitor the progress of the firmware upgrade.
11. Once the FSM progress shows 100% you can take the upgraded ESXi host out of Maintenance Mode.
12. Verify that all firmware has been upgraded. Select the Equipment tab -> select the Equipment tree -> select Firmware Management tab -> select the Installed Firmware tab. Ensure that the same version appears for all components.
Thank you very much for taking the time to screenshot this and post it!
Hi John, you’re very welcome. Thanks for the feedback
Found one important missing item in step 8 between sub steps 7 and 8. You need to login to the CLI and change the cluster lead so primary and subordinate change roles. I missed this step and it caused my cluster to go unresponsive during the upgrade and took all my servers down with it. FYI so that doesn’t happen to anyone else 🙂
Hi LS, thanks for the feedback. Sorry to hear about the issues you faced. I’ll update the document in the next 24 hours to reflect the CLI change. I’m not sure why it is missing and it’s a really important step. I really appreciate the feedback and I don’t want anyone else to have this issue in future.
Why does the cluster go down just because the active Fabric Interconnect was activated/rebooted? When the NICs are configured for failover, or the servers have two NICs from fabric A and B, the traffic should not be disrupted, should it?
Does Auto Install take care of this step?
Thanks in advance,
It can depend on which load balancing algorithm is in place and how the MAC addresses are pinned to the Fabric Interconnect. Even with the NICs configured for two 'fabrics' there's the potential for both NICs on a VM to get pinned to the same interconnect. This is due to the network virtualization on the I/O module. Without knowing the exact configuration of LS's environment I can't tell if that was the case, which might explain why VMs went down. It doesn't happen often but it is something I've seen once before. With the blades rebooting, however, I'd re-check the cabling of the environment. I can understand losing management connectivity during the upgrade, but the blades should stay active unless they were set to reboot at the same time the FIs were being upgraded.
It's still best practice to migrate the primary role between the FIs during the upgrade process and I should have had this step listed.
I’m not sure if Autoinstall takes care of this step. I haven’t followed that procedure but it’s something I’ll look into.
Thanks for your feedback. I just wanted to follow up and give the response from Cisco TAC about the Auto Install feature and cluster lead failover. Quote: "Regarding the auto install needing for SSH there is not documentation. This will be required if the cluster lead does not changes during the Fabric Interconnect reboots . I just confirmed the Fabric Interconnect should change the cluster lead automatically. That's why the ssh information on the auto-install does not appear in the upgrade steps document"
Honestly, this answer was not expected, because why is this step critical when upgrading the manual way and not with Auto Install? Also the question: what will happen if the primary FI just fails due to a power outage or hardware failure, will the infrastructure go down as well?
Nevertheless, we plan to perform the upgrade the manual way and not to use auto-install.
Thanks and regards
STEP 7: UPGRADE UCS MANAGER:
I think there is a mistake. In the "Activate Firmware" dialog you must select 2.2(3d) and not 2.2(1d), right?
Yes that’s correct. The screenshot specifies 2.2(1d). I just hadn’t grabbed the correct screenshot at the time. I’ve updated the section to specify that it should be 2.2(3d) despite the screenshot. Thanks for letting me know.