Windows Server Patching Best Practices

As we all know, applying security and cumulative updates in a Windows environment is very important to protect it from external attacks. Updates also fix bugs identified in previous versions and improve stability and performance.

I am writing this blog because I have seen many administrators struggle to carry out patching in their own organization or in customer environments. Most of the struggle is not caused by technical challenges but by operational ones, for example server downtime, scheduling and change management.

I am taking this opportunity to share the best practices I have followed while performing Windows patching. They may help you organize and structure the patching activity in your organization.

Audience: Windows Server administrators and anyone who handles server patching with any of the patching tools available in the market.

Note: This article covers the Windows Server environment and applies to operating system patching, for example Windows Server 2008, 2012 and 2016. It does not cover Microsoft or third-party applications installed on the servers, such as Microsoft Exchange, SharePoint or SQL Server. For application patching, separate testing needs to be carried out by an application specialist before deployment to the production environment.

Downtime:

Downtime is the key factor in the entire patching activity. Many of us struggle to get downtime from the business, and every organization handles it differently, based on business approval. I have captured a few methods that can be used.
By following one of these methods, you can reduce the effort it takes to get downtime from the business.
  1. Follow the standard maintenance window. Many organizations already have a standard maintenance window for their environments, and we can use the same window for patching. Example: the standard maintenance window is on weekdays after business hours (10 PM to 3 AM) for the DEV and UAT environments, and at the weekend for the Production and DR environments. Servers can be restarted within the standard maintenance window.
  2. Or use a standard four-hour patching window that is approved only for patching activity. This window should be agreed upon by all business units. For example: second Friday (DEV), second Saturday (UAT), third Saturday (Production) and fourth Saturday (DR).
  3. Or, as another example, use weekly windows starting after the second Tuesday: (Week 1) DEV, (Week 2) UAT, (Week 3) Production, (Week 4) DR (the DR week can fall in the first week of the following month).
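Because every window in method (3) is counted from Patch Tuesday, it can help to compute the dates rather than look them up each month. The sketch below assumes each patch week's window falls on a Saturday, starting with the first Saturday after the second Tuesday; the function names `patch_tuesday` and `weekly_windows` are illustrative and not part of any patching tool.

```python
import calendar
from datetime import date, timedelta


def patch_tuesday(year: int, month: int) -> date:
    """Second Tuesday of the month, Microsoft's regular release day."""
    first = date(year, month, 1)
    # Days from the 1st to the first Tuesday (calendar.TUESDAY == 1).
    offset = (calendar.TUESDAY - first.weekday()) % 7
    return first + timedelta(days=offset + 7)


def weekly_windows(year: int, month: int) -> dict[str, date]:
    """Saturday of each environment's patch week under method (3).

    Assumption: week 1 is the first Saturday after Patch Tuesday and
    each later environment moves one week further out.
    """
    pt = patch_tuesday(year, month)
    first_saturday = pt + timedelta(days=(calendar.SATURDAY - pt.weekday()) % 7)
    envs = ["DEV", "UAT", "Production", "DR"]
    return {env: first_saturday + timedelta(weeks=i) for i, env in enumerate(envs)}


if __name__ == "__main__":
    for env, day in weekly_windows(2024, 3).items():
        print(env, day.isoformat())
```

For March 2024 (Patch Tuesday on 12 March) this gives DEV on 16 March, UAT on 23 March, Production on 30 March and DR on 6 April, so the DR week does spill into the following month as described above.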
Scheduling:

As mentioned above, getting downtime is always a challenging task for administrators. There are a few points we need to consider while preparing a schedule that lines up with the agreed downtime.

The scenario below is based on downtime method (3) described in the “Downtime” section above.
  1. It’s recommended to perform Windows patching monthly, not quarterly.
  2. List the servers that are in scope for patching. If your organization has segregated environments such as DEV/UAT/Production/DR, prepare the schedule starting with DEV, then UAT, Production and DR. Using this schedule, you can patch all servers within a four-week span. A sample schedule should look like the one below (Image1). If you don’t have a tool to prepare schedules, you can prepare one in an Excel sheet and share it with all stakeholders (server and application owners) via email notification. This notification is important because it reminds them about the upcoming patching activity, so they can do any pre-work needed on the application side. It also helps if they want to exclude a server from patching because of a scheduled application release. (It’s not recommended to exclude servers from patching unless there is a valid business need.) Even if you have excluded a server, make sure you get the next agreed downtime window from the application team to cover the patching activity.
  3. Microsoft releases patches on the second Tuesday of every month. After that, you can identify the applicable patches and get approval from the Security Team/CISO (the approval process may vary by organization). After approval, perform initial testing on every Windows OS version in your environment (Windows Server 2008, Windows Server 2012, Windows Server 2016). This confirms that the OS boots and comes up without issues, MMC snap-ins work as expected, no errors are reported in the Windows event logs, all automatic Windows services are running, server utilization/performance is normal, etc.
  4. As shown in the schedule (Image1), you can start the first week of patching on the first Saturday after the second Tuesday; that week covers DEV servers, the second week UAT, the third Production and the fourth DR servers. Each application team needs to carry out application-level testing after DEV/UAT patching is complete, before patching proceeds to the Production and DR environments. This helps avoid production impact.
  5. It’s also the administrator’s responsibility to notify all stakeholders (server and application owners) via email once patching is complete, so that they can carry out further application-level testing to make sure the hosted applications are working as expected.
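If you are preparing the schedule in an Excel sheet, a small script can generate it from your server inventory as CSV, which Excel opens directly, and can group owners by week for the notification emails. This is a minimal sketch: the inventory, server names, owner addresses and the `build_schedule`, `to_csv` and `recipients_by_week` helpers are hypothetical, and week numbers follow the DEV, UAT, Production, DR order from step 2.

```python
import csv
import io

# Hypothetical inventory: (server name, environment, application owner email).
SERVERS = [
    ("APP-DEV-01", "DEV", "owner.a@example.com"),
    ("APP-UAT-01", "UAT", "owner.a@example.com"),
    ("APP-PRD-01", "Production", "owner.b@example.com"),
    ("APP-DR-01", "DR", "owner.b@example.com"),
]

# Patch week per environment, in the DEV -> UAT -> Production -> DR order.
WEEK_BY_ENV = {"DEV": 1, "UAT": 2, "Production": 3, "DR": 4}
FIELDS = ["Server", "Environment", "Week", "Owner"]


def build_schedule(servers, excluded=frozenset()):
    """Schedule rows sorted by patch week, skipping servers the owners excluded."""
    rows = [
        {"Server": name, "Environment": env, "Week": WEEK_BY_ENV[env], "Owner": owner}
        for name, env, owner in servers
        if name not in excluded
    ]
    return sorted(rows, key=lambda r: (r["Week"], r["Server"]))


def to_csv(rows) -> str:
    """Render the schedule as CSV text that can be opened in Excel."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def recipients_by_week(rows) -> dict[int, set[str]]:
    """Owners to notify before (and after) each week's patching."""
    out: dict[int, set[str]] = {}
    for r in rows:
        out.setdefault(r["Week"], set()).add(r["Owner"])
    return out


if __name__ == "__main__":
    schedule = build_schedule(SERVERS, excluded={"APP-UAT-01"})
    print(to_csv(schedule))
```

Servers removed through `excluded` still need a new agreed window, so it is worth keeping that list alongside the schedule rather than discarding it.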
Change Management:

Change management is another important factor in patching. It gives awareness of upcoming changes in the environment and also helps from an audit point of view. Every organization will have a defined process based on business needs. It’s recommended to use a Standard Change template, since patching is a mandatory activity performed every month. Using a standard template minimizes the change initiator’s work of drafting the change description, change tasks, etc.
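A standard change template can be as simple as a text template with a few placeholders that are filled in each month. The sketch below uses Python's `string.Template`; the field layout, wording and the `draft_change` helper are assumptions for illustration and will differ from your ITSM tool's actual Standard Change form.

```python
from string import Template

# Hypothetical Standard Change text; adapt fields to your own ITSM form.
STANDARD_PATCH_CHANGE = Template(
    "Title: Monthly Windows OS patching - $environment - $month\n"
    "Description: Install the approved Microsoft security/cumulative updates for $month "
    "on $count $environment server(s) and restart them within the agreed window.\n"
    "Window: $window\n"
    "Tasks: 1) Pre-checks 2) Patch and restart 3) Post-checks "
    "4) Compliance scan 5) Notify owners\n"
)


def draft_change(environment: str, month: str, servers: list[str], window: str) -> str:
    """Fill the standard template for one environment's monthly patch window."""
    return STANDARD_PATCH_CHANGE.substitute(
        environment=environment, month=month, count=len(servers), window=window
    )


if __name__ == "__main__":
    print(draft_change("UAT", "2024-03", ["APP-UAT-01", "APP-UAT-02"], "Sat 22:00-02:00"))
```

`Template.substitute` raises a `KeyError` if a placeholder is left unfilled, which is a cheap guard against submitting a half-completed change.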

Compliance and Reporting:

It’s very important to carry out a compliance check once patching is complete. Measuring the implemented work is always beneficial to the organization from a security audit point of view.
It’s recommended to run the patching compliance check immediately after patching completes. For example, if you have four hours of downtime, run the compliance scan in the second or third hour so that you can re-patch servers within the same downtime under the approved change. If you miss checking compliance within the same downtime window, you may need to request new downtime from the business and also raise a separate change ticket.

If your compliance mechanism only provides compliance data after 24/48 hours, it’s recommended to patch the missing servers in the upcoming downtime windows. Do not keep a backlog for long; it impacts overall compliance by the end of the monthly cycle.
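The compliance figure and the re-patch list can be derived from whatever per-server result your tool reports. In this sketch the scan results are a hypothetical `server -> fully patched?` mapping, and `scan_deadline` assumes you want to keep the last hour of a four-hour window free for re-patching, in line with the second-or-third-hour guidance above.

```python
from datetime import datetime, timedelta


def compliance_percent(results: dict[str, bool]) -> float:
    """Share of in-scope servers reported as fully patched, as a percentage."""
    if not results:
        return 100.0
    return 100.0 * sum(results.values()) / len(results)


def non_compliant(results: dict[str, bool]) -> list[str]:
    """Servers to re-patch in the same window, or to carry into the next one."""
    return sorted(name for name, patched in results.items() if not patched)


def scan_deadline(window_start: datetime, window_hours: int = 4, repatch_hours: int = 1) -> datetime:
    """Latest time to run the compliance scan while leaving `repatch_hours` to re-patch."""
    return window_start + timedelta(hours=window_hours - repatch_hours)


if __name__ == "__main__":
    scan = {"SRV01": True, "SRV02": False, "SRV03": True, "SRV04": True}
    print(f"{compliance_percent(scan):.1f}% compliant, re-patch: {non_compliant(scan)}")
    print("scan no later than", scan_deadline(datetime(2024, 3, 16, 22, 0)))
```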

Additional Notes:
  1. Make sure you perform a daily health check of the patching tool agent (the agent depends on the patching tool you use, for example Microsoft SCCM, HPSA, etc.). All agents should report as healthy, and unhealthy agents should be remediated immediately. An unhealthy agent may fail to patch the server, which impacts patching compliance.
  2. If any application issue is encountered during DEV/UAT testing, make sure to exclude the Production and DR servers from patching until the issue is fixed in DEV/UAT.
  3. If you are uninstalling patches because of a reported issue, make sure the application team consults the application vendor for a solution and compatibility, because we can’t leave servers unpatched for long.
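The daily agent health check from note 1 and the DEV/UAT gate from note 2 can both be expressed as simple filters. The `AgentStatus` fields below are hypothetical stand-ins for whatever your patching tool (SCCM, HPSA, etc.) actually reports, and the 24-hour silence threshold is an assumption to tune for your environment.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class AgentStatus:
    server: str
    client_active: bool        # hypothetical: agent service reported as running
    last_heartbeat: datetime   # hypothetical: last time the agent reported in


def unhealthy_agents(statuses, now: datetime, max_silence=timedelta(hours=24)) -> list[str]:
    """Servers whose agent is inactive or silent for longer than `max_silence`."""
    return sorted(
        s.server
        for s in statuses
        if not s.client_active or now - s.last_heartbeat > max_silence
    )


def allowed_environments(open_issues: list[str]) -> list[str]:
    """Note 2: hold Production and DR while DEV/UAT application issues are open."""
    envs = ["DEV", "UAT"]
    if not open_issues:
        envs += ["Production", "DR"]
    return envs
```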
Microsoft zero-day/out-of-band patches:

Microsoft zero-day/out-of-band (OOB) patches can be deployed once the internal security team has completed a risk assessment. Microsoft recommends deploying OOB patches as soon as possible to avoid external attacks.

If the security team confirms that the patches must be deployed within the next 48 hours, define the scope by identifying the servers running the software/product affected by the vulnerability. For example, if the vulnerability is in Internet Explorer 9, identify how many servers in the environment are running IE9. This data can be fetched from the compliance tool you use in your environment; if you use Microsoft SCCM, you can create a custom report with a custom query. If you don’t have any tool, use a scripting method. The last option is a manual method, but gathering this information manually is tedious if you have many servers.

Assume that after assessment you have 100 servers running IE9 out of 4,000 servers. In this case, plan to patch these 100 servers on priority. Since the timeline is short, you may need to contact the server/application owners for explicit approval to reboot. After approval, the servers can be patched and rebooted after business hours to minimize business impact. If the standard change management process cannot meet this timeline, you may need to raise an emergency change request.
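Once the compliance or inventory tool has exported which products are installed on each server, scoping an out-of-band deployment comes down to splitting the inventory. This is a minimal sketch assuming a hypothetical `server -> installed products` export; the 48-hour deadline mirrors the example above rather than any Microsoft requirement.

```python
from datetime import datetime, timedelta


def split_oob_scope(inventory: dict[str, set[str]], affected_product: str) -> tuple[list[str], list[str]]:
    """Return (priority servers running the affected product, servers left for the regular cycle)."""
    priority = sorted(s for s, products in inventory.items() if affected_product in products)
    in_scope = set(priority)
    regular = sorted(s for s in inventory if s not in in_scope)
    return priority, regular


def oob_deadline(approved_at: datetime, hours: int = 48) -> datetime:
    """Deployment deadline agreed with the security team."""
    return approved_at + timedelta(hours=hours)
```

With the 4,000-server example above, the priority batch would contain the 100 IE9 servers and the remaining 3,900 would stay on the regular schedule.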

Apart from these 100 impacted servers, you can patch the rest of the servers as per your standard patching schedule.

Sometimes the installed antivirus software can mitigate the vulnerability. In this situation, make the call together with the security team. As long as the installed antivirus is securing your environment, you can patch the servers in the regular patching schedule. Make sure you have confirmation from the antivirus vendor about the security coverage.

Check the compliance status once patching is complete.

Windows Failover Cluster patching:

You can use the method below to patch a Windows Failover Cluster, unless you are using the Cluster-Aware Updating feature introduced in Windows Server 2012.

Consider a two-node Windows Failover Cluster running the File Server role.
  1. Move all running resources from node1 to node2.
  2. After moving the resources to node2, make sure they are all online and all shares are accessible.
  3. Install patches on node1 and restart node1.
  4. Move all resources from node2 to node1. Make sure they are online and all shares are accessible.
  5. Install patches on node2 and restart node2.
  6. Rebalance all resources onto their preferred cluster nodes. Check the cluster log to make sure everything is green.
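The ordering of the steps above matters more than the tooling: a node is only patched once it hosts no resources, and every resource must stay online on the other node. The sketch below models that rolling sequence for the two-node File Server example; `Node`, `Cluster` and `rolling_patch` are hypothetical and do not talk to a real cluster.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    patch_level: str
    online: bool = True


@dataclass
class Cluster:
    nodes: dict[str, Node]
    owner: dict[str, str]        # resource -> node currently hosting it
    preferred: dict[str, str]    # resource -> preferred node
    log: list[str] = field(default_factory=list)

    def move_all(self, src: str, dst: str) -> None:
        """Steps 1 and 4: move every resource off `src` and confirm it is online on `dst`."""
        if not self.nodes[dst].online:
            raise RuntimeError(f"{dst} must be online before receiving resources")
        for res, node in list(self.owner.items()):
            if node == src:
                self.owner[res] = dst
                self.log.append(f"moved {res} {src}->{dst}")
        # Steps 2 and 4: every resource must now be hosted by an online node.
        assert all(self.nodes[n].online for n in self.owner.values())

    def patch_and_restart(self, name: str, level: str) -> None:
        """Steps 3 and 5: patch a drained node; refuse if it still hosts resources."""
        if name in self.owner.values():
            raise RuntimeError(f"{name} still hosts resources; move them first")
        node = self.nodes[name]
        node.online = False          # node is down while patches install and it restarts
        node.patch_level = level
        node.online = True
        self.log.append(f"patched {name} to {level}")

    def rebalance(self) -> None:
        """Step 6: return each resource to its preferred node."""
        for res, pref in self.preferred.items():
            self.owner[res] = pref


def rolling_patch(cluster: Cluster, first: str, second: str, level: str) -> None:
    cluster.move_all(first, second)
    cluster.patch_and_restart(first, level)
    cluster.move_all(second, first)
    cluster.patch_and_restart(second, level)
    cluster.rebalance()
    # All nodes should end on the same patch level.
    if len({n.patch_level for n in cluster.nodes.values()}) != 1:
        raise RuntimeError("nodes ended on different patch levels")
```

On a real cluster these steps would be carried out in Failover Cluster Manager or with the FailoverClusters PowerShell cmdlets (for example `Move-ClusterGroup`); the model only checks that the sequence keeps resources online and ends with both nodes on the same patch level.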

Microsoft recommends running all cluster nodes at the same patch level.

You can automate patching and restarts if you take care of the pre-work of moving resources before the patch deployment schedule runs.

I hope this article is helpful for you. Your comments and feedback are important.

Important Note: I suggest consulting your senior staff/Technical Lead/Technical Manager before following any of these approaches. The best practices above are based on my own experience.

Thank you.

Comments

  1. Nice article.. very informative.

  2. Super . Also a short note on ad-hoc patches would help as well

  3. I just have one question if Node 2 is slave node then why can't we install patches on Node 2 first?
