Windows Server Patching Best Practices
As we all know, applying security and cumulative updates in a
Windows environment is essential to protect it from external
attacks. Updates also fix bugs identified in previous versions and improve
stability and performance.
I am writing this post because I have seen many
administrators struggle to carry out patching in their own organization
or in a customer environment. Most of the struggle comes not from technical
challenges but from operational ones, such as server downtime,
scheduling, and change management.
I am taking this opportunity to share the best practices
I have followed while performing Windows patching. They may help you
organize and structure the patching activity in your organization.
Audience: Windows Server administrators and anyone
who handles server patching with any of the patching tools available on the market.
Note: This article covers operating system patching in Windows Server
environments, such as Windows Server 2008, 2012, and 2016. It does not cover
Microsoft or third-party applications installed on the servers, such as
Microsoft Exchange, SharePoint, or SQL Server.
For application patching, a separate test needs to be carried out by an application
specialist before deployment to the production environment.
Downtime:
Downtime is the key factor in the entire patching activity. Many
of us have trouble getting downtime from the business. Every organization
handles it differently, based on business approval. I have captured a few of
the methods that can be used.
Following one of these methods can reduce the effort
we spend on getting downtime from the business.
- Follow a standard maintenance window. Some organizations already have a standard maintenance window for each environment, and we can use the same window for patching. Example: a standard maintenance window for the DEV and UAT environments on weekdays after business hours (10 PM to 3 AM), and on weekends for the production and DR environments. Servers can be restarted within the standard maintenance window.
- Or use a standard four-hour patching window that is approved only for patching activity. All business units should agree to this window. For example: second Friday (DEV), second Saturday (UAT), third Saturday (Production), and fourth Saturday (DR).
- Or, as another example, count weeks starting with the first week after the second Tuesday of the month: (Week 1) DEV, (Week 2) UAT, (Week 3) Production, (Week 4) DR (the DR week can fall in the first week of the following month).
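To make the second method concrete, here is a minimal Python sketch that works out the fixed patch dates for a given month. It is only an illustration: the function names `nth_weekday` and `fixed_patch_windows` are my own, and the mapping of days to environments is copied from the example above, so adjust it to whatever your business units have agreed.

```python
import calendar
from datetime import date


def nth_weekday(year: int, month: int, weekday: int, n: int) -> date:
    """Return the n-th occurrence (1-based) of a weekday in the given month."""
    # monthcalendar() returns one list per week; days outside the month are 0.
    days = [
        week[weekday]
        for week in calendar.monthcalendar(year, month)
        if week[weekday] != 0
    ]
    return date(year, month, days[n - 1])


def fixed_patch_windows(year: int, month: int) -> dict[str, date]:
    """Patch dates for the fixed-day method (assumed mapping from the example)."""
    return {
        "DEV": nth_weekday(year, month, calendar.FRIDAY, 2),
        "UAT": nth_weekday(year, month, calendar.SATURDAY, 2),
        "Production": nth_weekday(year, month, calendar.SATURDAY, 3),
        "DR": nth_weekday(year, month, calendar.SATURDAY, 4),
    }


if __name__ == "__main__":
    for env, day in fixed_patch_windows(2024, 3).items():
        print(f"{env:<10} {day:%A %d %B %Y}")
```

The same `nth_weekday` helper can be pointed at `calendar.TUESDAY` with `n=2` to find the monthly release day, which is useful when checking that a patch window does not fall before the patches are available.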
Scheduling:
As mentioned above, getting downtime is always a challenging
task for administrators. There are a few points we
need to consider while preparing a schedule so that it lines up with the agreed downtime.
The scenario below is based on the third downtime
method described in the “Downtime” section above.
- It’s recommended to perform Windows patching monthly rather than quarterly.
- List the servers that are in scope for patching. If your organization has segregated environments such as DEV/UAT/Production/DR, prepare the schedule starting with DEV, then UAT, Production, and DR. With this schedule you can patch all servers within a four-week span. A sample schedule should look like the one below (Image1). If you don’t have a tool for preparing schedules, you can prepare one in an Excel sheet and share it with all stakeholders (server owners and application owners) by email. This notification is important because it reminds them about the upcoming patching activity, so they can do any pre-work on the application side if needed. It also helps if they want to exclude a server from patching because of a scheduled application release. (Excluding servers from patching is not recommended unless there is a valid business need.) Even if you do exclude a server, make sure you get the next agreed downtime window from the application team to cover the patching activity.
- Microsoft releases patches on the second Tuesday of every month. After that, you can identify the patches and get Security Team/CISO approval (the approval process may vary by organization). After approval, perform initial testing on every Windows OS version in scope (Windows Server 2008, Windows Server 2012, Windows Server 2016). This confirms that the OS boots and comes up without issues, MMC snap-ins work as expected, no errors are reported in the Windows event logs, all automatic Windows services are running, server utilization/performance is normal, and so on.
- As shown in the schedule below, the first week of patching starts on the second Saturday and covers the DEV servers, followed by UAT in the second week, Production in the third, and DR in the fourth. (In months where the second Saturday falls before the second Tuesday, start on the Saturday that follows the patch release instead.) Each application team needs to carry out application-level testing after DEV/UAT patching is complete, before patching proceeds to the Production and DR environments. This helps avoid production impact.
- It’s also the administrator’s responsibility to notify all stakeholders (server owners and application owners) by email once patching is complete, so that they can carry out further application-level testing and confirm that the applications hosted on the servers work as expected.
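The week-by-week schedule described in this section can be generated with a few lines of Python. This is a rough sketch under my own assumptions: patching happens on Saturdays, week 1 is the first Saturday after the second Tuesday, and the server names in the usage note are placeholders. The helper names `patch_tuesday` and `monthly_schedule` are hypothetical, not part of any patching tool.

```python
from datetime import date, timedelta

# Order in which environments are patched, one environment per week.
ENVIRONMENT_ORDER = ["DEV", "UAT", "Production", "DR"]


def patch_tuesday(year: int, month: int) -> date:
    """Second Tuesday of the month (the monthly release day)."""
    first = date(year, month, 1)
    # date.weekday(): Monday == 0, Tuesday == 1
    days_to_first_tuesday = (1 - first.weekday()) % 7
    return first + timedelta(days=days_to_first_tuesday + 7)


def monthly_schedule(
    year: int, month: int, servers: dict[str, str]
) -> list[tuple[date, str, list[str]]]:
    """Build (patch date, environment, server names) rows for one cycle.

    `servers` maps a server name to its environment.
    """
    release = patch_tuesday(year, month)
    # Saturday == 5, so this is the first Saturday after the release day.
    first_saturday = release + timedelta(days=(5 - release.weekday()) % 7)
    schedule = []
    for week, env in enumerate(ENVIRONMENT_ORDER):
        names = sorted(name for name, e in servers.items() if e == env)
        schedule.append((first_saturday + timedelta(weeks=week), env, names))
    return schedule
```

For example, `monthly_schedule(2024, 5, {"dev-01": "DEV", "prd-01": "Production"})` starts on 18 May 2024, because the second Tuesday of that month is 14 May. Note that 18 May is the third Saturday of the calendar month, and that the DR week lands in June, which matches the remark that the DR week can fall in the following month. The rows can be written to a spreadsheet or pasted into the notification email.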
Change Management:
Change management is another important factor in
patching. It raises awareness of upcoming changes in the
environment and also helps from an audit point of view. Every organization will
have a defined process based on its business needs. It’s recommended to use a standard change template, since
patching is a mandatory activity performed on
a monthly basis. A standard template minimizes the change initiator’s work of
drafting the change description, change tasks, and so on.
Compliance and Reporting:
It’s very important to carry out a compliance check after
patching is complete. Measuring the implemented work is always beneficial
to the organization from a security audit point of view.
It’s recommended to run the patching compliance check immediately
after patching is complete. For example, if you have four hours of downtime,
run the compliance scan in the second or third hour so that you
can re-patch any failed servers within the same downtime under the approved change. If you
miss the compliance check within that downtime window, you may
need to request new downtime from the business and raise a separate
change ticket.
If your compliance mechanism only provides compliance data after
24 to 48 hours, then it’s recommended to patch the missing servers in the upcoming downtime
windows. Do not carry a backlog for long, because it affects overall
compliance at the end of the monthly cycle.
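If your tool can export the list of installed patches per server, the compliance figure and the re-patch list can be derived with a short script. The sketch below assumes such an export already exists as a Python dictionary; the function name `compliance_report` is my own, and the KB numbers used in the usage note are placeholders, not real updates.

```python
def compliance_report(
    required: set[str], installed: dict[str, set[str]]
) -> tuple[float, dict[str, set[str]]]:
    """Return (percentage of fully patched servers, missing patches per server).

    `required` is the set of approved patch IDs for this cycle.
    `installed` maps each scanned server to the patch IDs found on it.
    """
    # Keep only servers that are missing at least one required patch.
    missing = {
        server: required - patches
        for server, patches in installed.items()
        if required - patches
    }
    total = len(installed)
    compliant = total - len(missing)
    # An empty scan is reported as 0% instead of dividing by zero.
    percent = 100.0 * compliant / total if total else 0.0
    return percent, missing
```

With `required = {"KB0000001", "KB0000002"}` and four scanned servers of which two have both patches, the function returns 50.0 and a dictionary naming the two servers to re-patch within the same downtime window.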
Additional Notes:
- Make sure you perform a daily health check of the patching tool agent (the agent depends on the patching tool you use, for example Microsoft SCCM or HPSA). All agents should report as healthy, and any unhealthy agent should be remediated immediately. An unhealthy agent may fail to patch the server, which affects patching compliance.
- If any application issue is encountered during DEV/UAT testing, make sure to exclude the production and DR servers from patching until the issue is fixed on DEV/UAT.
- If patches are uninstalled because of a reported issue, make sure the application team consults the application vendor about a solution and compatibility, because we can’t leave servers unpatched for long.
Microsoft zero-day/out-of-band (OOB) patches can be deployed once
the internal security team has completed its risk assessment. Microsoft recommends
deploying OOB patches as soon as possible to protect against external attacks.
If the security team confirms that the patches must be deployed within
the next 48 hours, we have to define the scope by identifying the servers
running the software or product affected by the vulnerability. For example, if
the vulnerability is in Internet Explorer 9, we have to identify
how many servers in the environment are running IE9. This data can be fetched
from the compliance tool you use in your environment. If you
use Microsoft SCCM, you can create a custom report with a custom query to fetch it.
If you don’t have a tool, you will have to use a script. The last
option is to collect the information manually, but that is tedious
if you have many servers.
Assume that, after the assessment, 100 of your 4000 servers are
running IE9. In this case you have to plan to patch these 100
servers on priority. Since the timeline is short, you may need to
contact the server owners/application owners to get explicit approval for
the server reboots. After approval, the servers can be patched and rebooted after
business hours to minimize business impact. If the standard change
process cannot meet the change management requirement, you may need
to raise an emergency change request.
Apart from these 100 impacted servers, the rest of the
servers can be patched as per your standard patching schedule.
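Once you have an inventory export, splitting it into the priority group and the standard group is straightforward. The following Python sketch assumes the inventory has already been pulled from your compliance tool or script into a list of records; the `Server` fields, the function names, and the sample values in the usage note are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    owner: str                 # person or team who approves the reboot
    software: frozenset[str]   # installed products as reported by the inventory


def split_oob_scope(
    inventory: list[Server], affected: str
) -> tuple[list[Server], list[Server]]:
    """Split the inventory into (priority, standard) lists for an OOB patch."""
    priority = [s for s in inventory if affected in s.software]
    standard = [s for s in inventory if affected not in s.software]
    return priority, standard


def owners_to_notify(priority: list[Server]) -> dict[str, list[str]]:
    """Group priority servers by owner so each owner gets one approval request."""
    by_owner: dict[str, list[str]] = {}
    for server in priority:
        by_owner.setdefault(server.owner, []).append(server.name)
    return by_owner
```

Calling `split_oob_scope(inventory, "Internet Explorer 9")` gives the servers to patch on priority and the servers that stay on the standard schedule, and `owners_to_notify` turns the priority list into one reboot-approval request per owner. The product string has to match whatever your inventory source reports.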
Sometimes the installed antivirus software can mitigate the vulnerability.
In this situation you have to make the decision together with the security team. As long as the installed
antivirus is protecting your environment, you
can patch the servers in the regular patching schedule. Make sure you have
confirmation from the antivirus vendor about the security coverage.
Check the compliance status once patching is complete.
Windows Failover Cluster patching:
You can use the method below to patch a Windows Failover Cluster, unless
you are using the Cluster-Aware Updating feature available from Windows Server 2012.
Consider a two-node Windows Failover Cluster running
the File Server role.
- Move all the running resources from node1 to node2.
- After moving the resources to node2, make sure they are all online and all the shares are accessible.
- Install patches on node1, restart node1.
- Move all the resources from node2 to node1. Make sure they are online and all the shares are accessible.
- Install patches on node2, restart node2.
- Re-balance all the resources onto their preferred cluster nodes. Check the cluster log to make sure everything is green.
Microsoft recommends running all cluster nodes at the same
patch level.
You can automate patching and restarts, provided you
take care of the pre-work of moving resources before the scheduled patch deployment.
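The order of operations above can be expressed as a small simulation, which is handy for reviewing the plan before automating anything. This Python sketch only tracks which node hosts each resource and prints the steps; it does not talk to a real cluster, and the function name `rolling_patch` and the resource names in the usage note are my own assumptions. The real move, health-check, and patch actions would have to be supplied by your own tooling.

```python
def rolling_patch(nodes: list[str], owners: dict[str, str]) -> list[str]:
    """Simulate rolling patching and return the ordered list of steps.

    `owners` maps each clustered resource to the node that hosts it at the
    start, which is treated here as its preferred node.
    """
    if len(nodes) < 2:
        raise ValueError("rolling patching needs at least two nodes")
    preferred = dict(owners)
    current = dict(owners)
    log = []
    for i, node in enumerate(nodes):
        # Drain the node by moving its resources to the next node in the list.
        target = nodes[(i + 1) % len(nodes)]
        for resource, host in list(current.items()):
            if host == node:
                current[resource] = target
                log.append(f"move {resource}: {node} -> {target}")
        # A node must host nothing before it is patched and restarted.
        if node in current.values():
            raise RuntimeError(f"{node} still hosts resources")
        log.append(f"patch and restart {node}")
    # Re-balance: send every resource back to its preferred node.
    for resource, home in preferred.items():
        if current[resource] != home:
            log.append(f"move {resource}: {current[resource]} -> {home}")
            current[resource] = home
    return log
```

For a two-node cluster, `rolling_patch(["node1", "node2"], {"FS-Share1": "node1", "FS-Share2": "node2"})` produces the same sequence as the list above: drain node1, patch node1, drain node2, patch node2, then move `FS-Share2` back to node2. In a real run, each move should be followed by the online and share-accessibility checks before the node is patched.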
I hope this article is helpful to you. Your comments and
feedback are important.
Important Note: I suggest you consult your senior staff/Technical
Lead/Technical Manager before following any of these approaches. The best
practices above are based on my own experience.
Thank you.
Good one. Informative.
Good one!!
Nice article.. very informative.
Super. Also a short note on ad-hoc patches would help as well
Thanks Goks, I'll add few points on OOB patching...
I just have one question if Node 2 is slave node then why can't we install patches on Node 2 first?