Top 10 Cybersecurity Team Effectiveness Metrics

What are the top 10 metrics used to measure cybersecurity team effectiveness?

Cybersecurity is a vital aspect of any organization that relies on digital systems and networks. However, measuring the effectiveness of a cybersecurity team can be challenging, as there are many factors and variables involved. In this blog post, we will explore some of the most common and useful metrics that can help assess how well a cybersecurity team is performing and where they can improve.

1. Mean time to detect (MTTD) – This metric measures how quickly a cybersecurity team can identify a potential threat or incident. The lower the MTTD, the better, as it means that the team can respond faster and minimize the damage.
2. Mean time to respond (MTTR) – This metric measures how quickly a cybersecurity team can contain and resolve a threat or incident. The lower the MTTR, the better, as it means that the team can restore normal operations and reduce the impact.
3. Mean time to recover (also abbreviated MTTR) – This metric measures how quickly a cybersecurity team can restore the affected systems and data after a threat or incident. The lower the mean time to recover, the better, as it means that the team can restore business continuity and reduce downtime. Because the acronym is shared with mean time to respond, state which one you mean when you report it.
4. Number of incidents – This metric measures how many threats or incidents a cybersecurity team has to deal with in a given period. The lower the number of incidents, the better, as it means that the team has a strong security posture and can prevent most attacks.
5. Severity of incidents – This metric measures how serious or damaging a threat or incident is for an organization. The lower the severity of incidents, the better, as it means that the team can mitigate most risks and protect the most critical assets.
6. Incident response rate – This metric measures how many threats or incidents a cybersecurity team can successfully handle in a given period. The higher the incident response rate, the better, as it means that the team has enough resources and capabilities to deal with all challenges.
7. Incident resolution rate – This metric measures how many threats or incidents a cybersecurity team can successfully resolve in a given period. The higher the incident resolution rate, the better, as it means that the team has effective processes and tools to eliminate all threats.
8. Cost of incidents – This metric measures how much money an organization loses due to threats or incidents in a given period. The lower the cost of incidents, the better, as it means that the team can minimize the financial losses and optimize the security budget.
9. Customer satisfaction – This metric measures how satisfied an organization’s customers are with its security performance and service quality. The higher the level of customer satisfaction, the better, as it means that the team can meet or exceed customer expectations and build trust and loyalty.
10. Employee satisfaction – This metric measures how satisfied an organization’s employees are with its security culture and environment. The higher the employee satisfaction, the better, as it means that the team can foster a positive and collaborative atmosphere and retain talent.

These are some of the most common and useful metrics that can help measure cybersecurity team effectiveness. However, they are not exhaustive or definitive, and each organization may have different goals and priorities when it comes to security. Therefore, it is important to customize and adapt these metrics according to each organization’s specific needs and context.
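
As a rough illustration of how the time-based and rate-based metrics above could be calculated, here is a minimal Python sketch. The `Incident` record, its field names, and the sample data are assumptions made for this example and are not taken from any particular ticketing or SIEM tool. It treats mean time to respond as the average time from detection to resolution, and the resolution rate as the share of incidents that have been resolved.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Incident:
    # Hypothetical record layout; the field names are assumptions.
    occurred: datetime            # when the malicious activity began
    detected: datetime            # when the team identified it
    resolved: Optional[datetime]  # when it was resolved (None if still open)


def mean_duration(durations: list[timedelta]) -> timedelta:
    """Average a list of durations; an empty list yields zero."""
    if not durations:
        return timedelta(0)
    return sum(durations, timedelta(0)) / len(durations)


def summarize(incidents: list[Incident]) -> dict:
    """Compute incident count, MTTD, mean time to respond, and resolution rate."""
    resolved = [i for i in incidents if i.resolved is not None]
    return {
        "count": len(incidents),
        "mttd": mean_duration([i.detected - i.occurred for i in incidents]),
        "mttr": mean_duration([i.resolved - i.detected for i in resolved]),
        "resolution_rate": len(resolved) / len(incidents) if incidents else 0.0,
    }


# Made-up sample data for illustration only.
SAMPLE = [
    Incident(datetime(2024, 1, 1, 8), datetime(2024, 1, 1, 10), datetime(2024, 1, 1, 14)),
    Incident(datetime(2024, 1, 5, 9), datetime(2024, 1, 5, 13), datetime(2024, 1, 5, 15)),
    Incident(datetime(2024, 1, 9, 0), datetime(2024, 1, 9, 3), None),
]

print(summarize(SAMPLE))
```

With the sample data, both averages come out to three hours and two of the three incidents are resolved. In practice the hard part is usually agreeing on when an incident "occurred", so the numbers are only as good as the timestamps behind them.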

Responding to Ransomware Attacks

In the event that your personal computer or even the computers on your corporate network fall victim to a successful ransomware attack, an effective response plan can determine the difference between disaster and successful recovery. If a company-wide malware infection takes down multiple endpoints and you are unable to recover critical data, it could mean a permanent business closure.

We will discuss how you might respond at the beginning of an attack so you can start remediating the problem before you make any wrong decisions.

How to respond to a ransomware attack

If preventative measures such as hardening your systems against Mimikatz attacks, making users more cybersecurity aware with Security Awareness Training tips, and applying all the Windows 10 hardening tips didn’t work, then your organization should take the following actions immediately after identifying a successful ransomware infection.

If you have an Incident Recovery Plan, execute the notification process so that all the required teams start communicating and remediating the systems impacted by the attack.

1. Quarantine Infected Systems

The majority of ransomware attacks include a function that scans the target network to identify other systems that can also be attacked, and then encrypts all the files stored on network shares or other computers as the attackers move laterally across the network. To contain the infection and prevent the ransomware from spreading to uninfected systems, the infected systems must be removed from the network as soon as possible. This will significantly slow the spread and buy you time for analysis and troubleshooting before everything is rendered useless.

Note: This includes blocking them from wired and wireless network access.

This will also help prevent infected systems from accessing resources like internal email, backup systems, employee record systems, critical databases, etc.

2. Block Internet Access

Every system on the network may already have the malware copied to it, and the malware just might not have started the encryption process yet because it hasn’t been able to reach the command and control server on the internet. Disconnect all systems from the internet. Those that are still working will not start encrypting their drives, and those already encrypting have been cut off from the safe systems by the step above.

Note: This includes blocking internet access from wired and wireless networks.

Now you have known bad systems (they are actively encrypting the user files or have already encrypted all the user files) that are isolated from the network (can’t see other systems on your network) and blocked from the internet (can’t see other systems on the internet). You also have suspected good systems that are blocked from accessing the internet and are disconnected from the bad systems. You can now verify that those clean-looking systems are definitely clean and return them to normal once you are sure they are not infected. More about that in Step 5 below.

3. Identify Ransomware

Identify the “brand” of ransomware that has infected your systems. While this might seem strange, there are many types of ransomware from many different malware groups. Knowing which one has infected your systems could help you better identify the methods used in the attack, how to stop the spread, and how you might be able to get your data back without paying a ransom.

There have been instances of law enforcement agencies shutting down a ransomware author’s “business” and releasing the decryption keys. Also, groups behind older ransomware that are no longer actively infecting new systems have sometimes released their decryption keys.

A ransomware identification website can help you determine which malware has infected your systems so you can get help stopping it, removing it, and decrypting your locked files. To get a better understanding of the volume of internet threats that exist today, a visual threat map can be helpful. The threat map from Fortinet presents the threats in a more “real-time” visual form.

4. Disable Scheduled Tasks

You should immediately disable any automated or system-scheduled maintenance tasks such as user or system clean-up routines, log deletion tasks, and deletion of old backup files. These automated tasks can remove files you might wish you had later or that your forensic teams might need, or they might perform an action that prevents a successful remediation of the ransomware attack.

5. Remove Ransomware from Infected Systems

You can use available antivirus tools to identify and remove the ransomware from your computer. If you are already using antivirus software and it didn’t stop the infection, this is probably a good time to investigate problems with your current configuration or get a better solution. Once you have scanned and cleaned the system, it is ready for your files to be restored.

Once you find the right software to scan and detect the malware, run the scanner on all your systems, not just the ones you believe are infected. You might think you know which systems are infected, but the scanner can help you determine which systems are actually infected. You want to do the clean-up and remediation just one time, so do it right the first time.

6. Don’t Pay the Ransom

Note: Only restore your files to systems that you know are clean.

I realize you may not have an option if your critical business files are encrypted, you don’t have good backups you can recover, and you can’t find a free decryption tool. If backups are unavailable or damaged and there is no free decryption tool available, you will be tempted to pay the ransom and recover your files. Just remember you may pay the ransom and still not get your files back. These people are criminals looking for easy money; they are not in the business of being your friend.

While paying the ransom may seem like an easy answer, only consider paying the ransom if all other options have been exhausted and the loss of data will likely result in your company going out of business. Paying the ransom might also get you into trouble with the law, so be very careful and consult an attorney.

7. Restore Your Backups

Note: Only restore your files to systems that you know are clean.

Hopefully you were able to jump right past Step 6 (Don’t Pay the Ransom) because you know that paying a ransom to a criminal only encourages them and finances their next attack. You don’t need to pay the ransom because you either don’t need the files that were encrypted, you were able to find a free decryption tool, or you had good backups ready for you to use.

Restoring backups can take a long time and be difficult to perform, and you still might lose some data. If you have been verifying your backups, practicing the restore process at least once a year, and maintaining a well-documented process, the effort will be less likely to fail.

If your user files are also backed up to the cloud using a tool like OneDrive, this might also be useful and a quick way to restore a user’s personal files including documents, music, and pictures.

8. Restore Network

Now that you know which systems are clean, the clean machines can have access to the internet and other network resources. The infected machines can be cleaned one at a time, their files can be restored, and then the systems can be returned to the proper network.

Don’t forget to restore internet access for the clean systems. Once you have verified that your backup files won’t be overwritten, the log files are intact, and the files required by the audit and forensics teams are saved, you can re-enable the scheduled tasks that you have reviewed and know are safe to enable.
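
The rule described in Steps 5 through 8 (scan everything, and only return a system to the network once you know it is clean) can be written down as a small check. This is only a sketch of that decision logic: the `Host` record and its field names are hypothetical, and real tracking would come from your own scanner results and asset inventory.

```python
from dataclasses import dataclass


@dataclass
class Host:
    # Hypothetical tracking record; the field names are assumptions.
    name: str
    scanned: bool          # a malware scan has completed on this host
    infection_found: bool  # the scan detected ransomware
    cleaned: bool          # ransomware was removed and the host re-scanned clean


def may_rejoin_network(host: Host) -> bool:
    """A host may rejoin only if scanned, and either never infected or cleaned."""
    if not host.scanned:
        return False
    return (not host.infection_found) or host.cleaned


# Made-up hosts for illustration only.
hosts = [
    Host("pc-01", scanned=True, infection_found=False, cleaned=False),
    Host("pc-02", scanned=True, infection_found=True, cleaned=False),
    Host("pc-03", scanned=True, infection_found=True, cleaned=True),
    Host("pc-04", scanned=False, infection_found=False, cleaned=False),
]
ready = [h.name for h in hosts if may_rejoin_network(h)]
print(ready)
```

The point of the check is that an unscanned system is treated the same as an infected one: it stays off the network until proven clean.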

9. Change Passwords

Now that you know someone has had access to your systems, you can’t be sure they did not steal your user and system passwords. Have all users reset their passwords. Reset the passwords for all service accounts, accounts used to run scheduled tasks, the KRBTGT account (used by Active Directory), and any enabled accounts used by your systems. Make sure all administrator-level users also change their passwords. Do a full inventory of accounts, looking at the last time the password was changed, and either change the password or disable the account.
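
The account inventory described above boils down to comparing each account's last password change against the time you started resetting credentials. The following Python sketch shows that comparison under stated assumptions: the `Account` record, its field names, and the sample accounts are hypothetical, and in practice the data would be exported from your directory service.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Account:
    # Hypothetical export format; the field names are assumptions.
    name: str
    enabled: bool
    password_last_set: datetime


def needs_action(accounts: list[Account], reset_cutoff: datetime) -> list[str]:
    """Return enabled accounts whose password has not changed since the cutoff.

    These accounts should either have the password changed or be disabled.
    """
    return sorted(
        a.name for a in accounts
        if a.enabled and a.password_last_set < reset_cutoff
    )


# Made-up accounts for illustration only.
cutoff = datetime(2024, 3, 1)  # when post-incident resets began (assumed date)
accounts = [
    Account("alice", True, datetime(2024, 3, 2)),
    Account("svc_backup", True, datetime(2023, 1, 15)),
    Account("task_runner", True, datetime(2022, 7, 4)),
    Account("old_admin", False, datetime(2019, 5, 20)),
]
print(needs_action(accounts, cutoff))
```

Disabled accounts are skipped here only to keep the list short. You may still want to review them separately, since a disabled account with an old password can be re-enabled.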

10. Investigate Intrusion

Things are now back to normal. Users are back onto their computers, the files are all back where they should be, and users are back to work and not on the telephone with you. That doesn’t mean you are done.

You have to look at what happened so you can make sure it doesn’t happen again.

  • How was the ransomware able to get past your computer controls and be easily installed onto a user’s computer without being detected? Was it a user bypassing a control (authorized or unauthorized), or did the ransomware just not get stopped by any existing security control?
  • Are there changes required to your anti-virus software to make it a stronger defense against ransomware? Is it time to remove the existing solution and replace it with something more powerful or can you just change the configuration of the solution you already own to make it work better?
  • Do you need to make changes to the hardening of your Windows 10 devices to make it harder to bypass your security controls and encrypt the users’ files?
  • Do you need to alter or improve your corporate firewall controls? What about the security of your remote users and the way they connect to the Virtual Private Network (VPN)?
  • Do you need to make changes to your network to make it harder for software running on the user’s computer to get access to systems like Domain Controllers, Database Servers, File Servers, Web Servers, etc.?
  • Do you need to change the way you perform (or don’t perform) backups of user and system files? How about changes to the way you restore files? Do you have adequate documentation of the procedures used for backing up and restoring files?
  • Do user accounts have the correct level of authorization? Maybe now is a good time to remove elevated permissions from normal users, limit who has elevated permissions, and lock down the use of all admin-level accounts?

Summary

If you need help, now is the time to get it and figure out the changes that can help prevent a repeat of the security event. A ransomware incident can stop a company from normal business for days, weeks, or forever. It can chase away customers, compromise business-critical data, and cost you a lot of money to remediate.

Looking at the steps required now can help you practice and plan for a future incident. Careful planning, remediation of security gaps, and technical training can help prevent a successful ransomware attack, shorten the remediation timeline, and help promote confidence in your Information Technology team.

Disaster Recovery: What is RTO and RPO

Disaster Recovery, also known as Incident Recovery, is the process of getting from an unplanned event back to normal operations in a predetermined amount of time. If you have a server outage, you should have predicted the amount of acceptable outage (driven by the steps required to recover from the outage) and practiced the recovery steps to demonstrate your ability to perform them in the time allotted.

As a simple example, we will assume you have a database server that is important to your business. The process owners might say to you, the IT Professional, that this server should never go down. You discuss the impractical nature of such a requirement, and you compromise on the outage requirement by agreeing that if it ever has an unplanned outage you will make sure it is back online within 2 hours.

What you have probably just done is paint yourself into a corner by agreeing to something that you can’t successfully complete.

First, let’s understand a couple of important terms generally used to discuss Disaster Recovery.

RTO – The Recovery Time Objective (RTO) is the duration of time within which a business process must be restored after a disaster in order to avoid unacceptable consequences associated with an outage.

RPO – The Recovery Point Objective (RPO) describes the interval of time that might pass during a disruption before the quantity of data lost during that period exceeds the maximum allowable threshold you have agreed with the process owners.

Using these two terms, we have an RTO (Recovery Time Objective) which, in our example above, is just two hours. How we meet that RTO will determine the recovery point we actually achieve, but no RPO (Recovery Point Objective) was agreed in our example. If we plan on restoring the system backup to handle our imaginary database server outage, we have to look at two important pieces of information: how long it will take to restore the server, and when the backup was completed.

In our example, the backup is small and will only take about 45 minutes to restore. That means we will easily hit our RTO, provided we configure the proper alerts and establish the ability to instantly respond to an outage. Can you imagine how difficult that will be at 3 am?

The system backup is completed at 2 am, each and every day, in our imaginary example. That means if the server fails at 3 pm today, and we only take 45 minutes to restore the backup, we might think we are providing a successful Disaster Recovery in our allowed time window. However, the server was restored to the point in time from 2 am, meaning that at 3:45 pm (the point at which we have finished restoring the server backup) we are missing the data from 2 am through 3:45 pm. What was the Recovery Point Objective from our process owner? I can guarantee you that if you haven’t discussed this in advance, the process owner will assume the RTO and RPO are the same thing, and you have probably failed to meet their expectations through a lack of shared understanding and communication.
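
The arithmetic in this example is easy to check with a short Python sketch. The function name, the calendar date, and the one-hour RPO are assumptions for illustration only (the example has no agreed RPO). The sketch also assumes the restore begins the moment the server fails. It counts everything written between the last backup and the failure as lost data, and treats the 45-minute restore as downtime during which no new data can be recorded.

```python
from datetime import datetime, timedelta


def recovery_outcome(last_backup: datetime, failure: datetime,
                     restore_duration: timedelta,
                     rto: timedelta, rpo: timedelta) -> dict:
    """Compare an outage against the agreed RTO and RPO."""
    # Assumes the restore starts the moment the server fails (no detection delay).
    downtime = restore_duration
    # Everything written after the last backup is gone once the backup is restored.
    data_loss_window = failure - last_backup
    return {
        "back_online": failure + restore_duration,
        "downtime": downtime,
        "data_loss_window": data_loss_window,
        "meets_rto": downtime <= rto,
        "meets_rpo": data_loss_window <= rpo,
    }


result = recovery_outcome(
    last_backup=datetime(2024, 6, 3, 2, 0),   # 2 am backup (the date is arbitrary)
    failure=datetime(2024, 6, 3, 15, 0),      # 3 pm failure
    restore_duration=timedelta(minutes=45),
    rto=timedelta(hours=2),
    rpo=timedelta(hours=1),                   # hypothetical; no RPO was agreed
)
print(result)
```

Under these assumptions the server is back at 3:45 pm, well inside the two-hour RTO, but 13 hours of data written since the 2 am backup are gone. Any RPO shorter than 13 hours is missed, which is exactly the gap in expectations described above.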

To avoid this issue, you need to understand both the RTO and RPO requirements from the process owner. I also advise you to get these values in writing, and to develop written process and procedure documents that meet these goals on each and every system. This means writing a Disaster Recovery Plan, a document that lists the steps required to recover from all types of disasters. These disaster types should include natural events like fire, flood, earthquakes, and tornadoes, as well as man-made events like human error, hacking, data breaches, etc.

Your response to a server that has crashed because of hacking should probably be different than if the server has crashed because of a defective hard drive. Your RTO and RPO might also be different to support those different responses. If our imaginary server is critical to the imaginary business, the process owners probably don’t care why the server is down and they only know the server has to be operational or the business will suffer.

You have some important thinking to do, so you should get started either writing a plan or reviewing your existing plan before it is too late.
