PCI – Going Beyond the Standard: Part 24, Disaster Recovery (DR) & Business Continuity Management (BCM)

You may be wondering why I would put this after Governance seeing as that seems to bring everything together, and you may also be wondering why I did not included Disaster Recovery (DR) in the same post as Incident Response (IR) which everyone else always does.

They would be good questions, and my reasoning is relatively simple; You cannot HAVE Business Continuity Management (BCM) without Governance so that must be formalised first, DR represents the detailed processes summarised in the BCM, and IR is the feed INTO the DR/BCM, not the output from it.

To put it another way; the Business Continuity Plan (BCP) details what must be done, in what order, and how quickly to save the business, DR puts that plan into effect, and IR would have uncovered the inciting incident that brought both the BCP and DR plans into play in the first place.

Assuming that made any sense, the question is; What if I don’t HAVE a BCP?

I am surprised every time I ask a client for a BCP and don’t get one. Mostly because I’m not too bright, but partly because it makes absolutely no sense to me that ANY organisation in any industry sector, anywhere in the world would not make such a simple effort to help themselves STAY in business. While both DR and BCP represent what amounts to contingency planning and will hopefully never have to be invoked (assuming your IR is top notch of course), NOT having a plan is nothing short of irresponsible.

There are several well known standards related to Business Continuity, and for obvious reasons they encompass more than just IT systems:

  1. ISO 22301:2012: Societal security — Business continuity management systems – Requirements
    o
  2. ISO 22313:2012: Societal security — Business continuity management systems – Guidance
    o
  3. ISO/IEC 27031:2011: Information security – Security techniques — Guidelines for information and communication technology [ICT] readiness for business continuity
    o
  4. NIST Special Publication 800-34 Rev. 1, Contingency Planning Guide for Federal Information Systems
    o
  5. ANSI/ASIS SPC.1-2009 Organizational Resilience: Security, Preparedness, and Continuity Management Systems

Unfortunately the ISO stuff will set you back a few hundred quid, so start with the NIST / ANSI stuff to ge yourself familiar enough with the concept to at least ask the right questions.

For DR, start with mapping out all of your business processes and asset dependencies. If you don’t know how things fit together, you’ll have no idea how to put them back in place. Clearly, if your asset management processes are not robust, you can’t even begin the mapping process, so get that done first.

Once you have mapped out your business processes, it’s a relatively simple task to organise all of your procedural documentation into how you reestablish all the moving parts. You have all that, right? So whether you have full redundancy in all things, hot swap, warm spares or a whole host of other DR clichés, how you get your systems back online boils down to a series of easily followed instructions.

From an IT perspective, all the BCP plan does is tell you in which order to bring those systems back online and in what timeframe. It should be needless to say – but it isn’t – the plan and all of its moving parts must be tested on an annual basis or even explicit instructions cannot get the response times to an optimal state.

No aspect of security should be performed half-arsed, DR and BCP processes are no exception. Even within the field of security BCP is a speciality, and making the plan simple and appropriate is a talent more than a skill. Expect to pay a lot for these services but rest assured it is money well spent.

PCI – Going Beyond the Standard: Part 20, Incident Response (IR)

First, you may be asking why this blog does not include Disaster Recovery (DR) and Business Continuity Management (BCM, which governs the entire IR / DR process). Because the PCI DSS section 12.10.x is almost entirely related to IR (with the exception of a VERY brief nod to DR / BCP, below in red), I will handle DR / BCP separately in the series (post 23 in fact).

“12.10.1 – Create the incident response plan to be implemented in the event of system breach. Ensure the plan addresses the following, at a minimum:

    • Roles, responsibilities, and communication and contact strategies in the event of a compromise including notification of the payment brands, at a minimum
    • Specific incident response procedures
    • Business recovery and continuity procedures [This is the only requirement in the DSS that goes beyond the protection of CHD.]
    • Data backup processes
    • Analysis of legal requirements for reporting compromises * Coverage and responses of all critical system components
    • Reference or inclusion of incident response procedures from the payment brands.

With regard Incident Response, I put it this way; “What’s the point of being in business, if you don’t intend staying in business?”, and; “Good incident response is what prevents a security event from becoming a business crippling disaster.”

It makes absolutely no sense to me that organisations who basically depend on IT for significant chunks of income (which is most of them), have very little idea how to stop bad things from happening in the first place, let alone fix things when they go wrong. Of course, no incident response is going to predict an earthquake at the datacenter, but the organisations I’ve seen don’t even perform log monitoring properly, let alone consider the impact of acts of nature.

The development of a good incident response plan start with? Yep, a good policy, from there you agree on an appropriate Risk Assessment / Business Impact Analysis process, which in turn provides you everything you need to not only determine if you have any control gaps (after a gap analysis), but – if you’ve done it properly – a good indication of what your incident response and disaster recovery plans should entail.

There is no appropriate IR without an understanding of the business goals. If you have a 4 hour Recovery Time Objective (RTO), your IR will be significantly more robust than one where you can take a week to be back online. Yes, I know that RTOs (and RPOs (Recovery Point Objective for that matter) are DR terms, but if your incident response cannot detect a business crippling event in good time, then neither of those DR goals is an option for you.

When setting up your IR program, the most important word to keep in mind is ‘baseline’. Without a baseline, you don’t have much of a concept of what constitutes an incident in the first place. Only a baseline can give you both context and relevance.

From your baselined system configuration standards (DSS 2.x), to AV (DSS 5.x), to logging (DSS 10.x), to scanning (DSS 11.1.x, and 11.2.x), to FIM (DSS 11.5.x), you have many available inputs into your IR program, none of which will be of the slightest help if you don’t know what they SHOULD look like.

That’s all IR is;, a process whereby an exception to the norm is investigated, and appropriate action taken.

In each of my individual going-beyond-the-standard blogs related to the above DSS requirements, I have stressed the importance of baselining (well, except AV perhaps). The reason I did so was because they all lead up to this. I don’t care how well you have done ANY of the previous requirements, unless you can bring the outputs all together into a comprehensive process of taking action, all you have is a bunch of data to give to your forensics investigator.

You’ll notice though that I did not say a CENTRAL process, because while having a 24X7 Security Operations Centre t manage all of this, it’s rarely practical, even if it involves a outsourced managed service provider (MSP). However, having the correct assignments and procedures to MANAGE the response is of utmost importance, and the details of this plan will vary considerably from company to company.

No IR is not easy, but there is simply too much information and help out there for this difficulty to be any sort of excuse. And no, there is not much in this blog that actually provides guidance, but if this makes SENSE, then you at have at least got enough to begin to ask the right questions.

PCI – Going Beyond the Standard: Part 16, Logging & Monitoring

From everything I have seen in my many years performing PCI assessments, logging is not only one of the least understood of the requirements, it is the most under-utilised, and the one that gives/gave my clients the most pain.

Logging is the most important detective security control you have, bar none, and done correctly, logging is the foundation of your incident response program. Notice I didn’t say ‘disaster recovery’ as well, because if your incident response was where it should be you should not HAVE to recover from a disaster.

The confusion stems mostly from a lack of understanding of logging mechanisms themselves, even for Windows (for which PCI was clearly written). For example, do you think that Windows logs to the PCI requirements out of the box? I did too, but have been assured that it does not. Do you know HOW to get it to log appropriately? No, me either.

I have been further assured this if you WERE to turn logging on to cover the 10.2.X requirements, the logging would be so verbose as to render the device that’s doing the logging useless. Is that really the INTENT of logging? Of course not.

Also, can syslog EVER record the events required in 10.2.x, or even the event content as required in 10.3.x? Once again, I have been told no, but I am no expert.

Yes, you SHOULD have people who DO know this stuff, but how many organisations out there can truly afford that kind of deep expertise in-house? Yes you can outsource, but where is the guidance on EXACTLY how to configure operating systems to log to the PCI requirements? It probably exists, but in 10 years of doing PCI I have not found it, and I’ve even asked ‘experts’ in the field; Security Incident and Event Monitoring (SIEM) vendors. On that note, I have yet to see a SIEM vendor also be an expert in PCI, which to me is an absolute joke if that’s why they are selling it to their clients.

We have free hardening guides for Windows (CIS Security Benchmarks for example), but where is the guide that breaks down the Windows operating system into a mapping between the registry settings for logging and the PCI DSS? Or *nix flavours, or Cisco, or AS400, or Power Series? If you have them, please share?!

So let’s, for the sake of argument, assume that you cannot reasonably log to the letter of PCI, or more to the point, you do not WANT to for usability issues. Are you non-compliant? Let me answer that with another question; What’s more important, configuring your logging to record events for a forensics investigation, or configure your logging to help prevent a breach in the first place?

If you chose the second one, you are correct, and if you also choose to maximise your logging mechanisms, you will not only be PCI compliant to its intent, you will also be doing security properly.

First, logging is not about crunching masses of data through a correlation engine, its about the RIGHT data put into a base-lined context. There is no such thing as log event correlation without a deep understanding of what the end systems SHOULD look like, AND what your normal business processes are from start to finish. In other words, tell any vendor trying to sell you compliance though their SIEM, to put it where the sun don’t shine.

While we’re on the subject of SIEMs, how many of them do you think can accept Windows logs natively, or have to convert the logs to syslog via an agent (e.g. Snare)? Very few. What’s the point of buying a log mechanism that cannot even read WINDOWS events without butchering them DOWN to syslog?! You MUST ask the right questions before buying ANYTHING, especially a SIEM.

OK, so how DO you go above and beyond? Simple, in one way;

Do NOT perform your log reviews daily (10.6), because that’s just plain stupid, not to mention impossible to do adequately. Perform log reviews in real-time via some form of automation.

The automation you need is threefold;

  1. Events you should NEVER see: Each system admin, from OS, to network device to application SHOULD know which they events should never be seen under normal operating conditions. Look for these ‘strings’ and alert immediately.
    o
  2. Events you should not see in a certain quantity and velocity (i.e. thresholds): I don’t care if I see an admin fail to log in once, I do care if s/he fails 10 times in 2 seconds (for example).
    o
  3. Quantity of events over the course of time (i.e. trending): You have to save logs for 1 year (DSS 10.7), so why not put them to good use by trending events over time? Even if it’s just quantity of event (as opposed to quantity of type of event per device), the information you get can be extremely useful.

Perform all three of these things, and you have not only covered the ridiculous ‘daily reviews’ automatically, you now have input into your incident response mechanism that gives you real security.

Choosing the right centralised logging mechanism for your business is one of the most important decisions you can make, and it cannot be done ONLY for PCI. You must buy a system that can cover you enterprise-wide, and unless you have significant in-house expertise, you must build into the RFP the requirement for consulting support, and potentially some form of on-going managed service. Nothing stays the same, so your future state / needs will also need to be taken into account.

Do NOT penny-pinch here, but don’t buy anything that’s not appropriate. Your risk assessment process should tell you exactly what you need, and if you’ve not done one, start there.

PCI – Going Beyond the Standard: Part 10, Anti-Virus

First, let me be clear; I hate anti-virus. I guess more accurately, I hate anti-virus companies who are still making squillions peddling their no-longer-relevant wares (in my opinion).

Blacklisting (i.e. signature based) end-point protection is meaningless and almost completely ineffective against zero-day attacks. It’s a game of constant catch-up that can (and will) never be won. Yet here we are, still buying anti-virus software because we don’t know better, and standards like the PCI DSS still call for it by name instead of dealing with the actual underlying issue.

What is the INTENT of anti-virus?  According to the DSS, you should;

5.1 Deploy anti-virus software on all systems commonly affected by malicious software (particularly personal computers and servers).

…and;

5.1.1 Ensure that anti-virus programs are capable of detecting, removing, and protecting against all known types of malicious software.

Commonly affected? As defined by whom? Clearly they mean Windows but can’t just come out and say it. Yes, other OSs are becoming increasingly affected by viruses, but would you call them common? More to the point; if you had to install and maintain anti-virus on all of your *nix and Apple products would YOU classify them as ‘commonly affected’?

No, neither would I.

The intent of anti-virus is sound; do not let bad stuff run on your systems. However, if you were doing security properly, would this not be basically redundant? Even PCI includes the means by which anti-virus becomes [in my view] excessive;

  1. Security Awareness Training (Req. 12.6) – If users were properly educated, a huge chunck of malware outbreaks would not happen in the first place. If your organisation does not have a very robust program for ongoing security training, they have missed the cheapest, and most effective security control that has, and will, ever exist. Ignorance is a choice, never an excuse.
    o
  2. Configuration Standards (Req. 2.x) – In my continuing theme of never backing up bold statements with actual facts, I will pronounce that the majority of malware out there is ONLY effective because the systems on which the malware is loaded are not configured correctly. Either the hardening guides are absent or inadequate, or the ongoing maintenance of the configurations was neglected.
    o
  3. Vulnerability Management (Req. 6.1) – If all you are relying on is patch releases from your OS vendors, then you deserve what you get. Vulnerability Management is everything from Patching, to Vulnerability Scanning, to Penetration Testing, to Change Control and Incident Response. Done well, vulnerability management is the only way you stand even half a chance of keeping up with the bad guys, but something I have personally never seen done well.
    o
  4. File Integrity Monitoring (Req 11.5) – Don’t buy Tripwire (I hate them too), but figure out a way to detect if a known-good file changes in some way. I have seen a client write an MD5 recursive hash on system32 and write the results to event logs for monitoring. It was free, and effective, but required significant expertise. All you’re trying to do here is make sure things stay the same, and it almost begs the question; Why have AV at all if you have FIM? This question becomes far more relevant the more of these points you master, but I will never negate the concept / cliché of defence-in-depth.
    o
  5. Logging & Monitoring (Req. 10.x) – In my opinion, nothing in your detective security portfolio is as important as this control, and can be used to create the most effective and ‘blanket’ compensating control for PCI there is. If you know what every system SHOULD be doing, anything NOT that is something to investigate. Daily review of log files is a farce, only real-time alerts triggered by base-line deviations makes sense, and should be the top of any organisation priorities to get right. Few do, and the majority of Managed / Cloud Security Services don’t do this either.
    o
  6. Incident Response (Req. 12.9) – Why bother being in business if you don’t intend staying in business? Incident Response can prevent an event from becoming a business crippling disaster, yet, like Vulnerability Management, is almost universally neglected. Do this one badly and I for one have no sympathy.

You should notice one unifying theme across all 6 of these controls; they have a significant process component, not technology. Most security is process, and yet PCI has driven more technology spend than all other compliance / regulatory standards in history combined (yes, that’s another fact-less statement, but I would be amazed if it wasn’t true). Anti-virus vendors, FIM vendors, logging vendors (and QSAs of course) have all made multi-millions from PCI, and not one of these vendors (including the QSAs) has ever made the effort to put their products into the proper context; A business focused solution that provides true benefit. Staying is business  IS an ROI!

All 6 factors will be addressed in their own Beyond The Standard posts, that should give some indication to their importance.

OK [deep breath], end of rant (and my longest blog of the series yet)! I’m not saying don’t use anti-virus if you believe it provides true benefit, and is not a massive capital / resource drain. But do NOT do it just because PCI says you should, do NOT rely on it, and focus your efforts on the above 6 factors as they are the things that actually meet the intent.

If you are only doing PCI minimums your QSA probably has no choice but to insist on AV (especially for Windows), your job is to give them an alternative.

Security Core Concepts

Security Core Concepts: Tying it All Together

With number 6, I completed my Security Core Concept series:

  1. Security Core Concept 1: Risk Assessment / Business Impact Analysis
    • Management buy-in
    • Examine your business processes 
    • Valuate and prioritise your data assets 
  2. Security Core Concept 2: Security Control Choice & Implementation
    • Perform gap analysis to determine control weaknesses
    • Mitigate control gaps based on priority
    • Think twice before throwing technology at a gap, perform robust vendor diligence if you do buy anything
  3. Security Core Concept 3: Security Management Systems
    • Make sure your controls are working
    • Begin control optimisation and measurement
    • Begin PDCA cycle 
  4. Security Core Concept 4: Governance & Change Control
    • Management buy-in
    • Interdepartmental co-operation and communication
    • Manage all business process and changes
  5. Security Core Concept 5: Incident Response (IR) & Disaster Recovery (DR)
    • Know what your baselines are
    • Standardise, centralise, and monitor
    • Test it, test it again, and keep testing it until everyone knows their part 
  6. Security Core Concept 6: Business Continuity Management (BCM) & Business As Usual (BAU)
    • Management buy-in (yes, that’s the THIRD time I’ve said that)
    • The goal is not security, it’s staying in business securely
    • BAU is where the true value of the core concepts is realised

…and have hopefully expressed in enough detail, the advantages of not only each of these steps, but of a security program ‘done right’.

What I have only alluded to, but will now examine in greater detail, is how the 6 Core Concepts make any regulation / standard / framework related to data security an afterthought.  Not that they aren’t useful, nor can they be ignored, but you’d already be ‘compliant’ with them.  The concepts themselves have been around for generations, and there are many treatises on each and every one.  There are even entire institutions dedicated to perfecting single concepts, but it’s only when you combine them all in a manner appropriate to YOUR business do they make sense.

I’m also talking about the fact that not one regulation the world-over goes deeper, and/or broader in their data security requirements, than you need to go for your business. They can’t, as the very names ‘standard’ and ‘framework’ automatically preclude, provide complete relevance to your organisation.  Nor can they possibly cover every nuance of every business type, sector, and culture.

What these regulations are trying to accomplish – but none state this outright – is a shift in culture away from function/profit only, to security enabled function/profit.  All 6 core concepts are basic fundamentals of security, yet are mostly ignored for reasons innumerable.  I hesitate to use the phrase business responsibility – I have enough issues with sounding like a lecturer – but that’s what it amounts to; you are responsible to protect the data in your possession.

But can you imagine going about your business, secure in the knowledge that no matter what security requirement or regulation gets thrown at you, you are already there!  Maybe not entirely, but the adjustments will be minimal, and will never require the level of effort that even PCI requires from you year after year.

In a nutshell;  If you do security properly, you will ALREADY be compliant with the security requirements of PCI / HIPAA / PoPI / SoX / SSAE-16 / GDPR / Swedish Personal Data Act / …and so on!

No more multiple annual audits / assessments (assuming you have some form of GRC tool), you will not only be compliant ALL the time, you can easily VALIDATE your compliance!

But none of this will be possible if your CEO doesn’t believe in it.  One again this phrase applies;

The CEO sets the tone for the entire company: its vision, its values, its direction, and its priorities.  If the organisation fails to achieve [enter business goal here], it’s the CEOs fault, and no-one else’s.

It does not matter what the goal, from PCI compliance, to great customer service, to an ethical salesforce, to a security culture that enables to business to grow responsibly, it’s the CEO who is responsible. And accountable.

I am pulling the following from where the sun doesn’t shine (no, not Scotland), but I would estimate that any time the CEO spends evangelising an appropriate security culture will be paid back 100-fold in terms of resource / capital / DR cost savings.

And it’s all so simple.  Not easy, but it is simple.

It may take years, even in smaller organisations, but the major costs are all front-loaded, and the long-term savings way in excess of the annual costs associated with constantly reacting.  Security is only effective if it’s mostly pro-active, and that’s exactly what the 6 Core Concepts are designed to do.

Don’t know where to start?

Ask.

[If you liked this article, please share! Want more like it, subscribe!]