Been Breached? The Worst is Yet to Come, Unless…

The information security sector is rife with negativity and pronouncements of doomsday, and while the title is no better, this blog is not meant to scare, but to provide an alternative view of the worst case scenario; a data breach and resulting forensics investigation. The fact remains that if your data is online, and someone has the necessary skill-set and wants it badly enough, they are going to get it. So the sooner you prepare yourself for the inevitable, the better you will be able to prevent a security event from becoming a business-crippling disaster.

By the time you make your environment as hack-proof as humanly possible, the chances are you have spent far more money than the data you’re trying to protect was worth, which in security equates to career suicide. Instead, you are supposed to base your security posture on the only thing that matters; a business need, then maintain your security program with an on-going cycle of test > fix > test again.

Unfortunately what happens in the event of a breach is that you are told what was broken and how to fix it from a technical perspective. This is analogous to putting a plaster / band-aid on a gaping wound. You’re not actually fixing anything. A forensics investigation, instead of being seen as the perfect opportunity to re-examine the underlying security program, is seen as an embarrassment to be swept under the carpet as soon as possible. Sadly, valuable lessons are lost, and the organisation in question remains clearly in the sights of the attackers.

For example, let’s say a breach was caused by an un-patched server. The first thing you do is fix the server and get it back online, but all you have done is fix the symptom, not the underlying cause;

  1. How did you not KNOW your system was vulnerable? – Do you not have vulnerability scanning and penetration testing as an intrinsic part of a vulnerability management program?
  2. How did you not know your system wasn’t patched? – Is not patch management and on-going review of the external threats landscape also part of your vulnerability management program?
  3. Did the breach automatically trigger a deep-dive examination of your configuration standards to ensure that your base image was adjusted accordingly?
  4. Did you fix EVERY ‘like’ system or just the ones that were part of the breach?
  5. Did your policy and procedure review exercise make ALL necessary adjustments in light of the breach to ensure that individual accountability and requisite security awareness training were adjusted?
  6. Were Incident Response, Disaster Recovery and Business Continuity Plans all updated to incorporate the lessons learned?

And perhaps the most important part of any security program; Is the CEO finally paying attention? Ultimately this was their fault for not instilling a culture of security and individual responsibility, so if THIS doesn’t change, nothing will.

If the answer is no to most of these, you didn’t just fail to close the barn door after the horse bolted, you left the door wide open AND forgot to get your horse back!

Most breaches are not the result of a highly skilled and concerted attack, but of attackers taking advantage of systemic neglect on the part of the target organisation, i.e. MOST organisations with an Internet presence! Therefore, organisations that can work towards security from the policies up, and the forensics report down, have a distinct advantage over those who do neither.

[Ed. Written in collaboration with Voodoo Technologies; Voodoo Technology, Ltd.]

PCI – Going Beyond the Standard: Part 24, Disaster Recovery (DR) & Business Continuity Management (BCM)

You may be wondering why I would put this after Governance seeing as that seems to bring everything together, and you may also be wondering why I did not include Disaster Recovery (DR) in the same post as Incident Response (IR), which everyone else always does.

They would be good questions, and my reasoning is relatively simple; You cannot HAVE Business Continuity Management (BCM) without Governance so that must be formalised first, DR represents the detailed processes summarised in the BCM, and IR is the feed INTO the DR/BCM, not the output from it.

To put it another way; the Business Continuity Plan (BCP) details what must be done, in what order, and how quickly to save the business, DR puts that plan into effect, and IR would have uncovered the inciting incident that brought both the BCP and DR plans into play in the first place.

Assuming that made any sense, the question is; What if I don’t HAVE a BCP?

I am surprised every time I ask a client for a BCP and don’t get one. Mostly because I’m not too bright, but partly because it makes absolutely no sense to me that ANY organisation in any industry sector, anywhere in the world would not make such a simple effort to help themselves STAY in business. While both DR and BCP represent what amounts to contingency planning and will hopefully never have to be invoked (assuming your IR is top notch of course), NOT having a plan is nothing short of irresponsible.

There are several well known standards related to Business Continuity, and for obvious reasons they encompass more than just IT systems:

  1. ISO 22301:2012: Societal security — Business continuity management systems – Requirements
  2. ISO 22313:2012: Societal security — Business continuity management systems – Guidance
  3. ISO/IEC 27031:2011: Information security – Security techniques — Guidelines for information and communication technology [ICT] readiness for business continuity
  4. NIST Special Publication 800-34 Rev. 1, Contingency Planning Guide for Federal Information Systems
  5. ANSI/ASIS SPC.1-2009 Organizational Resilience: Security, Preparedness, and Continuity Management Systems

Unfortunately the ISO stuff will set you back a few hundred quid, so start with the NIST / ANSI stuff to get yourself familiar enough with the concept to at least ask the right questions.

For DR, start with mapping out all of your business processes and asset dependencies. If you don’t know how things fit together, you’ll have no idea how to put them back in place. Clearly, if your asset management processes are not robust, you can’t even begin the mapping process, so get that done first.

Once you have mapped out your business processes, it’s a relatively simple task to organise all of your procedural documentation into how you reestablish all the moving parts. You have all that, right? So whether you have full redundancy in all things, hot swap, warm spares or a whole host of other DR clichés, how you get your systems back online boils down to a series of easily followed instructions.

From an IT perspective, all the BCP does is tell you in which order to bring those systems back online and in what timeframe. It should be needless to say – but it isn’t – that the plan and all of its moving parts must be tested on an annual basis, or even explicit instructions cannot get the response times to an optimal state.
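
The ordering problem lends itself to a small illustration. What follows is only a minimal sketch, assuming Python 3.9 or later (for the standard `graphlib` module) and a completely hypothetical dependency map; the system names are invented, and your own asset mapping would replace them.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each system lists what must be online before it.
# These names are illustrative only, not taken from any real environment.
DEPENDENCIES = {
    "network": set(),
    "dns": {"network"},
    "database": {"network", "dns"},
    "payment_app": {"database"},
    "web_frontend": {"payment_app", "dns"},
}


def restore_order(dependencies):
    """Return one valid order in which to bring systems back online."""
    # static_order() raises graphlib.CycleError if the map contains a loop,
    # which is itself a useful finding about the mapping exercise.
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for step, system in enumerate(restore_order(DEPENDENCIES), start=1):
        print(f"{step}. {system}")
```

The point is not the code, it is that the order can only be derived if the dependencies were mapped in the first place. The timeframes still have to come from the business.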

No aspect of security should be performed half-arsed, and DR and BCP processes are no exception. Even within the field of security BCP is a speciality, and making the plan simple and appropriate is a talent more than a skill. Expect to pay a lot for these services but rest assured it is money well spent.

PCI – Going Beyond the Standard: Part 20, Incident Response (IR)

First, you may be asking why this blog does not include Disaster Recovery (DR) and Business Continuity Management (BCM, which governs the entire IR / DR process). Because the PCI DSS section 12.10.x is almost entirely related to IR (with the exception of a VERY brief nod to DR / BCP, flagged in square brackets below), I will handle DR / BCP separately in the series (post 23 in fact).

“12.10.1 – Create the incident response plan to be implemented in the event of system breach. Ensure the plan addresses the following, at a minimum:

    • Roles, responsibilities, and communication and contact strategies in the event of a compromise including notification of the payment brands, at a minimum
    • Specific incident response procedures
    • Business recovery and continuity procedures [This is the only requirement in the DSS that goes beyond the protection of CHD.]
    • Data backup processes
    • Analysis of legal requirements for reporting compromises
    • Coverage and responses of all critical system components
    • Reference or inclusion of incident response procedures from the payment brands.”

With regard to Incident Response, I put it this way; “What’s the point of being in business, if you don’t intend staying in business?”, and; “Good incident response is what prevents a security event from becoming a business crippling disaster.”

It makes absolutely no sense to me that organisations who basically depend on IT for significant chunks of income (which is most of them), have very little idea how to stop bad things from happening in the first place, let alone fix things when they go wrong. Of course, no incident response is going to predict an earthquake at the datacenter, but the organisations I’ve seen don’t even perform log monitoring properly, let alone consider the impact of acts of nature.

The development of a good incident response plan starts with? Yep, a good policy. From there you agree on an appropriate Risk Assessment / Business Impact Analysis process, which in turn provides you with everything you need to not only determine if you have any control gaps (after a gap analysis), but – if you’ve done it properly – a good indication of what your incident response and disaster recovery plans should entail.

There is no appropriate IR without an understanding of the business goals. If you have a 4 hour Recovery Time Objective (RTO), your IR will be significantly more robust than one where you can take a week to be back online. Yes, I know that RTOs (and RPOs (Recovery Point Objectives) for that matter) are DR terms, but if your incident response cannot detect a business crippling event in good time, then neither of those DR goals is an option for you.
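
To put rough numbers on that, here is a minimal sketch of the arithmetic. It assumes the RTO clock starts when the incident starts (your BCP may define it differently), and the detection and recovery times are hypothetical figures picked purely for illustration.

```python
from datetime import timedelta


def recovery_budget(rto, time_to_detect):
    """Time left for recovery once detection has used its share of the RTO."""
    return rto - time_to_detect


def meets_rto(rto, time_to_detect, time_to_recover):
    """True if detection plus recovery fits inside the RTO."""
    return time_to_detect + time_to_recover <= rto


if __name__ == "__main__":
    rto = timedelta(hours=4)      # the 4 hour RTO from the text
    detect = timedelta(hours=3)   # hypothetical: incident noticed after 3 hours
    recover = timedelta(hours=2)  # hypothetical: DR procedures take 2 hours
    print("Left for recovery:", recovery_budget(rto, detect))
    print("RTO achievable:", meets_rto(rto, detect, recover))
```

With those assumed figures, slow detection has eaten three quarters of the RTO before DR has even started, so a 2 hour recovery procedure that looks fine on paper cannot meet the objective.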

When setting up your IR program, the most important word to keep in mind is ‘baseline’. Without a baseline, you don’t have much of a concept of what constitutes an incident in the first place. Only a baseline can give you both context and relevance.

From your baselined system configuration standards (DSS 2.x), to AV (DSS 5.x), to logging (DSS 10.x), to scanning (DSS 11.1.x, and 11.2.x), to FIM (DSS 11.5.x), you have many available inputs into your IR program, none of which will be of the slightest help if you don’t know what they SHOULD look like.

That’s all IR is; a process whereby an exception to the norm is investigated, and appropriate action taken.
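
As a rough illustration of ‘exception to the norm’, here is a minimal sketch that compares an observed system state against a baseline. The setting names and values are hypothetical and not drawn from the DSS or any hardening guide; a real baseline would come from your own configuration standards.

```python
# Hypothetical baseline: what a hardened system SHOULD look like.
BASELINE = {
    "ssh_root_login": "disabled",
    "listening_ports": (22, 443),
    "av_signatures_max_age_days": 1,
}


def deviations(baseline, observed):
    """Return {setting: (expected, actual)} for every setting that differs."""
    found = {}
    for key in sorted(set(baseline) | set(observed)):
        expected = baseline.get(key, "<not in baseline>")
        actual = observed.get(key, "<missing>")
        if expected != actual:
            found[key] = (expected, actual)
    return found


if __name__ == "__main__":
    # Hypothetical observation: someone has opened an extra port.
    observed = dict(BASELINE, listening_ports=(22, 443, 8080))
    for setting, (expected, actual) in deviations(BASELINE, observed).items():
        print(f"INVESTIGATE {setting}: expected {expected}, found {actual}")
```

Without the `BASELINE` dictionary there is nothing to compare against, which is the whole point.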

In each of my individual going-beyond-the-standard blogs related to the above DSS requirements, I have stressed the importance of baselining (well, except AV perhaps). The reason I did so was because they all lead up to this. I don’t care how well you have done ANY of the previous requirements, unless you can bring the outputs all together into a comprehensive process of taking action, all you have is a bunch of data to give to your forensics investigator.

You’ll notice though that I did not say a CENTRAL process, because while having a 24x7 Security Operations Centre to manage all of this would be ideal, it’s rarely practical, even if it involves an outsourced managed service provider (MSP). However, having the correct assignments and procedures to MANAGE the response is of utmost importance, and the details of this plan will vary considerably from company to company.

No, IR is not easy, but there is simply too much information and help out there for this difficulty to be any sort of excuse. And no, there is not much in this blog that actually provides guidance, but if this makes SENSE, then you have at least got enough to begin to ask the right questions.

PCI – Going Beyond the Standard: Part 16, Logging & Monitoring

From everything I have seen in my many years performing PCI assessments, logging is not only one of the least understood of the requirements, it is the most under-utilised, and the one that gives/gave my clients the most pain.

Logging is the most important detective security control you have, bar none, and done correctly, logging is the foundation of your incident response program. Notice I didn’t say ‘disaster recovery’ as well, because if your incident response was where it should be you should not HAVE to recover from a disaster.

The confusion stems mostly from a lack of understanding of logging mechanisms themselves, even for Windows (for which PCI was clearly written). For example, do you think that Windows logs to the PCI requirements out of the box? I did too, but have been assured that it does not. Do you know HOW to get it to log appropriately? No, me either.

I have been further assured that if you WERE to turn logging on to cover the 10.2.X requirements, the logging would be so verbose as to render the device that’s doing the logging useless. Is that really the INTENT of logging? Of course not.

Also, can syslog EVER record the events required in 10.2.x, or even the event content as required in 10.3.x? Once again, I have been told no, but I am no expert.

Yes, you SHOULD have people who DO know this stuff, but how many organisations out there can truly afford that kind of deep expertise in-house? Yes, you can outsource, but where is the guidance on EXACTLY how to configure operating systems to log to the PCI requirements? It probably exists, but in 10 years of doing PCI I have not found it, and I’ve even asked ‘experts’ in the field; Security Information and Event Management (SIEM) vendors. On that note, I have yet to see a SIEM vendor also be an expert in PCI, which to me is an absolute joke if that’s why they are selling it to their clients.

We have free hardening guides for Windows (CIS Security Benchmarks for example), but where is the guide that breaks down the Windows operating system into a mapping between the registry settings for logging and the PCI DSS? Or *nix flavours, or Cisco, or AS400, or Power Series? If you have them, please share?!

So let’s, for the sake of argument, assume that you cannot reasonably log to the letter of PCI, or more to the point, you do not WANT to for usability issues. Are you non-compliant? Let me answer that with another question; What’s more important, configuring your logging to record events for a forensics investigation, or configuring your logging to help prevent a breach in the first place?

If you chose the second one, you are correct, and if you also choose to maximise your logging mechanisms, you will not only be PCI compliant to its intent, you will also be doing security properly.

First, logging is not about crunching masses of data through a correlation engine, it’s about the RIGHT data put into a base-lined context. There is no such thing as log event correlation without a deep understanding of what the end systems SHOULD look like, AND what your normal business processes are from start to finish. In other words, tell any vendor trying to sell you compliance through their SIEM to put it where the sun don’t shine.

While we’re on the subject of SIEMs, how many of them do you think can accept Windows logs natively, rather than having to convert the logs to syslog via an agent (e.g. Snare)? Very few. What’s the point of buying a log mechanism that cannot even read WINDOWS events without butchering them DOWN to syslog?! You MUST ask the right questions before buying ANYTHING, especially a SIEM.

OK, so how DO you go above and beyond? Simple, in one way;

Do NOT perform your log reviews daily (10.6), because that’s just plain stupid, not to mention impossible to do adequately. Perform log reviews in real-time via some form of automation.

The automation you need is threefold;

  1. Events you should NEVER see: Each system admin, from OS, to network device to application SHOULD know which events should never be seen under normal operating conditions. Look for these ‘strings’ and alert immediately.
  2. Events you should not see in a certain quantity and velocity (i.e. thresholds): I don’t care if I see an admin fail to log in once, I do care if s/he fails 10 times in 2 seconds (for example).
  3. Quantity of events over the course of time (i.e. trending): You have to save logs for 1 year (DSS 10.7), so why not put them to good use by trending events over time? Even if it’s just quantity of event (as opposed to quantity of type of event per device), the information you get can be extremely useful.
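
The three kinds of automation above can be sketched very roughly in a few lines. This is a minimal sketch and not a SIEM: the ‘never’ strings are hypothetical examples, the log lines are assumed to be plain text, and the 10-in-2-seconds threshold is simply the example figure from point 2.

```python
from collections import Counter, deque
from datetime import datetime, timedelta

# Hypothetical strings that should never appear under normal operation.
NEVER_EVENTS = ("audit log cleared", "new local admin created")


def never_events(lines, patterns=NEVER_EVENTS):
    """1. Return every line containing a string that should never be seen."""
    return [line for line in lines if any(p in line.lower() for p in patterns)]


def threshold_breached(timestamps, limit=10, window=timedelta(seconds=2)):
    """2. True if `limit` or more events fall within any single `window`."""
    recent = deque()
    for ts in sorted(timestamps):
        recent.append(ts)
        # Drop events that have fallen out of the sliding window.
        while ts - recent[0] > window:
            recent.popleft()
        if len(recent) >= limit:
            return True
    return False


def daily_counts(timestamps):
    """3. Count events per calendar day, the raw material for trending."""
    return Counter(ts.date() for ts in timestamps)


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 9, 0, 0)
    failed_logins = [start + timedelta(milliseconds=100 * i) for i in range(10)]
    print(never_events(["user login ok", "Audit log cleared by admin"]))
    print(threshold_breached(failed_logins))
    print(daily_counts(failed_logins))
```

Which strings go in `NEVER_EVENTS`, and what counts as a sensible `limit` and `window`, are exactly the things only your own admins and your own baseline can tell you.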

Perform all three of these things, and you have not only covered the ridiculous ‘daily reviews’ automatically, you now have input into your incident response mechanism that gives you real security.

Choosing the right centralised logging mechanism for your business is one of the most important decisions you can make, and it cannot be done ONLY for PCI. You must buy a system that can cover you enterprise-wide, and unless you have significant in-house expertise, you must build into the RFP the requirement for consulting support, and potentially some form of on-going managed service. Nothing stays the same, so your future state / needs will also need to be taken into account.

Do NOT penny-pinch here, but don’t buy anything that’s not appropriate. Your risk assessment process should tell you exactly what you need, and if you’ve not done one, start there.

PCI – Going Beyond the Standard: Part 10, Anti-Virus

First, let me be clear; I hate anti-virus. I guess more accurately, I hate anti-virus companies who are still making squillions peddling their no-longer-relevant wares (in my opinion).

Blacklisting (i.e. signature based) end-point protection is meaningless and almost completely ineffective against zero-day attacks. It’s a game of constant catch-up that can (and will) never be won. Yet here we are, still buying anti-virus software because we don’t know better, and standards like the PCI DSS still call for it by name instead of dealing with the actual underlying issue.

What is the INTENT of anti-virus? According to the DSS, you should;

5.1 Deploy anti-virus software on all systems commonly affected by malicious software (particularly personal computers and servers).

…and;

5.1.1 Ensure that anti-virus programs are capable of detecting, removing, and protecting against all known types of malicious software.

Commonly affected? As defined by whom? Clearly they mean Windows but can’t just come out and say it. Yes, other OSs are becoming increasingly affected by viruses, but would you call them common? More to the point; if you had to install and maintain anti-virus on all of your *nix and Apple products would YOU classify them as ‘commonly affected’?

No, neither would I.

The intent of anti-virus is sound; do not let bad stuff run on your systems. However, if you were doing security properly, would this not be basically redundant? Even PCI includes the means by which anti-virus becomes [in my view] excessive;

  1. Security Awareness Training (Req. 12.6) – If users were properly educated, a huge chunk of malware outbreaks would not happen in the first place. If your organisation does not have a very robust program for ongoing security training, they have missed the cheapest, and most effective security control that has, and will, ever exist. Ignorance is a choice, never an excuse.
  2. Configuration Standards (Req. 2.x) – In my continuing theme of never backing up bold statements with actual facts, I will pronounce that the majority of malware out there is ONLY effective because the systems on which the malware is loaded are not configured correctly. Either the hardening guides are absent or inadequate, or the ongoing maintenance of the configurations was neglected.
  3. Vulnerability Management (Req. 6.1) – If all you are relying on is patch releases from your OS vendors, then you deserve what you get. Vulnerability Management is everything from Patching, to Vulnerability Scanning, to Penetration Testing, to Change Control and Incident Response. Done well, vulnerability management is the only way you stand even half a chance of keeping up with the bad guys, but something I have personally never seen done well.
  4. File Integrity Monitoring (Req 11.5) – Don’t buy Tripwire (I hate them too), but figure out a way to detect if a known-good file changes in some way. I have seen a client write an MD5 recursive hash on system32 and write the results to event logs for monitoring. It was free, and effective, but required significant expertise. All you’re trying to do here is make sure things stay the same, and it almost begs the question; Why have AV at all if you have FIM? This question becomes far more relevant the more of these points you master, but I will never negate the concept / cliché of defence-in-depth.
  5. Logging & Monitoring (Req. 10.x) – In my opinion, nothing in your detective security portfolio is as important as this control, and it can be used to create the most effective and ‘blanket’ compensating control for PCI there is. If you know what every system SHOULD be doing, anything NOT that is something to investigate. Daily review of log files is a farce; only real-time alerts triggered by base-line deviations make sense, and getting them right should be at the top of any organisation’s priorities. Few do, and the majority of Managed / Cloud Security Services don’t do this either.
  6. Incident Response (Req. 12.9) – Why bother being in business if you don’t intend staying in business? Incident Response can prevent an event from becoming a business crippling disaster, yet, like Vulnerability Management, is almost universally neglected. Do this one badly and I for one have no sympathy.
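
On point 4, the home-grown FIM idea is simple enough to sketch. This is not the client’s script, which I have not reproduced; it is a minimal sketch of the same idea under my own assumptions, it swaps in SHA-256 for MD5, and it returns the differences rather than writing them to the event log.

```python
import hashlib
from pathlib import Path


def hash_tree(root):
    """Return {relative_path: sha256_hex} for every file under root."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # read_bytes() loads the whole file; fine for a sketch, but large
            # files would be better hashed in chunks.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            digests[path.relative_to(root).as_posix()] = digest
    return digests


def changed_files(baseline, current):
    """Paths added, removed, or modified since the baseline was taken."""
    return sorted(
        path
        for path in set(baseline) | set(current)
        if baseline.get(path) != current.get(path)
    )
```

Take `hash_tree()` of a known-good system once, store the result somewhere an attacker cannot rewrite it, and feed anything `changed_files()` returns on later runs into your logging and alerting. Legitimate patches will show up too, which is why this only works alongside decent change control.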

You should notice one unifying theme across all 6 of these controls; they have a significant process component, not technology. Most security is process, and yet PCI has driven more technology spend than all other compliance / regulatory standards in history combined (yes, that’s another fact-less statement, but I would be amazed if it wasn’t true). Anti-virus vendors, FIM vendors, logging vendors (and QSAs of course) have all made multi-millions from PCI, and not one of these vendors (including the QSAs) has ever made the effort to put their products into the proper context; A business focused solution that provides true benefit. Staying in business IS an ROI!

All 6 factors will be addressed in their own Beyond The Standard posts, which should give some indication of their importance.

OK [deep breath], end of rant (and my longest blog of the series yet)! I’m not saying don’t use anti-virus if you believe it provides true benefit, and is not a massive capital / resource drain. But do NOT do it just because PCI says you should, do NOT rely on it, and focus your efforts on the above 6 factors as they are the things that actually meet the intent.

If you are only doing PCI minimums your QSA probably has no choice but to insist on AV (especially for Windows); your job is to give them an alternative.