[Dailydave] Asymmetry

Josh Saxe josh.saxe at invincea.com
Mon Apr 11 14:04:37 EDT 2016


I figured I'd chime in as someone who builds security machine learning
models as part of his day job.  A few hopefully not-too-incongruous
observations:

1) Most security problems are not machine learning problems.  Like
encryption, dual-factor authentication, taint analysis, or hand-crafted
IOCs, machine learning is just one of many security tools.  But somehow
people outside of machine learning seem to think a) machine learning can be
applied everywhere and replace every other approach or b) machine learning
can be applied nowhere, always underperforms, and is marketing snake oil.
The people who believe a) are bound to be disappointed and the people who
believe b) are bound to be blindsided when they wake up and realize machine
learning has become an important ingredient in the network defense
landscape.

2) For a working security data scientist, much of the ingenuity in
developing a successful machine learning product is in picking problems
that *are* good machine learning problems and not going down the rabbit
hole of problems that aren't.  Unsupervised clustering of malware to help
identify new malware families or link threat actors -- that's a good
problem, and systems that do this are currently deployed to good effect,
but can probably be improved upon.  Detecting and classifying malware is
another good one that's already been productized but merits continued
research.  Setting firewall policy or predicting which users on a network
will commit treason or sell your trade secrets is not a good machine
learning problem and probably won't be in the foreseeable future, even
though I'll bet there are products on the market that claim to do these
things.
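
To make the clustering case a bit more concrete, here's roughly what a
first cut looks like. Everything below is illustrative -- the feature
strings, the hashing width, and the DBSCAN parameters are invented for the
example rather than taken from any deployed system:

    # Toy malware family clustering over hashed static features.
    # Feature strings and parameters are placeholders, not tuned values.
    from sklearn.cluster import DBSCAN
    from sklearn.feature_extraction import FeatureHasher

    # Pretend each sample was already reduced to a bag of string features
    # (imported API names, PE section names, and so on).
    samples = [
        {"api:CreateRemoteThread": 1, "section:.upx0": 1},
        {"api:CreateRemoteThread": 1, "section:.upx1": 1},
        {"api:HttpSendRequestA": 1, "section:.text": 1},
    ]

    X = FeatureHasher(n_features=2**16).transform(samples)
    labels = DBSCAN(eps=0.6, min_samples=2, metric="cosine").fit_predict(X)
    print(labels)  # samples sharing a label are a candidate family; -1 is noise

The appeal of a density-based method here is that you don't have to guess
the number of families up front, and samples that match nothing fall out
as noise instead of being shoehorned into the nearest cluster.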

3) For a problem to be a good security machine learning problem you need a
continuously replenished source of good data, because security models go
out of date as adversaries evolve if the models don't evolve along with
them.  If you don't have good data at scale (and this includes *ground
truth* with respect to this data), machine learning is the wrong approach.
For example, because we don't have thousands of examples of employees going
rogue and selling trade secrets (at least I don't), a machine learning
approach to detecting such employees doesn't make sense.
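
A cheap way to check whether your data pipeline is keeping up is a
time-split evaluation: train only on samples first seen before some cutoff
date and score only what arrived afterwards. A minimal sketch -- X, y, and
first_seen are stand-ins the caller has to supply, not references to any
real dataset:

    # Does a model trained on yesterday's samples still detect today's?
    # X is a feature matrix, y the labels, first_seen a numpy array of
    # first-seen timestamps -- all hypothetical inputs for this sketch.
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import roc_auc_score

    def time_split_eval(X, y, first_seen, cutoff):
        """Train on samples seen before `cutoff`, score the rest."""
        train = first_seen < cutoff
        test = ~train
        clf = RandomForestClassifier(n_estimators=100)
        clf.fit(X[train], y[train])
        return roc_auc_score(y[test], clf.predict_proba(X[test])[:, 1])

If the score on the post-cutoff window sags well below the in-window
score, the adversaries moved and the training data didn't -- which is
exactly the decay this point is about.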

4) To echo what Sven said, custom modeling for a given security
application, which involves mostly either feature engineering or custom
crafting of deep learning models that automate a portion of the feature
engineering process, is the main work of a security data scientist.  In my
experience, wholesale adoption of approaches from other fields never
works.  For one thing, the statistics of the problem are totally different:
in the detection use case, we tend only to care about the performance of a
model in the extremely low false positive rate region, which makes the
modeling goals different from those of many non-security applications.  For
another, security is just different from computer vision, text mining,
etc., and in my experience requires custom solutions to perform well.
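
To put a number on that: the metric that tends to matter for detection is
the true positive rate at a pinned, very low false positive rate, not
accuracy or whole-curve AUC. A minimal sketch of that metric (the
1-in-1,000 false alarm budget is an arbitrary choice for the example):

    # Detection rate at a fixed, very low false positive budget.
    import numpy as np
    from sklearn.metrics import roc_curve

    def tpr_at_fpr(y_true, scores, max_fpr=1e-3):
        """True positive rate when at most max_fpr false alarms are tolerated."""
        fpr, tpr, _ = roc_curve(y_true, scores)
        return float(np.interp(max_fpr, fpr, tpr))  # fpr is sorted ascending

Two models with identical AUC can differ wildly on this number, which is
one reason results imported wholesale from other fields tend to mislead in
the detection setting.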

Best,
Josh


On Fri, Apr 1, 2016 at 9:59 PM, <Robin.Lowe at forces.gc.ca> wrote:

> Good day all,
>
>
>
> Just a couple things I thought of while reading the earlier discussion on
> AI and this follow-up email. Just some, as Chris so eloquently put it
> earlier, conversation fodder.
>
>
>
> I think one thing we have to keep in mind is that the underlying framework
> behind machine learning is still a machine. One issue I can see with this
> is: who is accountable if it fails? If we’re talking about national
> security, how much risk would someone be willing to take on in order to
> prove that their new machine learning intrusion detection system works
> 100% of the time? The number of hours required just to amass the data
> needed to seed the system would be substantial on its own.
>
>
>
> There’s also the possibility of false positives being generated by
> erroneous data. Sure, a listening meterpreter shell on port 4444 is pretty
> damn obvious, but what about, say, Cobalt Strike’s Beacon system? Will the
> people developing the IDS need to spend thousands of dollars throwing all
> of these expensive network auditing programs at it in order to generate the
> data necessary to make it accurate even 90% of the time? And given the
> volume of benign traffic on a real network, even 90% accuracy would bury
> analysts in false alarms.
>
>
>
> Also, the budget just for personnel would be pretty high. You’d need
> people in R&D, maintenance, actually checking flagged intrusion attempts,
> etc.
>
>
>
> One last thing before I start in on the possible positives is that the
> machine itself might be prone to exploitation. Similar to how getting into
> domain controllers and hypervisors is pretty much an endgame state, what if
> you broke into the IDS itself and started messing with its signatures?
> Seems like a few things to think about.
>
>
>
> However, one cost-reducing factor is that it’s always looking, and doing
> so faster than a person can. Sure, there are some blue teams that are
> basically machines at this point, but I can definitely see a time when
> machines take over that facet of security.
>
>
>
> You don’t have to pay it a salary; just keep the machine happy with
> electricity and known behaviours and it’ll chug along.
>
>
>
> Kind of starting to sound like an antivirus program but one that looks at
> networks instead of files.
>
>
>
> New to this sort of thing so sorry if I mentioned something that would be
> considered common knowledge or just plain nonsense.
>
>
>
> Cheers,
>
>
>
> Leading Seaman/Matelot de 1re classe Robin Lowe
>
>
>
> Naval Communicator, HMCS EDMONTON
>
> Department of National Defence / Government of Canada
>
> Robin.Lowe at forces.gc.ca / Tel: 250-363-7940
>
>
>
> Communicateur Naval, NCSM EDMONTON
>
> Ministère de la Défense nationale / Gouvernement du Canada
>
> Robin.Lowe at forces.gc.ca / Tel: 250-363-7940
>
>
> “The quieter you are, the more you are able to hear.”
>
>
>
> From: dailydave-bounces at lists.immunityinc.com [mailto:
> dailydave-bounces at lists.immunityinc.com] On Behalf Of Dave Aitel
> Sent: April-01-16 11:36 AM
> To: dailydave at lists.immunityinc.com
> Subject: [Dailydave] Asymmetry
>
>
>
> One possible long-lasting cause of the "asymmetry" everyone talks about is
> that US defenders get quite high salaries compared to Chinese attackers (I
> assume; not being a Chinese attacker, it's hard to know for sure).
>
>
>
> Just in pure "dollars spent vs dollars spent" it seems like it would be
> three times cheaper to be a Chinese attacker at that rate?
>
>
>
> But I think it's still a question whether or not machine learning
> techniques make surveillance cheaper than intrusion as a rule. What if it
> does? What would that change about our national strategy? (And if it
> DOESN'T then why bother?)
>
>
>
> -dave
>
>
>
> _______________________________________________
> Dailydave mailing list
> Dailydave at lists.immunityinc.com
> https://lists.immunityinc.com/mailman/listinfo/dailydave
>
>