[Dailydave] AI

Sven Krasser sven at crowdstrike.com
Wed Mar 30 15:12:45 EDT 2016


Hey Chris,

We’re on the same page, and I think this is a healthy discussion to have :) Both an understanding and an open mind are required to successfully use ML on complex and new problems.

Best,
-Sven

-- 
Sven Krasser, Ph.D.
Chief Scientist, CrowdStrike, Inc.
http://www.crowdstrike.com | http://tinyurl.com/cs-svenk

From: "Smoak, Christopher" <Christopher.Smoak at gtri.gatech.edu>
Date: Wednesday, March 30, 2016 at 10:58 AM
To: Sven Krasser <sven at crowdstrike.com>, dave aitel <dave at immunityinc.com>, "dailydave at lists.immunityinc.com" <dailydave at lists.immunityinc.com>
Subject: Re: [Dailydave] AI

Sven,

I definitely understand your point. The "when you have a hammer…" phenomenon is most certainly an issue in the machine learning field, especially with respect to your .fit() point below. As sure as I am that such an issue exists, I also think there's room for what I'd loosely call "non-traditional" applications of these types of techniques in order to achieve some goal. I just don't want to make the blanket statement that "if it isn't an image, <insert technique> won't work." I realize that's not necessarily your point, but I wanted to add some conversation fodder to what I consider to be a really interesting thread.

Agreed 100% on the "because ML" argument; I see it way too often. Frankly, it hurts all "legitimate" (used liberally here) uses of ML, in that everything gets wrapped up in the jargon/marketing lingo and people can't see beyond it. We seem to live in an industry fraught with those types of things. My point is simply that I don't want to punish the terminology so much that we devalue the real contributions that can be made to the field using ML, as an example. Employed carefully, there are definitely ways to use it for great justice. :)

Anyway, just wanted to get some more thoughts going on this topic, as I think it's worth a longer discussion, albeit a slight digression.

Regards,

Chris Smoak
Georgia Tech Research Institute

From: Sven Krasser <sven at crowdstrike.com>
Date: Wednesday, March 30, 2016 at 1:31 PM
To: Christopher Smoak <Christopher.Smoak at gtri.gatech.edu>, dave aitel <dave at immunityinc.com>, "dailydave at lists.immunityinc.com" <dailydave at lists.immunityinc.com>
Subject: Re: [Dailydave] AI

Hey Chris,

Carefully phrased, I am very skeptical that transforming your instances into images and then using CNNs will give you an out-of-the-box performance bump over other traditional techniques. To me this looks like a classic “When you have a hammer, every problem looks like a nail” approach. Can we develop representations of input data that will allow deep architectures to successfully learn the instance space? Yes, I’m sure we can, but that will require more work than downloading TF and running it over the data as Dave described in his email.

As far as technology in commercial products goes, my primary point is that a product must perform to a specific objective standard, regardless of the technologies used. Explaining why something performs is indeed important, but the answer to this cannot simply be “because Machine Learning” as we see presently (and which I assume prompted Dave to send his initial email). Anyone with rudimentary Python knowledge can go download sklearn right now and call .fit() on the Iris dataset. Congratulations, you just used Machine Learning. That doesn’t make for a compelling product, however.

Best,
-Sven

From: "Smoak, Christopher" <Christopher.Smoak at gtri.gatech.edu>
Date: Wednesday, March 30, 2016 at 10:03 AM
To: Sven Krasser <sven at crowdstrike.com>, dave aitel <dave at immunityinc.com>, "dailydave at lists.immunityinc.com" <dailydave at lists.immunityinc.com>
Subject: Re: [Dailydave] AI

Sven,

Your general point is well taken; however, I'd contend that while most problems in security don't boil down to simple image classification tasks, there are certainly valid ways of applying the spatial nature of CNNs to security problems. Namely, mapping data that is not traditionally visual in nature to an image representing that data (e.g. binary -> png) can yield, and in my experience has yielded, very promising results. Granted, it's debatable whether it's better to utilize a technique more suited to the original data set in lieu of transforming it into an image, but that's a conversation for another day. The bottom line is finding a model that consistently gives good results in the context of the question being answered.
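
As a rough illustration of the "binary -> png" mapping mentioned above, here is a minimal Python sketch using only the standard library. The function names, the fixed row width of 16 bytes, and the zero padding of the last row are assumptions made for illustration; nothing here describes the specific pipeline behind the results referred to in this message.

```python
import struct
import zlib
from typing import List


def bytes_to_rows(data: bytes, width: int = 16) -> List[bytes]:
    """Split raw bytes into fixed-width rows, zero-padding the last row."""
    if width <= 0:
        raise ValueError("width must be positive")
    padded = data + b"\x00" * (-len(data) % width)
    return [padded[i:i + width] for i in range(0, len(padded), width)]


def _chunk(tag: bytes, payload: bytes) -> bytes:
    """Build one PNG chunk: length, tag, payload, CRC over tag + payload."""
    crc = zlib.crc32(tag + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + tag + payload + struct.pack(">I", crc)


def rows_to_png(rows: List[bytes]) -> bytes:
    """Encode equal-length rows as an 8-bit grayscale PNG (one byte per pixel)."""
    if not rows:
        raise ValueError("need at least one row")
    height, width = len(rows), len(rows[0])
    # IHDR: width, height, bit depth 8, color type 0 (grayscale),
    # compression 0, filter 0, interlace 0.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Each scanline is prefixed with filter type 0 (no filtering).
    raw = b"".join(b"\x00" + row for row in rows)
    return (b"\x89PNG\r\n\x1a\n"
            + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", zlib.compress(raw))
            + _chunk(b"IEND", b""))
```

Calling rows_to_png(bytes_to_rows(open(path, "rb").read())) on some file path would give PNG bytes that could be written to disk and fed to an image model. The row width is the main design choice in this sketch, since it decides which bytes end up spatially adjacent.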

On the point of just caring about the results and not about the technology/process involved, I'm not sure I agree. When we get into extremely complex technologies that give us binary, "good/bad" answers to not-so-simple questions, I think it's imperative to understand the basis upon which the technology arrived at the answer. It may not be feasible with commercial (read: intellectual property) solutions but is nonetheless important. An example can be found in dynamic malware analysis systems, where understanding the perspective from which data is collected helps frame the efficacy of the result with respect to potential detection by malware.

Just some food for thought.

Chris Smoak
Georgia Tech Research Institute

From: <dailydave-bounces at lists.immunityinc.com> on behalf of Sven Krasser <sven at crowdstrike.com>
Date: Wednesday, March 30, 2016 at 10:49 AM
To: dave aitel <dave at immunityinc.com>, "dailydave at lists.immunityinc.com" <dailydave at lists.immunityinc.com>
Subject: Re: [Dailydave] AI

Hey Dave,

You got some things right and some things wrong. In security, most problems are not related to image classification and do not benefit to the same degree from the recent advances in Convolutional Neural Networks. Also, TensorFlow is not the first freely available Deep Learning library, nor is it the first freely available Machine Learning classification library, by a long shot. Take a look at, e.g., some of the presentations that the MLSec Project made available. ML has been in security products for decades (I worked on shipping products with it back in the day at CipherTrust, when people did not care which technology stopped the threats as long as they were stopped). What’s new is that Machine Learning now also appears in marketing materials. So the question to ask is whether you will still have a product once the ML hype wears off.

Best,
-Sven

From: <dailydave-bounces at lists.immunityinc.com> on behalf of dave aitel <dave at immunityinc.com>
Date: Wednesday, March 30, 2016 at 5:56 AM
To: "dailydave at lists.immunityinc.com" <dailydave at lists.immunityinc.com>
Subject: [Dailydave] AI

There are only a few real computers in the world, and I think we are just beginning to feel their influence. For example, here is a sample project I am working on now that image classification is a solved problem.

Like many of you on this list, I dabble in Brazilian jiu-jitsu. In fact, in a week we are doing an open mat at INFILTRATE, both for newcomers who've always wanted to try to choke me out and for people in the community who are already very good at choking people.

Like many sports, BJJ is typically scored according to a ruleset based on the different positions you end up in. Being on top is usually better. Being able to get on top after you are on the bottom is worth 2 points. Being able to completely mount someone is worth three points. Getting on their back is four points. Generally a tournament will hire judges and they will award points based on their understanding of the rules and their personal feelings towards the contestants and whatever other factors are floating in their heads.

What I'm working on is collecting a set of images of BJJ, then annotating them as to what positions the different people are in. This essentially maps every image into a vector space, and after training a neural network using modern techniques you can have a program that looks at an image and then outputs "Blue is in top mount".

Part of the key here is that you don't have to tell it that the picture is BJJ. Every picture that program sees is two people doing BJJ. All it has to do is output what positions they are in.

And in the end, by assigning point values to transitions between positions, you will have an automatic BJJ judge. I've applied for a TensorFlow API key from Google since, although this is not a hard problem by ML standards, I want to do it the right way and get good, scalable results on video later.
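
A minimal sketch of the scoring step described above, assuming a classifier already emits one position label per video frame for a single competitor. The label names ("bottom", "top", "mount", "back") and the rule that a run of identical labels counts as one position are hypothetical; the point values are the ones given earlier in this message, not an official ruleset.

```python
from itertools import groupby
from typing import Iterable

# Point values as stated earlier in this message; label names are hypothetical.
SWEEP_POINTS = 2   # getting on top after being on the bottom
MOUNT_POINTS = 3   # completely mounting the opponent
BACK_POINTS = 4    # getting on the opponent's back


def transition_points(prev: str, cur: str) -> int:
    """Points awarded for moving from position `prev` to position `cur`."""
    if cur == "back":
        return BACK_POINTS
    if cur == "mount":
        return MOUNT_POINTS
    if prev == "bottom" and cur == "top":
        return SWEEP_POINTS
    return 0


def score(frame_labels: Iterable[str]) -> int:
    """Total points for one competitor from a sequence of per-frame labels."""
    # Collapse runs of identical labels so holding a position is not re-scored.
    positions = [label for label, _ in groupby(frame_labels)]
    return sum(transition_points(p, c) for p, c in zip(positions, positions[1:]))
```

For example, score(["bottom", "bottom", "top", "mount", "mount", "back"]) awards 2 + 3 + 4 = 9 points. A real judge would also need to smooth noisy per-frame predictions before counting transitions, which this sketch does not attempt.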

And of course, the same thing is true for the process information El Jefe will give you. All those "behavioral analysis machine learning intrusion detection" startups are about to be crushed by simple open source projects that use Google and MS and Amazon's exported Machine Learning APIs. 

-dave


