[Dailydave] Encrypted Malware Traffic Detection == hilarious?

Dave Aitel dave.aitel at gmail.com
Wed Jun 21 13:45:32 EDT 2017


To be fair, the advantage of the network position is it avoids interference
with your host-protection programs (aka, implants). And evading on the host
is possible too. But both are probably necessary at some level.

On Wed, Jun 21, 2017 at 1:41 PM Dominique Brezinski <
dominique.brezinski at gmail.com> wrote:

> Let me tell a little story about statistical analysis of network traffic.
> I may or may not have been associated with someone that built a very
> large-scale, statistics-based detection mechanism using un-sampled network
> flow and HTTP proxy logs. 3200 cores chugged through the trailing X weeks
> of traffic, for hundreds of thousands of hosts, building usage profiles and
> then measured the distance of the current day's activity for each host from
> the baseline profile.
>
> As a digression, in this unsupervised learning space all hope depends on
> quality feature selection. Once quality features are selected/engineered,
> the most basic of distance measures is sufficient to detect anomalies. The
> detected anomalies are just that -- anomalies. With regard to threat
> detection, the false positive and false negative rates are still likely too
> high to operationalize. You can ML like a boss and use Symbolic Aggregate
> Approximation (SAX) to represent your logs as images, then use a
> Convolutional Neural Network (CNN) to do the feature extraction for you,
> and feed the results through a Long Short Term Memory (LSTM) Recurrent
> Neural Network (RNN) to detect the anomalies -- which is approximately what
> Niara does based on their Spark Summit presentation. Or you can ML like a
> security engineer and use domain knowledge to identify discriminating
> features and use some simple Euclidean distance measures to detect
> anomalies. I have done both with the same approximate results. That was a
> statistics joke.
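
A minimal sketch of the "ML like a security engineer" approach described above: build a per-host baseline from trailing days, then measure the Euclidean distance of the current day from it. The feature names (`novel_domains`, `bytes_out_mb`), the sample values, and the choice to standardize each feature by its standard deviation are assumptions for illustration, not details from the system in the story.

```python
import math

def baseline_profile(history):
    """Per-feature (mean, std) over a host's trailing days of activity."""
    n = len(history)
    profile = {}
    for feature in history[0]:
        values = [day[feature] for day in history]
        mean = sum(values) / n
        var = sum((v - mean) ** 2 for v in values) / n
        # Fall back to 1.0 when a feature never varied, to avoid dividing by zero.
        profile[feature] = (mean, math.sqrt(var) or 1.0)
    return profile

def distance_from_baseline(profile, today):
    """Euclidean distance of today's activity in standardized feature space."""
    return math.sqrt(sum(((today[f] - mean) / std) ** 2
                         for f, (mean, std) in profile.items()))

# Hypothetical host: three trailing days, then a typical day and an odd day.
history = [
    {"novel_domains": 2, "bytes_out_mb": 100},
    {"novel_domains": 4, "bytes_out_mb": 120},
    {"novel_domains": 3, "bytes_out_mb": 110},
]
profile = baseline_profile(history)
typical = distance_from_baseline(profile, {"novel_domains": 3, "bytes_out_mb": 110})
odd = distance_from_baseline(profile, {"novel_domains": 40, "bytes_out_mb": 900})
```

A large distance only says the host is unusual on these features. As the paragraph above notes, that is an anomaly, not a detection.
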
>
> The result of all this statistical analysis is a set of findings about
> hosts that deviate from normal by some measure on one or more features. So
> what? Well that is exactly what the team responsible for triaging and
> operationalizing alerts said. This is where the real work begins. Now if
> the host communicated with novel domains among the population, for example,
> the domains would be provided as evidence. The domain information could be
> enriched with threat intel and results from services like OpenDNS. The
> monitoring team still says, "yeah ok, it talked to some sketchy shit. What
> are we really supposed to do about that? I mean really do, so we are not
> scaling a very expensive whack-a-mole team?" Right.
>
> Now we go pull all the process execution and process-to-network events
> from the hosts. Now when a network anomaly occurs, you essentially build
> the activity graph that resulted in the anomalous network traffic. This
> looks actionable. It is.
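
A rough sketch of that step, assuming host telemetry arrives as simple records: process events with `pid`, `ppid` and `image`, and network events with `pid` and `dest`. These field names and the sample data are hypothetical. The function walks parent links from the process that made the anomalous connection back toward the root, which gives a single chain through the activity graph rather than the full graph.

```python
def activity_chain(process_events, network_event):
    """Return the process ancestry that led to a network connection."""
    by_pid = {p["pid"]: p for p in process_events}
    chain = []
    seen = set()
    pid = network_event["pid"]
    # Walk up the parent links; stop at an unknown parent or a cycle.
    while pid in by_pid and pid not in seen:
        seen.add(pid)
        proc = by_pid[pid]
        chain.append(proc["image"])
        pid = proc["ppid"]
    chain.reverse()
    return chain + [network_event["dest"]]

# Hypothetical events pulled from one host.
process_events = [
    {"pid": 1, "ppid": 0, "image": "explorer.exe"},
    {"pid": 2, "ppid": 1, "image": "winword.exe"},
    {"pid": 3, "ppid": 2, "image": "powershell.exe"},
]
network_event = {"pid": 3, "dest": "sketchy.example"}
chain = activity_chain(process_events, network_event)
```

Real telemetry would also need timestamps to cope with PID reuse, which this sketch ignores.
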
>
> The thing is, once you have that on-host activity, as Dave said some might
> say, you really don't need the network data anymore. You get to the same
> result earlier in the activity chain with actionable results, rich in
> context that is easily assessed by analysts and incident responders. Even
> better, you don't need to use statistics. There are better models using
> this data that are quite good for detection and hunting.
>
> Some of us like belts and suspenders when we have to depend on imperfect
> techniques to mitigate risk, so network-level instrumentation presents data
> from a plane with different attack surface that correlates with host data.
> That is a nice feature if you can take advantage of it. Network data is
> also OS/device independent. Building some anomaly detection on network data
> provides broad coverage at a low engineering cost; however, the compute and
> storage costs are usually quite high. There are a lot of trade-offs.
> Honestly, most people get lost and never get clarity about what and how
> they are trying to detect and whether the data and techniques align with
> their desired results. They take an opportunistic stab at what data they
> have and fall down the rabbit hole.
>
> Dom
>
> On Wed, Jun 21, 2017 at 7:25 AM, dave aitel <dave at immunityinc.com> wrote:
>
>> Let's talk about the giant pile of wrong that is this reporting on
>> Cisco's new marketing campaign
>> <http://www.cnbc.com/2017/06/20/cisco-introduces-encrypted-traffic-analytics-to-detect-malwre.html>
>> around detecting encrypted malware traffic. "This is a seminal moment in
>> networking" is the quote from their CEO that CNBC decided to run. Let's
>> revisit the basics of this "new" technology: do statistical analysis on
>> encrypted data to find malware traffic.
>>
>> People have literally decoded conversations
>> <https://www.schneier.com/blog/archives/2008/06/eavesdropping_o_2.html>
>> from encrypted data using that same basic technique. Not even recently -
>> that work is from 2008 and was not surprising even then.
>>
>> "The software, which will be offered as a subscription service, is
>> currently in field trials with 75 customers, and according to Robbins, is
>> 99 percent effective."
>>
>> 99% effective with the kind of traffic a normal network sees means you
>> are FLOODED AND OVERWHELMED WITH FALSE POSITIVES. Although they don't
>> specify what that number even means. Is it false positives? False
>> negatives? Both? Let's just say this: 99.99% is useless when doing a
>> network-based IDS. All that might get you is an indicator you can use to
>> remotely load a more sophisticated remote tool onto an endpoint for further
>> detailed analysis. You essentially need BOTH if you have this level of
>> network-based IDS, and the endpoint people will probably say you don't need
>> the network sniffer anymore, because scaling good analysis at that level at
>> anything near realtime is nearly impossible (cf. Alex Stamos's talk
>> <https://www.youtube.com/watch?v=2OTRU--HtLM>) to the point where they
>> still try to sell you stuff that has 1% false positive rates. :)
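
The base-rate arithmetic behind that complaint can be sketched as follows. Every number here is an assumption for illustration (flows per day, the fraction that is malicious, and reading "99 percent effective" as a 99% detection rate with a 1% false positive rate); none of them come from Cisco or the article.

```python
def alert_breakdown(flows, malicious_fraction, detection_rate, false_positive_rate):
    """Split a day's alerts into true and false, and compute alert precision."""
    malicious = flows * malicious_fraction
    benign = flows - malicious
    true_alerts = malicious * detection_rate
    false_alerts = benign * false_positive_rate
    precision = true_alerts / (true_alerts + false_alerts)
    return true_alerts, false_alerts, precision

# Hypothetical network: 10 million flows a day, 1 in 100,000 malicious.
true_alerts, false_alerts, precision = alert_breakdown(
    flows=10_000_000,
    malicious_fraction=1e-5,
    detection_rate=0.99,
    false_positive_rate=0.01,
)

# Same network with a 0.01% false positive rate (the "99.99%" case).
_, false_alerts_9999, precision_9999 = alert_breakdown(
    flows=10_000_000,
    malicious_fraction=1e-5,
    detection_rate=0.99,
    false_positive_rate=0.0001,
)
```

Under these assumptions, the 1% case produces on the order of a hundred thousand false alerts a day against roughly a hundred true ones. Even at 0.01%, false alerts still outnumber true alerts by about ten to one.
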
>>
>> I'm going to bug our big customers to see if any of them are in this
>> 75-customer field trial and what they think in real life. And I'm going to be honest
>> and say that if you are thinking of investing in this sort of thing, but
>> you haven't tested it against Cobalt Strike
>> <https://www.cobaltstrike.com/> and INNUENDO
>> <https://www.immunityinc.com/products/innuendo/>, then you are knowingly
>> buying snake oil. A good percentage of our consulting business right now is
>> literally just that because these anomaly detection products are so
>> expensive and so hard to test.
>>
>> Anyways, maybe I am wrong! If you are one of the privileged 75 and you
>> love this and it is amazing, let me/us know!
>>
>> -dave
>>
>>
>>
>>
>> _______________________________________________
>> Dailydave mailing list
>> Dailydave at lists.immunityinc.com
>> https://lists.immunityinc.com/mailman/listinfo/dailydave
>>
>>