[Dailydave] Encrypted Malware Traffic Detection == hilarious?

Dominique Brezinski dominique.brezinski at gmail.com
Wed Jun 21 12:05:32 EDT 2017


Let me tell a little story about statistical analysis of network traffic. I
may or may not have been associated with someone that built a very
large-scale, statistics-based detection mechanism using un-sampled network
flow and HTTP proxy logs. 3200 cores chugged through the trailing X weeks
of traffic, for hundreds of thousands of hosts, building usage profiles and
then measuring the distance of the current day's activity for each host from
the baseline profile.

As a digression, in this unsupervised learning space all hope depends on
quality feature selection. Once quality features are selected/engineered,
the most basic of distance measures is sufficient to detect anomalies. The
detected anomalies are just that -- anomalies. With regard to threat
detection, the false positive and false negative rates are still likely too
high to operationalize. You can ML like a boss and use Symbolic Aggregate
Approximation (SAX) to represent your logs as images, then use a
Convolutional Neural Network (CNN) to do the feature extraction for you,
and feed the results through a Long Short Term Memory (LSTM) Recurrent
Neural Network (RNN) to detect the anomalies -- which is approximately what
Niara does based on their Spark Summit presentation. Or you can ML like a
security engineer and use domain knowledge to identify discriminating
features and use some simple Euclidean distance measures to detect
anomalies. I have done both with the same approximate results. That was a
statistics joke.
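
To make the "simple Euclidean distance" approach concrete, here is a minimal sketch of scoring one host's current day against its trailing baseline. The feature names (bytes out, distinct destinations, proxy requests), the sample numbers, and the z-score scaling are all assumptions for illustration; none of them describe the system in the story.

```python
import math
from statistics import mean, pstdev

def build_baseline(history):
    """history: list of per-day feature vectors (lists of numbers) for one host."""
    columns = list(zip(*history))
    means = [mean(c) for c in columns]
    # Fall back to 1.0 when a feature never varies, to avoid dividing by zero.
    stdevs = [pstdev(c) or 1.0 for c in columns]
    return means, stdevs

def distance_from_baseline(today, baseline):
    """Euclidean distance between today's vector and the baseline mean,
    with each feature scaled by its baseline standard deviation."""
    means, stdevs = baseline
    return math.sqrt(
        sum(((x - m) / s) ** 2 for x, m, s in zip(today, means, stdevs))
    )

# Hypothetical features per day: [bytes_out_mb, distinct_destinations, proxy_requests]
history = [[100, 10, 50], [110, 12, 55], [90, 8, 45]]
baseline = build_baseline(history)

typical_day = distance_from_baseline([100, 10, 50], baseline)
odd_day = distance_from_baseline([500, 40, 50], baseline)
```

A typical day scores at or near zero and the odd day scores far higher. Where to put the alerting threshold is exactly the false positive/false negative problem described above, and the sketch does nothing to solve it.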

The result of all this statistical analysis is a set of findings about hosts
that deviate from normal by some measure on one or more features. So what?
Well that is exactly what the team responsible for triaging and
operationalizing alerts said. This is where the real work begins. Now if
the host communicated with novel domains among the population, for example,
the domains would be provided as evidence. The domain information could be
enriched with threat intel and results from services like OpenDNS. The
monitoring team still says, "yeah ok, it talked to some sketchy shit. What
are we really supposed to do about that? I mean really do, so we are not
scaling a very expensive whack-a-mole team?" Right.
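
As a rough sketch of the "novel domains among the population" evidence: collect every domain any host contacted during the baseline window, then report, per host, the domains seen today that nobody had contacted before. The host names, domains, and data layout here are hypothetical.

```python
def novel_domains(baseline_by_host, today_by_host):
    """Return {host: [domains]} for domains no host contacted in the baseline."""
    known = set()
    for domains in baseline_by_host.values():
        known.update(domains)

    findings = {}
    for host, domains in today_by_host.items():
        new = sorted(set(domains) - known)
        if new:
            findings[host] = new
    return findings

# Hypothetical data: domains per host over the baseline window and for today.
baseline_by_host = {
    "host-a": {"example.com", "example.org"},
    "host-b": {"example.com"},
}
today_by_host = {
    "host-a": {"example.com"},
    "host-b": {"example.org", "sketchy.example"},
}

findings = novel_domains(baseline_by_host, today_by_host)
# findings == {"host-b": ["sketchy.example"]}
```

Note that example.org is not reported for host-b, because host-a already talked to it during the baseline. The resulting list is what would then be enriched with threat intel, and it still leaves the "so what?" question open.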

Now we go pull all the process execution and process-to-network events from
the hosts. Now when a network anomaly occurs, you essentially build the
activity graph that resulted in the anomalous network traffic. This looks
actionable. It is.
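
A minimal sketch of that join, assuming hypothetical event records: process execution events carry pid, ppid, and image, and process-to-network events carry pid and dest. Real telemetry would also need timestamps and host identifiers to deal with PID reuse, which this ignores.

```python
def activity_chains(process_events, network_events, destination):
    """For each connection to `destination`, return the process ancestry
    (oldest ancestor first) that led to the connection."""
    by_pid = {event["pid"]: event for event in process_events}
    chains = []
    for conn in network_events:
        if conn["dest"] != destination:
            continue
        chain = []
        seen = set()  # guards against loops in malformed parent links
        pid = conn["pid"]
        while pid in by_pid and pid not in seen:
            seen.add(pid)
            chain.append(by_pid[pid]["image"])
            pid = by_pid[pid]["ppid"]
        chains.append(list(reversed(chain)))
    return chains

# Hypothetical events from a single host.
process_events = [
    {"pid": 1, "ppid": 0, "image": "explorer.exe"},
    {"pid": 2, "ppid": 1, "image": "winword.exe"},
    {"pid": 3, "ppid": 2, "image": "powershell.exe"},
]
network_events = [
    {"pid": 3, "dest": "sketchy.example"},
    {"pid": 1, "dest": "example.com"},
]

chains = activity_chains(process_events, network_events, "sketchy.example")
# chains == [["explorer.exe", "winword.exe", "powershell.exe"]]
```

An analyst can act on "Word spawned PowerShell, which called out to a never-before-seen domain" in a way they cannot act on a distance score.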

The thing is, once you have that on-host activity, you really don't need
the network data anymore, just as Dave said some would argue. You get to the same
result earlier in the activity chain with actionable results, rich in
context that is easily assessed by analysts and incident responders. Even
better, you don't need to use statistics. There are better models using
this data that are quite good for detection and hunting.

Some of us like belts and suspenders when we have to depend on imperfect
techniques to mitigate risk, so network-level instrumentation presents data
from a plane with a different attack surface that correlates with host data.
That is a nice feature if you can take advantage of it. Network data is
also OS/device independent. Building some anomaly detection on network data
provides broad coverage at a low engineering cost; however, the compute and
storage costs are usually quite high. There are a lot of trade-offs.
Honestly, most people get lost and never get clarity about what and how
they are trying to detect and whether the data and techniques align with
their desired results. They take an opportunistic stab at what data they
have and fall down the rabbit hole.

Dom

On Wed, Jun 21, 2017 at 7:25 AM, dave aitel <dave at immunityinc.com> wrote:

> Let's talk about the giant pile of wrong that is this reporting on
> Cisco's new marketing campaign
> <http://www.cnbc.com/2017/06/20/cisco-introduces-encrypted-traffic-analytics-to-detect-malwre.html>
> around detecting encrypted malware traffic. "This is a seminal moment in
> networking" is the quote from their CEO that CNBC decided to run. Let's
> revisit the basics of this "new" technology: do statistical analysis on
> encrypted data to find malware traffic.
>
> People have literally decoded conversations
> <https://www.schneier.com/blog/archives/2008/06/eavesdropping_o_2.html>
> from encrypted data using that same basic technique. Not even recently -
> that work is from 2008 and was not surprising even then.
>
> "The software, which will be offered as a subscription service, is
> currently in field trials with 75 customers, and according to Robbins, is
> 99 percent effective."
>
> 99% effective with the kind of traffic a normal network sees means you are
> FLOODED AND OVERWHELMED WITH FALSE POSITIVES. Although they don't specify
> what that number even means. Is it false positives? False negatives? Both?
> Let's just say this: 99.99% is useless when doing a network-based IDS. All
> that might get you is an indicator you can use to remotely load a more
> sophisticated remote tool onto an endpoint for further detailed analysis.
> You essentially need BOTH if you have this level of network-based IDS, and
> the endpoint people will probably say you don't need the network sniffer
> anymore, because scaling good analysis at that level at anything near
> realtime is nearly impossible (cf. Alex Stamos's talk
> <https://www.youtube.com/watch?v=2OTRU--HtLM>) to the point where they
> still try to sell you stuff that has 1% false positive rates. :)
>
> I'm going to bug our big customers to see if any of them are in this
> 75-customer field trial and what they think in real life. And I'm going to be honest
> and say that if you are thinking of investing in this sort of thing, but
> you haven't tested it against Cobalt Strike
> <https://www.cobaltstrike.com/> and INNUENDO
> <https://www.immunityinc.com/products/innuendo/>, then you are knowingly
> buying snake oil. A good percentage of our consulting business right now is
> literally just that because these anomaly detection products are so
> expensive and so hard to test.
>
> Anyways, maybe I am wrong! If you are one of the privileged 75 and you
> love this and it is amazing, let me/us know!
>
> -dave