<div dir="ltr">I&#39;m a huge fan of that approach to DB monitoring (passive tap -&gt; reassemble -&gt; protocol parse -&gt; parse SQL syntax -&gt; analyze).<div><br><div><div>Applications make a limited set of queries, and when you can model those queries in terms of an AST, you can remove the data (aka the literals), simply hash the control logic, and then build a Bloom filter for each application to do near-wire-speed lookups.  If something triggers, either an app has been updated (which is simple to verify), or the control logic has changed via some form of injection (or, third, your baseline wasn&#39;t accurate, which is a wholly separate topic).  No fancy machine learning necessary.  Users are a little harder to model, but we start by assessing how much data access they actually exercise and alert on large increases.  A stealthy attacker can stay under the radar by not doing anything aggressive, but most attackers will fall right into it.</div><div><br></div><div>I haven&#39;t (yet) spent the time reversing M$ wire protocols or doing this on closed-source grammars, but it is possible to do query logging and feed the logged queries into the AST generator some other way.  Agents are tricky, but you can make them lightweight by just scraping off the socket and still doing your protocol analysis, rather than taking a more invasive debugger-based approach.</div><div><br></div><div>I started down this route because most people don&#39;t really understand their data models, or the apps and users that use them.  Once you get into that space there are all kinds of benefits: auditing app privileges versus columns actually read/written so you can start enforcing the principle of least privilege, spotting apps that don&#39;t use prepared statements just from the wire protocol, analyzing source IP and command styles for database admins, spotting shared accounts and weak passwords, identifying highly privileged accounts, etc.  
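</div><div><br></div><div>To make the &quot;remove the literals, hash the control logic, look it up in a Bloom filter&quot; idea from earlier in this message concrete, here is a minimal Python sketch. It is an illustration under loud assumptions, not what I actually run: the regex-based <code>skeleton</code> function is a hypothetical stand-in for a real SQL parser and AST, and the <code>BloomFilter</code> class and its parameters are made up for the example (one byte per slot, for readability rather than memory efficiency).<br>
<pre>
```python
import hashlib
import re

# Hypothetical stand-in for a real SQL parser/AST: a regex pass that replaces
# literals with "?" so that only the query's control logic remains.
_STRING = re.compile(r"'(?:[^']|'')*'")      # single-quoted strings, '' escapes
_NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")   # standalone integers and decimals
_SPACE = re.compile(r"\s+")


def skeleton(sql):
    """Return the query with literals removed and whitespace/case normalized."""
    sql = _STRING.sub("?", sql)
    sql = _NUMBER.sub("?", sql)
    return _SPACE.sub(" ", sql).strip().lower()


class BloomFilter:
    """Tiny Bloom filter: no false negatives, a small false-positive rate."""

    def __init__(self, slots=65536, hashes=4):
        self.slots = slots
        self.hashes = hashes
        # One byte per slot keeps the sketch readable; a real one packs bits.
        self.table = bytearray(slots)

    def _positions(self, item):
        # Derive several slot indexes by salting SHA-256 with the hash number.
        for i in range(self.hashes):
            digest = hashlib.sha256(bytes([i]) + item.encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.slots

    def add(self, item):
        for pos in self._positions(item):
            self.table[pos] = 1

    def __contains__(self, item):
        return all(self.table[pos] for pos in self._positions(item))


# Baseline one application's known query shape, then check new traffic.
baseline = BloomFilter()
baseline.add(skeleton("SELECT name FROM users WHERE id = 42"))

print(skeleton("SELECT name FROM users WHERE id = 7") in baseline)
print(skeleton("SELECT name FROM users WHERE id = 7 OR 1=1") in baseline)
```
</pre>
The first lookup has the same skeleton as the baselined query, so it is found. The second carries an extra OR clause, so its skeleton differs and, barring a Bloom filter false positive, it is not found, which is the trigger you alert on. A real deployment would hash a canonical form of the parsed AST rather than a regex-scrubbed string.</div><div><br></div><div>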
My point is that there is a large amount of data here that can enable better decisions.  I don&#39;t agree that it&#39;s as complex as the NSA signals problem, but then again I don&#39;t see new database features being developed every 2 weeks that the devs are just dying to take advantage of, so keeping your grammars stocked with semantic predicates for every major version of your fancy dialect isn&#39;t really a problem; you just gotta staff a guy who understands compilers for when shit breaks.</div><div><br></div><div>When you start thinking about parsing databases like that, it reminds me of the techniques applied by Ram Shankar and Sacha Faust in their excellent Data Driven Offense <a href="https://vimeo.com/133292422">talk</a>, only instead of an LDAP database we&#39;re talking about some other type and applying similar techniques for different ends.</div><div><br></div><div>Z</div><div><br></div><div><br></div></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Oct 23, 2015 at 9:18 AM, Dave Aitel <span dir="ltr">&lt;<a href="mailto:dave@immunityinc.com" target="_blank">dave@immunityinc.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

  <div bgcolor="#FFFFFF" text="#000000">

    I wanted to talk about patents in our industry, but I can&#39;t because

    everyone is all like &quot;Software patents are evil&quot; _until they get

    one_ and it gives me the sads. <br>

    <br>

    So instead I&#39;m going to talk about this company I saw yesterday,

    which is basically this simple diagram:<br>

    <br>

    Web App<br>

    [span port of your mid-tier] -----&gt; [Parser for TDS] ---&gt;

    [Machine learning to find SQLi]<br>

    <br>

    The good things about being on the network stack is that you can get

    access to clusters. The bad thing is that every minor change of the

    TDS stack or SQL syntax or anything of that nature means your system

    starts failing. And you have to auto-detect all possible variation

    in the network traffic because you&#39;re modeling what happens in an

    immensely complex piece of software on one side that you don&#39;t have

    access to. <br>

    <br>

    To avoid all possible ambiguity: This is an impossible problem to

    get right, even if you limit it to &quot;parse one version of TDS exactly

    the same as SQL Server 2010 at a known patch level&quot;. <br>

    <br>

    The other option is to install debugger-like instrumenters on every

    DB server. In fact, a script to do this came out with an early

    version of Immunity Debugger, which integrated with SPIKE Proxy so

    you could scan for SQL Injection and use the feedback loop to guide

    your scanner around filters and false positives. The downside is of

    course having to install things on every DB server. In theory MS

    would release an API that allows a logical &quot;span port&quot; that gave you

    every SQL request, and I bet there IS one somewhere in the auditing

    section.<br>

    <br>

    Aside from the horribleness of every possible solution in that area,

    which probably STILL works better than a few other things, I wanted

    to point out a KEY sentence you might have missed in the <a href="https://assets.documentcloud.org/documents/2426450/read-the-nsc-draft-options-paper-on-strategic.pdf" target="_blank">Crypto-War

      guidelines</a> the administration pointed out. It was this:

    Without <a href="https://lists.immunityinc.com/pipermail/dailydave/2015-September/001016.html" target="_blank">voluntary

    </a>and enthusiastic help from Apple and Google, really bad things

    we won&#39;t specify will happen, even if we force it all to be in

    cleartext. That &quot;parse all variations of TDS&quot; problem that we just

    looked at is the same as the SIGINT problem faced by the

    FBI/NSA/etc. Even WITH THE KEYS, the problem is completely

    intractable if Google and Apple and Microsoft want to make it so. <br>

    <br>

    I can hear Google&#39;s lawyers now: &quot;Oh, we delivered you our latest

    protocol spec sheet, every two weeks as promised. Of course, our

    spec changes every two weeks right after we deliver it, and you are

    always out of date, and even if you WERE in date, only our software

    knows which version anyone is at at any given time, and parsing it

    incorrectly means you are wildly wrong, and if you can&#39;t provide a

    provably correct parser, no court will accept your analysis, etc.

    Hey, did we mention that every block is not encrypted, but of course

    it is XORed with this value which we calculate with the most crazy

    slow algorithm we could find, one million times. That&#39;s just this

    week though. Next week we are reversing every block, but we aren&#39;t

    going to update the version number on the wire.&quot; <br>

    <br>

    Just food for thought! ;)<span class="HOEnZb"><font color="#888888"><br>

    -dave<br>

    <br>

    <br>

     <br>

  </font></span></div>

<br>_______________________________________________<br>

Dailydave mailing list<br>

<a href="mailto:Dailydave@lists.immunityinc.com">Dailydave@lists.immunityinc.com</a><br>

<a href="https://lists.immunityinc.com/mailman/listinfo/dailydave" rel="noreferrer" target="_blank">https://lists.immunityinc.com/mailman/listinfo/dailydave</a><br>

<br></blockquote></div><br></div>