<div dir="ltr">I'm a huge fan of that approach to DB monitoring (passive tap -> reassemble -> protocol parse -> parse SQL syntax -> analyze).<div><br><div><div>Applications make a limited set of queries, and when you can model those queries as an AST, you can remove the data (aka the literals), hash the remaining control logic, and build a Bloom filter for each application to do near-wire-speed lookups. If something triggers, either an app has been updated (which is simple to verify), the control logic has changed via some form of injection, or (third) your baseline wasn't accurate, which is a wholly separate topic. No fancy machine learning necessary. Users are a little harder to model, but we start by assessing how much data access they actually exercise and alert on large increases. A stealthy attacker can stay under the radar by not doing anything aggressive, but most attackers will fall right into it.</div><div><br></div><div>I haven't (yet) spent the time reversing M$ wire protocols or doing this on closed-source grammars, but it is possible to do query logging and feed the logs into the AST generator some other way. Agents are tricky, but you can make them lightweight by just scraping off the socket and still doing your protocol analysis, rather than taking a more invasive debugger-based approach.</div><div><br></div><div>I started down this route because most people don't really understand their data models or the apps and users that use them. Once you get into that space there are all kinds of benefits: auditing app privileges against the columns actually read/written so you can start enforcing the principle of least privilege, spotting apps that don't use prepared statements just from the wire protocol, analyzing source IPs and command styles for database admins, spotting shared accounts and weak passwords, identifying high-privileged accounts, etc. My point is that there is a large amount of data that can enable better decisions. 
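<div><br></div><div>To make the strip-the-literals, hash, and Bloom-filter lookup idea from above concrete, here is a rough sketch. It is only an illustration under loud assumptions: the regexes are a crude stand-in for a real SQL parser/AST, and the names <code>fingerprint</code>, <code>BloomFilter</code>, <code>build_baseline</code> and <code>is_known</code> are made up for this example, not taken from any product.</div>

```python
import hashlib
import re

# ASSUMPTION: regexes stand in for a real SQL grammar/AST walk. They only
# handle single-quoted strings and plain numbers; a real parser is needed
# for comments, quoted identifiers, dialect quirks, etc.
_STRING = re.compile(r"'(?:[^']|'')*'")
_NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")
_SPACE = re.compile(r"\s+")


def fingerprint(sql: str) -> bytes:
    """Replace literals with '?', normalize case and whitespace, hash the rest."""
    skeleton = _STRING.sub("?", sql)
    skeleton = _NUMBER.sub("?", skeleton)
    skeleton = _SPACE.sub(" ", skeleton).strip().lower()
    return hashlib.sha256(skeleton.encode("utf-8")).digest()


class BloomFilter:
    """Small fixed-size Bloom filter over byte strings (hypothetical helper)."""

    def __init__(self, size_bits: int = 1 << 16, num_hashes: int = 4):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: bytes):
        # Double hashing: derive k bit positions from two slices of the digest.
        h1 = int.from_bytes(item[:8], "big")
        h2 = int.from_bytes(item[8:16], "big") | 1
        return [(h1 + i * h2) % self.size for i in range(self.k)]

    def add(self, item: bytes) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def build_baseline(queries):
    """Build one filter per application from its known-good queries."""
    baseline = BloomFilter()
    for q in queries:
        baseline.add(fingerprint(q))
    return baseline


def is_known(baseline: BloomFilter, sql: str) -> bool:
    """True if the query's control logic was (probably) seen in the baseline."""
    return fingerprint(sql) in baseline


if __name__ == "__main__":
    app = build_baseline([
        "SELECT name FROM users WHERE id = 42",
        "INSERT INTO logs (msg) VALUES ('hello')",
    ])
    print(is_known(app, "select name from users where id = 7"))
    print(is_known(app, "SELECT name FROM users WHERE id = 7 OR 1=1"))
```

<div>The same query with a different literal maps to the same fingerprint, while an injected <code>OR 1=1</code> changes the skeleton and should miss the filter. A Bloom filter can return false positives but never false negatives, so a miss is a reliable "never seen this control logic" signal, and the filter has to be sized for the number of distinct queries per app.</div><div><br></div>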
I don't agree that it's as complex as the NSA signals problems, but then again I don't see new database features being developed every 2 weeks that the devs are just dying to take advantage of. So keeping up with ensuring your grammars have semantic predicates for every major version of your fancy dialect isn't really a problem; you just gotta staff a guy that understands compilers for when shit breaks.</div><div><br></div><div>When you start thinking about parsing databases like that, it reminds me of the techniques applied by Ram Shankar and Sacha Faust in their excellent Data Driven Offense <a href="https://vimeo.com/133292422">talk</a>, only instead of an LDAP database we're talking about some other type, and applying similar techniques for different ends.</div><div><br></div><div>Z</div><div><br></div><div><br></div></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Oct 23, 2015 at 9:18 AM, Dave Aitel <span dir="ltr"><<a href="mailto:dave@immunityinc.com" target="_blank">dave@immunityinc.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
I wanted to talk about patents in our industry, but I can't because
everyone is all like "Software patents are evil" _until they get
one_ and it gives me the sads. <br>
<br>
So instead I'm going to talk about this company I saw yesterday,
which is basically this simple diagram:<br>
<br>
Web App<br>
[span port of your mid-tier] -----> [Parser for TDS] --->
[Machine learning to find SQLi]<br>
<br>
The good thing about being on the network stack is that you can get
access to clusters. The bad thing is that every minor change of the
TDS stack or SQL syntax or anything of that nature means your system
starts failing. And you have to auto-detect all possible variations
in the network traffic because you're modeling what happens in an
immensely complex piece of software on one side that you don't have
access to. <br>
<br>
To avoid all possible ambiguity: This is an impossible problem to
get right, even if you limit it to "parse one version of TDS exactly
the same as SQL Server 2010 at a known patch level". <br>
<br>
The other option is to install debugger-like instrumenters on every
DB server. In fact, a script to do this came out with an early
version of Immunity Debugger, which integrated with SPIKE Proxy so
you could scan for SQL Injection and use the feedback loop to guide
your scanner around filters and false positives. The downside is of
course having to install things on every DB server. In theory MS
would release an API that allows a logical "span port" that gave you
every SQL request, and I bet there IS one somewhere in the auditing
section.<br>
<br>
Aside from the horribleness of every possible solution in that area,
which probably STILL works better than a few other things, I wanted
to point out a KEY sentence you might have missed in the <a href="https://assets.documentcloud.org/documents/2426450/read-the-nsc-draft-options-paper-on-strategic.pdf" target="_blank">Crypto-War
guidelines</a> the administration pointed out. It was this:
Without <a href="https://lists.immunityinc.com/pipermail/dailydave/2015-September/001016.html" target="_blank">voluntary
</a>and enthusiastic help from Apple and Google, really bad things
we won't specify will happen, even if we force it all to be in
cleartext. That "parse all variations of TDS" problem that we just
looked at is the same as the SIGINT problem faced by the
FBI/NSA/etc. Even WITH THE KEYS, the problem is completely
intractable if Google and Apple and Microsoft want to make it so. <br>
<br>
I can hear Google's lawyers now: "Oh, we delivered you our latest
protocol spec sheet, every two weeks as promised. Of course, our
spec changes every two weeks right after we deliver it, and you are
always out of date, and even if you WERE in date, only our software
knows which version anyone is at at any given time, and parsing it
incorrectly means you are wildly wrong, and if you can't provide a
provably correct parser, no court will accept your analysis, etc.
Hey, did we mention that every block is not encrypted, but of course
it is XORed with this value which we calculate with the most crazy
slow algorithm we could find, one million times. That's just this
week though. Next week we are reversing every block, but we aren't
going to update the version number on the wire." <br>
<br>
Just food for thought! ;)<span class="HOEnZb"><font color="#888888"><br>
-dave<br>
<br>
<br>
<br>
</font></span></div>
<br>_______________________________________________<br>
Dailydave mailing list<br>
<a href="mailto:Dailydave@lists.immunityinc.com">Dailydave@lists.immunityinc.com</a><br>
<a href="https://lists.immunityinc.com/mailman/listinfo/dailydave" rel="noreferrer" target="_blank">https://lists.immunityinc.com/mailman/listinfo/dailydave</a><br>
<br></blockquote></div><br></div>