<div dir="ltr">Previously Unreleased Work: <a href="https://docs.google.com/presentation/d/1tMlJvnUv_Qbh5mx2RYbyuTHTHr9c9ShIKBzz_JDGn_s/edit?usp=sharing">https://docs.google.com/presentation/d/1tMlJvnUv_Qbh5mx2RYbyuTHTHr9c9ShIKBzz_JDGn_s/edit?usp=sharing</a><div>Paper on the 3M Tweets from Clemson: <a href="https://www.cyxtera.com/blog/data-carving-the-internet-research-agency-tweets">https://www.cyxtera.com/blog/data-carving-the-internet-research-agency-tweets</a></div><div><br></div><div>So what you see a lot in some papers is this sort of thing (this one is from the original Clemson paper):</div><div><img src="cid:1651f44d4b0cb971f161" alt="image.png" class="" style="max-width: 100%; opacity: 1;">I always get flashbacks to that XKCD <a href="https://xkcd.com/552/">Correlation vs Causation comic</a> when I see this type of graphic. The point of these is to say "Hey, there is increased activity from the Internet Research Agency (IREA) before the FSB/GRU sends their data to Wikileaks." And on one hand: Maybe. The flow of information is fun to track. But in addition to metadata like count(number_of_tweets) you also have the CONTENT of the Tweets available! If you're ignoring the content, there's an endless supply of real-world events you can attach with lines to a histogram and associate with whichever curve you like. Using NLP techniques to analyze the tweets at various points helps us make better conjectures about the flow of information through an adversary's IO systems.</div><div><br></div><div>I call this kind of thing "data carving" because at some level you are spelunking about through a massive dataset, like a radiologist taking slices of imagery from a human body and trying to build systemic knowledge by staring at them long enough. It's a logic puzzle attached to a visualization problem attached to natural language parsing and machine learning science, and if you're doing it right, linguistic, cultural and domain-specific knowledge as well.
</div><div><br></div><div>My blogpost comes across very much as a Brainspace ad, but Brainspace happens to be a great tool for data carving and basically nobody in our community has ever heard of it, the way most penetration testers have never heard of <a href="http://www.fpx.de/fp/Software/UUDeview/">uudeview</a>. I would have killed for something like Brainspace back when I did more analysis work.</div><div><br></div><div>When you do data carving right, you find things that SURPRISE you. We found lots of examples of attempts to do things remotely, like setting up protests and even joining AN ONLINE ARMY they called the US Freedom Army. Check out this response to it on the <a href="https://twitter.com/MiasmRe/status/1027568331656765443">survivalist website</a>.<br></div><div><br></div><div>But if you read our previous work (linked above), you'll see we've also found bot accounts that are virtually indistinguishable from real people until you go try to verify them on LinkedIn. We've also found massive client-side exploitation networks. And it's fun to read sites like <a href="https://off-guardian.org/">Off Guardian</a> and see how they all fit into the picture. So much work goes into this propaganda edifice. Even if it is mostly creating poisonous confections, it can still have beauty.</div><div><br></div><div>Oh, and here's another dark secret: I'm a pretty active advertiser on Twitter, for various reasons. Twitter charges advertisers per "interaction". A lot of these fake accounts click a lot of ads. At some level, they can bribe Twitter to ignore their manipulation of the conversation and community. If we want to change the effect of Information Operations, we can't be hunting botnets and manipulators; we have to change the economic incentives. Engaging the policy community on that should be...fun.</div><div><br></div><div>-dave</div></div>