Identity Research Dataset Boost

I’ve got early drafts in for a new tool that will collect a much larger identity research dataset.

It connects to a server, scans through everything, and collects the user profiles before disconnecting.  This should yield roughly 60,000 additional user profiles.

Of course, I can’t run it.  I can only develop it and wait for someone to use it.

 

Sneak Peek at New Orch Features

Data Feeds Need Control and Reporting Signal Channels in SOA

I’ll expand on this later after I’ve slept, but this is the high-level design for the next update to tenta (besides some obvious gotchas pertaining to field of vision obfuscation).

This will let me control and report on tenta clients so I know what feeds are going where, and even manage feed client state.  It’ll also provide some great dashboarding capability.
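As a rough illustration only (not the actual tenta wire format; every field name here is a placeholder), the split between a control channel and a reporting channel might be shaped something like this:

<?php
// Hypothetical sketch: placeholder field names, not the real tenta protocol.
// It only illustrates the split between a control channel (orchestrator ->
// client) and a reporting channel (client -> dashboards).

// A control message sent to a specific feed client.
$control = json_encode([
    'type'      => 'control',
    'client_id' => 'feed-client-01',   // which client the command targets
    'command'   => 'pause',            // e.g. pause | resume | status
    'issued_at' => time(),
]);

// A report message emitted by the client, suitable for dashboarding.
$report = json_encode([
    'type'        => 'report',
    'client_id'   => 'feed-client-01',
    'state'       => 'paused',         // current feed client state
    'feed'        => 'example-network/#example-channel',
    'reported_at' => time(),
]);

echo $control . PHP_EOL . $report . PHP_EOL;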

Update 1:

During the buildout of a new component called “Leptin”, I was tired, made the perfect typo, and had to drop about 8,000 in-flight messages while I rewired the changes in place.  That’s why you use a pre-prod environment, and that’s what I get for trying to cut costs with cowboy maneuvers.  More details to follow.

New Standalone: Distributed Endpoint Identity Generation Engine

Introducing DEIGE

I’ve already built most of the pieces of this and have been using them for testing.  Automated identity creation on IRC networks is super easy, even in highly restrictive environments.

They’re not going to change the IRC protocol, so the constraints of the IRC protocol are givens.

NickServ varies from network to network, depending on which services bots a network runs and how they’re configured, so some components will need to be network-dependent.

Given input:

  • email
  • host (meta, registered vs used)
  • user
  • ident
  • password

This should be able to operate as a single command that connects and does everything.

There are some problems to solve there:

N, R and E are separate hosts.

In addition to that, R and E need to be random and disparate between iterations for a system like this to really work.

The two problems introduced there are:

  • orchestration
  • endpoint creation

My two major pain points in all things.

Endpoint creation for R is relatively easy.  Endpoint creation for E is slightly more complicated, since you need to be able to open ports on a host, which requires root access.  For dynamic endpoint creation you’d almost need to generate the OS image and spawn it dynamically.

I have an idea that I want to test.

Field of Visibility Change

 

Updated Field Of Visibility

Joins and Parts are Removed from API Returns

The data is still present in the backend so that the identity research pages still work; it just isn’t displayed in the log viewer.  Almost nothing is lost by this, and plenty is gained.
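The filtering itself is conceptually tiny.  A minimal sketch, where the event names and row shape are assumptions rather than the actual API schema:

<?php
// Minimal sketch: drop join/part rows from what the API returns, while the
// stored rows stay untouched in the backend. Event names are assumptions.

function filter_log_events(array $rows): array {
    $hidden = ['JOIN', 'PART'];
    return array_values(array_filter($rows, function ($row) use ($hidden) {
        return !in_array($row['event'], $hidden, true);
    }));
}

// Example: three stored rows, but only the PRIVMSG reaches the log viewer.
$rows = [
    ['event' => 'JOIN',    'nick' => 'alice', 'text' => ''],
    ['event' => 'PRIVMSG', 'nick' => 'alice', 'text' => 'hello'],
    ['event' => 'PART',    'nick' => 'alice', 'text' => ''],
];

echo json_encode(filter_log_events($rows)) . PHP_EOL;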

IRCTHULU Logs

Metadata

  • The time the logs resume (correlates to joins in local log)
  • The time the logs stop after a kline (correlates to kline in local log)
  • [any entries present in local logs not present in ircthulu logs — mitigated by random capture delay at client level]

Their Local Logs

Data

  • joins (completely mitigated)
    • user
    • host
    • ident
    • realname
  • registration
    • email (if I reduce this to the last one I’ll be able to confirm they’re tracking users’ emails)
    • host
    • user
    • ident

Metadata

  • age of registration on join
  • profile of joined channels
  • vps provider profile

This Blog

This blog is a great source of information for predictive analytics now, as it is intended to be.

New Year Trouble

Well, I’ve got good news and bad news.

The operation conducted over pretty much all of today to break the feedback loop for Freenode and OFTC staff revealed a small but critical vulnerability in the data shape produced by tenta.

Now that it’s mostly over, or at least mitigated, I can reveal the details.

Problem

Technically it’s not a bug, as the issue lies in the “negative spaces” of the data that’s created.  When the tenta client joins, it currently omits its own user from the logs.  This is actually bad, as it can be used to root out the runners.  I’ll explain more below.

Certainty

I’ve been able to confirm that this is the method the staff were using to identify the “bait bots”.  I’d originally thought they were processing server-side information, and I’m sure they did in some cases, but thorough A:B and isolation tests verified that they are also cross-referencing their local logs with the presenta logs.  I found this by making minor adjustments to their field of vision, then repeatedly waiting for a bait to hook in a controlled manner and comparing page views to klined bots in a predetermined way, after assessing which data points were visible to them.  They were processing the joins listed in the presenta logs, checking for missing user data there, and comparing against their local logs.

Impact

This has to be fixed before we can use any more runner data.  When I first suspected it, I went ahead and deleted random rows from the database to obfuscate the data already there, so we don’t have to lose the whole database, but I won’t be turning the feeds back on until the next update to tenta.  Until then, all pooled data is unusable without compromising the runners.

Otherwise, A Relative Success

In other, better news, the staff used approximately 6,086 IP addresses in total during the operation to view the logs.  I think we’ve just about got their loop compromised.

Here is a list of those IP addresses in case you’d like to do something similar if you host a rogue clone of IRCTHULU PRESENTA on your PHP-Apache server.  Dropping this in an include should pretty much ghost out the whole Tor network, most VPNs known for being abused, and almost all of the relevant staff’s various proxies and owned IP addresses:

http://paste.silogroup.org/axohacugej.apache
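The paste is Apache-formatted, but if you’d rather do the check at the PHP layer, a minimal sketch looks roughly like this (the blocklist filename is just a placeholder; one address per line):

<?php
// Rough sketch of a drop-in include that rejects requests whose source IP
// appears in a flat blocklist file (one address per line). The filename
// below is a placeholder.

$blocklist_file = __DIR__ . '/blocked_ips.txt';

// Read the list, skipping blank lines and comments.
$blocked = array_filter(
    array_map('trim', @file($blocklist_file) ?: []),
    function ($line) { return $line !== '' && $line[0] !== '#'; }
);

if (in_array($_SERVER['REMOTE_ADDR'] ?? '', $blocked, true)) {
    http_response_code(403);
    exit('Forbidden');
}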

The process for adding them to the ban list was automated about 10 minutes in, but I needed to disable the banning for a good long stretch or they’d have caught on to what was really going on.  One of them was really smart and added some well-crafted characters to try to slide through a grep; I didn’t see what they were doing until about an hour in.  Whoever that was knew exactly what was up.

Some of them will still be able to access it, but it’s pretty straightforward now.  This will buy plenty of time, since I can’t use the runners until the tenta update.  A new version of Nerve will accompany it, adding the ability to clear out pooled messages on restart.

I’m pretty excited — this was a total blast.  This whole project’s been like that.

Recap

  • This operation did indeed confirm that the OFTC and FNODE networks are actively targeting the runners.
  • The FNODE and OFTC feedback loop is mostly broken, so they won’t be able to do so for much longer.
  • They did my bug testing and risk analysis for me today, which identified the vulnerability they’d use to find the runners.
  • Unfortunately it was significant enough that I can’t turn them back on without compromising their identities.
  • I obtained excellent data that can be leveraged to conduct “further WTF”, which I will certainly be doing.

Operation in Progress

Yes, the Feeds are Disabled

You might have noticed I’ve turned the feeds off.

Relax.  Your runners will pool messages until I turn the feeds back on.

I’m conducting a mixed signals operation to ensure you’re protected from your network.  They’re actively targeting some of you.

I’ll flip the switch back on when it’s done.  This needs to happen.  We’ve got to shut off their eyes.

Other News

IRCTHULU just got indexed by Google.  I need to fix that title for the next crawl.

http://i.imgur.com/ShWCc4J.png

-C

IRCTHULU is LIVE

The Presenta layer is complete.

Presenta is an “example” UI.  I hope someone comes along and builds a more robust and fully featured UI, but it works, and it works on mobile phones.

Official IRCTHULU Log Portal:

presenta.silogroup.org

IRCTHULU ARRIVES

The Example UI is crude, simple, and functional.

The API and UI will be moved to alpha and publicly exposed later today.

This version should be rather sturdy.

An announcement with URLs will be made shortly after the new services are up.

New Direction for Presenta

I’ve decided to take the Presenta example UI in a different direction.

Since this will be tailored primarily for Google-indexed log viewing and identity research, we need something more palatable to the index crawlers than a static HTML page would easily provide.

So I’m moving it to a drill-down-style navigation with breadcrumbs and making it PHP-based to speed things up a bit.

High Level Nav Structure

[Log Selector] -> [Log Viewer] -> [Identity Research]

Log Selector

[Network] -> [Channel] -> [ Date Start, Date End]

Log Viewer

[Log]

Identity Research

[Host Research] | [Ident Research] | [User Research]

Host Research

[ Host, Associated Idents[], Associated Users[] ]

Ident Research

[ Ident, Associated Hosts[], Associated Users[] ]

User Research

[ User, Associated Hosts[], Associated Idents[] ]
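A minimal sketch of how that drill-down could hang together as GET-parameter routing with breadcrumbs (parameter and view names are placeholders, not the actual Presenta code):

<?php
// Sketch of the drill-down navigation: each missing GET parameter decides
// which level of the selector to render, and the breadcrumb trail grows as
// parameters are filled in. All names below are placeholders.

$network = $_GET['network'] ?? null;
$channel = $_GET['channel'] ?? null;
$start   = $_GET['start']   ?? null;   // date range start
$end     = $_GET['end']     ?? null;   // date range end

$crumbs = ['Log Selector'];

if ($network === null) {
    $view = 'network_list';            // pick a network
} elseif ($channel === null) {
    $crumbs[] = htmlspecialchars($network);
    $view = 'channel_list';            // pick a channel on that network
} elseif ($start === null || $end === null) {
    $crumbs[] = htmlspecialchars($network);
    $crumbs[] = htmlspecialchars($channel);
    $view = 'date_range_form';         // pick a start and end date
} else {
    $crumbs[] = htmlspecialchars($network);
    $crumbs[] = htmlspecialchars($channel);
    $crumbs[] = htmlspecialchars("$start to $end");
    $view = 'log_viewer';              // show the log, linking on to research pages
}

echo implode(' -> ', $crumbs) . PHP_EOL;
echo 'render: ' . $view . PHP_EOL;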

-C