Espionage and the Covert Art of Data Warehouse Management


I don’t know how the real world of intelligence works, but I do know how data warehouses work, and I know how secret agents work in the movies. So let’s see what happens when I make the “logical connections.”

I am a fictional secret agent who works for MI6. I have just heard a foreign agent refer to an upcoming event as “Operation Grand Slam.” Because the word “Operation” comes first, we’re not talking about a Grand Slam in baseball, in tennis, or even on a Denny’s menu. We are talking about a covert action that will take place in the near future, and lives may be at risk! If you know your movies, you know the plot will involve the fate of all the gold in Fort Knox and a very destructive weapon. (I won’t spoil a 52-year-old movie here; I’ll just say that the climax is shocking, and to that I tip my hat.)

Let’s bring this into the modern day so our data warehouse knowledge can help. You Google the term and immediately find the movie reference. The End. Or is it? No: today the covert name will be something different, something a Google search won’t turn up. Fortunately, you have access to the Utah Data Center, reportedly the world’s largest repository of intelligence material, and data warehouse testing is what you’ll need to solve this dilemma. But a large collection of audio files can’t be searched easily, so there has to be another way: an easier way to parse the data before we ever ask for a report built from queried data. Let me tell you what it is.

The old way of building a data warehouse was to use ETL: Extract, Transform, Load. The E and the L are not particularly exciting here; they just move the data from one place to another without changing its form. But the T, that’s exciting. That’s where the magic happens. T stands for Transform, and it’s what makes it possible to find that phrase easily. I was once talking to a headhunter (I mean career placement specialist) who told me that the text would be pulled out of my resume when it was scanned, so whether I sent a .doc or a .docx would be irrelevant. Part of the Transform here involves a similar process, one that extracts flat text from a file in a different format. In this case the source is an audio file, and the words are pulled out of it the same way Siri turns speech into text today.
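To make the three stages concrete, here is a minimal Python sketch of an ETL pipeline, assuming a plain in-memory dictionary stands in for the warehouse. The `placeholder_to_text` function is a hypothetical stand-in for a real speech-to-text engine or document parser; it simply decodes bytes so the example runs on its own. All class and function names are illustrative, not any particular product’s API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator


@dataclass
class RawRecord:
    source_id: str
    payload: bytes  # raw source data, e.g. an audio file or a .docx resume


@dataclass
class TextRecord:
    source_id: str
    text: str  # flat, searchable text produced by the Transform step


def extract(sources: Iterable[RawRecord]) -> Iterator[RawRecord]:
    """E: read records from the source system unchanged."""
    yield from sources


def transform(records: Iterable[RawRecord],
              to_text: Callable[[bytes], str]) -> Iterator[TextRecord]:
    """T: convert each payload to flat text and normalize it for searching."""
    for rec in records:
        text = to_text(rec.payload)
        # Collapse runs of whitespace and lowercase so later phrase searches are simpler
        yield TextRecord(rec.source_id, " ".join(text.split()).lower())


def load(records: Iterable[TextRecord], warehouse: Dict[str, str]) -> None:
    """L: write the transformed records into the target store."""
    for rec in records:
        warehouse[rec.source_id] = rec.text


def placeholder_to_text(payload: bytes) -> str:
    # HYPOTHETICAL stand-in for a real speech-to-text or document-parsing engine;
    # here the "audio" is just UTF-8 text so the example is self-contained.
    return payload.decode("utf-8")


warehouse: Dict[str, str] = {}
raw = [RawRecord("call-001", b"  Operation   GRAND SLAM is a go ")]
load(transform(extract(raw), placeholder_to_text), warehouse)
print(warehouse)  # {'call-001': 'operation grand slam is a go'}
```

Passing the text extractor into `transform` as a parameter keeps the E and L stages identical whether the source is a resume or an audio recording; only the conversion function changes.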

To get the actual spoken content of a phone call, you need to do one of two things: tap the line (if the call uses POTS, plain old telephone service) or copy the assembled packets (if it uses VoIP). POTS landlines are rapidly disappearing, which limits the need for old-fashioned line-tapping. To get the metadata, you simply need a federal law requiring the carrier to push call data to your aggregation center, where it is used to tag the audio files assembled from the collected voice packets. The aggregator then cleanses the data through the Transform step we were just talking about, so that we have a flat text file to scan. We may still want to hold on to the original audio file for later playback, so we can say, “That’s the voice of the person we are looking for.”
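The tagging step can be pictured as a join on a shared call identifier. The sketch below assumes the carrier’s metadata arrives with a `call_id` that also keys the transcript and the stored audio file; that field, the record layout, the phone numbers, and the file paths are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class CallMetadata:
    # Fields a carrier might push; this schema is an assumption for illustration
    call_id: str
    caller: str
    callee: str
    started_at: datetime


@dataclass
class TaggedTranscript:
    call_id: str
    caller: str
    callee: str
    started_at: datetime
    text: str        # flat text from the Transform step
    audio_path: str  # pointer to the original audio, kept for later playback


def tag_transcripts(metadata: List[CallMetadata],
                    transcripts: Dict[str, str],
                    audio_paths: Dict[str, str]) -> List[TaggedTranscript]:
    """Join carrier metadata to transcripts on a shared call_id.

    Calls that have no transcript yet are skipped rather than loaded half-empty.
    """
    tagged = []
    for m in metadata:
        text = transcripts.get(m.call_id)
        if text is None:
            continue
        tagged.append(TaggedTranscript(
            m.call_id, m.caller, m.callee, m.started_at,
            text, audio_paths.get(m.call_id, ""),
        ))
    return tagged


meta = [
    CallMetadata("c1", "555-0100", "555-0199", datetime(2024, 3, 1, 9, 30)),
    CallMetadata("c2", "555-0199", "555-0100", datetime(2024, 3, 2, 14, 5)),
]
rows = tag_transcripts(meta, {"c1": "operation grand slam is a go"},
                       {"c1": "/archive/c1.wav"})
print(len(rows), rows[0].caller, rows[0].audio_path)  # 1 555-0100 /archive/c1.wav
```

Skipping calls that have no transcript yet is only one design choice; a real pipeline might instead load them right away and fill in the text once the Transform finishes.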

Perhaps the federal government also requires data pushes from other voice and text channels, like Skype, FaceTime, GoToMeeting, instant messaging, or email (pulls would add too much latency to the communications system, and we can’t shut communication down without someone getting suspicious). I say perhaps because I have no official knowledge of what the U.S. government has access to. I am only describing what I would do if I had ultimate control and wanted this end goal of communication data collection. And if you know me, you know how much I would enjoy having ultimate control. Or maybe my tin-foil hat is pinching my brain and needs adjusting.

The point is that we know what we have to do. We have collected and stored lots of information. Where needed, we run it through a Transform so that it ends up as flat text, a form well suited to querying later. We give ourselves the ability to query our collected flat text for a phrase. We use that query to generate a report of every record that contains the danger phrase we seek. The report links back to the original audio file or transcript of each conversation, for more subtle analysis. We sort the report by date, so we can trace the genesis of the topic and walk through the later conversations. All sewn up rather tidily, wouldn’t you say? All that’s left now is to send our best agents out to apprehend the scofflaws, now that we have uncovered their nefarious plot. And we have the intelligence gathered by our ginormous data warehouse to thank. Well done, everyone, good show! On to your next assignment …
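The query-and-report step could look something like the following minimal sketch. It assumes transcripts have already been tagged with a timestamp and a link back to the original audio (the `audio://` links are made up), and it uses a plain case-insensitive substring scan; a warehouse of any real size would more likely rely on a full-text index.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Transcript:
    call_id: str
    started_at: datetime
    text: str
    audio_url: str  # HYPOTHETICAL link back to the stored original audio


def phrase_report(transcripts: List[Transcript], phrase: str) -> List[Transcript]:
    """Return every transcript containing the phrase, oldest first."""
    needle = phrase.casefold()
    hits = [t for t in transcripts if needle in t.text.casefold()]
    # Sorting by date lets an analyst trace the topic from its first mention onward
    return sorted(hits, key=lambda t: t.started_at)


transcripts = [
    Transcript("c3", datetime(2024, 3, 9), "Operation Grand Slam moves to phase two", "audio://c3"),
    Transcript("c1", datetime(2024, 3, 1), "Nothing to report today", "audio://c1"),
    Transcript("c2", datetime(2024, 3, 4), "first mention of operation grand slam", "audio://c2"),
]
for hit in phrase_report(transcripts, "Operation Grand Slam"):
    print(hit.started_at.date(), hit.call_id, hit.audio_url)
# 2024-03-04 c2 audio://c2
# 2024-03-09 c3 audio://c3
```

Because each report row carries its `audio_url`, an analyst can jump from a text match straight to the recording for the “more subtle analysis” step.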
