Tracking your cyber footprints

Amitava Nag

In today’s digital world we constantly leave behind digital trails when we carry out transactions online – GPS geo-tagging, swiping our cards, being active on Facebook or connecting to the Internet through mobile phones. By making digital synchronization possible, telecom service providers are not only increasing their revenues but, more importantly, harnessing the nation’s digital data into a phantom store. This circular inclusion of individuals as mere agents of data, each one a creator and hence a consumer, has started slowly but surely in India. Multiple online retailers offer additional discounts on items bought through their mobile apps rather than through websites accessed from a desktop or laptop computer. The philosophy behind this is to tie consumer data to individuals who are more than just faceless email ids in the virtual space. Consider the amount of data that is tagged to an individual, based on his/her buying patterns, search patterns and social behaviour patterns, the moment the person’s phone and mobile numbers are correlated and identified. This provides a 360-degree customer profile, which is why you find advertisements for the book you browsed on Amazon on your Facebook wall. It is also the reason why Gmail, Facebook and the like keep pestering you for your mobile number every now and then.


In this age every click is important, since every click of the mouse creates a transaction and, in a way, defines the user’s behaviour.

However, apart from this structured data stream in the form of a digital trail, Big Data also includes unstructured organizational data lying in physical documents and digitized data held in legacy systems. More importantly, the term refers not only to the data itself, with its enormous volume (of the order of petabytes (1 petabyte = 1024 terabytes, i.e., 1024 x 1024 gigabytes) or exabytes (1 exabyte = 1024 petabytes)), but also to the techniques and technologies that may be used to manipulate the available data and derive information from it. A formal definition of the term involves the three Vs:

Volume – Typical Business Intelligence applications fetch data in bulk from multiple systems and correlate it to produce insightful reports, patterns or trends. Handling such volumes has long been a problem, one that is now being tackled by newer technologies such as Hadoop and Oracle Exadata.

Velocity – Systems today are built to capture and process data in near real time from RFID (Radio-Frequency Identification) tags, Twitter handles, digital sensors, smart meters and so on. The speed with which torrents of data bombard an organization has led to technologies that rely on real-time processing of data.

Variety – As mentioned above, data comes from disparate sources with differing latencies and in a staggering range of formats. Standardizing it into comparable units, so that it can be correlated to derive useful information, is again a big challenge.
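The ‘Variety’ challenge can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the three input formats, the field names (`tag_id`, `epoch`, `handle`, `created_at`) and the common schema are hypothetical and are not taken from any real system. It shows records from an RFID reader, a tweet and a smart meter being converted into one shape so that they can be sorted and correlated by time.

```python
from datetime import datetime, timezone


def from_rfid(reading: dict) -> dict:
    # Hypothetical RFID reading: timestamp given as epoch seconds.
    return {
        "source": "rfid",
        "entity": reading["tag_id"],
        "timestamp": datetime.fromtimestamp(reading["epoch"], tz=timezone.utc),
        "value": reading.get("location"),
    }


def from_tweet(tweet: dict) -> dict:
    # Hypothetical tweet: timestamp given as an ISO 8601 string with offset.
    return {
        "source": "twitter",
        "entity": tweet["handle"],
        "timestamp": datetime.fromisoformat(tweet["created_at"]),
        "value": tweet["text"],
    }


def from_meter(row: str) -> dict:
    # Hypothetical smart-meter CSV row: "meter_id,YYYY-MM-DD HH:MM,kwh",
    # with the time assumed to be in UTC.
    meter_id, stamp, kwh = row.split(",")
    naive = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
    return {
        "source": "smart_meter",
        "entity": meter_id,
        "timestamp": naive.replace(tzinfo=timezone.utc),
        "value": float(kwh),
    }


# Three differently shaped inputs end up in one common schema.
records = [
    from_rfid({"tag_id": "TAG-17", "epoch": 1400000000, "location": "dock-3"}),
    from_tweet({
        "handle": "@fan",
        "created_at": "2014-05-13T17:00:00+00:00",
        "text": "Great match!",
    }),
    from_meter("M-204,2014-05-13 16:30,1.75"),
]

# Once timestamps share a type and time zone, the records can be ordered.
records.sort(key=lambda r: r["timestamp"])
for record in records:
    print(record["timestamp"].isoformat(), record["source"], record["value"])
```

Real systems face far messier inputs (missing fields, conflicting units, late-arriving data), but the principle is the same: each source gets its own small adapter, and everything downstream works on a single agreed format.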

It has to be understood that data (and hence information and ultimately knowledge) is now looked upon as the newest and most powerful asset with which an organization can hold a competitive edge over its rivals. At the level of the individual, the more data a person has, the better armed he/she is and the more accurate his/her decisions are expected to be.

Shaping Entertainment using Big Data
Big Data has been revolutionizing sports and entertainment in more ways than one. On the one hand, in every sport, the amount of data held on a player is immense, tracking the player down to the minutest detail so that strengths and weaknesses can be better analyzed. In cricket, for instance, data for every ball faced by a batsman is stored and analyzed, just as data for every ball bowled by every bowler is maintained along with associated parameters, to provide meaningful information about a player’s performance across different formats, pitches, innings, climatic conditions, oppositions and so on. The Indian Premier League cricket teams use information from this detailed data to finalize the players they will bid for during the auction.

On the other hand, Big Data is being used in a big way to drive and influence audience sentiments. This feeds marketing and sales initiatives that result in more revenue and higher profits for the teams or the sporting authorities arranging the events.

International IT giant IBM worked closely with a media company to design and execute predictive models on the social buzz for the film Ram Leela in 2013. As per reports, IBM predicted a success rate of nearly 73% for the film, based on the right selection of cities. The data that IBM analyzed came from social networks, mainly ‘social sentiment’ captured from Twitter messages and similar sources. Another film of the same year, Chennai Express, became one of the biggest hits, thanks not only to Shah Rukh Khan but also to digital marketing campaigns based on audience sentiment, behavioural patterns and preferences. The IT services company Persistent Systems, working for Chennai Express, analyzed over 1 billion cumulative impressions from over 750 thousand tweets during a 90-day campaign period. Persistent Systems CEO Siddhesh Bhobe later commented, “Shah Rukh Khan and the success of Chennai Express have proved that social media is the channel of the future and that it presents unique opportunities to marketers and brands, at an unbeatable ROI (return on investment).”

The paradigm shift has been twofold. First, a film’s reception used to be dictated by a few film critics; now it is shaped by everyone’s rating of it, irrespective of his/her stratum in society – film criticism seems to have become a democratic right of every citizen! Second, the box office used to be the only quality gate that determined the success or failure of a film. With analytics on Big Data, film makers and producers are now in a position to analyze the sentiments of viewers in real time, which helps them market the film in customized ways aimed at higher profit margins.

Hollywood started using Big Data as a tool to generate marketing strategies based on audience sentiments and behavioural patterns much earlier. Even the original show House of Cards from Netflix (which started off as a video rental company) was commissioned on the basis of customer preferences and streaming habits. It has to be borne in mind that, to arrive at this statistical inference, Netflix sliced and diced the attributes of individual films into more than 70,000 characteristics! This micro-granular matrix could distinguish even minor differences in preference.

Looking forward
Richard Maraschi, Global Solutions Leader, Advanced Analytics, IBM commented, “Lots of people can tell you [audience] sentiment and use keywords and see what the sentiment is based on that. We can do what we call deep sentiment analysis, which is parsing and categorizing sentiment into its features. If you think about a film, it would be certain characters, the music, certain plot elements, a feeling about the film, a scene in the film, etc. You can start to get down to that level and then tie that back to who said [what on social media], what kind of people said that. Was that on target on our audience? Is there a new audience that’s talking about this film?”
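The idea of breaking sentiment down by feature, rather than reporting a single score for the whole film, can be sketched in a few lines of Python. This is a deliberately simple keyword-matching toy and not IBM’s method: the word lists, the feature names and the sample posts are all made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical word lists; a real system would use trained language models.
POSITIVE = {"loved", "great", "brilliant", "amazing"}
NEGATIVE = {"boring", "weak", "hated", "dull"}
FEATURES = {
    "music": {"music", "songs", "soundtrack"},
    "plot": {"plot", "story", "script"},
    "characters": {"hero", "heroine", "villain", "characters"},
}


def feature_sentiment(posts):
    """Add each post's polarity to every film feature the post mentions."""
    scores = defaultdict(int)
    for post in posts:
        words = set(re.findall(r"[a-z']+", post.lower()))
        # Polarity = number of positive words minus number of negative words.
        polarity = len(words & POSITIVE) - len(words & NEGATIVE)
        for feature, keywords in FEATURES.items():
            if words & keywords:
                scores[feature] += polarity
    return dict(scores)


sample_posts = [
    "Loved the songs, great soundtrack!",
    "The plot was boring and the script felt weak.",
    "Brilliant villain, but a dull story.",
]
result = feature_sentiment(sample_posts)
print(result)
```

The third post shows the weakness of so crude an approach: the praise for the villain and the complaint about the story cancel out, so neither feature gets the credit or blame it deserves. Separating such mixed opinions, and linking them to the kind of person who expressed them, is the harder problem that the ‘deep’ analysis described above sets out to solve.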

So, what does the future hold? At the risk of being labelled cynical, it can be said that ‘big’ data in entertainment can be counter-productive. Whereas machine intelligence churned out of trillions of data points helps medical science get rid of human errors, the same philosophy can be dangerous when applied to artistic creation. This is precisely because, unlike science, art demands differences, celebrates the imperfections of individuals and thrives on human frailties and flaws. If audience sentiments dictate the subject matter of creative expression, or even design and define the script, we will end up in a closed system in which audience preferences have no chance to evolve. The same set of preferences will continue to be fine-tuned on further sets of data whose basic premise remains unaltered.

On a broader scale, as individuals, we need to take a stance on whether we can resist falling prey to this overt digitalization of our human existence. It is perennially tiring to secrete a digital watermark with every breath, at home and in today’s Smart Cities. This digital existence adds up to Big Data, which looks at us primarily as consumers. And like all capitalist machinery that treats individuals as mere consumers, Big Data will also extract optimal value out of us – through background data collection and also by trapping us with the lure of consumer commodities, including sports and cinema.

Today’s social life for most of us is more virtual than physical. Most of us are constantly logged into Facebook, Google Hangouts, WhatsApp and what not. Unfortunately, this ease of communication with anyone on the ‘network’ exposes us to the risk of our proprietary, sensitive and personal information being intercepted.

There is specific software that can read the contents of our communication, which is one reason most governments across the globe have problems with encrypted messaging systems. While this tapping of communication may help governments detect terrorist activities, for common citizens it may amount to phishing and cyber stalking. This is because, in our journey on the Internet, we unknowingly leave digital traces in the browser history, in the machine’s RAM and cache, and even on storage devices. Even though most websites ask for confirmation before storing cookies, they are, in effect, storing our browsing patterns and our search habits. While this helps the visited websites provide us with the information that we wish to consume, the fact that our browsing patterns are stored means that these websites can preserve our online behaviour for future use. A number of websites offer us the “convenience” of logging in with our Facebook accounts. If we do so, we find specific advertisements within Facebook based on our browsing behaviour. You may be happy with the convenience of this feature and all praise for the way technology can work, but know that a lot of information about you, such as your chat history, the images you share and the Facebook Likes you tick, will be easily accessible to websites on the Internet.

A few anonymous communication tools route traffic through a distributed mesh of relays, bouncing the communication between them so that network watchers cannot trace the sites we visit or our geographical location. ‘The Onion Router’ (better known as Tor) is one such tool; it is gaining huge popularity because of the anonymity that its corresponding web browser is supposed to preserve.

In brief, the digital trail that we leave behind inadvertently is not easy to clean. It is better to keep in mind that sharing too much about ourselves may land us in trouble.

Points for Discussion

Teachers may use these points to set up a discussion with students based on the article.

  1. What are the 3 V’s of Big Data?
  2. Students can be asked to give real life examples of digital traces left by their browsing habits.
  3. Theoretically, how can one try to minimise digital imprints?
  4. Death of the film critic – what will the future of film/art criticism look like when everyone has the option to voice his/her opinion?
  5. Will the outcomes of any sport become more predictable with more ‘knowledge’ about teams/players?

The author is a writer and film critic residing in Kolkata. He can be reached at
