How much do you know about search engines? Quis Custodiet Ipsos Custodes?

Every day billions of people submit an unimaginable number of queries through Internet search engines.  These powerful instruments have profoundly changed the user’s perception of web content.

Before search engines became popular, “web portals” such as the “DMOZ Open Directory Project” (http://www.dmoz.org) and “Yahoo! Directory” (http://dir.yahoo.com) competed for global dominance by manually indexing web content into a structured hierarchy of categories.

Today, DMOZ indexes a massive 5 million sites into 1 million cross-linked categories!    Impressive as this is,  according to Netcraft, there are over 190 million active websites, and according to the WorldWideWebSize daily estimate, the Indexed Web contains at least 8.42 billion pages.

It is humanly impossible to manually track and index the content of 8.42 billion pages! Without third-party search engines, you are unlikely to find the page you want without spending hours (or days) searching for it manually. Most of us unconsciously trust the results from Google, Bing, and similar search engine services. But do we know how their algorithms work? Are we sure that the results are presented in the correct priority order? And how important is that order?

Let’s start with the importance of priority and the impact of advertising. Google never sells better ranking in its search results, but it presents pay-per-click or pay-for-inclusion results before and beside its indexed (organic) search results. Google statisticians recently analysed over 5300 websites to assess how much traffic advertising drives to a website. They found that ~8 out of 10 clicks to a website would never happen in the absence of related search ad campaigns. Furthermore, if you ran an advertising campaign for a period of time, and then turned it off, 80 percent of the visits to your website would be lost.

Position in web-search results matters. Consequently, US online advertising spend reached $26 billion in 2010, with search advertising accounting for 46% of the market. Total US online spend is projected to reach $42 billion by 2013.
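To put those figures in perspective, here is a small worked calculation. It uses only the numbers quoted above; the function names are ours and purely illustrative.

```python
# Figures quoted in the text (US online advertising, in billions of dollars).
SPEND_2010 = 26.0
SEARCH_SHARE_2010 = 0.46
PROJECTED_SPEND_2013 = 42.0


def search_ad_spend(total: float, share: float) -> float:
    """Portion of the total spend attributed to search advertising."""
    return total * share


def growth_percent(start: float, end: float) -> float:
    """Percentage increase from `start` to `end`."""
    return (end - start) / start * 100.0


if __name__ == "__main__":
    search = search_ad_spend(SPEND_2010, SEARCH_SHARE_2010)
    growth = growth_percent(SPEND_2010, PROJECTED_SPEND_2013)
    print(f"Search advertising in 2010: about ${search:.2f} billion")
    print(f"Projected growth 2010 to 2013: about {growth:.1f}%")
```

In other words, search advertising alone was worth roughly $12 billion in 2010, and the projection implies total online spend growing by a little over 60% in three years.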

Unless you’re one of the globally recognised names on the Internet (Facebook, CNN, …), no matter how good your product, if you don’t pay the search engines to promote your website, you simply won’t get that much traffic.  If your website is not indexed by a search engine, chances are you will have next to no visitors.

The popular search engines employ complex algorithms.  For example,  Google claims that when a user enters a query, their machines search the index for matching pages and return the results Google believes are most relevant to that user.  Relevancy is determined by over 200 factors (not explicitly listed).  Independent researchers have identified that “relevancy” includes your previous web-surfing habits, geographic location, and so on.  Who guarantees that the proposed results are not deliberately manipulated?
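To illustrate why two people can receive different results for the same query, here is a toy sketch of personalised scoring. It is not Google's algorithm: the three factors, the weights, the `Page` and `User` types and the example URLs are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    url: str
    text_match: float  # 0..1, how well the page text matches the query
    country: str
    topic: str


@dataclass(frozen=True)
class User:
    country: str
    interests: frozenset  # topics inferred from earlier browsing


# Made-up weights; a real engine combines far more factors.
WEIGHTS = {"text_match": 1.0, "same_country": 0.3, "interest": 0.5}


def relevance(page: Page, user: User) -> float:
    """Combine the text match with two personal factors."""
    score = WEIGHTS["text_match"] * page.text_match
    if page.country == user.country:
        score += WEIGHTS["same_country"]
    if page.topic in user.interests:
        score += WEIGHTS["interest"]
    return score


def rank(pages, user: User):
    """Return page URLs ordered from most to least relevant for this user."""
    ordered = sorted(pages, key=lambda p: relevance(p, user), reverse=True)
    return [p.url for p in ordered]


# Two hypothetical pages matching the query "jaguar".
PAGES = [
    Page("https://cars.example/jaguar", 0.8, "UK", "cars"),
    Page("https://zoo.example/jaguar", 0.7, "IT", "animals"),
]

if __name__ == "__main__":
    print(rank(PAGES, User("UK", frozenset({"cars"}))))
    print(rank(PAGES, User("IT", frozenset({"animals"}))))
```

The same two pages come back in opposite order for the two users, even though neither page changed. Since the real factors are not published, a user has no easy way to tell personalisation from manipulation.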

For example, imagine an ill-intentioned party wishes to make your business disappear from web-search queries, dramatically reducing client traffic, and strangling your company’s income flow.   This can be achieved in practice by manipulating the search results generated by a few leading search engines.

There are three different scenarios for manipulating search engine results:

  • client side interference (on your computer),
  • injection of malicious website pages into popular search engines,
  • direct manipulation of search engine databases.

Client side interference

Various methods of search engine attack are well known to malware developers. One of the most common uses of search engine manipulation is to drive traffic to their malicious websites. Today, many forms of malware that infect computers can locally manipulate search engine results. In this case, the search engine sends you “clean” data, and the malware intercepts that data, “infects” it, and then displays the infected search engine results to you. These infected search results often redirect you to websites containing more malware. This is a typical approach used by cyber criminals to recruit machines (like yours) into a botnet without your knowledge or consent.

Injection of malicious website pages into popular search engines

As discussed in our last article, search engines crawl the web to discover website pages through links. The search engines then rank the discovered pages based on a variety of criteria, including their content and which other website pages link back to them. Unsurprisingly, it is possible to distort reality by feeding web crawlers misleading pages and links.
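To see how link-based ranking can be distorted, consider the toy sketch below. It uses a simplified PageRank-style calculation, which is our assumption for illustration and not the actual algorithm of any search engine. The page names, the `add_link_farm` helper and the five-page web are hypothetical.

```python
def rank_pages(links, damping=0.85, iterations=100):
    """Score pages with a simplified, PageRank-style iteration.

    `links` maps each page name to the list of pages it links to.
    Returns a dict mapping page name to a score; the scores sum to 1.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share; the rest flows along links.
        updated = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            # A page without outgoing links shares its score with all pages.
            targets = links.get(page) or pages
            share = damping * scores[page] / len(targets)
            for target in targets:
                updated[target] += share
        scores = updated
    return scores


def position(scores, page):
    """1-based position of `page` when pages are sorted by descending score."""
    return 1 + sum(1 for score in scores.values() if score > scores[page])


def add_link_farm(links, target, size):
    """Return a copy of `links` plus `size` throwaway pages that link to `target`."""
    farmed = {page: list(targets) for page, targets in links.items()}
    for i in range(size):
        farmed[f"farm{i}"] = [target]
    return farmed


# Hypothetical five-page web: nobody links to "rogue".
HONEST_WEB = {
    "news": ["blog", "shop", "forum"],
    "blog": ["news"],
    "shop": ["news"],
    "forum": ["news"],
    "rogue": ["news"],
}

if __name__ == "__main__":
    before = rank_pages(HONEST_WEB)
    after = rank_pages(add_link_farm(HONEST_WEB, "rogue", 100))
    print("rogue position before link farm:", position(before, "rogue"))
    print("rogue position after link farm: ", position(after, "rogue"))
```

In this toy graph the rogue page starts in last place because nothing links to it. After 100 throwaway pages are added that do nothing but link to it, it climbs to second place, ahead of every honest page except the most linked one. Real search engines apply many countermeasures, but the underlying weakness is the same: whoever can manufacture links can influence the ranking.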

Black hat search engine optimization (SEO) is a process in which criminals (and others) trick search engines into ranking their (malicious) web pages high up in the search engine results. According to Sophos, the most popular black hat techniques use malware to create website pages on popular topics that “appear very attractive” to crawlers:

  • Malicious pages are submitted to web search engines.
  • The search engine’s robots read and index these pages.
  • Users searching for these popular topics receive links to the rogue malware pages high in the search results.
  • A user clicks on one of the rogue links.
  • The malicious website page redirects the user to another malicious website which may try to infect the client’s computer, or do other nefarious things.

Direct manipulation of search engine databases

In this case, search engine vendors may choose, or be coerced, to manipulate their own results. This may be done with the best intentions, such as to protect people from known fraudulent websites, or websites recently compromised by malware. However, this is not always the case. Search engine vendors may have no choice but to comply with censorship requests from local Governments. In June, Google reported that, in the previous 6 months, it saw an “alarming” increase in requests (over 1,000) from Governments to censor Internet content, including removing items such as YouTube videos and search listings. The company, which said it complied with more than half the requests, released a catalog of those requests as part of its biannual Global Transparency Report. Google released this report to maximize transparency around the flow of information related to its tools and services. The report discloses the following information:

a) Real-time and historical traffic to Google services around the world;
b) Numbers of removal requests received from copyright owners or governments;
c) Numbers of user data requests received from government agencies and courts.

"Unfortunately, what we've seen over the past couple years has been troubling, and today is no different," Dorothy Chou, Google's senior policy analyst, said in a blog post. "When we started releasing this data, in 2010, we noticed that government agencies from different countries would sometimes ask us to remove political content that our users had posted on our services. We hoped this was an aberration. But now we know it's not."

Google said it had received 461 court orders for the removal of 6,989 items, consenting to 68 percent of those orders. It also received 546 informal requests, complying with 46 percent of those requests.
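These figures can be cross-checked against the earlier statements that Google received "over 1,000" requests and complied with "more than half" of them. The sketch below uses only the numbers quoted above; because the percentages are rounded, the result is approximate, and the function names are ours.

```python
# (number of requests, fraction complied with), as quoted in the text.
COURT_ORDERS = (461, 0.68)
INFORMAL_REQUESTS = (546, 0.46)


def total_requests(groups):
    """Total number of requests across all groups."""
    return sum(count for count, _ in groups)


def overall_compliance(groups):
    """Compliance rate across all groups, weighted by request count."""
    complied = sum(count * rate for count, rate in groups)
    return complied / total_requests(groups)


if __name__ == "__main__":
    groups = [COURT_ORDERS, INFORMAL_REQUESTS]
    print("Total requests:", total_requests(groups))
    print(f"Approximate overall compliance: {overall_compliance(groups):.1%}")
```

The two groups add up to 1,007 requests with an overall compliance rate of roughly 56%, which is consistent with both statements.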

"Just like every other time, we've been asked to take down political speech," Chou wrote. "It's alarming not only because free expression is at risk, but because some of these requests come from countries you might not suspect -- western democracies not typically associated with censorship."

The company said it complied with the majority of requests from Thai authorities for the removal of 149 YouTube videos that allegedly insulted the monarchy, a violation of Thai law. The Web giant said it also granted U.K. police requests for removal of five YouTube accounts that allegedly promoted terrorism. Google also said it complied with 42 percent of U.S. requests for the removal of 187 pieces of content, most of which were related to harassment.

This type of  interference is potentially the most dangerous cyber threat against search engines.  Search engine companies are increasingly obliged to operate under government imposed constraints, or else they may be banned in those countries.

In particular, the Google study doesn't reflect censorship activity from countries such as China and Iran, which block content without notifying Google.  In this case, we can assume that the censorship is even more rigorous, proactively  censoring political or religious material that is considered to be in any way offensive to the incumbent regime.

Diving deeper into censorship

Those wanting to learn more about censorship by governments should visit the OpenNet Initiative (http://opennet.net/), which monitors restrictions imposed on the Internet. The OpenNet portal contains data on "anomalies" in accessing the network, analysed along different axes such as geographic location and type of intervention performed (e.g. political, social, conflict / security, Internet tools). The information is presented as detailed reports accompanied by maps, which give an attractive graphical representation of the phenomena.

Fortunately, it appears that the influence of government on search engine results is only serious in a few regions of the planet. However, this trend could change as Governments seek to gain greater control.

What has been presented in this article is just a small taste of the complex theme of search engines, powerful tools that can lend themselves to numerous purposes that go beyond their undisputed indexing ability.

I hope that this information prompts us to stop and think whenever a search engine offers us a series of results in response to our query.

Why do you click on the first link a web-search engine provides?
Is this link really what you asked for? Why have you received these specific answers from the search engine?
Has anyone influenced the results you see, be they private businesses or governments?
Quis Custodiet Ipsos Custodes?

To search without Google adverts use: https://www.google.com/webhp .
Also try http://www.mojeek.com/ for a search engine that claims to do its own web crawling and deliberately avoids tracking/profiling its users.

About the Authors:
Pierluigi Paganini, Deep Web expert and Security Specialist, CISO Bit4ID Srl, a CEH Certified Ethical Hacker (EC Council) and Founder of Security Affairs ( http://securityaffairs.co/wordpress ). Pierluigi Paganini is a co-author (with Richard Amores) of the soon to be published book "The Deep Dark Web: The hidden world", which extensively covers all aspects of the Deep Web.

David Pace is Project Manager of the ICT Gozo Malta Project, and a freelance IT Consultant

Prof. Fabian Martins ( http://br.linkedin.com/in/fabianmartinssilva ), Banking security expert and Product Development Manager at Scopus Tecnologia ( http://www.scopus.com.br/ ), owned by Bradesco Bank Group.

Ron Kelson is Vice Chair of the ICT Gozo Malta Project and CEO of Synaptic Laboratories Limited [email protected] .

Ben Gittins is CTO of Synaptic Laboratories Limited. [email protected]

ICT Gozo Malta is a joint collaboration between the Gozo Business Chamber and Synaptic Labs, part funded in 2011 by the Malta Government, Ministry for Gozo, Eco Gozo Project, and a prize winner in the 2012 Malta Government National Enterprise Support Awards.   www.ictgozomalta.eu links to free cyber awareness resources for all age groups.   To promote Maltese ICT, we encourage all ICT Professionals to register on the ICT GM Skills Register and keep aware of developments, both in Cyber security and other ICT R&D initiatives in Malta and Gozo.   For further details contact David Pace at [email protected] or phone +356 79630221 .

Published by:

Name: Pierluigi Paganini
Country: Italy
Website: http://securityaffairs.co/wordpress