An internet search engine, or web search engine, is a software system used to perform internet searches (web searches). Search engines crawl the internet for information relevant to the textual query entered in the web search. Search engine results pages (SERPs) are the lists of results displayed in response to a search query. The results can take various forms, such as web page links, infographics, videos, images, research papers, articles, and other file types. Search engines can also mine data stored in open directories and databases. The deep web refers to content that cannot be searched by an internet search engine.
Search Engine Approaches:
A search engine uses the following three methods to perform its function:
- Web crawling
- Indexing
- Searching
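Web crawling can be illustrated with a minimal sketch. The example below assumes a tiny in-memory "web" (the hypothetical `PAGES` dictionary and `example.test` URLs) instead of real HTTP requests, and follows links breadth-first from a seed page. It is an illustration of the general idea, not a description of how any particular search engine crawls.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "web": URL -> HTML. A real crawler would fetch pages over HTTP.
PAGES = {
    "http://example.test/": '<a href="http://example.test/a">A</a> <a href="http://example.test/b">B</a>',
    "http://example.test/a": '<a href="http://example.test/b">B</a>',
    "http://example.test/b": '<a href="http://example.test/">home</a>',
}


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, pages):
    """Breadth-first crawl starting at seed; returns URLs in the order visited."""
    visited = []
    seen = {seed}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        html = pages.get(url)
        if html is None:  # page not available: skip it
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:  # schedule each URL only once
                seen.add(link)
                queue.append(link)
    return visited


order = crawl("http://example.test/", PAGES)
```

The `seen` set keeps the crawler from revisiting pages that link back to each other, which matters on a web full of cycles.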
Indexing refers to associating the words and other tokens found on web pages with their domain names and HTML-based fields in a public database. These associations are then made available for search queries. A user's search query can be a single word, multiple words, or a sentence. The index allows information relevant to a search query to be found very quickly. The techniques used for indexing and caching are trade secrets.
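The association between tokens, fields, and pages can be sketched as a small index. The document set, URLs, and field names below are hypothetical, and the tokenizer is deliberately naive; production indexing techniques are, as noted, not public.

```python
import re
from collections import defaultdict

# Hypothetical pages, each split into HTML-based fields.
DOCS = {
    "example.test/cats": {"title": "Cats", "body": "Cats chase mice"},
    "example.test/dogs": {"title": "Dogs", "body": "Dogs chase cats"},
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each token to a list of (url, field, position) postings."""
    index = defaultdict(list)
    for url, fields in docs.items():
        for field, text in fields.items():
            for position, token in enumerate(tokenize(text)):
                index[token].append((url, field, position))
    return index


index = build_index(DOCS)
mice_postings = index["mice"]  # every place the token "mice" occurs
```

Looking up a token is then a single dictionary access, which is why an index lets relevant pages be found quickly without rescanning every page.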
Between visits by the spider, the cached version of a page, stored in the search engine's working memory, is sent quickly to an inquirer. If a visit is overdue, the search engine can instead act as a web proxy. In that case the live page may differ from the search terms that were indexed. The cached page reflects the version whose words were previously indexed, so a cached version can be valuable when the actual page has been lost. This problem is regarded as a mild form of linkrot.
A user typically enters a few keywords to query a search engine. Because the index already holds the names of the sites containing those keywords, they are obtained from it immediately. The real processing load lies in generating the web pages that make up the search results list. Every page in the list must be weighted using the information in the indexes. The top search results then require the lookup, reconstruction, and markup of snippets showing the context of the matched keywords.
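The lookup, weighting, and snippet steps can be sketched as follows. The documents are hypothetical, the weight is a toy term count rather than any real ranking formula, and square brackets stand in for the markup a results page would apply to matched keywords.

```python
import re

# Hypothetical pages: URL -> text.
DOCS = {
    "example.test/a": "Search engines crawl the web and build an index of pages",
    "example.test/b": "An index makes lookup fast, and a good index stays fresh",
    "example.test/c": "Cooking pasta requires boiling water",
}


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each token to the set of URLs that contain it."""
    index = {}
    for url, text in docs.items():
        for token in tokenize(text):
            index.setdefault(token, set()).add(url)
    return index


def score(terms, text):
    """Toy weight: total number of occurrences of the query terms."""
    tokens = tokenize(text)
    return sum(tokens.count(term) for term in terms)


def matches(word, terms):
    return any(token in terms for token in tokenize(word))


def snippet(terms, text, width=3):
    """Words around the first matched keyword, with matches marked in brackets."""
    words = text.split()
    for i, word in enumerate(words):
        if matches(word, terms):
            window = words[max(0, i - width): i + width + 1]
            return " ".join(f"[{w}]" if matches(w, terms) else w for w in window)
    return ""


def search(query, docs, index, top_k=2):
    terms = set(tokenize(query))
    candidates = set()
    for term in terms:  # candidate pages come straight from the index
        candidates |= index.get(term, set())
    hits = [(url, score(terms, docs[url])) for url in candidates]
    hits.sort(key=lambda hit: (-hit[1], hit[0]))  # highest weight first
    return [(url, weight, snippet(terms, docs[url])) for url, weight in hits[:top_k]]


results = search("index", DOCS, build_index(DOCS))
```

Only the pages that survive the `top_k` cut get a snippet, mirroring the point that snippet reconstruction is needed only for the top results.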
Beyond simple keyword lookups, search engines offer their own GUI or command-driven operators and search parameters to refine the search results. These give users control over the feedback loop they create by filtering and weighting while refining the results. For example, since 2007 Google.com has allowed users to filter results by date by opening “Show search tools” and choosing the desired date range. It is also possible to weight by date, because each page has a modification time. Most search engines support the Boolean operators AND, OR, and NOT to help users refine a query. Boolean operators are for literal searches that let the user refine and extend the search terms: the engine looks for the words or phrases exactly as entered. Some engines provide a proximity search feature, which lets users define the distance between keywords. There is also concept-based searching, which applies statistical analysis to pages containing the words or phrases entered. Ask.com allows users to phrase queries in the same way one would ask a human a question.
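Boolean operators and proximity search map naturally onto set operations over a positional index. The sketch below uses hypothetical documents and helper names (`docs_with`, `and_`, `or_`, `not_`, `near`); it shows the general technique under those assumptions, not the query syntax of any specific engine.

```python
import re

# Hypothetical documents: id -> text.
DOCS = {
    1: "fast web search engine",
    2: "web crawler for search",
    3: "search the deep web quickly",
    4: "cooking recipes",
}


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Positional index: token -> {doc_id: [positions]}."""
    index = {}
    for doc_id, text in docs.items():
        for position, token in enumerate(tokenize(text)):
            index.setdefault(token, {}).setdefault(doc_id, []).append(position)
    return index


INDEX = build_index(DOCS)
ALL_DOCS = set(DOCS)


def docs_with(term):
    """Set of document ids containing the term."""
    return set(INDEX.get(term, {}))


def and_(a, b):
    return a & b


def or_(a, b):
    return a | b


def not_(a):
    return ALL_DOCS - a


def near(term_a, term_b, max_distance):
    """Documents where the two terms occur within max_distance words of each other."""
    result = set()
    for doc_id in docs_with(term_a) & docs_with(term_b):
        positions_a = INDEX[term_a][doc_id]
        positions_b = INDEX[term_b][doc_id]
        if any(abs(i - j) <= max_distance for i in positions_a for j in positions_b):
            result.add(doc_id)
    return result


web_not_crawler = and_(docs_with("web"), not_(docs_with("crawler")))
crawler_or_cooking = or_(docs_with("crawler"), docs_with("cooking"))
adjacent = near("web", "search", 1)
```

Here `web AND NOT crawler` narrows the result set, `crawler OR cooking` widens it, and `near` keeps only the documents where the two keywords sit close together.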
The usefulness of a search engine depends on the relevance of the results it returns. Millions of web pages may contain a given word or phrase, but some of those pages are more relevant, popular, or authoritative than others. Most search engines therefore rank the pages so that the “best” results appear first. Different search engines use different strategies to decide which pages are the best matches and in what order to show them. These strategies change over time as new techniques evolve and internet usage changes. Two main types of search engine have evolved. The first is a system of predefined, hierarchically ordered keywords programmed by humans. The second is a system that generates an “inverted index” by analyzing the texts it locates. In the second form, the computer itself does the bulk of the work.
Most web search engines are commercial ventures supported by advertising revenue, and some allow advertisers to pay a fee to have their listings ranked higher. Engines also generate revenue by running search-related ads alongside the regular results: every time a user clicks on one of these ads, the search engine earns money.
Google is the most popular and widely used search engine in the world, with a market share of 92.96%. It is followed by Bing (2.34%), Yahoo! (1.64%), Baidu (0.92%), Yandex (0.47%), and DuckDuckGo (0.43%).
Local search is the process that optimizes the efforts of local businesses, which focus on changes that keep their information consistent across all searches. This is crucial because many people decide where to go and what to buy based on their search results.
Search engine bias:
Internet search engines rank sites based on a combination of their relevance and popularity. Studies nevertheless suggest that the information engines provide carries various economic, social, and political biases. These biases can result directly from economic and commercial processes as well as from political processes. For instance, Google will not show certain neo-Nazi sites in its results in Germany and France, where Holocaust denial is illegal.
Social processes can also cause biases in search results, because search engine algorithms often exclude non-normative viewpoints in favor of more “popular” results. The algorithms of major search engines are also biased toward U.S.-based sites, ranking them better than websites from non-U.S. countries.
Filter bubbles and customized results:
Search engines like Bing and Google display customized results based on a person’s activity history. This phenomenon is referred to as a filter bubble. Search engines use algorithms to guess what content users are looking for based on information about them, such as location, search history, and past click behavior. As a result, users are shown information that resembles what they have searched for before. They become intellectually isolated and do not receive information that contradicts their viewpoints. Facebook’s personalized news stream and Google’s personalized search results are prime examples of this. Eli Pariser illustrated the effect with an example in which two users searched for “BP” on Google and received different results. One obtained results about the “Deepwater Horizon oil spill”, whereas the other obtained results about “British Petroleum”. Pariser suggested that the bubble effect has negative implications for civic discourse. To avoid the bubble effect, some search engines, such as DuckDuckGo, do not track their users’ search activity. Some scholars dispute Pariser’s view on the grounds that the evidence for it is unconvincing.
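How history-based customization can pull two users' results apart may be sketched with a toy re-ranker. Everything here is an assumption made for illustration: the result titles, the topic labels, the `boost` parameter, and the scoring rule. Real personalization algorithms are not public and are certainly more complex.

```python
from collections import Counter

# Hypothetical results for the same query: (title, topic, base relevance score).
RESULTS = [
    ("British Petroleum company profile", "business", 1.0),
    ("Deepwater Horizon oil spill", "environment", 1.0),
]


def personalize(results, click_history, boost=0.5):
    """Re-rank results by base score plus a boost for topics the user clicked before."""
    counts = Counter(click_history)
    total = sum(counts.values()) or 1  # avoid dividing by zero for new users

    def personal_score(result):
        _title, topic, base = result
        return base + boost * counts[topic] / total

    # sorted() is stable, so results with equal scores keep their original order.
    return sorted(results, key=personal_score, reverse=True)


# Two users issue the same query but have different (hypothetical) click histories.
for_investor = personalize(RESULTS, ["business", "business", "sports"])
for_activist = personalize(RESULTS, ["environment", "environment"])
```

Both users start from identical base scores, yet each sees a different top result, which is the feedback loop the filter bubble describes.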
Search engine submission:
Web search engine submission is the process by which a webmaster submits a website directly to a search engine. Although it is sometimes presented as a way to promote a website, it is usually not required, because the engines' web crawlers can find most sites on the internet without any assistance. Webmasters can either submit a single web page at a time or submit the entire site using a sitemap. However, it is normally only necessary to submit the home page of a site, because search engines can easily crawl a well-designed website. Web pages or websites are submitted to a search engine for two main reasons: to add an entirely new site without waiting for the search engine to discover it, and to have a site's record updated after a redesign.
Some search engine submission software not only submits sites to multiple search engines but also adds links to those sites from its own pages. This may appear helpful, because external links are among the most important factors in a site's ranking. However, it can leave a site with a large number of unnatural links, which can lower the site's ranking.
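Submitting an entire site with a sitemap typically means publishing an XML file that lists the site's URLs. The sketch below generates such a file in the sitemaps.org format using Python's standard library; the URLs and dates are hypothetical, and a published sitemap file would normally also begin with an XML declaration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Build sitemap XML from (loc, lastmod) pairs; lastmod may be None."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:  # lastmod is optional in the protocol
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical site with two pages.
sitemap_xml = build_sitemap([
    ("https://example.test/", "2024-01-15"),
    ("https://example.test/about", None),
])
```

Including `lastmod` after a redesign is one way to signal that a page has changed since it was last crawled.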