From: eldy <>
Date: Wed, 5 Jun 2013 08:18:28 +0000 (+0000)
Subject: Complete search engine list
X-Git-Tag: AWSTATS_7_2~6
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=b74996525d9d4bc6214b0a89fa73daa1d404eeeb;p=thirdparty%2FAWStats.git
---
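The comment block this patch adds to search_engines.pm says each search engine needs four pieces of data:

- (a) a referrer regex
- (b) a unique identifier
- (c) a keyword/query regex
- (d) an HTML label

These are spread across SearchEnginesSearchIDOrder, SearchEnginesHashID, SearchEnginesKnownUrl and SearchEnginesHashLib. %SearchEnginesWithKeysNotInQuery covers engines such as Iminent that put the keywords outside the query string.

The Perl sketch below shows one way such tables could be combined to classify a referrer and extract its keywords. It is not the code in awstats.pl, and several parts are assumptions:

- The helpers classify_referrer and decode_keywords are hypothetical.
- The tables are trimmed to three entries copied from this patch.
- The _list1/_list2 order lists are collapsed into a single array.
- Regex (a) is matched against the whole referrer URL rather than only the host.

```perl
use strict;
use warnings;

# Trimmed, illustrative copies of the tables, using entries added by this patch.
# (Assumption: list1 and list2 are collapsed into one ordered array here.)
my @SearchEnginesSearchIDOrder = (
    'ecosia\.org',
    'start\.iminent\.com',
    'search\.babylon\.com',
);
my %SearchEnginesHashID = (            # referrer regex (a) => identifier (b)
    'ecosia\.org'          => 'ecosiasearch',
    'start\.iminent\.com'  => 'iminent',
    'search\.babylon\.com' => 'babylon',
);
my %SearchEnginesKnownUrl = (          # identifier (b) => keyword regex (c)
    'ecosiasearch' => 'q=',
    'iminent'      => 'q=',
    'babylon'      => 'q=',
);
my %SearchEnginesHashLib = (           # identifier (b) => HTML fragment (d)
    'ecosiasearch' => 'Ecosia Search',
    'iminent'      => 'Iminent',
    'babylon'      => 'Babylon',
);
my %SearchEnginesWithKeysNotInQuery = ('iminent' => 1);

# Hypothetical helper: undo URL encoding ('+' -> space, %XX -> byte).
sub decode_keywords {
    my ($s) = @_;
    $s =~ tr/+/ /;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# Hypothetical helper, not the AWStats implementation.
# Returns (identifier, label, keywords) for a search-engine referrer, or ().
sub classify_referrer {
    my ($url) = @_;
    foreach my $re (@SearchEnginesSearchIDOrder) {
        next unless $url =~ /$re/i;    # first matching regex (a) wins
        my $id = $SearchEnginesHashID{$re};

        # Regex (c) normally sees only the query part (after the first '?').
        # Engines listed in %SearchEnginesWithKeysNotInQuery are searched
        # over the complete URL instead.
        my $haystack = $url;
        unless ($SearchEnginesWithKeysNotInQuery{$id}) {
            $haystack = $url =~ /\?(.*)\z/s ? $1 : '';
        }

        # The keyword value runs from just after regex (c) up to the next
        # '&' or '#'. A named capture is used because some (c) patterns,
        # e.g. '(p|q|as_p|as_q)=', contain their own groups.
        my $keywords;
        my $param = $SearchEnginesKnownUrl{$id};
        if (defined $param
            && $haystack =~ /(?:^|[?&#\/])(?:$param)(?<kw>[^&#]*)/) {
            $keywords = decode_keywords($+{kw});
        }
        return ($id, $SearchEnginesHashLib{$id}, $keywords);
    }
    return;    # not a known search engine
}

for my $ref (
    'http://www.ecosia.org/search?q=free+software&addon=1',
    'http://start.iminent.com/StartWeb/1031/toolbox/#q=searchkey1%20searchkey2&additional_arguments',
    'http://www.example.com/page.html',
) {
    my ($id, $label, $kw) = classify_referrer($ref);
    printf "%s -> %s\n", $ref,
        defined $id ? $label . ' [' . ($kw // '') . ']' : 'not a search engine';
}
```

In this sketch the order list is scanned first-match. A broad pattern placed early would therefore shadow more specific patterns placed after it. The catch-all 'chello' entry, which the file marks as "required as catchall for new countries not yet known", sits after the country-specific chello entries for that reason.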
diff --git a/wwwroot/cgi-bin/lib/search_engines.pm b/wwwroot/cgi-bin/lib/search_engines.pm
index 480184bd..52907ab9 100644
--- a/wwwroot/cgi-bin/lib/search_engines.pm
+++ b/wwwroot/cgi-bin/lib/search_engines.pm
@@ -4,6 +4,58 @@
# you must add an entry in SearchEnginesSearchIDOrder, SearchEnginesHashID and in
# SearchEnginesHashLib.
# An entry if known in SearchEnginesKnownUrl is also welcome.
+#
+# to eldy: Please check if the following description is correct:
+# You need the following information to specify a search engine:
+# (a) A regular expression that matches the referrer string of the
+# search engine. Unclear: what about slashes in the name of
+# a search engine, e.g. as in 'ecosia.com/search'? It seems that
+# AWStats will not find search strings containing a slash.
+# Maybe use a search string without a slash and, if necessary,
+# add an entry to %NotSearchEnginesKeys if this search string also
+# matches referrers that are not search engines.
+# (b) A unique string that identifies the search engine within AWStats.
+# (c) A regular expression that finds the start of the query part in the
+# referrer string.
+# (d) An HTML fragment that goes into the reports generated by AWStats and
+# identifies the search engine to human readers of the report. In the
+# simplest case this is a string containing the name of the search
+# engine. You can also provide an HTML link that shows the name and
+# points to the search engine.
+#
+# The regular expression (a) goes into SearchEnginesSearchIDOrder_list1
+# or ..._list2. List 1 contains common search engines, list 2 those
+# that are not so often used.
+#
+# SearchEnginesHashID contains two consecutive entries for each search
+# engine: the regular expression (a) followed by the search engine
+# identifier (b).
+#
+# SearchEnginesKnownUrl specifies how to find the start of the query.
+# For each search engine you enter the search engine identifier (b)
+# followed by the regular expression (c). Unclear: It is possible to
+# omit this entry. If you do this, how will AWStats find the start of
+# the query?
+#
+# SearchEnginesHashLib also contains two entries for each search engine:
+# the search engine identifier (b) followed by the HTML fragment (d).
+#
+# There are search engines that do not use a query part in their URLs.
+# They put the search expression in the main part of the URL instead.
+# AWStats is able to handle these cases. They are specified as described
+# above, except for the following two things:
+# - The regular expression (c) searches the complete URL and not only
+# the query part.
+# - An additional entry in the hash %SearchEnginesWithKeysNotInQuery is
+# necessary.
+#
+#
+# AWStats runs a sanity check on the contents of search_engines.pm. This
+# check detects the following errors:
+# - Inconsistencies in the number of entries.
+# It does not detect the following errors:
+# - A syntactically incorrect HTML fragment (d).
+#
#------------------------------------------------------------------------------
# $Revision$ - $Author$ - $Date$
@@ -14,7 +66,7 @@
# kataweb http://kataweb.it/
# corrected uk looksmart
# 'askuk','ask=', 'bbc','q=', 'freeserve','q=', 'looksmart','key=',
-# to
+# to
# 'askuk','ask=', 'bbc','q=', 'freeserve','q=', 'looksmartuk','key=',
# corrected spelling
# internationnal -> international
@@ -38,7 +90,7 @@
# added sify.com (India)
# added sogou.com (Cina) [https://sourceforge.net/forum/message.php?msg_id=3501603]
# Ask changes:
-# - added Ask Japan (ask.jp)
+# - added Ask Japan (ask.jp)
# - break out Ask new country level variants (DE, ES, FR, IT, NL)
# - updated Ask name from Ask Jevees
# - added Ask q= parameter - many recent searches probably not recognized; [https://sourceforge.net/forum/message.php?msg_id=3465444]
@@ -62,16 +114,16 @@
# added wwweasel.de http://wwweasel.de
# added Yahoo Mindset! http://mindset.research.yahoo.com/
# updated Mirago query parameter recognition (qry=); added breakout for each country (France, Germany, Spain, Italy, Norway, Sweden, Denmark, Netherlands, Belgium, Switzerland)
-# 2006-05-13 Sean Carlos http://www.antezeta.com/awstats.html
+# 2006-05-13 Sean Carlos http://www.antezeta.com/awstats.html
# added Google cache IPs 64.233.183.104 & 66.102.7.104
-# 2006-05-20 Sean Carlos http://www.antezeta.com/awstats.html
+# 2006-05-20 Sean Carlos http://www.antezeta.com/awstats.html
# anzwers.com.au
# schoenerbrausen.de http://www.schoenerbrausen.de/
# added Google cache IP 216.239.59.104
# answerbus http://www.answerbus.com/ (does not provide keywords)
# 2006-05-23 Sean Carlos http://www.antezeta.com/awstats.html
# added Google cache IP 66.102.9.104, 64.233.161.104
-# 2006-06-23 Sean Carlos http://www.antezeta.com/awstats.html
+# 2006-06-23 Sean Carlos http://www.antezeta.com/awstats.html
# added Alice Search search.alice.it
# added GoodSearch http://www.goodsearch.com/ (does not provide keywords) "a Yahoo-powered search engine that donates money to your favorite charity or school each time you search the web"
# added googlee.com, variant of Google
@@ -87,8 +139,8 @@
# 216\.239\.(51|59)\.104 to 216\.239\.5[0-9]\.104
# 66\.102\.(7|9)\.104 to 66\.102\.[1-9]\.104
# 2006-06-27 Sean Carlos http://www.antezeta.com/awstats.html
-# added Onet.pl http://szukaj.onet.pl/
-# corrected name "Wirtualna Polska" from "Szukaj" (search); added link http://szukaj.wp.pl/
+# added Onet.pl http://szukaj.onet.pl/
+# corrected name "Wirtualna Polska" from "Szukaj" (search); added link http://szukaj.wp.pl/
# 2006-06-30 Sean Carlos http://www.antezeta.com/awstats.html
# Additional Polish Search Engines:
# added Dodaj.pl http://www.dodaj.pl/
@@ -116,9 +168,9 @@
# added filter for google groups. Attempt to parse group name as keyword.
-# 2006-09-14
+# 2006-09-14
# added Eniro Sverige http://www.eniro.se/
-# added MyWebSearch http://search.mywebsearch.com/
+# added MyWebSearch http://search.mywebsearch.com/
# added Teecno http://www.teecno.it/ Italian Open Source Search Engine
#package AWSSE;
@@ -165,8 +217,8 @@
'googlecom\.com',
'goggle\.co\.hu',
'216\.239\.(35|37|39|51)\.100',
-'216\.239\.(35|37|39|51)\.101',
-'216\.239\.5[0-9]\.104',
+'216\.239\.(35|37|39|51)\.101',
+'216\.239\.5[0-9]\.104',
'64\.233\.1[0-9]{2}\.104',
'66\.102\.[1-9]\.104',
'66\.249\.93\.104',
@@ -188,7 +240,7 @@
'netscape\.',
'search\.terra\.',
'www\.search\.com',
-'search\.sli\.sympatico\.ca',
+'search\.sli\.sympatico\.ca',
'excite\.'
);
@@ -225,7 +277,7 @@
'dogpile\.com',
'wisenut\.com',
'ixquick\.com',
-'search\.earthlink\.net',
+'search\.earthlink\.net',
'i-une\.com',
'blingo\.com',
'centraldatabase\.org',
@@ -242,6 +294,15 @@
'avantfind\.com',
'steadysearch\.com',
'steady-search\.com',
+'claro-search\.com',
+'www1\.search-results\.com',
+'www\.holasearch\.com',
+'search\.conduit\.com',
+'static\.flipora\.com',
+'(?:www[12]?|mixidj)\.delta-search\.com',
+'start\.iminent\.com',
+'www\.searchmobileonline\.com',
+'int\.search-results\.com',
# Chello Portals
'chello\.at',
'chello\.be',
@@ -254,7 +315,7 @@
'chello\.se',
'chello\.sk',
'chello', # required as catchall for new countries not yet known
-# Mirago
+# Mirago
'mirago\.be',
'mirago\.ch',
'mirago\.de',
@@ -298,20 +359,24 @@
'\.zhongsou\.com', # zhongsou search portal
# Minor czech search engines
'atlas\.cz','seznam\.cz','quick\.cz','centrum\.cz','jyxo\.(cz|com)','najdi\.to','redbox\.cz',
-# Minor danish search-engines
+'isearch\.avg\.com',
+# Minor danish search-engines
'opasia\.dk', 'danielsen\.com', 'sol\.dk', 'jubii\.dk', 'find\.dk', 'edderkoppen\.dk', 'netstjernen\.dk', 'orbis\.dk', 'tyfon\.dk', '1klik\.dk', 'ofir\.dk',
# Minor dutch search engines
'ilse\.','vindex\.',
# Minor english search engines
'(^|\.)ask\.co\.uk','bbc\.co\.uk/cgi-bin/search','ifind\.freeserve','looksmart\.co\.uk','splut\.','spotjockey\.','ukdirectory\.','ukindex\.co\.uk','ukplus\.','searchy\.co\.uk',
+'search\.fbdownloader\.com',
+'search\.babylon\.com',
# Minor finnish search engines
'haku\.www\.fi',
# Minor french search engines
'recherche\.aol\.fr','ctrouve\.','francite\.','\.lbb\.org','rechercher\.libertysurf\.fr', 'search[\w\-]+\.free\.fr', 'recherche\.club-internet\.fr',
-'toile\.com', 'biglotron\.com',
-'mozbot\.fr',
+'toile\.com', 'biglotron\.com',
+'mozbot\.fr',
# Minor german search engines
'sucheaol\.aol\.de',
+'o2suche\.aol\.de',
'fireball\.de','infoseek\.de','suche\d?\.web\.de','[a-z]serv\.rrzn\.uni-hannover\.de',
'suchen\.abacho\.de','(brisbane|suche)\.t-online\.de','allesklar\.de','meinestadt\.de',
'212\.227\.33\.241',
@@ -319,6 +384,12 @@
'wwweasel\.de',
'netluchs\.de',
'schoenerbrausen\.de',
+'suche\.gmx\.net',
+'ecosia\.org',
+'de\.aolsearch\.com',
+'suche\.aol\.de',
+'www\.startxxl\.com',
+'www\.benefind\.de',
# Minor Hungarian search engines
'heureka\.hu','vizsla\.origo\.hu','lapkereso\.hu','goliat\.hu','index\.hu','wahoo\.hu','webmania\.hu','search\.internetto\.hu',
'tango\.hu',
@@ -329,6 +400,8 @@
# Minor Italian search engines
'virgilio\.it','arianna\.libero\.it','supereva\.com','kataweb\.it','search\.alice\.it\.master','search\.alice\.it','gotuneed\.com',
'godado','jumpy\.it','shinyseek\.it','teecno\.it',
+# Minor Israeli search engines
+'search\.genieo\.com',
# Minor Japanese search engines
'ask\.jp','sagool\.jp',
# Minor Norwegian search engines
@@ -459,6 +532,15 @@
'avantfind\.com','avantfind',
'steadysearch\.com','steadysearch',
'steady-search\.com','steadysearch',
+'claro-search\.com','clarosearch',
+'www1\.search-results\.com', 'searchresults',
+'www\.holasearch\.com', 'holasearch',
+'search\.conduit\.com', 'conduit',
+'static\.flipora\.com', 'flipora',
+'(?:www[12]?|mixidj)\.delta-search\.com', 'delta-search',
+'start\.iminent\.com', 'iminent',
+'www\.searchmobileonline\.com', 'searchmobileonline',
+'int\.search-results\.com', 'nortonsavesearch',
# Chello Portals
'chello\.at','chelloat',
'chello\.be','chellobe',
@@ -471,7 +553,7 @@
'chello\.se','chellose',
'chello\.sk','chellosk',
'chello','chellocom',
-# Mirago
+# Mirago
'mirago\.be','miragobe',
'mirago\.ch','miragoch',
'mirago\.de','miragode',
@@ -522,7 +604,8 @@
'jyxo\.(cz|com)','jyxo',
'najdi\.to','najdi',
'redbox\.cz','redbox',
-# Minor danish search-engines
+'isearch\.avg\.com', 'avgsearch',
+# Minor danish search-engines
'opasia\.dk','opasia',
'danielsen\.com','danielsen',
'sol\.dk','sol',
@@ -547,6 +630,8 @@
'ukindex\.co\.uk','ukindex',
'ukplus\.','ukplus',
'searchy\.co\.uk','searchy',
+'search\.fbdownloader\.com','fbdownloader',
+'search\.babylon\.com', 'babylon',
# Minor finnish search engines
'haku\.www\.fi','haku',
# Minor french search engines
@@ -562,6 +647,7 @@
'mozbot\.fr', 'mozbot',
# Minor german search engines
'sucheaol\.aol\.de','aolde',
+'o2suche\.aol\.de','o2aolde',
'fireball\.de','fireball',
'infoseek\.de','infoseek',
'suche\d?\.web\.de','webde',
@@ -575,6 +661,12 @@
'wwweasel\.de','wwweasel',
'netluchs\.de','netluchs',
'schoenerbrausen\.de','schoenerbrausen',
+'suche\.gmx\.net', 'gmxsuche',
+'ecosia\.org', 'ecosiasearch',
+'de\.aolsearch\.com', 'aolsearch',
+'suche\.aol\.de', 'aolsuche',
+'www\.startxxl\.com', 'startxxl',
+'www\.benefind\.de', 'benefind',
# Minor Hungarian search engines
'heureka\.hu','heureka',
'vizsla\.origo\.hu','origo',
@@ -601,6 +693,8 @@
'jumpy\.it','jumpy\.it',
'shinyseek\.it','shinyseek\.it',
'teecno\.it','teecnoit',
+# Minor Israeli search engines
+'search\.genieo\.com', 'genieo',
# Minor Japanese search engines
'ask\.jp','askjp',
'sagool\.jp','sagool',
@@ -647,7 +741,8 @@
# List of search engines that store keyword as page instead of query parameter
#------------------------------------------------------------------------------
%SearchEnginesWithKeysNotInQuery=(
-'a9',1 # www.a9.com/searckey1%20searchkey2
+'a9',1, # www.a9.com/searchkey1%20searchkey2
+'iminent',1 # http://start.iminent.com/StartWeb/1031/toolbox/#q=searchkey1%20searchkey2&additional_arguments
);
# SearchEnginesKnownUrl
@@ -658,7 +753,7 @@
'alexa','q=',
'alltheweb','q(|uery)=',
'altavista','q=',
-'a9','a9\.com\/',
+'a9','a9\.com\/',
'dmoz','search=',
'google_products','(p|q|as_p|as_q)=',
'google_base','(p|q|as_p|as_q)=',
@@ -679,7 +774,7 @@
'search.com','q=',
'yahoo_mindset','p=',
'yahoo','p=',
-'sympatico', 'query=',
+'sympatico', 'query=',
'excite','search=',
# Minor international search engines
'google4counter','(p|q|as_p|as_q)=',
@@ -709,7 +804,7 @@
'spray','string=',
'teoma','q=',
'webcrawler','searchText=',
-'wisenut','query=',
+'wisenut','query=',
'ixquick', 'query=',
'earthlink', 'q=',
'iune','(keywords|q)=',
@@ -728,6 +823,15 @@
'copernic','web\/',
'avantfind','keywords=',
'steadysearch','w=',
+'clarosearch','q=',
+'searchresults','q=',
+'holasearch', 'q=',
+'conduit', 'q=',
+'flipora', 'q=',
+'delta-search', 'q=',
+'iminent', 'q=',
+'searchmobileonline', 'q=',
+'nortonsavesearch', 'q=',
# Chello Portals
'chelloat','q1=',
'chellobe','q1=',
@@ -784,6 +888,7 @@
'vnet','kw=',
# Minor czech search engines
'atlas','(searchtext|q)=', 'seznam','(w|q)=', 'quick','query=', 'centrum','q=', 'jyxo','(s|q)=', 'najdi','dotaz=', 'redbox','srch=',
+'avgsearch', 'q=',
# Minor danish search engines
'opasia','q=', 'danielsen','q=', 'sol','q=', 'jubii','soegeord=', 'finddk','words=', 'edderkoppen','query=', 'orbis','search_field=', '1klik','query=', 'ofir','querytext=',
# Minor dutch search engines
@@ -791,6 +896,8 @@
# Minor english search engines
'askuk','(ask|q)=', 'bbc','q=', 'freeserve','q=', 'looksmartuk','key=',
'splut','pattern=', 'spotjockey','Search_Keyword=', 'ukindex', 'stext=', 'ukdirectory','k=', 'ukplus','search=', 'searchy', 'search_term=',
+'fbdownloader','q=',
+'babylon','q=',
# Minor finnish search engines
'haku','w=',
# Minor french search engines
@@ -800,13 +907,20 @@
'mozbot','q=',
# Minor german search engines
'aolde','q=',
+'o2aolde', 'q=',
'fireball','q=', 'infoseek','qt=', 'webde','su=',
-'abacho','q=', 't-online','q=',
+'abacho','q=', 't-online','q=',
'metaspinner','qry=',
'metacrawler_de','qry=',
'wwweasel','q=',
'netluchs','query=',
'schoenerbrausen','q=',
+'gmxsuche', 'q=',
+'ecosiasearch', 'q=',
+'aolsearch', 'q=',
+'aolsuche', 'q=',
+'startxxl', 'q=',
+'benefind', 'q=',
# Minor Hungarian search engines
'heureka','heureka=', 'origo','(q|search)=', 'goliat','KERESES=', 'wahoo','q=', 'internetto','searchstr=',
'keresolap_hu','q=',
@@ -826,6 +940,8 @@
'jumpy\.it','searchWord=',
'shinyseek\.it','KEY=',
'teecnoit','q=',
+# Minor Israeli search engines
+'genieo','q=',
# Minor Japanese search engines
'askjp','(ask|q)=',
'sagool','q=',
@@ -941,8 +1057,8 @@
'spray','Spray',
'teoma','Teoma', # Replace 'directhit\.com','DirectHit',
'webcrawler','WebCrawler',
-'wisenut','WISENut',
-'ixquick','ix quick',
+'wisenut','WISENut',
+'ixquick','ix quick',
'earthlink', 'Earth Link',
'iune','i-une',
'blingo','Blingo',
@@ -959,6 +1075,15 @@
'copernic','Copernic',
'avantfind','Avantfind',
'steadysearch','Avantfind',
+'clarosearch','Claro Search',
+'searchresults','Search-results',
+'holasearch', 'Hola Search',
+'conduit', 'Conduit Search',
+'flipora', 'Flipora',
+'delta-search', 'Delta Search',
+'iminent', 'Iminent',
+'searchmobileonline', 'Search Mobile Online (StartApp)',
+'nortonsavesearch', 'Norton Safe Search',
# Chello Portals
'chelloat','Chello Austria',
'chellobe','Chello Belgium',
@@ -1015,16 +1140,19 @@
'vnet','VNet',
# Minor czech search engines
'atlas','Atlas.cz', 'seznam','Seznam', 'quick','Quick.cz', 'centrum','Centrum.cz', 'jyxo','Jyxo.cz', 'najdi','Najdi.to', 'redbox','RedBox.cz',
+'avgsearch', 'AVG Secure Search',
# Minor danish search-engines
'opasia','Opasia', 'danielsen','Thor (danielsen.com)', 'sol','SOL', 'jubii','Jubii', 'finddk','Find', 'edderkoppen','Edderkoppen', 'netstjernen','Netstjernen', 'orbis','Orbis', 'tyfon','Tyfon', '1klik','1Klik', 'ofir','Ofir',
# Minor dutch search engines
-'ilse','Ilse','vindex','Vindex\.nl',
+'ilse','Ilse','vindex','Vindex\.nl',
# Minor english search engines
'askuk','Ask UK',
'bbc','BBC', 'freeserve','Freeserve', 'looksmartuk','Looksmart UK',
'splut','Splut', 'spotjockey','Spotjockey', 'ukdirectory','UK Directory', 'ukindex','UKIndex', 'ukplus','UK Plus', 'searchy','searchy.co.uk',
+'fbdownloader','FBDownloader',
+'babylon','Babylon',
# Minor finnish search engines
-'haku','Ihmemaa',
+'haku','Ihmemaa',
# Minor french search engines
'aolfr','AOL (fr)', 'ctrouve','C\'est trouve', 'francite','Francite', 'lbb', 'LBB', 'libertysurf', 'Libertysurf', 'free', 'Free.fr', 'clubinternet', 'Club-internet',
'toile', 'Toile du Quebec',
@@ -1032,14 +1160,21 @@
'mozbot','Mozbot',
# Minor German search engines
'aolde','AOL (de)',
+'o2aolde', 'o2 Suche',
'fireball','Fireball', 'infoseek','Infoseek', 'webde','Web.de',
-'abacho','Abacho', 't-online','T-Online',
-'allesklar','allesklar.de', 'meinestadt','meinestadt.de',
+'abacho','Abacho', 't-online','T-Online',
+'allesklar','allesklar.de', 'meinestadt','meinestadt.de',
'metaspinner','metaspinner',
'metacrawler_de','metacrawler.de',
'wwweasel','WWWeasel',
'netluchs','Netluchs',
'schoenerbrausen','Schoenerbrausen/',
+'gmxsuche', 'GMX Suche',
+'ecosiasearch', 'Ecosia Search',
+'aolsearch', 'AOL Search',
+'aolsuche', 'AOL Suche',
+'startxxl', 'StartXXL',
+'benefind', 'benefind',
# Minor hungarian search engines
'heureka','Heureka', 'origo','Origo-Vizsla', 'lapkereso','Startlapkereso', 'goliat','Goliat', 'indexhu','Index', 'wahoo','Wahoo', 'webmania','webmania.hu', 'internetto','Internetto Kereso',
'tango_hu','Tango',
@@ -1059,11 +1194,13 @@
'jumpy\.it','Jumpy.it',
'shinyseek\.it','Shinyseek.it',
'teecnoit','Teecno',
+# Minor Israeli search engines
+'genieo','Genieo',
# Minor Japanese search engines
'askjp','Ask Japan',
'sagool','Sagool',
# Minor Norwegian search engines
-'start','start.no', 'eniro','Eniro',
+'start','start.no', 'eniro','Eniro',
# Minor polish search engines
'wp','Wirtualna Polska',
'onetpl','Onet.pl',
@@ -1088,7 +1225,7 @@
# Minor Portuguese search engines
'sapo','Sapo',
# Minor Swiss search engines
-'searchch', 'search.ch', 'bluewin', 'search.bluewin.ch',
+'searchch', 'search.ch', 'bluewin', 'search.bluewin.ch',
# Minor Croatian, Serbian, Macedonian, Bosnian and Herzegovinian search engines
'pogodak','Pogodak.com',
# Generic search engines