From: eldy <>
Date: Wed, 5 Jun 2013 08:18:28 +0000 (+0000)
Subject: Complete search engine list
X-Git-Tag: AWSTATS_7_2~6
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=b74996525d9d4bc6214b0a89fa73daa1d404eeeb;p=thirdparty%2FAWStats.git

Complete search engine list
---
diff --git a/wwwroot/cgi-bin/lib/search_engines.pm b/wwwroot/cgi-bin/lib/search_engines.pm
index 480184bd..52907ab9 100644
--- a/wwwroot/cgi-bin/lib/search_engines.pm
+++ b/wwwroot/cgi-bin/lib/search_engines.pm
@@ -4,6 +4,58 @@
 # you must add an entry in SearchEnginesSearchIDOrder, SearchEnginesHashID and in
 # SearchEnginesHashLib.
 # An entry if known in SearchEnginesKnownUrl is also welcome.
+#
+# to eldy: Please check if the following description is correct:
+# You need the following information to specify a search engine:
+# (a) A regular expression that matches the referrer string of the
+# search engine. Unclear: What about slashes in the name of
+# a search engine, e.g. as in 'ecosia.org/search'? It seems that
+# AWStats will not find search strings containing a slash.
+# Maybe use a search string without a slash and, if necessary,
+# an entry in %NotSearchEnginesKeys if this search string
+# matches entries that are not search engines.
+# (b) A unique string to identify the search engine within AWStats.
+# (c) A regular expression that finds the start of the query part in the
+# referrer string.
+# (d) An HTML fragment that goes into the reports generated by AWStats and
+# identifies the search engine to a human reader of the report. In the
+# simplest case this is a string containing the name of the search
+# engine. You can also provide a hypertext clause that presents the
+# name together with a link to the search engine.
+#
+# The regular expression (a) goes into SearchEnginesSearchIDOrder_list1
+# or ..._list2. List 1 contains common search engines, list 2 those
+# that are used less often.
+#
+# SearchEnginesHashID contains two consecutive entries for each search
+# engine: the regular expression (a) followed by the search engine
+# identifier (b).
+#
+# SearchEnginesKnownUrl specifies how to find the start of the query.
+# For each search engine you enter the search engine identifier (b)
+# followed by the regular expression (c). Unclear: It is possible to
+# omit this entry. If you do this, how will AWStats find the start of
+# the query?
+#
+# SearchEnginesHashLib also contains two entries for each search engine:
+# the search engine identifier (b) followed by the HTML fragment (d).
+#
+# There are search engines that do not use a query part in their URLs.
+# They put the search expression in the main part of the URL instead.
+# AWStats is able to handle these cases. They are specified as described
+# above, with two exceptions:
+# - The regular expression (c) searches the complete URL and not only
+# the query part.
+# - An additional entry in the hash %SearchEnginesWithKeysNotInQuery is
+# necessary.
+#
+#
+# AWStats runs a sanity check of the contents of search_engines.pm. This
+# check detects the following:
+# - Inconsistencies (number of entries)
+# It does not detect the following errors:
+# - A syntactically incorrect HTML fragment (d).
+#
 #------------------------------------------------------------------------------
 # $Revision$ - $Author$ - $Date$
 
@@ -14,7 +66,7 @@
 # kataweb http://kataweb.it/
 # corrected uk looksmart
 # 'askuk','ask=', 'bbc','q=', 'freeserve','q=', 'looksmart','key=',
-# to
+# to
 # 'askuk','ask=', 'bbc','q=', 'freeserve','q=', 'looksmartuk','key=',
 # corrected spelling
 # internationnal -> international
@@ -38,7 +90,7 @@
 # added sify.com (India)
 # added sogou.com (Cina) [https://sourceforge.net/forum/message.php?msg_id=3501603]
 # Ask changes:
-# - added Ask Japan (ask.jp)
+# - added Ask Japan (ask.jp)
 # - break out Ask new country level variants (DE, ES, FR, IT, NL)
 # - updated Ask name from Ask Jevees
 # - added Ask q= parameter - many recent searches probably not recognized; [https://sourceforge.net/forum/message.php?msg_id=3465444]
@@ -62,16 +114,16 @@
 # added wwweasel.de http://wwweasel.de
 # added Yahoo Mindset! http://mindset.research.yahoo.com/
 # updated Mirago query parameter recognition (qry=); added breakout for each country (France, Germany, Spain, Italy, Norway, Sweden, Denmark, Netherlands, Belgium, Switzerland)
-# 2006-05-13 Sean Carlos http://www.antezeta.com/awstats.html
+# 2006-05-13 Sean Carlos http://www.antezeta.com/awstats.html
 # added Google cache IPs 64.233.183.104 & 66.102.7.104
-# 2006-05-20 Sean Carlos http://www.antezeta.com/awstats.html
+# 2006-05-20 Sean Carlos http://www.antezeta.com/awstats.html
 # anzwers.com.au
 # schoenerbrausen.de http://www.schoenerbrausen.de/
 # added Google cache IP 216.239.59.104
 # answerbus http://www.answerbus.com/ (does not provide keywords)
 # 2006-05-23 Sean Carlos http://www.antezeta.com/awstats.html
 # added Google cache IP 66.102.9.104, 64.233.161.104
-# 2006-06-23 Sean Carlos http://www.antezeta.com/awstats.html
+# 2006-06-23 Sean Carlos http://www.antezeta.com/awstats.html
 # added Alice Search search.alice.it
 # added GoodSearch http://www.goodsearch.com/ (does not provide keywords) "a Yahoo-powered search engine that donates money to your favorite charity or school each time you search the web"
 # added googlee.com, variant of Google
@@ -87,8 +139,8 @@
 # 216\.239\.(51|59)\.104 to 216\.239\.5[0-9]\.104
 # 66\.102\.(7|9)\.104 to 66\.102\.[1-9]\.104
 # 2006-06-27 Sean Carlos http://www.antezeta.com/awstats.html
-# added Onet.pl http://szukaj.onet.pl/
-# corrected name "Wirtualna Polska" from "Szukaj" (search); added link http://szukaj.wp.pl/
+# added Onet.pl http://szukaj.onet.pl/
+# corrected name "Wirtualna Polska" from "Szukaj" (search); added link http://szukaj.wp.pl/
 # 2006-06-30 Sean Carlos http://www.antezeta.com/awstats.html
 # Additional Polish Search Engines:
 # added Dodaj.pl http://www.dodaj.pl/
@@ -116,9 +168,9 @@
 # added filter for google groups. Attempt to parse group name as keyword.
 
 
-# 2006-09-14
+# 2006-09-14
 # added Eniro Sverige http://www.eniro.se/
-# added MyWebSearch http://search.mywebsearch.com/
+# added MyWebSearch http://search.mywebsearch.com/
 # added Teecno http://www.teecno.it/ Italian Open Source Search Engine
 
 #package AWSSE;
@@ -165,8 +217,8 @@
 'googlecom\.com',
 'goggle\.co\.hu',
 '216\.239\.(35|37|39|51)\.100',
-'216\.239\.(35|37|39|51)\.101',
-'216\.239\.5[0-9]\.104',
+'216\.239\.(35|37|39|51)\.101',
+'216\.239\.5[0-9]\.104',
 '64\.233\.1[0-9]{2}\.104',
 '66\.102\.[1-9]\.104',
 '66\.249\.93\.104',
@@ -188,7 +240,7 @@
 'netscape\.',
 'search\.terra\.',
 'www\.search\.com',
-'search\.sli\.sympatico\.ca',
+'search\.sli\.sympatico\.ca',
 'excite\.'
 );
 
@@ -225,7 +277,7 @@
 'dogpile\.com',
 'wisenut\.com',
 'ixquick\.com',
-'search\.earthlink\.net',
+'search\.earthlink\.net',
 'i-une\.com',
 'blingo\.com',
 'centraldatabase\.org',
@@ -242,6 +294,15 @@
 'avantfind\.com',
 'steadysearch\.com',
 'steady-search\.com',
+'claro-search\.com',
+'www1\.search-results\.com',
+'www\.holasearch\.com',
+'search\.conduit\.com',
+'static\.flipora\.com',
+'(?:www[12]?|mixidj)\.delta-search\.com',
+'start\.iminent\.com',
+'www\.searchmobileonline\.com',
+'int\.search-results\.com',
 # Chello Portals
 'chello\.at',
 'chello\.be',
@@ -254,7 +315,7 @@
 'chello\.se',
 'chello\.sk',
 'chello', # required as catchall for new countries not yet known
-# Mirago
+# Mirago
 'mirago\.be',
 'mirago\.ch',
 'mirago\.de',
@@ -298,20 +359,24 @@
 '\.zhongsou\.com', # zhongsou search portal
 # Minor czech search engines
 'atlas\.cz','seznam\.cz','quick\.cz','centrum\.cz','jyxo\.(cz|com)','najdi\.to','redbox\.cz',
-# Minor danish search-engines
+'isearch\.avg\.com',
+# Minor danish search-engines
 'opasia\.dk', 'danielsen\.com', 'sol\.dk', 'jubii\.dk', 'find\.dk', 'edderkoppen\.dk', 'netstjernen\.dk', 'orbis\.dk', 'tyfon\.dk', '1klik\.dk', 'ofir\.dk',
 # Minor dutch search engines
 'ilse\.','vindex\.',
 # Minor english search engines
 '(^|\.)ask\.co\.uk','bbc\.co\.uk/cgi-bin/search','ifind\.freeserve','looksmart\.co\.uk','splut\.','spotjockey\.','ukdirectory\.','ukindex\.co\.uk','ukplus\.','searchy\.co\.uk',
+'search\.fbdownloader\.com',
+'search\.babylon\.com',
 # Minor finnish search engines
 'haku\.www\.fi',
 # Minor french search engines
 'recherche\.aol\.fr','ctrouve\.','francite\.','\.lbb\.org','rechercher\.libertysurf\.fr', 'search[\w\-]+\.free\.fr', 'recherche\.club-internet\.fr',
-'toile\.com', 'biglotron\.com',
-'mozbot\.fr',
+'toile\.com', 'biglotron\.com',
+'mozbot\.fr',
 # Minor german search engines
 'sucheaol\.aol\.de',
+'o2suche\.aol\.de',
 'fireball\.de','infoseek\.de','suche\d?\.web\.de','[a-z]serv\.rrzn\.uni-hannover\.de',
 'suchen\.abacho\.de','(brisbane|suche)\.t-online\.de','allesklar\.de','meinestadt\.de',
 '212\.227\.33\.241',
@@ -319,6 +384,12 @@
 'wwweasel\.de',
 'netluchs\.de',
 'schoenerbrausen\.de',
+'suche\.gmx\.net',
+'ecosia\.org',
+'de\.aolsearch\.com',
+'suche\.aol\.de',
+'www\.startxxl\.com',
+'www\.benefind\.de',
 # Minor Hungarian search engines
 'heureka\.hu','vizsla\.origo\.hu','lapkereso\.hu','goliat\.hu','index\.hu','wahoo\.hu','webmania\.hu','search\.internetto\.hu',
 'tango\.hu',
@@ -329,6 +400,8 @@
 # Minor Italian search engines
 'virgilio\.it','arianna\.libero\.it','supereva\.com','kataweb\.it','search\.alice\.it\.master','search\.alice\.it','gotuneed\.com',
 'godado','jumpy\.it','shinyseek\.it','teecno\.it',
+# Minor Israeli search engines
+'search\.genieo\.com',
 # Minor Japanese search engines
 'ask\.jp','sagool\.jp',
 # Minor Norwegian search engines
@@ -459,6 +532,15 @@
 'avantfind\.com','avantfind',
 'steadysearch\.com','steadysearch',
 'steady-search\.com','steadysearch',
+'claro-search\.com','clarosearch',
+'www1\.search-results\.com', 'searchresults',
+'www\.holasearch\.com', 'holasearch',
+'search\.conduit\.com', 'conduit',
+'static\.flipora\.com', 'flipora',
+'(?:www[12]?|mixidj)\.delta-search\.com', 'delta-search',
+'start\.iminent\.com', 'iminent',
+'www\.searchmobileonline\.com', 'searchmobileonline',
+'int\.search-results\.com', 'nortonsavesearch',
 # Chello Portals
 'chello\.at','chelloat',
 'chello\.be','chellobe',
@@ -471,7 +553,7 @@
 'chello\.se','chellose',
 'chello\.sk','chellosk',
 'chello','chellocom',
-# Mirago
+# Mirago
 'mirago\.be','miragobe',
 'mirago\.ch','miragoch',
 'mirago\.de','miragode',
@@ -522,7 +604,8 @@
 'jyxo\.(cz|com)','jyxo',
 'najdi\.to','najdi',
 'redbox\.cz','redbox',
-# Minor danish search-engines
+'isearch\.avg\.com', 'avgsearch',
+# Minor danish search-engines
 'opasia\.dk','opasia',
 'danielsen\.com','danielsen',
 'sol\.dk','sol',
@@ -547,6 +630,8 @@
 'ukindex\.co\.uk','ukindex',
 'ukplus\.','ukplus',
 'searchy\.co\.uk','searchy',
+'search\.fbdownloader\.com','fbdownloader',
+'search\.babylon\.com', 'babylon',
 # Minor finnish search engines
 'haku\.www\.fi','haku',
 # Minor french search engines
@@ -562,6 +647,7 @@
 'mozbot\.fr', 'mozbot',
 # Minor german search engines
 'sucheaol\.aol\.de','aolde',
+'o2suche\.aol\.de','o2aolde',
 'fireball\.de','fireball',
 'infoseek\.de','infoseek',
 'suche\d?\.web\.de','webde',
@@ -575,6 +661,12 @@
 'wwweasel\.de','wwweasel',
 'netluchs\.de','netluchs',
 'schoenerbrausen\.de','schoenerbrausen',
+'suche\.gmx\.net', 'gmxsuche',
+'ecosia\.org', 'ecosiasearch',
+'de\.aolsearch\.com', 'aolsearch',
+'suche\.aol\.de', 'aolsuche',
+'www\.startxxl\.com', 'startxxl',
+'www\.benefind\.de', 'benefind',
 # Minor Hungarian search engines
 'heureka\.hu','heureka',
 'vizsla\.origo\.hu','origo',
@@ -601,6 +693,8 @@
 'jumpy\.it','jumpy\.it',
 'shinyseek\.it','shinyseek\.it',
 'teecno\.it','teecnoit',
+# Minor Israeli search engines
+'search\.genieo\.com', 'genieo',
 # Minor Japanese search engines
 'ask\.jp','askjp',
 'sagool\.jp','sagool',
@@ -647,7 +741,8 @@
 # List of search engines that store keyword as page instead of query parameter
 #------------------------------------------------------------------------------
 %SearchEnginesWithKeysNotInQuery=(
-'a9',1 # www.a9.com/searckey1%20searchkey2
+'a9',1, # www.a9.com/searchkey1%20searchkey2
+'iminent',1 #http://start.iminent.com/StartWeb/1031/toolbox/#q=searchkey1%20searchkey2&additional_arguments
 );
 
 # SearchEnginesKnownUrl
@@ -658,7 +753,7 @@
 'alexa','q=',
 'alltheweb','q(|uery)=',
 'altavista','q=',
-'a9','a9\.com\/',
+'a9','a9\.com\/',
 'dmoz','search=',
 'google_products','(p|q|as_p|as_q)=',
 'google_base','(p|q|as_p|as_q)=',
@@ -679,7 +774,7 @@
 'search.com','q=',
 'yahoo_mindset','p=',
 'yahoo','p=',
-'sympatico', 'query=',
+'sympatico', 'query=',
 'excite','search=',
 # Minor international search engines
 'google4counter','(p|q|as_p|as_q)=',
@@ -709,7 +804,7 @@
 'spray','string=',
 'teoma','q=',
 'webcrawler','searchText=',
-'wisenut','query=',
+'wisenut','query=',
 'ixquick', 'query=',
 'earthlink', 'q=',
 'iune','(keywords|q)=',
@@ -728,6 +823,15 @@
 'copernic','web\/',
 'avantfind','keywords=',
 'steadysearch','w=',
+'clarosearch','q=',
+'searchresults','q=',
+'holasearch', 'q=',
+'conduit', 'q=',
+'flipora', 'q=',
+'delta-search', 'q=',
+'iminent', 'q=',
+'searchmobileonline', 'q=',
+'nortonsavesearch', 'q=',
 # Chello Portals
 'chelloat','q1=',
 'chellobe','q1=',
@@ -784,6 +888,7 @@
 'vnet','kw=',
 # Minor czech search engines
 'atlas','(searchtext|q)=', 'seznam','(w|q)=', 'quick','query=', 'centrum','q=', 'jyxo','(s|q)=', 'najdi','dotaz=', 'redbox','srch=',
+'avgsearch', 'q=',
 # Minor danish search engines
 'opasia','q=', 'danielsen','q=', 'sol','q=', 'jubii','soegeord=', 'finddk','words=', 'edderkoppen','query=', 'orbis','search_field=', '1klik','query=', 'ofir','querytext=',
 # Minor dutch search engines
@@ -791,6 +896,8 @@
 # Minor english search engines
 'askuk','(ask|q)=', 'bbc','q=', 'freeserve','q=', 'looksmartuk','key=',
 'splut','pattern=', 'spotjockey','Search_Keyword=', 'ukindex', 'stext=', 'ukdirectory','k=', 'ukplus','search=', 'searchy', 'search_term=',
+'fbdownloader','q=',
+'babylon','q=',
 # Minor finnish search engines
 'haku','w=',
 # Minor french search engines
@@ -800,13 +907,20 @@
 'mozbot','q=',
 # Minor german search engines
 'aolde','q=',
+'o2aolde', 'q=',
 'fireball','q=',
 'infoseek','qt=',
 'webde','su=',
-'abacho','q=', 't-online','q=',
+'abacho','q=', 't-online','q=',
 'metaspinner','qry=', 'metacrawler_de','qry=',
 'wwweasel','q=', 'netluchs','query=',
 'schoenerbrausen','q=',
+'gmxsuche', 'q=',
+'ecosiasearch', 'q=',
+'aolsearch', 'q=',
+'aolsuche', 'q=',
+'startxxl', 'q=',
+'benefind', 'q=',
 # Minor Hungarian search engines
 'heureka','heureka=', 'origo','(q|search)=', 'goliat','KERESES=', 'wahoo','q=', 'internetto','searchstr=',
 'keresolap_hu','q=',
@@ -826,6 +940,8 @@
 'jumpy\.it','searchWord=',
 'shinyseek\.it','KEY=',
 'teecnoit','q=',
+# Minor Israeli search engines
+'genieo','q=',
 # Minor Japanese search engines
 'askjp','(ask|q)=',
 'sagool','q=',
@@ -941,8 +1057,8 @@
 'spray','Spray',
 'teoma','Teoma', # Replace 'directhit\.com','DirectHit',
 'webcrawler','WebCrawler',
-'wisenut','WISENut',
-'ixquick','ix quick',
+'wisenut','WISENut',
+'ixquick','ix quick',
 'earthlink', 'Earth Link',
 'iune','i-une',
 'blingo','Blingo',
@@ -959,6 +1075,15 @@
 'copernic','Copernic',
 'avantfind','Avantfind',
 'steadysearch','Avantfind',
+'clarosearch','Claro Search',
+'searchresults','Search-results',
+'holasearch', 'Hola Search',
+'conduit', 'Conduit Search',
+'flipora', 'Flipora',
+'delta-search', 'Delta Search',
+'iminent', 'Iminent',
+'searchmobileonline', 'Search Mobile Online (StartApp)',
+'nortonsavesearch', 'Norton Safe Search',
 # Chello Portals
 'chelloat','Chello Austria',
 'chellobe','Chello Belgium',
@@ -1015,16 +1140,19 @@
 'vnet','VNet',
 # Minor czech search engines
 'atlas','Atlas.cz', 'seznam','Seznam', 'quick','Quick.cz', 'centrum','Centrum.cz', 'jyxo','Jyxo.cz', 'najdi','Najdi.to', 'redbox','RedBox.cz',
+'avgsearch', 'AVG Secure Search',
 # Minor danish search-engines
 'opasia','Opasia', 'danielsen','Thor (danielsen.com)', 'sol','SOL', 'jubii','Jubii', 'finddk','Find', 'edderkoppen','Edderkoppen', 'netstjernen','Netstjernen', 'orbis','Orbis', 'tyfon','Tyfon', '1klik','1Klik', 'ofir','Ofir',
 # Minor dutch search engines
-'ilse','Ilse','vindex','Vindex\.nl',
+'ilse','Ilse','vindex','Vindex\.nl',
 # Minor english search engines
 'askuk','Ask UK', 'bbc','BBC', 'freeserve','Freeserve', 'looksmartuk','Looksmart UK',
 'splut','Splut', 'spotjockey','Spotjockey', 'ukdirectory','UK Directory', 'ukindex','UKIndex', 'ukplus','UK Plus', 'searchy','searchy.co.uk',
+'fbdownloader','FBDownloader',
+'babylon','Babylon',
 # Minor finnish search engines
-'haku','Ihmemaa',
+'haku','Ihmemaa',
 # Minor french search engines
 'aolfr','AOL (fr)', 'ctrouve','C\'est trouve', 'francite','Francite', 'lbb', 'LBB', 'libertysurf', 'Libertysurf', 'free', 'Free.fr', 'clubinternet', 'Club-internet',
 'toile', 'Toile du Quebec',
@@ -1032,14 +1160,21 @@
 'mozbot','Mozbot',
 # Minor German search engines
 'aolde','AOL (de)',
+'o2aolde', 'o2 Suche',
 'fireball','Fireball',
 'infoseek','Infoseek',
 'webde','Web.de',
-'abacho','Abacho', 't-online','T-Online',
-'allesklar','allesklar.de', 'meinestadt','meinestadt.de',
+'abacho','Abacho', 't-online','T-Online',
+'allesklar','allesklar.de', 'meinestadt','meinestadt.de',
 'metaspinner','metaspinner', 'metacrawler_de','metacrawler.de',
 'wwweasel','WWWeasel', 'netluchs','Netluchs',
 'schoenerbrausen','Schoenerbrausen/',
+'gmxsuche', 'GMX Suche',
+'ecosiasearch', 'Ecosia Search',
+'aolsearch', 'AOL Search',
+'aolsuche', 'AOL Suche',
+'startxxl', 'StartXXL',
+'benefind', 'benefind',
 # Minor hungarian search engines
 'heureka','Heureka', 'origo','Origo-Vizsla', 'lapkereso','Startlapkereso', 'goliat','Goliat', 'indexhu','Index', 'wahoo','Wahoo', 'webmania','webmania.hu', 'internetto','Internetto Kereso',
 'tango_hu','Tango',
@@ -1059,11 +1194,13 @@
 'jumpy\.it','Jumpy.it',
 'shinyseek\.it','Shinyseek.it',
 'teecnoit','Teecno',
+# Minor Israeli search engines
+'genieo','Genieo',
 # Minor Japanese search engines
 'askjp','Ask Japan',
 'sagool','Sagool',
 # Minor Norwegian search engines
-'start','start.no', 'eniro','Eniro',
+'start','start.no', 'eniro','Eniro',
 # Minor polish search engines
 'wp','Wirtualna Polska',
 'onetpl','Onet.pl',
@@ -1088,7 +1225,7 @@
 # Minor Portuguese search engines
 'sapo','Sapo',
 # Minor Swiss search engines
-'searchch', 'search.ch', 'bluewin', 'search.bluewin.ch',
+'searchch', 'search.ch', 'bluewin', 'search.bluewin.ch',
 # Minor Croatian, Serbian, Macedonian, Bosnian and Herzegovinian search engines
 'pogodak','Pogodak.com',
 # Generic search engines
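The comment block added by this patch describes four values per search engine, (a) to (d), that are spread over four tables, plus %SearchEnginesWithKeysNotInQuery for engines that put the keywords outside the query string. The Perl sketch below shows one way those tables could fit together when a referrer is classified. It is an illustration under stated assumptions, not the code AWStats runs: the tables hold only the Ecosia and Iminent entries from this patch, find_search_engine and check_tables are hypothetical helpers, the rule that (c) must follow the start of the text or one of ? & # / is an assumption, keyword decoding only turns + and %20 into spaces, and %NotSearchEnginesKeys is ignored.

```perl
use strict;
use warnings;

# Hypothetical, cut-down copies of the tables in search_engines.pm,
# holding only two entries taken from this patch.
my @SearchEnginesSearchIDOrder = ('ecosia\.org', 'start\.iminent\.com');   # (a)
my %SearchEnginesHashID = (                   # (a) => (b)
    'ecosia\.org'         => 'ecosiasearch',
    'start\.iminent\.com' => 'iminent',
);
my %SearchEnginesKnownUrl = (                 # (b) => (c)
    'ecosiasearch' => 'q=',
    'iminent'      => 'q=',
);
my %SearchEnginesHashLib = (                  # (b) => (d)
    'ecosiasearch' => 'Ecosia Search',
    'iminent'      => 'Iminent',
);
my %SearchEnginesWithKeysNotInQuery = ('iminent' => 1);

# Hypothetical helper, not an AWStats function: returns (label, keywords)
# for a referrer URL, or an empty list when no regex (a) matches.
sub find_search_engine {
    my ($referrer) = @_;
    my ($page, $query) = split /\?/, $referrer, 2;   # text before '?', query part
    $query = '' unless defined $query;
    foreach my $regex (@SearchEnginesSearchIDOrder) { # first match wins
        next unless $page =~ /$regex/i;
        my $id    = $SearchEnginesHashID{$regex};
        my $label = $SearchEnginesHashLib{$id};
        my $start = $SearchEnginesKnownUrl{$id};
        return ($label, '') unless defined $start;    # engine known, keywords not
        # Engines listed in %SearchEnginesWithKeysNotInQuery: look for (c) in
        # the complete URL; all others: only in the query part.
        my $haystack = $SearchEnginesWithKeysNotInQuery{$id} ? $referrer : $query;
        # Assumption: (c) must follow the start of the text or one of ? & # /
        if ($haystack =~ /(?:^|[?&#\/])$start([^&#]*)/) {
            (my $keys = $1) =~ s/%20|\+/ /g;          # minimal decoding only
            return ($label, $keys);
        }
        return ($label, '');
    }
    return;
}

# Hypothetical consistency check: every regex (a) needs an identifier (b),
# and every identifier needs a label (d).
sub check_tables {
    my @problems;
    foreach my $regex (@SearchEnginesSearchIDOrder) {
        my $id = $SearchEnginesHashID{$regex};
        if (!defined $id) { push @problems, "no identifier for $regex"; next; }
        push @problems, "no label for $id" unless defined $SearchEnginesHashLib{$id};
    }
    return @problems;
}

foreach my $url ('http://ecosia.org/search?q=green+energy',
                 'http://start.iminent.com/StartWeb/1031/toolbox/#q=searchkey1%20searchkey2&additional_arguments') {
    my ($label, $keys) = find_search_engine($url);
    print "$label: $keys\n";
}
my @problems = check_tables();
print @problems ? join("\n", @problems) . "\n" : "tables consistent\n";
```

Run as-is, it prints "Ecosia Search: green energy", "Iminent: searchkey1 searchkey2" and "tables consistent". The Iminent URL carries its keywords after a # fragment and has no query part, so without its %SearchEnginesWithKeysNotInQuery entry the sketch would search only the empty query part and return the label without keywords; this is the case the new comment block describes.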