A Search Log Analysis of a Portuguese Web Search
A Search Log Analysis of a Portuguese Web Search Engine
Miguel Costa, Mário J. Silva
LaSIGE @ Faculty of Sciences, University of Lisbon
Foundation for National Scientific Computing
INFORUM 2010, Braga, Portugal

Problem
Do Portuguese users search in the same way as other users?
Does search behavior influence web search engine design?

Applications
• Speed
  – e.g. special indexes, cache
• Quality of results
  – e.g. better ranking
• Web design
  – e.g. making the most-used functionalities stand out

Summary
• Introduction
• Methodology & Dataset
• Results
• Conclusions

Search Log Analysis
PROS:
• Large and varied
• Less bias
• Cheap
• Non-intrusive
CONS:
• Lack of context
• Lack of control

Dataset
• Tumba
  – http://www.tumba.pt
• 2 full years
  – 2003 & 2004
  – several studies from the same period
  – baseline for future work
• 90% of the IP addresses → Portugal
• 98% of the interactions → Portuguese interface

How do users search?
[Figure: Session Duration (min), share of sessions per bin from [0,1[ to [240,inf[; callout: 80%]
Fast and short sessions
• Fast
• Few queries
• Few terms
• Few result pages
• Few clicks
[Figure: four histograms (Queries per Session, # Terms Changed, Result Page Viewed, Terms per Query); callouts: 87%, 82%, 75%]

Evolution from 2003 to 2004
• -½ term in query length
• +10% of sessions lasting less than 1 minute
• +9% of sessions with only one query
• +8% of sessions where only the first result page was viewed
Less data submitted, fewer results seen

What do users search for?
[Figure: Top Search Queries]

Topic Categories
 #  Category                                   2003 (% queries)  2004 (% queries)    ∆%
 1  Commerce, Travel, Employment or Economy          22.4              20.3         -2.1
 2  People, Places or Things                         14.8              17.7          2.9
 3  Health or Sciences                               10.5              11.8          1.3
 4  Education or Humanities                           7.2              10.5          3.3
 5  Society, Culture, Ethnicity or Religion           5.6               6.1          0.5
 6  Computers or Internet                             6.4               5.9         -0.5
 7  Sex or Pornography                                4.9               5.8          0.9
 8  Entertainment or Recreation                       8.7               5.1         -3.6
 9  Government                                        7.0               4.2         -2.8
10  Performing or Fine arts                           1.6               1.6          0.0
11  Unknown or Other                                 11.2              11.3          0.1

Comparison
world region           U.S.              Europe          Portugal
search engine          Excite            FAST            Tumba!
single term queries    20%-30%           25%-35%         40%
terms per query        2.6               2.3             2.2
result pages viewed    1.7               2.2             1.4
queries per session    2.3               2.9             2.49-2.94
topic most seen        Commerce, Travel  People, Places  Commerce, Travel
Less data submitted, fewer results seen

Conclusions
• Portuguese users
  – spend little time and effort on individual searches
  – tend to submit less data and see fewer results
  – search differently than other users
  – have specificities that can be used to tune web search engines

Future Work
• Updated characterization of Portuguese users
• Characterization of Portuguese users from web archives

Portuguese Web Archive
http://archive.pt
80% of the web documents are unavailable after 1 year

Questions
Thank you.
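The session measures reported in the results (session duration, queries per session, terms per query, single-term queries) could be derived from a query log roughly as sketched below. This is only an illustration, not the authors' method: the record layout `(user_id, timestamp, query)`, the function names and the 30-minute inactivity cutoff used to delimit sessions are all assumptions, since the slides do not say how sessions were defined.

```python
from collections import defaultdict
from datetime import timedelta

# Assumption: a session ends after 30 minutes of inactivity.
# The slides do not state the session boundary that was actually used.
SESSION_TIMEOUT = timedelta(minutes=30)


def split_sessions(records, timeout=SESSION_TIMEOUT):
    """Group (user_id, timestamp, query) records into per-user sessions.

    Each session is a time-ordered list of (timestamp, query) pairs.
    """
    by_user = defaultdict(list)
    for user_id, ts, query in records:
        by_user[user_id].append((ts, query))

    sessions = []
    for events in by_user.values():
        events.sort(key=lambda e: e[0])  # order by timestamp only
        current = [events[0]]
        for prev, event in zip(events, events[1:]):
            if event[0] - prev[0] > timeout:
                # Gap too long: close the current session, start a new one.
                sessions.append(current)
                current = []
            current.append(event)
        sessions.append(current)
    return sessions


def summarize(sessions):
    """Compute a few of the aggregate measures named in the slides."""
    n = len(sessions)
    queries = [q for s in sessions for _, q in s]
    # Duration = time between first and last query of a session, in minutes.
    durations = [(s[-1][0] - s[0][0]).total_seconds() / 60 for s in sessions]
    return {
        "queries_per_session": len(queries) / n,
        "terms_per_query": sum(len(q.split()) for q in queries) / len(queries),
        "single_term_queries": sum(len(q.split()) == 1 for q in queries) / len(queries),
        "single_query_sessions": sum(len(s) == 1 for s in sessions) / n,
        "sessions_under_1_min": sum(d < 1 for d in durations) / n,
    }
```

Terms are split on whitespace here, which is the simplest possible tokenization. A session with a single query gets a duration of zero under this definition, so the share of sub-minute sessions depends heavily on how duration is measured.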