Empirical Study on Crawler Visibility of PDF Documents in Digital Libraries

Weideman, Melius

Proceedings 2010 3rd IEEE International Conference on Computer Science and Information Technology

Weideman, M. 2010. Empirical Study on Crawler Visibility of PDF Documents in Digital Libraries. Proceedings 2010 3rd IEEE International Conference on Computer Science and Information Technology, Chengdu, July 7-10, 2010.

ABSTRACT.
Users do not always enter a digital library through its homepage menus. As a result, digital library owners should consider how visible their stored PDF documents are to search engines. The aim of this research project was to determine to what extent the visibility of these PDF documents can be improved. In a series of empirical experiments, 100 PDF documents stored in digital libraries were identified and inspected. Searches were done for them, and their rankings on search engine result pages were recorded. The current visibility of these documents was then calculated. After the documents were submitted to Google, a waiting period was allowed for crawler visitation, and the searches were repeated. The results of these experiments showed that the visibility of these documents could be improved only marginally. It is therefore concluded that the designers of university digital libraries should consider alternatives, such as providing text extracts of PDF documents, to enhance the overall visibility of content.

