Methodology

The Ranking Web of World Hospitals formally and explicitly adheres to the Berlin Principles on Ranking of Higher Education Institutions. The ultimate aim is the continuous improvement and refinement of the methodologies according to an agreed set of principles of good practice.

0) Background of the project

The “World Hospitals' ranking on the Web” is an initiative of the Cybermetrics Lab, a research group of the Centro de Información y Documentación Científica (CINDOC), part of the Spanish National Research Council (CSIC), the largest public research body in Spain.

The Cybermetrics Lab is devoted to the quantitative analysis of the Internet and Web contents, especially those related to the processes of generation and scholarly communication of scientific knowledge. This emerging discipline has been called Cybermetrics (our team has developed and published the free electronic journal Cybermetrics since 1997) or Webometrics.


With these rankings we intend to provide extra motivation to researchers worldwide to publish more and better scientific content on the Web, making it available to colleagues and the public wherever they are located.

The "Ranking Web of World Hospitals" is launched in a "Beta" phase, and it is intended that once it reaches its definitive version it will be be updated every 6 months (data collected in January and July and published one month later). The Web indicators used are based and correlated with traditional scientometric and bibliometric indicators and the goal of the project is to convince academic and political communities of the importance of the web publication not only for dissemination of the academic knowledge but for measuring scientific activities, performance and impact too.

A) Purposes and Goals of Rankings

1. Assessment of higher education (processes and outputs) on the Web. We are already publishing comparative analyses of our Web indicators against similar initiatives. However, the current objective of the Ranking is to promote Web publication by hospitals, evaluating these organizations' commitment to electronic distribution and fighting a very concerning academic digital divide that is evident even among hospitals in developed countries. Although we do not intend to assess hospital performance solely on the basis of web output, the Ranking Web measures a wider range of activities than the current generation of bibliometric indicators, which focus only on the activities of the scientific elite.

2. Ranking purpose and target groups. This Ranking measures the volume, visibility and impact of the web pages published by hospitals, with special emphasis on scientific output (refereed papers, conference contributions, preprints, monographs, theses, reports, …) but also taking into account other materials (courseware, seminar or workshop documentation, digital libraries, databases, multimedia, personal pages, …) and general information on the institution, its departments, research groups or supporting services, and the people working there or attending courses.
The direct target group of the Ranking is hospital authorities. If an institution's web performance is below the position expected from its academic excellence, its authorities should reconsider their web policy, promoting substantial increases in the volume and quality of their electronic publications.
Hospital members are an indirect target group, as we expect that in the near future web information could become as important as other bibliometric and scientometric indicators for evaluating the scientific performance of scholars and their research groups.

3. Diversity of institutions: Missions and goals of the institutions. Quality measures for research-oriented institutions, for example, are quite different from those that are appropriate for institutions that provide broad access to underserved communities. Institutions that are being ranked and the experts that inform the ranking process should be consulted often.

4. Information sources and interpretation of the data provided. Access to Web information is obtained mainly through search engines. These intermediaries are free, universal and very powerful, even considering their shortcomings (coverage limitations and biases, lack of transparency, commercial secrets and strategies, irregular behaviour). Search engines are key to measuring the visibility and impact of hospitals' websites.
There is a limited number of sources useful for webometric purposes: 7 general search engines (Google*, Yahoo Search*, Live (MSN) Search*, Exalead*, Ask (Teoma), Gigablast and Alexa) and 2 specialised scientific databases (Google Scholar* and Live Academic). All of them have very large independent databases but, due to the availability of their data collection procedures (APIs), only those marked with an asterisk are used in compiling the Ranking.
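
As an illustration, the sketch below (Python) shows how the counts behind the indicators could be collected through the engines' public query operators (site:, filetype:, and Yahoo's historical linkdomain:). The hit_count function is a hypothetical placeholder, not a real API call: each engine exposes its own interface, and the exact endpoints have changed over time.

    # Hedged sketch: building the queries behind each Web indicator for one
    # institutional domain. The operators site:, filetype: and linkdomain:
    # follow the historical syntax of Google and Yahoo Search; hit_count()
    # is a hypothetical stand-in for whichever engine API is actually used.

    def queries_for_domain(domain):
        """Return the query strings behind each indicator."""
        return {
            # Size (S): pages indexed under the institutional domain
            "size": f"site:{domain}",
            # Visibility (V): external inlinks, excluding internal links
            "visibility": f"linkdomain:{domain} -site:{domain}",
            # Rich files (R): one query per selected file format
            "rich_pdf": f"site:{domain} filetype:pdf",
            "rich_doc": f"site:{domain} filetype:doc",
            "rich_xls": f"site:{domain} filetype:xls",
            "rich_ppt": f"site:{domain} filetype:ppt",
        }

    def hit_count(engine, query):
        """Hypothetical placeholder: ask `engine` for its estimated hit count."""
        raise NotImplementedError("each engine exposes its own API")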

5. Linguistic, cultural, economic, and historical contexts. The project intends to have true global coverage, not narrowing the analysis to a few hundred institutions (world-class hospitals) but including as many organizations as possible. The only requirement in our international rankings is having an autonomous web presence with an independent web domain. This approach allows a larger number of institutions to monitor their current rank and the evolution of that position after adopting specific policies and initiatives. Hospitals in developing countries have the opportunity to know precisely the indicator thresholds that mark the limits of the elite.
The currently identified biases of the Ranking include the traditional linguistic one (more than half of internet users are English speakers) and a newer disciplinary one (technology, rather than biomedicine, is at the moment the hot topic). Since in most cases the infrastructure (web space) and Internet connectivity already exist, the economic factor is not considered a major limitation (at least for the Top 1,000 hospitals).

B) Design and Weighting of Indicators

6. Methodology used to create the rankings. The unit of analysis is the institutional domain, so only hospitals with an independent web domain are considered. If an institution has more than one main domain, two or more entries are used with the different addresses. About 5-10% of the institutions have no independent web presence, most of them located in developing countries. Names and addresses were collected from both national and international sources.

Hospital activity is multi-dimensional, and this is reflected in its web presence. The best way to build the ranking is therefore to combine a group of indicators that measure these different aspects. Almind & Ingwersen proposed the first Web indicator, the Web Impact Factor (WIF), based on link analysis: it combines the number of external inlinks and the number of pages of the website in a 1:1 ratio between visibility and size. This ratio is used for the ranking, with two new indicators added to the size component: the number of documents, measured from the number of rich files in a web domain, and the number of publications collected by the Google Scholar database. As already noted, the four indicators were obtained from the quantitative results provided by the main search engines, as follows:

Size (S). Number of pages recovered from four engines: Google, Yahoo, Live Search and Exalead. For each engine, results are log-normalised to 1 for the highest value. Then, for each domain, the maximum and minimum results are excluded and every institution is assigned a rank according to the combined sum.
Visibility (V). The total number of unique external links received (inlinks) by a site can only be confidently obtained from Yahoo Search, Live Search and Exalead. For each engine, results are log-normalised to 1 for the highest value and then combined to generate the rank.
Rich Files (R). After evaluating their relevance to academic and publication activities and considering the volume of the different file formats, the following were selected: Adobe Acrobat (.pdf), Microsoft Excel (.xls), Microsoft Word (.doc) and Microsoft PowerPoint (.ppt). These data were extracted using Google, merging the results for each filetype after log-normalising in the same way as described above.
Scholar (Sc). Google Scholar provides the number of papers and citations for each academic domain. These results from the Scholar database represent papers, reports and other academic items.
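
A minimal sketch of the normalisation and combination step for the Size indicator, under the assumption that "log-normalised to 1 for the highest value" means dividing log(1 + count) by the maximum of that logarithm across institutions:

    # Hedged sketch (Python) of the Size (S) indicator, under the
    # normalisation assumption stated above.
    import math

    def log_normalise(counts):
        """Scale log(1 + count) so the top institution scores 1.0."""
        logs = {dom: math.log1p(c) for dom, c in counts.items()}
        top = max(logs.values()) or 1.0
        return {dom: v / top for dom, v in logs.items()}

    def size_scores(per_engine):
        """per_engine maps engine -> {domain: page count} for the four
        engines; for every domain the maximum and minimum normalised
        values are excluded and the remaining values are summed."""
        normalised = {eng: log_normalise(c) for eng, c in per_engine.items()}
        domains = next(iter(per_engine.values()))
        scores = {}
        for dom in domains:
            values = sorted(normalised[eng][dom] for eng in normalised)
            scores[dom] = sum(values[1:-1])  # drop the min and the max
        return scores

Institutions are then ranked in descending order of this score; the same log-normalisation is reused for the Visibility and Rich Files indicators.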

The four ranks were combined according to a formula in which each has a different weight:

WR = 0.5 × Rank(V) + 0.2 × Rank(S) + 0.15 × Rank(R) + 0.15 × Rank(Sc)
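
In code, and assuming the weights shown in the formula above, the combination could look like this (lower combined values mean better positions, as with any rank):

    # Hedged sketch (Python): weighted combination of the four indicator
    # ranks into the final ranking, using the weights from the formula above.
    WEIGHTS = {"V": 0.50, "S": 0.20, "R": 0.15, "Sc": 0.15}

    def combined_ranking(ranks):
        """ranks maps indicator -> {domain: position (1 = best)};
        returns (domain, weighted rank) pairs, best first."""
        weighted = {
            dom: sum(WEIGHTS[ind] * ranks[ind][dom] for ind in WEIGHTS)
            for dom in ranks["V"]
        }
        return sorted(weighted.items(), key=lambda item: item[1])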

7. Relevance and validity of the indicators. The choice of indicators was made according to several criteria (see notes), some of them trying to capture quality and academic and institutional strengths, others intending to promote web publication and Open Access initiatives. The inclusion of the total number of pages is based on the recognition of a new global market for academic information, the web being the adequate platform for the internationalization of institutions. A strong and detailed web presence providing exact descriptions of the structure and activities of the institution can attract new students and scholars worldwide. The number of external inlinks received by a domain is a measure of the visibility and impact of the published material, and although there is a great diversity of motivations for linking, a significant fraction works in a similar way to bibliographic citation. The success of self-archiving and other repository-related initiatives can be roughly represented by the rich file and Scholar data. The huge numbers involved with the pdf and doc formats mean that not only administrative reports and bureaucratic forms are involved, while Excel and PowerPoint files are clearly related to academic activities.

8. Measure outcomes in preference to inputs whenever possible. Data on inputs are relevant as they reflect the general condition of a given establishment and are more frequently available. Measures of outcomes provide a more accurate assessment of the standing and/or quality of a given institution or program. We expect to offer a better balance in the future, but the current edition intends to call attention to incomplete strategies, inadequate policies and bad practices in web publication before attempting a more complete scenario.

9. Weighting the different indicators: Current and future evolution. The current ranking rules, including the weighting model described, have been tested and published in scientific papers (see notes). More research is still being done on this topic, but the final aim is to develop a model that includes additional quantitative data, especially bibliometric and scientometric indicators.

C) Collection and Processing of Data

10. Ethical standards. We identified some relevant biases in the search engines' data, including the under-representation of some countries and languages. As each engine behaves differently, good practice consists of combining results from several sources. Any other mistake or error is unintentional and should not affect the credibility of the ranking. Please contact us if you think the ranking is not objective or impartial in any way.
11. Audited and verifiable data. The only source of data for the Ranking Web is a small set of globally available, free-access search engines. All the results can be reproduced following the methodologies described, taking into account the explosive growth of web content, its volatility and the irregular behaviour of the commercial engines.
12. Data collection. Data are collected during the same week, in two consecutive rounds for each query strategy, with the higher value selected. Every website under a common institutional domain is explored, but no attempt has been made to combine contents or links from different domains.
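As a minimal sketch, reusing the hypothetical hit_count placeholder from the earlier example, this collection rule amounts to:

    # Hedged sketch (Python): run each query twice in the same week and keep
    # the higher estimate, smoothing the engines' irregular behaviour.
    def robust_count(engine, query, rounds=2):
        return max(hit_count(engine, query) for _ in range(rounds))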
13. Quality of the ranking processes. After automatic collection of the data, positions are checked manually and compared with previous editions. Some of the processes are duplicated, and new expertise is added from a variety of sources. Pages that link to the Ranking Web are explored, and comments from blogs and other fora are taken into account. Finally, our mailbox receives many requests and suggestions, which are acknowledged individually.
14. Organizational measures to enhance credibility. The ranking results and methodologies are discussed in scientific journals and presented at international conferences. We expect international advisory or even supervisory bodies to take part in future developments of the ranking.

D) Presentation of Ranking Results

15. Display of data and factors involved. The published tables show all the Web indicators used in a very synthetic and visual way. Rankings are provided not only as a central Top 1000 classification but also as several regional rankings for comparative purposes.
16. Updating and error reduction. The listings are served from dynamic ASP pages built on several databases, which can be corrected when errors or typos are detected.

Comments welcome

Our group welcomes comments, suggestions and proposals that can be useful for improving this website. We try to maintain an objective position on the quantitative data provided, but mistakes can occur. Please take into account that mergers, domain changes or network problems can affect the ranking of the institutions.

Currently the members of our team are Isidro F. AGUILLO, José Luis ORTEGA, Mario FERNÁNDEZ (Webmaster) and Ana UTRILLA.

For more information please contact:

Isidro F. Aguillo
CCHS - CSIC
Albasanz, 26-28
28037 Madrid. SPAIN

Notes:

- Aguillo, I. F.; Granadino, B.; Ortega, J. L. & Prieto, J. A. (2006). Scientific research activity and communication measured with cybermetric indicators. Journal of the American Society for Information Science and Technology, 57(10): 1296-1302.

- Wouters, P.; Reddy, C. & Aguillo, I. F. (2006). On the visibility of information on the Web: an exploratory experimental approach. Research Evaluation, 15(2): 107-115.

- Ortega, J. L.; Aguillo, I. F. & Prieto, J. A. (2006). Longitudinal study of contents and elements in the scientific Web environment. Journal of Information Science, 32(4): 344-351.

- Kretschmer, H. & Aguillo, I. F. (2005). New indicators for gender studies in Web networks. Information Processing & Management, 41(6): 1481-1494.

- Aguillo, I. F.; Granadino, B.; Ortega, J. L. & Prieto, J. A. (2005). What the Internet says about Science. The Scientist, 19(14): 10, Jul. 18, 2005.

- Kretschmer, H. & Aguillo, I. F. (2004). Visibility of collaboration on the Web. Scientometrics, 61(3): 405-426.

- Cothey, V.; Aguillo, I. F. & Arroyo, N. (2006). Operationalising "Websites": lexically, semantically or topologically? Cybermetrics, 10(1): Paper 4. http://www.cindoc.csic.es/cybermetrics/articles/v10i1p4.html