ORCID Imagery • June 2018
From January to July of this year, I had the opportunity to work on a project geolocating and disambiguating institutions within the ORCID dataset publicly available at DataDryad. While the overall project produced a disambiguated dataset, the following is a series of images created from the raw geolocated data in Google Earth. Each marker represents a node: a unique coordinate pair to which the Google Maps API assigned at least one institution. These images were a side product of the project, but they line up well with the end results of disambiguation.
Some notes:
Methodology - all unique institution strings (around 330,000) were submitted to the Google Maps API, which returned coordinate pairs and other location-related data for slightly more than 82% of them.
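A minimal sketch of this step, assuming the JSON Geocoding API. The function names, the `API_KEY` placeholder, and the sample payload are illustrative assumptions, not the project's actual code; the response shape follows the documented Geocoding API format:

```python
import urllib.parse

# Endpoint per the Google Maps Geocoding API; API_KEY is a placeholder.
GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"

def build_request_url(institution: str, api_key: str) -> str:
    """Build the geocoding request URL for one institution string."""
    query = urllib.parse.urlencode({"address": institution, "key": api_key})
    return f"{GEOCODE_URL}?{query}"

def extract_coords(response: dict):
    """Pull (lat, lng) out of a geocoding response, or None if no result.

    Roughly 18% of the institution strings fell into the None bucket."""
    if response.get("status") != "OK" or not response.get("results"):
        return None
    location = response["results"][0]["geometry"]["location"]
    return (location["lat"], location["lng"])

# Illustrative response, trimmed to the fields used above.
sample = {
    "status": "OK",
    "results": [{"geometry": {"location": {"lat": -23.55, "lng": -46.63}}}],
}
print(extract_coords(sample))  # (-23.55, -46.63)
```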
Errors - within a manually verified sample of the successfully geotagged institutions, the majority (slightly over 80% of the sample) were correctly geolocated. The errors that remained, however, tended to be gross rather than subtle - a Brazilian institution could end up geolocated in Morocco.
Opacity - marker opacity varies between images. At 100% opacity, every marker looks identical regardless of how many institutions share its coordinate pair. At lower opacities, overlapping markers compound, so coordinate pairs with more colocated institutions appear darker. This makes areas of dense colocation easy to spot, but harder to see the geographic distribution of institutions as a whole. The opacity used is noted under each image.
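The darkening effect can be reasoned about with standard alpha compositing: stacking n identical markers of opacity a yields a combined opacity of 1 - (1 - a)^n. A small sketch, where the records and function names are illustrative assumptions rather than the project's data:

```python
from collections import Counter

def combined_opacity(marker_opacity: float, n_markers: int) -> float:
    """Effective opacity of n identical stacked markers (alpha compositing)."""
    return 1 - (1 - marker_opacity) ** n_markers

# Hypothetical geocoded records: (institution, (lat, lng)).
records = [
    ("Dept. of Physics, Univ. A", (51.75, -1.25)),
    ("Dept. of Chemistry, Univ. A", (51.75, -1.25)),
    ("Univ. B", (48.85, 2.35)),
]

# Count institutions colocated at each coordinate pair (a "node").
colocation = Counter(coords for _, coords in records)

# Two 20%-opacity markers stacked read as 36% dark; one stays at 20%.
for coords, n in colocation.items():
    print(coords, round(combined_opacity(0.2, n), 3))
```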
Expansion - see "expansion of a node" below. Clicking any node marker in Google Earth expands the node to list each colocated institution; in the example image, numerous institutions share that single coordinate pair.
Cropping - images in the grid are cropped. Click any image to see it in full, then mouse over it to see its title.
Further imagery is possible using Google Earth, but these areas seemed particularly relevant to the project at hand. Importing all of the geotagged institutions into Google Earth at once is extremely resource-intensive, so I would suggest splitting the dataset by country before importing.
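The suggested per-country split might look like this minimal sketch. The column names and file layout are assumptions for illustration, not the dataset's actual schema:

```python
import csv
from collections import defaultdict
from pathlib import Path

def split_by_country(rows):
    """Group geocoded rows by their country field."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["country"]].append(row)
    return groups

def write_country_files(rows, out_dir):
    """Write one CSV per country, small enough to import into Google Earth
    one at a time instead of loading every geotagged institution at once."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    fields = ["institution", "lat", "lng", "country"]
    for country, group in split_by_country(rows).items():
        with (out_dir / f"{country}.csv").open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fields)
            writer.writeheader()
            writer.writerows(group)

# Hypothetical rows; the real dataset's columns may differ.
rows = [
    {"institution": "Univ. A", "lat": "51.75", "lng": "-1.25", "country": "GB"},
    {"institution": "Univ. B", "lat": "48.85", "lng": "2.35", "country": "FR"},
]
```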