Putting fossils on the map: Applying a geographical information system to heritage resources

HOW TO CITE: Van der Walt M, Cooper AK, Netterberg I, Rubidge BS. Putting fossils on the map: Applying a geographical information system to heritage resources. S Afr J Sci. 2015;111(11/12), Art. #2014-0371, 7 pages. http://dx.doi.org/10.17159/sajs.2015/20140371

A geographical information system (GIS) database was compiled of Permo-Triassic tetrapod fossils from the Karoo Supergroup in South African museum collections. This database is the first of its kind and has great applicability for understanding tetrapod biodiversity change through time more than 200 million years ago. Because the museum catalogues all differed in recorded information and were not compliant with field capture requirements, this information had to be standardised to a format that could be utilised for archival and research application. Our paper focuses on the processes involved in building the GIS project, capturing metadata on fossil collections and formulating future best practices. The result is a multi-layered GIS database of the tetrapod fossil record of the Beaufort Group of South Africa for use as an accurate research tool in palaeo- and geoscience research with applications for ecology, ecosystems, stratigraphy and basin development.


Introduction
The fossil record of the Karoo Supergroup, which comprises a largely unbroken temporal record of tetrapod evolution from the Middle Permian to the Middle Jurassic, 1,2 provides a unique opportunity to set up a GIS database of fossil occurrences which can be utilised to answer questions relating to ecological and biodiversity change through time. The Karoo fossil record is the best preserved ecological assemblage of pre-mammalian terrestrial tetrapods documenting the stem lineages of both mammals and dinosaurs. 3,4 [...] geocoded palaeontological data for use in a geographical information system (GIS) for palaeoscience research to explore issues relating to the biodiversity of Permian and Triassic tetrapod faunas. This was the first time a GIS had been applied to the fossil records of the Karoo Supergroup. With the cooperation of seven South African museums and institutes (Council for Geoscience, Pretoria; Ditsong Museum, Pretoria; Evolutionary Studies Institute, University of the Witwatersrand, Johannesburg; National Museum, Bloemfontein; Albany Museum, Grahamstown; Rubidge Collection, Wellwood, Graaff-Reinet; Iziko South African Museum, Cape Town) that curate collections of Karoo tetrapod fossils, a GIS incorporating the South African databases of fossil records collected from the Beaufort Group, Karoo Supergroup has been compiled.
The hundreds of thousands of fossil artefacts stored and accessioned in museum collections are the foundation of our knowledge on past biodiversity. Great strides have been made in biodiversity informatics in providing digital access to extinct biodiversity data, for integration, interpretation, reconstruction and application objectives. Models for community data access are evident in abundant projects, such as:

• The Revealing Human Origins Initiative (RHOI) 5 Specimen Database, a collaboration of paleoanthropological and related projects studying Late Miocene (and Pliocene) hominins and other faunas in context, with the database including digital imagery and metadata that covers age, geology, collection elements and taxonomy;
• The digital@rchive of Fossil Hominoids 6, for which the primary mandate is to facilitate morphological investigations in the field of human evolution by providing digital data for the international scientific community;
• The Darwin Core metadata standard 7, a uniform standard designed to expedite the exchange of information about the geographic occurrence of species and specimen records in collections, with extensions for palaeontology.
These information systems, driven by distributed data retrieval, data capture and person-facilitated geospatial referencing, have enabled the investigation of novel research questions around ecological reconstruction, extinct biodiversity trends and predictive modelling.
Historically, details of fossils collected were recorded as hand-written descriptions on index cards or in catalogues (Figure 1). Such documentation included both data (e.g. species and location) and metadata (information about the record), such as who collected, prepared and/or identified the fossil, where the fossil is stored and who wrote up the index card.
There are a variety of standards available for metadata, such as the Dublin Core (ISO 15836:2009), 8 developed primarily for describing resources for discovery, and ISO 19115:2003, 9 for describing geographical data, of which the South African profile (subset) is SANS 1878-1:2005. 10 Dublin Core is primarily text-based, making it easy to enter information for its 15 metadata elements, while ISO 19115 makes extensive use of encoding, which facilitates automated processing and presenting the metadata in multiple languages. Metadata can be converted from one standard to another using an ontology or a cross-walk. 11 As ISO 19115 has encoded metadata and more detailed metadata elements, it is easy to convert its metadata to Dublin Core through a cross-walk (conversion [...]

South African fossil-find field notes for the Beaufort Group (to be eventually reconfigured into museum index cards) were written up over the space of 150 years (since 1845 12,13) and do not conform to any particular standard. The main disadvantage is that some records contain inadequate or ambiguous data, particularly relating to the precise location of the fossil provenance.
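To illustrate what a cross-walk involves, the sketch below maps a hypothetical index-card record onto a few of the 15 Dublin Core elements. The card field names and the choice of target element for each field are assumptions made for illustration only; they are not taken from any of the museum catalogues, and a real cross-walk would need to be agreed with the curators.

```python
# Hypothetical index-card fields (left) mapped to Dublin Core elements (right).
# The card field names and pairings are illustrative assumptions.
CROSSWALK = {
    "specimen_number": "identifier",
    "species": "subject",
    "collector": "contributor",
    "date_collected": "date",
    "locality": "coverage",
    "notes": "description",
}


def to_dublin_core(card: dict) -> dict:
    """Convert one card record; fields without a mapping are kept aside, not dropped."""
    record, unmapped = {}, {}
    for field, value in card.items():
        element = CROSSWALK.get(field)
        if element is None:
            unmapped[field] = value
        else:
            record[element] = value
    return {"dc": record, "unmapped": unmapped}
```

Returning the unmapped fields separately makes it visible where the simpler standard cannot hold information that the source record contains.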
This paper focuses on the processes involved in establishing a GIS for tetrapod fossils from the Beaufort Group. It highlights the key challenges encountered during database establishment, and describes its main applications and future best practices for use as an accurate research tool in palaeontological research. This unique database is curated at the Evolutionary Studies Institute (ESI) at the University of the Witwatersrand and is available as a research tool to all bona-fide scientists.

Creating a reliable product
Extensive fossil collections have been amassed from the rocks of the time-extensive Permo-Triassic Beaufort Group and curated in different museum collections in South Africa, providing a unique opportunity to incorporate these collections into a GIS. Ultimately, this database will be expanded to include fossils from the Beaufort Group which are housed in overseas institutions such as the Natural History Museum, London; Smithsonian Institution, Washington DC and the Field Museum, Chicago. This GIS will enhance their utility in research relating to changing biodiversity patterns, both temporally and geographically, as well as stratigraphic and basin development modelling.
Problems that had to be overcome in setting up the GIS database related largely to a lack of consistency in the data, ambiguous locality data and outdated taxonomic records, requiring rigorous standardisation and updating.
While all the original data were provided in digital format, they had been set up from manual records. This proved the main drawback, as human interpretation had to be weighed against the structured logic of the computer. The establishment of the GIS database highlighted the value of structuring data to suit GIS and other digital applications. The migration of paper records to useful electronic records could not simply be carried out verbatim, as many of the data obtained from the contributing South African museums needed to be restructured to facilitate analysis through electronic means.

Mapping palaeontological specimens
As this database has been set up as a research tool to be used by palaeontologists, it is important to explain the methodology in detail so that users can fully understand why the GIS was created in this particular way.
The broad-spectrum processes were divided into three stages (Table 1): Stage 1: Acquisition and processing of original data; Stage 2: Establishing a GIS management system; Stage 3: Reconciliation.
More detailed processes involved in spatially mapping the fossils were subdivided into two phases (Table 2), Phase 1: accessing and processing of data and Phase 2: development of a spatial model.
Alphanumeric data were converted to spatial data because, for most of the records, the location was specified using geographical identifiers, 14 particularly farm names, rather than coordinates. Converting the data required rigorous 'cleaning', correction of spelling errors and standardisation of content to permit queries. Farm names with their corresponding farm numbers were aligned with the names registered with the Registrar of Deeds and the Surveyors General.
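The kind of cleaning described above can be sketched in a few lines. The example below is a minimal illustration, not the procedure actually used for the database: it normalises a farm name and then looks for the closest entry in a register of official names using Python's standard `difflib` module. The function names, the similarity cutoff of 0.85 and the sample register are all assumptions.

```python
import difflib
import re
import unicodedata


def normalise_name(raw: str) -> str:
    """Lower-case, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def match_to_register(raw: str, register: list[str], cutoff: float = 0.85):
    """Return the registered name closest to `raw`, or None if nothing is close enough."""
    lookup = {normalise_name(name): name for name in register}
    hits = difflib.get_close_matches(normalise_name(raw), list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None


# Hypothetical register; a misspelt catalogue entry is matched to the official spelling.
register = ["Wellwood", "Example Farm"]
print(match_to_register("  WELWOOD. ", register))
```

Names that fall below the cutoff return `None` and would be left for manual checking, which keeps doubtful matches out of the automated pass.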
Once cleaning of data had been accomplished, selection of data fields applicable for spatial mapping was undertaken. Geospatial coordinates used for mapping species location and distribution are crucial for a reliable spatial system. 15 Access to geospatially referenced data from fossils provides a quantitative basis for biodiversity analyses over time and predictive niche modelling for determining sampling densities of various sites.
Providing locality coordinates proved a significant challenge. Most of the recorded specimens were associated with a georeference, but this reference was, in most instances, a worded description of the localities where they were discovered, with few records having geographic coordinates (Table 3).
To get to the point where data could be represented on a spatial map, two approaches were adopted. The first involved selecting records that qualified for automatic import into the system. The second approach involved records that could only be entered onto the system manually.

Automated data entry procedure
Records with locality coordinates from a Global Positioning System (GPS) could be entered automatically. However, as the majority of records had only a farm name for the locality, a spatial database had to be created to allow records to be imported automatically to specific localities referenced as farm centroids. A farm centroid is the calculated gravitational centre of a polygon (farm boundaries are polygons). This centroid is calculated using the ArcMap field calculator, which automatically sets a field value for a single record, or even all records.
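For readers unfamiliar with the calculation, the centroid of a simple polygon can be computed from its vertices with the standard area-weighted (shoelace) formula. The sketch below is an independent illustration in plain Python, not the ArcMap field calculator itself, and it assumes planar (projected) coordinates and a boundary that does not cross itself.

```python
def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon given as [(x, y), ...]."""
    twice_area = 0.0
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]      # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon with zero area")
    return cx / (3.0 * twice_area), cy / (3.0 * twice_area)


# A 4 x 2 rectangle has its centroid at (2, 1).
print(polygon_centroid([(0, 0), (4, 0), (4, 2), (0, 2)]))
```

For an irregular or L-shaped boundary the centroid can fall outside the polygon, which is one way a centroid can misrepresent where on a farm a fossil was actually found.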
Forcing such localities into a single point at the gravitational centre of the farm introduces error and inaccuracy into the data, but remains the best option to utilise locality data for the majority of fossils found prior to GPS usage. Current locality data were accurately captured by GPS. To allow data to be imported automatically, certain tasks had to be completed (Table 4). A geodatabase was created to house the spatial data of the farm, administrative, district and magisterial boundaries and local authorities' databases. 16 Various map layers (including Landsat 7 ETM+ satellite imagery) were necessary as backdrop data to interpret the distribution patterns of fossil taxa.
Because most of the specimens in older collections lacked geographic coordinates for their place of discovery, the most accurate locality information in the majority of the databases was simply a farm and district name. To represent this locality information on the GIS, farm locality data were received in .FEA format from the Surveyor General and converted into shape file format. Alphanumeric data were exported as a point file and joined to the polygon data using a spatial join. The cadastre received from the Surveyor General contained farm boundaries and their farm numbers, but very few farm names. This lack of names posed a problem, as localities for most of the specimens in the museum catalogues were given as locality names, which were assumed to correspond to the farm names. As such, the farm names were essential for the geocoding of the localities and thus the specimens.
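A spatial join of points to polygons amounts to asking, for each point, which polygon contains it. The sketch below shows one common way to answer that question, the ray-casting test, in plain Python. It is offered only as an illustration of the idea and is not the GIS software's implementation; the function names and the sample data are hypothetical.

```python
def point_in_polygon(x, y, vertices):
    """Ray-casting test: True if (x, y) lies inside the polygon [(x, y), ...]."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        # Does a horizontal ray running right from the point cross this edge?
        if (y0 > y) != (y1 > y):
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


def spatial_join(points, farms):
    """Attach to each point the id of the first farm polygon that contains it."""
    joined = {}
    for point_id, (x, y) in points.items():
        joined[point_id] = next(
            (farm_id for farm_id, poly in farms.items() if point_in_polygon(x, y, poly)),
            None,
        )
    return joined


# Two adjoining hypothetical farms and three specimen points.
farms = {"F1": [(0, 0), (2, 0), (2, 2), (0, 2)], "F2": [(2, 0), (4, 0), (4, 2), (2, 2)]}
points = {"p1": (1, 1), "p2": (3, 1), "p3": (5, 5)}
print(spatial_join(points, farms))
```

Points that fall in no polygon are returned with `None`, so unmatched records remain visible rather than being silently discarded.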
To solve the number versus name problem, Environmental Potential Atlas (ENPAT 2004 17) farm cadastre data were used as the new spatial layer to identify localities. For each farm, centroids were generated and used to geocode the specimens by linking the specimen locality names to farm names. Additional backdrop map layers included Surveyor General data for magisterial districts and provinces. These data were used to identify further localities, as farm names are not unique across the country. Digitised geological maps covering the extent of the Beaufort Group were included as additional backdrop data.
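Because farm names repeat across the country, a name-based link needs a second key. The following sketch, written under the assumption that each specimen record carries a district as well as a locality name, looks up a farm centroid by the pair (district, farm name) and refuses to guess when the pair is missing or ambiguous. The field names and sample values are hypothetical.

```python
from collections import defaultdict


def build_centroid_index(farms):
    """Index farm centroids by (district, farm name), both lower-cased."""
    index = defaultdict(list)
    for farm in farms:
        key = (farm["district"].strip().lower(), farm["name"].strip().lower())
        index[key].append(farm["centroid"])
    return index


def geocode(specimen, index):
    """Return the centroid for a specimen, or None if the match is missing or ambiguous."""
    key = (specimen["district"].strip().lower(), specimen["locality"].strip().lower())
    candidates = index.get(key, [])
    return candidates[0] if len(candidates) == 1 else None


# The same farm name occurs in two hypothetical districts; coordinates are placeholders.
farms = [
    {"district": "District One", "name": "Example Farm", "centroid": (23.1, -32.2)},
    {"district": "District Two", "name": "Example Farm", "centroid": (26.4, -30.9)},
]
index = build_centroid_index(farms)
print(geocode({"district": "District Two", "locality": "Example Farm"}, index))
```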
The Evolutionary Studies Institute (ESI) collection database was selected as the test case because of the high resolution of farm locality and map sheet data, to determine whether automated entry of palaeontological records was a feasible option. Unique localities were split into those with coordinates and those without, as the process for identifying the location of these two groups of localities was different. 16
Those localities with grid coordinates were extracted and all coordinate data converted to decimal degrees, imported into ArcGIS® as an event theme, converted to a shape file, and each specimen was located as a point in the spatial data file.
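The conversion to decimal degrees is simple arithmetic. The sketch below assumes the source coordinates are recorded as degrees, minutes and seconds with a hemisphere letter; the text does not state the original format, so this is an assumption, and the function name is hypothetical.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    if hemisphere.upper() not in {"N", "S", "E", "W"}:
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    value = abs(degrees) + minutes / 60.0 + seconds / 3600.0
    # Southern latitudes and western longitudes are negative by convention.
    return -value if hemisphere.upper() in {"S", "W"} else value


print(dms_to_decimal(32, 15, 0, "S"))   # -32.25
```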
Localities that lacked coordinates were identified by districts, farm names and map sheet indices. As farm names are not unique and can be repeated for several districts, the map sheet index was used in addition to the farm names and districts as identifiers for the location of the farm localities. As an index shape file of the 1:50 000 map sheet series does not exist, a map sheet index shape file was created by digitising the sheets.
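Where coordinates are available, a sheet code can also be derived arithmetically. The sketch below assumes the usual naming convention for the 1:50 000 series: the first four digits give the whole degrees of latitude (south) and longitude (east) of the one-degree square, which is divided into quarters lettered A (north-west), B (north-east), C (south-west) and D (south-east), each quartered again in the same way. That convention is stated here as an assumption and should be checked against the official index; it is not the digitising procedure described above.

```python
import math


def map_sheet_code(lat, lon):
    """Return the 1:50 000 sheet code (e.g. '3123DD') for signed decimal degrees."""
    if lat >= 0 or lon < 0:
        raise ValueError("expects southern latitudes (negative) and eastern longitudes")
    south, east = -lat, lon
    lat_deg, lon_deg = math.floor(south), math.floor(east)
    code = f"{lat_deg:02d}{lon_deg:02d}"
    # Offsets inside the one-degree square, measured from its north-west corner.
    down, across = south - lat_deg, east - lon_deg
    for cell in (0.5, 0.25):
        row, col = int(down // cell), int(across // cell)   # each is 0 or 1
        code += "ABCD"[2 * row + col]
        down, across = down - row * cell, across - col * cell
    return code


print(map_sheet_code(-31.9, 23.9))   # 3123DD
```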
As a test run to determine how to automate the linkage of the locality name provided by the ESI data to farm records listed in the Surveyor General data, 13 distinct localities in the district of Beaufort West were selected (Table 5). According to the alphanumeric data, all these localities fall on the same map sheet except for the Winterberg (Gryskop) locality. Of the 13 localities, only seven were matched to the spatial data and of these only two localities fell on the correct map sheet, 16 indicating it would be difficult to automate the linking process. Another test was run to determine if 'selection by map sheet' could be used as a method to link alphanumeric data to spatial data. Map sheet 3123DD was randomly selected and alphanumeric records of the ESI collection located on this map sheet were selected, returning 41 records. These records were then queried such that only distinct localities would be returned, which resulted in 38 localities. As locality name should correspond to farm name, it follows that there should be 38 farms which intersect with this map sheet. A query was performed to select all the farms which lie wholly or partly on this map sheet and resulted in 16 farms, fewer than half the number of distinct localities. As the method of using locality name, farm name and farm centroid was not effective (because neither locality name nor farm name matched the government farm name), an alternative linkage solution needed to be created.
The database of Iziko South African Museum, which contains both locality names and formal government farm names, was used as the linkage mechanism. For each collection, a query for distinct records of specimens was run. Results from this query were input into a second query where the centroids with their government farm names were linked to the specimens with their locality names via the joining table derived from the Iziko South African Museum data. The results from these queries were imported into the GIS and merged to form one data file. The automatic import of records yielded a poor success rate. Of the 19 718 specimens selected for automatic import, 8512 records were imported automatically. Against the total number of records (20 968) that required import into the GIS, this number indicates a success rate of 43.2%. The only solution was for the remaining data to be manually imported into the existing digital-palaeo system.
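The figures quoted above can be checked with a short calculation. Using only the three numbers given in the text, the 43.2% figure is reproduced when the 8512 imported records are divided by the 19 718 specimens selected for automatic import; measured against all 20 968 records the share is about 40.6%. Which denominator was intended is not stated beyond the wording above, so both are shown.

```python
selected_for_auto_import = 19_718   # specimens selected for automatic import
imported_automatically = 8_512      # records that imported successfully
total_records = 20_968              # all records requiring import into the GIS

rate_of_selected = 100 * imported_automatically / selected_for_auto_import
rate_of_total = 100 * imported_automatically / total_records

print(f"{rate_of_selected:.1f}% of the records selected for automatic import")
print(f"{rate_of_total:.1f}% of all records requiring import")
```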

Manual data entry procedure and collation of Beaufort Group data
Data tables containing specimens that had failed to import automatically were established for each museum and imported into the spatial map under separate map layers. Each fossil entry was then systematically added in point format to the spatial map.
The final phase involved collating the point data of the seven collections into a single map layer to decrease the complexity of future queries. The original intention was to append all the data sets onto a single data set, but because of differing alphanumeric table structures and schemes of the contributing museums, a single data set was not directly possible. Accordingly, the alphanumeric table structures of the various data files were manipulated to conform to a designed standard structure (Table 6). 16 Shape files that had a multipoint geometry type, rather than a point geometry type, were converted to a point geometry type, and appended, resulting in a single point shape file. This file contains approximately 30 000 points and indicates where fossil vertebrates from the Beaufort Group were discovered in the field. 17
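The collation step can be pictured as renaming each museum's columns to a shared structure before appending. The sketch below is a minimal illustration only: the standard field list, the per-museum column names and the museum labels are all hypothetical and do not reproduce Table 6.

```python
# Hypothetical standard structure and per-museum column mappings.
STANDARD_FIELDS = ["collection", "specimen_id", "genus", "locality", "lon", "lat"]

FIELD_MAPS = {
    "museum_a": {"CatNo": "specimen_id", "Genus": "genus", "Farm": "locality",
                 "X": "lon", "Y": "lat"},
    "museum_b": {"number": "specimen_id", "taxon": "genus", "locality_name": "locality",
                 "longitude": "lon", "latitude": "lat"},
}


def harmonise(collection, rows):
    """Rename each row's columns to the standard structure; absent fields become None."""
    mapping = FIELD_MAPS[collection]
    out = []
    for row in rows:
        renamed = {mapping[k]: v for k, v in row.items() if k in mapping}
        renamed["collection"] = collection
        out.append({field: renamed.get(field) for field in STANDARD_FIELDS})
    return out


def explode_multipoint(row, points):
    """Turn one multipoint record into one single-point record per coordinate pair."""
    return [{**row, "lon": lon, "lat": lat} for lon, lat in points]


merged = (
    harmonise("museum_a", [{"CatNo": "A1", "Genus": "Genus A", "Farm": "Farm X",
                            "X": 23.5, "Y": -32.1}])
    + harmonise("museum_b", [{"number": "B7", "taxon": "Genus B",
                              "locality_name": "Farm Y"}])
)
print(len(merged), list(merged[0]))
```

Once every row shares the same fields in the same order, appending the collections is a plain concatenation, which is what makes later queries simpler.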

Applying GIS to palaeontological research
Establishment of the Beaufort Group GIS has resulted in a research aid which can be utilised to answer questions relating to Permo-Triassic continental vertebrate biodiversity. The system displays taxonomic diversity of fossil tetrapods from the Beaufort Group, with specific reference to the housing of particular genera, including numbers of specimens of each genus and their respective locality and/or biozone data. This unique data set provides a record of fossil tetrapod biodiversity in the continental realm from the Middle Permian to the Middle Triassic, and shows accurate numbers of specimens of the various taxa which have been collected (e.g. Smith et al. 18).
The spatial map allows for queries to be performed relating to geospatial distribution, which may assist in understanding the pattern of basin infill as well as significant biodiversity patterns. 20-22 Now that the database of Karoo fossils is available on a GIS, biogeographic distribution patterns of index taxa can be determined and utilised in constructing basin development models. In addition, because the trophic level of each species is recorded in the database, changes in terrestrial vertebrate ecological relationships from the Middle Permian to the Middle Triassic have been explored. 23 The database elucidates which genera occur in particular biozones, as well as numbers of individuals of each genus in successive biozones. Application of the GIS can highlight biodiversity changes across successive biozones, making it possible to calculate the extent of extinction of taxa within successive time slices, which allows for the determination of trends in biodiversity changes through time.
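One simple way to express the extent of extinction between successive time slices is the share of genera recorded in one biozone that are not recorded in the next. The sketch below implements that measure under stated assumptions: the zone names and genera are placeholders, not data from the database, and the measure is only one of several that could be used.

```python
def extinction_fraction(older, younger):
    """Share of genera present in the older biozone that are absent from the younger one."""
    older, younger = set(older), set(younger)
    if not older:
        raise ValueError("older biozone has no recorded genera")
    return len(older - younger) / len(older)


def turnover_by_zone(zones):
    """`zones` is a list of (zone name, genera) pairs ordered from oldest to youngest."""
    results = []
    for (name_a, genera_a), (name_b, genera_b) in zip(zones, zones[1:]):
        results.append((name_a, name_b, extinction_fraction(genera_a, genera_b)))
    return results


# Placeholder zones and genera for illustration only.
zones = [
    ("Zone 1", {"A", "B", "C", "D"}),
    ("Zone 2", {"C", "D", "E"}),
    ("Zone 3", {"E"}),
]
for older, younger, fraction in turnover_by_zone(zones):
    print(f"{older} -> {younger}: {fraction:.2f}")
```

A genus missing from a younger zone in the collections has not necessarily gone extinct; uneven sampling can produce the same pattern, so such fractions are best read alongside specimen counts.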
Through the application of the GIS database, the stratigraphic ranges of fossil specimens were calculated and a refined biostratigraphic subdivision of the Eodicynodon, Tapinocephalus and Pristerognathus assemblage zones (Middle Permian Beaufort Group) proposed. 24 Previously, biostratigraphic maps of the Beaufort Group were compiled based on a rough estimate of the distribution of biozone-defining signature fossil genera. Through the utilisation of the GIS database, a far more precise biozone map of the Beaufort Group has been produced (Figure 2), with the capability of being continuously updated as new information is received. This method introduces an entirely new way of representing geographical fossil distribution data that can be used in basin development and tetrapod biogeographic studies. 25
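A stratigraphic range can be derived from occurrence records by taking, for each genus, the lowest and highest zone in which it is recorded. The sketch below illustrates this with the three assemblage zones named above, listed in the order given in the text and taken here as oldest to youngest; the genera are placeholders and the function is a hypothetical illustration rather than the method used in the cited work.

```python
ZONE_ORDER = ["Eodicynodon", "Tapinocephalus", "Pristerognathus"]  # oldest to youngest


def stratigraphic_ranges(occurrences, zone_order=ZONE_ORDER):
    """Map each genus to its (lowest, highest) assemblage zone from (genus, zone) pairs."""
    rank = {zone: i for i, zone in enumerate(zone_order)}
    ranges = {}
    for genus, zone in occurrences:
        i = rank[zone]
        lo, hi = ranges.get(genus, (i, i))
        ranges[genus] = (min(lo, i), max(hi, i))
    return {g: (zone_order[lo], zone_order[hi]) for g, (lo, hi) in ranges.items()}


# Placeholder occurrences for illustration only.
occurrences = [
    ("Genus A", "Tapinocephalus"),
    ("Genus A", "Eodicynodon"),
    ("Genus B", "Pristerognathus"),
]
print(stratigraphic_ranges(occurrences))
```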

Future spatial map enhancement and maintenance
Given the scope of both the specimen locality data and the necessity for these data to be available in a readily usable form, efficiency and accuracy are of prime importance for the task of geospatial referencing. Individual institutions housing palaeo-collections typically lack the resources or informatics expertise to meet the challenges of georeferencing alone. 15 Designing a collaborative geospatial referencing methodology using the combined expertise of all palaeontologists in South Africa has yielded an accurate and reliable spatial map.
An important aspect of data management that will improve reliability of the data set is the accurate assignment of taxonomic data.This accuracy is crucial for trend analysis.The updating and verification of both biozone assessment and taxonomic assignment must be accomplished through ongoing collaboration of palaeontologists, each selected for their expertise in specific taxa of animals and/or geological expertise.Specimens in collections need to be checked and identifications must be updated using current taxonomic diagnoses.
The application of GIS technologies could have significant impact as it could open further avenues of GIS-based research in palaeontology. The data could be used for four-dimensional (4D) spatio-temporal modelling. These dimensions can be distorted by geological or other processes, and hence different spatio-temporal dimensions are relevant for the fossils, including the various 4D environments through which a fossil has been taken, changed or moved by processes such as weathering, erosion, re-deposition, lithification, metamorphism, diagenesis, faulting and folding.
Ongoing contribution to the GIS project involves the further development and refinement of the Spatial Map of Beaufort Group fossil specimens (e.g. Van der Walt et al. 25). Continual refinement and upgrading will ensure a reliable product that may confidently be used as an analytical and research tool. Combining all the South African databases of vertebrate fossils from the Beaufort Group onto a geographical information system yields an important tool which could be utilised to address numerous issues relating to Permo-Triassic continental tetrapod biodiversity in space and time. In the process of setting up this GIS, it became apparent that the course of action is not simply a matter of combining all the databases and coming up with answers.
Many metadata have been collected for recorded fossils from the seven large collections of vertebrate fossils in South Africa, but, particularly for specimens collected long ago, the quality of the metadata was not initially compliant with the requirements for a reliable GIS. Lack of compliance is largely because of a lack of consistency of data in the different collections, and also because precise GPS coordinates of localities of many of the fossils were not recorded, and only a farm locality was given. Certain specimens were listed more than once, despite the selection of distinct records. This is because of capture errors and lack of standardisation in the original data. In a number of the spreadsheets, the supposedly unique identifiers of some specimens were not actually unique, which made removing redundant data much more difficult.
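Both problems described above, identifiers that are not unique and rows that repeat after trivial differences in capture, can be detected mechanically before any data are removed. The sketch below is a minimal illustration with hypothetical field names and placeholder records; it is not the procedure used to clean the museum spreadsheets.

```python
from collections import Counter


def non_unique_ids(records, key="specimen_id"):
    """Return identifiers that occur more than once, with their counts."""
    counts = Counter(str(r[key]).strip().upper() for r in records)
    return {ident: n for ident, n in counts.items() if n > 1}


def drop_exact_duplicates(records):
    """Remove rows whose every field repeats an earlier row, keeping first occurrences."""
    seen, unique = set(), []
    for r in records:
        # Compare rows after trimming whitespace and ignoring letter case.
        signature = tuple(sorted((k, str(v).strip().lower()) for k, v in r.items()))
        if signature not in seen:
            seen.add(signature)
            unique.append(r)
    return unique


# Placeholder records: two capture variants of one row, plus a genuine identifier clash.
records = [
    {"specimen_id": "x 1", "genus": "Genus A"},
    {"specimen_id": "X 1 ", "genus": "genus a"},
    {"specimen_id": "X 1", "genus": "Genus B"},
    {"specimen_id": "X 2", "genus": "Genus C"},
]
print(non_unique_ids(records))
print(len(drop_exact_duplicates(records)))
```

Rows that share an identifier but differ in content are reported rather than deleted, since deciding which one is correct needs a curator's judgement.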
With these obstacles overcome, the outcome is a GIS database that is reliable and applicable. 27,28 As the Karoo geological succession preserves the most complete record of Middle Permian to Middle Triassic continental tetrapod biodiversity, a study of this nature is of great importance for an understanding of early continental tetrapod biodiversity changes as well as for Gondwanan basin modelling.
The established GIS database is intended for use as both a research tool and a digital archive and, as such, requires continuous improvement, updating and refinement. Future work will include standardising the records for the fossils, for both the data and the metadata.

Figure 1: Collected fossil recorded as a hand-written description on an index card.
http://www.sajs.co.za Volume 111 | Number 11/12 November/December 2015

Conclusion
Digital archiving and spatial cataloguing of artefact collections data are central to biodiversity research. The way forward for the digitised spatial map of the Beaufort Group of the Karoo Basin is to create easier access for researchers, where currently access to collections is limited by their distributed nature on museum shelves.

Figure 2: Defined biostratigraphic map of the Beaufort Group, Karoo Supergroup of South Africa, produced through the application of GIS technology.

Table 1: Summary of processes for presenting data within a spatial system

Task order | Stage 1: Acquisition and processing of original data | Stage 2: Establishing a GIS management system | Stage 3: Reconciliation
3 | Criteria established for the elimination of non-viable data with subsequent deletion | Establishing an alternative base data | Suggestions for improving the current success rate for digitising fossil localities

Table 3: Challenges posed by textual geo-referencing

Table 5: Unique localities in the Beaufort West District

Table 6: Alphanumeric table structure