This document was prepared for the virtual workshop, Georeferencing for Paleo: Refreshing the approach to fossil localities. Our goal here is to explore the georeferenced data that paleontological collections are currently providing to biodiversity data aggregators, namely iDigBio and GBIF. In particular, we want to know…
# Load core libraries; install these packages if you have not already
library(ridigbio)
library(tidyverse)
library(wordcloud)
# Load library for making nice HTML output
library(kableExtra)
Data in this example (unless otherwise noted) were downloaded from iDigBio on 2020-02-04 using the query: basisofrecord = “fossilspecimen”. A data download from iDigBio includes both the raw data, as published by the data provider (e.g. the collection), and a second version of the same data that has been processed by iDigBio. You can learn more about the difference between the raw and processed record sets contained in an iDigBio data download in this blog post.
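To make the raw/processed distinction concrete, here is a minimal sketch that compares the two versions of a single record. The toy tibbles `raw_toy` and `processed_toy` are hypothetical stand-ins for the two files; their values follow one of the Yale Peabody sample records shown in the tables in this document. It assumes that processed values are lower-cased, as they appear in that sample.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-ins for occurrence_raw.csv and occurrence.csv
raw_toy <- tibble(
  coreid = "9702a3d1-810a-4f9a-b9e9-7bc04f54f7f4",
  `dwc:country` = "USA",
  `dwc:stateProvince` = "Wyoming"
)
processed_toy <- tibble(
  coreid = "9702a3d1-810a-4f9a-b9e9-7bc04f54f7f4",
  `dwc:country` = "united states",
  `dwc:stateProvince` = "wyoming"
)

# Join the two versions on `coreid` and flag values that differ by more
# than letter case
compare_toy <- raw_toy %>%
  inner_join(processed_toy, by = "coreid",
             suffix = c("_raw", "_processed")) %>%
  mutate(
    country_changed =
      tolower(`dwc:country_raw`) != `dwc:country_processed`,
    state_changed =
      tolower(`dwc:stateProvince_raw`) != `dwc:stateProvince_processed`
  )

compare_toy %>% select(coreid, country_changed, state_changed)
```

In this toy case only the country is flagged as changed, which is consistent with the `dwc_country_replaced` flag that iDigBio attaches to the processed version of that record.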
# Read into R the raw occurrence data, which should be whatever was published by
# the data provider (e.g. the collection)
raw_idb <- read_csv("4336327f-dae0-4877-9d6d-460cb3a6ef13/occurrence_raw.csv",
na = character(),
col_types = cols())
# Read into R the version of occurrence data processed by iDigBio
processed_idb <- read_csv("4336327f-dae0-4877-9d6d-460cb3a6ef13/occurrence.csv",
na = character(),
col_types = cols())
# Count how many total records are present in `processed_idb`
records_total <- nrow(processed_idb)
# Count how many records are georeferenced in `processed_idb`
records_georef <- processed_idb %>%
filter(`idigbio:geoPoint` != "") %>%
nrow()
Our data here comprise 57 provider datasets representing a total of 5,569,112 specimen records.
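One way a dataset count like this could be obtained is by counting distinct values in the `idigbio:recordset` column of the processed data. The sketch below assumes that each distinct recordset UUID corresponds to one provider dataset; `processed_toy` is a hypothetical three-row stand-in for `processed_idb`, and this is not necessarily how the figure above was calculated.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for `processed_idb`: three records from two recordsets
processed_toy <- tibble(
  `idigbio:recordset` = c("0220907a-0463-4ae0-8a0b-77f5e80fff40",
                          "0220907a-0463-4ae0-8a0b-77f5e80fff40",
                          "5ab348ab-439a-4697-925c-d6abe0c09b92")
)

# Count distinct recordsets (assumed here to equal provider datasets)
datasets_total_toy <- n_distinct(processed_toy$`idigbio:recordset`)

# Count records
records_total_toy <- nrow(processed_toy)
```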
coreid | aec:associatedTaxa | dc:rights | dcterms:accessRights | dcterms:bibliographicCitation | dcterms:language | dcterms:license | dcterms:modified | dcterms:references | dcterms:rights | dcterms:rightsHolder | dcterms:source | dcterms:type | dwc:Identification | dwc:MeasurementOrFact | dwc:ResourceRelationship | dwc:VerbatimEventDate | dwc:acceptedNameUsage | dwc:acceptedNameUsageID | dwc:accessRights | dwc:associatedMedia | dwc:associatedOccurrences | dwc:associatedOrganisms | dwc:associatedReferences | dwc:associatedSequences | dwc:associatedTaxa | dwc:basisOfRecord | dwc:bed | dwc:behavior | dwc:catalogNumber | dwc:class | dwc:classs | dwc:collectionCode | dwc:collectionID | dwc:continent | dwc:coordinatePrecision | dwc:coordinateUncertaintyInMeters | dwc:country | dwc:countryCode | dwc:county | dwc:dataGeneralizations | dwc:datasetID | dwc:datasetName | dwc:dateIdentified | dwc:day | dwc:decimalLatitude | dwc:decimalLongitude | dwc:disposition | dwc:dynamicProperties | dwc:earliestAgeOrLowestStage | dwc:earliestEonOrLowestEonothem | dwc:earliestEpochOrLowestSeries | dwc:earliestEraOrLowestErathem | dwc:earliestPeriodOrLowestSystem | dwc:endDayOfYear | dwc:establishmentMeans | dwc:eventDate | dwc:eventID | dwc:eventRemarks | dwc:eventTime | dwc:family | dwc:fieldNotes | dwc:fieldNumber | dwc:footprintSRS | dwc:footprintSpatialFit | dwc:footprintWKT | dwc:formation | dwc:genus | dwc:geodeticDatum | dwc:geologicalContextID | dwc:georeferenceProtocol | dwc:georeferenceRemarks | dwc:georeferenceSources | dwc:georeferenceVerificationStatus | dwc:georeferencedBy | dwc:georeferencedDate | dwc:group | dwc:habitat | dwc:higherClassification | dwc:higherGeography | dwc:higherGeographyID | dwc:highestBiostratigraphicZone | dwc:identificationID | dwc:identificationQualifier | dwc:identificationReferences | dwc:identificationRemarks | dwc:identificationVerificationStatus | dwc:identifiedBy | dwc:individualCount | dwc:informationWithheld | dwc:infraspecificEpithet | 
dwc:institutionCode | dwc:institutionID | dwc:island | dwc:islandGroup | dwc:kingdom | dwc:language | dwc:latestAgeOrHighestStage | dwc:latestEonOrHighestEonothem | dwc:latestEpochOrHighestSeries | dwc:latestEraOrHighestErathem | dwc:latestPeriodOrHighestSystem | dwc:lifeStage | dwc:lithostratigraphicTerms | dwc:locality | dwc:locationAccordingTo | dwc:locationID | dwc:locationRemarks | dwc:lowestBiostratigraphicZone | dwc:materialSampleID | dwc:maximumDepthInMeters | dwc:maximumElevationInMeters | dwc:member | dwc:minimumDepthInMeters | dwc:minimumElevationInMeters | dwc:modified | dwc:month | dwc:municipality | dwc:nameAccordingTo | dwc:namePublishedIn | dwc:namePublishedInID | dwc:namePublishedInYear | dwc:nomenclaturalCode | dwc:nomenclaturalStatus | dwc:occurrenceDetails | dwc:occurrenceID | dwc:occurrenceRemarks | dwc:occurrenceStatus | dwc:order | dwc:organismID | dwc:organismName | dwc:organismQuantity | dwc:organismQuantityType | dwc:organismRemarks | dwc:originalNameUsage | dwc:originalNameUsageID | dwc:otherCatalogNumbers | dwc:ownerInstitutionCode | dwc:parentNameUsage | dwc:phylum | dwc:pointRadiusSpatialFit | dwc:preparations | dwc:previousIdentifications | dwc:recordNumber | dwc:recordedBy | dwc:reproductiveCondition | dwc:rights | dwc:rightsHolder | dwc:sampleSizeValue | dwc:samplingEffort | dwc:samplingProtocol | dwc:scientificName | dwc:scientificNameAuthorship | dwc:scientificNameID | dwc:sex | dwc:specificEpithet | dwc:startDayOfYear | dwc:stateProvince | dwc:subgenus | dwc:taxonID | dwc:taxonRank | dwc:taxonRemarks | dwc:taxonomicStatus | dwc:typeStatus | dwc:verbatimCoordinateSystem | dwc:verbatimCoordinates | dwc:verbatimDepth | dwc:verbatimElevation | dwc:verbatimEventDate | dwc:verbatimLatitude | dwc:verbatimLocality | dwc:verbatimLongitude | dwc:verbatimSRS | dwc:verbatimTaxonRank | dwc:vernacularName | dwc:waterBody | dwc:year | gbif:Identifier | gbif:Reference | idigbio:recordId | symbiota:recordEnteredBy | 
symbiota:verbatimScientificName | zan:ChronometricDate |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3ee2f19f-046f-4c52-ab31-f9b42ed12a89 | NA | NA | 2011-05-09 00:00:00 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | FossilSpecimen | NA | NA | Lzzz/4510 | NA | Fossil | NA | Asia | NA | NA | Indonesia | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | Cervidae | NA | NA | NA | NA | Axis | NA | NA | NA | NA | NA | NA | NA | NA | MZLU | NA | NA | NA | NA | NA | Sangiran | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | MZLU:Fossil:Lzzz/4510 | Artiodactyla | NA | NA | NA | NA | NA | NA | NA | NA | NA | Skeletal part(s) | NA | NA | NA | NA | NA | NA | NA | Axis sp | NA | NA | NA | Java | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
9702a3d1-810a-4f9a-b9e9-7bc04f54f7f4 | NA | NA | Open Access, http://creativecommons.org/publicdomain/zero/1.0/; see Yale Peabody policies at: http://hdl.handle.net/10079/8931zqj | Paramys (YPM VP 059011) | en | http://creativecommons.org/publicdomain/zero/1.0/ | 2017-03-28 16:45:37 | http://collections.peabody.yale.edu/search/Record/YPM-VP-059011 | Yale Peabody Museum of Natural History | NA | PhysicalObject | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | FossilSpecimen | NA | NA | YPM VP 059011 | Mammalia | NA | VP | NA | North America | NA | NA | USA | NA | Coordinate data unavailable | NA | 10 | NA | NA | Eocene | Tertiary | NA | NA | 1963-06-10 | NA | NA | NA | Ischyromyidae | NA | 63-188 | NA | NA | NA | Willwood Fm | Paramys | NA | NA | NA | Animalia; Chordata; Vertebrata; Amniota; Mammalia; Theriiformes—–Theria-Placentalia-Epitheria; Preptotheria-Anagalida-Simplicidentata; Rodentia; Sciuromorpha; Ischyromyoidea; Ischyromyidae; Paramyinae | North America; USA; Wyoming | NA | NA | NA | NA | NA | 1 | YPM | NA | NA | NA | Animalia | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 6 | NA | NA | NA | NA | ICZN | NA | NA | urn:uuid:004bd82e-de14-4917-8d21-ab9dcb39b2fb | jaw fragment with tooth, 2 jaw fragments with incisors, 1 incisor, 1 incisor fragment; VP number 59011; lot count 1 | Rodentia | NA | NA | NA | NA | NA | NA | NA | YPM | NA | Chordata | NA | Paramys | NA | Yale 1963 Wyoming (Willwood) Expedition, Yale 1963 Wyoming (Willwood) Expedition | NA | NA | NA | NA | NA | NA | Paramys | Leidy, 1871 | NA | NA | NA | Wyoming | NA | Genus | Fossils, Rocks and Minerals: Fossils - Vertebrates | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | squirrels; rodents; mammals; vertebrates; chordates; animals | NA | 1963 | NA | NA | NA | NA | NA | ||||||||||||||||||||||||||||||||||||||||||
07da9e61-2e81-4eb4-b7c1-74c1cac96630 | NA | NA | Open Access, http://creativecommons.org/publicdomain/zero/1.0/; see Yale Peabody policies at: http://hdl.handle.net/10079/8931zqj | Deinonychus antirrhopus (YPM VP 059012) | en | http://creativecommons.org/publicdomain/zero/1.0/ | 2017-03-22 15:31:23 | http://collections.peabody.yale.edu/search/Record/YPM-VP-059012 | Yale Peabody Museum of Natural History | NA | PhysicalObject | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | FossilSpecimen | NA | NA | YPM VP 059012 | Reptilia | NA | VP | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | Dromaeosauridae | NA | NA | NA | NA | Deinonychus | NA | NA | NA | Animalia; Chordata; Vertebrata; Amniota; Reptilia; Diapsida; Archosauria; Saurischia; Theropoda; Dromaeosauridae | NA | NA | NA | NA | NA | 1 | YPM | NA | NA | NA | Animalia | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | ICZN | NA | NA | urn:uuid:8a42b32c-5934-43c6-8c61-be863403fc55 | Composite left manus for Teaching Collection, see notes for list of elements; VP number 59012; lot count 1 | Saurischia | NA | NA | NA | NA | NA | NA | NA | YPM | NA | Chordata | NA | cast | Deinonychus antirrhopus | NA | NA | NA | NA | NA | NA | NA | Deinonychus antirrhopus | Ostrom, 1969 | NA | NA | antirrhopus | NA | NA | Species | Fossils, Rocks and Minerals: Fossils - Vertebrates | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | raptors; dinosaurs; Reptiles; vertebrates; chordates; animals | NA | NA | NA | NA | NA | NA | NA | |||||||||||||||||||||||||||||||||||||||||||||||||||
05698b27-d162-4628-92e9-3153ff67a6ab | NA | NA | Open Access, http://creativecommons.org/publicdomain/zero/1.0/; see Yale Peabody policies at: http://hdl.handle.net/10079/8931zqj | Rodentia (YPM VP 059002) | en | http://creativecommons.org/publicdomain/zero/1.0/ | 2017-03-28 16:17:01 | http://collections.peabody.yale.edu/search/Record/YPM-VP-059002 | Yale Peabody Museum of Natural History | NA | PhysicalObject | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | FossilSpecimen | NA | NA | YPM VP 059002 | Mammalia | NA | VP | NA | North America | NA | NA | USA | NA | Big Horn County | Coordinate data unavailable | NA | 18 | NA | NA | Eocene | Tertiary | NA | NA | 1963-06-18 | NA | NA | NA | NA | 370 | NA | NA | NA | Willwood Fm | NA | NA | NA | Animalia; Chordata; Vertebrata; Amniota; Mammalia; Theriiformes—–Theria-Placentalia-Epitheria; Preptotheria-Anagalida-Simplicidentata; Rodentia | North America; USA; Wyoming; Big Horn County | NA | NA | NA | NA | NA | 1 | YPM | NA | NA | NA | Animalia | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 6 | NA | NA | NA | NA | ICZN | NA | NA | urn:uuid:e3c74dca-f079-4078-90f0-299b3208cf18 | jaw fragment with teeth; VP number 59002; lot count 1 | Rodentia | NA | NA | NA | NA | NA | NA | NA | YPM | NA | Chordata | NA | Rodentia | NA | Yale 1963 Wyoming (Willwood) Expedition, Yale 1963 Wyoming (Willwood) Expedition | NA | NA | NA | NA | NA | NA | Rodentia | Bowdich, 1821 | NA | NA | NA | Wyoming | NA | Order | Fossils, Rocks and Minerals: Fossils - Vertebrates | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | rodents; mammals; vertebrates; chordates; animals | NA | 1963 | NA | NA | NA | NA | NA | |||||||||||||||||||||||||||||||||||||||||||
8c826bb5-ba30-4357-b119-18b24541a02c | NA | NA | NA | http://ucmpdb.berkeley.edu/cgi/ucmp_query2?spec_id=V285838&one=T | http://vertnet.org/resources/norms.html | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | FossilSpecimen | NA | NA | 285838 | Reptilia | NA | V | NA | North America | NA | NA | United States | NA | Apache County | NA | NA | NA | NA | Mesozoic | Late Triassic | Mesozoic | Triassic | NA | NA | NA | NA | NA | Stagonolepididae | NA | NA | NA | NA | Chinle | Acaenosuchus | -7308 | NA | NA | NA | NA | Late Triassic | NA | NA | NA | NA | Location data available to qualified researchers on request. | UCMP | NA | NA | NA | Animalia | NA | Mesozoic | Late Triassic | Mesozoic | Triassic | NA | Saint Johns 2 | NA | -7308 | NA | Late Triassic | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | ICZN | NA | NA | urn:catalog:UCMP:V:285838 | Aetosauria | NA | NA | NA | NA | NA | NA | NA | NA | NA | transverse process and osteoderms tip | NA | NA | NA | NA | NA | NA | NA | Acaenosuchus geoffreyi | NA | NA | geoffreyi | NA | Arizona | NA | species | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | ||||||||||||||||||||||||||||||||||||||||||||||
f26beca1-32ab-4c87-bc70-57af70aac9c8 | NA | NA | NA | http://ucmpdb.berkeley.edu/cgi/ucmp_query2?spec_id=V285929&one=T | http://vertnet.org/resources/norms.html | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | FossilSpecimen | NA | NA | 285929 | Amphibia | NA | V | NA | North America | NA | NA | United States | NA | Apache County | NA | NA | NA | NA | Mesozoic | Late Triassic | Mesozoic | Triassic | NA | NA | NA | NA | NA | Metoposauridae | NA | NA | NA | NA | Chinle | -7308 | NA | NA | NA | NA | Late Triassic | NA | NA | NA | NA | Location data available to qualified researchers on request. | UCMP | NA | NA | NA | Animalia | NA | Mesozoic | Late Triassic | Mesozoic | Triassic | NA | Saint Johns 2 | NA | -7308 | NA | Late Triassic | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | ICZN | NA | NA | urn:catalog:UCMP:V:285929 | Temnospondyli | NA | NA | NA | NA | NA | NA | NA | NA | NA | skull fragment | NA | Camp, C.L. | NA | NA | NA | NA | NA | NA | Metoposauridae | NA | NA | NA | Arizona | NA | family | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
coreid | idigbio:associatedsequences | idigbio:barcodeValue | dwc:basisOfRecord | dwc:bed | gbif:canonicalName | dwc:catalogNumber | dwc:class | dwc:collectionCode | dwc:collectionID | idigbio:collectionName | dwc:recordedBy | dwc:vernacularName | idigbio:commonnames | dwc:continent | dwc:coordinateUncertaintyInMeters | dwc:country | idigbio:isoCountryCode | dwc:county | idigbio:eventDate | idigbio:dateModified | idigbio:dataQualityScore | dwc:earliestAgeOrLowestStage | dwc:earliestEonOrLowestEonothem | dwc:earliestEpochOrLowestSeries | dwc:earliestEraOrLowestErathem | dwc:earliestPeriodOrLowestSystem | idigbio:etag | dwc:eventDate | dwc:family | dwc:fieldNumber | idigbio:flags | dwc:formation | dwc:genus | dwc:geologicalContextID | idigbio:geoPoint | dwc:group | idigbio:hasImage | idigbio:hasMedia | dwc:higherClassification | dwc:highestBiostratigraphicZone | dwc:individualCount | dwc:infraspecificEpithet | dwc:institutionCode | dwc:institutionID | idigbio:institutionName | dwc:kingdom | dwc:latestAgeOrHighestStage | dwc:latestEonOrHighestEonothem | dwc:latestEpochOrHighestSeries | dwc:latestEraOrHighestErathem | dwc:latestPeriodOrHighestSystem | dwc:lithostratigraphicTerms | dwc:locality | dwc:lowestBiostratigraphicZone | dwc:maximumDepthInMeters | dwc:maximumElevationInMeters | idigbio:mediarecords | dwc:member | dwc:minimumDepthInMeters | dwc:minimumElevationInMeters | dwc:municipality | dwc:occurrenceID | dwc:order | dwc:phylum | idigbio:recordIds | dwc:recordNumber | idigbio:recordset | dwc:scientificName | dwc:specificEpithet | dwc:startDayOfYear | dwc:stateProvince | dwc:taxonID | dwc:taxonomicStatus | dwc:taxonRank | dwc:typeStatus | idigbio:uuid | dwc:verbatimEventDate | dwc:verbatimLocality | idigbio:version | dwc:waterBody |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3ee2f19f-046f-4c52-ab31-f9b42ed12a89 | NA | NA | fossilspecimen | NA | axis | lzzz/4510 | mammalia | fossil | NA | NA | asia | NA | indonesia | idn | NA | 2017-06-28 13:15:02 | 0.1594203 | 82681e6ba5c73210b7791c287c81de2850d670e6 | cervidae | [“dwc_taxonrank_added”, “dwc_phylum_added”, “dwc_scientificnameauthorship_added”, “dwc_taxonomicstatus_added”, “gbif_genericname_added”, “dwc_datasetid_added”, “dwc_parentnameusageid_added”, “dwc_taxonid_added”, “idigbio_isocountrycode_added”, “gbif_canonicalname_added”, “gbif_taxon_corrected”, “dwc_class_added”, “dwc_kingdom_added”] | axis | FALSE | FALSE | NA | mzlu | NA | NA | animalia | sangiran | NA | NA | NA | NA | mzlu:fossil:lzzz/4510 | artiodactyla | chordata | [“858a7761-82a5-47df-8e8a-dbc8806cf424\mzlu:fossil:lzzz/4510”] | NA | 858a7761-82a5-47df-8e8a-dbc8806cf424 | axis sp | NA | java | 8535967 | doubtful | genus | 3ee2f19f-046f-4c52-ab31-f9b42ed12a89 | NA | NA | NA | NA | ||||||||||||||||||||||||||||||
9702a3d1-810a-4f9a-b9e9-7bc04f54f7f4 | NA | NA | fossilspecimen | NA | paramys | ypm vp 059011 | mammalia | vp | NA | NA | yale 1963 wyoming (willwood) expedition, yale 1963 wyoming (willwood) expedition | squirrels; rodents; mammals; vertebrates; chordates; animals | [“squirrels; rodents; mammals; vertebrates; chordates; animals”] | north america | NA | united states | usa | 1963-06-10 | 2017-12-06 14:53:16 | 0.3478261 | eocene | tertiary | ab063f634bbd55f12925d73fbafebaadc9cae97d | 1963-06-10 | ischyromyidae | 63-188 | [“dwc_country_replaced”, “idigbio_isocountrycode_added”, “gbif_canonicalname_added”, “dwc_taxonomicstatus_added”, “gbif_genericname_added”, “dwc_datasetid_added”, “gbif_taxon_corrected”, “dwc_parentnameusageid_added”, “dwc_taxonid_added”] | willwood fm | paramys | FALSE | FALSE | animalia; chordata; vertebrata; amniota; mammalia; theriiformes—–theria-placentalia-epitheria; preptotheria-anagalida-simplicidentata; rodentia; sciuromorpha; ischyromyoidea; ischyromyidae; paramyinae | 1 | ypm | NA | NA | animalia | NA | NA | NA | NA | urn:uuid:004bd82e-de14-4917-8d21-ab9dcb39b2fb | rodentia | chordata | [“0220907a-0463-4ae0-8a0b-77f5e80fff40\urn:uuid:004bd82e-de14-4917-8d21-ab9dcb39b2fb”] | NA | 0220907a-0463-4ae0-8a0b-77f5e80fff40 | paramys | 161 | wyoming | 4828164 | accepted | genus | 9702a3d1-810a-4f9a-b9e9-7bc04f54f7f4 | NA | NA | NA | NA | ||||||||||||||||||||||
07da9e61-2e81-4eb4-b7c1-74c1cac96630 | NA | NA | fossilspecimen | NA | deinonychus antirrhopus | ypm vp 059012 | reptilia | vp | NA | NA | raptors; dinosaurs; reptiles; vertebrates; chordates; animals | [“raptors; dinosaurs; Reptiles; vertebrates; chordates; animals”] | NA | NA | 2017-12-06 14:53:16 | 0.1884058 | 27d07c44df90c0d9d64f5645bf540cb2d69bc4f3 | dromaeosauridae | [“gbif_canonicalname_added”, “dwc_taxonomicstatus_added”, “gbif_genericname_added”, “dwc_datasetid_added”, “gbif_taxon_corrected”, “dwc_parentnameusageid_added”, “dwc_taxonid_added”, “gbif_vernacularname_added”, “dwc_scientificnameauthorship_replaced”] | deinonychus | FALSE | FALSE | animalia; chordata; vertebrata; amniota; reptilia; diapsida; archosauria; saurischia; theropoda; dromaeosauridae | 1 | ypm | NA | NA | animalia | NA | NA | NA | NA | urn:uuid:8a42b32c-5934-43c6-8c61-be863403fc55 | saurischia | chordata | [“0220907a-0463-4ae0-8a0b-77f5e80fff40\urn:uuid:8a42b32c-5934-43c6-8c61-be863403fc55”] | NA | 0220907a-0463-4ae0-8a0b-77f5e80fff40 | deinonychus antirrhopus | antirrhopus | NA | 4966355 | accepted | species | 07da9e61-2e81-4eb4-b7c1-74c1cac96630 | NA | NA | NA | NA | |||||||||||||||||||||||||||||||
05698b27-d162-4628-92e9-3153ff67a6ab | NA | NA | fossilspecimen | NA | ypm vp 059002 | mammalia | vp | NA | NA | yale 1963 wyoming (willwood) expedition, yale 1963 wyoming (willwood) expedition | rodents; mammals; vertebrates; chordates; animals | [“rodents; mammals; vertebrates; chordates; animals”] | north america | NA | united states | usa | big horn county | 1963-06-18 | 2017-12-06 14:53:16 | 0.3768116 | eocene | tertiary | 1edf097413b16fc09bb889804599f2b4cc6a37bd | 1963-06-18 | 370 | [“dwc_country_replaced”, “idigbio_isocountrycode_added”] | willwood fm | FALSE | FALSE | animalia; chordata; vertebrata; amniota; mammalia; theriiformes—–theria-placentalia-epitheria; preptotheria-anagalida-simplicidentata; rodentia | 1 | ypm | NA | NA | animalia | NA | NA | NA | NA | urn:uuid:e3c74dca-f079-4078-90f0-299b3208cf18 | rodentia | chordata | [“0220907a-0463-4ae0-8a0b-77f5e80fff40\urn:uuid:e3c74dca-f079-4078-90f0-299b3208cf18”] | NA | 0220907a-0463-4ae0-8a0b-77f5e80fff40 | rodentia | 169 | wyoming | NA | order | 05698b27-d162-4628-92e9-3153ff67a6ab | NA | NA | NA | NA | |||||||||||||||||||||||||
8c826bb5-ba30-4357-b119-18b24541a02c | NA | NA | fossilspecimen | NA | acaenasuchus geoffreyi | 285838 | reptilia | v | NA | NA | north america | NA | united states | usa | apache county | NA | 2017-07-10 22:25:29 | 0.3913043 | mesozoic | late triassic | mesozoic | triassic | 5425ef28de328df8942f7bd5bc44c5395afd9985 | stagonolepididae | [“dwc_phylum_added”, “dwc_scientificnameauthorship_added”, “dwc_taxonomicstatus_added”, “gbif_genericname_added”, “dwc_datasetid_added”, “gbif_taxon_corrected”, “dwc_taxonid_added”, “idigbio_isocountrycode_added”, “gbif_canonicalname_added”, “dwc_parentnameusageid_added”, “dwc_genus_replaced”] | chinle | acaenasuchus | -7308 | FALSE | FALSE | late triassic | NA | ucmp | NA | NA | animalia | mesozoic | late triassic | mesozoic | triassic | saint johns 2 | late triassic | NA | NA | NA | NA | urn:catalog:ucmp:v:285838 | aetosauria | chordata | [“5ab348ab-439a-4697-925c-d6abe0c09b92\urn:catalog:ucmp:v:285838”] | NA | 5ab348ab-439a-4697-925c-d6abe0c09b92 | acaenosuchus geoffreyi | geoffreyi | NA | arizona | 4967763 | accepted | species | 8c826bb5-ba30-4357-b119-18b24541a02c | NA | NA | NA | NA | ||||||||||||||||
f26beca1-32ab-4c87-bc70-57af70aac9c8 | NA | NA | fossilspecimen | NA | 285929 | amphibia | v | NA | NA | camp, c.l. | north america | NA | united states | usa | apache county | NA | 2017-07-10 22:25:29 | 0.4492754 | mesozoic | late triassic | mesozoic | triassic | 2ad7cfef0c826acd4d4732229ebb9c998c276c4e | metoposauridae | [“idigbio_isocountrycode_added”] | chinle | -7308 | FALSE | FALSE | late triassic | NA | ucmp | NA | NA | animalia | mesozoic | late triassic | mesozoic | triassic | saint johns 2 | late triassic | NA | NA | NA | NA | urn:catalog:ucmp:v:285929 | temnospondyli | [“5ab348ab-439a-4697-925c-d6abe0c09b92\urn:catalog:ucmp:v:285929”] | NA | 5ab348ab-439a-4697-925c-d6abe0c09b92 | metoposauridae | NA | arizona | NA | family | f26beca1-32ab-4c87-bc70-57af70aac9c8 | NA | NA | NA | NA |
Of these records, 42.2% are georeferenced. The majority of this georeferencing has been done in the recent past.
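A percentage like this follows from dividing the number of georeferenced records by the total number of records. The sketch below repeats that arithmetic on a hypothetical five-row tibble; the `idigbio:geoPoint` strings are placeholders, not the real format of that column.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for `processed_idb`; non-empty strings mark
# georeferenced records (placeholder text, not real geoPoint values)
processed_toy <- tibble(
  `idigbio:geoPoint` = c("placeholder-point-1", "", "", "placeholder-point-2", "")
)

records_total_toy <- nrow(processed_toy)

records_georef_toy <- processed_toy %>%
  filter(`idigbio:geoPoint` != "") %>%
  nrow()

# Percentage of records that are georeferenced, rounded to one decimal place
perc_georef_toy <- round(records_georef_toy / records_total_toy * 100, 1)
```

With the real data, the same expression applied to `records_georef` and `records_total` would give the percentage quoted above.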
# Collate data about when records were georeferenced, based on data provided
# in the column `dwc:georeferencedDate`
georef_timeline <- raw_idb %>%
select(`dwc:georeferencedDate`) %>%
filter(!is.na(`dwc:georeferencedDate`) & `dwc:georeferencedDate` != "") %>%
mutate(date = lubridate::as_date(`dwc:georeferencedDate`)) %>%
mutate(year1 = lubridate::year(date)) %>%
mutate(year2 = case_when(is.na(year1) ~ `dwc:georeferencedDate`)) %>%
unite(year, c(year1, year2), sep = " ", na.rm = TRUE) %>%
mutate(year = str_trim(str_replace(year, "NA", ""))) %>%
group_by(year) %>%
tally() %>%
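# Keep only four-digit years, then compare them as numbers rather than strings
filter(str_detect(year, "^[0-9]{4}$")) %>%
filter(as.integer(year) > 2000 & as.integer(year) < 2021)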
# Plot `georef_timeline`
ggplot(georef_timeline, aes(x = year, y = n)) +
geom_bar(stat = "identity", fill = "steelblue") +
ggtitle("Timeline of when paleo records on iDigBio were georeferenced") +
xlab("Year") +
ylab("Number of records")
Data for the figure below were downloaded from GBIF on 2020-04-23 using the query: basisofrecord = “fossil” (doi.org/10.15468/dl.7nnj39). This dataset includes 1,1665,493 specimen records provided by more than 90 collections.
In the figure above, data providers are columns and Darwin Core fields are rows. Green indicates the presence of a particular Darwin Core field in data published by a provider, though the fact that a field is present does not necessarily mean that there are values in it. The takeaway from this figure is that only three standard Darwin Core fields related to georeferencing are in use by the majority of data providers. The top fields used by paleo collections providing data to GBIF are:
Data for the figure below are from the iDigBio dataset introduced at the beginning of this document.
# Count records with a value in `dwc:geodeticDatum`
perc_geodeticDatum <- raw_idb %>%
select(`dwc:geodeticDatum`) %>%
filter(!is.na(`dwc:geodeticDatum`) & `dwc:geodeticDatum` != "") %>%
nrow()
# Count records with a value in `dwc:coordinateUncertaintyInMeters`
perc_coordinateUncertaintyInMeters <- raw_idb %>%
select(`dwc:coordinateUncertaintyInMeters`) %>%
filter(!is.na(`dwc:coordinateUncertaintyInMeters`) &
`dwc:coordinateUncertaintyInMeters` != "") %>%
nrow()
# Count records with a value in `dwc:coordinatePrecision`
perc_coordinatePrecision <- raw_idb %>%
select(`dwc:coordinatePrecision`) %>%
filter(!is.na(`dwc:coordinatePrecision`) & `dwc:coordinatePrecision` != "") %>%
nrow()
# Count records with a value in `dwc:georeferencedBy`
perc_georeferencedBy <- raw_idb %>%
select(`dwc:georeferencedBy`) %>%
filter(!is.na(`dwc:georeferencedBy`) & `dwc:georeferencedBy` != "") %>%
nrow()
# Count records with a value in `dwc:georeferencedDate`
perc_georeferencedDate <- raw_idb %>%
select(`dwc:georeferencedDate`) %>%
filter(!is.na(`dwc:georeferencedDate`) & `dwc:georeferencedDate` != "") %>%
nrow()
# Count records with a value in `dwc:georeferenceProtocol`
perc_georeferenceProtocol <- raw_idb %>%
select(`dwc:georeferenceProtocol`) %>%
filter(!is.na(`dwc:georeferenceProtocol`) & `dwc:georeferenceProtocol` != "") %>%
nrow()
# Count records with a value in `dwc:georeferenceSources`
perc_georeferenceSources <- raw_idb %>%
select(`dwc:georeferenceSources`) %>%
filter(!is.na(`dwc:georeferenceSources`) & `dwc:georeferenceSources` != "") %>%
nrow()
# Count records with a value in `dwc:georeferenceVerificationStatus`
perc_georeferenceVerificationStatus <- raw_idb %>%
select(`dwc:georeferenceVerificationStatus`) %>%
filter(!is.na(`dwc:georeferenceVerificationStatus`) &
`dwc:georeferenceVerificationStatus` != "") %>%
nrow()
# Count records with a value in `dwc:georeferenceRemarks`
perc_georeferenceRemarks <- raw_idb %>%
select(`dwc:georeferenceRemarks`) %>%
filter(!is.na(`dwc:georeferenceRemarks`) & `dwc:georeferenceRemarks` != "") %>%
nrow()
# Count records with a value in `dwc:dataGeneralizations`
perc_dataGeneralizations <- raw_idb %>%
select(`dwc:dataGeneralizations`) %>%
filter(!is.na(`dwc:dataGeneralizations`) & `dwc:dataGeneralizations` != "") %>%
nrow()
# Count records with a value in `dwc:informationWithheld`
perc_informationWithheld <- raw_idb %>%
select(`dwc:informationWithheld`) %>%
filter(!is.na(`dwc:informationWithheld`) & `dwc:informationWithheld` != "") %>%
nrow()
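The eleven counts above all follow the same pattern, so they could also be computed in one pass. The following is a minimal sketch of that idea using `across()`, shown on a hypothetical tibble `raw_toy` with three of the fields; the values, including the collector name, are made up for illustration.

```r
library(dplyr)
library(tibble)
library(tidyr)

# Hypothetical stand-in for `raw_idb` (made-up values)
raw_toy <- tibble(
  `dwc:geodeticDatum` = c("WGS84", "", "NAD27", ""),
  `dwc:georeferencedBy` = c("", "", "A. Collector", ""),
  `dwc:georeferenceProtocol` = c("", "", "", "")
)

# Fields to summarize
georef_fields <- c("dwc:geodeticDatum",
                   "dwc:georeferencedBy",
                   "dwc:georeferenceProtocol")

# Count non-empty values per field, then reshape to one row per field
perc_summary_toy <- raw_toy %>%
  summarise(across(all_of(georef_fields),
                   ~ sum(!is.na(.x) & .x != ""))) %>%
  pivot_longer(everything(), names_to = "field", values_to = "yes") %>%
  mutate(field = sub("^dwc:", "", field),
         no = nrow(raw_toy) - yes)
```

The result has the same `field`, `yes`, and `no` columns as the `percSummary` data frame built below, so the plotting code would not need to change.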
# Collate summary data into a single data frame
percSummary <- tribble(
~field, ~yes, ~no,
"geodeticDatum", perc_geodeticDatum, records_total - perc_geodeticDatum,
"coordinateUncertaintyInMeters", perc_coordinateUncertaintyInMeters, records_total - perc_coordinateUncertaintyInMeters,
"coordinatePrecision", perc_coordinatePrecision, records_total - perc_coordinatePrecision,
"georeferencedBy", perc_georeferencedBy, records_total - perc_georeferencedBy,
"georeferencedDate", perc_georeferencedDate, records_total - perc_georeferencedDate,
"georeferenceProtocol", perc_georeferenceProtocol, records_total - perc_georeferenceProtocol,
"georeferenceSources", perc_georeferenceSources, records_total - perc_georeferenceSources,
"georeferenceVerificationStatus", perc_georeferenceVerificationStatus, records_total - perc_georeferenceVerificationStatus,
"georeferenceRemarks", perc_georeferenceRemarks, records_total - perc_georeferenceRemarks,
"dataGeneralizations", perc_dataGeneralizations, records_total - perc_dataGeneralizations,
"informationWithheld", perc_informationWithheld, records_total - perc_informationWithheld)
# Plot `percSummary`
percSummary_plot <- percSummary %>%
pivot_longer(-field, names_to = "inUse", values_to = "count") %>%
group_by(field) %>%
mutate(perc = round(count/sum(count)*100)) %>%
mutate(ypos = cumsum(perc)- 0.5*perc ) %>%
ggplot(aes(x = 0, y = count, fill = inUse)) +
geom_bar(stat = "identity", color = "black") +
coord_polar(theta = "y") +
facet_wrap(~field) +
labs(title = "Percentage of records with values in georeference metadata fields",
fill = "In use?") +
theme_void() +
theme(plot.title = element_text(margin = margin(10, 0, 10, 0))) +
scale_fill_manual(values = c("white", "steelblue"))
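# Save `percSummary_plot` as a file, naming the plot explicitly rather than
# relying on the most recently created plot
ggsave("percSummary_plot.png", plot = percSummary_plot,
width = 9, height = 6, units = "in")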
# Determine unique values for `dwc:geodeticDatum`
geodeticDatum <- raw_idb %>%
group_by(`dwc:geodeticDatum`) %>%
filter(!is.na(`dwc:geodeticDatum`) & `dwc:geodeticDatum` != "") %>%
tally() %>%
arrange(desc(n)) %>%
rename(value = `dwc:geodeticDatum`)
# Generate wordcloud based on frequency of values
wordcloud(words = geodeticDatum$value,
freq = geodeticDatum$n,
min.freq = 1,
max.words = 200,
random.order = FALSE,
rot.per = 0.25,
colors = brewer.pal(8, "Dark2"))
Unique values present in the dwc:geodeticDatum field
value | n |
---|---|
WGS84 | 1957318 |
WGS 84 | 65246 |
WGS 1984 | 56442 |
EPSG:4326 | 56029 |
NAD27 | 49588 |
NAD 27 | 19150 |
not recorded (forced WGS84) | 11489 |
WGS84/NAD83 | 6848 |
NAD 1927 | 4772 |
NAD83 | 2528 |
unknown | 2064 |
PRP_M | 310 |
GDA94 | 301 |
WGs84 | 189 |
WGS | 179 |
WGS1984 | 168 |
ENT.30851 | 141 |
Unknown | 106 |
WGS72 | 99 |
IPE.05223 | 98 |
IPB.09111 | 65 |
IPE.05426 | 54 |
IPE.06762 | 46 |
IPC.01379 | 45 |
NAD 1983 | 42 |
IPE.04435 | 39 |
IPB.09112 | 36 |
IPD.00959 | 36 |
IZS.06861 | 36 |
IPB.09171 | 34 |
IZS.12943 | 31 |
IPB.09236 | 27 |
IPE.06925 | 27 |
IZS.12947 | 26 |
h | 23 |
IPE.06528 | 23 |
IPE.06979 | 23 |
IPB.09166 | 21 |
IPE.06796 | 21 |
IPE.09565 | 21 |
IZS.12954 | 21 |
IPD.00967 | 20 |
IPB.09156 | 19 |
IPE.04593 | 19 |
IPB.09145 | 18 |
IPB.09239 | 17 |
IPB.09133 | 16 |
IZS.01535 | 15 |
IZS.04567 | 15 |
ICH.02847 | 14 |
IZS.12958 | 14 |
WGA84 | 14 |
IPD.07947 | 13 |
IZS.12944 | 13 |
NAD1983 | 13 |
IPB.09132 | 12 |
IPD.06427 | 12 |
IZS.12945 | 12 |
IZS.25699 | 12 |
30.51247 | 11 |
IZS.17594 | 11 |
IZS.24653 | 11 |
IPD.00963 | 10 |
IPD.09482 | 10 |
IPE.07230 | 10 |
IPE.09624 | 10 |
IZS.01569 | 10 |
IZS.12942 | 10 |
ORN.10655 | 10 |
Google Earth Estimate | 9 |
IPA.06853 | 9 |
IPD.00962 | 9 |
IPD.01015 | 9 |
IPE.06819 | 9 |
IPB.02344 | 8 |
IPB.09231 | 8 |
IPE.06934 | 8 |
IZS.12955 | 8 |
37.437467 | 7 |
IPE.01374 | 7 |
IZS.01534 | 7 |
IZS.12951 | 7 |
IZS.25702 | 7 |
IPB.02018 | 6 |
IPB.09248 | 6 |
IPD.00971 | 6 |
IPD.01568 | 6 |
IPE.07181 | 6 |
IPE.07551 | 6 |
NWS84 | 6 |
PB.06184 | 6 |
 | 5 |
IPB.02010 | 5 |
IPB.02015 | 5 |
IPB.09247 | 5 |
IPB.09283 | 5 |
IPD.00965 | 5 |
IZS.25701 | 5 |
PB.05755 | 5 |
IPB.09102 | 4 |
IPB.09237 | 4 |
IPE.06664 | 4 |
IPE.07272 | 4 |
IZS.15214 | 4 |
PB.05117 | 4 |
PB.05140 | 4 |
GeoBasis-DE/BKG | 3 |
IPB.01850 | 3 |
IPB.09154 | 3 |
IPD.01026 | 3 |
IPE.06617 | 3 |
IPE.06636 | 3 |
IZS.01493 | 3 |
IZS.01565 | 3 |
IZS.24599 | 3 |
IZS.29006 | 3 |
IZS.30720 | 3 |
N/A | 3 |
NAD27_CONUS | 3 |
-5.83322 | 2 |
IPA.06877 | 2 |
IPB.01876 | 2 |
IPB.05015 | 2 |
IPB.09113 | 2 |
IPB.09279 | 2 |
IPC.01689 | 2 |
IPE.05421 | 2 |
IPE.06161 | 2 |
IPE.06644 | 2 |
IPE.07177 | 2 |
IPE.07229 | 2 |
IPE.07313 | 2 |
IPE.07400 | 2 |
IPF.00065 | 2 |
IPF.04346 | 2 |
IZS.01553 | 2 |
IZS.08080 | 2 |
IZS.12949 | 2 |
IZS.12956 | 2 |
IZS.12957 | 2 |
IZS.15520 | 2 |
IZS.16785 | 2 |
ORN.01302 | 2 |
ORN.10656 | 2 |
VP.02813 | 2 |
32.54572 | 1 |
34.101905 | 1 |
48.8444 | 1 |
ENT.14228 | 1 |
ENT.24683 | 1 |
GEOBases-DE | 1 |
GeoBasis-DE | 1 |
IPB.01853 | 1 |
IPB.01886 | 1 |
IPB.02005 | 1 |
IPB.02377 | 1 |
IPB.05107 | 1 |
IPB.09136 | 1 |
IPB.09225 | 1 |
IPB.09244 | 1 |
IPC.08142 | 1 |
IPD.00888 | 1 |
IPD.00960 | 1 |
IPD.00966 | 1 |
IPD.00972 | 1 |
IPD.00975 | 1 |
IPD.01016 | 1 |
IPD.01019 | 1 |
IPD.01029 | 1 |
IPD.02127 | 1 |
IPD.04669 | 1 |
IPD.08308 | 1 |
IPD.08926 | 1 |
IPE.03174 | 1 |
IPE.04642 | 1 |
IPE.06162 | 1 |
IPE.06496 | 1 |
IPE.06584 | 1 |
IPE.06618 | 1 |
IPE.06632 | 1 |
IPE.06691 | 1 |
IPE.06693 | 1 |
IPE.06702 | 1 |
IPE.06772 | 1 |
IPE.06800 | 1 |
IPE.06967 | 1 |
IPE.07020 | 1 |
IPE.07144 | 1 |
IPE.07145 | 1 |
IPE.07146 | 1 |
IPE.07147 | 1 |
IPE.07148 | 1 |
IPE.07149 | 1 |
IPE.07150 | 1 |
IPE.07276 | 1 |
IPE.07402 | 1 |
IZS.01536 | 1 |
IZS.12946 | 1 |
IZS.12948 | 1 |
IZS.12950 | 1 |
IZS.12959 | 1 |
IZS.16784 | 1 |
IZS.16811 | 1 |
IZS.17285 | 1 |
IZS.17596 | 1 |
IZS.26967 | 1 |
IZS.28749 | 1 |
MIN.06036 | 1 |
not recorded | 1 |
PB.05312 | 1 |
PB.05670 | 1 |
WGS83 | 1 |
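The table above mixes several spellings of the same datum (for example WGS84, WGS 84, and WGS 1984) with values that look like catalog numbers or coordinates, which suggests shifted columns in some provider data. As a minimal sketch of how such variants could be collapsed before further analysis, the base R helper below maps a few spellings to a canonical label and leaves everything else as NA for manual review. The function name `normalize_datum`, the small `datum_counts` data frame (counts copied from the table), and the decision to treat EPSG:4326 as equivalent to WGS84 are assumptions made for illustration, not part of the original workflow.

```r
# Hypothetical helper: collapse spelling variants of common datums.
# Anything not recognized is returned as NA so it can be reviewed by hand.
normalize_datum <- function(x) {
  # Ignore case, spaces, and punctuation (the colon is kept for EPSG codes)
  key <- toupper(gsub("[^A-Za-z0-9:]", "", x))
  out <- rep(NA_character_, length(x))
  # Assumption: EPSG:4326 is treated as equivalent to WGS84 here
  out[key %in% c("WGS84", "WGS1984", "EPSG:4326")] <- "WGS84"
  out[key %in% c("NAD27", "NAD1927")] <- "NAD27"
  out[key %in% c("NAD83", "NAD1983")] <- "NAD83"
  out
}

# A few rows copied from the table above, for illustration only
datum_counts <- data.frame(
  value = c("WGS84", "WGS 84", "WGS 1984", "EPSG:4326", "NAD27",
            "NAD 27", "WGs84", "IPE.05223", "unknown"),
  n = c(1957318, 65246, 56442, 56029, 49588, 19150, 189, 98, 2064),
  stringsAsFactors = FALSE
)

datum_counts$datum <- normalize_datum(datum_counts$value)

# Values that need manual review (not recognized as a datum)
unmatched <- datum_counts$value[is.na(datum_counts$datum)]

# Total records per canonical datum; rows with NA datum are dropped
collapsed <- aggregate(n ~ datum, data = datum_counts, FUN = sum)
```

Returning NA for unrecognized values, rather than guessing, keeps ambiguous entries such as "WGS84/NAD83" and probable typos such as "WGA84" visible for a human decision.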
# Determine unique values for `dwc:georeferenceProtocol`
georeferenceProtocol <- raw_idb %>%
  group_by(`dwc:georeferenceProtocol`) %>%
  filter(!is.na(`dwc:georeferenceProtocol`) & `dwc:georeferenceProtocol` != "") %>%
  tally() %>%
  arrange(desc(n)) %>%
  rename(value = `dwc:georeferenceProtocol`)
# Generate wordcloud based on frequency of values
wordcloud(words = georeferenceProtocol$value,
          freq = georeferenceProtocol$n,
          min.freq = 1,
          max.words = 200,
          random.order = FALSE,
          rot.per = 0.25,
          colors = brewer.pal(8, "Dark2"))
Unique values present in the dwc:georeferenceProtocol field
value | n |
---|---|
Georeferencing Quick Reference Guide Version 2012-10-02 | 408844 |
digital resource | 358377 |
physical resource | 105145 |
GEOLocate | 101518 |
unspecified | 23878 |
Georeferencing Quick Guide | 23381 |
LACMIP georeferencing 2015-2018 | 16952 |
GBIF Best Practices, Quick Guide | 11385 |
GBIF Best Practices; Quick Guide | 11178 |
“Guide to Best Practices for Georeferencing”“, Chapman and Wieczorek” | 10360 |
Georeferencing Quick Reference Guide | 8514 |
MaNIS/HerpNet/ORNIS Georeferencing Guidelines, GBIF Best Practices | 8024 |
LACMIP georeferencing 2019 | 6861 |
Georeferencing Quick Reference Guide Version 2012-10-08 | 4109 |
“Guide to Best Practices for Georeferencing”" (Chapman and Wieczorek, eds. 2006), Global Biodiversity Information Facility" | 1523 |
Loran A | 1284 |
GBIF Best Practices Quick Guide | 1210 |
Quad Map | 955 |
unknown | 833 |
Batch georeferenced using Google Maps API | 430 |
MaNIS/HerpNET/ORNIS Georeferencing Guidelines | 394 |
Visual Or Radar | 351 |
Guide to Best Practices for Georeferencing Chapman and Wieczorek, eds. 2006, Global Biodiversity Information Facility | 124 |
Not provided by collector | 111 |
Quad Map,Creswell,N.C. | 109 |
Sat Nav | 109 |
Unknown | 96 |
Quad Map,Columbia East | 76 |
Quad Map,Scotia | 76 |
GPS reading in field | 64 |
Quad Map,Roper South | 45 |
Raydist Station Signals 504+585 | 43 |
Quad Map,Frying Pan,N.C. | 29 |
Dead Reckoning | 28 |
GBIF Best Practics; Quick Guide | 22 |
Quad Map,Creswell Se,N.C. | 16 |
Quad Map,Fort Landing,N.C. | 15 |
GBIF Best Practices; Quick Guide; Guidebook | 14 |
Biogeomancer, Point Radius | 13 |
Quad Map,Manteo | 8 |
GBIF Best Practices; Quick Guide | 1 |
Quad Map,Colombia East | 1 |
unknown-migration | 1 |
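Many rows in the table above differ only in capitalization or punctuation (for example "GBIF Best Practices, Quick Guide" versus "GBIF Best Practices; Quick Guide"), and one string appears on two rows, presumably because of a whitespace difference. A rough sketch of how such near-duplicates could be merged is to build a normalized key and sum the counts per key. The helper name `make_key` and the `protocol_counts` data frame (counts copied from the table) are hypothetical and serve only as an illustration.

```r
# A few rows copied from the table above, for illustration only
protocol_counts <- data.frame(
  value = c("GBIF Best Practices, Quick Guide",
            "GBIF Best Practices; Quick Guide",
            "GBIF Best Practices Quick Guide",
            "unknown",
            "Unknown"),
  n = c(11385, 11178, 1210, 833, 96),
  stringsAsFactors = FALSE
)

# Hypothetical helper: lower-case the value and replace every run of
# non-alphanumeric characters with a single space
make_key <- function(x) {
  key <- tolower(x)
  key <- gsub("[^a-z0-9]+", " ", key)
  trimws(key)
}

protocol_counts$key <- make_key(protocol_counts$value)

# Sum the record counts for values that share a key, largest first
merged <- aggregate(n ~ key, data = protocol_counts, FUN = sum)
merged <- merged[order(-merged$n), ]
```

A key like this only catches differences in case, punctuation, and spacing. Misspellings such as "Practics" would still need a lookup table or manual review.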
# Determine unique values for `dwc:georeferenceSources`
georeferenceSources <- raw_idb %>%
  group_by(`dwc:georeferenceSources`) %>%
  filter(!is.na(`dwc:georeferenceSources`) & `dwc:georeferenceSources` != "") %>%
  tally() %>%
  arrange(desc(n)) %>%
  rename(value = `dwc:georeferenceSources`)
# Generate wordcloud based on frequency of values
wordcloud(words = georeferenceSources$value,
          freq = georeferenceSources$n,
          min.freq = 1,
          max.words = 200,
          random.order = FALSE,
          rot.per = 0.25,
          colors = brewer.pal(8, "Dark2"))
Unique values present in the dwc:georeferenceSources field
Let’s take a more detailed look at the precision of the coordinates provided. In this context, precision means “the number of digits after the decimal point on a latitude or longitude that is recorded in decimal degrees.” Precision describes how finely a coordinate is recorded; it is not the same as accuracy, which is how close the recorded coordinate is to the true location. This comic illustrates the concept of precision well:
# Summarize precision for all records
precision <- raw_idb %>%
  select(`dwc:decimalLatitude`, `dwc:decimalLongitude`) %>%
  # Keep records that have at least one coordinate value
  filter(!is.na(`dwc:decimalLatitude`) | !is.na(`dwc:decimalLongitude`)) %>%
  # Split each coordinate at the decimal point; values without a decimal
  # part yield NA for the decimal piece
  mutate(lat = as.character(`dwc:decimalLatitude`)) %>%
  mutate(lon = as.character(`dwc:decimalLongitude`)) %>%
  separate(lat, c("int_lat", "dec_lat"), sep = "\\.") %>%
  separate(lon, c("int_lon", "dec_lon"), sep = "\\.") %>%
  # Precision = number of digits after the decimal point
  mutate(precision_lat = str_length(dec_lat)) %>%
  mutate(precision_lon = str_length(dec_lon)) %>%
  # Convert the digit count to a readable label, e.g. 2 digits -> "0.01"
  mutate(precision_lat = recode(precision_lat, "1" = "0.1",
                                "2" = "0.01",
                                "3" = "0.001",
                                "4" = "0.0001",
                                "5" = "0.00001",
                                "6" = "0.000001",
                                "7" = "0.0000001",
                                "8" = "0.00000001",
                                "9" = "0.000000001",
                                "10" = "0.0000000001")) %>%
  mutate(precision_lon = recode(precision_lon, "1" = "0.1",
                                "2" = "0.01",
                                "3" = "0.001",
                                "4" = "0.0001",
                                "5" = "0.00001",
                                "6" = "0.000001",
                                "7" = "0.0000001",
                                "8" = "0.00000001",
                                "9" = "0.000000001",
                                "10" = "0.0000000001")) %>%
  # Count records for each latitude/longitude precision combination
  group_by(precision_lat, precision_lon) %>%
  summarise(n = n()) %>%
  arrange(desc(n)) %>%
  # Express counts as a percent of the hard-coded total, 2351102
  mutate(percent = round(n/2351102*100, 1)) %>%
  rename(count = n)
Precision is an essential concept for paleo collections because reducing precision (typically by truncating decimals) is a common method we use to obscure locality data when sharing it widely.
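To make the effect of truncation concrete, here is a minimal base R sketch that compares truncating a coordinate with rounding it. The function name `truncate_coord` is hypothetical, and the two example coordinates are arbitrary values chosen for illustration.

```r
# Hypothetical helper: keep `digits` decimal places and discard the rest.
# trunc() drops the fractional part, so values move toward zero.
truncate_coord <- function(x, digits) {
  trunc(x * 10^digits) / 10^digits
}

lat <- 37.437467
lon <- -5.83322

# Truncate to two decimal places
lat_truncated <- truncate_coord(lat, 2)
lon_truncated <- truncate_coord(lon, 2)

# Round to two decimal places, for comparison
lat_rounded <- round(lat, 2)
lon_rounded <- round(lon, 2)

print(c(truncated = lat_truncated, rounded = lat_rounded))
print(c(truncated = lon_truncated, rounded = lon_rounded))
```

Truncation always shifts a value toward zero, by up to one unit of the last retained decimal place, whereas rounding shifts it in either direction by at most half a unit. Either way, the published coordinate no longer shows how precise the original measurement was unless that is recorded elsewhere, for example in `dwc:coordinatePrecision` or `dwc:dataGeneralizations`.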
Summary of precision in georeferenced data, determined from coordinate fields
precision_lat | precision_lon | count | percent |
---|---|---|---|
0.000001 | 0.000001 | 705919 | 30.0 |
0.01 | 0.01 | 569041 | 24.2 |
0.00001 | 0.00001 | 204761 | 8.7 |
0.0001 | 0.0001 | 163805 | 7.0 |
0.0000001 | 0.0000001 | 105205 | 4.5 |
0.1 | 0.1 | 74231 | 3.2 |
0.000001 | 0.00001 | 63567 | 2.7 |
0.00001 | 0.000001 | 58538 | 2.5 |
0.01 | 0.1 | 52133 | 2.2 |
0.001 | 0.001 | 46861 | 2.0 |
0.0001 | 0.001 | 36495 | 1.6 |
0.00001 | 0.0001 | 30938 | 1.3 |
0.0000001 | 0.000001 | 25408 | 1.1 |
0.1 | 0.01 | 24225 | 1.0 |
0.0001 | 0.00001 | 15072 | 0.6 |
0.001 | 0.0001 | 13036 | 0.6 |
0.001 | 0.01 | 11958 | 0.5 |
NA | 0.1 | 11917 | 0.5 |
0.000001 | 0.0001 | 11015 | 0.5 |
0.0001 | 0.01 | 9085 | 0.4 |
0.000001 | 0.0000001 | 8146 | 0.3 |
0.0001 | 0.000001 | 8140 | 0.3 |
NA | NA | 8068 | 0.3 |
0.1 | NA | 8040 | 0.3 |
0.0001 | 0.0000001 | 7811 | 0.3 |
0.0000001 | 0.00001 | 6491 | 0.3 |
0.00001 | 0.0000001 | 5779 | 0.2 |
0.01 | 0.0001 | 5230 | 0.2 |
0.00001 | 0.001 | 4633 | 0.2 |
0.01 | 0.001 | 4497 | 0.2 |
0.00001 | 0.01 | 3955 | 0.2 |
0.0000001 | 0.0001 | 3666 | 0.2 |
0.01 | NA | 3436 | 0.1 |
0.001 | 0.1 | 3433 | 0.1 |
0.0000001 | 0.001 | 3138 | 0.1 |
0.01 | 0.000001 | 3137 | 0.1 |
0.01 | 0.00001 | 2758 | 0.1 |
0.001 | 0.0000001 | 2527 | 0.1 |
0.000001 | 0.01 | 2525 | 0.1 |
0.000001 | 0.001 | 2079 | 0.1 |
0.1 | 0.000001 | 1971 | 0.1 |
0.001 | 0.00001 | 1881 | 0.1 |
NA | 0.01 | 1609 | 0.1 |
0.00001 | 0.1 | 1515 | 0.1 |
0.001 | 0.000001 | 1091 | 0.0 |
0.000001 | 0.1 | 1046 | 0.0 |
NA | 0.000001 | 1035 | 0.0 |
0.0000001 | 0.01 | 925 | 0.0 |
0.1 | 0.00001 | 887 | 0.0 |
0.0001 | 0.1 | 829 | 0.0 |
0.0000000001 | 0.0000000001 | 817 | 0.0 |
0.01 | 0.0000001 | 771 | 0.0 |
0.00000001 | 0.00000001 | 754 | 0.0 |
0.1 | 0.0000001 | 670 | 0.0 |
0.1 | 0.001 | 654 | 0.0 |
0.1 | 0.0001 | 629 | 0.0 |
0.000001 | NA | 625 | 0.0 |
0.0001 | NA | 605 | 0.0 |
0.000001 | 0.0000000001 | 404 | 0.0 |
0.00000001 | 0.0000000001 | 267 | 0.0 |
0.00000001 | 0.0000001 | 181 | 0.0 |
0.00000001 | NA | 139 | 0.0 |
0.0000001 | NA | 122 | 0.0 |
0.001 | NA | 120 | 0.0 |
NA | 0.0001 | 116 | 0.0 |
0.00001 | NA | 112 | 0.0 |
0.0000000001 | 0.001 | 100 | 0.0 |
0.0000001 | 0.1 | 86 | 0.0 |
0.00000001 | 0.0001 | 70 | 0.0 |
0.000000001 | 0.000000001 | 56 | 0.0 |
0.0000000001 | 0.0001 | 41 | 0.0 |
0.00001 | 0.0000000001 | 40 | 0.0 |
NA | 0.001 | 33 | 0.0 |
NA | 0.00001 | 28 | 0.0 |
0.0001 | 0.0000000001 | 27 | 0.0 |
0.001 | 0.0000000001 | 27 | 0.0 |
0.0000000001 | 0.1 | 26 | 0.0 |
0.0001 | 0.00000001 | 26 | 0.0 |
0.0000001 | 0.00000001 | 17 | 0.0 |
0.000001 | 0.000000001 | 9 | 0.0 |
0.0000000001 | 0.000000001 | 7 | 0.0 |
0.0000000001 | 0.0000001 | 6 | 0.0 |
0.00000001 | 0.01 | 5 | 0.0 |
0.0000000001 | 0.00001 | 4 | 0.0 |
0.0000000001 | 0.01 | 4 | 0.0 |
0.00000001 | 0.000001 | 3 | 0.0 |
0.0000001 | 0.000000001 | 3 | 0.0 |
0.0000000001 | NA | 2 | 0.0 |
0.00000001 | 0.001 | 2 | 0.0 |
0.01 | 0.0000000001 | 2 | 0.0 |
0.000001 | 0.00000001 | 1 | 0.0 |
0.1 | 0.0000000001 | 1 | 0.0 |
NA | 0.0000000001 | 1 | 0.0 |
NA | 0.0000001 | 1 | 0.0 |
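The precision labels in the table above can be translated into an approximate ground distance, which makes it easier to judge what a given number of decimal places implies. The sketch below assumes roughly 111.32 km per degree of latitude. The distance covered by one degree of longitude shrinks toward the poles, so the figures are upper bounds for longitude. The function names `count_decimals` and `approx_metres` are hypothetical, and unlike the summary code above, `count_decimals` reports 0 rather than NA for values with no decimal part.

```r
# Hypothetical helper: count digits after the decimal point in a
# coordinate stored as text; values without a decimal point give 0
count_decimals <- function(x) {
  x <- as.character(x)
  has_decimal <- grepl(".", x, fixed = TRUE)
  ifelse(has_decimal, nchar(sub("^[^.]*\\.", "", x)), 0L)
}

# Hypothetical helper: approximate size, in metres, of one unit in the
# last decimal place. Assumes about 111,320 m per degree of latitude.
approx_metres <- function(decimals, metres_per_degree = 111320) {
  metres_per_degree * 10^(-decimals)
}

# Example coordinates stored as text, for illustration only
coords <- c("30.51247", "37.437467", "-5.83322", "48.8444", "34")

precision_summary <- data.frame(
  coordinate = coords,
  decimals = count_decimals(coords),
  stringsAsFactors = FALSE
)
precision_summary$approx_metres <- approx_metres(precision_summary$decimals)

print(precision_summary)
```

Under this approximation, two decimal places correspond to about a kilometre and six decimal places to roughly ten centimetres.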
Possible fields to look at:
# Unique values ? for...
# dwc:georeferencedBy
# dwc:georeferenceRemarks
# dwc:verbatimElevation
# dwc:minimumElevationInMeters
# dwc:maximumElevationInMeters
# dwc:maximumDepthInMeters
# dwc:minimumDepthInMeters
# dwc:coordinateUncertaintyInMeters
# dwc:footprintSRS
# dwc:footprintSpatialFit
# dwc:footprintWKT
# dwc:pointRadiusSpatialFit
# dwc:verbatimCoordinateSystem
# dwc:verbatimCoordinates
# dwc:verbatimDepth
# dwc:verbatimLatitude
# dwc:verbatimLocality
# dwc:verbatimLongitude
# dwc:verbatimSRS
Possible geographic fields to look at:
# Unique values ? for...
# dwc:geologicalContextID
# dwc:lithostratigraphicTerms
# dwc:lowestBiostratigraphicZone
# dwc:fieldNumber
# dwc:locality
# dwc:locationAccordingTo
# dwc:locationID
# dwc:locationRemarks
Possible lithostratigraphic fields to look at:
# Unique values ? for lithostratigraphic fields
# dwc:bed
# dwc:member
# dwc:formation
# dwc:earliestAgeOrLowestStage
# dwc:earliestEonOrLowestEonothem
# dwc:earliestEpochOrLowestSeries
# dwc:earliestEraOrLowestErathem
# dwc:earliestPeriodOrLowestSystem
# dwc:latestAgeOrHighestStage
# dwc:latestEonOrHighestEonothem
# dwc:latestEpochOrHighestSeries
# dwc:latestEraOrHighestErathem
# dwc:latestPeriodOrHighestSystem
# Unique values ? for...
# dwc:institutionCode
# dwc:institutionID
# dwc:collectionCode
# dwc:collectionID
# dwc:datasetID
# dwc:datasetName
# Unique values for `dwc:basisOfRecord` but beyond this dataset
# Look at geopoints
# Country cleanup?