Assessing the Utility of Current Format Registry Efforts for Geospatial Resources

Nancy J. Hoebelheinrich, Stanford University Libraries (co-author and presenter)
With Natalie K. Munn, Content Innovations LLC (co-author)
Tuesday, May 5, 2009
IS&T's Archiving 2009

Background
- NGDA project sponsored by Library of Congress NDIIPP
- Previously: an investigation into the metadata (MD) needed for long-lived geospatial resources, issued in 2008
- Assumption: there is a place for format registries (FRs) in preservation

Key question for study
- Would current FR efforts work for geospatial resources?
  - Often complex, compound digital
  - Often proprietary
  - Specialized user domain
- What's necessary for a general preservation repository environment, in contrast to a geospatial repository?

No need to re-invent
- Reviewed the format registry development efforts current at the time:
  - The National Archives of the UK: PRONOM Technical Registry
  - Library of Congress: Sustainability Factors
  - Global Digital Format Registry (GDFR): collaborative project by Harvard University, NARA and OCLC, funded by the Andrew W. Mellon Foundation
  - NGDA: draft wiki-based FR

Methodology of study
- Compare data models of, and output from, current FR efforts for:
  - File format characteristics
  - Relationships among formats
  - Structures for documenting versions
- Use real examples of geospatial data formats that were intended for ingest into NGDA repositories
- Research and locate public documentation of each data format to be ingested
- Note where explicit metadata existed within the resources themselves, e.g. in file headers
- Examine 4 commonly used GIS conversion tools/utilities to determine how widespread the use of, and level of support for, import/export and/or direct read/write of data formats is: SDRI, GDAL, Manifold
- Create format definitions for 9 (so far) data formats, based on publicly available specifications, white papers, reference materials and input from expert GIS users
- Evaluate and report upon the utility of the FRs' data structures for geospatial resources; identify issues

Formats reviewed in study (23)
- Raster (images based on pixels), e.g.:
  - TIFF, GeoTIFF
  - BIL (Band Interleaved by Line)
  - ADRG (Arc Digitized Raster Graphic)
  - ESRI Grid
- Vector (images using points, lines, curves, shapes), e.g.:
  - Shapefile
  - ArcInfo Coverages
- Grid formats (represent elevation data for regularly spaced ground positions), e.g.:
  - Digital Elevation Model (DEM)
  - Spatial Data Transfer Standard (SDTS)
- Proprietary formats, e.g. ESRI
- Openly available formats, e.g. TIFF
- Data formats used by international and US national data sources
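
The methodology step of noting explicit metadata in file headers can be illustrated with a minimal Python sketch. It is not the tooling used in the study. The function names and returned labels are hypothetical; the byte values follow the published TIFF and ESRI shapefile specifications. A GeoTIFF is a TIFF that carries extra georeferencing tags, so the first bytes alone cannot distinguish the two, which is one reason the format family questions matter.

```python
import struct

# Shape type codes from the ESRI shapefile specification (subset, for illustration).
SHAPE_TYPES = {0: "Null Shape", 1: "Point", 3: "PolyLine", 5: "Polygon", 8: "MultiPoint"}

SHAPEFILE_LABEL = "ESRI shapefile (.shp or .shx)"  # hypothetical label


def sniff_format(header: bytes) -> str:
    """Guess a format label from the first bytes of a file (illustrative only)."""
    # TIFF starts with a byte-order mark ("II" or "MM") followed by the number 42.
    if header[:4] == b"II*\x00":
        return "TIFF (little-endian)"
    if header[:4] == b"MM\x00*":
        return "TIFF (big-endian)"
    # Shapefile main and index files start with the big-endian integer 9994.
    if len(header) >= 4 and struct.unpack(">i", header[:4])[0] == 9994:
        return SHAPEFILE_LABEL
    return "unknown"


def read_shapefile_header(header: bytes) -> dict:
    """Pull the version and shape type out of a 100-byte shapefile header."""
    if len(header) < 100 or sniff_format(header) != SHAPEFILE_LABEL:
        raise ValueError("not a complete shapefile header")
    # Version and shape type are little-endian integers at byte offsets 28 and 32.
    version, shape_code = struct.unpack("<ii", header[28:36])
    return {
        "version": version,
        "shape_type": SHAPE_TYPES.get(shape_code, f"code {shape_code}"),
    }
```

In practice the header would come from something like `open(path, "rb").read(100)`. Formats without a fixed signature of this kind would need other evidence, such as documentation or companion files.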

Results from the study to date
- Report to National Geospatial Digital Archive Regarding Geospatial Data (latest draft of 4 May 2009)
- NGDA Format Registry Research Bibliography and Resources (Appendix A)
- NGDA Registry Survey: spreadsheet of 23 formats and how/whether current FR efforts describe them (Appendix B)
- Summary of Registry Field Map across 4 FR efforts (Appendix C)
- Sample Geospatial Format Registry Definitions using the PRONOM XML schema (Appendix D)
- GDFR and PRONOM Format Registry Definitions comparison (Appendix E, preliminary)

Elements compared across FR efforts in study
- NGDA Field Research Survey (pdf): 49 tags compared across 4 FRs
- See also links to docs in References at end:
  - GDFR Data Model
  - PRONOM Information Model
  - Relevant articles by Alex Ball, Adrian Brown

Key, sometimes problematic, elements for geospatial resources
- File format category
- Format family (parent/child, supertype/subtype)
- Relationship among formats
- Container information
- Version information
- Associated software category

Geospatial example: Shapefile (vector)
- Files included may vary greatly
- As defined by the spec:
  - .shp: main file describing shapes
  - .shx: index file
  - .dbf: database file containing feature attributes
- As found in the wild, may also include:
  - .prj: text files describing projection info (very important)
  - .xml, .sbx, .sbn: often useful files generated by ArcGIS tools, which provide output descriptive MD and binary spatial indexes used by the tool
  - .atx: ArcGIS files created to provide an index to attributes
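
The variation in shapefile components listed above can be sketched in Python as a check over the file names in a directory. This is an illustrative sketch only, not part of the study: the function name and report keys are hypothetical, and it assumes base names contain no dots, so that a metadata file named like `roads.shp.xml` groups with `roads.shp`.

```python
from collections import defaultdict
from pathlib import PurePath

# Components named on the slide: the first set is defined by the spec,
# the second set is commonly found alongside them in the wild.
SPEC_EXTENSIONS = {".shp", ".shx", ".dbf"}
WILD_EXTENSIONS = {".prj", ".xml", ".sbx", ".sbn", ".atx"}


def describe_shapefiles(filenames):
    """Group file names by base name and report each shapefile's components."""
    groups = defaultdict(set)
    for name in filenames:
        filename = PurePath(name).name           # drop any directory part
        base, _, rest = filename.partition(".")  # assumption: no dots in base names
        if rest:
            groups[base].add("." + rest.rsplit(".", 1)[-1].lower())
    report = {}
    for base, extensions in groups.items():
        if ".shp" not in extensions:
            continue  # no main file, so not treated as a shapefile
        report[base] = {
            "missing_spec_files": sorted(SPEC_EXTENSIONS - extensions),
            "extra_files": sorted(extensions & WILD_EXTENSIONS),
            "has_projection": ".prj" in extensions,
        }
    return report
```

For a directory holding `roads.shp`, `roads.shx`, `roads.dbf` and `roads.prj`, the report for `roads` would list no missing spec files and flag the projection file as present. A registry definition that only knows the three spec-defined files would have no place to record that distinction.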

Two geospatial examples: shapefiles (vector images)
- Description of format family: is the relationship among shapefile-related formats adequately described?
  - Are the variations in shapefiles supertype/subtype, parent/child, or "second cousin once removed"?
  - Does PRONOM's relatedFormat adequately explain them?
  - How to explain evolution of the format, e.g. inclusion of projection (.prj) files?
  - Also true of the TIFF family: GeoTIFF, High Resolution Orthoimagery (HRO)
- Container info: GDFR's compositionFacet doesn't include a file directory as container; it does include container as bundle and as wrapper. How to describe? The same is true for DEMs and SDTS.
- Version info: if variations in the format develop over time, but appear only as different agencies or producers evolve (e.g. inclusion of .prj files), are these versions? When and why would it matter?
- Arguably, software apps capable of rendering shapefiles are NOT ubiquitous enough in a general-purpose preservation repository; is there an obligation to describe them?
- Very unclear how to link software to format in GDFR; such links don't seem to show up in PRONOM in Search mode

Issues/Suggestions raised by creation of format definitions
- FR should either keep copies of specs and white papers, or be sure to provide a resolvable link to a preservation copy
- Encourage creation of ontologies for describing relationships among data format types
- Include examples of formats as part of the format definition (assuming no rights restrictions on same)
- Provide or encourage creation of more extensive definitions and guidelines
- Invite software vendors to the party. How to persuade them? Include, e.g. ...
- How practicable is it to create the format definitions?
- Some argue that all this work would not be helpful anyway; how to test that question?

Next steps for NGDA efforts
- Complete as many format definitions as possible within the time left in the grant
- Attempt to devise a rational description of format relationships among geospatial formats
- Approach ESRI and other software or service vendors for assistance in creating format defs or making specs available
- Contribute the format defs created to LC's Sustainability Factors site and either PRONOM or UDFR, as possible

References
- National Geospatial Digital Archive (NGDA), project sponsored by Library of Congress NDIIPP
- National Digital Information Infrastructure Preservation Program (NDIIPP)
- Nancy Hoebelheinrich et al., "An Investigation into Metadata for Long-Lived Geospatial Data Formats"
- The National Archives (TNA), PRONOM Technical Registry
- Library of Congress, Sustainability Factors for Digital Resources
- Global Digital Format Registry
- Report to National Geospatial Digital Archive Regarding Geospatial Data, latest draft of 4 May 2009
- PRONOM 4 Information Model, by Adrian Brown for The National Archives UK, Version 1, 4 January 2005
- GDFR Data Model Specification, version 5.0.14
- Briefing Paper: File Formats and XML Schema Registries, issued 31 May 2006, by Alex Ball
- White Paper: Representation Information Registries, issued 29 January 2008 for PLANETS project, by Adrian Brown, The National Archives UK
- Unified Digital Format Registry

Contacts and Questions
Nancy Hoebelheinrich, nhoebel@stanford.edu
