How to publish OFDS data¶
This page provides an overview of the process for publishing Open Fibre Data Standard (OFDS) data and how-to guides for specific topics.
Overview¶
The process for publishing OFDS data can be divided into three phases:
Plan¶
The plan phase covers identifying your priority use cases, deciding what data to publish and identifying your data sources.
Identify your priority use cases¶
There are many use cases for OFDS data, each with its own data needs. You ought to decide which use cases to prioritise so that you can make sure that your data includes the necessary entities and attributes and that it is available via suitable formats and access methods.
Decide what data to publish¶
Bearing in mind your priority use cases, you ought to review the OFDS data model and decide which entities and attributes you want to publish.
OFDS is primarily designed for the public disclosure of open data. However, you can also use it to exchange data with specific partners and to store data within your own organisation. As such, this step can involve deciding which entities and attributes to make public, which to share with partners and which to keep private.
Most attributes in OFDS are optional. However, the more attributes you publish, the more useful your data will be.
If you are concerned about disclosing sensitive location data, see how to obfuscate location data.
Identify your data sources¶
Once you have decided what data to publish, you ought to identify your data sources. These will be the systems, databases and documents that contain the data that you will convert to OFDS format for publication.
Prepare¶
The prepare phase covers choosing your data formats and access methods, mapping your data to OFDS and collecting missing data.
Choose your data formats and access methods¶
Bearing in mind your priority use cases, you ought to decide which data formats and access methods you wish to implement.
For more information, see how to format data for publication and how to provide access to data.
Map your data to OFDS¶
Once you have identified your data sources and chosen your data formats, you ought to map your data to the schema for your chosen data formats, that is, identify which data elements within your data sources match which OFDS attributes and codes. If there are data elements that you want to publish but for which you cannot identify a suitable mapping, you can add additional attributes to your OFDS data.
Your mapping acts as a blueprint for preparing your data. It will help you to identify the steps involved in converting your data to OFDS format.
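A mapping can be recorded in any form, for example in a spreadsheet. As a minimal sketch, the following Python snippet records a mapping as a dictionary and applies it to one source record. The source column names (site_ref, site_name, rack_count) are hypothetical, and the target attributes (id and name) are chosen only for illustration; check the reference documentation for the attributes that apply to your data.

```python
# Hypothetical mapping from source column names to OFDS attribute names.
# Both sides are illustrative assumptions, not part of the OFDS guidance.
NODE_MAPPING = {
    "site_ref": "id",
    "site_name": "name",
}


def apply_mapping(record, mapping):
    """Return the mapped record and a list of source fields with no mapping."""
    mapped = {}
    unmapped = []
    for source_field, value in record.items():
        target = mapping.get(source_field)
        if target is None:
            unmapped.append(source_field)
        else:
            mapped[target] = value
    return mapped, unmapped


source_record = {"site_ref": "N-001", "site_name": "Accra POP", "rack_count": 4}
node, unmapped_fields = apply_mapping(source_record, NODE_MAPPING)
```

Fields reported as unmapped are candidates either for exclusion or for additional attributes.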
Collect missing data¶
Your mapping might identify attributes that you want to publish but that are missing from your data sources. If so, you’ll need to collect the missing data.
Publish¶
The publish phase covers preparing your data, checking your data and publishing your data.
Prepare your data¶
Once you have completed your mapping and collected missing data, the next step is to convert your data to your chosen format.
The suggested approach is to develop a reproducible data pipeline so that you can easily update your OFDS publication when the data in your data sources is updated. However, you can prepare your data using whichever tools you are most comfortable with.
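For guidance on common steps in converting your data to OFDS format, see how to transform coordinates to the correct coordinate reference system and how to generate universally unique identifiers.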
Check your data¶
Once you have prepared your data, the next step is to use the OFDS Convert, Validate, Explore tool (CoVE) to check that your data is correctly structured and formatted according to OFDS.
Publish your data¶
Once any issues with the structure and format of your data have been resolved, the next step is to publish your data using your chosen access methods.
For your data to be open, you need to publish it using an open license. For more information, see how to license your data.
How-to guides¶
This section contains how-to guides for specific topics. To learn about the process for publishing OFDS data, see the overview.
How to obfuscate location data¶
If you’re concerned about disclosing the exact location of fibre infrastructure, you can truncate the coordinates of node locations and span routes in your public or shared data to obfuscate their exact locations, whilst retaining the precise coordinates for use within your own organisation. Before truncating coordinates, you ought to consider what level of accuracy is required to satisfy your priority use cases. You can use the following table as a guide to the relationship between coordinate precision and accuracy:
| Coordinate precision | Accuracy |
|---|---|
| 0.001° | ± 111 m |
| 0.0001° | ± 11.1 m |
| 0.00001° | ± 1.11 m |
| 0.000001° | ± 0.111 m |
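As a minimal sketch of truncation, the following Python functions cut coordinates to a fixed number of decimal places without rounding. The function names are illustrative, and the sketch assumes that node locations and span routes are held as GeoJSON-style coordinate arrays (a single position or a list of positions).

```python
from decimal import Decimal, ROUND_DOWN


def truncate_coordinate(value, places):
    """Truncate (not round) a coordinate to the given number of decimal places."""
    quantum = Decimal(10) ** -places  # e.g. places=3 gives Decimal("0.001")
    # ROUND_DOWN discards the extra digits, moving the value towards zero.
    return float(Decimal(str(value)).quantize(quantum, rounding=ROUND_DOWN))


def truncate_geometry(coordinates, places):
    """Recursively truncate a nested coordinates array (assumed GeoJSON-style)."""
    if isinstance(coordinates, (int, float)):
        return truncate_coordinate(coordinates, places)
    return [truncate_geometry(item, places) for item in coordinates]


route = [[-0.18983, 5.62534], [-0.17456, 5.60129]]
public_route = truncate_geometry(route, 3)
```

Truncating to three decimal places corresponds to the 0.001° row of the table. Keep the untruncated coordinates in your internal systems and apply the truncation only when generating public or shared data.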
How to add additional attributes¶
OFDS does not restrict the use of additional attributes, except where noted in the reference documentation. If there is a data element that you wish to publish for which you cannot identify a suitable mapping in OFDS, you can add an additional attribute to your data.
Before adding an additional attribute, you ought to search the standard issue tracker to see if a similar concept has already been discussed. If there are no existing discussions, you ought to open a new issue and describe the concept that you want to publish and your proposed modelling.
If you add an additional attribute, you ought to describe its structure, format and meaning in your data user guide. For more information, see how to write a data user guide.
How to format data¶
OFDS supports several data formats:
The JSON format reflects the structure of the data model, is useful to developers who want to use the data to build web apps, and offers a ‘base’ format that other publication formats can be converted to and from.
The GeoPackage format is useful to GIS analysts who want to import the data directly into GIS tools without any pre-processing.
The CSV format is useful to data analysts who want to import data directly into databases and other tabular analysis tools, and to users who want to explore the data in spreadsheet tools.
If you are publishing open data, to meet the widest range of use cases, you ought to publish data in all three formats. You can export data in whichever format best suits your needs, and use the following tools to convert it to the other formats:
The OFDS QGIS plugin supports importing OFDS data in JSON format and exporting it in GeoPackage format.
The OFDS QGIS plugin supports opening an OFDS GeoPackage and exporting it in OFDS JSON format.
Flatten Tool provides a command-line interface for transforming OFDS data from JSON to CSV format.
To convert data to CSV format:
Download the network schema
If your data is a JSON Lines file, segment it into appropriately sized network packages
Run the following command for each network package:
flatten-tool flatten --truncation-length=9 --root-list-path=networks --main-sheet-name=networks --schema=network-schema.json network-package.json -f csv
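The segmentation step above can be sketched in Python as follows. This is an illustration rather than part of Flatten Tool: it assumes that each line of the JSON Lines file is one network and that a network package is a JSON object with a networks array, consistent with the --root-list-path=networks option in the command. The function name and package size are hypothetical.

```python
import json


def segment_networks(lines, package_size):
    """Yield network packages, each holding at most package_size networks."""
    batch = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        batch.append(json.loads(line))
        if len(batch) == package_size:
            yield {"networks": batch}
            batch = []
    if batch:
        yield {"networks": batch}


# In-memory example; in practice, pass an open file object instead of a list.
example_lines = ['{"id": "a"}', '{"id": "b"}', '{"id": "c"}']
packages = list(segment_networks(example_lines, package_size=2))
```

Each yielded package can be written to its own file with json.dump and then passed to the flatten-tool command.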
Flatten Tool provides a command-line interface for transforming OFDS data from CSV to JSON format.
To convert data to JSON format:
Download the network schema
Run the following command, replacing path/to/csv/files with the path to your CSV files:
flatten-tool unflatten -f csv -m networks -s network-schema.json --convert-wkt path/to/csv/files
How to publish large networks¶
This section describes how to:
Use pagination to publish an individual network that is too large to return in a single API response
Use streaming to publish an individual network that is too large to load into memory.
For information on how to use pagination and streaming to publish multiple networks, see the data formats reference.
Pagination¶
The preferred approach is to publish embedded nodes and spans in .nodes and .spans, respectively. If your network is too large to return in a single API response, you ought to use .links to reference separate endpoints for nodes and spans. Each endpoint ought to return a top-level JSON object with a nodes or a spans array, respectively, and a links object with URLs for the next and previous pages of results:
The following example shows a network with embedded nodes and spans:
The following example shows a network with references to separate endpoints for nodes and spans:
The following example shows the response returned by the nodes endpoint with URLs for the next and previous pages of results.
The following example shows the response returned by the spans endpoint with URLs for the next and previous pages of results.
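As a rough sketch of the server-side logic, the following Python function builds one page of a nodes or spans endpoint response. The member names next and prev inside links, the page query parameter and the example.com URL are assumptions made for illustration; consult the reference documentation for the exact structure of the links object.

```python
def paginate(items, key, base_url, page, page_size):
    """Build one page of a response: an items array plus a links object."""
    start = (page - 1) * page_size
    end = start + page_size
    links = {}
    if end < len(items):
        links["next"] = f"{base_url}?page={page + 1}"
    if page > 1:
        links["prev"] = f"{base_url}?page={page - 1}"
    return {key: items[start:end], "links": links}


nodes = [{"id": str(n)} for n in range(5)]
response = paginate(nodes, "nodes", "https://example.com/api/nodes", page=2, page_size=2)
```

In a real API, the items would usually come from a database query with a limit and offset rather than from a list held in memory.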
Streaming¶
The preferred approach is to publish embedded nodes and spans. If your network is too large to load into memory, you ought to use .links to reference separate files for nodes and spans. Each file ought to be formatted as a JSON Lines file in which each line is a valid Node or Span, respectively.
The following example shows a network with embedded nodes and spans:
The following example shows a network with references to separate files for nodes and spans:
The following example shows a nodes file in JSON Lines format.
{}
{}
The following example shows a spans file in JSON Lines format.
{}
{}
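Writing and reading JSON Lines files can be sketched in Python as follows. The function names are illustrative, and the items shown are placeholders rather than complete Node objects.

```python
import io
import json


def write_json_lines(items, stream):
    """Write each item as one compact JSON object per line."""
    for item in items:
        stream.write(json.dumps(item, separators=(",", ":")) + "\n")


def read_json_lines(stream):
    """Yield items one at a time so the whole file never has to be in memory."""
    for line in stream:
        if line.strip():
            yield json.loads(line)


# In-memory example; in practice, use files opened with encoding="utf-8".
buffer = io.StringIO()
write_json_lines([{"id": "1"}, {"id": "2"}], buffer)
buffer.seek(0)
round_tripped = list(read_json_lines(buffer))
```

Because both functions work on one line at a time, memory use stays roughly constant regardless of how many nodes or spans the network contains.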
How to provide access to data¶
Where resources allow, it is best practice to provide multiple access methods for your data so that both humans and machines can access it easily.
With respect to your OFDS publication, which best practices are most important will depend on your priority use cases, but you are encouraged to consider providing bulk downloads and API access.
Bulk downloads¶
If you are publishing only one network, or a small number of networks, you ought to use the approach described in the small file option for each publication format.
If you are publishing a large number of networks, you ought to use the approach to streaming multiple networks described in the streaming option for each publication format.
If you are publishing a network that is very large, you ought to use the approach to streaming nodes and spans described in how to publish large networks.
Compression¶
OFDS data can be compressed in order to save on disk space and bandwidth.
When compressing packages, use ZIP or GZIP, as these are commonly available, often without additional software. Avoid RAR, which requires additional software.
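As a small illustration, GZIP compression is available in the Python standard library. The package content below is a placeholder, not a valid OFDS network.

```python
import gzip
import json

package = {"networks": [{"id": "example"}]}

# Serialise to UTF-8 bytes, then compress.
raw = json.dumps(package).encode("utf-8")
compressed = gzip.compress(raw)

# Consumers reverse the two steps to recover the data.
restored = json.loads(gzip.decompress(compressed).decode("utf-8"))
```

For files too large to hold in memory, gzip.open can be used to write and read the compressed file as a stream.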
Serving files¶
The web server providing access to bulk files ought to report the HTTP Last-Modified header correctly, so that consuming applications only need to download updated files.
Also, publishers ought to ensure that the data export is completed successfully, i.e. that no files were truncated.
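Most web servers set Last-Modified automatically for static files. If your bulk files are served by your own application code, the logic can be sketched as follows; the function names are illustrative, and the sketch assumes that the modification time is available as a timezone-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def last_modified_header(modified):
    """Format a timezone-aware datetime as an HTTP date for Last-Modified."""
    return format_datetime(modified.astimezone(timezone.utc), usegmt=True)


def is_not_modified(if_modified_since, modified):
    """Return True if the client's copy is still current (respond with 304)."""
    client_time = parsedate_to_datetime(if_modified_since)
    # HTTP dates have one-second resolution, so drop microseconds before comparing.
    return modified.replace(microsecond=0) <= client_time


modified = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
header_value = last_modified_header(modified)
```

When is_not_modified returns True, the server can answer with a 304 Not Modified status and no body, so that the consuming application skips the download.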
API access¶
If you are publishing data via an API, you need to consider pagination. If you are publishing multiple networks, you ought to use the pagination method described in the API response option for each publication format.
If you are publishing a network that is very large, you ought to use the approach to paginating nodes and spans described in how to publish large networks.
API design is a deep topic. As such, the following guidance is not intended to be comprehensive or prescriptive. Wherever possible, you ought to carry out your own user research.
Discoverability¶
Ensure that the API endpoints and documentation are discoverable. For example, add a link to the footer of your website, and list the API endpoints in your government’s open data portal.
Documentation¶
Provide API documentation, with at least the lists of endpoints, methods and parameters. Many open data publishers use Swagger to document their APIs.
Access control and rate limiting¶
Avoid adding access controls (like user registration or API keys), in order to maximise the ease of access to the publication.
If access controls are necessary, do not use access tokens that need to be refreshed regularly. For example, every two hours is too frequent.
If the API implements rate limits (throttling):
Document the rate limits in the API documentation (example).
When a user exceeds a rate limit, return an HTTP 429 ‘Too Many Requests’ response status code, and set the Retry-After HTTP header to indicate how long to wait before making a new request.
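The rate limit response described above can be sketched as follows. This is a simple sliding-window check written for illustration: the function name, the limit and the window length are assumptions, and a production API would more likely rely on the rate limiting features of its web framework or gateway.

```python
import math


def check_rate_limit(request_times, now, limit, window_seconds):
    """Return (status_code, headers) for a request arriving at time `now`.

    request_times holds the timestamps (in seconds) of the user's earlier requests.
    """
    window_start = now - window_seconds
    recent = [t for t in request_times if t > window_start]
    if len(recent) < limit:
        return 200, {}
    # The oldest recent request leaves the window at oldest + window_seconds.
    retry_after = math.ceil(min(recent) + window_seconds - now)
    return 429, {"Retry-After": str(max(retry_after, 1))}


status, headers = check_rate_limit([0, 1, 2], now=3, limit=3, window_seconds=10)
```

Here the Retry-After value is a number of seconds, which is one of the two forms that the header accepts (the other is an HTTP date).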
Completeness¶
Ensure that all OFDS data can be accessed via the API.
Response format¶
Put the network package at the top-level of the JSON data. For example, do not embed it under a results array.
Use a JSON library instead of implementing JSON serialisation yourself. This also guarantees that the encoding is UTF-8.
Remove NULL characters (\u0000) from the JSON response. These characters cannot be imported by users into some SQL databases.
If results cannot be returned, use an appropriate HTTP error code (400-599); do not return a JSON object with an error message and a 200 HTTP status code. That said, if a search request returns no results, it is appropriate to use a 200 HTTP status code, with an empty result set.
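The serialisation guidance above can be sketched in Python as follows. The function names are illustrative, and the package content is a placeholder rather than a valid OFDS network.

```python
import json


def strip_nulls(value):
    """Recursively remove NULL characters (U+0000) from all strings."""
    if isinstance(value, str):
        return value.replace("\u0000", "")
    if isinstance(value, list):
        return [strip_nulls(item) for item in value]
    if isinstance(value, dict):
        return {strip_nulls(k): strip_nulls(v) for k, v in value.items()}
    return value


def serialise(package):
    """Serialise with the standard json library and encode as UTF-8 bytes."""
    return json.dumps(strip_nulls(package), ensure_ascii=False).encode("utf-8")


body = serialise({"networks": [{"name": "Réseau\u0000 Nord"}]})
```

The network package is the top-level object in the response body, and the NULL character is removed before serialisation so that it appears neither as a raw byte nor as an escape sequence.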
Monitoring¶
Set up error monitoring, so that if a request causes an HTTP 500 Internal Server Error, you can investigate.
How to transform coordinates to the correct coordinate reference system¶
To publish OFDS data, you need to specify coordinates in the urn:ogc:def:crs:OGC::CRS84 coordinate reference system (CRS). If the coordinates in your data sources are specified in a different CRS, before publishing your data in OFDS format, you first need to transform the coordinates to the correct CRS.
If your data pipeline includes a Geographic Information System such as ArcGIS or QGIS, these tools can transform coordinates from one CRS to another. If you are writing your own software, or if you prefer to use the command line, several libraries and tools are available, for example:
PROJ and its associated Python interface (PYPROJ) and JavaScript implementation (PROJ4JS) are generic coordinate transformation tools that transform geospatial coordinates from one CRS to another. They include command-line applications and an application programming interface.
GDAL is a translator library for raster and vector geospatial data formats. It also comes with a variety of useful command line utilities for data translation and processing.
Apache SIS is a free software, Java language library for developing geospatial applications. SIS provides data structures for geographic features and associated metadata along with methods to manipulate those data structures.
If you prefer to use a graphical user interface, several web-based tools are available, e.g. epsg.io.
The urn:ogc:def:crs:OGC::CRS84 CRS is equivalent to EPSG:4326 with reversed axes. If your chosen transformation tool does not support it, you can instead transform your coordinates to EPSG:4326 and manually reorder them into longitude, latitude order.
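The manual reordering step can be sketched in Python as follows. The function name is illustrative, and the sketch assumes that your tool has output EPSG:4326 coordinates in latitude, longitude order. Some tools already output longitude first, so check your tool's axis order before applying a swap.

```python
def to_crs84_order(coordinates):
    """Swap (latitude, longitude) positions into (longitude, latitude) order.

    Works on a single position or on nested lists of positions.
    """
    if coordinates and isinstance(coordinates[0], (int, float)):
        latitude, longitude = coordinates[0], coordinates[1]
        # Any further values in the position (e.g. elevation) are kept as-is.
        return [longitude, latitude, *coordinates[2:]]
    return [to_crs84_order(item) for item in coordinates]


point = to_crs84_order([5.625, -0.189])
route = to_crs84_order([[5.625, -0.189], [5.601, -0.174]])
```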
How to generate universally unique identifiers¶
If you are writing your own software or if you prefer to use the command line, several libraries and tools are available to generate universally unique identifiers (UUIDs), for example:
Golang - google/uuid
PHP - ramsey/uuid
C++ - Boost UUID
Linux or C - libuuid
Python - uuid.py
Java - java.util.UUID
C# - System.Guid
JavaScript - Crypto.randomUUID
R - uuid
If you prefer to use a graphical user interface, several web-based tools are available, for example Online UUID Generator.
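As a brief example using the Python uuid module listed above, the following snippet generates a random (version 4) UUID as a string. The variable name is illustrative; check the reference documentation for which identifiers need to be UUIDs.

```python
import uuid

# uuid4() generates a random UUID; str() gives the usual hyphenated form.
network_id = str(uuid.uuid4())
```

Generate each identifier once and store it alongside your source data so that it stays the same each time you update your publication.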
How to write a data user guide¶
Publishing OFDS data involves making choices about what data to include and exclude, and how to map existing data elements to the attributes in OFDS.
In order for users to interpret data correctly and make effective use of it, it’s important to describe your decisions and to provide guidance to data users. Your data user guide ought to include:
how you prepared the data and how frequently it is updated
the scope of the data
the meaning, structure and format of any additional attributes
the available data formats and access methods
license information for data reuse
any plans for changes to your publication
your contact details
Your data user guide ought to be made available as a public web page. You ought to link to the web page wherever you publish links to your data.
How to license your data¶
Publishing your data under an open license is important because it prevents restrictions on re-use, which could limit the usefulness of the data.
You are encouraged to use either a public domain dedication/certification or an attribution-only license:
A public domain dedication asserts no copyright, database rights or contractual rights over the data. For example, Creative Commons’ public domain tools.
Attribution-only licenses allow for use and reuse, with the only restriction being that attribution (credit) be given to the original publisher. For example, Creative Commons Attribution 4.0 International.
The Open Knowledge Foundation maintains a list of licenses that conform to the open definition. If you use a custom license, you ought to check that it conforms to the open definition.
You need to ensure that a clear license statement is provided wherever you publish links to your data.