Data Management Plan

The SAIL Data Management Plan is a document detailing the management of all the data from Project SAIL (Space-Atmosphere-Ocean Interactions in the marine boundary Layer). Project SAIL aims to improve the scientific understanding of the marine boundary layer through a unique monitoring campaign on board the iconic Portuguese tall ship NRP Sagres during its 2020 circumnavigation expedition, planned to start on January 5th 2020 (Fig. 1). The campaign will enable the measurement of the atmospheric electric field over the ocean and the study of space-driven interactions through detailed monitoring of GNSS signals, cosmic radiation, environmental radioactivity and atmospheric ionization. The atmospheric measurements will be complemented by the collection of fish samples for analysis of biological indicators of ocean health, and by underwater monitoring of the ocean state (temperature, conductivity, dissolved oxygen, pH, spectral radiance), providing unique data for the detailed study of ocean-atmosphere fluxes and surface-atmosphere interactions.


E1: Atmospheric electric field 1 (measured at the high position on the mast)

E2: Atmospheric electric field 2 (measured at the low position on the mast)

Introduction
The SAIL Data Management Plan is a document detailing the management of all the data from Project SAIL (Space-Atmosphere-Ocean Interactions in the marine boundary Layer). Project SAIL aims to improve the scientific understanding of the marine boundary layer through a unique monitoring campaign on board the iconic Portuguese tall ship NRP Sagres during its 2020 circumnavigation expedition, planned to start on January 5th 2020 (Fig. 1). The campaign will enable the measurement of the atmospheric electric field over the ocean and the study of space-driven interactions through detailed monitoring of GNSS signals, cosmic radiation, environmental radioactivity and atmospheric ionization.
The atmospheric measurements will be complemented by the collection of fish samples for analysis of biological indicators of ocean health, and by underwater monitoring of the ocean state (temperature, conductivity, dissolved oxygen, pH, spectral radiance), providing unique data for the detailed study of ocean-atmosphere fluxes and surface-atmosphere interactions. The SAIL management team is composed of the PI, the co-PI and the CE.
The project is divided into three main stages: i) planning and preparation of the campaign (October to December 2019); ii) the monitoring campaign during the circumnavigation (January 2020 to January 2021); iii) curation and analysis of the campaign data (February 2021 to September 2023).
In the first stage of the project, a multi-parametric sensor infrastructure will be installed on board the NRP Sagres. A dedicated geo-referencing and time-synchronization network based on the Global Navigation Satellite System (GNSS) will be designed and installed, along with a triple-antenna configuration capable of sensing the ship's movements in six degrees of freedom. This network will provide an accurate time reference for synchronizing the sensors. It will also record the ship's trajectory for later analysis of the dynamic forces endured by the ship.
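The three GNSS antennas are what make the rotational degrees of freedom (heading, pitch, roll) observable; a single antenna only gives position. The sketch below illustrates that geometry under loudly stated assumptions. Antenna positions are taken as already converted to a local East-North-Up (ENU) frame in metres. The layout (stern antenna, bow antenna, third antenna offset to starboard) and the function name `attitude_from_antennas` are hypothetical, not the SAIL installation.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]  # (east, north, up) in metres, local ENU frame

def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _scale(a: Vec, k: float) -> Vec:
    return (a[0] * k, a[1] * k, a[2] * k)

def _unit(a: Vec) -> Vec:
    return _scale(a, 1.0 / math.sqrt(_dot(a, a)))

def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def attitude_from_antennas(stern: Vec, bow: Vec, starboard: Vec) -> Tuple[float, float, float]:
    """Return (heading, pitch, roll) in degrees from three ENU antenna positions.

    Hypothetical layout: stern->bow baseline is the ship's forward axis and the
    third antenna lies to starboard of it. Conventions: heading clockwise from
    north, pitch positive bow-up, roll positive starboard-down.
    """
    fwd = _unit(_sub(bow, stern))
    heading = math.degrees(math.atan2(fwd[0], fwd[1])) % 360.0
    pitch = math.degrees(math.atan2(fwd[2], math.hypot(fwd[0], fwd[1])))
    # Starboard axis: remove the forward component of the stern->starboard vector.
    side = _sub(starboard, stern)
    right = _unit(_sub(side, _scale(fwd, _dot(side, fwd))))
    down = _cross(fwd, right)  # body "down" axis (forward x starboard), expressed in ENU
    roll = math.degrees(math.atan2(-right[2], -down[2]))
    return heading, pitch, roll
```

For a level ship heading due north, `attitude_from_antennas((0, 0, 0), (0, 10, 0), (5, 0, 0))` gives angles of (approximately) zero. Lowering the starboard antenna by 1 m yields a positive roll of about 11.3°. Translation (the other three degrees of freedom) comes from the antenna positions themselves.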
For each component of the monitoring system, specific procedures will be described in an operation manual. The manual defines any intervention required from the crew on board, for tasks such as tow-fish recovery, routine checks, and minor maintenance work such as cleaning sensing surfaces or performing periodic visual checks of the equipment.
Software will be developed for real-time interaction with the sensors in the monitoring system. Furthermore, a diagnostic tool will be developed and made available to the NRP Sagres crew. It will enable brief routine checks on the status of the monitoring system and data collection and, if needed, trigger human intervention and troubleshooting.
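One check such a diagnostic tool could perform is to flag sensors that have stopped delivering data. The sketch below is hypothetical throughout. It assumes the tool already knows the time of the most recent record from each sensor (how this is read from the onboard files is not specified here). The function names and the 5-minute threshold are illustrative choices, not part of the SAIL software.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional

def stale_sensors(last_seen: Dict[str, Optional[datetime]],
                  now: datetime,
                  max_age: timedelta = timedelta(minutes=5)) -> List[str]:
    """Return, sorted by name, the sensors with no record or a record older than max_age."""
    stale = []
    for sensor, ts in last_seen.items():
        if ts is None or now - ts > max_age:
            stale.append(sensor)
    return sorted(stale)

def status_report(last_seen: Dict[str, Optional[datetime]],
                  now: datetime,
                  max_age: timedelta = timedelta(minutes=5)) -> str:
    """One-line summary a crew member could read during a routine check."""
    stale = stale_sensors(last_seen, now, max_age)
    if not stale:
        return "OK: all sensors reporting"
    return "CHECK: no recent data from " + ", ".join(stale)
```

For example, with `E1` reporting two seconds ago, `E2` silent for an hour and a hypothetical `GNSS` entry with no record at all, `status_report` returns `"CHECK: no recent data from E2, GNSS"`.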
The planning stage also includes the organisation of a training workshop to enable the NRP Sagres crew to follow specific procedures for the collection and storage of fish samples, and to ensure the preservation of the samples for laboratory analysis after the trip.
The second stage comprises the collection of atmospheric and oceanographic measurements, as well as fish samples and ancillary information (navy records, system logs):
-GNSS and atmospheric measurements (electric field, gamma radiation, visibility, ions, solar and cosmic radiation), sampled every second. All the sensors will transmit their data to the onboard PC with the corresponding time-stamp provided by the GNSS system;
-underwater measurements from a tow-fish deployed by the NRP Sagres crew;
-fish samples from opportunistic fish collection by the NRP Sagres crew, properly stored on board at -20°C for later laboratory analyses by CIIMAR;
-meteorological information from the ship's records.
Intermediate collection of measurements will be performed whenever possible (e.g. after each leg of the trip, when the ship is docked) for a preliminary inspection and check of the data.
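Because the atmospheric and GNSS records are sampled every second with GNSS time-stamps, a preliminary data check can count the missing seconds. The sketch below assumes time-stamps that fall exactly on whole seconds. Real data may need a tolerance for jitter. The function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

def find_gaps(timestamps: List[datetime],
              period: timedelta = timedelta(seconds=1)) -> List[Tuple[datetime, datetime]]:
    """Return (last_before_gap, first_after_gap) pairs where samples are more than one period apart."""
    ts = sorted(timestamps)
    return [(a, b) for a, b in zip(ts, ts[1:]) if b - a > period]

def completeness(timestamps: List[datetime], start: datetime, end: datetime,
                 period: timedelta = timedelta(seconds=1)) -> float:
    """Fraction of the expected samples in [start, end) that are present (duplicates counted once)."""
    expected = int((end - start) / period)
    present = {t for t in timestamps if start <= t < end}
    return len(present) / expected if expected else 0.0
```

Applied to one hour of 1 Hz data with ten missing seconds, `find_gaps` reports a single gap. `completeness` returns 3590/3600.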
Workshops on selected topics related to the project theme will be organised by AIR Centre during the Atlantic part of the trip.
The third stage of the project includes the final curation of all the data and its preparation for publication, the analysis of the data, and the publication of scientific results.
The DMP will be monitored throughout the project and periodically revised and updated to reflect changes in the project.

Types of data
Different types of data will be collected, generated and processed in the project, including:
-Project management data: data used internally in the management of the project, including meeting minutes, protocols, project reports, photos, data collected at public events organized by the project, etc.;
-Observational data: campaign data, meteorological data collected by the ship crew, fish sample data;
-Derived data: data resulting from the processing and analysis of data collected in the project, such as processed data and curated databases, maps, plots, etc.
No data subjects are involved in the project.
Project management data are processed internally and are not passed on to third parties outside the project. The only exception is funding entities, as required by applicable reporting obligations.
None of the other types of data include sensitive or personal information.

Documentation and metadata
Project documents include:
Metadata will be created following the INSPIRE and Dublin Core standards. Other specific descriptors can be added during the deposit process if deemed necessary.
To improve fitness for reuse, detailed metadata will be provided, e.g. on the temporal and spatial resolution and on the methods used to collect and analyse the data. The metadata on geographic location will be derived from the GNSS information.
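Geographic metadata could be derived from the GNSS track along the following lines. This sketch writes a few Dublin Core elements plus a bounding box in the style of the DCMI Box encoding scheme. It does not cover INSPIRE. The wrapper element `metadata`, the field values and the function names are assumptions for illustration. Note that the naive longitude min/max used here breaks for a track crossing the 180° meridian, which a circumnavigation will do. Per-leg or per-day datasets avoid that, while a full-track record would need antimeridian handling.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, Tuple

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core elements namespace
ET.register_namespace("dc", DC_NS)

def bounding_box(track: Iterable[Tuple[float, float]]) -> Tuple[float, float, float, float]:
    """(south, west, north, east) of (lat, lon) fixes in decimal degrees.

    Naive min/max: not valid if the track crosses the antimeridian.
    """
    lats, lons = zip(*track)
    return min(lats), min(lons), max(lats), max(lons)

def dublin_core_record(fields: Dict[str, str], track: Iterable[Tuple[float, float]]) -> str:
    """Serialize simple dc:* elements plus a GNSS-derived dc:coverage box as XML text."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    s, w, n, e = bounding_box(track)
    ET.SubElement(root, f"{{{DC_NS}}}coverage").text = (
        f"southlimit={s}; westlimit={w}; northlimit={n}; eastlimit={e}")
    return ET.tostring(root, encoding="unicode")
```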

Data collection
Data collected in the first stage of the project (planning and set-up) include the data recorded during the installation and testing of the different sensors, as well as photos of the set-up activities.
Data collected in the second stage of the project, corresponding to the monitoring campaign, include datasets for all the measured parameters, described in Table 1. All the data collected during the SAIL campaign that come directly from the onboard storage system are hereafter designated as Sensor Data.
In the third stage of the project (quality control and data analysis), two distinct types of datasets will be created from the Sensor Data files:
-merged daily files with logging errors corrected, hereafter designated as Raw Data;
-datasets resulting from the application of variable-specific pre-processing and quality-control procedures to the Raw Data, hereafter designated as Processed Data.
The quality-control procedures will be described in detail in the second version of the DMP.
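The step from Sensor Data to Raw Data (merging one sensor's hourly files into a daily file) could look roughly like the sketch below. The file layout is an assumption: `timestamp,value` CSV lines with ISO 8601 time-stamps. So is the notion of "logging error" used here, namely unparsable lines, duplicated time-stamps and out-of-order records. The actual SAIL corrections are not described in this version of the DMP. The function names are hypothetical.

```python
import csv
from datetime import datetime
from pathlib import Path
from typing import Iterable, List, Tuple

Row = Tuple[datetime, str]

def read_hourly(path: Path) -> List[Row]:
    """Read 'ISO-timestamp,value' rows, skipping lines that do not parse (assumed logging errors)."""
    rows = []
    with path.open(newline="") as f:
        for rec in csv.reader(f):
            if len(rec) != 2:
                continue
            try:
                rows.append((datetime.fromisoformat(rec[0]), rec[1]))
            except ValueError:
                continue
    return rows

def merge_daily(hourly_files: Iterable[Path], out_path: Path) -> int:
    """Merge hourly files into one daily file, de-duplicated by time-stamp and time-ordered."""
    merged = {}
    for path in hourly_files:
        for ts, value in read_hourly(path):
            merged.setdefault(ts, value)  # keep the first record logged for each time-stamp
    with out_path.open("w", newline="") as f:
        writer = csv.writer(f)
        for ts in sorted(merged):
            writer.writerow([ts.isoformat(), merged[ts]])
    return len(merged)
```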

Data Storage and backup
The data from the set-up stage will be preserved on the PI's PC and on the INESC TEC institutional Drive (https://drive.inesctec.pt). Paper documents (e.g. project protocols, instrument manuals) and the documentation detailed in 2.1 will be kept in the offices of the researchers responsible for the project. Selected documentation will be digitized and preserved on https://drive.inesctec.pt.
The data from the SAIL monitoring system are stored on the onboard computer, organised in one folder per day containing the individual hourly files for each sensor. Every day at 01:00, the daily folder is compressed (tar.gz format). The compressed data are kept on the ship's computer for 2 days and copied automatically to a network-attached storage (NAS) device, also on board. A further copy of the data is stored on an external SSD (256 GB). The INESC TEC technical team can use this disk to access the data while on board or when the ship is docked.
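The nightly routine described above (compress the daily folder, copy it to the NAS, keep local archives for 2 days) can be sketched as three small functions. Several details are assumptions: the `YYYYMMDD` folder naming, a NAS mounted as a local directory, and the reading of "kept for 2 days" as "delete once more than 2 days old". As an extra safeguard, the sketch also refuses to delete an archive unless the NAS already holds a copy. The function names are hypothetical. Scheduling at 01:00 would be done outside the code, e.g. with cron.

```python
import shutil
import tarfile
from datetime import date, datetime
from pathlib import Path
from typing import List

def compress_day(data_root: Path, day: date) -> Path:
    """Pack the folder for `day` (assumed name YYYYMMDD) into YYYYMMDD.tar.gz beside it."""
    folder = data_root / day.strftime("%Y%m%d")
    archive = data_root / f"{folder.name}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(str(folder), arcname=folder.name)
    return archive

def copy_to_nas(archive: Path, nas_dir: Path) -> Path:
    """Copy one archive to the NAS mount point, preserving file timestamps."""
    nas_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(archive, nas_dir / archive.name))

def prune_local(data_root: Path, nas_dir: Path, today: date, keep_days: int = 2) -> List[str]:
    """Delete local archives older than keep_days, but only if the NAS already holds a copy."""
    removed = []
    for archive in sorted(data_root.glob("*.tar.gz")):
        day = datetime.strptime(archive.name[:8], "%Y%m%d").date()
        if (today - day).days > keep_days and (nas_dir / archive.name).exists():
            archive.unlink()
            removed.append(archive.name)
    return removed
```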
The Sensor Data, Raw Data and Processed Data will be stored in the INESC TEC repository for research data (https://rdm.inesctec.pt). The size of the Sensor Data is expected to be ≲ 15 GB/day (< 6 TB in total). The Raw Data are expected to be of the same order of magnitude as the Sensor Data. The size of the Processed Data is expected to be < 10 GB/day (< 4 TB in total).
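The totals quoted above follow from the daily rates multiplied by the campaign length. The quick check below assumes a campaign window from 1 January 2020 to 31 January 2021, both inclusive, as an upper bound on the stage-ii dates. It also assumes decimal units (1 TB = 1000 GB). Both are assumptions for illustration.

```python
from datetime import date

def campaign_days(start: date, end: date) -> int:
    """Number of calendar days from start to end, both inclusive."""
    return (end - start).days + 1

def total_tb(gb_per_day: float, days: int) -> float:
    """Total volume in decimal terabytes (1 TB = 1000 GB)."""
    return gb_per_day * days / 1000.0

# Assumed upper bound on the campaign window (2020 is a leap year).
days = campaign_days(date(2020, 1, 1), date(2021, 1, 31))  # 397 days
sensor_tb = total_tb(15, days)     # Sensor Data at ~15 GB/day: about 5.96 TB
processed_tb = total_tb(10, days)  # Processed Data at <10 GB/day: about 3.97 TB
```

Both results stay below the stated bounds of 6 TB and 4 TB, so the totals are consistent with a campaign of roughly 13 months.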
The organization structure of the Sensor Data is summarized in Table 2. The structure of the Raw Data and of the Processed Data will be detailed in a further version of the DMP.
Security procedures will apply to all laptops and computers used during the project. All equipment will be password-protected, and all software will be licensed and regularly updated to keep security up to date. Each project member is responsible for handling technical issues with the software they use and, if support is needed, will contact the IT staff of the responsible entity.

Data selection and preservation
The data collected during the SAIL monitoring campaign are unique and thus very valuable to other researchers and for educational purposes. All the campaign data will be preserved to enable initially unforeseen uses and to guarantee fully documented and reproducible data from the project. This will allow the data to be reused in multiple environmental domains and applications.
The Sensor Data, Raw Data and Processed Data will be preserved in the INESC TEC institutional repository (https://rdm.inesctec.pt) for at least 5 years after project completion (at least until 2027). The procedures that transform Sensor Data into Raw Data and Raw Data into Processed Data will be preserved in the same repository as computational (Jupyter) notebooks.
The project PI will be responsible for any action related to long-term preservation, in accordance with the repository's guarantees. The possibility of long-term preservation of the same data in another repository will be analyzed during the project. If necessary, the corresponding information will be added in a later version of the DMP.
Each preserved dataset of the project will be assigned a Digital Object Identifier (DOI) by the INESC TEC research data management repository.

Data sharing
Access to the Sensor Data is restricted to the PI and co-workers designated by the PI.
The Raw Data will not be publicly available, but will be shared upon specific request to the PI after a 1-year embargo from the end of the project. The procedures that transform Sensor Data into Raw Data will also be available upon request.
Raw Data acquired by the radiation sensor SN005 will be shared with the University of Bristol, according to the Memorandum of Understanding dated 2019/12/19.
The Processed Data will be publicly available at https://rdm.inesctec.pt after a 1-year embargo following the end of the project. They will be released under the Creative Commons Attribution-ShareAlike 4.0 license (https://creativecommons.org/licenses/by-sa/4.0/), which allows users to:
-Share: copy and redistribute the material in any medium or format;
-Adapt: remix, transform, and build upon the material for any purpose, even commercially.
Under the following terms:
-Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
-ShareAlike: If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
-No additional restrictions: You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.
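The access rules in this section depend on both the data tier and the date, so they can be written down as a small decision function. The sketch below encodes them under stated assumptions. The project end date is passed in by the caller, and September 30th 2023 (the end of stage iii) is only an assumed example. The tier names and the function names are hypothetical. The separate SN005 arrangement with the University of Bristol is not modelled.

```python
from datetime import date
from enum import Enum

class Access(Enum):
    RESTRICTED = "PI and co-workers designated by the PI only"
    EMBARGOED = "under embargo"
    ON_REQUEST = "shared upon specific request to the PI"
    PUBLIC = "public at https://rdm.inesctec.pt under CC BY-SA 4.0"

def embargo_end(project_end: date, years: int = 1) -> date:
    """Same calendar day `years` later (29 February falls back to 28 February)."""
    try:
        return project_end.replace(year=project_end.year + years)
    except ValueError:
        return project_end.replace(year=project_end.year + years, day=28)

def access_level(tier: str, on: date, project_end: date) -> Access:
    """Access level for a data tier ('sensor', 'raw', 'processed') on a given date."""
    if tier == "sensor":
        return Access.RESTRICTED
    if tier not in ("raw", "processed"):
        raise ValueError(f"unknown data tier: {tier!r}")
    if on < embargo_end(project_end):
        return Access.EMBARGOED
    return Access.ON_REQUEST if tier == "raw" else Access.PUBLIC
```

With the assumed end date, Processed Data would become public on September 30th 2024. From that date, Raw Data would be available on request, and Sensor Data would stay restricted throughout.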

Resources
Implementing the data management plan requires access to the INESC TEC research data repository (https://rdm.inesctec.pt/) for data deposit, and to the Project SAIL community on ZENODO (https://zenodo.org/communities/sail/?page=1&size=20).
Moreover, the following assets will be used during the project:
-Hardware/devices: the on-board main computer and data storage, and laptop computers for data acquisition and processing during the mission;
-Software: all software will be open source, running on the Linux operating system;
-Cloud services: https://drive.inesctec.pt, with access granted to one responsible person from each participating institution, according to point 2.6.

Ethics and legal compliance
The SAIL project complies with all existing requirements related to Research Data Management and the Protection of Personal Data. The various types of data collected during the SAIL project are not subject to specific legal requirements and will be shared with others after a 1-year embargo following the conclusion of the project. No dataset containing confidential information or raising ethical or legal issues will be deposited.
All data collected during the SAIL project are a direct result of the project. Copyright and IPR therefore belong to the SAIL members indicated in point 3. Data should be reused according to the established license and cited using the dataset citation.