Abstract

An issue that can limit the long-term value of information published in peer-reviewed engineering publications is the inability of readers to readily access data contained within a publication. This paper discusses experiences in changing the expectations for data sharing by authors in a large, disciplinary engineering journal, the AIChE Journal, in ways that seek to balance the burdens on authors and the benefits to readers.

Keywords

Publication ethics

Data reporting

Introduction

Reliability and reproducibility of data are critical to the value of quantitative engineering research (National Academies of Sciences 2019). In chemical engineering, for example, the concept of process design is rooted in the idea that physical properties of the chemicals and materials being processed are well-defined, repeatable, and can be systematically collected. From this admittedly abstract perspective, one of the key contributions of a high-quality peer-reviewed engineering publication is that it provides data that advances a field in some way. The accessibility of data has always been important, but this situation has intensified with the widespread use of data-driven methods based on machine learning (ML). Careful curation of data to train ML models can be a significant bottleneck in using these methods, and improved accessibility of data could address this situation.

For scientific data to be of the greatest use to future researchers it should have several characteristics: (i) it should be clear to an expert reader how the data was generated and could be reproduced with independent calculations or experiments, (ii) the data should be presented within the context of prior work, and (iii) the data should be readily available to allow future use, refinement, and comparison. These characteristics are codified more formally in the FAIR Guiding Principles, which are intended to improve the Findability, Accessibility, Interoperability, and Reuse of digital assets (https://go-fair.org/fair-principles). AIChE, the world's largest organization for chemical engineering professionals, has defined handling of data in professional publications as an ethical responsibility, noting that “authors are obligated to present an accurate account of research” and “a primary research report should contain sufficient detail and references to permit peers to repeat the work” (https://www.aiche.org/resources/publications/ethical-guidelines).

One way to improve the overall quality of data in scientific publications is to develop better habits in individual researchers. Resources with this goal in mind are available that focus on steps that are easy to take (Sholl, 2019; O'Connell et al., 2022). The highly dispersed nature of research training and the pressures associated with career success in competitive fields, however, mean that appealing to the good nature of individuals can only be partially successful. As the primary avenue through which research progress is reported, peer-reviewed journals bear a significant responsibility in consciously or implicitly defining community standards for data quality.

Scientific publishing has changed dramatically in the last 25 years. Around the turn of the century it was common for technical journals to have strict length limits for authors and there was essentially no way for authors to share data other than within a printed manuscript. Technical journals now allow authors to make routine use of Supplementary Information that is available to reviewers and ultimately to readers. Options to readily share data and related materials via means that are independent of a specific journal publisher (e.g., GitHub) are also now common. It is important to note, however, that the content of supplementary files may not be as carefully scrutinized by reviewers and held to the same standards as the primary manuscript, even though these files are available to reviewers. Related issues associated with potential overuse of supplementary materials have been discussed by Pop and Salzberg (2015).

This paper describes the recent evolution of data sharing expectations in the AIChE Journal from the author's perspective as the Editor-in-Chief of this journal. The journal publishes around 300 papers per year and the editorial office handles more than 1000 submissions annually. As with any high-quality technical journal, all manuscripts are subject to peer review before publication, and the journal's reviewers have high standards regarding what constitutes a substantial contribution to the discipline's literature. The journal deliberately includes papers from a broad spectrum of subdisciplines in chemical engineering, including process systems engineering, catalysis and reaction engineering, particle technology, soft matter, bioengineering, and fluid mechanics. This disciplinary diversity means that any blanket policies regarding data sharing needed to reflect differing norms and needs among researchers. The size and subject matter diversity of the journal indicate that any successes that have been accomplished in terms of improving data quality could also be achieved with many other similar journals.

AIChE Journal's 2023 data sharing requirements

The recent updates in the journal's data sharing requirements were motivated by asking a simple question: how easy was it for readers to obtain quantitative numerical values for data that was reported in published papers in the journal? Here, the term “data” was intended to include results from both experimental and computational studies. Almost all published chemical engineering papers include one or more figures plotting data of one kind or another. In principle, readers can reconstruct this data by digitizing figures, but this approach is tedious and tends to introduce uncertainties (Cai et al., 2021). Prior to 2023, the AIChE Journal required authors to note in their manuscript that data supporting their work was available from the corresponding author upon request. It is easy to imagine how requesting data via this mechanism might fail to yield a timely and positive outcome, even among authors with the best of intentions. A recent preprint described an attempt to obtain data from 52 psychology papers published in prominent disciplinary journals in the last 5 years (Hussey, 2023). Among the papers that included a statement of “data upon request”, only 17% of the authors actually shared data when requested. Interestingly, this low level of success was similar to that found almost 20 years ago in the same field (Wicherts et al., 2006). The success rate in obtaining data would presumably be even lower if papers that were older than 5 years were examined, as authors move institutions, retire, or otherwise become unavailable. Any reader who is over the age of 40 may wish to perform the thought experiment of being asked to provide data from a paper they published 15 or more years ago.

For years, authors with the AIChE Journal have been able to include Supplementary Material that is available online to readers. To understand if these materials were being used to make numerical values of the main data from a paper accessible, all papers published in the September 2022 issue of the journal were examined. This issue is representative of the journal, including 33 original research articles and 1 review article. Of the 34 papers, 28 included Supplementary Material, but only one paper tabulated the data from the main figures in the Supplementary Material. The Supplementary Materials varied from relatively short documents to one example that included more than 1500 pages of content for a paper exploring a machine learning application. One paper also included a link to a GitHub repository that is not hosted by the journal in addition to Supplementary Material that is hosted by the journal. This diversity of content illustrates some of the points raised by Pop and Salzberg (2015).

The inescapable conclusion from the discussion above is that, for the great majority of papers published in the AIChE Journal prior to 2023, the numerical data underlying the figures associated with the main outcomes of a paper were not readily available to readers. It seems highly likely that this situation also applies to other similar chemical engineering and chemistry journals. This observation raised the question of what, if anything, could be done to improve this situation. There are a small number of journals that already impose stringent data quality standards upon authors. The Journal of Chemical and Engineering Data, for example, has strict guidelines on tabulating and reporting data. The Journal of Open Source Software requires authors to make their software available and moreover expects reviewers to run and test the reported software. For a journal such as the AIChE Journal that includes contributions from a diverse range of sub-fields, it was necessary to develop an approach that did not place undue burdens on authors, reviewers, and editors.

Motivated by the issues outlined above, the Editorial Board of the AIChE Journal adopted a new set of data sharing requirements in March 2023 (Sholl, 2023). These requirements have two primary elements. First, authors are now required to use Supplementary Materials to tabulate the numerical data from figures in their manuscript, along with any other information the authors deem relevant. Second, authors are required to include a section in their manuscript titled “Data Availability and Reproducibility Statement” that summarizes what data from the manuscript's figures are tabulated in the Supplementary Materials and also describes steps that were taken in the work to ensure reproducibility of data. The requirement to explicitly include this section in the text of the manuscript is intended to remind reviewers that comments on the quality and availability of data from the work are encouraged as part of the manuscript's review. As noted in the full description of these requirements, “manuscripts that do not include these two elements will be rejected without review” (Sholl, 2023).
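As an illustration of the first element, tabulating the data behind a figure can be as simple as writing the plotted values to a plain-text file. The Python sketch below is one possible way to do this and is not part of the journal's requirements; the function name, column names, and numerical values are hypothetical placeholders.

```python
import csv
import io


def tabulate_figure_data(columns, rows, handle):
    """Write the numerical values plotted in one figure as CSV text.

    columns: column headers, ideally including units (hypothetical names).
    rows: iterable of tuples, one per plotted point.
    handle: any writable text file object.
    """
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        if len(row) != len(columns):
            raise ValueError("each row must have one value per column")
        writer.writerow(row)


# Placeholder values for a hypothetical figure; not data from any paper.
columns = ["temperature_K", "conversion_fraction", "error_bar"]
rows = [(300.0, 0.12, 0.01), (350.0, 0.27, 0.02), (400.0, 0.49, 0.02)]

buffer = io.StringIO()
tabulate_figure_data(columns, rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

In practice the handle would more likely be a file opened with open(path, "w", newline=""), with one such file per figure uploaded as Supplementary Material or deposited in a persistent repository.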

These data sharing requirements were designed to be flexible enough to handle the full range of papers published by the journal. They note that some figures may not contain numerical data (e.g., microscopy images, process schematics) and provide flexibility in reporting extensive data sets via files included in the journal's Supplementary Materials or in persistent third-party online repositories. Because the requirement of a “Data Availability and Reproducibility Statement” will be new to most authors, an example of such a statement for a hypothetical paper was provided (Sholl, 2023). This example was constructed to show that a well-constructed statement should include information on where readers can access numerical data from the paper, details of software packages used in calculations (including access to input files), definitions of how error bars were assigned to reported data, and tests that were performed to validate calculations or measurements prior to generating the new results shown in the paper. The latter point is relevant because systematic studies of materials synthesis in materials chemistry have hinted that it is common for researchers to begin their work by repeating experiments reported from prior studies but that the results of these repeat experiments are often not reported (Agrawal et al., 2019). Since replication of previous studies contains useful information, this lack of reporting means that valuable opportunities to establish the reproducibility of published data are missed (Park et al., 2017).
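One item in such a statement is a definition of how error bars were assigned. The Python sketch below shows one common convention, assuming error bars are derived from replicate measurements as either the sample standard deviation or the standard error of the mean. This is an illustrative choice rather than a convention prescribed by the journal, and the function name and replicate values are hypothetical.

```python
import math
import statistics


def summarize_replicates(values):
    """Return (mean, sample standard deviation, standard error) for replicates."""
    if len(values) < 2:
        raise ValueError("at least two replicate measurements are needed")
    mean = statistics.mean(values)
    std_dev = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    std_err = std_dev / math.sqrt(len(values))  # standard error of the mean
    return mean, std_dev, std_err


# Placeholder replicate values; not data from any paper.
replicates = [1.0, 2.0, 3.0]
mean, std_dev, std_err = summarize_replicates(replicates)
print(f"mean = {mean:.3f}, s = {std_dev:.3f}, SE = {std_err:.3f}, n = {len(replicates)}")
```

Whichever convention is used, stating it explicitly along with the number of replicates lets readers interpret the tabulated uncertainties without guessing.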

The discussion above has considered the roles of authors and editors in the scientific publishing process. Another vital group of contributors to high quality journals are reviewers, who are financially uncompensated and anonymous. The increased use of Supplementary Materials by authors comes with an increase in demands for action by reviewers. Although journals can encourage reviewers to comment on Supplementary Materials, there is little that can be done to ensure that reviewers take time to examine these materials in detail. The data sharing standards described above were developed in part to achieve a compromise in which more information is shared by authors in their publications but the net demands on reviewer time and attention are not changed in a substantial way.

Discussion

One aim of the data sharing requirements outlined above was to not create substantial burdens for authors preparing and submitting manuscripts. In the period since the requirements were introduced the journal's Editorial office has handled hundreds of new submissions. A very significant majority of these submissions were initially returned to the authors because of non-compliance with the new standards, but almost all of these submissions were updated by authors and resubmitted within a few days. This observation suggests that the burdens imposed on authors (and the Editorial office) by these requirements are indeed minor. A small fraction of manuscripts have been rejected without further review because authors were unwilling to accommodate the data sharing requirements after one or more cycles of correspondence with the Editorial office. One of the qualities in a group of authors that leads to high quality publications is a willingness to engage with editors and reviewers in an open way with the goal of improving a manuscript. It seems likely that an unwillingness to modify a manuscript to meet data sharing requirements is correlated with a similar attitude towards possible future changes that would be requested by reviewers. If this surmise is correct, early rejection of these manuscripts without use of reviewers’ time is a net positive for a journal.

From late 2023 the new “Data Availability and Reproducibility Statement” section has become a standard part of all papers in the AIChE Journal, so interested readers will be able to judge for themselves what level of information is being included by authors in their papers. In some cases, this section contains relatively minimal information about how to access the numerical data associated with figures in the paper (Liu et al., 2023). Examples are also available, however, where the authors additionally provide information on calibration and testing of experimental methods (Li et al., 2023) or details about replicate measurements and the reporting of error bars associated with these measurements (Xiang et al., 2023).

How will we judge whether the changes described above have been successful? In the short term, it is hoped that the AIChE Journal’s data sharing requirements will bolster the journal's reputation among readers and authors as a source for high quality research with long term value. One benefit this can bring to the journal is to make it easier to find reviewers who will agree to review requests, since overburdened reviewers can hardly be expected to be enthusiastic about devoting their time to reading a manuscript that they suspect will not be valuable to their field. In the longer term, it may be hoped that the publishers and editors of other journals within the discipline adopt similar data sharing standards and use AIChE Journal’s experience as a catalyst to consider what steps they can take to enhance the quality of data in their publications. If appropriately elevated standards of data sharing are adopted by the majority of journals in a discipline then these actions would define new norms of behavior for researchers in ways that cannot be accomplished by individual research institutions or appeals to individual researchers.

Although transparent data sharing improves the quality of published engineering research, this approach of course cannot resolve every possible complication associated with the net value of published papers to scientific progress. For example, experimental data could be strongly affected by the presence of a contaminant whose existence was unknown at the time of the study. Methods of data analysis, even for well-known experimental methods, might be influenced by choices made by users that are less uniform than might be assumed (Osterrieth et al., 2022). Many papers use empirical data as a springboard to mechanistic explanations (“Quantity X increased as temperature increased, which occurs because…”), but physically correct data does not imply that the resulting explanations are correct. Although data sharing cannot resolve every possible issue of this kind, it seems reasonable to expect that transparent data sharing can allow the nominally self-correcting nature of scientific research to function more effectively over time.

Concluding remarks

The experiences discussed in this paper suggest several next steps for different kinds of readers. Editors and editorial advisory boards of other technical journals should discuss how these experiences relate to their own journal and its future operations. The exercise of assessing the data availability from a representative recent journal issue and considering the topic of data quality from an ethical perspective (National Academies of Sciences 2023) is highly recommended for these groups. Leaders of large research centers could consider adopting data sharing requirements similar to those discussed above for all members of their center. Academic departments or similar organizations that provide formal training to students in research methods could consider including discussion of these issues in this training (for example, by including this paper as reading material to prompt discussion). Finally, individual investigators may decide to adopt minimum data sharing standards for all publications from their research group, regardless of the venue in which they are publishing their work. There is no reason to expect that all of these categories of readers will decide that the specific approach to data sharing described in this paper is optimal; indeed, it is hoped that improved approaches will be introduced and widely adopted by others in the future.

Declaration of competing interest

The author has no conflicts of interest to declare.

Acknowledgments

The author gratefully acknowledges the work of the AIChE Journal's Associate Editors and Contributing Editors in shaping the data sharing requirements discussed in this paper, and especially Hilary Price, who has corresponded with hundreds of authors in helping them prepare manuscripts that meet these requirements.

Submitted to Special Issue on Ethics and Responsible Technology, Digital Chemical Engineering.

Notice of Copyright: This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05–00OR22725 with the US Department of Energy (DOE). The publisher acknowledges the US government license to provide public access under the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

© 2024 The Author. Published by Elsevier Ltd on behalf of Institution of Chemical Engineers (IChemE).