Beyond Keywords: Improving How We Explore Scientific Knowledge
Accessing scientific knowledge efficiently remains a major challenge as research output continues to grow. In this blog, BRT PhD student Antoine Bercy from the Czech University of Life Sciences Prague (CZU) introduces a scientific database he developed as part of his doctoral research. Designed to improve how literature can be explored and filtered, the database demonstrates how structured data, clearer classification, and interactive tools can help researchers navigate complex scientific information more effectively, particularly in the field of anaerobic digestion and biogas technologies. Enjoy reading!
Recently, I had the honour of representing the Department of Food and BioResource Technology at Faculty Day at the Faculty of Tropical AgriSciences (FTZ) of the Czech University of Life Sciences Prague (CZU). The presentation outlined the progress of my PhD thesis, which focuses on modernising scientific publication, particularly in the areas of data collection and sharing, under the title: “Implementation of an Innovative Approach for Data Search and Reporting with Special Focus on Anaerobic Digestion.”
This work is essential if we are to develop the technologies needed to address the climate urgency we are facing. Currently, access to scientific material is unevenly distributed worldwide. Both publishing and accessing articles are expensive and complex, reducing researchers’ ability to share their work and to find the crucial information they need. This occurs within a system already under strain due to the exponential growth of scientific output, which is pushing an ageing publication model to its limits. Commercial and competitive pressures have also led to abuses, contributing to what is now referred to as the “reproducibility crisis.” The term reflects a broad acknowledgement among scientists that many published results are hard to reproduce, supported by a large-scale study in which 70% of scientists reported being unable to reproduce a peer’s experiment. Such findings significantly weaken the credibility of science, particularly at a time when scepticism is increasing. To address this, the OSIRIS project seeks to identify the underlying issues, drive a paradigm shift, and propose solutions. My work aligns with this objective by tackling the confusion surrounding data reporting and retrieval.
The goal of my work is to identify the precise challenges related to data collection and sharing and to provide a foundation for standardisation and simplification. At present, scientific information is mainly accessed through expensive search engines that rely heavily on keyword-based searches. This makes it difficult to comprehensively identify existing knowledge and introduces uncertainty into results. For example, when searching for species related to horses for comparison, there is no obvious way to arrive at a term such as “odd-toed ungulate,” which few people would think to type into a search box. Similarly, in biogas research, how can one find an exhaustive list of reactor types to determine which best fits a specific application? Searching for “reactor” alone yields hundreds of articles, resulting in an inefficient and time-consuming process. We remain limited by an ageing system based on PDF documents and primitive search tools. This is further complicated by inconsistent terminology, where the same or closely related concepts may be described using different terms (e.g., climate change, global warming, greenhouse gases). As a result, important work can be overlooked, recent advancements are difficult to identify, and reviews are rarely exhaustive. The lack of standardisation in terminology and methodology also restricts effective comparison, discussion, and progress.
The aim is therefore to attempt a truly exhaustive review of biogas technology in order to draw lessons about the current state of scientific publication. This is followed by the development of guidelines to improve publication standards and the proposal of modernisation options supported by real-world testing.
During the first phase, reviewing the literature and identifying problems, the PRISMA protocol was adapted using a database screening and sorting approach. This involved analysing thousands of articles, making the process extremely time-consuming, tedious, and therefore prone to error. Despite efforts to remain as broad as possible, the review was still not exhaustive, and significant amounts of relevant information were not identified.
A major limitation lies in the reliance on keywords, which are used inconsistently across the literature. For instance, studies may refer to specific reactor types such as CSTR or UASB without ever using the broader term “reactor.” Terminology can also be confusing: identical terms may describe different concepts, while different terms may refer to the same approach, sometimes chosen deliberately as attention-grabbing keywords. Extracting key information at scale is tedious and complex, as articles do not always clearly detail their methodology, and relevant data may be buried within the text. Classification remains subjective, requiring the creation of categories that struggle to accommodate edge cases. These challenges are compounded by the limitations of traditional paper-based publications, where large and complex datasets are poorly represented. Based on these observations, the use of broader and more consistent terminology is recommended, along with the structured presentation of methods and results in table formats to improve clarity, accuracy, and accessibility for both reviewers and readers.
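The keyword problem described above can be made concrete with a small sketch. This is only an illustration, not the method used in the review: the article titles are invented, and the `VOCABULARY` mapping and both function names are hypothetical. It shows a literal search for “reactor” missing studies that only name a reactor type, and a search expanded through a controlled vocabulary recovering them.

```python
# Hypothetical mini-corpus: these titles are invented for illustration only.
ARTICLES = [
    "Start-up of a UASB treating brewery wastewater",
    "Mixing intensity in a CSTR fed with maize silage",
    "Reactor design for dry anaerobic digestion",
]

# Hypothetical controlled vocabulary: broad term -> narrower terms.
VOCABULARY = {
    "reactor": ["CSTR", "UASB"],
}


def keyword_search(term, articles):
    """Return the titles that literally contain the term (case-insensitive)."""
    needle = term.lower()
    return [title for title in articles if needle in title.lower()]


def expanded_search(term, articles, vocabulary):
    """Search for the term and for every narrower term listed under it."""
    terms = [term] + vocabulary.get(term.lower(), [])
    hits = []
    for title in articles:
        lowered = title.lower()
        if any(t.lower() in lowered for t in terms):
            hits.append(title)
    return hits


literal_hits = keyword_search("reactor", ARTICLES)
expanded_hits = expanded_search("reactor", ARTICLES, VOCABULARY)
print(len(literal_hits), "literal hit(s),", len(expanded_hits), "expanded hit(s)")
```

In this toy corpus the literal search finds only the one title that contains the word “reactor”, while the expanded search returns all three. The catch, of course, is that someone has to build and maintain the vocabulary, which is exactly where consistent terminology in publications would help.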
Beyond these limitations, classical scientific papers are static and cannot be updated over time, restricting collaboration and the progressive refinement of knowledge. Consequently, publications are often already partially outdated by the time they become available.
Nevertheless, this process made it possible to demonstrate modern solutions for presenting and exploring data. Power BI was used as a proof of concept to show that large and complex datasets can be represented in a user-friendly and interactive way, although a dedicated open-source solution would ultimately be required. In parallel, the development of a structured article database illustrates how traditional keyword-based search engines could be replaced by clear classification and filtering systems, allowing researchers to access relevant literature without relying on uncertain keyword selection. To demonstrate this concept, we developed a scientific database in which the articles from this research can be easily filtered according to their respective research fields.
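To give a feel for what filtering by classification means in practice, here is a minimal sketch. It does not reflect the actual schema of our database: the `Article` fields, the example records, and the helper names are all assumptions made up for illustration. The idea is simply that each article carries explicit category values, so a reader picks from known options instead of guessing keywords.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    """Hypothetical record layout; the real database may use other fields."""
    title: str
    research_field: str
    reactor_type: str
    year: int


# Invented placeholder records, not real publications.
CATALOGUE = [
    Article("Example study A", "Anaerobic digestion", "CSTR", 2019),
    Article("Example study B", "Anaerobic digestion", "UASB", 2021),
    Article("Example study C", "Biogas upgrading", "Not applicable", 2022),
]


def filter_articles(articles, **criteria):
    """Keep the articles whose fields equal every given criterion."""
    return [
        article
        for article in articles
        if all(getattr(article, name) == value for name, value in criteria.items())
    ]


def facet_values(articles, field_name):
    """List the distinct values of one field, i.e. the options a filter menu would offer."""
    return sorted({getattr(article, field_name) for article in articles})


print(facet_values(CATALOGUE, "reactor_type"))
selected = filter_articles(
    CATALOGUE, research_field="Anaerobic digestion", reactor_type="UASB"
)
print([article.title for article in selected])
```

Because the available values are listed up front by `facet_values`, the reader sees every reactor type in the catalogue without needing to know the terms in advance, which is the main advantage over a free-text search box.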
The next step is to work on automating filtering and data gathering, enabling researchers to find exactly what they need quickly, without long and tedious review processes. In the future, we hope this concept can expand to support the creation of “living papers” that can be updated over time as knowledge advances, promoting cooperation and allowing missing information to be added. This would help reduce the heavy writing and publishing pressure placed on researchers. Such progress will only be possible through strong cooperation. Ultimately, modernising scientific publication is not only a technical challenge, but a necessary evolution to ensure that knowledge remains accessible, reliable, and capable of responding effectively to the urgent global challenges we face.
Explore the scientific database: HERE
For more details on BRT activities, subscribe to our newsletter or follow us on social media for regular updates and highlights!