Artificial Intelligence Transparency: Open or Closed to the Public Benefit

As I look back on my childhood, I remember the excitement of my parents bringing home our first home computer, which connected to our TV screen. That machine sparked my interest in technology, and I spent countless hours “playing scientist,” teaching myself to code and even publishing simple programs in computer science magazines. Those early experiences not only deepened my passion for technology but also instilled in me a belief in the importance of open science.

The Principles of Open Science

Open science refers to the practice of making all stages of the scientific process transparent and accessible to others. This includes publishing research articles together with their data, detailed methods, theoretical and practical foundations, experiments, and any information or tools needed to replicate the research. The objectives of open science are to enable reproducibility, promote collaboration, and make it easier to build on previous knowledge and advance our understanding. This is essential for scientific research to be credible, ethical, and accessible, so that it can be reviewed, validated, and developed further.

What Happens to AI?

In the field of artificial intelligence (AI), open science is the only way to guarantee reproducibility and transparency, and thus progress and use that remain public, collaborative, and cumulative for the benefit of humanity. Most researchers in computer science believe in publishing their advances according to these principles, with open source being a crucial element, although not the only one, of any computing tool that aims to encourage scientific advancement. Specialists have created non-profit organizations to define what research and development in their field consist of, such as the Open Source Initiative (OSI), founded in 1998, which maintains the most widely accepted international definition of open source.

For a program to be considered open source, it is not enough to provide access to the compiled program; the entire source code must be available. Source code, written in a high-level language that humans can read, must be accessible so that anyone can read, understand, modify, and redistribute it under the same terms for all uses, including commercial ones.
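To make the distinction between source code and a compiled program concrete, consider the following minimal sketch (a hypothetical illustration in Python, not taken from any of the projects discussed here). The readable function can be inspected and modified by anyone; the compiled form that the machine actually executes is opaque to a human reader.

    # Hypothetical example: human-readable source code that anyone can
    # inspect, modify, and redistribute.
    def greet(name):
        return "Hello, " + name + "!"

    # Running the program is like using a distributed binary: it works,
    # but tells you nothing about how it is built.
    print(greet("Ada"))

    # The compiled form of the same logic: raw bytecode bytes that cannot
    # be meaningfully read, understood, or edited by a person.
    print(greet.__code__.co_code)

Distributing only something like that last output is what the article means by shipping a binary: it can be executed, but it cannot be studied, corrected, or improved.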

The Case of Technology Companies

Many technology companies create wealth and benefit society, but they invest in research only if they believe they will recover their investment. It is common for private tech companies to leverage public research (funded by taxpayers) to develop products from which they derive significant profits. The case of Apple’s iPhone, as described by economist Mariana Mazzucato, is a paradigmatic example. With companies dedicated to AI, this reality is even more pronounced. While it is natural for them to base their products on ideas and research produced elsewhere, many of the most advanced AI models are essentially impenetrable black boxes: their internal logic, operation, and fairness are neither explained nor guaranteed, and their source code is not available for analysis.

DeepSeek and the Issue of Open Source

Some recent AI models, such as DeepSeek, try to stand out from the competition by making their compiled code available. However, this does not qualify as open source and does not contribute to the advancement of scientific research. DeepSeek, despite being announced as “open source,” does not allow access to its source code, only to the binary (compiled) version, which cannot be read, understood, or modified. This limitation means that no one can improve the program, and it can only be used as a client of the company, not as a tool for scientific research.

The Example of Rosetta and AlphaFold 3

The story of David Baker, Demis Hassabis, and John M. Jumper, who received the 2024 Nobel Prize in Chemistry for computational protein design and protein structure prediction, highlights the power of open science. The Rosetta software, born in the late 20th century as a small project in David Baker’s laboratory at the University of Washington, was distributed with its source code in a high-level language, allowing specialists to read, understand, and modify it. Google’s DeepMind built AlphaFold and AlphaFold 2, powerful AI-based statistical analyses of data, on these open ideas and on published protein databases. However, when DeepMind presented AlphaFold 3, it surprisingly stated that the software’s code would not be made available, despite the editorial policy of the journal Nature, which requires authors to make materials, data, code, and associated protocols promptly available to readers without undue restrictions.

AlphaFold Is Not Open Source Either

The article on AlphaFold 3 did not comply with the scientific community’s norms of being usable, scalable, and transparent, prompting more than a thousand members of the scientific community to sign a letter to Nature. Months later, DeepMind made the code available under a restrictive Creative Commons license, which does not meet the Open Source Initiative’s definition of open source. DeepMind does not publish the model’s weights (the result of training its neural network), and their use for commercial activities, including training similar biomolecular models, is explicitly prohibited. This approach attempts to balance scientific and commercial interests, but it clearly does not represent an open science process, which is crucial for the advancement of knowledge that belongs to all humanity.
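For readers unfamiliar with the term, a model’s “weights” are simply the numerical parameters learned during training. The minimal sketch below (a hypothetical illustration using the NumPy library, not DeepMind’s code or file format) shows why publishing them matters: anyone who receives the weight file can reload the exact same parameters and reuse the model without repeating the costly training run.

    import numpy as np

    # Hypothetical stand-in for the parameters learned during training.
    rng = np.random.default_rng(0)
    weights = rng.normal(size=(3, 2))

    # Publishing the weights amounts to sharing this file; whoever receives
    # it can reload the same parameters and reuse or fine-tune the model
    # without access to the original training data or computing resources.
    np.save("model_weights.npy", weights)
    restored = np.load("model_weights.npy")
    assert np.array_equal(weights, restored)

Withholding the weights therefore means that, even with the code in hand, others cannot reproduce the published results without retraining the model themselves.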
