Open source has been an important part of the technology landscape for over half a century, although the term as we know it didn’t come into popular use until the 1990s. Today around 100 million developers build open source products, creating software and hardware that ranges from personal hobby projects to Linux, the operating system that runs on billions of devices around the world and the foundation on which much of the internet is built.
More recently, machine learning and AI technology have started to adopt the open source mantra as they grow more popular. Companies such as Hugging Face, Red Hat, Confluent, and Stability AI are built on open source: they build and release software and models publicly for anyone to use, while generating revenue by providing support and services. This is big business. Red Hat maintains and supports a Linux operating system and was acquired by IBM for $34 billion, while Confluent went public at a valuation of around $11 billion. Openness is also paramount: Hugging Face builds and maintains its Hub, where thousands of people and organisations release models and data to democratise access to AI, and Stability AI has released multiple free, state-of-the-art models for image and text generation.
At Open Climate Fix, we have also adopted the open source credo, allowing others to collaborate with us to build the best forecasting service possible. We are a small team, and by working openly we can more easily collaborate with other experts in the field to show what AI-driven forecasting can do. Open source also enables others to build on our models and code for use cases we hadn’t even thought of, which maximises the impact of our climate work. We also implement third-party state-of-the-art models that haven’t been made publicly available, so that all of us can benefit from the rapid progress in artificial intelligence.
One of the most powerful parts of open source is the global scope of the collaboration you can achieve. Our work is already being adapted to predict where to seed clouds in Dubai, to forecast rainfall in Sweden, Germany, Taiwan, and India, and to predict storm evolution. It is also the basis for many bachelor’s and master’s dissertation projects at leading European universities, such as Imperial College London and UCL. From all these students, users, and developers, we receive a steady stream of valuable contributions in the form of bug fixes, enhancements, and in-depth technical discussions about which ideas may work and which might not.
We also freely publish a large amount of data in commonly used formats, suitable for general machine learning and geospatial applications. We have converted most of this data from hard-to-use or proprietary formats into more open formats such as Zarr. Because the data is available in a standardised format, third parties can easily use it for whatever they need.
As part of this open initiative, we have released PV generation data from 25,000 sites across the UK for 2020 and 2021, one of the largest public releases of PV generation data ever made in the country. We have also collated and released on the Hugging Face Hub terabytes of numerical weather data covering Europe and the globe, rainfall radar data for multiple countries, and nearly 15 years of 5-minutely European geostationary satellite imagery. This helps others to reproduce our work or carry out their own forecasting. We have also created a collection of related datasets for benchmarking different approaches to forecasting PV generation.
Because all our software is open source, anyone can use the ready-made pipelines and Python libraries that OCF has built for applying these datasets to machine learning and forecasting.
As a result, the datasets we publish and host have attracted a lot of interest. Our PV dataset has been downloaded hundreds of times and is used by students, interns, and other practitioners in the field. One of our weather datasets was recently used to train models for downscaling global forecasts to higher resolution as part of Code for Earth 2023.
Our rehosted copy of a large rainfall radar dataset from a Google DeepMind generative forecasting paper has been accessed thousands of times, as it lets people use the dataset without paying a download fee to the original source. Our German weather service forecast dataset is also growing, as we publicly archive operational forecasts that would otherwise disappear after 24 hours. These datasets are extremely useful for many different forecasting and nowcasting applications, as they spare users the hassle of wrestling with the data’s difficult native formats. Users also appreciate that, whenever possible, we add new data every week.
There is arguably one disadvantage to adopting an open source model: by definition, open source is not secret. This can make adoption more difficult for companies that want to keep technologies, techniques, or data private to maintain a competitive advantage, or that must comply with licensing restrictions.
At Open Climate Fix we don’t see this as a disadvantage: we actually want people to use our code, because that way more people will be using advanced forecasting approaches, reducing carbon emissions around the world. Red Hat, for example, has built a very successful revenue model on providing services for open source Linux. For firms where secrecy is a concern, the risk can be mitigated by open sourcing only part of a company’s intellectual property rather than everything, which still enables the sharing of useful tools and libraries without exposing the company’s core intellectual property. Another option is for companies that use open source projects internally to contribute any bug fixes or features they add back to the community.
Overall, OCF has been very pleased with the experience of building and sharing our code, data, and models in open source. Engagement with our software and datasets has grown to over a hundred contributors from all over the world, with many more users accessing our data and building amazing things on top of the models and data we have shared.
Going forward, we fully expect to keep seeing positive results in modelling the electricity grid and the climate, especially as our code matures and becomes more powerful, and as our datasets become more complete. In the climate crisis, speed and impact are key, and working with people around the world to improve our products lets us move faster and have a bigger impact than we ever could on our own. The benefit of collectively sharing the lessons we learn, together with the opportunity to inspire more researchers and companies to join us in the mammoth task of tackling climate change, makes it all worthwhile.