High-quality labeled data has increasingly become a bottleneck for AI development, especially in fusion research: the US Department of Energy reports that fusion AI researchers often spend 70% of their time collating data. To address this pain point, Startorus Fusion and Tsinghua University recently launched toklabel, an open-source fusion data labeling platform built on Label Studio. The platform aims to substantially improve the efficiency of labeling fusion data, promote standardization and collaboration in fusion data labeling, and accelerate the application of AI in fusion research so that fusion energy can be realized sooner.
toklabel is released under the Apache 2.0 license and is free to use. Feedback and code contributions are welcome.
High-Quality Labeled Data Accelerates AI-Enabled Fusion
Fusion energy development is entering an era of AI-driven change, but the lack of high-quality labeled data has been a key bottleneck limiting the performance of AI models. Although many labeling tools exist for text, images and video, most data labeling in the fusion field is still at the “slash-and-burn” stage, that is, a very early and rudimentary stage. General-purpose labeling tools tend to offer limited functionality and extensibility, and they do not provide a convenient, unified way to manage multi-modal data (such as time series and images) organized by “shot”, the unit of data that is unique to fusion experiments.
To meet these pressing needs of the fusion field, toklabel addresses the following problems:
Multi-modal support: labeling of one-dimensional data (time series) and two-dimensional data (high-speed camera images, etc.) covers the diverse needs of fusion diagnostic and sensing data.
Efficient storage design: PostgreSQL and Redis together provide structured storage and fast retrieval of labeling results, which simplifies subsequent model training.
Intelligent assisted labeling: parametric labeling and AI pre-labeling (for example, with a Time-Series Transformer) are integrated to significantly improve labeling efficiency.
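The source does not say how parametric labeling is configured. As a minimal sketch, assume “parametric” means a rule whose parameters (here a hypothetical `threshold` and `min_duration`) are tuned by the annotator, and that the rule marks every span where a signal stays at or above the threshold. The names `parametric_label` and `Interval` are illustrative only and are not part of toklabel's API.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Interval:
    """A labeled time span, in the same units as the input time axis."""
    start: float
    end: float
    label: str


def parametric_label(
    times: Sequence[float],
    values: Sequence[float],
    threshold: float,
    min_duration: float,
    label: str,
) -> List[Interval]:
    """Label each contiguous run of samples with value >= threshold.

    Runs whose duration (last sample time minus first sample time) is shorter
    than min_duration are discarded. Both parameters are hypothetical knobs an
    annotator would tune; they are not toklabel settings.
    """
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")

    intervals: List[Interval] = []
    start: Optional[float] = None  # time of the first sample in the current run
    last: Optional[float] = None   # time of the latest sample in the current run

    for t, v in zip(times, values):
        if v >= threshold:
            if start is None:
                start = t
            last = t
        elif start is not None:
            # The run just ended; keep it only if it is long enough.
            if last - start >= min_duration:
                intervals.append(Interval(start, last, label))
            start = None

    # Close a run that lasts until the final sample.
    if start is not None and last - start >= min_duration:
        intervals.append(Interval(start, last, label))
    return intervals


if __name__ == "__main__":
    t = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    signal = [0.0, 5.0, 6.0, 7.0, 0.0, 5.0, 0.0, 0.0]
    print(parametric_label(t, signal, threshold=4.0, min_duration=1.0, label="high"))
    # prints [Interval(start=1.0, end=3.0, label='high')]
```

The single-sample spike at t = 5.0 is dropped because it is shorter than `min_duration`; raising or lowering the two parameters is what would let an annotator relabel a whole batch of shots without drawing each span by hand.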
Practical demonstration of toklabel
Time series labeling: characteristic discharge times
1. Enter a shot number; toklabel automatically generates the data and imports three time series signals into Label Studio.
2. A model automatically predicts three characteristic times (breakdown, disruption and end of discharge) as a reference for manual labeling.
3. The model is then automatically retrained on the results of manual labeling.
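The workflow above can be sketched as follows. toklabel itself uses a trained model (such as a Time-Series Transformer) in step 2; this sketch substitutes a simple heuristic on the plasma current purely for illustration. It assumes a positive plasma current signal `ip`, takes breakdown and end as the first and last samples above a fraction of peak current, and takes the disruption as the steepest current drop when it exceeds a hypothetical `disruption_rate`. The predictions are packaged in Label Studio's pre-annotation format (a task with `data` and a `predictions` list of `timeserieslabels` results); the `from_name`/`to_name` values, label names, shot number and data layout are hypothetical and would have to match the project's labeling configuration.

```python
import json
from typing import Dict, List, Optional, Sequence


def predict_characteristic_times(
    time: Sequence[float],
    ip: Sequence[float],
    frac: float = 0.1,
    disruption_rate: Optional[float] = None,
) -> Dict[str, Optional[float]]:
    """Heuristic stand-in for a trained model (illustrative only).

    Breakdown/End: first/last sample where ip >= frac * max(ip), with 0 < frac <= 1.
    Disruption: time of the steepest current drop, reported only if the drop
    rate is at least disruption_rate; otherwise None.
    """
    if len(time) != len(ip) or len(time) < 2:
        raise ValueError("need matching time and ip arrays with at least 2 samples")
    level = frac * max(ip)
    above = [t for t, i in zip(time, ip) if i >= level]
    # Slope between consecutive samples, tagged with the later sample's time.
    slopes = [
        ((ip[k + 1] - ip[k]) / (time[k + 1] - time[k]), time[k + 1])
        for k in range(len(time) - 1)
    ]
    steepest_slope, t_steepest = min(slopes)
    disrupted = disruption_rate is not None and -steepest_slope >= disruption_rate
    return {
        "Breakdown": above[0],
        "Disruption": t_steepest if disrupted else None,
        "End": above[-1],
    }


def to_label_studio_task(
    shot: int,
    time: Sequence[float],
    ip: Sequence[float],
    times: Dict[str, Optional[float]],
    model_version: str = "heuristic-v0",
) -> dict:
    """Package predicted instants as a Label Studio task with pre-annotations."""
    result: List[dict] = []
    for name, t in times.items():
        if t is None:
            continue  # e.g. no disruption detected
        result.append({
            "from_name": "label",  # hypothetical <TimeSeriesLabels name="label">
            "to_name": "ts",       # hypothetical <TimeSeries name="ts">
            "type": "timeserieslabels",
            "value": {"start": t, "end": t, "instant": True,
                      "timeserieslabels": [name]},
        })
    return {
        "data": {"shot": shot, "ts": {"time": list(time), "ip": list(ip)}},
        "predictions": [{"model_version": model_version, "result": result}],
    }


if __name__ == "__main__":
    t = [float(k) for k in range(10)]
    ip = [0, 0, 20, 80, 100, 100, 20, 15, 12, 0]
    times = predict_characteristic_times(t, ip, disruption_rate=50.0)
    print(times)  # {'Breakdown': 2.0, 'Disruption': 6.0, 'End': 8.0}
    print(json.dumps(to_label_studio_task(12345, t, ip, times), indent=2))
```

Importing tasks like this lets annotators accept or adjust each predicted instant instead of placing it from scratch; the corrected annotations are then the training data for step 3.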
Image labeling: plasma configuration labeling based on visible-light imaging
The outermost closed magnetic surface is labeled according to the plasma configuration parameters.
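One way to turn configuration parameters into a boundary is a standard analytic shape for an up-down symmetric plasma (the Miller-style form): R(θ) = R0 + a cos(θ + arcsin(δ) sin θ), Z(θ) = Z0 + κ a sin θ, with major radius R0, minor radius a, elongation κ and triangularity δ. The source does not say which parameterization toklabel uses; the sketch below assumes this one, and the function name `lcfs_contour` is hypothetical. It stops at the poloidal (R, Z) contour of the last closed flux surface; projecting that contour onto the visible-light camera image requires the diagnostic's calibration, which is not shown.

```python
import math
from typing import List, Tuple


def lcfs_contour(
    r0: float, a: float, kappa: float, delta: float,
    z0: float = 0.0, n: int = 64,
) -> List[Tuple[float, float]]:
    """Return n (R, Z) points on an analytic, up-down symmetric plasma boundary.

    R = R0 + a*cos(theta + asin(delta)*sin(theta))
    Z = Z0 + kappa*a*sin(theta)
    """
    if n < 3:
        raise ValueError("need at least 3 points for a closed polygon")
    if not -1.0 <= delta <= 1.0:
        raise ValueError("triangularity must lie in [-1, 1]")
    x = math.asin(delta)  # places the top of the boundary at R = R0 - delta*a
    points: List[Tuple[float, float]] = []
    for k in range(n):
        theta = 2.0 * math.pi * k / n
        r = r0 + a * math.cos(theta + x * math.sin(theta))
        z = z0 + kappa * a * math.sin(theta)
        points.append((r, z))
    return points


if __name__ == "__main__":
    # Hypothetical shape parameters, in metres.
    pts = lcfs_contour(r0=0.6, a=0.4, kappa=1.8, delta=0.3, n=8)
    for r, z in pts:
        print(f"R={r:.3f} m, Z={z:.3f} m")
```

The resulting points form a closed polygon in the poloidal plane; once mapped to pixel coordinates, such a polygon could serve as a pre-drawn boundary label that the annotator adjusts against the camera frame.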
Building an ecosystem together: Calling for industry collaboration and improving data standards
Open-sourcing toklabel should substantially reduce the time fusion researchers spend collating data and make data labeling more efficient. Fusion researchers around the world can use the tool free of charge, and it supports team collaboration.
Open source is the beginning, not the end. As an initial release, toklabel will continue to evolve. Startorus Fusion looks forward to working with its peers to build a data labeling and management system tailored to the fusion field and to explore the many possibilities of combining AI and fusion.
Visit toklabel on GitHub: https://github.com/STARTORUS/tok-label