Processing Safety Data Sheets (SDS) with AI

3. March 2025

Extracting critical data from Safety Data Sheets (SDS) is vital for companies because it ensures they have accurate, up-to-date information on chemical hazards and safety protocols to protect employees and maintain regulatory compliance. SDS documents come in various formats, are often lengthy, and may be scanned or rotated—making manual extraction error-prone and costly. By automating this extraction process, organizations can streamline risk management and integrate essential safety data into their broader operational and product lifecycle management systems.

Our solution, cbs AID (Advanced Integration of Documents), is specifically tailored for SDS processing. It applies advanced AI techniques in two distinct phases, Document Reading and Document Understanding, and achieves an extraction accuracy of over 99% while reducing both processing time and costs.

Document Reading

In the initial Document Reading phase, each SDS is standardized to prepare it for detailed analysis. cbs AID automatically corrects rotated pages, ensuring that all content is properly aligned. For scanned documents lacking embedded text, cbs AID then uses optical character recognition (OCR) to convert images into machine-readable text. Additionally, small, cost-effective Large Language Models (LLMs) classify which pages contain important information. This pre-classification lets cbs AID run the extraction on the relevant pages only, rather than processing every page.
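The three steps of this phase can be illustrated with a minimal Python sketch. It is not the cbs AID implementation: the `Page` type, the function names, and the keyword list are hypothetical, the OCR engine is passed in as a plain callable, and a simple keyword check stands in for the small LLM that classifies pages.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Page:
    number: int
    text: str          # embedded text; empty for scanned pages
    rotation: int = 0  # clockwise rotation in degrees (0, 90, 180, 270)


def correct_rotation(page: Page) -> Page:
    """Return an upright copy (a real system would also rotate the page image)."""
    return replace(page, rotation=0)


def needs_ocr(page: Page) -> bool:
    """A page without embedded text is treated as a scan."""
    return not page.text.strip()


def keyword_classifier(text: str) -> bool:
    """Hypothetical stand-in for the small LLM that flags important pages."""
    keywords = ("hazard identification", "composition", "ingredients")
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)


def read_document(
    pages: Iterable[Page],
    ocr: Callable[[Page], str],
    is_relevant: Callable[[str], bool] = keyword_classifier,
) -> List[Page]:
    """Standardize every page, then keep only the pages worth extracting from."""
    selected = []
    for page in pages:
        page = correct_rotation(page)
        if needs_ocr(page):
            page = replace(page, text=ocr(page))
        if is_relevant(page.text):
            selected.append(page)
    return selected


pages = [
    Page(1, "SECTION 2: Hazard identification", rotation=90),
    Page(2, ""),  # scanned page without embedded text
    Page(3, "SECTION 14: Transport information"),
]


def fake_ocr(page: Page) -> str:
    # Assumption: a real OCR engine would read the page image here.
    return "SECTION 3: Composition/information on ingredients"


selected = read_document(pages, ocr=fake_ocr)
print([page.number for page in selected])
```

In this sketch the transport page is dropped before the more expensive extraction step, which is the cost saving the pre-classification is meant to deliver.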

Document Understanding

Building on the standardized output from Document Reading, cbs AID emulates human comprehension to accurately extract critical data. It captures substance classifications, detailed ingredient lists, and the classifications associated with GHS pictograms. Complex tables within SDS are analyzed using multi-modal Large Language Models, which discern intricate relationships between data elements.
A notable challenge is the extraction of pictograms from scanned documents. To tackle this, cbs AID hides all text in the document, applies lightweight shape detection to isolate the remaining graphical elements, and then uses an image classification model to identify and categorize each pictogram.
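The first two steps of this approach (hiding text, then isolating graphical elements) can be sketched in plain Python on a tiny binary image. This is an illustration under stated assumptions, not the detection used in cbs AID: the function names are hypothetical, the text boxes are assumed to come from the OCR step, connected-component search stands in for the shape detection, and the final image classification model is left out.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive
Image = List[List[int]]          # 1 = ink, 0 = background


def hide_text(image: Image, text_boxes: List[Box]) -> Image:
    """Return a copy of the image with every text region blanked out."""
    masked = [row[:] for row in image]
    for top, left, bottom, right in text_boxes:
        for r in range(top, bottom + 1):
            for c in range(left, right + 1):
                masked[r][c] = 0
    return masked


def find_components(image: Image) -> List[Box]:
    """Bounding boxes of 4-connected ink regions, found by breadth-first search."""
    rows = len(image)
    cols = len(image[0]) if image else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not image[r][c] or seen[r][c]:
                continue
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < rows and 0 <= nx < cols:
                        if image[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def pictogram_candidates(image: Image, text_boxes: List[Box], min_size: int = 3) -> List[Box]:
    """Keep roughly square regions, since GHS pictograms are diamond-shaped."""
    candidates = []
    for top, left, bottom, right in find_components(hide_text(image, text_boxes)):
        height, width = bottom - top + 1, right - left + 1
        if min(height, width) >= min_size and max(height, width) <= 1.5 * min(height, width):
            candidates.append((top, left, bottom, right))
    return candidates


# Toy page: a block of "text" on the left, a small diamond on the right.
rows = [
    "###.......",
    "###....#..",
    "###...###.",
    ".......#..",
]
image = [[1 if ch == "#" else 0 for ch in row] for row in rows]
print(pictogram_candidates(image, text_boxes=[(0, 0, 2, 2)]))
```

Without the text mask, the text block in this toy page would also pass the size filter, which shows why text is hidden before the graphical elements are isolated. Each remaining candidate box would then be cropped and passed to the image classification model.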

System Integration

cbs AID is built on SAP BTP, leveraging SAP’s AI foundation to harness advanced LLMs. This integration with SAP Business AI not only boosts data extraction accuracy but also enables smooth, secure integration within the SAP ecosystem.

Conclusion

cbs AID transforms SDS processing by combining robust Document Reading with tailored Document Understanding. With over 99% extraction accuracy, this solution marks a new era in automated document extraction. cbs AID’s smart application of advanced AI technology outperforms traditional methods in quality, speed, and maintainability.
Author and your contact person
Jannis Conen
Consultant