Capsa Pro: An Automated Uncertainty Detection Solution
Capsa Pro: Empowering Language Models with Uncertainty Awareness for Enhanced Robustness and Trustworthy AI
One of the most pressing issues with Language Models (LMs) is that they are unaware of what they don't know and cannot recognize when they produce incorrect outputs. In other words, LMs currently cannot provide users with a value indicating how reliable their generated text is. This lack of transparency leads to undetected errors and undermines trust in these models.
We at Themis AI have developed Capsa Pro, an automated and fast solution to this problem. Capsa is a Python library designed to infuse machine learning models with the ability to detect different kinds of uncertainty. This product promises to revolutionize the way companies assess the safety and robustness of their models, advancing the development and adoption of trustworthy artificial intelligence. Other groups are working on similar problems, but their solutions are mostly ad hoc: tedious re-engineering of models by hand, often in ways that are incompatible with the models' previous training progress. Others provide interfaces for designing and deploying experiments that focus largely on model inputs and require engineers to create their own evaluation metrics. Such solutions are neither scalable nor compatible with the training and deployment methods of many LMs.
Capsa Pro addresses the major sources of failure for LMs. First, LMs can show ‘representation bias’ when they are trained on data sets that overrepresent or underrepresent specific categories; Capsa Pro provides an automated method to calculate and mitigate this type of bias. Second, Capsa Pro allows LMs to spot gaps in their training data. Third, LMs do not track label noise in training sets and are therefore unaware of aleatoric uncertainty; models wrapped with Capsa Pro can be trained to handle this lack of awareness. Finally, models wrapped with Capsa Pro provide epistemic uncertainty values in real time with each output.
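The post does not describe how Capsa Pro computes these uncertainty values internally, so the sketch below is only a conceptual illustration of the general idea, not Capsa's actual method or API. It uses a classic stand-in technique: ensemble disagreement as an epistemic-uncertainty signal. Several models are fitted on bootstrap resamples of a tiny data set; inputs far from the training data produce predictions that vary much more across the ensemble than in-distribution inputs do.

```python
import random

random.seed(0)

# Tiny training set: y = 2x + noise, with x confined to [0, 1].
xs = [i / 20 for i in range(21)]
ys = [2 * x + random.gauss(0, 0.1) for x in xs]

def fit_linear(px, py):
    """Closed-form least-squares fit y ≈ a*x + b."""
    n = len(px)
    mx, my = sum(px) / n, sum(py) / n
    a = sum((x - mx) * (y - my) for x, y in zip(px, py)) / sum((x - mx) ** 2 for x in px)
    return a, my - a * mx

# "Ensemble": each member is fitted on a different bootstrap resample,
# loosely mimicking independently trained model replicas.
members = []
for _ in range(10):
    idx = [random.randrange(len(xs)) for _ in range(len(xs))]
    members.append(fit_linear([xs[i] for i in idx], [ys[i] for i in idx]))

def predict_with_uncertainty(x):
    """Return (mean prediction, variance across members)."""
    preds = [a * x + b for a, b in members]
    mean = sum(preds) / len(preds)
    var = sum((p - mean) ** 2 for p in preds) / len(preds)
    return mean, var  # variance ~ epistemic uncertainty proxy

_, var_in = predict_with_uncertainty(0.5)    # inside the training range
_, var_out = predict_with_uncertainty(10.0)  # far outside it: gap in the data
```

Here `var_out` comes out much larger than `var_in`, which is exactly the behavior one wants from an uncertainty-aware model when it is queried outside its training distribution.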
More specifically, all of these capabilities become available simply by integrating Capsa Pro with an existing LM.
At Themis AI, our core focus is providing cutting-edge tools rather than creating and distributing LMs ourselves; for instance, we do not use Capsa Pro to develop top-performing models of our own. Instead, our primary aim is to showcase the potential of our tools, enabling experts to use them to create robust and secure AI.
Capsa has been tested on a number of different AI systems. We previously shared results for a text encoder (CLIP), which analyzes the prompts describing generated images, as part of our Stable Diffusion communications. Notably, in Stable Diffusion we were able to use Capsa Pro to improve LM prompts. Now we are going the other way as well: Capsa Pro can be used to improve the text that an LM generates.
Some examples of models we have tested Capsa Pro on are LLaMA, LLaMA2, BERT, MPT, GPT-2, RoBERTa, Falcon, and GPT-NeoX. We have also used Capsa Pro with NVIDIA’s NeMo LM framework. This is possible because Capsa Pro is model-agnostic and works with any PyTorch or TensorFlow model, regardless of its size, architecture, or complexity. During the development of Capsa Pro, we also conducted internal testing on LMs, because their software complexity makes them an excellent benchmark for assessing compatibility and functionality.
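The model-agnostic design described above follows a familiar wrapper pattern: the original model is left untouched, and a thin layer around it augments every prediction with an uncertainty estimate. The class below is a hypothetical, pure-Python illustration of that pattern only; it is not Capsa Pro's real interface, and the `UncertaintyWrapper` name and ensemble-based estimate are our own assumptions for the sketch.

```python
class UncertaintyWrapper:
    """Wrapper pattern sketch: attach an uncertainty value to any model's output.

    NOT Capsa Pro's actual API. Here, "uncertainty" is simply the variance
    of predictions across independently trained model replicas.
    """

    def __init__(self, models):
        self.models = models  # any callables: PyTorch modules, functions, ...

    def __call__(self, x):
        preds = [m(x) for m in self.models]
        mean = sum(preds) / len(preds)
        var = sum((p - mean) ** 2 for p in preds) / len(preds)
        return mean, var  # prediction plus a real-time uncertainty value


# Usage with plain callables standing in for real networks:
wrapped = UncertaintyWrapper([lambda x: 2.0 * x, lambda x: 2.1 * x, lambda x: 1.9 * x])
pred, unc = wrapped(3.0)
```

Because the wrapper only needs the model to be callable, the same pattern applies regardless of the underlying framework or architecture, which is what makes a model-agnostic tool possible in principle.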
We have released a Private Beta of Capsa Pro, and participating companies are applying it to several different LMs. Other companies will be able to join our waitlist very soon. Although Capsa Pro is currently accessible only through the Private Beta, a wider release will be announced in the future, expanding the number of companies that can make their models more robust and secure.
We will share more news about our latest tools’ results and implementations in a follow-up blog post, so stay tuned!