August 23, 2023

Capsa Pro: An Automated Uncertainty Detection Solution

Capsa Pro: Empowering Language Models with Uncertainty Awareness for Enhanced Robustness and Trustworthy AI

One of the most pressing issues with Language Models (LMs) is that they are unaware of what they don't know and cannot recognize when they produce incorrect outputs. In other words, LMs currently lack the capability to provide users with a value indicating how reliable the text they generate is. This lack of transparency leads to undetected errors and undermines the trust we place in these models.

We at Themis AI have developed Capsa Pro, a fast, automated solution to this problem. Capsa is a Python library designed to infuse machine learning models with the ability to detect different levels of uncertainty. This product promises to revolutionize the way companies assess the safety and robustness of their models, advancing the goal of developing and adopting trustworthy artificial intelligence. Other groups are working on similar issues, but their solutions are mostly ad hoc, i.e., tedious re-engineering of models by hand, often in a way that is not compatible with the model's previous training progress. Others provide interfaces for designing and deploying experiments that focus largely on model inputs and require engineers to create their own evaluation metrics. Such solutions are neither scalable nor compatible with the training and deployment methods of many LMs.

Capsa Pro addresses the major sources of failure for LMs. First, LMs may show ‘representation bias’ if they are trained on datasets that overrepresent or underrepresent specific categories; Capsa Pro provides an automated method to calculate and mitigate this type of bias. Second, Capsa Pro allows LMs to spot gaps in their training data. Third, LMs don’t track label noise in their training sets and are thus unaware of aleatoric uncertainty. Finally, LMs don’t report how confident they are in their own outputs; models wrapped with Capsa Pro provide epistemic uncertainty values in real time with each output, and the wrapped models can be trained to handle this lack of awareness effectively.
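
To make the notion of per-token epistemic uncertainty concrete, here is a minimal, generic sketch that estimates it for an off-the-shelf GPT-2 model by running several stochastic forward passes with dropout enabled and measuring how much the next-token probabilities disagree. This Monte-Carlo-dropout illustration is not how Capsa Pro computes its uncertainty values; it is only meant to show what a per-token (and per-logit) uncertainty signal looks like.

```python
# Generic Monte-Carlo-dropout illustration of per-token epistemic uncertainty.
# NOTE: this is NOT Capsa Pro's method; it only makes the concept concrete.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.train()  # keep dropout active so repeated forward passes are stochastic

def next_token_uncertainty(prompt: str, n_samples: int = 8) -> float:
    """Average variance of the next-token distribution across stochastic passes."""
    inputs = tokenizer(prompt, return_tensors="pt")
    samples = []
    with torch.no_grad():
        for _ in range(n_samples):
            logits = model(**inputs).logits[0, -1]   # logits for the next token
            samples.append(torch.softmax(logits, dim=-1))
    samples = torch.stack(samples)                   # (n_samples, vocab_size)
    # Disagreement across passes, averaged over the vocabulary (i.e., each logit).
    return samples.var(dim=0).mean().item()

print(next_token_uncertainty("The capital of France is"))
```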

More specifically, Capsa Pro can be adopted for the following:

  • Automatically clean up training data by using aleatoric uncertainty to find noisy labels, inconsistencies, errors, etc.
  • Detect which types of text are over- or underrepresented in your training data (i.e., representation bias) and compensate for it during training.
  • Sort and schedule the annotation of your data so that underrepresented text and text that will improve your model is incorporated first. 
  • Report epistemic uncertainty along with the output of your LM. More specifically, Capsa Pro provides an uncertainty value for each token (piece of a word) being generated by the language model. It also provides uncertainty values for every token in the vocabulary it considers (i.e., for each logit).
  • Use uncertainty values to (i) prevent unreliable output from being generated automatically, (ii) ask for human intervention when needed, (iii) improve performance on downstream tasks like question answering by only using output generated with high certainty, and (iv) understand which inputs and generated tokens cause the most uncertainty. A small sketch of this kind of gating is shown below.
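
To illustrate points (i) and (ii): if a wrapped model reports a per-token uncertainty alongside each generated token, the calling application can gate its output on that value. The sketch below assumes a hypothetical `generate_with_uncertainty()` method that returns tokens paired with uncertainty scores; the actual Capsa Pro interface may differ.

```python
# Hypothetical gating logic on top of an uncertainty-aware (wrapped) LM.
# `generate_with_uncertainty` and its return format are assumptions made for
# this sketch; they are not the documented Capsa Pro API.

UNCERTAINTY_THRESHOLD = 0.35  # application-specific value, tuned on held-out data

def escalate_to_human(prompt, tokens, uncertainties):
    """Placeholder for a human-review queue; returns a safe fallback message."""
    return "I'm not confident enough to answer this; a human will follow up."

def answer_or_escalate(wrapped_model, prompt: str) -> str:
    # Assumed to return the generated tokens and one uncertainty value per token.
    tokens, uncertainties = wrapped_model.generate_with_uncertainty(prompt)
    if max(uncertainties) > UNCERTAINTY_THRESHOLD:
        # (ii) ask for human intervention instead of emitting unreliable text
        return escalate_to_human(prompt, tokens, uncertainties)
    # (i)/(iii) only output generated with high certainty reaches the user
    return "".join(tokens)
```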

By integrating Capsa Pro with LMs, the following functionalities become available:

  • Wrapping an existing, pre-trained model and getting an uncertainty value for each logit as the model generates text. With Capsa Pro it doesn’t matter how big the language model is (e.g., billions of parameters): Capsa Pro is able to wrap it automatically with one line of code (see the sketch after this list), and wrapping takes between 1 and 3 seconds depending on the number of parameters.
  • Wrapping an existing, pre-trained model and then fine-tuning it by training the uncertainty-aware version of the model further.
  • Wrapping the LM and then training it from scratch. Our tests show that training the wrapped model makes it account for uncertainty and become more robust.
  • Analyzing over/underrepresented features in the model’s training set and any missing information with our aleatoric and vacuity wrappers.
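
As a rough illustration of what the one-line wrapping described above could look like in practice, here is a minimal sketch around a pre-trained Hugging Face model. The `capsa_pro` module, the `wrap` function, and the fields on the returned output are placeholders invented for this example, not the actual Capsa Pro API.

```python
# Minimal sketch of wrapping a pre-trained LM. The `capsa_pro` names used here
# are hypothetical placeholders, not the actual Capsa Pro API.
import capsa_pro  # hypothetical import
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# One-line wrap: architecture and parameter count should not matter.
wrapped_model = capsa_pro.wrap(model)  # hypothetical call

inputs = tokenizer("Uncertainty-aware generation", return_tensors="pt")
output = wrapped_model(**inputs)
# Assumed output: the usual logits plus an epistemic uncertainty value per logit.
print(output.logits.shape, output.epistemic.shape)
```

From here, the wrapped model could be fine-tuned or trained from scratch, as described in the second and third items of the list above.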

At Themis AI, our core focus is providing cutting-edge tools rather than creating and distributing LMs ourselves. For instance, we do not use Capsa Pro to develop top-performing models of our own. Instead, our primary aim is to showcase the potential of our tools, enabling experts to use them effectively to create robust and secure AI.

Capsa has been tested on a number of different AI systems. We previously shared some results for a text encoder (CLIP), which analyzes the prompts describing the generated image, as part of our Stable Diffusion communications. Notably, in Stable Diffusion we were able to use Capsa Pro to improve LM prompts. Now we are going the other way too, as Capsa Pro can be used to improve the text that is generated by an LM.

Some examples of models we have tested Capsa Pro on are LLaMA, LLaMA2, BERT, MPT, GPT-2, RoBERTa, Falcon, and GPT-NeoX. We have also been able to use Capsa Pro with NVIDIA’s NeMo LM framework. This is possible because Capsa Pro is model-agnostic and will work with any PyTorch or TensorFlow model, regardless of its size, architecture, or complexity. What’s more, during the development of Capsa Pro we have used LMs for internal testing as well: LMs are complex pieces of software, making them an excellent benchmark for assessing software compatibility and functionality.

We have released a Private Beta of Capsa Pro, and companies in the Private Beta are applying it to several different LMs. Other companies will be able to join our waitlist very soon. For now, Capsa Pro is only accessible by participating in the Private Beta, but a wider release will be announced in the future, expanding the number of companies that can make their models more robust and secure.

We will share more news about our latest tools’ results and implementations in a follow-up blog, so stay tuned! 
