Understanding the Role of Artificial Intelligence in Contract Analysis


We explore artificial intelligence in contract analysis, focusing on text-based AI with deep learning and semi-supervised/supervised learning.

Before we can understand artificial intelligence (AI) in contract analysis, we first need to understand AI itself. AI is software: very advanced software, but software nonetheless.

There are two major types of AI:

Artificial General Intelligence (AGI), which is defined as the ability of a system to understand and learn in a way similar to humans. This type of AI is many years away from reality.
Narrow AI, which is designed for a specific use case, such as Google Translate, image processing, self-driving cars and Siri's question-and-answer system. Docusign Insight uses this type of AI.

There’s a broad array of methods and functions currently included under the banner of AI. In this blog, we're focusing on text-based AI, specifically machine learning, with deep learning and semi-supervised/supervised learning. Both use algorithms that are capable of learning from examples, and they are the fastest-growing areas of narrow AI.

Methods for training machine learning

AI’s learning capability is its ability to take examples of something (language, pictures, data, etc.) and store a numerical representation of them for comparison against new examples in the future. The primary methods for training machine learning are supervised, semi-supervised and unsupervised:

Supervised learning is where humans teach the system by providing many labeled examples and then further improve it via active learning or feedback loops
Semi-supervised learning is defined as pre-teaching the system from unlabeled data and then providing a smaller amount of labeled data to fine-tune the system’s predictions
Unsupervised learning is where the system is provided only data and learns to represent the information via features it selects; the data alone, with no human intervention, drives the predictions
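
To make the idea of "storing a numerical representation for comparison" concrete, here is a minimal sketch using only the Python standard library. It assumes the simplest possible representation, a bag-of-words count vector, and compares vectors with cosine similarity. The clause texts and the function names `vectorize` and `cosine_similarity` are invented for illustration; production systems typically use much richer representations.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn text into a bag-of-words count vector (word -> frequency)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors (0.0 to 1.0)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


# A stored example (hypothetical clause text) and two new texts to compare.
stored = vectorize("Either party may terminate this agreement with thirty days notice")
similar = vectorize("This agreement may be terminated by either party on thirty days notice")
unrelated = vectorize("Payment is due within sixty days of the invoice date")

# The reworded termination clause scores higher than the payment clause.
print(cosine_similarity(stored, similar) > cosine_similarity(stored, unrelated))  # True
```

The stored vector never changes; each new text is simply converted the same way and measured against it.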

Supervised learning requires high-quality training data, domain expertise and potentially data scientists to train the model used by the software. The resulting model can be a black box that’s wholly dependent on the quantity and quality of the training data, or it can be shaped by data scientists through feature selection and generation. Accurate models depend on relevant data and human review, and they can be expensive and time-consuming to create.
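
As a rough illustration of supervised learning, the sketch below trains a tiny naive Bayes text classifier from labeled examples, using only the Python standard library. Naive Bayes is our choice for the example, not necessarily what any particular product uses, and the four training clauses and the labels `termination` and `payment` are hypothetical. A real model would need far more, and far better, labeled data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def train(labeled_examples):
    """Count words per label; these counts are the trained 'model'."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in labeled_examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocabulary


def predict(model, text):
    """Return the label with the highest log-probability (Laplace smoothing)."""
    word_counts, label_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            # +1 smoothing so unseen words do not zero out the probability
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical human-labeled training data.
training_data = [
    ("Either party may terminate this agreement upon thirty days written notice", "termination"),
    ("This agreement terminates automatically if either party becomes insolvent", "termination"),
    ("Customer shall pay all invoices within thirty days of receipt", "payment"),
    ("Late payment of fees accrues interest at one percent per month", "payment"),
]
model = train(training_data)

print(predict(model, "The supplier may terminate the contract on written notice"))  # termination
print(predict(model, "Invoices not paid within sixty days accrue interest"))  # payment
```

Everything the classifier "knows" comes from the labeled examples, which is why the quantity and quality of that data matter so much.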

Unsupervised and/or deep learning is similar to pre-training in humans. It provides answers from minimal examples by relying on pre-training at a more granular level, including syntax, grammar, punctuation, patterns and more. Pre-training can be achieved by using huge datasets of publicly available information like Wikipedia and online books. Deep learning models can be the most efficient and most accurate models; however, they require substantial computational power to create the initial model. In terms of efficiency, deep learning methods also allow for a form of transfer learning, where pre-training on one task can be applied to other similar tasks. This reduces training time and the amount of data required, and it speeds up training within a new domain. This type of training also enables language translation and “train once, use many” methods.

While both methods excel at certain tasks, they aren’t well suited for others. For this reason, other technologies should be used in conjunction with them. Two such technologies are rules and semi-supervised methods:

Rules are explicit, well-defined statements. They excel at finding regular patterns like phone numbers, zip codes and contract IDs, and they are extremely precise: a rule is either true (matching) or false (non-matching). However, they struggle with linguistic variances and spelling mistakes.
Semi-supervised methods rely on word similarity, frequency and distribution to generate results. They’re highly effective at finding standardized language that recurs throughout the dataset, either by clustering around a single example or by nearest-neighbor matching against unsupervised, but fixed, clusters.
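
The strengths and limits of rules are easy to see in a small sketch. The example below expresses three rules as Python regular expressions. The patterns are simplified US-style formats, and the contract ID format (`CT-` followed by six digits) is invented for illustration; the rule names and the `extract` helper are likewise hypothetical.

```python
import re

# Hypothetical rule set. Each rule either matches a span of text or it does not.
RULES = {
    "us_phone": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
    "us_zip": re.compile(r"\b\d{5}(?:-\d{4})?\b"),
    "contract_id": re.compile(r"\bCT-\d{6}\b"),  # invented ID format
}


def extract(text):
    """Apply every rule and collect the matching strings by rule name."""
    return {name: pattern.findall(text) for name, pattern in RULES.items()}


clean = "Contract CT-204815 was mailed to San Francisco, CA 94105. Call 415-555-0100 with questions."
print(extract(clean))

# A single typo (the letter O instead of the digit 0) defeats the rule entirely.
typo = "Contract CT-2O4815 was mailed yesterday."
print(extract(typo)["contract_id"])  # []
```

The first call finds all three values exactly, while the second shows why rules alone are not enough when the text contains spelling mistakes or unexpected variations.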

As we can see, each technology has its own strengths and weaknesses. An optimized solution will select and apply the right combination of technologies for each individual task, resulting in the most efficient and precise results possible.

Docusign Insight delivers a proven combination of AI technologies (natural language processing, machine learning, latent semantic indexing, OCR and rules-based logic) to help organizations manage and analyze their agreements. Insight brings together agreements from across your enterprise, in virtually any location and format. Advanced search and filtering let you zero in on the ones that matter. Automated extraction policies identify the clauses and terms you need to review.

Learn more about Docusign contract analytics solutions that leverage AI.