Building AI Products with Quality and Trust

In the last few years, new trends and concerns around AI and machine learning have multiplied. Research communities, policymakers, the media and other stakeholders are debating risk, bias and discrimination in the AI field.

These discussions have inspired us at Docusign to explore how we can proactively develop new processes and tools for building AI software with quality and trust.

Diversity in data science is key to overcoming bias. Data scientists shape how data is used and how models are designed, so the team of researchers involved in AI production must be diverse. Our backgrounds range from mathematics and machine learning to linguistics and law, and we bring these unique perspectives and experiences to the design of Docusign Insight and our other AI products. Working in such a diverse environment leads to more effective collaboration and higher productivity.

Data privacy, quality and diversity are the key factors in building trust in AI. In our most recent initiative, we implemented a set of best practices for managing data and model quality within Docusign Insight.

Data privacy

Regulations such as the GDPR and the newly effective California Consumer Privacy Act (CCPA) have driven us to improve how we secure, store and process data. To promote best practices for handling personal data and confidential information, we've invested heavily in a data lake and data management infrastructure, including data security labs, guidelines and a data factory, among other components.

The data lake we created within Docusign Insight is a centralized repository for raw data, with strong security controls and well-defined backup policies. The data factory governs the conversion of raw data into clean, indexed files ready for analysis. Together, these systems form streamlined processes that prioritize data privacy and security.

Data quality

The golden rule of machine learning is that a model should be trained on high-quality data. This means we must thoroughly understand the data we work with. For this reason, we involve legal experts and linguists at every stage of data preparation and review. Technical expertise and user-friendly tools are an integral part of the process: the machine learning team has built and implemented several pipelines, annotation tools and guidelines that make working with data easy, timely and scalable.
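A common way to check the quality of annotated data is to measure inter-annotator agreement. For example, you can compare labels when two reviewers tag the same contract clauses. The sketch below is a minimal illustration, not Docusign's actual pipeline. It computes Cohen's kappa for two annotators in plain Python. The function name `cohens_kappa` and the example labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators who labeled the same items.

    Hypothetical helper for illustration; not part of any Docusign tool.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same, non-empty set of items")
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators chose the same label.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's own label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    p_expected = sum((counts_a[label] / n) * (counts_b[label] / n) for label in counts_a)
    if p_expected == 1.0:
        # Both annotators used the same single label throughout: perfect agreement.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical labels from two reviewers tagging the same four contract clauses.
reviewer_1 = ["party", "date", "party", "amount"]
reviewer_2 = ["party", "date", "amount", "amount"]
print(round(cohens_kappa(reviewer_1, reviewer_2), 3))  # 0.636
```

Kappa corrects raw agreement for agreement expected by chance. Values near 1 indicate strong agreement. Low values can flag ambiguous annotation guidelines that may need revision before the data is used for training.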

We believe that building trust in AI products is a continuous process. Recently, we established a model and data management framework aimed at maintaining and improving both data and model quality. This framework relies on collaboration between feature engineers, QA engineers and our in-house legal team. As a result, our AI-powered products are rigorously tested and quality-assured before release, then continuously monitored afterward.

Data diversity

Diversity within the research group creating an AI product is essential, but it is not enough to ensure data diversity. A good machine learning system must be trained on large, diverse datasets. This approach lets us build a range of algorithms and datasets tailored to specific customer problems, which increases accuracy.

Achieving diversity in models isn't easy. It requires hundreds of experiments, along with significant time and resources. We've built lab pipelines that eliminate several manual steps and let us scale and track the process. Every team member runs dozens of automated tests daily. The results inform decisions about future models and whether a model is included in the platform.

By being innovative and visionary, the Docusign machine learning team not only delivers a valuable product for customers but also helps build trust in our AI solutions. To achieve this, we continuously build new tools and frameworks that improve data diversity, quality and privacy.

Author’s Note: This blog post was co-authored by Emanuella Wallin, Qing Zheng and Alexandra Kukresh, data scientists on Docusign’s machine learning team.