Benefits of Predictive Coding in Document Discovery

This technology is best suited to large, complex cases that are document-intensive, especially where a subset of the documents must be reviewed by a team of lawyers who are subject-matter experts, in a process called “discovery.”

Document review is the most labour-intensive and expensive stage of the litigation or arbitration process and of e-discovery investigations. Legal professionals examine documents against the rules of relevance and hearsay, responsiveness, legal privilege and confidentiality. Lawyers review and analyse each page in the data collection to advise which documents must be withheld from production to opposing parties. Compliance professionals’ understanding of the Electronic Discovery Reference Model is therefore of paramount importance. This article analyses how predictive coding technology is applied to improve the collection, retrieval, processing, analysis and production of documents.

Practice Direction SL1.2 (“PD”) provides a Pilot Scheme for Discovery and Provision of Electronically Stored Documents in Cases in the Commercial List. The PD contemplates methods of technology assisted review (“TAR”) and other automated searches. Indeed, it sets out a framework for “reasonable, proportionate and economical discovery” of documents for the court’s use. It also encourages the parties to agree on proportionate and cost-effective methods for the discovery of electronic documents.

IT and the Law of Document Review

“Documents”, depending on context, include sensitive legal information recorded, held or stored by means other than paper, as is recognised in the Civil Procedure Rules. The term extends to electronic documents, including emails and text messages. Document reviewers have legal and technical backgrounds and examine documents relevant to pending litigation and regulatory investigations. They may also summarise, tab, highlight, chart and collect certain documents or information gleaned from the documents, as well as create privilege and redaction logs.

Document reviewers are mostly lawyers, paralegals or consultants in litigation support. This is because the information may be subject to attorney-client privilege, confidentiality rules and privacy laws; documents that are not so protected cannot be withheld from production. For example, if a document concerns a trade secret in a manufacturing process, the legal team is not obligated to turn it over to opposing parties. Sometimes, because of personal data privacy concerns, the reviewers will advise redacting certain portions of a document to protect the client’s personal or confidential information.

Since document reviewers are often monitored for speed and efficiency, those with a strong work ethic and a willingness to meet deadlines help to keep clients’ costs down. With advanced ICT, documents are “coded and culled” into a litigation database so as to narrow the number of documents for review. Under tight deadlines, millions of documents such as memos, letters, e-mails, PowerPoint presentations, spreadsheets and other e-documents are examined in order to respond to discovery requests (such as an interrogatory or a request for inspection or production) raised in internal investigations and litigation.

Different search methods in e-discovery: Why not keyword search? Interpretation of predictive coding

The choice of search method affects the outcome of e-disclosure. First, keyword searches are inherently biased: they either exclude a proportion of relevant documents or necessitate the review of increasing volumes of irrelevant documents. By contrast, a predictive coding exercise typically begins with a senior lawyer training an algorithm by reviewing a ‘seed set’ of example documents. The algorithm analyses the characteristics of these documents, learns from the lawyer’s decisions on relevance, and thereafter seeks to identify similar documents and rank them by their likelihood of relevance. The court will order disclosure of the documents even if they are in the physical possession of a third party.
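The seed-set training and ranking described above can be illustrated with a minimal sketch. It assumes a simple word-frequency (naive Bayes style) scorer written for this article; commercial predictive coding tools use their own, more sophisticated models, and the function names and sample documents below are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Split a document into lower-case words (a deliberately crude tokenizer)."""
    return text.lower().split()


def train(seed_set):
    """Learn a weight per word from (text, is_relevant) pairs coded by the lawyer."""
    counts = {True: Counter(), False: Counter()}
    for text, is_relevant in seed_set:
        counts[is_relevant].update(tokenize(text))
    vocab = set(counts[True]) | set(counts[False])
    totals = {label: sum(c.values()) for label, c in counts.items()}
    weights = {}
    for word in vocab:
        # Laplace smoothing keeps every probability strictly positive.
        p_relevant = (counts[True][word] + 1) / (totals[True] + len(vocab))
        p_irrelevant = (counts[False][word] + 1) / (totals[False] + len(vocab))
        weights[word] = math.log(p_relevant / p_irrelevant)
    return weights


def score(weights, text):
    """Higher scores mean the document looks more like the relevant seed documents."""
    return sum(weights.get(word, 0.0) for word in tokenize(text))


def rank(weights, documents):
    """Order unreviewed documents from most to least likely relevant."""
    return sorted(documents, key=lambda doc: score(weights, doc), reverse=True)


# Hypothetical seed set: the senior lawyer's relevance decisions.
seed_set = [
    ("share purchase agreement warranty breach", True),
    ("warranty claim notice under the agreement", True),
    ("office party lunch menu", False),
    ("car park lunch rota", False),
]
unreviewed = [
    "lunch menu for friday",
    "notice of warranty breach",
    "agreement on car park",
]

weights = train(seed_set)
ranked = rank(weights, unreviewed)
for doc in ranked:
    print(round(score(weights, doc), 2), doc)
```

The quality of the ranking depends entirely on the seed-set decisions, which is why the reviewer who codes the seed set matters so much.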

Nowadays, the courts impose no restriction on the use of particular search methods, but it seems that predictive coding may prevail in the long term. Predictive coding is studied within computer science and draws on AI innovation. During the process, the most highly ranked documents can be prioritised for review. This review continues until the system fails to return any further relevant documents or until the proportion of relevant documents becomes so low that continuing the review becomes disproportionate.
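The stopping rule described above can be sketched as a batch-by-batch review that ends once a batch yields too few relevant documents. This is only an illustration: the batch size, the threshold `min_yield` and the function name are assumptions made for the example, and real platforms usually retrain the model between batches, which is omitted here.

```python
def prioritised_review(ranked_docs, is_relevant, batch_size=3, min_yield=0.2):
    """Review ranked documents in batches; stop when a batch's yield is too low.

    ranked_docs: documents ordered from most to least likely relevant.
    is_relevant: the human reviewer's decision, modelled here as a function.
    Returns the relevant documents found and the number of documents reviewed.
    """
    found = []
    reviewed = 0
    for start in range(0, len(ranked_docs), batch_size):
        batch = ranked_docs[start:start + batch_size]
        hits = [doc for doc in batch if is_relevant(doc)]
        found.extend(hits)
        reviewed += len(batch)
        # Proportionality check: share of relevant documents in this batch.
        if len(hits) / len(batch) < min_yield:
            break
    return found, reviewed


# Hypothetical ranked list: "R" documents are relevant, "N" documents are not.
ranked_docs = ["R1", "R2", "R3", "R4", "N1", "R5", "N2", "N3", "N4", "R6"]
found, reviewed = prioritised_review(ranked_docs, lambda doc: doc.startswith("R"))
print(found, reviewed)
```

In this example the review stops after nine documents and never reaches R6, which shows the trade-off: a proportionate cut-off saves review time but accepts that some relevant documents may be missed.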

In Brown v BCA Trading Ltd [2016] EWHC 1464, Mr. Registrar Jones approved the use of predictive coding to identify potentially relevant documents. The court treated the lower cost of predictive coding (which depends on the volume of documents) compared with keyword and manual searches as a factor justifying its use. In Pyrrho Investments Ltd & anr v MWB Property Ltd & ors [2016] EWHC 256 (Ch), the court stated that “best practice would be for a single, senior lawyer who has mastered the issues in the case to consider the whole [teaching] sample”.

In Triumph Controls UK Ltd & Ors v Primus International Holding Co & Ors [2018] EWHC 165 (TCC), Mr. Justice Coulson considered that the predictive coding software may not have been “educated” as well as it might have been, as it depends entirely on input from a human expert who understands the factual and legal issues involved. As such, predictive coding will rarely categorise an entire data set perfectly first time round. In the circumstances, lawyers and clients still need to review and check the program’s output to ensure that it has captured all the documents that it needs to, and that it has not captured documents that it should not, particularly documents that contain privileged or commercially sensitive information.

The U.S. courts tend to take a more liberal approach to the parties’ choice of e-search methods. As demonstrated in Hyles v. New York City, 2016 WL 4077114, at *2-3 (Mag. S.D.N.Y. Aug. 1, 2016) and In re Viagra (Sildenafil Citrate) Products Liability Litigation, 2016 WL 7336411, at *1 (Mag. N.D. Cal. Oct. 14, 2016), it is not up to the court or the requesting party to force the responding party to use TAR. It is up to the responding party “to use the search method of its choice.” (emphasis added)

In practice, it is hard to compel the use of predictive coding. Research studies tend to suggest that human review may not be that consistent in determining relevance when compared with predictive coding. One should assess the pros and cons of predictive coding, which are set out in the list at the end of this article.

In summary, reliability and efficiency increase when predictive coding is used in conjunction with traditional e-discovery methods to review enormous document caches. Commercial and professional firms that bring such technology in-house can cut client costs by eliminating the need to hire outside vendors and by reducing the amount of work they outsource to complete the e-discovery process.

Predictive coding is also beneficial because it works well for handling different forms of information such as photos, videos, emails and other correspondence on social media (e.g. WeChat and WhatsApp), the types of unstructured data that are becoming more prevalent in e-discovery.

According to a recent survey from the Cowen Group, law firms have had exposure to “using more than one predictive coding tool/search method which law firms can competently handle data processing in-house and bypass vendors.” Predictive coding is a significant e-discovery innovation that is increasingly used by legal personnel across the world. It is submitted that, in the era of big data, legal and compliance professionals must be trained to choose the best technology for the case at hand so that the case can be managed cost-effectively and efficiently.



Pros and cons of predictive coding

1. Pro: Early identification of key issues with supporting documents - senior document reviewers or lawyers are engaged at an early stage in the process.
Con: Questionable ability to deal with multiple issues or degrees of relevance in very complex litigation or arbitration cases.

2. Pro: Improved accuracy - a quality-checking measure. Results are better than keyword search and manual review.
Con: Training by conservative, under-confident or technophobic users may undermine the integrity of the process.

3. Pro: Increased efficiency and consistency. Fewer irrelevant documents are reviewed with the use of ICT.
Con: Questionable ability to cope with the evolution of relevance throughout a review.

4. Pro: Higher proportion of relevant documents identified within a short time span.
Con: Hard to eliminate the problem of a rogue reviewer - experienced lawyers’ review and advice on documentation are encouraged.

5. Pro: Lower costs - lawyers in the M-generation are becoming more technologically savvy on cost control.
Con: Questionable ability to deal with documents containing little or no text.

6. Pro: Faster access to the most relevant documents - ability to prioritise core issues in case bundles.
Con: Questionable ability to cope with foreign-language documents.




Solicitor, FCIS, FCS, LLM, FHKIArb, Zhang Lawyers

Zoe Chan So Yuen is a highly skilled solicitor, arbitrator and chartered secretary with extensive experience in private practice and in-house corporate work, lecturing and university administration, as well as competition compliance training. Since 2010 she has taught, researched and published on Legal and Professional Aspects of Digital Forensics as part of her work on the MSc Electronic Security and Digital Forensics at Middlesex University London and HKU SPACE. The programme is accredited by the British Computer Society for Chartered IT Professional (CITP) status.