Scientists See Promise in Deep-Learning Programs • Microsoft Seeks an Edge in Analyzing Big Data • The Age of Big Data • Why Hire a Lawyer? Computers Are Cheaper • Armies of Expensive Lawyers, Replaced by Cheaper Software • Google Offers Big-Data Analytics • Jeff Hawkins Develops a Brainy Big Data Company • How Big Data Became So Big
• The total amount of digital data in the world is estimated to exceed 1.8 zettabytes (1.8 trillion gigabytes). • The digital universe is doubling every 2 years. • 85% of that data is owned or controlled by corporations at some point in its lifecycle. Source: International Data Corporation (IDC) Study, 2012
Big Data is Here. And it’s coming soon to a litigation near you… What’s changed?
Redefining scalability in eDiscovery. [Figure: scale comparison labeled 1, 1,000, and 1 × 10^12]
Predictive Coding is a Form of Machine Learning What is Machine Learning?
It’s already a part of our lives. . . • voice recognition software, e.g., calling your bank or credit card company • handwriting, facial or fingerprint recognition • analyzing market trends and guiding investment decisions • making decisions on applications for credit or loans • modeling and predicting severe weather patterns • filtering spam in your email inbox • targeted marketing on the internet • robotics
KEY POINT: Predictive coding is just one part of a continuum of technology-assisted review (TAR) methods that we are already very familiar with in searching and analyzing data. The continuum: Key Words • Concept Clustering • Concept Search • Predictive Coding. Three supporting propositions: • Each successive approach incorporates the preceding approaches. • Each successive approach contains more supporting criteria. • All are ultimately based on the concept of pattern matching.
Key Words = Simple pattern matching. External input: “wild,” “wolf,” “pet.” [Figure: field of words: dog, rhino, wolf, domestic, wild, ferret, cat, goldfish, cow, pet]
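Keyword search as described on this slide can be sketched in a few lines. This is only an illustration: the four documents are invented from the words on the slide, and whole-word matching on lowercased text is an assumption, not a description of any particular review tool.

```python
# Hypothetical mini-corpus built from the words on the slide.
documents = {
    "doc1": "the wolf is a wild animal",
    "doc2": "a goldfish is a popular pet",
    "doc3": "the rhino lives in the wild",
    "doc4": "a cow is a farm animal",
}

# External input: the search terms supplied by the reviewer.
keywords = {"wild", "wolf", "pet"}


def keyword_hits(text, terms):
    """Return the search terms that appear as whole words in the text."""
    return set(text.lower().split()) & terms


# Which terms hit in each document, and which documents have any hit.
hits = {doc_id: keyword_hits(text, keywords) for doc_id, text in documents.items()}
matching = sorted(doc_id for doc_id, found in hits.items() if found)
print(matching)
```

The match is purely literal: doc4 is skipped because it contains none of the three terms, whatever it is actually about.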
Concept Clustering = Organization based on internal relationships. Bit strings shown on the slide: 01110111011010010110110001100100 (wild), 011001000110111101100111 (dog), 011100000110010101110100 (pet). [Figure: the words tiger, rhino, dog, cat, ferret, wolf, goldfish, and cow grouped under the labels wild, domesticated, and pet]
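One way to picture organization based on internal relationships is to group documents by how much vocabulary they share, with no search terms supplied from outside. The sketch below is an assumption made for illustration: the four documents are invented, and Jaccard similarity with a 0.5 threshold is a simple stand-in for whatever a commercial clustering engine actually does.

```python
# Hypothetical documents, each reduced to its set of words.
docs = {
    "d1": {"tiger", "rhino", "wolf", "wild"},
    "d2": {"wolf", "tiger", "wild", "ferret"},
    "d3": {"dog", "cat", "goldfish", "pet"},
    "d4": {"cat", "dog", "pet", "ferret"},
}


def jaccard(a, b):
    """Shared words divided by all words across the two documents."""
    return len(a & b) / len(a | b)


def cluster(docs, threshold=0.5):
    """Group documents that are similar to any member of an existing group."""
    clusters = []
    for doc_id, words in docs.items():
        linked = [
            c for c in clusters
            if any(jaccard(words, docs[member]) >= threshold for member in c)
        ]
        merged = {doc_id}
        for c in linked:
            merged |= c
            clusters.remove(c)
        clusters.append(merged)
    return clusters


clusters = cluster(docs)
print(clusters)
```

No one told the program about "wild" or "pet" as categories; the two groups emerge from word overlap alone.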
Concept Searching = Key words + Concept organization. External input: “zoo,” “wild,” “domesticated.” Bit strings shown on the slide: 011110100110111101101111 (zoo), 01110111011010010110110001100100 (wild), 011001000110111101101101011001010111001101110100011010010110001101100001011101000110010101100100 (domesticated). [Figure: the words rhino, dog, wolf, tiger, cat, ferret, goldfish, and cow grouped under the labels zoo, wild, domestic, domesticated, pet, and farm]
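Concept searching combines the two previous ideas: the reviewer still supplies a term, but the term is expanded to the group of words associated with it. In the sketch below the concept groups are written by hand and the documents are invented; a real system would derive the groups from the data, so treat every name here as hypothetical.

```python
# Hypothetical concept groups; in practice these would come from clustering.
concepts = {
    "zoo": {"zoo", "rhino", "tiger"},
    "wild": {"wild", "wolf", "tiger", "rhino"},
    "domesticated": {"domesticated", "dog", "cat", "cow", "goldfish"},
}

# Hypothetical documents.
documents = {
    "doc1": "the rhino exhibit opens friday",
    "doc2": "our dog chased the cat",
    "doc3": "quarterly sales report",
}


def concept_search(query, documents, concepts):
    """Match documents containing the query term or any word in its concept group."""
    terms = concepts.get(query, {query})
    return sorted(
        doc_id for doc_id, text in documents.items()
        if terms & set(text.lower().split())
    )


print(concept_search("zoo", documents, concepts))
print(concept_search("domesticated", documents, concepts))
```

The search for "zoo" returns doc1 even though the word "zoo" never appears in it, which a plain keyword search would miss.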
Predictive Coding = document-level input + probabilistic modeling. External input: human-coded documents. Output: doc-level probability rankings. Bit strings shown on the slide: 011110100110111101101111 (zoo), 01110111011010010110110001100100 (wild), 011001000110111101101101011001010111001101110100011010010110001101100001011101000110010101100100 (domesticated). [Figure: the words rhino, dog, wolf, tiger, cat, ferret, goldfish, and cow grouped under the labels zoo, wild, domestic, domesticated, pet, and farm]
Step 1: Sample documents from the entire set.
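Drawing the sample in Step 1 can be as simple as a seeded random draw, so that the same sample can be reproduced later if the process is questioned. The collection size, sample size, document IDs, and the 400/100 split into training and control sets below are all assumptions chosen for illustration.

```python
import random


def draw_sample(doc_ids, sample_size, seed=0):
    """Draw a reproducible simple random sample without replacement."""
    rng = random.Random(seed)
    return rng.sample(doc_ids, sample_size)


# Hypothetical collection of 10,000 document IDs.
all_docs = [f"DOC-{i:05d}" for i in range(1, 10001)]

sample = draw_sample(all_docs, 500)

# Hypothetical split: most of the sample trains the model,
# the rest is held back as a control set for testing.
training_ids = sample[:400]
control_ids = sample[400:]
print(len(training_ids), len(control_ids))
```

Recording the seed alongside the sample is one modest way to support the transparency of process that the decisions discussed later in the deck look for.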
Step 2: Attorney review of sample documents to create training and control sets. Sample document 1: In the European mind, wolves long stood as a symbol of baneful, uncontrollable nature. As far back as the time of Aesop in 500 BCE (Before the Common Era), wolves in literature are portrayed as wicked villains and long-fanged, terrible beasts. Before the Middle Ages, wolves were nearly always the greedy thief, criminal trickster, or cruel remorseless murderer. The wolf does not fare well in the European imagination. Sample document 2: Can the wolf be domesticated? The domesticated dog is descended from the wolf found in the wild. While some people have occasionally attempted to raise wolves as pets, their 2 ½ inch fangs and tendency to eat nearby small animals such as cats can create socially awkward situations with neighbors. Coding options: Responsive • Not Responsive
Step 3: Create model from the human-coded training set (responsive and not responsive). [Figure: the two sample documents from Step 2, shown alongside the repeated bit string 011001000110111101100111 (dog) and the extracted terms dances, raise, wolf, wolves, werewolf, pet, costner]
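The slides do not say which model a predictive coding tool builds. As a stand-in, the sketch below uses a small naive Bayes classifier, a common textbook choice for text, trained on four invented documents coded "R" (responsive) or "NR" (not responsive). The function names and training texts are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train(coded_docs):
    """Count words per label from (text, label) pairs; labels are 'R' or 'NR'."""
    word_counts = {"R": Counter(), "NR": Counter()}
    doc_counts = Counter()
    for text, label in coded_docs:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["R"]) | set(word_counts["NR"])
    return word_counts, doc_counts, vocab


def prob_responsive(text, model):
    """Estimate the probability that a document is responsive."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label in ("R", "NR"):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                # Add-one smoothing avoids zero probabilities.
                count = word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocab)))
        log_scores[label] = score
    top = max(log_scores.values())
    weights = {k: math.exp(v - top) for k, v in log_scores.items()}
    return weights["R"] / (weights["R"] + weights["NR"])


# Hypothetical human-coded training set.
training_set = [
    ("can the wolf be kept as a pet", "R"),
    ("people raise wolves as pets", "R"),
    ("dances with wolves stars kevin costner", "NR"),
    ("the werewolf movie opens friday", "NR"),
]
model = train(training_set)
print(prob_responsive("raise a wolf as a pet", model))
print(prob_responsive("costner film dances", model))
```

The input is whole coded documents rather than search terms, and the output is a probability rather than a yes/no hit, which is the difference from the earlier methods that the deck emphasizes.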
Step 4: Test model against sample (human-coded) set. Sample document 1: Wolves are sometimes kept as exotic pets, and in some rarer occasions, as working animals. Although closely related to dogs (which are believed to have split from wolves between 10,000 and 100,000 years ago), wolves do not show the same tractability as dogs in living alongside humans. Wolves also need much more space than dogs, about 10-15 sq. miles. Sample document 2: "Dances With Wolves" has the makings of a great work, one that recalls a variety of literary antecedents, everything from "Robinson Crusoe" and "Walden" to "Tarzan of the Apes." Michael Blake's screenplay touches both on man alone in nature and on the 19th-century white man's assuming his burden among the less privileged.
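Testing the model in Step 4 means comparing its calls with the attorneys' coding on documents it was not trained on. Two measures often used for this are precision (of the documents the model called responsive, how many really were) and recall (of the truly responsive documents, how many the model found). The ten control-set results below are invented for illustration.

```python
def precision_recall(predicted, actual):
    """Compare model calls with human coding; True means responsive."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    false_neg = sum(1 for p, a in pairs if not p and a)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Hypothetical control set of ten documents.
human_coding = [True, True, True, True, False, False, False, False, False, False]
model_calls = [True, True, True, False, True, False, False, False, False, False]

precision, recall = precision_recall(model_calls, human_coding)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here the model makes one wrong call in each direction, so three of its four "responsive" calls are right and it finds three of the four truly responsive documents.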
Apply the model to the remaining documents that have not been reviewed. [Figure: flow diagram with the labels Responsive, Yes, No, and Non-responsive]
Step 5: Apply model to entire set and rank documents. [Figure: ranking scale running from 100% down to 0% in 10% steps]
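Once every document has a score, ranking is just a sort from most to least likely responsive. The scores and the 50% cutoff in this sketch are made up; where to draw the line is a judgment call the slides leave open.

```python
# Hypothetical model scores: probability that each document is responsive.
scores = {
    "DOC-001": 0.92,
    "DOC-002": 0.15,
    "DOC-003": 0.67,
    "DOC-004": 0.48,
    "DOC-005": 0.81,
}

# Rank from most to least likely responsive.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical cutoff: documents at or above it go to attorney review first.
cutoff = 0.50
review_queue = [doc_id for doc_id, probability in ranked if probability >= cutoff]

for doc_id, probability in ranked:
    print(f"{doc_id}  {probability:.0%}")
```

Moving the cutoff down sends more documents to review and usually raises recall at the expense of precision and review cost.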
PREDICTIVE CODING AND BIG DATA NYLJ/Pangea3 Webinar April 15, 2013
OUTLINE • Mitigating Big Data in E-Discovery • Stakeholder Analysis • The New Reality of Predictive Coding • Long-Term Trends
Predictive Coding and Big Data: Mitigating Big Data in e-discovery
BIG DATA IN E-DISCOVERY • Bigger haystack—more documents in general • Corporate data culture—more relevant documents • More sources—poses collection/preservation challenges
MITIGATING BIG DATA IN E-DISCOVERY • Some mitigating factors: • Principles of proportionality and cooperation • Information governance tools and document management • Technology-assisted review and predictive coding
Predictive Coding and Big Data: Stakeholder analysis
PREDICTIVE CODING STAKEHOLDER ANALYSIS • Judges: generally receptive • Clients: cost efficiencies vs. risk management • Lawyers: new model, building expertise
Predictive Coding and Big Data: The new reality of predictive coding
Predictive Coding and Big Data: Long-term trends
LONG-TERM TRENDS • Over time, Big Data growth > predictive coding benefits • Some document-by-document human review necessary • Strategic nuances in a new discovery battleground
SEARCH (1) How do we search for discoverable ESI? • Manually? • With automated assistance? • Which is “better” and why? • M.R. Grossman & G.V. Cormack, “The Grossman-Cormack Glossary of Technology-Assisted Review,” 7 Fed. Cts. Law R. 1 (2013) • Maura R. Grossman & Gordon V. Cormack, “Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review,” XVII Rich. J.L. & Tech. 11 (2011) (available at http://jolt.richmond.edu/v17i3/article11.pdf) • For a “shorter” discussion, see Efficient E-Discovery, ABA Journal 31 (Apr. 2012)
SEARCH (2) • Using search terms? How accurate are these? See In re National Ass’n of Music Merchants, Musical Instruments and Equipment Antitrust Litig., 2011 WL 6372826 (S.D. Cal. Dec. 19, 2011)
SEARCH (3) Automated review or “predictive coding” as an alternative to the use of search terms. For decisions that address automated review, see: • EORHB, Inc. v. HOA Holdings LLC, C.A. No. 7409 (Del. Ct. Ch. Oct. 15, 2012) • In re Actos (Pioglitazone) Prod. Liability Litig., MDL No. 6:11-md-2299 (W.D. La. July 27, 2012) • Da Silva Moore v. Publicis Groupe SA, 2012 U.S. Dist. LEXIS 23350 (S.D.N.Y. Feb. 24, 2012), aff’d, 11 Civ. 1279 (ALC) (AJP) (S.D.N.Y. Apr. 26, 2012) • Global Aerospace Inc. v. Landow Aviation, L.P., Consol. Case No. CL 61040 (Va. Cir. Ct. Apr. 23, 2012)
SEARCH (4) WHAT LESSONS CAN BE DRAWN FROM THE DECISIONS? • Judge approved automated search at a “threshold” level. “Results” may be subject to challenge and later rulings. • Threshold superiority of automated vs. manual review recognized given volume of ESI and attorney review costs. • Large volumes of ESI in issue. • Party seeking to do automated review must offer “transparency of process” or something close to it. • “Reasonableness” of methodology is key. • Speculation by the opposing party is insufficient to defeat threshold approval.
SEARCH (5) LET’S TAKE A DEEP BREATH AND RECAP WHERE WE ARE TODAY, VENDOR HYPE NOTWITHSTANDING: • We have yet to see a judicial analysis of process and results in a contested matter. • Safe to assume that the proponent of a process will bear the burden of proof (whatever that burden might be). • Safe to assume at least some transparency of process may/will be expected. • If “reasonableness” is the standard, how reasonable must the results be? Is “precision” of 80% enough? 90%? Remember, there are no agreed-on standards.
INTERLUDE Assume a party makes production of ESI based on search terms proposed by an adversary. Assume further that the adversary suspects “something” is missing. Is suspicion enough to warrant direct access to the party’s databases by a consultant retained by the adversary? If not, what proofs should be required? • Will an attorney’s certification or affidavit suffice? • Will/should the attorney become a witness? • Will experts be needed? Note, with regard to proofs, S2 Automation LLC v. Micron Technology, Inc., No. 11-0884 (D.N.M. Aug. 9, 2012), where the court, relying on Rule 26(g)(1), required a party to disclose its search methodology.
INTERLUDE A collision between search and ethics? • Assume a party’s attorney knows that search terms proposed by adversary counsel, if applied to the party’s ESI, will not lead to the production of relevant (perhaps highly relevant) ESI. • Absent a lack of candor to adversary counsel or the court under RPC 3.4 (which implies, if not requires, some affirmative statement), does not RPC 1.6 require the party’s attorney to remain silent? • What if the “nonproduction” is learned of later? If nothing else, will the party’s attorney suffer bad “PR”? • If the party’s attorney wants to advise the adversary, should the attorney secure her client’s informed consent? What if the client says “no”? (with thanks to the Hon. John M. Facciola)
INTERLUDE AS WE THINK ABOUT SEARCH, THINK ABOUT THE ETHICS ISSUES THAT USE OF A NONPARTY VENDOR MAY LEAD TO!