More on AI Hallucinations…

The mainstream adoption of AI continues to reveal the precarious balance between the benefits and the pitfalls.

Yes, AI tools can reduce the time it takes to research information, to draft documents and to complete repetitive tasks.

But, AI is not so good at navigating subtle nuances, interpreting specific context or understanding satire or irony. In short, AI cannot “read the room” based on a few prompts and a collection of databases.

And then there is the issue of copyright licensing and other IP rights associated with the original content that large language models are trained on.

One of the biggest challenges to AI’s credibility is the frequent generation of “hallucinations” – false or misleading results that can populate even the most benign of search queries. I have commented previously on whether these errors are deliberate mistakes, an attempt at risk limitation (disclaimers), or a way of training AI tools on human users (“Spot the deliberate mistake!”) – or a get-out clause in case we are foolish enough to rely on a dodgy AI summary.

With the proliferation of AI-generated results (“overviews”) in basic search queries, there is a tendency for AI tools to conflate or synthesize multiple sources and perspectives into a single “true” definition – often without authority or verified citations.

A recent example was a senior criminal barrister in Australia who submitted fake case citations and imaginary speeches in support of a client’s case.

Leaving aside the blatant dereliction of professional standards and the lapse in duty of care towards a client, this example of AI hallucinations within the context of legal proceedings is remarkable on a number of levels.

First, legal documents (statutes, law reports, secondary legislation, precedents, pleadings, contracts, witness statements, court transcripts, etc.) are highly structured and very specific as to their formal citations. (Having obtained an LLB degree, served as a paralegal for 5 years, and worked in legal publishing for more than 10 years, I am very aware of the risks of an incorrect citation or the use of an inappropriate decision in support of a legal argument!)

Second, the legal profession has traditionally been at the forefront in the adoption and implementation of new technology. Whether through the early use of on-line searches for case reports, database creation for managing document precedents, the use of practice and case management software, or the development of decision-trees to evaluate the potential success of client pleadings, lawyers have been at the vanguard of these innovations.

Third, a simple document review process (akin to a spell-check) should have exposed the erroneous case citations. The failure to do so reveals a level of laziness or disregard that in another profession (e.g., medical, electrical, engineering) could give rise to a claim for negligence. (There are several established resources in this field, so this apparent omission or oversight is frankly embarrassing: https://libraryguides.griffith.edu.au/Law/case-citators, https://guides.sl.nsw.gov.au/case_law/case-citators, https://deakin.libguides.com/case-law/case-citators)
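
Because formal citations follow such rigid patterns, even a crude automated check could flag citation-shaped strings that don’t appear in a verified index. The sketch below is purely illustrative (the regex, the `check_citations` helper and the tiny `KNOWN_CITATIONS` set are my own assumptions); a real workflow would query a proper citator service such as those linked above.

```python
import re

# Australian citations are highly structured: medium-neutral form
# ("[1992] HCA 23") or report-series form ("(1992) 175 CLR 1").
CITATION_RE = re.compile(
    r"\[(?P<year>\d{4})\]\s+(?P<court>[A-Z]{2,6})\s+(?P<number>\d+)"
    r"|\((?P<ryear>\d{4})\)\s+(?P<volume>\d+)\s+(?P<series>[A-Z]{2,5})\s+(?P<page>\d+)"
)

# Hypothetical verified index; in practice this lookup would hit a
# citator such as CaseBase or FirstPoint rather than a hard-coded set.
KNOWN_CITATIONS = {
    "[1992] HCA 23",      # Mabo v Queensland (No 2)
    "(1992) 175 CLR 1",
}

def check_citations(text: str) -> list[str]:
    """Return citation-shaped strings in `text` absent from the index."""
    found = [m.group(0) for m in CITATION_RE.finditer(text)]
    return [c for c in found if c not in KNOWN_CITATIONS]

suspect = check_citations(
    "As held in Mabo v Queensland (No 2) [1992] HCA 23, and in the "
    "imaginary Smith v Jones [2019] HCA 99, native title survives."
)
```

Here `suspect` would contain only the fabricated `[2019] HCA 99` – exactly the kind of “spell-check for citations” pass that seems to have been skipped.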

In short, as we continue to rely on AI tools, unless we apply due diligence to these applications or remain vigilant to their fallibility, we use them at our peril.

 

AI hallucinations and the law

Several years ago, I blogged about the role of technology within the legal profession. One development I noted was the nascent use of AI to help test the merits of a case before it goes to trial, and to assess the likelihood of winning. Not only might this prevent potentially frivolous matters coming to trial, it would also reduce court time and legal costs.

More recently, there has been some caution (if not out-and-out scepticism) about the efficacy of using AI in support of legal research and case preparation. This current debate has been triggered by an academic paper from Stanford University that compared leading legal research tools (which claim to have been “enhanced” by AI) and ChatGPT. The results were sobering, with a staggering number of apparent “hallucinations” being generated, even by the specialist legal research tools. AI hallucinations are not unique to legal research tools, nor to the general-purpose AI tools and the Large Language Models (LLMs) they are trained on, as Stanford has previously reported. While the academic paper is awaiting formal publication, there has been some to-and-fro between the research authors and at least one of the named legal tool providers. This latter rebuttal rightly points out that any AI tool (especially a legal research and professional practice platform) has to be fit for purpose, and trained on appropriate data.

Aside from the Stanford research, some lawyers have been found to have relied upon AI tools such as ChatGPT and Google Bard to draft their submissions, only to discover that the results have cited non-existent precedents and cases – including in at least one high-profile prosecution. The latest research suggests that not only do AI tools “imagine” fictitious case reports, they can also fail to spot “bad” law (e.g., cases that have been overturned, or laws that have been repealed), offer inappropriate advice, or provide inaccurate or incorrect legal interpretation.

What if AI hallucinations resulted in the generation of invidious content about a living person – content which, in many circumstances, would be deemed libel or slander? If a series of AI prompts gives rise to libellous content, who would be held responsible? Can AI itself be sued for libel? (Of course, under common law it is impossible to libel the dead, as only a living person can sue for libel.)

I found an interesting discussion of this topic here, which concludes that while AI tools such as ChatGPT may appear to have some degree of autonomy (depending on their programming and training), they certainly don’t have true agency and their output in itself cannot be regarded in the same way as other forms of speech or text when it comes to legal liabilities or protections. The article identified three groups of actors who might be deemed responsible for AI results: AI software developers (companies like OpenAI), content hosts (such as search engines), and publishers (authors, journalists, news networks). It concluded that of the three, publishers, authors and journalists face the most responsibility and accountability for their content, even if they claimed “AI said this was true”.

Interestingly, the above discussion referenced news from early 2023 that a mayor in Australia was planning to sue OpenAI (the owners of ChatGPT) for defamation unless they corrected the record about false claims made about him. Thankfully, OpenAI appear to have heeded the letter of concern, and the mayor has since dropped his case (or the false claim was simply over-written by a subsequent version of ChatGPT). However, the original Reuters link, above, which I sourced for this blog makes no mention of the subsequent discontinuation, either as a footnote or an update – which just goes to show how complex it is to correct the record: the reference to his initial claim is still valid (it happened), even though it did not proceed (he chose not to pursue it). Even actual criminal convictions can be deemed “spent” after a given period of time, such that they no longer appear on an individual’s criminal record. Whereas someone found not guilty of a crime (or, in the mayor’s case, falsely labelled with a conviction) cannot guarantee that references to the alleged events will be expunged from the internet, even with the evolution of the “right to be forgotten”.

Perhaps we’ll need to train AI tools to retrospectively correct or delete any false information about us; although conversely, AI is accelerating the proliferation of fake content – benign, humorous or malicious – thus setting the scene for the next blog in this series.

Next week: AI and Deep (and not so deep…) Fakes

Startmate Virtual Demo Day

Despite being under lock-down, the current cohort of Australian & NZ startups participating in the Startmate accelerator programme managed to deliver their Demo Day presentations on-line, including a virtual “after party” where founders were available for Q&A.

Given the large number of startups, and the fact that several were very early stage businesses, I have grouped them into loose clusters, with just a brief summary of each project. More info can be found at the links in the names:

Real Estate

Landlord Studio – tax & book-keeping solution for landlords. I tend to think the need for very niche accounting solutions is either overstated, or existing software platforms like Xero will come up with a plug-in of their own. Also, tax rules vary greatly by jurisdiction, so scaling internationally can be a challenge.

Passingdoor – an online estate agency trying to remove some of the costs and hassles of selling your home. Rather than listing with a traditional estate agent, Passingdoor will find buyers on your behalf (via a matching process?). I assume that prospective buyers will come from: people in the process of selling their own home; buyer advocates; or recent mortgage applicants – which is why the founders will need relationships with traditional agencies (referrals), mortgage brokers (cross-selling) and real estate ad platforms (leads). But given that sellers on Passingdoor only pay a 0.5% commission once an offer becomes unconditional, I’m not sure how the cashflow model will work.

MedTech

Mass Dynamics – scaling spectrometry for improved patient care. From what I understand, Mass Dynamics is using cloud-based architecture to “lease out” spectrometry capacity on demand, and to accelerate sample analysis.

LaserTrade – a marketplace for second-hand medical laser equipment. Rather than seeing re-usable equipment go to scrap, the founder saw an opportunity to create a marketplace for unwanted items. All items are tested beforehand. It has the potential to extend to other types of equipment, assuming the certification process remains valid.

Health & Wellbeing

Body Guide – semi-customised rehab exercises to suit your symptoms. With superb timing as we emerge from months of inaction (or poor posture) while working from home during lock-down, this service is an aid to physical recovery, once your condition has been formally diagnosed. I’d probably want to check in with my GP or physio that the programme was right for me, though.

Sonnar – offers a library of audio content for people with reading disabilities. This is a subscription service, which claims to be cheaper than other audio-book services, and with a broader range of content (periodicals as well as books). I was unclear whether Sonnar is cheaper because they don’t need to pay publisher or author royalties (as it is deemed a charity?), or because they only license out-of-copyright content.

Good Thnx – promises to be “the world’s best gifting and recognition tool, with impact”. Aiming to provide a service for businesses, individuals and partner charities, Good Thnx is still in development. But as part of the Startmate Demo Day, gave attendees an opportunity to allocate a small financial donation to a selection of charities.

Food & Agriculture

Cass Materials – With the search for sustainable alternatives to meat, Cass Materials is developing a cheap and edible high-fibre cell scaffold on which to grow cultivated meat – otherwise known as bacterial nanocellulose (BNC). I’m not opposed to the idea of “meat substitutes”, but I’m generally wary of what are sometimes called “fake meats” – vegetable proteins so processed as to resemble animal flesh. I’d rather go vegetarian (I’m not sure I can go full vegan, because if we weren’t supposed to eat honey and yoghurt, why do they taste so good, especially together?).

Digital Agriculture Services – An AgTech platform using AI-powered applications to develop a range of data-driven solutions for rural, agricultural and climate use cases. The potential to bring more business insights and practical analysis to farming and allied industries holds huge promise for the Australian economy.

Heaps Normal – This company has taken a novel approach to producing non-alcoholic beer. Rather than chemical extraction or other processing to remove alcohol from ordinary beer, Heaps Normal has managed to brew beer without alcohol content.

Energy

Gridcognition – Using digital twin mapping of buildings, structures and locations to optimise the planning and operation of distributed energy projects. Given the value of lower transmission and storage costs, as well as more efficient energy generation, Gridcognition is aiming to bring their “decarbonised, decentralised, digitised” solutions to a range of industry participants.

ZeroJet – Helping the marine industry transition to sustainable energy solutions with the development of electric propulsion systems. In particular, it targets small inshore craft, which are ideal for this type of engine.

Logistics & Analytics

PyperVision – This startup has developed a system for fog dispersal at airports. By aiming for zero fog delays, PyperVision is helping to reduce disruption in the travel and logistics sectors.

Arlula – An API service to stream satellite images from space. As we know, satellite imagery is an important input to modelling, planning and analysis. Arlula also offers access to historic and archive content.

Database CI – A platform for in-house software developers to access the right sort of enterprise data for real-life testing purposes. For example, realistic and appropriate “dummy” data that does not compromise privacy, confidentiality or other obligations.

Law on Earth – On-line access to self-serve legal documents, forms and precedents, plus lower-cost legal advice. With a mission to “empower the public to safely manage their own legal needs”, Law on Earth already has a tie-up with Thomson Reuters, one of the largest legal information providers in the world.

Next week: Are we there yet?

Melbourne Legal Hackers Meetup

Given my past legal training and experience, and my ongoing engagement with technology such as Blockchain, I try to keep up with what is going on in the legal profession, and its use and adoption of tech. But is it LawTech, LegalTech, or LegTech? Whatever, the recent Legal Hackers Meetup in Melbourne offered some definitions, as well as a few insights on current developments and trends.

The first speaker, Eric Chin from Alpha Creates, defined it as “tech arbitrage in the delivery of legal services”. He referred to Stanford Law School’s CodeX Techindex which has identified nine categories of legal technology services, and is maintaining a directory of companies active in each of those sectors.

According to Eric, recent research suggests that on average law firms have a low spend on legal technology and workflow tools. But typically, 9% of corporate legal services budgets are being allocated to “New Law” service providers. Separately, there are a growing number of LegalTech hubs and accelerators.

Meanwhile, the Big Four accounting firms are hiring more lawyers, building out their legal operations, and investing in legal tech and New Law (which is defined as “using labour arbitrage in the delivery of legal services”).

Key areas of focus for most firms are Practice Management, Legal Document Automation, Legal Operations and e-Discovery.

Joel Seignior, Legal Counsel on the West Gate Tunnel Project, made passing mention of Robert J Gordon’s economic thesis in “The Rise and Fall of American Growth”, which at its heart postulates that despite all appearances to the contrary, the many recent innovations we have seen in IT have not actually delivered on their promises. He also referred to Michael Mullany’s 8 Lessons from 16 Years of the Gartner Hype Cycle, in which the author argues that the Hype Cycle itself is past its use-by date. Taken together, these suggest that the promise of LegalTech is somewhat over-rated.

Nevertheless, businesses such as LawGeex are working in the legal AI landscape and other disciplines to deliver efficiency gains and value-added solutions for matter management, e-billing, and contract automation. Overall, UX/UI has finally caught up with technology like document automation and expert systems.

Finally, Caitlin Garner, Head of Innovation at Allens, spoke about her firm’s experience in developing a Litigation Innovation Program, underpinned by a philosophy of “client first, not tech first”. One outcome is REDDA, a real estate due diligence app that combines contract analytics, knowledge automation, reporting and collaboration. Using off-the-shelf solutions such as Kira’s Machine Learning, Neota’s Expert System and HighQ, the Allens team have developed a transferable template model. Using a “Return & Earn” case study, the firm has enabled the on-boarding of multiple suppliers into a streamlined contract management, signature and execution solution.

Next week: Notes from New York Blockchain Week