Wednesday, 28 June 2023

Dust uses large language models on internal data to improve team productivity

Dust is a new AI startup based in France that is working on improving team productivity by breaking down internal silos, surfacing important knowledge and providing tools to build custom internal apps. At its core, Dust is using large language models (LLMs) on internal company data to give new superpowers to team members.

The company was co-founded by Gabriel Hubert and Stanislas Polu, who have known each other for more than a decade. Their first startup was called Totems and was acquired by Stripe in 2015. After that, they both spent a few years working for Stripe before parting ways.

Stanislas Polu joined OpenAI, where he spent three years working on LLMs’ reasoning capabilities while Gabriel Hubert became the head of product at Alan.

They teamed up once again to create Dust. Unlike many AI startups, Dust isn’t focused on creating new large language models. Instead, the company wants to build applications on top of LLMs developed by OpenAI, Cohere, AI21, etc.

The team first worked on a platform that can be used to design and deploy large language model apps. It has since focused its efforts on one use case in particular: centralizing and indexing internal data so that it can be used by LLMs.

From an internal ChatGPT to next-gen software

There are a handful of connectors that constantly fetch internal data from Notion, Slack, GitHub and Google Drive. This data is then indexed and can be used for semantic search queries. When a user wants to do something with a Dust-powered app, Dust will find the relevant internal data, use it as the context of an LLM and return an answer.
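To make that flow concrete, here is a minimal sketch of the index, search, and prompt steps described above. It is not Dust's implementation: the sample documents, the function names, and the bag-of-words "embedding" are all assumptions made for illustration, and a real system would use a learned embedding model and send the final prompt to an LLM.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins for documents fetched by connectors (not real Dust data).
DOCUMENTS = [
    {"source": "slack", "text": "We chose Postgres over MySQL because of better JSON support."},
    {"source": "notion", "text": "Onboarding checklist: request laptop, join Slack, read the handbook."},
    {"source": "github", "text": "Deploy script runs database migrations before restarting the API."},
]


def embed(text):
    """Toy embedding: word counts. Assumption: a real system uses a learned model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents):
    """Pair every document with its vector once, ahead of query time."""
    return [(doc, embed(doc["text"])) for doc in documents]


def search(index, query, k=2):
    """Return the k documents most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(query_vec, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def build_prompt(query, hits):
    """Place the retrieved snippets in the context that would be sent to an LLM."""
    context = "\n".join(f"[{doc['source']}] {doc['text']}" for doc in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


index = build_index(DOCUMENTS)
question = "Why did we choose Postgres?"
hits = search(index, question, k=1)
prompt = build_prompt(question, hits)
```

The sketch stops at the prompt on purpose: which model receives it, and how the answer is formatted, is left open here.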

For example, let’s say you just joined a company and you’re working on a project that was started a while back. If your company fosters communication transparency, you will want to find information in existing internal data. But the internal knowledge base might not be up to date. Or it might be hard to find the reason why something is done this way, as it’s been discussed in an archived Slack channel.

Dust isn’t just a better internal search tool, as it doesn’t just return search results. It can find information across multiple data sources and format answers in a way that is much more useful to you. It can be used as a sort of internal ChatGPT, but it could also be used as the basis of new internal tools.

“We’re convinced that natural language interface is going to disrupt software,” Gabriel Hubert told me. “In five years’ time, it would be disappointing if you still have to go and click on edit, settings, preferences, to decide that your software should behave differently. We see a lot more of our software adapting to your individual needs, because that’s the way you are, but also because that’s the way your team is — because that’s the way your company is.”

The company is working with design partners on several ways to implement and package the Dust platform. “We think there are a lot of different products that can be created in this area of enterprise data, knowledge workers and models that could be used to support them,” Polu told me.

It’s still early days for Dust, but the startup is exploring an interesting problem. There are many challenges ahead when it comes to data retention, hallucination and all of the issues that come with LLMs. Maybe hallucination will become less of an issue as LLMs evolve. Maybe Dust will end up creating its own LLM for data privacy reasons.

Dust has raised $5.5 million (€5 million) in a seed round led by Sequoia with XYZ, GG1, Seedcamp, Connect, Motier Ventures, Tiny Supercomputer, and AI Grant. Several business angels also participated, such as Olivier Pomel from Datadog, Julien Codorniou, Julien Chaumond from Hugging Face, Mathilde Collin from Front, Charles Gorintin and Jean-Charles Samuelian-Werve from Alan, Eléonore Crespo and Romain Niccoli from Pigment, Nicolas Brusson from BlaBlaCar, Howie Liu from Airtable, Matthieu Rouif from PhotoRoom, Igor Babuschkin and Irwan Bello.

If you take a step back, Dust is betting that LLMs will greatly change how companies work. A product like Dust works even better in a company that fosters radical transparency instead of information retention, written communication instead of endless meetings, autonomy instead of top-down management.

If LLMs deliver on their promise and greatly improve productivity, some companies will gain an unfair advantage by adopting these values as Dust will unlock a lot of untapped potential for knowledge workers.

https://techcrunch.com/

Tuesday, 6 June 2023

Governments worldwide grapple with regulation to rein in AI dangers

Ever since generative AI exploded into public consciousness with the launch of ChatGPT at the end of last year, calls to regulate the technology to stop it from causing undue harm have risen to fever pitch around the world. The stakes are high — just last week, technology leaders signed an open public letter saying that if government officials get it wrong, the consequence could be the extinction of the human race.

While most consumers are just having fun testing the limits of large language models such as ChatGPT, a number of worrying stories have circulated about the technology making up supposed facts (also known as "hallucinating") and making inappropriate suggestions to users, as when an AI-powered version of Bing told a New York Times reporter to divorce his spouse.

Tech industry insiders and legal experts also note a raft of other concerns, including the ability of generative AI to enhance the attacks of threat actors on cybersecurity defenses, the possibility of copyright and data-privacy violations — since large language models are trained on all sorts of information — and the potential for discrimination as humans encode their own biases into algorithms. 

Possibly the biggest area of concern is that generative AI programs are essentially self-learning, demonstrating increasing capability as they ingest data, and that their creators don't know exactly what is happening within them. This may mean, as ex-Google AI leader Geoffrey Hinton has said, that humanity may just be a passing phase in the evolution of intelligence and that AI systems could develop their own goals that humans know nothing about.

All this has prompted governments around the world to call for protective regulations. But, as with most technology regulation, there is rarely a one-size-fits-all approach, with different governments looking to regulate generative AI in a way that best suits their own political landscape.

Countries make their own regulations

“[When it comes to] tech issues, even though every country is free to make its own rules, in the past what we have seen is there’s been some form of harmonization between the US, EU, and most Western countries,” said Sophie Goossens, a partner at law firm Reed Smith who specializes in AI, copyright, and IP issues. “It's rare to see legislation that completely contradicts the legislation of someone else.”

While the details of the legislation put forward by each jurisdiction might differ, there is one overarching theme that unites all governments that have so far outlined proposals: how the benefits of AI can be realized while minimizing the risks it presents to society. Indeed, EU and US lawmakers are drawing up an AI code of conduct to bridge the gap until any legislation has been legally passed.

Generative AI is an umbrella term for any kind of automated process that uses algorithms to produce, manipulate, or synthesize data, often in the form of images or human-readable text. It’s called generative because it creates something that didn’t previously exist. It's not a new technology, and conversations around regulation are not new either.

Generative AI has arguably been around (in a very basic chatbot form, at least) since the mid-1960s, when an MIT professor created ELIZA, an application programmed to use pattern matching and language substitution methodology to issue responses fashioned to make users feel like they were talking to a therapist. But generative AI's recent advent into the public domain has allowed people who might not have had access to the technology before to create sophisticated content on just about any topic, based off a few basic prompts.
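The pattern matching and substitution approach can be illustrated with a few lines of code. This is only a sketch in the spirit of ELIZA, not the original program: the rules, the reflection table, and the function names are invented for this example.

```python
import re

# Hypothetical rules for illustration; the original ELIZA script was far larger.
RULES = [
    (re.compile(r"\bi feel (.+)", re.IGNORECASE), "Why do you feel {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (.+)", re.IGNORECASE), "Tell me more about your {0}."),
]

# Swap first-person and second-person words so the echoed fragment reads naturally.
REFLECTIONS = {"my": "your", "me": "you", "i": "you", "am": "are", "your": "my"}


def reflect(fragment):
    """Apply the word-level substitutions to a captured fragment."""
    return " ".join(REFLECTIONS.get(word, word) for word in fragment.lower().split())


def respond(message):
    """Return the response for the first matching rule, or a generic fallback."""
    text = message.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(reflect(match.group(1)))
    return "Please go on."


reply = respond("I feel ignored by my boss.")
```

Nothing here understands the input. The program reuses the user's own words inside a canned template, which is why it can seem attentive without any model of meaning.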

As generative AI applications become more powerful and prevalent, there is growing pressure for regulation.

“The risk is definitely higher because now these companies have decided to release extremely powerful tools on the open internet for everyone to use, and I think there is definitely a risk that technology could be used with bad intentions,” Goossens said.

First steps toward AI legislation

Although discussions by the European Commission around an AI regulatory act began in 2019, the UK government was one of the first to announce its intentions, publishing a white paper in March this year that outlined five principles it wants companies to follow: safety, security, and robustness; transparency and explainability; fairness; accountability and governance; and contestability and redress.

In an effort to avoid what it called “heavy-handed legislation,” however, the UK government has called on existing regulatory bodies to use current regulations to ensure that AI applications adhere to guidelines, rather than draft new laws.

Since then, the European Commission has published the first draft of its AI Act, which was delayed due to the need to include provisions for regulating the more recent generative AI applications. The draft legislation includes requirements for generative AI models to reasonably mitigate against foreseeable risks to health, safety, fundamental rights, the environment, democracy, and the rule of law, with the involvement of independent experts.

The legislation proposed by the EU would forbid the use of AI when it could become a threat to safety, livelihoods, or people’s rights, with stipulations around the use of artificial intelligence becoming less restrictive based on the perceived risk it might pose to someone coming into contact with it — for example, interacting with a chatbot in a customer service setting would be considered low risk. AI systems that present such limited and minimal risks may be used with few requirements. AI systems posing higher levels of bias or risk, such as those used for government social-scoring systems and biometric identification systems, will generally not be allowed, with few exceptions.

However, even before the legislation had been finalized, ChatGPT in particular had already come under scrutiny from a number of individual European countries for possible GDPR data protection violations. The Italian data regulator initially banned ChatGPT over alleged privacy violations relating to the chatbot’s collection and storage of personal data, but reinstated use of the technology after Microsoft-backed OpenAI, the creator of ChatGPT, clarified its privacy policy and made it more accessible, and offered a new tool to verify the age of users.

Other European countries, including France and Spain, have filed complaints about ChatGPT similar to those issued by Italy, although no decisions relating to those grievances have been made.

Differing approaches to regulation

All regulation reflects the politics, ethics, and culture of the society you’re in, said Martha Bennett, vice president and principal analyst at Forrester, noting that in the US, for instance, there’s an instinctive reluctance to regulate unless there is tremendous pressure to do so, whereas in Europe there is a much stronger culture of regulation for the common good.

“There is nothing wrong with having a different approach, because yes, you do not want to stifle innovation,” Bennett said. Alluding to the comments made by the UK government, Bennett said it is understandable to not want to stifle innovation, but she doesn’t agree with the idea that by relying largely on current laws and being less stringent than the EU AI Act, the UK government can provide the country with a competitive advantage, particularly if this comes at the expense of data protection laws.

“If the UK gets a reputation of playing fast and loose with personal data, that’s also not appropriate,” she said.

While Bennett believes that differing legislative approaches can have their benefits, she notes that AI regulations implemented by the Chinese government would be completely unacceptable in North America or Western Europe.

Under Chinese law, AI firms will be required to submit security assessments to the government before launching their AI tools to the public, and any content generated by generative AI must be in line with the country’s core socialist values. Failure to comply with the rules will result in providers being fined, having their services suspended, or facing criminal investigations.

The challenges to AI legislation

Although a number of countries have begun to draft AI regulations, such efforts are hampered by the reality that lawmakers constantly have to play catchup to new technologies, trying to understand their risks and rewards.

“If we refer back to most technological advancements, such as the internet or artificial intelligence, it’s like a double-edged sword, as you can use it for both lawful and unlawful purposes,” said Felipe Romero Moreno, a principal lecturer at the University of Hertfordshire’s Law School whose work focuses on legal issues and regulation of emerging technologies, including AI.

AI systems may also do harm inadvertently, since humans who program them can be biased, and the data the programs are trained with may contain bias or inaccurate information. “We need artificial intelligence that has been trained with unbiased data,” Romero Moreno said. “Otherwise, decisions made by AI will be inaccurate as well as discriminatory.”

Accountability on the part of vendors is essential, he said, stating that users should be able to challenge the outcome of any artificial intelligence decision and compel AI developers to explain the logic or the rationale behind the technology’s reasoning. (A recent example of a related case is a class-action lawsuit filed by a US man who was rejected from a job because AI video software judged him to be untrustworthy.)

Tech companies need to make artificial intelligence systems auditable so that they can be subject to independent and external checks from regulatory bodies — and users should have access to legal recourse to challenge the impact of a decision made by artificial intelligence, with final oversight always being given to a human, not a machine, Romero Moreno said.

Copyright a major issue for AI apps

Another major regulatory issue that needs to be navigated is copyright. The EU’s AI Act includes a provision that would make creators of generative AI tools disclose any copyrighted material used to develop their systems.

“Copyright is everywhere, so when you have a gigantic amount of data somewhere on a server, and you’re going to use that data in order to train a model, chances are that at least some of that data will be protected by copyright,” Goossens said, adding that the most difficult issues to resolve will be around the training sets on which AI tools are developed.

When this problem first arose, lawmakers in countries including Japan, Taiwan, and Singapore made an exception for copyrighted material that found its way into training sets, stating that copyright should not stand in the way of technological advancements.

However, Goossens said, a lot of these copyright exceptions are now almost seven years old. The issue is further complicated by the fact that in the EU, while these same exceptions exist, anyone who is a rights holder can opt out of having their data used in training sets.

Currently, because there is no incentive to have your data included, huge swathes of people are now opting out, meaning the EU is a less desirable jurisdiction for AI vendors to operate from.

In the UK, an exception currently exists for research purposes, but the plan to introduce an exception that includes commercial AI technologies was scrapped, with the government yet to announce an alternative plan.

What’s next for AI regulation?

So far, China is the only country that has passed laws and launched prosecutions relating to generative AI — in May, Chinese authorities detained a man in Northern China for allegedly using ChatGPT to write fake news articles.

Elsewhere, the UK government has said that regulators will issue practical guidance to organizations, setting out how to implement the principles outlined in its white paper over the next 12 months, while the EU Commission is expected to vote imminently to finalize the text of its AI Act.

By comparison, the US still appears to be in the fact-finding stages, although President Joe Biden and Vice President Kamala Harris recently met with executives from leading AI companies to discuss the potential dangers of AI.

Last month, two Senate committees also met with industry experts, including OpenAI CEO Sam Altman. Speaking to lawmakers, Altman said regulation would be “wise” because people need to know if they’re talking to an AI system or looking at content — images, videos, or documents — generated by a chatbot.

“I think we’ll also need rules and guidelines about what is expected in terms of disclosure from a company providing a model that could have these sorts of abilities we’re talking about,” Altman said.

This is a sentiment Forrester’s Bennett agrees with, arguing that the biggest danger generative AI presents to society is the ease with which misinformation and disinformation can be created.

“[This issue] goes hand in hand with ensuring that providers of these large language models and generative AI tools are abiding by existing rules around copyright, intellectual property, personal data, etc. and looking at how we make sure those rules are really enforced,” she said.

Romero Moreno argues that education holds the key to tackling the technology’s ability to create and spread disinformation, particularly among young people or those who are less technologically savvy. Pop-up notifications that remind users that content might not be accurate would encourage people to think more critically about how they engage with online content, he said, adding that something like the current cookie disclaimer messages that show up on web pages would not be suitable, as they are often long and convoluted and therefore rarely read.

Ultimately, Bennett said, irrespective of what final legislation looks like, regulators and governments across the world need to act now. Otherwise we’ll end up in a situation where the technology has been exploited to such an extreme that we’re fighting a battle we can never win.

https://www.computerworld.com/