Q&A, new models, AGI @ 50%, GPT-4... - LifeArchitect.ai LIVE
Introduction
In this section, the speaker introduces Transformer models and explains how they are trained using different types of data.
Training Data
- Transformer models are fed with books, internet data, and other sources to train them.
- Different types of data are used for training such as popular web pages, academic publications, and Wikipedia content.
- The speaker uses jelly crystals, poured into a jar, to demonstrate the proportions of training data: six parts general web pages (around 100 million), two parts popular web pages (around 8 million), two parts books and papers (millions), and one part Wikipedia content (around 6 million).
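The jelly-crystal proportions above can be read as sampling weights for a training mix. A minimal sketch, with hypothetical source names and the 6:2:2:1 ratio from the demo:

```python
import random

# Hypothetical sampling weights mirroring the jelly-crystal demo:
# 6 parts general web, 2 parts popular web, 2 parts books/papers,
# 1 part Wikipedia (11 parts total).
MIX = {
    "general_web": 6,
    "popular_web": 2,
    "books_and_papers": 2,
    "wikipedia": 1,
}

def sample_source(rng: random.Random) -> str:
    """Pick a data source in proportion to its share of the mix."""
    sources = list(MIX)
    weights = [MIX[s] for s in sources]
    return rng.choices(sources, weights=weights, k=1)[0]

rng = random.Random(0)
counts = {s: 0 for s in MIX}
for _ in range(11_000):
    counts[sample_source(rng)] += 1
# Roughly 6,000 / 2,000 / 2,000 / 1,000 draws respectively.
```

In real pipelines the weights are usually set per epoch (some sources are repeated more than others), but the proportional-sampling idea is the same.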
Magic Black Box
- The speaker talks about feeding the training data into a "magic black box": the Transformer architecture (originally from Google), which makes connections between words by looking both forward and backward in the text.
- The model takes 288 years' worth of computing time to train on a single computer, but training can be spread across many computers simultaneously to reduce the wall-clock time.
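The parallelization point can be checked with simple arithmetic; this sketch assumes ideal linear scaling, which real clusters never quite achieve:

```python
# Back-of-envelope: 288 machine-years of training, spread across machines.
SERIAL_YEARS = 288

def wall_clock_months(num_machines: int, efficiency: float = 1.0) -> float:
    """Idealised wall-clock time in months; pass efficiency < 1.0 to
    account for communication overhead."""
    return SERIAL_YEARS * 12 / (num_machines * efficiency)

# ~1,152 machines at perfect scaling would finish in about three months,
# matching the "three months of training" figure mentioned later.
print(round(wall_clock_months(1152), 1))  # → 3.0
```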
Neurons in the Brain
- The speaker mentions that there are 86 billion neurons in the human brain connected by synapses.
The Relationship Between Tensegrity Models and the Human Brain
In this section, the speaker explains how tensegrity models can be used to understand the human brain.
Tensegrity Models and the Human Brain
- Tensegrity models can be used to represent the human brain.
- Each dot on the tensegrity model represents a data point, while the connections between dots represent parameters, analogous to synapses in the human brain.
- After about three months of wall-clock training (the equivalent of 288 single-machine years), the black box no longer contains the original data points.
The Singularity and History
In this section, the speaker discusses history and technology.
Living in a Time with History and Technology
- It is amazing that we get to live in a time with so much history that we can touch, talk about, and be with.
- We are also living in a time of runaway technology, runaway singularity, and runaway artificial general intelligence.
- Superintelligence is coming soon; it may even already be here in some fashion.
Microsoft's phi-1 Model
In this section, the speaker talks about Microsoft's phi-1 model and its approach to training data.
Microsoft's phi-1 Model
- Sébastien Bubeck of Microsoft Research presented "Sparks of AGI", an early look at GPT-4.
- Bubeck co-wrote a paper called "Textbooks Are All You Need" with several other authors.
- The paper argues for using higher-quality data, such as textbook-quality data, to train AI models.
- Microsoft's phi-1 model is trained on high-quality, textbook-style data from the web.
- GPT-1 was trained on books, while GPT-2 and GPT-3 were trained on popular web links plus everything from Wikipedia to more textbooks.
Conclusion
In this section, the speaker concludes the video.
Conclusion
- Superintelligence is coming soon; it may even already be here in some fashion.
- Sébastien Bubeck presented "Sparks of AGI" on GPT-4 and co-wrote "Textbooks Are All You Need".
- Microsoft has been concentrating on interesting, in some ways smaller, models.
Different AGI Models
In this section, the speaker talks about different AGI models that have been trained and their parameters.
Different AGI Models
- The speaker talks about different AGI models that have been trained.
- Some of the models mentioned are PACT, FLAME, Kosmos, and WizardLM.
- Microsoft is focusing on small models to prove concepts rather than investing billions of dollars in large models.
- GPT-4 is rumored to be made up of eight 220-billion-parameter models, making it a mixture-of-experts model.
Groundedness and Truthfulness in AI
In this section, the speaker discusses groundedness and truthfulness in AI and how it can be achieved.
Groundedness and Truthfulness in AI
- The speaker talks about groundedness being one of his triggers for achieving AGI.
- Harvard researchers introduced a technique called inference-time intervention (ITI), which adjusts the model's activations at inference time to improve accuracy or truthfulness.
- The speaker likens this mechanism to the text watermarking OpenAI has explored.
- Google DeepMind's RoboCat builds on Gato and does more checking of its own generated data than Gato did.
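Inference-time intervention can be illustrated with a toy activation edit. The "truthfulness direction" below is a made-up stand-in; in the Harvard work it is identified by probing attention heads on labelled data:

```python
import numpy as np

def intervene(activations: np.ndarray,
              truth_direction: np.ndarray,
              alpha: float = 1.0) -> np.ndarray:
    """Toy inference-time intervention: nudge a hidden state along a
    pre-identified 'truthfulness' direction during generation. The
    direction here is a stand-in, not a learned probe."""
    unit = truth_direction / np.linalg.norm(truth_direction)
    return activations + alpha * unit

hidden = np.array([0.2, -0.5, 0.1])
direction = np.array([1.0, 0.0, 0.0])  # hypothetical probe direction
shifted = intervene(hidden, direction, alpha=0.5)
# Only the component along the probe direction changes.
```

The key property this sketch demonstrates: the model's weights are untouched; only the activations at inference time are shifted.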
Predictions for Achieving AGI
In this section, the speaker talks about his predictions for achieving AGI.
Predictions for Achieving AGI
- The speaker predicts that AGI may appear at the end of this year or next year, in the form of a two-to-five-trillion-parameter model with complete multimodality.
- He believes that AGI may appear sooner than expected and mentions labs such as Tesla and Nvidia working on it.
Chinese AI Labs and GPT-4
In this section, the speaker talks about advancements in AI technology in China, specifically mentioning Baidu's ERNIE 3.0 Titan (260B) model, which is comparable to GPT-3. The speaker also discusses the possibility of a visual component for GPT-4.
Advancements in AI Technology in China
- Chinese AI Labs are doing some incredible work that is often ignored by Western media.
- Baidu's ERNIE 3.0 Titan (260B parameters) is comparable to GPT-3 and may even be better.
- Massive models are being developed, such as Alibaba's M6, which scaled from one trillion to ten trillion sparsely activated parameters as a mixture-of-experts model.
Visual Component for GPT-4
- The speaker discusses why OpenAI has not released multimodal input for GPT-4 after all this time.
- "DV" was thought to stand for Davinci, but it actually stands for "Davinci vision".
- Another code name, GPT Vision, was confirmed via Twitter.
- Multimodal here means the model can see: it can take images as input and produce text output.
Stable Diffusion XL Version 0.9
In this section, the speaker talks about Stable Diffusion XL (SDXL) version 0.9 and how it compares to other models. They also discuss how easy it is to use on phones or web pages.
Stable Diffusion XL Version 0.9
- Stability AI has just released research on Stable Diffusion XL version 0.9.
- On a phone right now, you can use a slightly earlier version of the model.
- Version 1.5 is freely accessible, while the latest version looks like it requires logging in or upgrading to Pro.
Using Stable Diffusion Extra Large
- The speaker uses mage.space in all their keynotes because anyone can scan a QR code with an iPhone or Android, open the web address, and go.
- You can use Stable Diffusion immediately, with no setup, by going to mage.space on a phone or in a browser.
Total Data Available to Humanity
In this section, the speaker talks about the total amount of data available to humanity and how it is increasing rapidly.
Total Data Available to Humanity
- The speaker analyzes an aspect of this at lifearchitect.ai/gemini.
- If you search for "DeepMind Gemini", that page should be the top result.
- The speaker mentions that very few people are talking about this currently but predicts that it will be explosive in six months.
Conclusion
In conclusion, the speaker discusses advancements in AI technology in China, specifically Baidu's ERNIE 3.0 Titan (260B) model, which is comparable to GPT-3. They also discuss the possibility of a visual component for GPT-4, talk about Stable Diffusion XL version 0.9 and how easy it is to use on phones or web pages, and finish with the total amount of data available to humanity, predicting that it will become more widely discussed in six months.
Massive Text Data Sets
In this section, the speaker discusses various massive text data sets that are available for training language models.
Available Data Sets
- Google's internal code repository contains all their code for all their products and is about 86 terabytes in size.
- RefinedWeb, a cleaned web dataset comparable to C4, can scale up to 23.2 terabytes, or about 5 trillion tokens, thanks to TII in the United Arab Emirates.
- DeepMind's MassiveText dataset curated 12 terabytes of books; four terabytes of web data and C4; nearly three terabytes of code from GitHub; plus news and Wikipedia.
- YouTube has around 800 million videos at an average of 11 minutes each, which might be around 10 billion minutes total, or approximately 1.5 trillion words.
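The YouTube figure can be reproduced with back-of-envelope arithmetic; the 150 words-per-minute speech rate is an assumption, and it lands in the same ballpark as the speaker's estimate:

```python
# Reproducing the back-of-envelope YouTube estimate.
videos = 800_000_000        # ~800M videos
avg_minutes = 11            # average length
words_per_minute = 150      # typical speech rate (an assumption)

total_minutes = videos * avg_minutes            # 8.8 billion minutes
total_words = total_minutes * words_per_minute  # ~1.3 trillion words
print(f"{total_minutes:,} minutes, ~{total_words / 1e12:.1f} trillion words")
```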
Limitless Possibilities
- The speaker believes there is effectively no limit to the amount of high-quality, clean data on Earth. OpenAI licensed data for GPT-4 from various sources, such as LinkedIn (via Microsoft), YouTube, the Windows code repository (via Microsoft), and old Google+ social-media data, among others.
- A dense model with five trillion parameters would only require collecting about a hundred trillion text tokens, which could be repeated during training.
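The hundred-trillion-token figure is consistent with the Chinchilla-style rule of thumb of roughly 20 training tokens per parameter (an approximation, not an exact law):

```python
TOKENS_PER_PARAM = 20  # Chinchilla-style heuristic (approximate)

def optimal_tokens(params: int) -> int:
    """Rough compute-optimal training-token count for a dense model."""
    return params * TOKENS_PER_PARAM

five_trillion = 5 * 10**12
assert optimal_tokens(five_trillion) == 100 * 10**12  # 100 trillion tokens
```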
Mixture of Experts (MoE)
In this section, the speaker talks about MoE and its potential role in GPT-4.
MoE Hypothesis
- The speaker was initially disappointed by the MoE rumour, but proposes that GPT-4 may feel smarter than other models because of its connection to eight different dense models.
- The speaker hypothesizes that the MoE may have very specific experts, including a 220-billion-parameter dense model for general use with multilingual capabilities, and another expert for code.
Benefits of Code Expert
- A separate expert for code makes sense: it allows massive code datasets to be used without polluting the general model or limiting how big models can go in parameter count, and code also helps with teaching reasoning.
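The rumoured eight-expert design can be sketched as a standard top-k mixture-of-experts step. Everything below (dimensions, expert count, routing weights) is illustrative, not GPT-4's actual implementation:

```python
import numpy as np

def moe_forward(x, experts, router_weights, top_k=2):
    """Minimal mixture-of-experts step: a router scores every expert,
    only the top-k experts run, and their outputs are blended by
    softmax weight. The expert count (8) follows the GPT-4 rumour
    purely as an illustration."""
    scores = router_weights @ x                          # one score per expert
    top = np.argsort(scores)[-top_k:]                    # indices of best k
    w = np.exp(scores[top]) / np.exp(scores[top]).sum()  # softmax over k
    return sum(wi * experts[i](x) for wi, i in zip(w, top))

rng = np.random.default_rng(0)
dim, n_experts = 4, 8
# Each "expert" is just a random linear map here.
experts = [lambda x, W=rng.standard_normal((dim, dim)): W @ x
           for _ in range(n_experts)]
router = rng.standard_normal((n_experts, dim))
out = moe_forward(rng.standard_normal(dim), experts, router)
```

The design benefit the section describes falls out of this structure: only `top_k` of the eight experts run per token, so total parameter count can grow without a proportional increase in per-token compute.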
Learning Models for Different Fields
In this section, the speaker discusses how different fields may require different learning models. He mentions separate models for law, government and policy, medicine, science, and economics.
Different Learning Models for Different Fields
- There may be a model of conversation and dialogue for learning how to learn.
- The speaker picked the aforementioned fields based on what he learned from early testers of GPT4.
- Andrew D. White writes about using GPT-4 for chemistry.
- OpenAI completed a world tour, during which they encouraged China to step up in AI development.
Quantum Computing and AI Intersection
In this section, the speaker talks about quantum computing and its intersection with AI. He notes that he does not know enough about quantum computing to comment in depth, but mentions that IBM's James Weaver is a quantum computing advocate.
Quantum Computing and AI Intersection
- The speaker does not know enough about quantum computing to comment on it in depth.
- James Weaver is a quantum computing advocate at IBM.
- IBM is making fast strides in quantum computing, which could surpass classical computing for some tasks soon.
Multimodality of GPT-4
In this section, the speaker discusses the multimodality of GPT-4, comparing it to DeepMind's Chinchilla and Flamingo models.
Multimodality of GPT-4
- DeepMind's Flamingo model added multimodal (vision) capability on top of the Chinchilla language model.
- GPT-4 is also multimodal, with a possible 10- or 20-billion-parameter vision model.
- However, GPT-4 does not handle sound, robotics, video, or IMU-style sensor data such as temperature, direction, and acceleration.
AI Alignment and Safety
In this section, the speaker talks about AI alignment and safety. He mentions that OpenAI completed a world tour where they discussed the importance of AI alignment and safety.
AI Alignment and Safety
- OpenAI completed a world tour where they discussed the importance of AI alignment and safety.
- ChatGPT has an additional safety layer on top of the base model to block certain content.
- ChatGPT was tuned on human preferences with the help of workers in Africa.
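The "additional layer of safety" pattern can be sketched generically: a filter sits between the user and the model and blocks flagged requests before generation. The blocklist, classifier, and model call below are placeholders, not OpenAI's actual system:

```python
# Toy sketch of a pre-response safety layer (placeholder logic only).
BLOCKED_TOPICS = {"weapon instructions", "self-harm"}

def classify(prompt: str) -> set:
    """Stand-in for a learned safety classifier: flags a prompt if the
    first word of any blocked topic appears in it."""
    return {t for t in BLOCKED_TOPICS if t.split()[0] in prompt.lower()}

def guarded_reply(prompt: str,
                  model=lambda p: f"[model answer to: {p}]") -> str:
    """Run the safety check first; only call the model if nothing is flagged."""
    if classify(prompt):
        return "I can't help with that."
    return model(prompt)

print(guarded_reply("How do weather fronts form?"))
print(guarded_reply("Give me weapon instructions"))  # → "I can't help with that."
```

Real systems use trained moderation classifiers rather than keyword lists, but the layering (check first, then generate) is the idea being described.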
Lack of Transparency in AI Development
In this section, the speaker discusses their disappointment with the lack of transparency in AI development and initiatives. They mention how many labs were transparent before but have now stopped publishing major research on AI.
Transparency in AI Development
- The speaker expresses disappointment with DeepMind's decision, from around 2018, to stop publishing major AI research.
- Many other labs were incredibly transparent before, but that has changed; the GPT-4 and Google PaLM 2 technical reports are not informative enough.
- Microsoft is doing better than others, and Meta AI is still publishing. Both DeepMind and OpenAI still publish some papers, as long as they don't restrict their ability to make money.
Predictions for AGI Development
In this section, the speaker talks about predictions for AGI development and mentions a chart that shows the progress made so far.
Progression Towards AGI
- The speaker had GPT-4 plot a chart showing progression towards AGI, reaching it by January 2027.
- The 1X NEO robot, backed by a large language model, is different from anything we've ever seen; it should be available by November/December 2023.
- The speaker talks about exponential growth since the Transformer's introduction in 2017 and GPT-3's in 2020.
- Making predictions is hard, but the speaker guesses the countdown might hit 70% by the end of 2023, provided truthfulness is addressed.
Embodiment in AGI
In this section, the speaker talks about embodiment in AGI and mentions a mid-year AI report.
Embodiment in AGI
- The speaker mentions that embodiment is the big one for them and will talk more about it in the mid-year AI report.
- The 1X NEO robot is not like Honda's ASIMO or pre-programmed Boston Dynamics machines; it's backed by ChatGPT-style language models, some of the most incredible we've ever seen.
Introduction
The speaker introduces the topic of AI and data sets, and mentions some communities that are working on gathering big data sets.
- The speaker asks where they left off.
- The speaker mentions a question about a disconnect between content creators and AI model generation, but doesn't fully understand it. They mention communities like EleutherAI that are working on gathering big datasets.
- The speaker mentions other groups, like Stability AI and the RefinedWeb team, that have developed large datasets. They also mention collecting specialty data from different fields.
Importance of Data Sets in AI
The speaker discusses the importance of collecting large amounts of data for AI models to learn from.
- The speaker explains that Common Crawl has traditionally been used to collect web data, but code is also important to collect; they mention GitHub as a source of public-domain code.
- The speaker talks about collecting specialty pieces of data from different industries or even entire governments. They emphasize the importance of collecting high-quality data for training models.
Milestones in AGI Development
The speaker discusses milestones in artificial general intelligence (AGI) development.
- The speaker mentions hitting various milestones in AGI development; at the next milestone, 50%, the model needs to be completely grounded and truthful.
- At 60%, physical embodiment becomes important, with research on incorporating AGI into drones, robots, and other physical devices.
- At 80%, the model passes Steve Wozniak's coffee test: entering an unfamiliar home and making a cup of coffee from scratch. The speaker emphasizes that AGI can outperform humans in many areas.
AI in Governance
The speaker discusses which countries are currently using AI the most in their governance.
- The speaker shows a ranking of countries by their use of AI in governance, without specific details on how it is working out for them so far.
Ranking Countries by AI Talent
In this section, the speaker discusses how countries are ranked on talent, infrastructure, operating environment, research, development, government strategy, and commercial activity. The speaker also talks about how Romania is using a large language model as an advisor for governance.
Using Large Language Models in Governance
- Romania has developed a system where insights from the public are fed in through a web form, and the prime minister can then ask questions of a large language model embedded in a mirror.
- Palantir is using open-source large language models, including Pythia, GPT-Neo, and T5, to help with planning ground operations in preparation for war.
Synthetic Data and Super Intelligence
In this section, the speaker talks about synthetic data being used on the way to superintelligence to discover new data. The speaker also compares pre-war steel with post-war steel, which picks up radioactive particles during the steel-making process.
Synthetic Data Usage
- Microsoft's "Textbooks Are All You Need" work uses synthetic data from GPT-3.5 to pre-train its model.
- Imitation models use synthetic data generated by GPT-3.5 and GPT-4 as supplementary training data.
Pre-War Steel vs Post-War Steel
- Pre-war steel contains no radioactive particles, while all steel made after the Trinity test in 1945 carries these signatures, picked up during the steel-making process.
- In the same way, after years or decades, everything you see online will have some sort of AI inside it.
Proprietary Data and Healthcare Databases
In this section, the speaker talks about how much non-public data is out there and how it can be used to gather quality training data. The speaker also discusses using healthcare databases and records as an excellent example.
Non-Public Data Usage
- Proprietary sets of journals that would never release their data online or via the open web could be packaged up as quality training data.
- A few billion tokens could be gathered from the government of Iceland to bring in very high-quality contextual data.
Healthcare Databases
- Healthcare databases and records are an excellent example of non-public data usage.
Conclusion
In this section, the speaker concludes by discussing how hundreds of different tribes and languages exist in Australia, which could be mapped for use in AI. The speaker also thanks paid subscribers who sponsor developers in developing countries.
Mapping Different Tribes and Languages
- There are hundreds of different tribes and languages in Australia that could be mapped for use in AI.
Paid Subscribers
- Paid subscribers from countries such as Ukraine, India, Mexico, Malaysia, and Indonesia sponsor developers, particularly in Ukraine, India, and parts of Africa, where hundreds of dollars for a subscription to this kind of news is a challenge.
The Potential of AI to Package Local Customs and Cultures
In this section, the speaker discusses the potential of AI to package up local customs, cultures, spiritual beliefs, governance beliefs, political motivations, and entire knowledge bases from different countries.
AI's Ability to Package Different Cultures
- AI can package up local customs and cultures from different countries.
- There would be benefits in capturing the multiculturalism of New Zealand and bringing in its peoples' beliefs on spirituality or politics.
- Imagine bringing in all of Polynesia's knowledge about food and diet; this would be incredible.
- AI can make links that have never been made before.
Testing GPT-4 with Nathan Gaunt
In this section, the speaker talks about testing GPT-4 with an artist named Nathan Gaunt.
Testing GPT-4 with Nathan Gaunt
- The speaker tests GPT-4 by asking it about an artist named Nathan Gaunt.
- GPT-4 makes unusual connections between Nathan Gaunt and Jeff Martin that haven't been made before.
- GPT-4 found a connection: Nathan's old drummer was, for a little while, Jeff Martin's new drummer.
The Flakiness of GPT-4 Inference
In this section, the speaker talks about the flakiness of GPT-4's inference.
The Flakiness of GPT-4 Inference
- The inference behind GPT-4 has been really flaky over the last seven days.
- The speaker doesn't know what has happened to GPT-4 in that time.
AI's Ability to Change Connections in Real Time
In this section, the speaker talks about whether AI can change its connections in real-time.
AI's Ability to Change Connections in Real Time
- The Transformer architecture has to be frozen: it is pre-trained and then it just sits there.
- OpenAI has mentioned that they plan to pre-train models to a certain extent and then update the dataset, adding a few billion parameters on top.
- Real-time mechanisms, where the model searches the web or listens to what's happening in its environment, are coming very soon, but these don't actually update the model.
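The "search the net without updating the model" mechanism is essentially retrieval-augmented prompting: fresh information is injected into the prompt while the weights stay frozen. `search` and `frozen_model` below are placeholder stubs for a real search API and a pretrained LLM:

```python
# Sketch of the pattern described above: weights stay frozen, but fresh
# information is injected into the prompt at inference time.
def search(query: str) -> str:
    """Placeholder for a live web-search API call."""
    return "stub result for: " + query

def frozen_model(prompt: str) -> str:
    """Placeholder for a pretrained LLM whose weights never change."""
    return "[answer conditioned on: " + prompt + "]"

def answer_with_retrieval(question: str) -> str:
    context = search(question)  # fetched live, not learned into the weights
    prompt = f"Context: {context}\nQuestion: {question}"
    return frozen_model(prompt)

print(answer_with_retrieval("What happened today?"))
```

This is why such systems can discuss current events even though, as noted above, the underlying model "just sits there" after pre-training.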
The Best Model for Life Coaching and Meditation
In this section, the speaker talks about a model called HeyPi (Inflection AI's Pi) that is useful for life coaching and meditation.
HeyPi Model
- HeyPi (Pi) is a model developed by Inflection AI, a lab founded by DeepMind alumni.
- It is backed by a lot of dialogue and conversation data, making it better for this purpose than models like GPT-4 and PaLM 2.
- Because it is trained to be dialogue- and conversation-oriented, it responds to these queries more effectively.
- It provides far better responses than ELIZA from nearly 60 years ago.
Safety and Voice Features of HeyPi
In this section, the speaker talks about the safety features of HeyPi as well as its voice capabilities.
Safety Features
- HeyPi's safety features are better than ChatGPT's.
- There is no login process required to use it.
Voice Capabilities
- You can turn on different voices for spoken output.
- There are examples where it refuses to answer questions related to making bombs.
Mid-Year Report on Artificial Intelligence Progress
In this section, the speaker talks about his mid-year report on artificial intelligence progress.
Mid-Year Report
- Full members of The Memo will receive the mid-year report soon.
- The report covers emotional aspects as well as technical aspects of AI models released in the first six months of 2023.
Joining Private Mailing List for Priority Access
In this section, the speaker invites people to join his private mailing list for priority access to his articles, videos, and behind-the-scenes tips.
Joining Private Mailing List
- You can join the private mailing list at lifearchitect.ai/memo.
- Subscribers get priority access to articles, videos, and behind-the-scenes tips as soon as they are released.