Forward Future Episode 24
And…we’re back! Not only am I back from traveling last week, but the AI world is back in a big way with an insane week of AI. This week, the AI battle between Meta and OpenAI heated up. OpenAI shipped numerous game-changing launches that bring incredible new capabilities to ChatGPT, Meta launched more AI features along with a pair of AI-enabled sunglasses, Amazon is catching up in the AI race with a massive investment in a leading AI company, Tesla showed off its updated Optimus robot, and Microsoft launched Windows 11 with AI built into everything.
OpenAI Dall-E 3
OpenAI dominated AI news this week with several new launches. Honestly, even if they had launched just one of these things, it would have been incredible. First, OpenAI launched Dall-E 3. This happened last week, but I didn’t get a chance to talk about it. Dall-E 3 is the newest version of their generative art product, directly competing with Midjourney and Leonardo.ai. Judging by the initial samples I’ve seen, Dall-E 3 is now on par with the newest version of Midjourney. Check out some of these images. What also impresses me is the range of styles it can create. Version 3 is a giant leap forward compared to Version 2. Check out this example comparing V2 to V3, with the prompt “An expressive oil painting of a basketball player dunking, depicted as an explosion of a nebula.” Additionally, it’s built natively on ChatGPT, which means you can use ChatGPT as a brainstorming partner to help create the best prompts. This is already a popular technique for creating prompts with Midjourney, and now it’s seamlessly built into the ChatGPT workflow. OpenAI spent a lot of time safety testing Dall-E 3. According to their blog post, “DALL·E 3 has mitigations to decline requests that ask for a public figure by name. We improved safety performance in risk areas like the generation of public figures and harmful biases related to visual over/under-representation, in partnership with red teamers—domain experts who stress-test the model—to help inform our risk assessment and mitigation efforts in areas like propaganda and misinformation.” Right now, Dall-E 3 is only available to ChatGPT Plus and Enterprise users. That $20/mo fee continues to increase in value.
ChatGPT Re-Enables Browsing
Another OpenAI launch that went under the radar but is incredibly important is web browsing in ChatGPT. Now, ChatGPT has access to the entire internet instead of just what is built into the model, a limit that frequently triggered the “my knowledge cutoff date is Sept 2021” warnings. But, you’re probably thinking, didn’t ChatGPT already have web browsing? The answer is yes, it did. It was an incredible feature, but a couple of months ago, OpenAI disabled it without much explanation. The only reason they gave is that ChatGPT browsing would occasionally display content in unintended ways. The example they cite is when users asked for the full text of a URL, and ChatGPT would give it. That’s probably a considerable copyright risk. Now, website owners can decide through their robots.txt file whether they want ChatGPT to be able to pull content from their site. I’m glad browsing is back because it makes ChatGPT much more powerful.
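For site owners wondering what that robots.txt control looks like in practice, here is a minimal sketch using Python’s standard-library `urllib.robotparser`. The robots.txt content, the `example.com` URLs, and the `is_allowed` helper are all made up for illustration, and the `ChatGPT-User` agent name is an assumption on my part, so check OpenAI’s documentation for the current value before relying on it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for example.com. The "ChatGPT-User" agent name is an
# assumption; confirm the current user-agent string in OpenAI's documentation.
ROBOTS_TXT = """\
User-agent: ChatGPT-User
Disallow: /members/

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/blog/episode-24", "/members/archive"):
        url = f"https://example.com{path}"
        print(path, is_allowed(ROBOTS_TXT, "ChatGPT-User", url))
```

In this sketch the browsing agent can read the public blog but is kept out of the `/members/` area, while every other crawler falls through to the `*` rule and is allowed everywhere.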
ChatGPT Multi-Modal
And in the biggest and most impressive launch this week by OpenAI, ChatGPT now can see, hear, and speak. This multi-modal capability allows ChatGPT to read images and have voice dialogue with users. In an example in the launch blog post, a user asks ChatGPT how to lower their bike seat and provides a picture of the bike for context. ChatGPT then provides some advice, and the user follows up with another image showing the specific part of the bike that might need to be adjusted. After this back and forth, ChatGPT gives advice tailored to the user’s specific bike. Then, the user shows a picture of their toolset, and ChatGPT tells the user which tool to use. It’s incredible. I’ve been collecting insane examples of ChatGPT Vision, such as ChatGPT taking a hand-written website flowchart and building the website. Let me know in the comments if you want me to create a video showing off all the fantastic examples of ChatGPT Vision I’ve been collecting. But that’s not all. ChatGPT can now also communicate by voice and hold entire conversations. Simply open the ChatGPT app on your phone and start talking. ChatGPT will also reply with a voice rather than only text. They trained ChatGPT using voice actors, and the voice is actually really good, not robotic at all. OpenAI’s voice technology is also allowing Spotify podcasters to translate their voice into different languages. But it’s not dubbed or transcribed. It’s the podcasters’ actual voice but in different languages. Check out this clip from Lex Fridman showing him speaking in Spanish. This is what Siri could have been all along, and Apple has some big goals to hit to compete with ChatGPT.
Sam Altman Teases AGI (Kind Of)
I know it seems like just OpenAI news this week, but we’re almost done. On Reddit, Sam Altman seemingly confirmed OpenAI achieved AGI but quickly followed up to ensure people knew he was just kidding. Ha…ha. For a company whose goal is AGI and which is the clear leader in cutting-edge AI technology, this joke fell flat. He should probably stick to building AI and leave the jokes for comedians.
AI Phone
The last story about OpenAI is an exciting rumor. The famed designer Jony Ive, who helped transform Apple into the design powerhouse it is today with his work including Macs, iPods, and iPhones, is reportedly in talks with Sam Altman to create the “iPhone of AI.” They have raised $1b from Softbank CEO and founder Masayoshi Son for the project and could include chipmaker Arm for the hardware. There’s very little in the way of confirmation about this story, so I’ll keep you updated as it unfolds.
Tesla Optimus Robot
Next, as we accelerate into the future, Tesla released a new video of their Optimus robot. Since launching just a couple of years ago, Optimus has vastly improved. I mean, at the initial launch, it was a human in a robot suit dancing around. In this update video, Optimus can now self-calibrate its arms and legs using only vision and joint position encoders, sort blocks by color even when the environment is dynamically changing, and balance on one leg. Boston Dynamics is still the king of robotics, with its robots able to do parkour, but they’ve been working on it for decades. As mentioned, Tesla is only a couple of years into their development, and the progress they’ve made is impressive.
Meta AI
Meta had a few major AI launches this week. First, Meta launched Meta AI, a new AI experience across their family of products. Meta AI, in beta, is an advanced conversational assistant available in WhatsApp, Messenger, and Instagram. It’ll also be coming to their new Quest 3 headset and their sunglasses product. More on that next. Meta’s blog post says, “Meta AI is powered by a custom model that leverages technology from Llama 2 and our latest large language model (LLM) research. In text-based chats, Meta AI has access to real-time information through our search partnership with Bing and offers a tool for image generation.” So Microsoft is not only powering ChatGPT browsing but also Meta AI browsing. Seems like the clear winner here is Microsoft. Additionally, Meta is “creating AIs that have more personality, opinions, and interests, and are a bit more fun to interact with. Along with Meta AI, there are 28 more AIs you can message on WhatsApp, Messenger, and Instagram. You can think of these AIs as a new cast of characters – all with unique backstories.” These characters include TikTok star Charli D’Amelio, Chris Paul, Kendall Jenner, MrBeast, and Snoop Dogg. A complete list of characters can be found in the blog post, which I’ll link to in the description below. I’m all about an AI Snoop Dogg. Who will you use?
Meta Emu
Meta also launched Emu, their next-generation AI generative art product. Emu is looking to compete directly with Midjourney and is built into several of their products, including Messenger. Emu will also be capable of creating trendy stickers on the Messenger platform. They’re also building AI generative art functionality into Instagram and WhatsApp. Continuing the theme of adding generative art to their products, Meta is also adding AI image editing. Using learnings from their Segment Anything model, for example, a feature called Backdrop will let you easily change the background of a photo to change the location. And in the name of safety, they’re going to mark images that were created or manipulated with AI, which you already know I’m a big fan of.
Meta AI Glasses
Next, as mentioned, Meta is launching sunglasses in partnership with Ray-Ban. These glasses look…normal. Remember Google’s attempt at making smart glasses? Meta’s glasses will include a ton of AI functionality and allow you to livestream, capture photos, play music, make phone calls, and chat with Meta AI easily. They come in two styles and several color variations, and the only thing that visibly sets them apart from regular sunglasses is the cameras on the front. This seems like a privacy nightmare, but everyone already has cameras in their pocket, so maybe this isn’t so different. What do you think?
Meta Quest 3 AR
Meta also launched a new version of their Quest VR headset. With all of the AI news coming from Meta, it’s easy to forget that Mark Zuckerberg pivoted their entire company around the metaverse. With increased processing power, improved graphics and resolution, a slimmer profile, and improved sound quality, Meta is racing to prepare for the soon-to-be-launched Apple Vision headset. I’ve played with VR headsets, but they’ve never become part of my daily workflow. I’m incredibly excited about Apple Vision, and maybe that’s because I’m an Apple Fanboy, but this new Meta Quest also looks fantastic. Coming in at $499, the Quest 3 is 1/7th the cost of the Apple Vision, so Meta is taking a very different go-to-market approach than Apple. However, Meta is clearly labeling the Quest 3 as a mixed reality headset, whereas before, I believe they called it virtual reality. This is likely in response to Apple calling their headset mixed reality. It seems virtual reality has gone out of style.
Mistral 7b
Mistral AI has launched its 7 billion parameter large language model. This new model, Mistral 7b, beats LLaMA 2 7b on all benchmarks and LLaMA 1 34b on many benchmarks. Best of all, it is truly 100% open source, coming with an Apache 2.0 license. According to the launch blog post, it “Approaches CodeLlama 7B performance on code, while remaining good at English tasks”. The AI benchmarks are acceptable but don’t usually represent real-world use cases. Do you want me to run a full test on it myself? Let me know in the comments.
Amazon Invests in Anthropic
Not to be left out of the AI race, Amazon made a couple of AI announcements this week. First, Amazon acquired a significant stake in the AI company Anthropic. Anthropic is the maker of Claude, a direct and competent competitor to ChatGPT. Amazon invested $4b into Anthropic but also signaled a more extensive collaboration between the two companies, including AWS becoming Anthropic’s primary cloud provider. The two companies have already launched the Claude model on Amazon Bedrock, one of their many AWS cloud services. You’ll be able to customize and fine-tune Claude using Bedrock. Claude’s AI capabilities will also be incorporated into other Amazon products. This smart move by Amazon echoes a similar strategy between Microsoft and OpenAI.
Alexa AI
Amazon is also bringing generative AI functionality to Alexa. According to Dave Limp, the SVP of devices and services at Amazon, “Our latest model has been specifically optimized for voice and the things we know our customers love — like having access to real-time information, efficiently controlling their smart home, and getting the most out of their home entertainment.” Amazon’s new AI in Alexa will be conversational and will take into account not only voice but also body language, eye contact, and gestures. You’ll also be able to control your smart home devices, a space where Amazon has been the clear winner.
Leonardo.ai Elements
Leonardo.ai, a competitor to Midjourney with a fantastic interface, has launched a new feature called Elements. Elements adds the ability to incorporate LoRAs into your gen AI workflow. According to the announcement, “We have simplified the process for you to seamlessly blend various styles, mix models, and achieve incredible effects that align perfectly with your creative vision. You can create an array of powerful effects on your generated images by combining artistic styles, such as Baroque, GlassSteel, Inferno, and many more.” Leonardo is clearly the David facing multiple Goliaths, including Midjourney and Dall-E 3. But I’ve been a big fan of Leonardo from the early days. One of my first videos was about Leonardo, so I’m rooting for them. Elements is available for all users right now, so be sure to check it out.
Windows Copilot AI
Microsoft this week launched Windows 11 with Copilot. According to the announcement, “New for Windows 11, Copilot in Windows is an AI-powered intelligent assistant that helps you get answers and inspirations from across the web, supports creativity and collaboration, and helps you focus on the task at hand.” Copilot has been built into every aspect of the Windows operating system, and can answer your questions and control different aspects of your Windows environment. I haven’t had a chance to download and play around with Copilot for Windows yet, but I plan to soon.
SpaceX Starshield
While not AI news, it’s undoubtedly futuristic. Elon Musk’s SpaceX has won a big US Space Force contract for Starshield. SpaceX will provide customized satellite communications for the military under the company’s new Starshield program. According to CNBC, “The SpaceX contract provides for Starshield end-to-end service (via the Starlink constellation), user terminals, ancillary equipment, network management, and other related services,” Space Force spokesperson Ann Stefanek said. Starshield is a new line of business for SpaceX, which it launched just last year, and the Pentagon already purchases the company’s rockets. Not much more detail is available about this yet, but I’ll keep an eye on it.
CIA AI
Also, in government news, the CIA is building its own artificial intelligence tool to rival China’s abilities. According to decrypt.co, “Nameless for now, the tool will be trained on publicly available data and aims to help U.S. spies to quickly verify information.” It doesn’t have a launch date yet, and I wonder if they are working with leading AI companies like OpenAI and Meta on this. The CIA’s AI will be able to analyze large swathes of data to help keep the US safe.
Quantum Supremacy
Apparently, the real technology to be worried about is quantum computing. Quantum computing promises to give us incredible power to improve our world, but it also threatens to upend many security technologies, such as encryption. Standard computers use bits that are each either a 1 or a 0, known as binary. Quantum computers instead use quantum bits, or qubits, which can exist in a blend of both values at once, so a register of n qubits can represent a combination of 2^n possible outcomes at the same time. This method can transform many industries, including logistics, healthcare, finance, cybersecurity, weather prediction, etc. Leading tech companies, including Google and IBM, are investing heavily in developing quantum computing. But as Spider-Man’s Uncle Ben said, with great power comes great responsibility.
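To make the qubit idea a little more concrete, here is a rough sketch that simulates a few qubits on a regular computer. It is only a classical toy, not how real quantum hardware is programmed, and the function names (`zero_state`, `apply_hadamard`) are my own for illustration. It tracks one number (an amplitude) per possible outcome and applies a Hadamard gate, a standard quantum operation, to put every qubit into an equal blend of 0 and 1.

```python
import math


def zero_state(n_qubits: int) -> list[float]:
    """State vector for n qubits that all start as 0: one amplitude per outcome."""
    state = [0.0] * (2 ** n_qubits)
    state[0] = 1.0
    return state


def apply_hadamard(state: list[float], target: int) -> list[float]:
    """Apply a Hadamard gate to qubit `target` (0 = least significant bit)."""
    h = 1 / math.sqrt(2)
    bit = 1 << target
    new_state = [0.0] * len(state)
    for index, amplitude in enumerate(state):
        if index & bit:
            # Target qubit is 1: it maps to (|0> - |1>) / sqrt(2).
            new_state[index ^ bit] += h * amplitude
            new_state[index] -= h * amplitude
        else:
            # Target qubit is 0: it maps to (|0> + |1>) / sqrt(2).
            new_state[index] += h * amplitude
            new_state[index | bit] += h * amplitude
    return new_state


if __name__ == "__main__":
    n = 3
    state = zero_state(n)
    for qubit in range(n):
        state = apply_hadamard(state, qubit)
    # Measurement probability of each outcome is the squared amplitude.
    probabilities = [amplitude ** 2 for amplitude in state]
    print(len(state), "outcomes, probabilities:", probabilities)
```

The list doubles in length with every qubit you add, which is why simulating even a few dozen qubits this way gets expensive fast and why dedicated quantum hardware is interesting.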
YouTube AI
Next, YouTube is launching several AI features. Thanks to Bilawal Sidhu for putting together a summary of the features, including:
1. AI Video with Dream Screen - Visually transport yourself anywhere by typing a prompt. This new Shorts feature generates fantastical backgrounds in both image and video form.
2. Free Editing App YouTube Create - A free mobile app provides easy professional editing tools to craft high-quality videos in minutes.
3. Personalized AI Insights - Get tailored video ideas and outlines in YouTube Studio based on your channel and current trends.
4. Auto-Dubbing with Aloud - Use AI dubbing to automatically localize your videos into other languages with one click.
5. Assistive Music Search - Instantly find the perfect free soundtrack by describing your video. AI recommends songs and beats that best fit your video.
AI Video Of The Week
Now for the AI video of the week! Tim Graupmann provided the suggestion this week, so thank you to Tim. And a reminder: if you want to suggest videos for the AI video of the week, jump into my Discord; the link is in the description below. In this video, we see a spectacular, giant video during a concert showing a rapidly evolving skeleton guy. The visuals are stunning, and I can’t even imagine what it was like to watch it on a massive screen with music playing. Check out the video.
Gemini AI Soon
For our last story, Google is nearing the launch of Gemini, its competitor to GPT-4. Although OpenAI beat Google to launching multi-modal features, it’s rumored that Gemini will include them at launch. Google has given a small set of companies access to Gemini for testing purposes. Gemini is a collection of AI models with access to the internet and all your information, such as email, calendars, and docs. It’ll also be capable of writing code and generating images, all features ChatGPT already supports. I’ve mentioned this before, but every company, including Google, is playing catch-up with OpenAI. This must be incredibly frustrating for Google, given they published the original research paper that kickstarted this wave of AI technology: Attention Is All You Need.