OpenAI’s Voice AI Risks, Co-Founders Step Back, and Figure 02 Humanoid Robot Upgrades
OpenAI warns of potential emotional attachment risks with its new voice mode, as co-founders Brockman and Schulman step back from leadership roles. Meanwhile, Figure 02 humanoid robot debuts major upgrades, including enhanced AI-driven vision and autonomous functionality. Explore these key developments in AI safety, leadership changes, and robotics advancements.
Hold onto your astronaut helmets, folks. Today’s newsletter is packed with enough AI goodness to make your head spin.
We've got the 'Godmother of AI' throwing shade at California's AI bill, Amazon's AI partnership getting grilled by the UK, and Humane's AI Pin proving that not all that glitters is gold. We'll also dive into the murky world of Facebook's AI-generated spam, explore Audible's AI-powered search, and check out Hugging Face's latest acquisition.
Onward!
Sponsor
CodiumAI is a quality-first generative AI coding platform, offering developers tools for writing and refactoring, as well as testing and reviewing. Generate confidence, not just code. Try for free.
Top Stories
OpenAI Warns Users Could Become Emotionally Hooked on Its Voice Mode
OpenAI has introduced a highly humanlike voice interface for ChatGPT that raises concerns about users forming emotional attachments to AI. In its safety analysis, the "system card" for GPT-4o, OpenAI describes potential risks such as bias amplification, the spread of disinformation, and misuse for developing weapons. The company acknowledges criticism of its approach to AI commercialization and safety and aims for greater transparency to address these concerns. The document details the model's safety testing and mitigation strategies, but critics note that it lacks comprehensive information on the training data and the issues surrounding consent. As tools like OpenAI's voice interface become more advanced and more deeply integrated, these risks underscore the importance of continual real-world assessment of AI safety measures. OpenAI is monitoring the emotional impact of its voice interface, including its potential to influence users or disrupt human interactions. Other companies, such as Google DeepMind, also recognize the ethical challenges posed by AI assistants that mimic human behavior.
OpenAI Co-Founders Schulman and Brockman Step Back
OpenAI co-founders Greg Brockman and John Schulman are stepping down from their roles, with Brockman taking a sabbatical and Schulman joining rival AI startup Anthropic. This marks a significant change in OpenAI's leadership, following other key departures and new hires, including a new chief financial officer and chief product officer. Brockman, a pivotal ally of CEO Sam Altman, and Schulman, known for his work on ChatGPT, leave only two original founders at the company. Schulman will focus on AI safety at Anthropic, joining other former OpenAI researchers.
Introducing Figure 02
Brett Adcock announced significant upgrades to the Figure 02 humanoid robot, including a custom 2.25 kWh battery pack for over 20 hours of daily use, improved integrated wiring for enhanced reliability, and an AI-driven vision system with six RGB cameras. The exoskeleton structure, inspired by aircraft design, adds structural stiffness, while the new 4th-generation hands boast 16 degrees of freedom for human-like tasks. Additionally, Figure 02 features a CPU and GPU with three times the computational power of its predecessor, enabling fully autonomous AI tasks.
Introducing Structured Outputs in the API
OpenAI has launched Structured Outputs in the API to ensure model-generated outputs match developer-supplied JSON Schemas precisely. This new feature enhances the reliability of generating structured data from unstructured inputs, a key use case for AI. The gpt-4o-2024-08-06 model, optimized for this feature, achieves 100% accuracy in following complex JSON schemas, outperforming previous models. Structured Outputs are available through function calling and a new response_format parameter, with native support in OpenAI’s Python and Node SDKs. This improvement aims to streamline development and reduce the need for workarounds.
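To make the mechanics concrete, here is a minimal sketch of what a request using the new response_format parameter might look like. The schema fields ("name", "date", "participants") and message contents are illustrative examples, not taken from the announcement; "strict": true is what asks the model to conform to the schema exactly.

```python
import json

# Hypothetical schema for extracting a calendar event from free text.
event_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "date": {"type": "string"},
        "participants": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "date", "participants"],
    "additionalProperties": False,
}

# The response_format parameter wraps the schema; with "strict": True the
# model's output is constrained to match it precisely.
request_body = {
    "model": "gpt-4o-2024-08-06",
    "messages": [
        {"role": "system", "content": "Extract the event details."},
        {"role": "user", "content": "Alice and Bob meet for coffee on Friday."},
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "calendar_event",
            "strict": True,
            "schema": event_schema,
        },
    },
}

print(json.dumps(request_body["response_format"], indent=2))
```

The same payload can be sent through OpenAI's Python or Node SDKs, which the announcement says support the feature natively.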
‘The Godmother of AI’ says California’s well-intended AI bill will harm the U.S. ecosystem
Fei-Fei Li criticizes California's proposed SB-1047, arguing that it could hinder the AI sector. She contends that the bill would unfairly hold AI developers liable for model misuse, impose a "kill switch" that could disrupt open-source development, and stifle university and public-sector research by restricting access to essential data and computational resources. The piece cautions that the bill sets arbitrary regulatory thresholds while failing to address genuine AI problems such as bias and deepfakes. Instead of such legislation, Li advocates for AI policy that encourages innovation, uniform rules, and consumer confidence, urging collaboration to shape more effective governance toward a human-centered AI future.
Sponsor
AI Hub by Qualcomm - Run, download, and deploy your optimized models on Snapdragon® and Qualcomm® devices. Learn more about AI Hub by Qualcomm at https://aihub.qualcomm.com/
More AI Stories
Humane’s daily returns are outpacing sales
Humane's $699 AI Pin has faced significant consumer returns and criticism since launch. Sales data revealed more returns than purchases from May to August, with only around 7,000 units not returned. Launch reviews were negative, with tech reviewers labeling the device dysfunctional and calling it one of the worst products they had reviewed. Financial concerns loom, as the company has generated just over $9 million in sales against $200 million in funding from high-profile investors. Returns compound the revenue loss, since a T-Mobile limitation rules out refurbishing returned units. Humane has released updates to address the issues and is reportedly experiencing executive turnover while trying to instill confidence in its future and exploring funding options, including potential debt financing.
Hugging Face Acquires Seattle-Based Data Storage Startup XetHub
Hugging Face has acquired XetHub, a Seattle-based data storage and collaboration startup founded by former Apple engineers, aiming to enhance its AI model hosting and collaboration capabilities. XetHub CEO Yucheng Low highlighted their shared vision of democratizing AI. This acquisition, Hugging Face's largest to date, adds 14 employees to its workforce. Hugging Face, known for its developer tools to manage large-scale AI models, continues to expand following a $235 million Series D funding round and the acquisition of Argilla. XetHub's co-founders bring extensive experience from previous roles at Apple, AWS, and Microsoft.
UK Starts Probe into Amazon's AI Partnership with Anthropic
The UK's Competition and Markets Authority (CMA) has initiated an investigation into Amazon's partnership with AI startup Anthropic, following a similar probe into Alphabet's collaboration with the same startup. The CMA has until October 4 to decide whether to conduct a deeper probe or clear the partnership of competition concerns. Amazon and Anthropic maintain that their collaboration does not raise competition issues, emphasizing Anthropic's independence and freedom to partner with multiple providers.
Where Facebook's AI Slop Comes From
Emotionally provocative AI-generated images can be monetized through Facebook's performance bonus system. This ecosystem thrives by exploiting algorithm vulnerabilities and content moderation gaps, amplified by AI tools like Microsoft's Bing Image Creator. Facebook, which points to its 40,000-strong security and safety team, acknowledges the situation but faces challenges due to layoffs and the speed with which savvy operators exploit loopholes. Despite the oddity of the spam content, the core issue lies in the broader context of an internet shaped by financial incentives and the under-moderation of non-English content by big tech companies. The entire cycle is buoyed by Meta's direct financial incentives to post viral content through its Creator Bonus Program.
Audible is testing an AI-powered search feature
Audible, Amazon's audiobook subsidiary, is testing an AI-powered search feature named Maven to help users find audiobooks based on natural language queries. Available to select U.S. customers on iOS and Android, Maven provides tailored recommendations from a subset of Audible's nearly one million titles. Audible is also experimenting with AI-curated collections and AI-generated review summaries, aiming to enhance user experience. This follows reports of increasing use of AI-voiced audiobooks on the platform, which has raised concerns among human narrators about job security.
AI is mining the sum of human knowledge from Wikipedia. What does that mean for its future?
Wikipedia, a vital resource for AI's language models, is facing challenges. Its traffic has declined somewhat, with Wikimedia Foundation executives questioning the accuracy of third-party data while acknowledging that unique-user statistics are stable. The advent of AI tools like ChatGPT could put volunteer engagement at risk, which is crucial for content creation and site quality. Wikimedia executives see potential in AI to expand access to knowledge but stress the importance of attributing Wikipedia as a source to maintain volunteer motivation and prevent misinformation. They maintain that AI integration should support, not replace, human efforts. Wikimedia Enterprise, a paid service for high-volume users, helps fund Wikipedia while promoting the responsible use of its content; Google is a notable client, contributing a small percentage of Wikimedia's revenue. The foundation's stance on AI is optimistic, provided correct attribution and support for volunteers are in place.
WPP and Nvidia Partner to Make 3D Ads Using Gen AI
ADWEEK reports that advertising conglomerate WPP has partnered with chipmaker Nvidia to utilize generative AI for creating realistic 3D advertising environments from simple text prompts. This collaboration allows for swift generation of intricate visuals, significantly reducing the need for coding and backend work. Initially testing with clients like Coca-Cola and Ford, WPP is piloting Nvidia's AI tools in 3D image production. Nvidia's tech also includes guardrails to ensure brand guideline adherence. The partnership bolsters WPP's commitment to AI, investing $318 million annually to rejuvenate growth. Coca-Cola and Ford are leveraging the technology for rapid and scalable content creation, enhancing customization and efficiency in advertising production.
One of America’s Hottest Entertainment Apps Is Chinese-Owned
The Talkie app, a popular AI chatbot allowing users to converse with celebrities or virtual romantic partners, has gained significant traction in the U.S. Despite its widespread use, many are unaware that the app is Chinese-owned, with its parent company being Shanghai-based MiniMax. Leveraging OpenAI’s foundation model, Talkie has become one of the top downloaded entertainment apps in the U.S., highlighting the challenges Chinese AI firms face due to strict regulations at home and tensions with Washington. The app’s success underscores the complexities of geopolitics in the tech industry.
Reddit to test AI-powered search result pages
Reddit is set to introduce AI-generated summaries at the top of search results, aiming to enhance user experience by summarizing and recommending content. CEO Steve Huffman announced during an earnings call that this new feature, powered by both first-party and third-party technology, will help users explore content more deeply and discover new communities. The testing will begin later this year. This follows Reddit's partnerships with OpenAI and Google, which provide access to large language models and data. Additionally, Reddit's AI-powered language translation feature is expanding, contributing to the platform's user growth and increased revenue.
AI Is Coming for India’s Famous Tech Hub
AI is transforming India's technology outsourcing sector, potentially displacing many low-end jobs like those in call centers. Major outsourcing firms are integrating AI to maintain competitiveness, as simple tasks are rapidly becoming automated. Despite contributing significantly to India's economy and employing millions, the industry faces pressure to move up the value chain and embrace higher-end services. While AI offers new business opportunities, it also accelerates trends towards fewer employees per revenue unit. The shift underscores the need for workforce adaptation towards roles requiring critical thinking and creativity.
Intel reportedly gave up a chance to buy a stake in OpenAI in 2017
Intel reportedly passed on the chance to take a stake in OpenAI in 2017 and 2018. Former Intel CEO Bob Swan decided against an investment, not foreseeing the widespread market impact of AI models. Meanwhile, competitors Nvidia and AMD have advanced in the AI hardware space, and Microsoft's investment in OpenAI helped the startup grow significantly. Intel's data center and AI business has seen declining revenue, but the company is moving forward with new processors and AI accelerators in hopes of recovering in the sector. Neither Intel nor OpenAI commented on the matter.
OpenAI reportedly leads $60M round for webcam startup Opal
OpenAI is set to lead a $60 million Series B funding round for Opal Camera Inc., a startup specializing in high-end webcams. Investors include Founders Fund and Kindred Ventures. Opal's primary product is the 'Tadpole' webcam, which boasts a recording resolution of up to 3840x2160 pixels and allows for customizable settings via the Composer app. Post-funding, Opal aims to pivot partly towards developing AI-powered creative tools. OpenAI, having a history of investing in AI-integrated startups, could see Opal integrate its AI technologies into future devices. Previously, Opal released the C1 webcam, with AI capabilities powered by Intel's Movidius Myriad X Vision chip. Additionally, OpenAI is exploring custom AI hardware development, highlighted by its discussions with Broadcom and the recruitment of former Google TPU team engineers.
Awesome Research Papers
MiniCPM-V: A GPT-4V Level MLLM on Your Phone - The rapid evolution of Multimodal Large Language Models (MLLMs) is transforming AI research, yet their deployment is hindered by high operational costs and the need for powerful cloud servers. The newly developed MiniCPM-V series addresses these issues, providing efficient MLLMs for end-device implementation. Notably, the MiniCPM-Llama3-V 2.5 excels in performance, outpacing competitors on various benchmarks, showcasing strong image processing and OCR capabilities, minimal hallucination, and multilingual support. The trend indicates shrinking model size requirements for high-level performance, suggesting a future with MLLMs like GPT-4V running on everyday devices, broadening AI's real-world applicability.
POA: Pre-training Once for Models of All Sizes - The paper presents a study introducing POA (Pre-training Once for All), a tri-branch self-supervised training framework designed to generate multiple-sized models from a single pre-training session, addressing the issue of model scaling due to computation or storage constraints. The framework incorporates an elastic student branch within a self-distillation paradigm, allowing the parallel training of models of varying sizes, which also enhance representation learning. POA can be used with different backbones like ViT, Swin Transformer, and ResNet, achieving state-of-the-art performance on various tasks and evaluations.
RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation - RAG Foundry is an open-source framework designed to simplify the complex process of developing Retrieval-Augmented Generation (RAG) systems. It consolidates the creation of datasets, training, inference, and evaluation of RAG applications within a singular workflow. The framework supports easy prototyping and experimenting with RAG methods and improves large language models using specialized data. It has proven effective in enhancing models like Llama-3 and Phi-3 with various RAG settings, yielding consistent performance gains on knowledge-intensive datasets.
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models - Introducing the Multimodal Multi-image Understanding (MMIU) benchmark, a specialized suite for assessing Large Vision-Language Models (LVLMs) on their ability to process multiple images. MMIU comprises 7 multi-image relationship types, 52 tasks, 77K images, and 11K multiple-choice questions. An evaluation of 24 prominent LVLMs revealed substantial comprehension difficulties, particularly in spatial understanding, with even top-performing models like GPT-4o scoring only 55.7% accuracy. The analysis from this benchmark aims to guide future enhancements in LVLMs and foster research on more complex multimodal interactions.
Achieving Human Level Competitive Robot Table Tennis - This research marks a significant stride in robotics, presenting the first robot capable of playing competitive table tennis at an amateur human level. Developed by a team from Google DeepMind, the robot was assessed over 29 matches, winning 45% of them against unseen human opponents of varying expertise. Its performance was strongest against beginners and intermediate players, winning 100% and 55% of those matches respectively, while advanced players overcame the robot. The hierarchical and modular policy architecture underpinning the robot's capability includes low-level skill controllers and a high-level controller for skill selection. The work also features innovative zero-shot sim-to-real transfer techniques and real-time opponent adaptation. Participants praised the robot's play for its engagement and fun factor, establishing it as a potential dynamic practice partner.
Language Model Can Listen While Speaking - Researchers from the MoE Key Lab of Artificial Intelligence at Shanghai Jiao Tong University and ByteDance Inc. have developed the Listening-while-Speaking Language Model (LSLM), advancing speech language models (SLM) by enabling full duplex interaction—where the model can listen while speaking. Unlike existing turn-based SLMs, LSLM leverages a decoder-only TTS and a streaming self-supervised learning (SSL) encoder for simultaneous speaking and real-time audio input processing. The system effectively fuses listening and speaking channels using three strategies—Early, Middle, and Late Fusion—with Middle Fusion deemed optimal. Tested in both clean and noisy conditions, LSLM has shown robust duplex communication, minimally impacting current systems, thus promising improved real-world applicability for conversational AI.
LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct - EXAONE-3.0-7.8B-Instruct is a bilingual (English and Korean) generative AI model with 7.8 billion parameters, pre-trained on 8T curated tokens. It underwent fine-tuning to excel in language tasks, outperforming similar-sized models on multiple benchmarks. Usage requires transformers v4.41 or later, with specific system prompts for optimal operation. The model sometimes creates biased or inappropriate content, a byproduct of training data limitations. Usage guidelines prohibit malicious activities. The work is offered under the EXAONE AI Model License Agreement 1.0 - NC, and support is provided by LG AI Research.
OpenBMB/MiniCPM-V - MiniCPM-V is a series of multimodal large language models (MLLMs) designed for vision-language tasks, supporting image, video, and text inputs to provide high-quality text outputs. The latest model, MiniCPM-V 2.6, boasts 8 billion parameters and surpasses GPT-4V in image and video understanding, featuring real-time video processing on devices like the iPad. Released in August 2024, MiniCPM-V 2.6 also improves on the strong OCR capabilities and multilingual support of its predecessors, advancing the practical deployment of these technologies.
Introducing Qwen2-Math - Introducing the Qwen2-Math series, specialized language models optimized for solving mathematical problems. Notably, the Qwen2-Math-72B-Instruct model surpasses other advanced models, including GPT-4o, in math tasks. The developers have meticulously curated training datasets, implemented decontamination to minimize test set contamination, and are planning bilingual English-Chinese models. The models are tested on various benchmarks, and their performance is demonstrated through detailed case studies of complex math problems. The Qwen2-Math series builds upon the Qwen2 foundation and represents a focused effort to advance the reasoning capabilities of large language models in the field of mathematics.
Build, tweak, repeat - Mistral has announced new customization features for its flagship and specialist models on La Plateforme, allowing developers to tailor models such as Mistral Large 2 and Codestral with techniques like base prompting, few-shot prompting, or fine-tuning on their own datasets. Anticipating groundbreaking applications, Mistral encourages developers to try the fine-tuning documentation for model customization. Additionally, an alpha version of 'Agents' is presented: this feature enables the creation of custom workflows built on the advanced reasoning of Mistral Large 2 and is designed to integrate with various tools and data sources.
Call for Applications: Llama 3.1 Impact Grants - Presenting the Llama 3.1 Impact Grants, an initiative offering up to $2 million USD to support projects using the Llama 3.1 AI model for social good. The grant program emphasizes the transformative power of AI in enhancing human productivity, creativity, and quality of life, advocating for open-source access as a means for equitable technology deployment. Following the success of its previous grant program and the Llama Impact Innovation Awards, the organization now seeks proposals utilizing Llama 3.1's new features for projects with significant economic and social impact. Selected proposals can receive up to $500,000 USD. Additionally, the program will host regional events worldwide for further engagement and special awards. The call for applications is global, with a focus on economic development, science, public service, and more, and the deadline for submissions is November 22, 2024.
Claude AI Conversation Summaries feature spotted in testing - Anthropic is developing a conversation summary feature, identified through reverse engineering, that will be accessible from the home screen and will include a "Generate New Summaries" button. Primarily aimed at team plans, it will enable users to stay updated on team conversations, but its availability for personal plans remains uncertain. The release date for this feature has not been disclosed. User feedback on social media is generally favorable. Additionally, on the Claude platform, an enhancement allows users to bulk delete conversations, complemented by a new "select all" option for improved chat management.
Introducing AI Skills in Microsoft Fabric: Now in Public Preview - Microsoft Fabric has launched AI skills, empowering users to create custom generative AI tools for conversational Q&A tailored to organizational data contexts. These skills deliver data-driven answers with less user input, enhancing productivity. They complement Fabric Copilot, an existing AI product that aids data professionals by generating code and finding answers, with users validating its outputs. The new AI skills focus the AI on specific datasets, follow English instructions, and employ example queries to maintain accuracy and data governance. AI skills are now available in public preview for Fabric customers on F64 or higher capacities, subject to admin activation. Users are encouraged to explore the feature and provide feedback.
Amazon Titan Image Generator v2 is now available in Amazon Bedrock - Amazon has launched the Titan Image Generator v2 within its Bedrock platform, enhancing capabilities for image creation and editing. The updated version allows for detailed image conditioning using reference photos, precise color palette control, background removal, and subject consistency for brand coherence. Key features include Canny edge and segmentation for structural guidance, and fine-tuning to maintain specific subjects across generated images. Users can access these features via API, SDK, or AWS CLI, with illustrative Python code provided for ease of use. Titan Image Generator v2 is available in select US regions, with further information on Amazon's product and pricing pages.
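As a rough illustration of how such a call is shaped, the sketch below builds a text-to-image request body of the kind passed to Bedrock's InvokeModel API. The prompt text and generation settings are invented for this example, and the field names follow the Titan image request format as best understood here; consult the Bedrock documentation for the authoritative parameter set.

```python
import json

# Illustrative text-to-image request body for Titan Image Generator v2.
body = {
    "taskType": "TEXT_IMAGE",
    "textToImageParams": {
        "text": "A product photo of a red bicycle on a white background",
    },
    "imageGenerationConfig": {
        "numberOfImages": 1,
        "height": 1024,
        "width": 1024,
        "cfgScale": 8.0,
    },
}

# With boto3 the request would be sent roughly like this (not executed here):
# client = boto3.client("bedrock-runtime", region_name="us-east-1")
# response = client.invoke_model(
#     modelId="amazon.titan-image-generator-v2:0",
#     body=json.dumps(body),
# )

print(json.dumps(body, indent=2))
```

The v2-specific features described above (image conditioning, color palette control, background removal) are selected through additional task types and parameters in this same request body.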
ClipAnything - Clip any moment from any video with prompts - ClipAnything presents an innovative AI-powered tool designed to extract specific moments from videos by analyzing visual, audio, and sentiment cues. It recognizes various elements like objects, scenes, emotions, and texts to evaluate each segment's potential for virality. Users can guide the AI using natural language prompts to locate select scenes, actions, characters, events, or emotionally charged or viral moments within video content.
ByteDance Joins AI Video App Market with Jimeng AI Launch - ByteDance, the parent company of TikTok, has introduced Jimeng AI, a text-to-video app developed by its subsidiary Faceu Technology. The app, now available on both Android and the Apple App Store, enters a competitive market alongside similar models from other Chinese tech firms, such as Kuaishou's Kling AI and Zhipu AI's Ying. Jimeng AI offers subscription plans that let users generate AI videos and images, further expanding ByteDance's presence in AI-driven video creation. The launch comes while OpenAI's Sora model has yet to be publicly released.
ChatGPT Desktop App for macOS Adds Side-by-Side Feature - The ChatGPT desktop app for macOS now includes a new feature that allows users to open a companion window using Option + Space. This window stays in front of other applications, providing convenient side-by-side access to ChatGPT. This update aims to enhance multitasking and ease of use, enabling users to interact with ChatGPT while working on other tasks.
OpenAI releases the GPT-4o System Card - The system card outlines GPT-4o's capabilities, limitations, and safety measures, reflecting OpenAI's commitment to safe AI development as per agreements with the White House. It includes evaluations of speech-to-speech, text, and image functionalities, and discusses the societal impacts and third-party assessments of potential dangers associated with the model.