Microsoft Build Event: AI EVERYTHING
GPT-4o integration into Windows, and so much more
Microsoft Makes its AI Announcements
At its Build 2024 event, Microsoft introduced Copilot Plus PCs, a new category of AI-enhanced computers featuring Qualcomm's Snapdragon X Elite and Plus processors. Microsoft claims a 58% performance advantage over the M3 MacBook Air, and the lineup will later extend to Intel and AMD chips. Key to this line is the Recall feature, which logs PC activity so past interactions and content can be retrieved easily.
Copilot integration extends further into Windows 11, employing OpenAI's GPT-4o model for more responsive assistance across the OS. Hardware updates include a faster Surface Pro with an OLED option and a new Surface Laptop that outperforms its predecessor by 80%. Both devices emphasize AI readiness with the new chips.
Additionally, Microsoft announced collaborations with software giants like Adobe to bring natively Arm64-optimized apps to Copilot Plus PCs, enhancing creative workflows with AI functionality. These developments mark a significant step in Microsoft's AI integration strategy and reinforce its position in the PC market.
OpenAI created a team to control 'superintelligent' AI — then let it wither - OpenAI had a dedicated "Superalignment" team focused on developing ways to control and align superintelligent AI systems. However, the team was allegedly denied the compute resources it needed to work effectively, leading to the resignations of co-lead Jan Leike and others. Leike cited disagreements with OpenAI leadership over prioritizing safety and alignment work versus product launches, and expressed concern that OpenAI is not on track to solve critical AI safety challenges as models become more advanced. The Superalignment team was led by Leike and OpenAI co-founder Ilya Sutskever, who also resigned amid a power struggle with CEO Sam Altman. With the team disbanded, AI safety work will be distributed across OpenAI rather than housed in a dedicated group, raising fears that development may be less safety-focused going forward.
How the voices for ChatGPT were chosen - OpenAI collaborated with award-winning casting directors and voice actors over a five-month period to select the five distinct voices for ChatGPT's Voice Mode feature. They received over 400 submissions, narrowed the pool to 14 actors, and selected the final five voices, which were recorded in San Francisco in June and July 2023. The chosen actors are compensated above market rates and will continue to be paid for as long as their voices are used; OpenAI says the voices are not imitations of celebrities and withholds the actors' names to protect their privacy. Future plans include launching an improved Voice Mode for GPT-4o and introducing additional voices to cater to diverse user preferences.
Scarlett Johansson Hired Lawyers to Push Back on ‘Eerily Similar’ OpenAI Voice - Actress Scarlett Johansson recently accused OpenAI of using an AI voice that sounded "eerily similar" to her own for their ChatGPT chatbot, despite her declining their offer to voice the feature. Johansson stated that she was forced to hire legal counsel to demand the removal of the AI voice, called "Sky," which OpenAI has since replaced with a different voice named "Juniper." The use of AI to mimic celebrities has become a controversial issue in Hollywood, with ongoing litigation against AI companies for using data scraped from the internet to train their software without permission.
ChatGPT can talk, but OpenAI employees sure can’t - OpenAI introduced an update to ChatGPT, featuring a human-like, feminine voice, reminiscent of Scarlett Johansson's AI character in the movie "Her." However, this product release was eclipsed by the departure of OpenAI's co-founder and chief scientist, Ilya Sutskever, and the superalignment team co-leader, Jan Leike. Sutskever had previously been embroiled in a boardroom conflict that temporarily ousted CEO Sam Altman, and Leike's terse resignation raised concerns about the company's commitment to safety. Furthermore, the stringent non-disclosure agreements binding ex-OpenAI employees prevent them from openly criticizing the company or discussing their departure, restricting transparency and seeming at odds with the company's original ethos of open collaboration and accountability in developing AI technologies.
As Nvidia grows stronger, Apple's iPhone continues to struggle - Nvidia's new "Blackwell" GPUs and the GB200 NVL system are expected to tighten its grip on AI architecture in data centers, with cloud providers driving significant AI investment. D-Wave Quantum's stock rose despite low revenue, driven by new bookings and promising customer use cases, though meaningful cloud usage is still developing. Cybersecurity vendors face customer fatigue from too many tools, which could lead to consolidation benefits for platform-focused companies like CyberArk, CrowdStrike, and Zscaler. Apple's iPhone 15 sales are struggling, with a significant decline in April, though potential AI partnerships could reduce its reliance on hardware sales.
Indian Voters Are Being Bombarded With Millions of Deepfakes. Political Candidates Approve - India's political arena is venturing into new territory with AI-driven tools as elections unfold. Politicians are creating deepfakes—highly convincing digital replicas—for voter outreach, tapping into the deepfake market worth $60 million. Diverse strategies such as personalized AI-generated calls are deployed to connect with voters in multiple languages across the nation. Despite the legal use, the intersection of technology and elections raises concerns about deception and misinformation. There are challenges aligning promised capabilities with actual performance, as AI's limitations sometimes lead to less than perfect interactions. As India's elections press forward, the distinction between genuine and AI-mediated communication continues to blur, stirring debate over the implications of this digital progression in its democracy.
GPT-4o’s Chinese token-training data is polluted by spam and porn websites - GPT-4o's new tokenizer compresses key non-English languages such as Russian, Arabic, and Vietnamese far more efficiently, cutting processing costs for those languages by up to four times. Analysis of Hindi and Bengali tokens suggested they mostly reflect news content, with fewer issues than the Chinese tokens. Scrutiny of GPT-4o's Chinese tokens, particularly the longest ones, reveals a predominance of spam- and porn-related phrases, suggesting inadequate data cleansing when the tokenizer was trained. This could be due to spam content hijacking legitimate sites, a phenomenon that has even corrupted indexed content on authoritative sites like the US National Institutes of Health.
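The finding is easy to probe at a small scale. Below is a minimal sketch, assuming the tiktoken library (which ships GPT-4o's o200k_base vocabulary), that scans the vocabulary for unusually long all-Chinese tokens of the kind the article describes; the length threshold and CJK character range are illustrative choices, not the researchers' exact methodology.

```python
import tiktoken

enc = tiktoken.get_encoding("o200k_base")  # the tokenizer used by GPT-4o

long_cjk_tokens = []
for token_id in range(enc.n_vocab):
    try:
        text = enc.decode_single_token_bytes(token_id).decode("utf-8")
    except (KeyError, UnicodeDecodeError):
        continue  # skip unassigned ids and byte fragments that are not valid UTF-8
    # keep unusually long tokens made up entirely of CJK characters
    if len(text) >= 5 and all("\u4e00" <= ch <= "\u9fff" for ch in text):
        long_cjk_tokens.append((token_id, text))

# longest tokens first; these are the ones most likely to encode whole spam phrases
long_cjk_tokens.sort(key=lambda item: len(item[1]), reverse=True)
print(f"{len(long_cjk_tokens)} long all-CJK tokens found")
for token_id, text in long_cjk_tokens[:20]:
    print(token_id, text)
```

Inspecting the printed tokens shows whether whole multi-character phrases, rather than short word pieces, made it into the vocabulary, which is the signal the article interprets as poorly filtered training data.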
China’s ‘AI-in-a-box’ products threaten Big Tech’s cloud growth strategies - China's "AI-in-a-box" products are challenging the cloud growth strategies of big tech companies like Alibaba, Baidu, and Tencent by enabling companies to run AI applications on-premises instead of using public cloud services. Huawei is leading this trend by partnering with over a dozen AI start-ups to bundle their large language models with Huawei's AI processors and hardware. This approach caters to the Chinese market's preference for private cloud setups, which is significant due to data protection concerns and government regulations. As a result, the Chinese cloud market is seeing a split, with state-owned enterprises and government entities opting for private cloud solutions, potentially hindering the growth of public cloud services.
Awesome Research Papers
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context - Google has released its latest large language model, Gemini 1.5, which boasts a significant breakthrough in long context understanding by increasing the context window to 1 million tokens. This model, Gemini 1.5 Pro, has been optimized for a wide range of tasks and demonstrates performance comparable to Google's largest model, 1.0 Ultra, while requiring less computation. The model's ability to process large amounts of information, including entire biology textbooks and long videos, has been showcased, with users reporting accurate answers to specific questions and impressive handling of complex data. The technical details of Gemini 1.5 are outlined in a 58-page report, highlighting the model's architecture and the innovations that enabled its enhanced performance.
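As a rough illustration of how a developer might exercise that long context window, here is a minimal sketch using the google-generativeai Python SDK; the API key placeholder, local file name, and prompt are assumptions for the example, not taken from Google's report.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # assumes a key from Google AI Studio

# Upload a long document through the File API, then ask a question that
# requires reading deep into it; the 1M-token window lets the whole file
# sit inside a single prompt.
textbook = genai.upload_file(path="biology_textbook.txt")  # hypothetical local file

model = genai.GenerativeModel("gemini-1.5-pro")
response = model.generate_content(
    [textbook, "In which chapter is cell signalling introduced, and how is it defined there?"]
)
print(response.text)
```

The design point is that no retrieval or chunking pipeline is needed here: the entire document is passed as context in one call.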
Testing theory of mind in large language models and humans - The paper investigates the theory of mind—the capacity to understand others' mental states—in both humans and large language models (LLMs) such as GPT-4 and LLaMA2. The researchers conducted thorough theory of mind tests encompassing tasks like identifying indirect requests, understanding false beliefs, and recognizing irony and faux pas, on two LLM families (GPT and LLaMA2) and 1,907 human participants. The study revealed GPT-4's human-like performance in most areas except for detecting faux pas, where LLaMA2 outperformed humans. However, LLaMA2's superiority was challenged when the study revealed potential bias in the model. The results showed that LLMs can exhibit behavior similar to human mentalistic inference, underscoring the necessity of systematic testing for a true comparison between human and artificial intelligences. The research stresses the importance of such comparisons for understanding both artificial and human cognitive abilities.
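To make this kind of test concrete, here is a minimal sketch of posing a classic false-belief vignette to a model through the OpenAI Python SDK; the vignette wording, model name, and lack of scoring are illustrative assumptions, not the battery used in the paper.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A classic Sally-Anne style false-belief task: the correct answer tracks
# Sally's (false) belief, not the ball's actual location.
vignette = (
    "Sally puts her ball in the basket and leaves the room. "
    "While she is away, Anne moves the ball into the box. "
    "Sally comes back to get her ball."
)
question = "Where will Sally look for her ball first, and why?"

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    messages=[
        {"role": "system", "content": "Answer in one short sentence."},
        {"role": "user", "content": f"{vignette}\n\n{question}"},
    ],
)
print(response.choices[0].message.content)
```

The paper's point about systematic testing is that single anecdotes like this are not enough; judgments require many matched items, human baselines, and checks for response biases.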
Chameleon: Mixed-Modal Early-Fusion Foundation Models - The paper presents Chameleon, a family of early-fusion, mixed-modal foundation models capable of reasoning over and generating interleaved sequences of text and images. Because image and text tokens are handled by a single architecture from the start, the models process and generate multimodal input and output natively rather than attaching a vision component to a text model. Chameleon is part of a broader trend in AI research toward more capable and versatile multimodal models.
DINO 1.5 Pro - Unleashing the Power of Cutting-Edge Computer Vision Technology - Grounding DINO 1.5 is a powerful open-world object detection model series, building upon its predecessor with increased model size and training dataset for enhanced accuracy. It offers two models: Grounding DINO 1.5 Pro for open-set object detection and Grounding DINO 1.5 Edge for edge computing scenarios, prioritizing efficiency and low latency. The model achieves state-of-the-art zero-shot transfer performance on several benchmarks and significantly improves performance when fine-tuned on downstream tasks.
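Grounding DINO 1.5 Pro and Edge are served through IDEA Research's API rather than released as open weights, so as an illustrative stand-in, here is a minimal zero-shot, text-prompted detection sketch using the earlier open-source Grounding DINO checkpoint available in Hugging Face transformers; the checkpoint id, image URL, and thresholds are assumptions for the example.

```python
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForZeroShotObjectDetection

model_id = "IDEA-Research/grounding-dino-tiny"  # open-source predecessor, not 1.5
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForZeroShotObjectDetection.from_pretrained(model_id)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
text = "a cat. a remote control."  # queries are lowercase and end with a period

inputs = processor(images=image, text=text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Map predictions back to labeled boxes in image coordinates.
results = processor.post_process_grounded_object_detection(
    outputs,
    inputs.input_ids,
    box_threshold=0.4,
    text_threshold=0.3,
    target_sizes=[image.size[::-1]],
)
for score, label, box in zip(results[0]["scores"], results[0]["labels"], results[0]["boxes"]):
    print(label, round(score.item(), 3), [round(v) for v in box.tolist()])
```

The open-vocabulary idea is the same as in the 1.5 series: categories are supplied as free-text prompts at inference time rather than fixed at training time.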
Awesome New Launches
Google Maps just got an AI Upgrade - Google Maps has introduced several new features, including Immersive View, which allows users to explore places virtually with photorealistic views and access information like weather forecasts and crowd times. The platform also now offers Multi-Search, enabling users to combine words and images to search for information from millions of local businesses. Additionally, Google Maps provides AI-powered suggestions, such as recommending activities for a rainy day based on the user's location.
Introducing the Frontier Safety Framework - Google DeepMind has introduced the Frontier Safety Framework, a protocol for anticipating and mitigating severe harms that future advanced AI models could cause, which it aims to have fully implemented by early 2025. The Framework is grounded in the concept of Responsible Capability Scaling and comprises three components: identifying Critical Capability Levels (CCLs) that indicate potential for severe harm in domains like autonomy and cybersecurity, evaluating models against those CCLs with "early warning evaluations," and applying tailored mitigation plans when a model is detected reaching a CCL. Initial focus areas include autonomy, biosecurity, cybersecurity, and machine learning R&D. As part of its commitment to responsible AI development aligned with Google’s AI Principles, the company is investing heavily in the science of frontier risk assessment and working with industry, academia, and government to refine the Framework and establish standards for future AI safety.
Improvements to data analysis in ChatGPT - OpenAI is introducing enhancements to data analysis capabilities in ChatGPT, allowing users to interact with tables and charts more effectively and upload files directly from Google Drive and Microsoft OneDrive. These improvements will be available to ChatGPT Plus, Team, and Enterprise users through the new flagship model, GPT-4o, over the coming weeks.
Cool New Tools
One interface, many LLMs - Invisibility - Invisibility provides macOS users with a comprehensive AI app that incorporates a range of language models, including ChatGPT, Claude, Gemini, Perplexity, and more, all available under a single subscription. Emphasizing the importance of having multiple AI models, Invisibility caters to diverse user needs by assessing and integrating different models based on speed, intelligence, and context-window capacity. Users can compare models in a native interface with visual icons that quickly highlight each model's capabilities.