Models, Models, Everywhere!
So many new models, so little time!
Tech Giants and Startups Explore New Consumer Devices with Mixed Reviews
The Humane AI Pin, developed by a team of former Apple employees, is a new device intended to diminish the reliance on smartphones, featuring voice controls and laser-projected imagery onto the user's hand. Priced at $700, the AI Pin attaches magnetically to clothing and offers novel interaction methods, though it struggles with practical issues like overheating, slow response times, and a limited app ecosystem. Despite innovative efforts, the AI Pin faces significant user experience challenges, including cumbersome voice control, a problematic passcode system, and unsatisfactory battery life. The tech industry continues to explore potential smartphone alternatives, but the Humane AI Pin's design and functionality shortcomings suggest that smartphones will remain indispensable for the foreseeable future.
Sponsor
Pinecone’s serverless vector database helps you deliver remarkable GenAI applications faster at up to 50x lower cost. Learn more about Pinecone serverless here.
Adobe Premiere Pro is going all-in on AI — testing Sora, Runway and Pika Labs - Adobe is innovating in the video editing space with new AI-driven features for Premiere Pro, harnessing integrations with third-party AI like OpenAI's Sora, Runway, and Pika Labs. These features, which are in internal testing with no set public release date, aim to streamline video production by enabling editors to easily extend clips, add or remove objects from sequences, and generate b-roll via text prompts. The core technology behind these enhancements is Adobe's Firefly Video generative AI model, trained on licensed content. Recognizing the mixed reactions from the creative community regarding AI tools in video production, Adobe emphasizes that the new features are meant to be complementary rather than a replacement for traditional methods, pushing the boundaries of what smaller creators can achieve.
Adobe’s ‘Ethical’ Firefly AI Was Trained on Midjourney Images - Adobe Inc. developed its Firefly AI image generator, touting it as a more ethical option because it was trained predominantly on Adobe Stock images and public domain content, avoiding the controversial practice of scraping the internet like its competitors. However, it was later revealed that Adobe also used AI-generated images from these same competitors in Firefly's training set, a detail not widely disclosed in its marketing. Despite internal concerns about the ethics of using competitor-generated images, Adobe continues to include such content in Firefly’s training data. This practice has sparked debate over the transparency and ethical standards of Adobe’s AI, challenging its claim to be a more responsible alternative in the generative AI space.
A.I. Has a Measurement Problem - The article highlights a significant issue with leading AI tools such as ChatGPT, Gemini, and Claude—there's a lack of understanding of their true intelligence due to the absence of standardized testing before they are released to the public. Unlike other industries, AI companies aren't mandated to have their products certified, leading to reliance on their own often vague claims of improvement. Experts question the reliability of existing tests for AIs. This gap in measurement and evaluation is problematic, as it leaves users uncertain about which AI tool best suits a specific task, such as coding or image generation.
iOS 18: The latest on Apple's plans for on-device AI - Apple is anticipated to showcase new AI developments at WWDC on June 10, with a strong emphasis on privacy and on-device intelligence, as highlighted by Bloomberg's Mark Gurman. The company is expected to introduce AI features that operate solely on the device, avoiding cloud processing, which could enhance speed and privacy. However, this strategy may have its limitations, such as a lack of access to robust cloud infrastructure that facilitates more complex algorithms. There's also speculation about a potential collaboration with Google for cloud-based AI capabilities and questions about device compatibility, as on-device AI might only work on the newest iPhones. The approach raises intriguing considerations on how Apple will navigate its privacy-centric methods with possible partnerships.
TikTok plots using virtual influencers for advertising - TikTok is exploring an AI feature to create virtual influencers for advertising, which could generate both video scripts and avatars based on advertiser prompts, potentially impacting human creators' ad opportunities. The development is part of a broader trend where companies like TikTok integrate generative AI to enhance business efficiency and marketing strategies, despite some industry skepticism about the real value of these AI investments. Testing indicates that these AI-generated videos currently underperform compared to those created by human influencers, suggesting that virtual influencers might initially complement rather than replace human content creators. As the technology progresses, the use of AI-created content and virtual influencers is expanding, raising concerns about the implications for human creators' earnings and content uniqueness in the long term.
OpenAI Researchers, Including Ally of Sutskever, Fired for Alleged Leaking - OpenAI has dismissed two researchers, Leopold Aschenbrenner and Pavel Izmailov, amid allegations of leaking confidential information. The firings, revealed amid internal conflicts over AI safety standards, followed a turbulent period marked by an unsuccessful attempt to oust CEO Sam Altman, an effort that involved chief scientist Ilya Sutskever, with whom Aschenbrenner was allied. Both Aschenbrenner and Izmailov had roles on a team focused on AI safety, and their dismissal coincides with significant organizational changes after Altman was reinstated as CEO following an exoneration by OpenAI’s board. This controversy unfolds as OpenAI competes in the development of advanced AI models. Aschenbrenner is also connected to the effective altruism movement, which prioritizes addressing AI risks.
OpenAI’s Altman pitches ChatGPT Enterprise to large firms, including some Microsoft customers - Sam Altman, CEO of OpenAI, has been actively engaging with top executives from Fortune 500 companies across San Francisco, New York, and London, presenting and pitching OpenAI's AI services tailored for corporate needs, which include the enterprise version of their popular ChatGPT and new AI technologies like APIs and text-to-video models. These events, not previously reported, demonstrate OpenAI's strategy to diversify and expand its revenue sources by tapping into corporate sectors globally, even competing with Microsoft, its major financial supporter. OpenAI aims to persuade corporate clients by offering direct collaboration, access to the latest AI models, and customized AI solutions, despite questions from some attendees about the benefits of paying for ChatGPT Enterprise when they already use Microsoft's services. The company's push towards enterprise-level engagement is part of a broader goal to meet a projected $1 billion revenue target for 2024, leveraging its high adoption rate among major companies and expanding into new product areas like the Sora video creation tool.
Meta trials its AI chatbot across WhatsApp, Instagram and Messenger in India and Africa - Meta is piloting its generative AI-powered chatbot, Meta AI, within WhatsApp, Instagram, and Messenger for users in India and parts of Africa. This move is viewed as an attempt to leverage Meta's extensive user base and compete with other tech giants in the AI space, particularly after witnessing the public's strong reception to OpenAI's technologies. Aimed at retaining users and investors by showcasing innovation, Meta AI can respond to queries and create photorealistic images from text. Yann LeCun, Meta's chief AI scientist, has noted the shift in public acceptance of AI chatbots, influencing Meta's more open approach to releasing models. The integration of Meta AI across its popular platforms targets massive user engagement, while the upcoming release of Llama 3, a new open-source language model, continues Meta's push into AI.
Amazon adds Andrew Ng, a leading voice in artificial intelligence, to its board of directors - Amazon has appointed AI expert Andrew Ng to its board of directors as part of its strategic push into generative artificial intelligence, amid growing competition in the AI sector. Ng, a notable figure in AI and founder of AI Fund, brings extensive experience from his previous roles at Baidu and Google. His appointment aligns with Amazon's significant AI investments, including a $4 billion stake in startup Anthropic to develop foundational AI models, and the launch of AI-driven products like the Q chatbot and Rufus shopping assistant. Amazon CEO Andy Jassy emphasized generative AI's potential to become a key pillar of Amazon's future, comparable in impact to cloud computing and the internet.
SAG-AFTRA union secures AI protections for artists in deal with major record labels - The SAG-AFTRA union, representing approximately 160,000 media professionals, has secured AI protections for artists in a deal with major record labels. The agreement, which was unanimously approved by the union's executive committee, includes provisions requiring consent and compensation for using a digital replica of an artist's voice in AI-generated songs. The terms "artist," "singer," and "royalty artist" are restricted to human beings under the accord. Additional aspects of the agreement include enhancements to health and retirement benefits and an increase in the percentage of streaming revenue covered by contributions. The member ratification vote is expected in the coming weeks. This development follows last year's negotiations, where AI became a significant point of contention in the entertainment industry.
Awesome Research Papers
Pre-training Small Base LMs with Fewer Tokens - The paper introduces 'Inheritune,' a method for building smaller language models by inheriting the first few transformer blocks from a larger pre-trained model and then training on a much smaller subset of the data. Using this technique, the authors built a 1.5B-parameter model trained on only 1B tokens that performs comparably to base models of 1B-2B size, despite using substantially less training data. In a second setting, smaller models that inherit some layers from GPT-2 variants and are trained on the full dataset matched the validation loss of their larger counterparts. The results are supported by extensive experiments, and the code is available on the authors' GitHub repository.
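To make the "inheriting blocks" idea concrete, here is a minimal sketch in plain Python. It is not the authors' code: the parameter naming scheme (`blocks.<index>.…`), the helper name `inherit_blocks`, and the toy weights are all assumptions made for illustration, and the follow-up training on the smaller data subset is not shown.

```python
import re

def inherit_blocks(large_state, num_blocks, block_pattern=r"^blocks\.(\d+)\."):
    """Build a new parameter dict that keeps only the first `num_blocks` blocks.

    Parameters that do not belong to any block (embeddings, final norm, ...)
    are carried over unchanged. The key layout is hypothetical.
    """
    pattern = re.compile(block_pattern)
    small_state = {}
    for name, weights in large_state.items():
        match = pattern.match(name)
        # Keep non-block parameters, and block parameters below the cutoff.
        if match is None or int(match.group(1)) < num_blocks:
            small_state[name] = weights
    return small_state

# Toy "large model" with 4 blocks; lists stand in for weight tensors.
large = {"embed.weight": [0.1, 0.2]}
for i in range(4):
    large[f"blocks.{i}.attn.weight"] = [float(i)]
    large[f"blocks.{i}.mlp.weight"] = [float(i) + 0.5]
large["final_norm.weight"] = [1.0]

# The smaller model starts from the first 2 of the 4 blocks.
small = inherit_blocks(large, num_blocks=2)
print(sorted(small))
```

In a real framework the same filtering would be applied to the model's saved parameters before continuing training; how many blocks to keep is a choice the sketch leaves to the caller.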
OpenEQA: From word models to world models - Recent advancements in large language models (LLMs) showcase their capacity for understanding language, yet they currently lack real-time awareness of the world. Introducing capabilities for these models to "see" through technologies like smart glasses or home robots could unleash new practical uses. The concept, as highlighted by Jitendra Malik, moves beyond text prediction to constructing comprehensive world models, which is a critical step towards achieving artificial general intelligence (AGI). Embodied Question Answering (EQA) serves as a critical benchmarking tool to test if AI truly grasps the physical environment, paralleling how we assess human comprehension through questioning. OpenEQA represents a cutting-edge benchmark for measuring the progress in embodied AI.
Cohere Compass Private Beta: A New Multi-Aspect Embedding Model - Cohere announces the private beta of Cohere Compass, an embedding model tailored for multi-aspect enterprise data like emails, invoices, and log messages, which are often rich in concepts and relationships. Traditional embedding models convert documents into single vectors, struggling with multi-aspect data, resulting in inaccurate search results. Compass promises to improve this by transforming JSON documents into multi-aspect representations stored in vector databases, maintaining context and relationships. The Compass SDK assists in converting multi-aspect data into JSON, ensuring integrity when indexing and searching. An example shows Compass accurately handling a complex GitHub search query, separating aspects such as time, subject, and type.
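As a rough illustration of why multi-aspect data is awkward to collapse into one representation, the sketch below contrasts a flattened view of a record with a query that checks each aspect against its own field. This is not the Compass SDK or its method: the record fields, the `flatten` and `search_by_aspect` helpers, and the sample data are hypothetical, and plain field matching stands in for embedding-based search.

```python
from datetime import date

# Hypothetical multi-aspect records; field names are invented for illustration.
records = [
    {"type": "pull_request", "subject": "fix tokenizer crash", "created": date(2024, 4, 2)},
    {"type": "issue", "subject": "tokenizer crash on empty input", "created": date(2024, 4, 10)},
    {"type": "pull_request", "subject": "update README", "created": date(2024, 1, 15)},
]

def flatten(record):
    """Single-representation view: every aspect collapsed into one string."""
    return f"{record['type']} {record['subject']} {record['created'].isoformat()}"

def search_by_aspect(records, type_, keyword, since):
    """Aspect-aware view: each part of the query is checked against its own field."""
    return [
        r for r in records
        if r["type"] == type_ and keyword in r["subject"] and r["created"] >= since
    ]

# "Pull requests about the tokenizer since March 2024", split into three aspects.
hits = search_by_aspect(records, "pull_request", "tokenizer", date(2024, 3, 1))
print([flatten(r) for r in hits])
```

Once a record is flattened, the type, subject, and date are just neighboring words in one string, which is the loss of structure the announcement says a multi-aspect representation is meant to avoid.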
Grok-1.5 Vision Preview - Grok-1.5V is a newly introduced multimodal model that interprets visual information, including documents, charts, and real-world images, alongside text, with the aim of connecting the digital and physical worlds. It demonstrates promising capabilities, especially on RealWorldQA, a benchmark for real-world spatial understanding. Reported results vary by area: the model excels in document understanding (85.6%) and lags slightly behind peers on multi-discipline benchmarks (53.6%). RealWorldQA is a collection of over 700 images that challenge models with practical spatial questions and is available under CC BY-ND 4.0. Future developments aim to enhance multimodal understanding and generation, and the team is recruiting to support that work.
Reka Core: Our Frontier Class Multimodal Language Model - Reka introduces Core, a large language model with robust multimodal capabilities that processes images, videos, and audio alongside text. Core offers a 128K context window for deeper information retention and advanced reasoning, excelling in language, mathematics, and coding tasks to facilitate complex workflows. It is pretrained on 32 languages, offering multilingual support. Flexible deployment options include API, on-premises, and on-device to meet various user needs.
Introducing Idefics2: A Powerful 8B Vision-Language Model for the community - The Idefics2 model is a multimodal AI capable of processing both text and images to generate text responses. With its 8 billion parameters, the open-source model surpasses its predecessor Idefics1 in performance, showcasing state-of-the-art results on Visual Question Answering benchmarks, rivaling larger counterparts, and offering improved OCR functionalities. It is readily accessible and easy to fine-tune via the Hugging Face Transformers library.
WizardLM 2 - WizardLM-2, the next generation of the WizardLM family of large language models (LLMs), has been introduced and open-sourced. The models aim to improve performance in complex chat interactions, multilingual capabilities, and reasoning tasks. WizardLM-2 comes in three variants: 8×22B, 70B, and 7B.
Awesome New Launches
Limitless is a new AI tool for your meetings — and an all-hearing wearable gadget - The Limitless Pendant, created by Dan Siroker's company, is an AI-powered device designed for audio recording in daily life, particularly focusing on meetings. It captures audio via a clip-on pendant or a neck-worn device, integrating with cloud and real-time data to provide searchable access to recorded information on any device. The associated Limitless system also works with emails and calendars for a subscription fee, offering meeting transcription and summaries. The company differentiates itself by aiming for depth over breadth in AI application, with future plans for proactive AI assistance. The wearable tech, with 100-hour battery life and consent-based recording, is set to launch for $99 and positions itself as a practical tool, starting by enhancing meeting efficiency before potentially expanding capabilities.