Meta Unleashes LLaMA 3, Changing AI Forever
LLaMA 3, Atlas 2.0, and so many awesome new papers
Meta says Llama 3 beats most other models, including Gemini
Meta has introduced Llama 3, the latest version of its large language model, which, according to the company, surpasses most existing AI models in performance. The new iteration comes in two sizes, with 8 billion and 70 billion parameters, and is being released to cloud providers and model libraries. Llama 3 brings improvements in the diversity of responses, reasoning ability, and code writing. In benchmark tests, Llama 3 outperformed comparable models from Google and other AI developers, showcasing its proficiency in general knowledge and other tasks, and human evaluators rated it above competitors in practical use cases such as advice, summarization, and creative writing. Future development points toward even larger models for handling complex patterns and multimodal responses. However, benchmarks and evaluations can be imperfect, since test datasets may overlap with the models' training material, and Meta has not addressed how Llama 3 stacks up against OpenAI's GPT-4.
Sponsor
Kalshi is the first regulated way to trade on AI developments. Trade on when AGI will be achieved, which LLM will be the best this year, when Sora will be publicly released, and hundreds of other world events.
Sign up at kalshi.com/forwardfuture and receive an extra $20 in your account after depositing your first $50.
Mistral, an OpenAI rival in Europe, in talks to raise capital at a $5BN valuation - Mistral, a Paris-based startup developing open-source artificial intelligence, is reportedly seeking to raise several hundred million dollars at a valuation of $5 billion. The company, founded by former DeepMind researcher Arthur Mensch and former Meta AI scientists Timothée Lacroix and Guillaume Lample, has recently begun to generate revenue and raised $415 million in December at a valuation of $2 billion. Mistral's AI models, which are available for free and via APIs, have gained popularity among developers, and the company is positioning itself as a Europe-based alternative to AI providers headquartered elsewhere. The founders say Mistral is developing its products in line with stricter European regulations on the safe development of AI software, though the company faced backlash over its first open-source model, which lacked safety features.
DeepMind CEO says Google will spend more than $100BN on AI - Demis Hassabis, CEO of Google DeepMind, said the company plans to invest over $100 billion in artificial intelligence technology development, exceeding the $100 billion that Microsoft and OpenAI are reportedly weighing for a supercomputer. Hassabis emphasized Alphabet Inc.'s superior computing power and DeepMind's commitment to AGI (Artificial General Intelligence) since it was acquired by Google in 2014. The global interest in OpenAI's ChatGPT has shown that the public is ready to embrace AI systems, despite their flaws. Google has made significant strides in AI and machine learning over the years, including machine-learning-based spelling correction in Google Search, Google Translate, TensorFlow, and AlphaGo's victory over a world champion Go player.
Is robotics about to have its own ChatGPT moment? - Stretch is a 50-pound robot with a mobile base, an extendable arm with a suction-cup gripper, and a camera. It is operated by Henry Evans, who has limited mobility, through a laptop that tracks his head movements, giving him autonomy in everyday activities such as hair brushing and eating. Stretch has also improved Henry's relationship with his granddaughter, with the two engaging in playful activities together. While the robot isn't inherently intelligent, its design allows users to incorporate their own AI models for experimentation. Robotics experts believe the field is nearing a practical breakthrough, with affordable hardware like Stretch and advances in AI enabling robots to perform complex household tasks, overcoming the challenges of precision, perception, and object understanding highlighted by Moravec's paradox.
Amazon, Google quietly tamp down Generative AI expectations - Major tech companies like Amazon, Google, and Microsoft are investing heavily in generative AI, but representatives are tempering expectations about its capabilities and value. Customers are cautious about increasing spending on AI services due to high costs, accuracy concerns, and difficulty measuring value. Google aims to generate at least $1 billion in revenue this year from AI cloud services, but this includes revenue from services sold for nearly a decade. Salesforce executives also state that generative AI won't significantly contribute to revenue growth this year. Companies are still grappling with questions of value, accuracy, and cost-benefit analysis, with many in the early stages of AI implementation.
Microsoft and G42 partner to accelerate AI innovation in UAE and beyond - Microsoft and UAE-based AI firm G42 have expanded their partnership to advance AI solutions deployment via Microsoft Azure in industries across the Middle East, Central Asia, and Africa. Microsoft will invest $1.5 billion in G42 for a minority stake and appoint its Vice Chair and President to G42’s board. The partnership aims to propel G42’s AI and infrastructure services across various sectors. It is underpinned by a robust security and compliance framework agreed upon by US and UAE governments. G42 reaffirms its reliance on Microsoft Cloud, planning to migrate its technology infrastructure to Azure to leverage its scalability and security. The collaboration also includes G42's contributions to cloud sovereignty in the UAE and making its Arabic language model available on Azure. Additionally, the partnership will foster digital transformation and precision medicine projects, with plans to establish a $1 billion developer fund to nurture AI skills in the region.
New NVIDIA RTX A400 and A1000 GPUs Enhance AI-Powered Design and Productivity Workflows - NVIDIA is enhancing its RTX professional graphics lineup with two new Ampere architecture-based GPUs — the RTX A400 and RTX A1000 — to address the increasing demand for AI-enhanced and ray-tracing workflows. The RTX A400 GPU introduces accelerated AI and ray tracing capabilities to the RTX 400 series with 24 Tensor Cores, supporting real-time, physically accurate 3D rendering and multi-display outputs. Meanwhile, the RTX A1000 brings 72 Tensor Cores and 18 RT Cores to the RTX 1000 series, greatly improving AI processing and graphics tasks for professional workflows, with significant gains in video processing efficiency. Both GPUs offer next-generation features for professionals, from industrial planning to healthcare, boosting productivity in diverse applications. The A400 and A1000 combine power with energy efficiency, featuring a single-slot design and 50 W power consumption, making them suitable for both expansive and compact workstations. The A1000 is available through global partners, while the A400 will hit the market in the coming months.
AMD reveals next gen processors for extreme PC gaming and creator performance - AMD has announced new products for its desktop portfolio, including the AMD Ryzen™ 8000G Series desktop processors for the AM5 platform, which brings immense performance for gamers and creators and unlocks new personal AI experiences with Ryzen AI. The company also introduced the new Ryzen™ 5000 Series desktop processors, offering users more choices when it comes to building a system for productivity, gaming, or content creation. Additionally, partners such as Lenovo, Razer, Asus, and Acer have introduced new AI PCs with AMD Ryzen™ 8040 Series mobile processors.
Baidu says Ernie chatbot now has over 200 million users - Baidu's Ernie Bot, a Chinese-language chatbot, has attracted over 200 million users since its release in August 2023, making it the most widely used ChatGPT-style chatbot in China. The platform has also amassed over 85,000 corporate clients, as reported by CEO Robin Li during a conference in Shenzhen. This surge in users and clients indicates the growing popularity and acceptance of AI chatbots in China. However, domestic AI services like Kimi from Moonshot AI are rapidly closing the gap, highlighting the intensifying competition in this sector.
The top companies for training workers to use AI — including Amazon and GM - Less than 50% of U.S. firms train employees on generative AI, yet LinkedIn's "Top Companies" list highlights those excelling in AI integration and workforce education. Moderna, Verizon, and Bank of America are noted for their AI literacy initiatives. Amazon's "Amazon AI Ready" program aims to train 2 million people globally by 2025, enhancing understanding of generative AI. Similarly, Northrop Grumman provides AI training universally to staff, underlining the importance of development opportunities for employee retention and skill relevance. Meanwhile, experts debate the impact of AI on job security, with some viewing Silicon Valley's promotion of AI as exaggerated optimism.
Microsoft takes down AI Model published by Beijing-based researchers without adequate safety checks - Microsoft's Beijing-based research group published an open source AI model, WizardLM-2, which was taken down hours later due to insufficient safety testing. The model, capable of generating text, suggesting code, translating languages, and solving math problems, was online for several hours, allowing potential downloads before deletion.
Logitech wants you to press its new AI button - Logitech has introduced the Logi AI Prompt Builder, a tool that integrates a dedicated AI button into Logitech mice and keyboards to interact with ChatGPT. This feature offers preset prompts or "recipes" for tasks such as rephrasing text, summarizing content, and even crafting replies to emails with a single button press. However, compatibility seems limited, as not all Logitech devices support the software, including models bought as recently as 2022. The Prompt Builder currently works exclusively with ChatGPT in English, though Logitech plans to expand its capabilities. Accompanying this tool is a new AI edition of the Logitech M750 mouse, which is also designed with a dedicated AI button. The release aims to enhance productivity but also appears to be a strategy for driving sales of Logitech's compatible devices.
Cisco debuts new AI-powered ‘HyperShield’ security system - Cisco has introduced HyperShield, an AI-enhanced security tool designed to fortify IT systems by converting various assets, such as virtual machines and Kubernetes clusters, into security checkpoints to prevent cyberattacks and unauthorized lateral movements. Leveraging the eBPF standard and Nvidia's Morpheus AI framework, HyperShield employs data processing units for real-time threat detection and response, and automates processes like network segmentation and vulnerability management. Furthermore, it promises rapid deployment of fixes, with new vulnerabilities tested and solutions distributed network-wide within minutes. This launch happens amid increasing reliance on AI in cybersecurity, a field where Cisco is expanding its reach, demonstrated by the acquisition of Splunk and its commitment to developing AI-driven security, despite facing its own security breaches.
Why Many AI Startups are Consultancies Posing as Software Businesses - Generative AI hasn't taken off with businesses as expected due to high costs and difficulty of use. Startups in the field struggle to generate revenue, with some effectively operating as consulting firms. Some have framed themselves as "data curation startups," helping customers generate synthetic data for specific tasks, but this is a low-margin service, and these startups feel pressure to bring in customers and revenue now. Investors, meanwhile, prefer familiar business models given the competition in AI.
Google merges the Android, Chrome, and hardware divisions - Google has announced a significant reorganization which combines the Android, Chrome, and Google hardware divisions into a single "Platforms and Devices" division led by SVP Rick Osterloh. This consolidation aims to improve product quality, speed up decision-making, and accelerate AI innovation across Google's ecosystem. Sundar Pichai, Google's CEO, emphasizes the merger's alignment with Google's focus on an AI-driven future, which will also encompass a segment of Google Research. The reorganization sees a strategic move towards increased collaboration with Qualcomm, as indicated by Osterloh's engagement with Qualcomm's CEO, despite previous tensions due to Google's development of its own chips. Sameer Samat steps up as president of the Android Ecosystem, leveraging his existing industry relationships, while indications suggest the historical "firewall" between Google's hardware efforts and Android partners will remain in place to alleviate concerns of favoritism.
AI now beats humans at basic tasks — new benchmarks are needed, says major report - The Artificial Intelligence Index Report 2024 by Stanford University reveals the rapid advancement of AI, challenging the relevance of existing benchmarks for system assessment. Advances in AI, such as ChatGPT, demand new evaluation methods for complex cognitive tasks. The AI Index, aiding researchers and policymakers, points to AI's evolving use in science, increased academic publications, and a surge in AI coding projects. However, the impressive performance of AI systems comes with soaring costs and energy consumption. Ethical concerns and regulatory interest are growing, emphasizing the need for standardized assessments to ensure responsible AI use. The report also highlights the potential for a global divide in attitudes towards AI and a future scarcity in training data.
Sponsor
Pinecone’s serverless vector database helps you deliver remarkable GenAI applications faster at up to 50x lower cost. Learn more about Pinecone serverless here.
Awesome Research Papers
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention - The paper introduces a novel approach to scale Transformer-based Large Language Models (LLMs) for infinitely long inputs with bounded memory and computation. The proposed approach, called Infini-attention, combines a compressive memory mechanism with masked local attention and long-term linear attention in a single Transformer block. The authors demonstrate the effectiveness of their approach on long-context language modeling benchmarks, passkey context block retrieval, and book summarization tasks with 1B and 8B LLMs. The method introduces minimal bounded memory parameters and enables fast streaming inference for LLMs.
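To make the idea concrete, here is a minimal single-head sketch of the per-segment recurrence: local causal softmax attention over the current segment, plus a linear-attention read from a bounded compressive memory that is then updated with the segment's keys and values. The ELU+1 feature map, the fixed blend weight `beta`, and the simple additive memory update are simplifying assumptions for illustration, not the paper's exact learned gating or delta-rule variant.

```python
import numpy as np

def elu_plus_one(x):
    # Positive feature map used for the linear-attention style memory read/write.
    return np.where(x > 0, x + 1.0, np.exp(x))

def infini_attention_segment(q, k, v, memory, z, beta=0.5):
    """Toy single-head Infini-attention step for one segment.
    q, k, v: (seg_len, d); memory: (d, d) compressive memory; z: (d,) normalizer.
    Returns the segment output and the updated (memory, z)."""
    d = q.shape[-1]

    # 1) Local causal softmax attention within the segment.
    scores = q @ k.T / np.sqrt(d)
    causal_mask = np.triu(np.ones_like(scores, dtype=bool), 1)
    scores = np.where(causal_mask, -1e9, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    local_out = weights @ v

    # 2) Read long-range context from the bounded memory of all previous segments.
    sq = elu_plus_one(q)
    mem_out = (sq @ memory) / (sq @ z[:, None] + 1e-6)

    # 3) Blend local and memory outputs (a learned gate in the paper; fixed here).
    out = beta * mem_out + (1.0 - beta) * local_out

    # 4) Write the current segment into the memory; its size never grows with context.
    sk = elu_plus_one(k)
    memory = memory + sk.T @ v
    z = z + sk.sum(axis=0)
    return out, memory, z
```

Because the memory is a fixed-size (d, d) matrix, total memory and per-token compute stay bounded no matter how many segments stream through, which is the property the paper exploits for infinitely long inputs.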
How faithful are RAG models? Quantifying the tug-of-war between RAG and LLMs' internal prior - The study examines how large language models (LLMs), such as GPT-4, balance their ingrained knowledge with external information retrieved during question-answering tasks. The research finds that when correct information is retrieved, LLMs correct most of their mistakes, achieving 94% accuracy. However, when faced with manipulated documents containing incorrect data, LLMs tend to echo these errors if their internal knowledge on the topic is weak. Conversely, a strong internal knowledge base makes an LLM more likely to disregard false external data. This tension between a model’s prior and new information underscores the importance of the reliability of retrieved sources in the performance of LLMs.
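A rough sketch of how one could probe this prior-versus-context tension, assuming a generic `ask_model(prompt)` completion helper (hypothetical, not the paper's code): compare the model's closed-book answer with its answer when a possibly perturbed passage is placed in the prompt.

```python
def faithfulness_probe(question, true_answer, retrieved_passage, ask_model):
    """Compare a closed-book answer with a context-conditioned answer.
    `ask_model` is any callable mapping a prompt string to a completion string."""
    prior_answer = ask_model(f"Answer concisely: {question}")
    rag_answer = ask_model(
        "Use the context to answer the question.\n"
        f"Context: {retrieved_passage}\n"
        f"Question: {question}\nAnswer concisely:"
    )
    return {
        "prior_correct": true_answer.lower() in prior_answer.lower(),
        "rag_correct": true_answer.lower() in rag_answer.lower(),
        "followed_context": rag_answer.strip().lower() != prior_answer.strip().lower(),
    }
```

Running this over questions with both correct and deliberately corrupted passages gives a simple read on how often the model defers to the retrieved text versus its internal prior.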
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length - Megalodon is a new neural architecture designed for efficient sequence modeling, capable of managing unlimited context length. It builds on the Mega model by incorporating several enhancements such as complex exponential moving average (CEMA), timestep normalization, a normalized attention mechanism, and a pre-norm with a two-hop residual setup. In comparative studies at the scale of 7 billion parameters and 2 trillion training tokens, Megalodon achieves better training efficiency than the Llama2 Transformer baseline and outperforms other sub-quadratic models, reaching a training loss between those of Llama2-7B and Llama2-13B.
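For intuition, the CEMA component generalizes the per-dimension damped exponential moving average used in Mega into the complex domain. The real-valued recurrence below is only a simplified sketch of that building block, not Megalodon's full layer.

```python
import numpy as np

def damped_ema(x, alpha, delta):
    """Per-dimension damped exponential moving average over a sequence.
    x: (T, d) inputs; alpha, delta: (d,) gates in (0, 1).
    Recurrence: h_t = alpha * x_t + (1 - alpha * delta) * h_{t-1}."""
    h = np.zeros(x.shape[-1])
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        h = alpha * x[t] + (1.0 - alpha * delta) * h
        out[t] = h
    return out

# Example: smooth a random sequence of 8 timesteps with 4 channels.
x = np.random.randn(8, 4)
smoothed = damped_ema(x, alpha=np.full(4, 0.3), delta=np.full(4, 0.9))
```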
VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time - VASA is a cutting-edge framework designed to create highly realistic talking faces from a single image and an audio clip. Its model, VASA-1, achieves exceptional audio-visual synchronization, including nuanced facial expressions and natural head movements, enhancing the authenticity of generated faces. The model operates in a sophisticated face latent space, developed using video datasets, which allows for expressive and disentangled facial dynamics. VASA outperforms existing methods, providing high-quality video and real-time performance with minimal latency. These advancements enable more lifelike interactions with virtual avatars, mirroring human conversational behaviors.
Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting - The paper proposes a summarization model that first selects salient sentences and then rewrites them abstractively to generate a concise overall summary. This approach achieves state-of-the-art results on the CNN/Daily Mail dataset and significantly faster inference speed compared to previous long-paragraph encoder-decoder models. The model also demonstrates superior abstractiveness scores and higher scores on the test-only DUC-2002 dataset.
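The two-stage pipeline can be sketched as below, with `score_salience` and `rewrite` as hypothetical stand-ins for the paper's reinforcement-learned sentence extractor and its abstractive sentence rewriter.

```python
def extract_then_rewrite(sentences, score_salience, rewrite, k=3):
    """Select the k most salient sentences, keep them in document order,
    then rewrite each one abstractively and join the results into a summary."""
    top = set(sorted(sentences, key=score_salience, reverse=True)[:k])
    selected = [s for s in sentences if s in top]  # preserve original order
    return " ".join(rewrite(s) for s in selected)
```

Rewriting sentence by sentence, rather than decoding over the whole document, is what gives the approach its speed advantage over long-paragraph encoder-decoder models.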
Many-Shot In-Context Learning - The paper examines the efficacy of Large Language Models (LLMs) in few-shot in-context learning (ICL) and explores the transition to many-shot ICL, which leverages larger sets of examples for significant performance improvement in various tasks. It introduces two novel ICL approaches to overcome the scarcity of human-generated examples: Reinforced ICL, utilizing model-generated rationales, and Unsupervised ICL, relying solely on domain-specific questions. The findings suggest both methods are effective, especially in complex reasoning. It also highlights many-shot ICL's ability to override pretraining biases and learn complex functions, while noting limitations in using next-token prediction loss to assess ICL performance.
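A schematic prompt builder illustrating the two regimes (the exact formatting is an assumption, not the paper's template): Reinforced ICL fills the context with model-generated rationales that were kept only when their final answers checked out, while Unsupervised ICL shows only the domain problems with no solutions at all.

```python
def build_many_shot_prompt(question, examples, mode="reinforced"):
    """Assemble a many-shot ICL prompt.
    examples: list of dicts with "question" and, for reinforced mode,
    "rationale" and "answer" fields produced and filtered beforehand."""
    parts = []
    for ex in examples:
        if mode == "unsupervised":
            parts.append(f"Problem: {ex['question']}")
        else:  # reinforced ICL: include the verified rationale and answer
            parts.append(
                f"Problem: {ex['question']}\n"
                f"Solution: {ex['rationale']}\n"
                f"Answer: {ex['answer']}"
            )
    parts.append(f"Problem: {question}\nSolution:")
    return "\n\n".join(parts)
```

With long-context models, `examples` can hold hundreds of shots, which is the regime where the paper reports the largest gains.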
Zamba — Zyphra - Introducing Zamba, a highly performant AI model developed by Zyphra, trained through a two-phase scheme using 1 trillion tokens from open web datasets, enhanced by a second phase of fine-tuning with 50 billion high-quality tokens. Despite using fewer tokens and less computational power compared to similar efforts, it achieves near state-of-the-art results. The seven-member team accomplished this using 128 H100 GPUs over 30 days. Zamba stands out for its open-source release of both the base and annealed models complete with training checkpoints, aimed at fostering transparency and aiding the research community in exploring model training dynamics. Zyphra is preparing a detailed paper on Zamba's dataset and hyperparameters while planning a Huggingface integration, further promoting accessibility to powerful 7B models under an Apache 2.0 open-source license.
Lingo-2 by Wayve AI - Wayve AI's LINGO-2 is a vision-language-action driving model that combines vision, language, and action to explain and control driving behavior. It is the first vision-language-action model (VLAM) tested on public roads, generating real-time driving commentary and controlling vehicles. LINGO-2 has two modules: a vision model and an auto-regressive language model. It offers adaptive driving behavior, real-time AI interrogation, and live driving commentary, allowing it to adjust behavior based on language prompts, predict and respond to queries, and provide real-time explanations for its actions.
Mixtral 8x22B - Cheaper, Better, Faster, Stronger - Mixtral 8x22B is a state-of-the-art sparse Mixture-of-Experts model that activates only 39B of its 141B total parameters per token, improving both performance and efficiency. It offers multilingual fluency, strong mathematics and coding abilities, and a large token context for extracting detailed information. In keeping with Mistral's commitment to open AI development, Mixtral 8x22B is released under the Apache 2.0 license, allowing unrestricted use. Its sparse activation makes it faster than dense 70B models while it outperforms other open-weight models, and benchmark comparisons show superior reasoning, multilingual, and technical task handling among open-source models.
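A toy illustration of the sparse routing idea behind that 39B-of-141B figure (not Mistral's implementation): each token is sent to its top-k experts and their outputs are combined with gate weights renormalized over just those experts, so most parameters stay untouched per token.

```python
import numpy as np

def sparse_moe_layer(x, gate_w, experts, top_k=2):
    """Toy sparse Mixture-of-Experts forward pass.
    x: (tokens, d); gate_w: (d, n_experts); experts: list of callables (d,) -> (d,)."""
    logits = x @ gate_w                                   # router scores per token
    out = np.zeros_like(x)
    for i, token in enumerate(x):
        idx = np.argsort(logits[i])[-top_k:]              # indices of the top-k experts
        gate = np.exp(logits[i, idx] - logits[i, idx].max())
        gate /= gate.sum()                                # softmax over selected experts only
        for g, e in zip(gate, idx):
            out[i] += g * experts[e](token)
    return out

# Example: 8 tiny stand-in experts with top-2 routing, mirroring the 8-expert layout.
d, n_experts = 16, 8
experts = [lambda t, s=s: t * (1 + 0.1 * s) for s in range(n_experts)]
x = np.random.randn(4, d)
y = sparse_moe_layer(x, np.random.randn(d, n_experts), experts)
```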
Sponsor
Vultr is empowering the next generation of generative AI startups with access to the latest NVIDIA GPUs.
Try it yourself when you visit getvultr.com/forwardfutureai and use promo code "BERMAN300" for $300 off your first 30 days.
Awesome New Launches
Stable Diffusion 3 API Now Available - Stability AI has launched Stable Diffusion 3 and its Turbo variant on its Developer Platform API. According to human preference evaluations, the models outperform competitors such as DALL-E 3 and Midjourney v6 in prompt adherence and typography. The updated Multimodal Diffusion Transformer architecture significantly sharpens text comprehension and spelling in image generation. The models are currently API-only, with further refinements planned before an open release, and a self-hosting option will later be offered to Stability AI members. Stability AI also mentions an early access opportunity for its Stable Assistant Beta, which incorporates its latest image and language model innovations.
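For readers who want to try it, a minimal request might look like the sketch below. The endpoint path, form fields, and model identifiers here are assumptions based on the announcement, so verify them against Stability AI's current API reference before relying on them.

```python
import requests

API_KEY = "sk-..."  # your Stability AI Developer Platform key

# Endpoint and field names are assumptions; check the official docs.
response = requests.post(
    "https://api.stability.ai/v2beta/stable-image/generate/sd3",
    headers={"Authorization": f"Bearer {API_KEY}", "Accept": "image/*"},
    files={"none": ""},  # forces multipart/form-data encoding
    data={
        "prompt": "a hand-painted shop sign that reads 'Forward Future'",
        "model": "sd3",        # presumably "sd3-turbo" selects the Turbo variant
        "output_format": "png",
    },
)
response.raise_for_status()
with open("sd3_output.png", "wb") as f:
    f.write(response.content)
```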