Forward Future Daily
🧑‍🚀 How AI’s Data Origins Are Fueling Big Tech’s Power Grab

Big Tech’s data dominance, OpenAI’s for-profit pivot, GitHub’s free Copilot, Drax’s wood-powered AI energy, and OpenAI’s AI hotline reshape the future of AI and accessibility.

Matthew Berman
December 19, 2024

Good morning, it’s Thursday! Today, we’re cutting through the noise on AI’s latest moves: Drax plans to fuel AI with wood-burning power plants, OpenAI pivots to a for-profit model, and GitHub gives developers free access to Copilot.

Plus, OpenAI’s new AI hotline is live—yes, you can now call ChatGPT from a landline (if you still have one…). Let’s get into it.

🗞️ YOUR DAILY ROLLUP

How AI’s Data Origins Are Consolidating Power in Big Tech

The Recap: A new study by the Data Provenance Initiative finds that the data powering AI models comes overwhelmingly from a narrow set of platforms, largely in the Western world, concentrating power in a few dominant tech companies. The trend raises questions about fairness and highlights the systemic barriers that emerging markets and independent developers face in shaping AI innovation.

Highlights:

The Data Provenance Initiative audited nearly 4,000 public datasets and found that over 70% of video data comes from YouTube, giving Alphabet (Google’s parent company) disproportionate control over critical AI training resources.
Data collection has shifted from the curated, diverse sources used earlier in the 2010s to indiscriminate web scraping since 2018, prioritizing scale over quality and diversity.
Exclusive data-sharing deals between AI giants, such as OpenAI, and publishers further consolidate power, restricting access for smaller players and researchers.
Many datasets carry restrictive licenses, raising legal and ethical concerns such as the risk of unknowingly training models on copyrighted material.
Over 90% of AI training data originates from North America and Europe, with fewer than 4% from Africa, highlighting a stark geographical imbalance.
The dominance of English in datasets risks perpetuating US-centric cultural biases in AI outputs, sidelining diverse global perspectives.
Researchers emphasize that designing datasets intentionally, rather than treating data as a “natural resource,” is crucial to mitigating these disparities.

Forward Future Takeaways:
The findings underscore an urgent need for transparency and equitable access to training data to prevent further concentration of AI development power. Without proactive steps to diversify data sources and ensure representation, AI risks amplifying Western-centric biases while sidelining smaller players and global voices. Policymakers, researchers, and the tech industry must collaborate to enforce ethical standards and foster a more inclusive AI ecosystem that serves the global community fairly. → Read the full article here.

👾 FORWARD FUTURE ORIGINAL

What’s Next for Enterprise AI?

What if you could ask your entire company's knowledge base a question and get an immediate, actionable answer? That's the future of Enterprise AI — and MindsDB is making it happen.

In our latest interview, MindsDB CEO Jorge Torres explores the evolution of enterprise AI, the transformative power of open source, and the real-world innovations redefining business operations. For leaders looking to harness AI or those curious about the next big wave in enterprise technology, Jorge provides valuable insights and guidance.

The Vision Behind MindsDB: Sci-Fi Inspiration Meets Enterprise Challenges

The inspiration for MindsDB stems from the sci-fi book The Player of Games by Iain M. Banks. As Jorge explains, the book introduces intelligent systems, or “Minds,” that collaborate with humans to achieve extraordinary goals. This vision of human-AI collaboration became the foundation for MindsDB’s mission.

“We were inspired by the idea that AI could work alongside humans to solve complex problems,” Jorge says. For enterprises, where data is scattered across countless systems, this collaboration is particularly crucial. MindsDB was built to bridge these silos, enabling businesses to surface answers, automate workflows, and transform raw data into meaningful insights.

Solving the Data Chaos Problem

Jorge frequently encounters what he calls the “data overload paradox.” As he explains, “The longer a company exists, the more data it collects. But sometimes having too much data is the same as having no data at all.”

Many businesses understand the potential of their data but struggle to access it efficiently. → Get the full interview here.

♟️ GOOGLE’S STRATEGY

Sundar Pichai on Google’s AI Strategy, Antitrust, and Future Challenges

The Recap: At the DealBook Summit, Google CEO Sundar Pichai discussed key challenges and opportunities for Google, including its position in the AI race, antitrust lawsuits, and the ethics of data usage in training AI models. Pichai expressed optimism about AI's transformative potential while defending Google's strategy in a rapidly changing tech landscape.

Highlights:

Google leverages proprietary infrastructure, products like YouTube and Gmail, and breakthroughs like Nobel Prize-winning research to lead in compute, data, and algorithms.
Amid ongoing lawsuits, Pichai defended Google, expressed faith in the judicial process, and hinted that some Alphabet subsidiaries could be spun off in the future.
Pichai expressed optimism about infrastructure improvements under the incoming second Trump administration, seeing potential for AI progress in energy and tech.
Over 25% of new Google code is AI-generated but engineer-reviewed, potentially broadening programming accessibility while boosting productivity.
Pichai advocated leveraging existing regulatory frameworks rather than introducing excessive new laws to govern AI applications.
Pichai emphasized a mission-first workplace culture while addressing Google’s move away from political debates within the company.
Pichai acknowledged evolving business models around data use, suggesting a future where creators are compensated for contributions to AI training sets.

Forward Future Takeaways:
Google’s leadership in AI reflects its vast resources, advancements, and strategic approach but raises critical questions about ethics, fairness, and corporate power in tech. As regulatory scrutiny intensifies and AI's role in reshaping industries deepens, Google’s stance on balancing innovation with responsibility and societal impact will likely set a benchmark for the sector. → Read the full article here.

🛰️ NEWS

Looking Forward


🛡️ Lockheed Launches Astris AI for Defense: Lockheed Martin has created Astris AI to help defense firms adopt artificial intelligence while addressing the sector's data-sensitivity concerns.

⚡ Exxon Eyes Data Centers with New Power Plant: Exxon Mobil plans a natural gas power plant with 90% carbon capture to supply data centers, targeting tech-driven demand for lower-carbon power.

❤️‍🩹 Skidmore Students Explore AI in Therapy: A psychology seminar dives into AI in mental health, sparking debates on tech's potential to aid therapy and its human limits.

💻 Microsoft Leads NVIDIA AI Chip Purchases: Microsoft snapped up 485,000 NVIDIA Hopper chips in 2024—double Meta’s haul—as AI competition drives massive investment in server infrastructure.

🎨 Millionaire AI Artist Gets a Personality: Botto, a decentralized AI artist whose taste is guided by its community, now has conversational abilities, sparking debate on AI’s role in creativity.

🔬 RESEARCH PAPERS

Google DeepMind Unveils Transformative AI Research at NeurIPS 2024

At NeurIPS 2024, Google DeepMind is showcasing groundbreaking advances in AI, including tools for adaptive agents, 3D content creation, and more efficient LLM training. Highlights include AndroidControl, a dataset of 15,000 demonstrations across 800 apps that helps AI agents learn to interact with user interfaces. CAT3D can generate 3D content from as little as a single image, and SDF-Sim streamlines how AI models represent and interact with objects, making applications such as robotics more efficient.

Innovations in LLMs include Time-Reversed Language Models, which improve responses and citations, and JEST, an algorithm that makes training more data-efficient. With over 100 papers and live demos, DeepMind emphasizes safer, smarter AI development while fostering global collaboration. → Read the full story here.

📽️ VIDEO

NVIDIA Unveils Nano Supercomputer for Only $249

In Matt's latest video, he showcases NVIDIA's Jetson Orin Nano, a compact AI supercomputer that delivers up to 67 trillion operations per second (TOPS) for just $249. Designed for edge computing, the device runs large language models locally, giving robots, IoT devices, and more powerful AI capabilities without relying on the cloud.

🧰 TOOLBOX

Tools for Holidays, Productivity, and Creative Design

Tavus | AI Santa: Tavus' AI Santa offers personalized conversations, spreading festive cheer and customized messages for all ages.

Deta Surf | All-in-One Platform: Deta Surf combines browsing, file management, and AI assistance in one place, keeping data stored locally for privacy.

Pika 2.0 | Creativity Redefined: Pika 2.0's Scene Ingredients give artists precise control over the elements of AI-generated scenes through intuitive, community-driven tools.

🗒️ FEEDBACK

Help Us Get Better

What did you think of today's newsletter?

Reply to this email if you have specific feedback to share. We’d love to hear from you.

🤠 THE DAILY BYTE

SwagBot: The Robo-Cowboy Giving Herding a High-Tech Makeover

CONNECT

Stay in the Know

Follow us on X for quick daily updates and bite-sized content.
Subscribe to our YouTube channel for in-depth technical analysis.

Prefer using an RSS feed? Add Forward Future to your feed here.

Thanks for reading today’s newsletter. See you next time!

The Forward Future Team
🧑‍🚀 🧑‍🚀 🧑‍🚀 🧑‍🚀
