OpenAI transcribed Google's YouTube videos to train AI models: Report



OpenAI reportedly transcribed over one million hours of YouTube videos to collect training data for its advanced GPT-4 model, disregarding the Google-owned platform’s copyright rules. According to a report by The New York Times, Microsoft-backed OpenAI used its in-house speech recognition tool, Whisper, to transcribe audio from YouTube videos into conversational text, which was then used to train the AI model that powers ChatGPT.


According to the report, the makers of ChatGPT internally discussed how using YouTube data for training might be against the platform’s policy. The company reportedly opted to use data from YouTube videos because it had exhausted its reservoir of publicly available data. The report stated that OpenAI’s president, Greg Brockman, personally assisted in selecting videos for transcription.


Google prohibits the use of videos posted on YouTube for applications that are “independent” of the video platform.


In a statement to The Verge, OpenAI spokesperson Lindsay Held said that the company uses “unique” datasets for each of its models to “help their understanding of the world”. She added that the company uses “numerous sources including publicly available data and partnerships for non-public data.”


Commenting on the topic, Google spokesperson Matt Bryant told The Verge that Google had “seen unconfirmed reports” related to OpenAI using YouTube videos for training AI models. He added that the streaming platform’s “Terms of Service and robots.txt files prohibit unauthorised scraping or downloading of YouTube content.”
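Bryant’s mention of robots.txt refers to the standard mechanism websites use to tell crawlers which paths are off limits. As a rough sketch of how a well-behaved crawler would consult such rules (the paths below are hypothetical, not YouTube’s actual robots.txt), Python’s standard library can parse and query them:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for illustration; not YouTube's actual file.
rules = [
    "User-agent: *",
    "Disallow: /private_video_data",
]

parser = RobotFileParser()
parser.parse(rules)

# A compliant crawler checks each URL against the rules before fetching it.
print(parser.can_fetch("MyBot", "https://example.com/private_video_data"))  # False
print(parser.can_fetch("MyBot", "https://example.com/watch"))               # True
```

The dispute described in the report turns on whether such rules, together with the Terms of Service, were honoured when the videos were downloaded and transcribed.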


Earlier this week, YouTube CEO Neal Mohan said in an interview with Bloomberg that he had “seen reports” related to OpenAI using YouTube videos to train its text-to-video generator Sora. He said that he had no information on the matter, but that it would be a “clear violation” of the platform’s policies if OpenAI had done so.


According to the report by The New York Times, Google has also used transcribed text from YouTube videos to train its AI model Gemini. If true, this could violate the copyrights to the videos, which belong to the creators who posted them. The report stated that Google broadened its terms of service to allow the company to use publicly available Google Docs files, restaurant reviews on Google Maps, and more for training AI models.



First Published: Apr 08 2024 | 12:07 PM IST






Apple to launch iPad Pro, Air in May; working on foldable models for future



Representative Image: iPad Pro with M2

Apple is reportedly on schedule to launch the iPad Pro and iPad Air on May 6. Later in the year, these two models would be joined by the iPad mini and the iPad, as Apple plans to refresh the entire iPad portfolio by the end of 2024. The US-based technology giant is also exploring a foldable form factor, but the work on this is in its early stages and not planned for 2024.


iPad Pro and iPad Air: What to expect


Among the core upgrades for the iPad Pro and iPad Air would be the M-series chip. The iPad Pro is expected to boast the latest M3 chip, while the iPad Air would feature the M2 chip.


Bloomberg has reported that Apple is planning to launch the new iPad models on May 6 along with a new Magic Keyboard and an Apple Pencil. According to the report, Apple retail stores are expected to receive product marketing materials for the upcoming iPad Pro and iPad Air models by the end of this week, suggesting that the products might launch in the coming weeks.


According to Bloomberg’s Mark Gurman, the upcoming iPad Pro would likely get a major overhaul in both specifications and design. An OLED display is said to become the new standard on the iPad Pro, which could result in a price jump compared to the current-generation model.


iPad Pro (2024): Expected specifications


  • Processor: M3 chipset

  • Display: OLED display (11-inch / 13-inch), thinner bezels

  • Design: Redesigned rear camera module, front camera in landscape orientation

  • Other: MagSafe wireless charging support


iPad Air (2024): Expected specifications


  • Processor: M2 chipset

  • Display: New 12.9-inch display option

  • Design: Redesigned rear camera module

  • Other: Wi-Fi 6E and Bluetooth 5.3 support


iPad and iPad mini: What to expect


Besides the iPad Pro and iPad Air, Apple is planning to launch the iPad mini and iPad. These two, however, are set for release later this year. According to Bloomberg’s report, neither device is expected to be a major update. The upcoming iPad mini would likely get only a processor upgrade and no design changes, and the base iPad would likely be an affordable version based on the 10th-generation model from 2022.


Foldable iPad


Apple is reportedly exploring the prospect of foldable iPads, and according to Bloomberg, the company is actively working on the project. The effort is said to be in its early stages of development: Bloomberg reported that Apple is currently figuring out ways to create foldable screens without any visible crease and might even cancel the project if it is unable to solve the problem.


Earlier, analyst Ming-Chi Kuo also reported that the company has a “clear development schedule” for a 20.3-inch foldable MacBook, which he expects to enter mass production by 2027. Another technology insider, Revegnus, posted on X that the upcoming foldable product from Apple will combine the MacBook form factor with the functionality of an iPad. He said that the device would function like a MacBook and could be used as an alternative to the iPad when folded.

First Published: Apr 08 2024 | 11:14 AM IST




How tech giants cut corners to harvest data for artificial intelligence



By Cade Metz, Cecilia Kang, Sheera Frenkel, Stuart A Thompson & Nico Grant

In late 2021, OpenAI faced a supply problem. The artificial intelligence lab had exhausted every reservoir of reputable English-language text on the internet as it developed its latest AI system. It needed more data to train the next version of its technology — lots more. So OpenAI researchers created a speech recognition tool called Whisper.


It could transcribe the audio from YouTube videos, yielding new conversational text that would make an AI system smarter. Some OpenAI employees discussed how such a move might go against YouTube’s rules, three people with knowledge of the conversations said.


YouTube, which is owned by Google, prohibits use of its videos for applications that are “independent” of the video platform.


Ultimately, an OpenAI team transcribed more than one million hours of YouTube videos, the people said. The team included Greg Brockman, OpenAI’s president, who personally helped collect the videos, two of the people said.


The texts were then fed into a system called GPT-4, which was widely considered one of the world’s most powerful AI models and was the basis of the latest version of the ChatGPT chatbot. The race to lead AI has become a desperate hunt for the digital data needed to advance the technology.


To obtain that data, tech companies including OpenAI, Google and Meta have cut corners, ignored corporate policies and debated bending the law, according to an examination by The New York Times.


At Meta, which owns Facebook and Instagram, managers, lawyers and engineers last year discussed buying the publishing house Simon & Schuster to procure long works, according to recordings of internal meetings obtained by The Times. They also conferred on gathering copyrighted data from across the internet, even if that meant facing lawsuits. Negotiating licenses with publishers, artists, musicians and the news industry would take too long, they said.


Like OpenAI, Google transcribed YouTube videos to harvest text for its AI models, five people with knowledge of the company’s practices said.

That potentially violated the copyrights to the videos.


Last year, Google also broadened its terms of service. One motivation for the change, according to members of the company’s privacy team and an internal message viewed by The Times, was to allow Google to be able to tap publicly available Google Docs, restaurant reviews on Google Maps and other online material for more of its AI products.


The companies’ actions illustrate how online information — news stories, fictional works, message board posts, Wikipedia articles, computer programmes, photos, podcasts and movie clips — has increasingly become the lifeblood of the booming AI industry.


Creating innovative systems depends on having enough data to teach the technologies to instantly produce text, images, sounds and videos that resemble what a human creates. The most prized data, AI researchers said, is high-quality information, such as published books and articles, which have been carefully written and edited by professionals. For years, the internet — with sites like Wikipedia and Reddit — was a seemingly endless source of data. But as AI advanced, tech firms sought more repositories. Google and Meta, which have billions of users who produce search queries and social media posts every day, were limited by privacy laws and their policies from drawing on much of that content for AI.


Tech companies could run through high-quality data on the internet by 2026, according to Epoch, a research institute. The firms are using data faster than it is being produced. “The only practical way for these tools to exist is if they can be trained on massive amounts of data without having to license it,” Sy Damle, a lawyer who represents Andreessen Horowitz, a Silicon Valley venture capital firm, said of AI models last year.


©2024 The New York Times News

First Published: Apr 07 2024 | 11:33 PM IST




Data centre operator Yotta looks to expand its GPU capacity by March 2025




Data centre operator Yotta is planning to further expand its compute capacity to 32,000 graphics processing units (GPUs) from Nvidia by March 2025, months after placing orders for 16,000 high-powered GPUs from the US-based chip giant.


“We are having a soft commitment from Nvidia, and these chips will be delivered to us by 2025,” said Sunil Gupta, chief executive officer, Yotta Data Services.


The company has already invested around $1 billion to procure 16,000 H100 GPUs and develop infrastructure for servers, including associated network, storage, software, and additional layers. While Gupta did not give details of the funds required for the additional chips, he said that the capital requirements for the project would be backed by the promoters and that the company might also seek financing from banks.


The Hiranandani group company’s first GPU-based data centre, which is expected to go live by May 15, 2024, will provide compute capacity for Artificial Intelligence (AI) processing, powered by the first batch of 4,096 GPUs it has received.


Gupta, who is bullish on market demand for GPUs in India, said that the market for high-performance compute in the country is expected to grow significantly, and that these are still early days for the sector.


On the demand front for GPU servers, he said that Yotta’s first slot of more than 4,000 GPUs had already been booked by enterprises and was expected to go live by May 15, 2024.


He also said that apart from serving India, the company was also looking to cater to the global demand.


“We can serve not only our respective country but also the nearby geographies. So our growth pattern in terms of sourcing GPUs possibly will not stop, simply because we are just starting,” he said.


Gupta further said that because of the global shortage of GPUs, there was a demand from European regions as well. “I can see the demand going up globally in the last four to five months, and a bigger part of my sales funnel today is actually requirements from Europe, the APAC region, and from the Middle East,” he added.


The company is aiming to compete with hyperscalers like Amazon and Meta by providing GPU servers at a competitive price of $2 – $2.5 per hour.


Contrasting this with global rates for GPU servers, Gupta said that the company was looking to make money through economies of scale and right pricing.


“If you go to hyperscalers today, which many of the startups have done, you will get GPUs at somewhere between $9 and $12 per hour. Similarly, in the US, for specialised GPU providers like CoreWeave and Lambda, the prices range between $3 and $5. My price point for the same thing is between $2 and $2.5,” Gupta said.


He added that pricing can vary depending on the contract: short-term GPU usage might cost more per hour or week, whereas committing to a longer contract with upfront payments could bring the price down to as low as $1.8 per hour of usage.


Gupta said that while traditional cloud and hosting services were typically taken up by enterprises and by large cloud operators such as hyperscalers seeking co-location services, these were not the market for GPU servers, at least in the first phase of GPU subscriptions.


He added that for the next year or so, GPU subscriptions would mostly be taken by entities such as startups and research labs that want to train AI models.


He also highlighted the need for ‘intermediaries’ between data-centre operators and enterprises who can help companies understand the use cases and the changes that AI can bring to an organisation.


“We require companies like Accenture, Deloitte, and others to engage with enterprises, grasp their business situations and challenges, and demonstrate how AI could potentially boost their productivity. Subsequently, they can identify use cases, utilise pre-trained models, train them using enterprise data to tailor specific models for each enterprise, and integrate these models into their existing applications,” he explained.

First Published: Apr 07 2024 | 5:37 PM IST




AI can drive efficiency, raises mkt concentration concerns: CCI chief Kaur



Kaur also stressed that the regulator is closely monitoring these developments to ensure that the competition framework remains robust and capable of addressing these new dynamics | File image


Artificial intelligence and machine learning can drive efficiency and innovation but also raise concerns related to market concentration and potential anti-competitive behaviour, according to CCI chief Ravneet Kaur.


The watchdog CCI, which has the mandate to curb anti-competitive practices and foster fair competition, will soon commission a study to look at all aspects of artificial intelligence (AI).


The request for proposal (RFP) for inviting agencies to conduct the study is expected to be issued in the coming weeks.


In a recent interview with PTI, Competition Commission of India (CCI) Chairperson Ravneet Kaur said the study will also look at how AI can be used by the regulator itself.


“The rise of AI and Machine Learning (ML) presents both opportunities and challenges in the context of competition law.


“These technologies can drive efficiency, personalisation and innovation but also raise concerns related to market concentration and potential anti-competitive behaviour,” she said.


Kaur also stressed that the regulator is closely monitoring these developments to ensure that the competition framework remains robust and capable of addressing these new dynamics.


About the study, she said the aim is to explore how AI is reshaping market dynamics, pinpoint challenges AI poses to existing competition law frameworks and formulate policies that address AI’s implications on competition effectively.


The CCI has been taking enforcement actions and advocacy measures, among other steps, to tackle anti-competitive practices.

(Only the headline and picture of this report may have been reworked by the Business Standard staff; the rest of the content is auto-generated from a syndicated feed.)

First Published: Apr 07 2024 | 1:16 PM IST




China may misuse AI to target polls in countries like India, US: Microsoft




China is likely to deploy Artificial Intelligence-generated content via social media to sway public opinion to boost its geopolitical interests during elections in countries like India, South Korea and the US, tech giant Microsoft has warned.

Voting for 543 Lok Sabha seats in India will take place between April 19 and June 4, spread across seven phases. South Koreans will go to the polls in a general election on April 10, while the US will hold its presidential election on November 5.


“With major elections taking place around the world this year, particularly in India, South Korea and the United States, we assess that China will, at a minimum, create and amplify AI-generated content to benefit its interests,” Clint Watts, General Manager, Microsoft Threat Analysis Center, said in a blog post.


Although the chances of such content affecting election results remain low, China’s increasing experimentation in augmenting memes, videos, and audio will likely continue and may prove more effective down the line, he said.


China will do so along with North Korea, he wrote.


These are among the Microsoft Threat Intelligence insights in the latest East Asia report published on Wednesday by the Microsoft Threat Analysis Center (MTAC).


China is using fake social media accounts to poll voters on what divides them most to sow division and possibly influence the outcome of the US presidential election in its favour.


China has also increased its use of AI-generated content to further its goals around the world.


North Korea has increased its cryptocurrency heists and supply chain attacks to fund and further its military goals and intelligence collection. It has also begun to use AI to make its operations more effective and efficient.


Beijing will celebrate the 75th anniversary of the founding of the People’s Republic of China in October, and North Korea will continue to push forward key advanced weapons programmes, the report said.


“Meanwhile, as populations in India, South Korea, and the United States head to the polls, we are likely to see Chinese cyber and influence actors, and to some extent, North Korean cyber actors, work toward targeting these elections,” it said.


China will, at a minimum, create and amplify AI-generated content that benefits its positions in these high-profile elections, it said.


“While Chinese cyber actors have long conducted reconnaissance of US political institutions, we are prepared to see influence actors interact with Americans for engagement and to potentially research perspectives on US politics,” the report said.


“Finally, as North Korea embarks upon new government policies and pursues ambitious plans for weapons testing, we can expect increasingly sophisticated cryptocurrency heists and supply chain attacks targeted at the defence sector, serving to both funnel money into the regime and facilitate the development of new military capabilities,” it added.

(Only the headline and picture of this report may have been reworked by the Business Standard staff; the rest of the content is auto-generated from a syndicated feed.)

First Published: Apr 06 2024 | 1:37 PM IST



