“Harvard Unleashes Humongous AI Training Dataset, Where OpenAI and Microsoft Foot the Bill, At Zero Cost to You!”
“Harvard Is Releasing a Massive Free AI Training Dataset Funded by OpenAI and Microsoft”
“Microsoft and OpenAI created a large-scale dataset to teach AI systems language ‘understanding.’ To do that, they scraped information from much of the internet. But in doing so, they ingested substantial amounts of copyrighted information, including, it turns out, some 136,000 pieces of text from the Harvard Crimson.”
Ah! The delightful marriage of tech giants Microsoft and OpenAI! You know, the ones who cooked up a gigantic dataset in the name of teaching artificial intelligence to ‘comprehend’ human language. The method was fairly simple: just make a virtual run on most of the internet. It does seem, though, that they accidentally stomped over a sea of copyrighted content in the thrilling race to gather data. Among the collateral damage we find about 136,000 tidbits from the Harvard Crimson. Oops!
The publisher of the Harvard Crimson, by the way, was pleasantly surprised to see their precious content comfortably residing in the dataset. Surprised in the “Hey, don’t I know you from somewhere?” manner! To quote them, “It’s as if they held a mirror up to the internet and our websites just happened to be reflected.” Quite poetic, eh?
Now, by the generous ethos of OpenAI, the model trained on this dataset, adorably christened ‘GPT-3’, is publicly available. Not without certain restrictions, of course! So folks, do not rush off on a joyous download spree imagining you will suddenly own the internet. Fair play, Microsoft and OpenAI. Keeping the commons to the commoners.
Here’s the cherry on top: GPT-3, as it turns out, isn’t as impeccable as it was touted to be. So brace yourself. To the delight of all the fans waving their ‘Tech Will Not Overcome Human Intelligence’ banners, GPT-3 has its limitations. It fails to tell fact from fiction, occasionally produces biased content and, heaven forbid, even promotes dangerous behaviour.
Tailor-made for this fast-paced digital era, isn’t it? Another manifestation of the maxim: if you’re going to err, do it at scale. Stellar work, OpenAI and Microsoft! The world thanks you for this enlightening revelation: with great data comes great responsibility. And a higher probability of blunders, just to keep things amusing. Let’s not forget the outraged Harvard Crimson editors thrown into the package. Well done, Artificial Intelligence. Well done, indeed.