Friday, February 14, 2025

Artificial Intelligence Progress, Problems, and Perceptions: Two Months’ Worth

Recently, the DeepSeek kerfuffle has dominated the AI news.  But since late December, other developments, beyond mere assertions of future success, have made headlines.  What are they?

First, “OpenAI Unveils New A.I. That Can ‘Reason’ Through Math and Science Problems” (Cade Metz, The New York Times, December 20th).  Its new product, o3, now in the hands of “safety and security testers, outperformed the industry’s leading A.I. technologies on standardized benchmark tests that rate skills in math, science, coding and logic,” and was over 20% more accurate than its predecessor “in a series of common programming tasks.”  Programming has received less publicity than other applications, but AI has been taking over more and more of human programmers’ core work.  Still, per OpenAI CEO Sam Altman, “at least one OpenAI programmer could still beat the system on this test,” meaning AI has not yet surpassed the best humans, and it “can still get things wrong or hallucinate.”  But this is a big improvement.

The upcoming flagship OpenAI system, though, does not look as good, per “The Next Great Leap in AI Is Behind Schedule and Crazy Expensive” (The Wall Street Journal, December 21st).  Author Deepa Seetharaman wrote of GPT-5, code-named Orion, that “it isn’t clear when – or if – it’ll work,” as “there may not be enough data in the world to make it smart enough.”  This product, intended to succeed GPT-4 and its variants, is being developed in what could soon be regarded as the old way of building AI, with ever-larger datasets and correspondingly huge amounts of electricity and processing power, in this case costing “around half a billion dollars in computing costs alone.”  It is now almost two years in the making, and still does not have even a traditionally overoptimistic release date.

Moving to an application, we saw “Platonic Romances and A.I. Clones: 2025 Dating Predictions” (Gina Cherelus, The New York Times, January 3rd).  The author predicted less conventional dating and less use of conventional dating applications.  AI may become “your ultimate wingman,” as it may be more frequently used to “write… profiles, edit photos and write entire dialogues… on dating apps,” and “some will even use A.I. clones to do the whole thing for them,” as people have done with the structurally similar process of job applications.  As well, people may “use A.I. dating coaches to practice chats before a date, help them come up with conversation topics and suggest preplanned date ideas in their cities.”  At that point, AI had better be able to produce unique output, since it would not be much fun to anticipate, word for word, what a prospective romantic partner will say next.

Among our president’s immediate proclamations was “Trump announces largest AI infrastructure project ‘in history’ involving Softbank, OpenAI and Oracle” (Brock Dumas, Fox Business, January 21st).  Despite those companies’ CEOs joining “Trump from the Roosevelt Room at the White House for the announcement,” there is doubt whether they will meet the cost of “$100 billion, with plans to expand to $500 billion over the next four years” – and if they do, the project may serve only as a framework for capital expenditures they were planning to make anyway.

Another jump in actually-available AI capability occurred with “OpenAI launches Operator – an agent that can use a computer for you” (Will Douglas Heaven, MIT Technology Review, January 23rd).  The software “can carry out simple online tasks in a browser, such as booking concert tickets or filling an online grocery order.”  Anthropic (with Claude 3.5 Sonnet) and Google DeepMind (with Mariner) already offer similar tools.  Such agents may be a cause for security worry, as there are plenty of ways to initiate physical action from a keyboard, so they will need to be constrained somehow.

“When A.I. Passes This Test, Look Out” (Kevin Roose, The New York Times, January 23rd).  The test in question, produced by the Center for AI Safety and Scale AI, is called “Humanity’s Last Exam.”  It has “roughly 3,000 multiple-choice and short answer questions designed to test A.I. systems’ abilities in areas ranging from analytic philosophy to rocket engineering,” which are “along the upper range of what one might see in a graduate exam.”  If researchers can sufficiently restrict sharing of these questions and their answers, the exam might go unsolved by AI for a while, or it could topple within a year or two.

A new look at an old subject, “Is Artificial Intelligence Really Worth the Hype?,” was written by Jeff Sommer and published February 7th in The New York Times.  Its main cause for concern was DeepSeek, and, as of that date, investors were “re-evaluating prominent companies swept up in A.I. fever, including Nvidia, Meta, Alphabet, Microsoft, Amazon, Tesla and the private start-up OpenAI.”  In the week since the piece came out, serious concerns have been voiced about the legitimacy and credibility of DeepSeek’s claims, and, since the article offers no other cause for unease, we cannot yet accept its conclusion.  There are still plentiful reasons to think AI will not even approximate its highest expectations, and there will be more stories with similar titles, but the DeepSeek controversy – and that’s what it is – has not been resolved yet.  Neither has a clear vision emerged of what AI will be doing, and not doing, even a few years from now.  We’ll see – that’s all.
