Recently, the DeepSeek kerfuffle has dominated AI news. But since late December, other stories beyond future-success assertions have also appeared. What are they?
First, “OpenAI Unveils New A.I. That Can ‘Reason’ Through Math and Science Problems” (Cade Metz, The New York Times, December 20th). Its new product, o3, now in the hands of safety and security testers, “outperformed the industry’s leading A.I. technologies on standardized benchmark tests that rate skills in math, science, coding and logic,” and was over 20% more accurate than its predecessor “in a series of common programming tasks.”
That area has received less publicity, but AI has been taking on more and more of human programmers’ core roles.
Yet, per OpenAI CEO Sam Altman, “at least one OpenAI programmer could still beat the system on this test,” meaning AI has not achieved superiority yet, and it “can still get things wrong or hallucinate.” But we’ve seen a big improvement here.
OpenAI’s upcoming flagship system, though, does not look as good, per “The Next Great Leap in AI Is Behind Schedule and Crazy Expensive” (Deepa Seetharaman, The Wall Street Journal, December 21st). Seetharaman said of GPT-5, code-named Orion, that “it isn’t clear when – or if – it’ll work,” as “there may not be enough data in the world to make it smart enough.” This product, intended to succeed GPT-4 and its variants, is being developed with what could soon be regarded as the old way of building AI: ever-more-gigantic datasets and similarly huge amounts of electricity and processing power, in this case running to “around half a billion dollars in computing costs alone.” The project is now almost two years in the making and still lacks even a traditionally overoptimistic release date.
Moving to an application, we saw “Platonic Romances and A.I. Clones: 2025 Dating Predictions” (Gina Cherelus, The New York Times, January 3rd). The author predicted less conventional dating and less use of conventional dating applications. AI may become “your ultimate wingman,” as it may be more frequently used to “write… profiles, edit photos and write entire dialogues… on dating apps,” and “some will even use A.I. clones to do the whole thing for them,” as people have done with structurally similar job applications. As well, people may “use A.I. dating coaches to practice chats before a date, help them come up with conversation topics and suggest preplanned date ideas in their cities.”
At that point, AI had better be able to produce unique output streams, since it wouldn’t be much fun to anticipate, word for word, what a prospective romantic partner is going to say next.
Among our president’s immediate proclamations was “Trump announces largest AI infrastructure project ‘in history’ involving Softbank, OpenAI and Oracle” (Brock Dumas, Fox Business, January 21st). Despite those companies’ CEOs joining “Trump from the Roosevelt Room at the White House for the announcement,” there is doubt whether they will meet the cost of “$100 billion, with plans to expand to $500 billion over the next four years,” and, even if they do, the project may serve only as a framework for capital expenditures they were planning to make anyway.
Another jump in actually available AI capability came with “OpenAI launches Operator – an agent that can use a computer for you” (Will Douglas Heaven, MIT Technology Review, January 23rd). The software “can carry out simple online tasks in a browser, such as booking concert tickets or filling an online grocery order.” Anthropic (Claude 3.5 Sonnet) and Google DeepMind (Mariner) already offer similar tools. This one may be cause for security worry: there are plenty of ways of initiating physical action from a keyboard, so these apps will need to be constrained somehow.
“When A.I. Passes This Test, Look Out” (Kevin Roose, The New York Times, January 23rd). The test, produced by the Center for AI Safety and Scale AI, is called “Humanity’s Last Exam.” It has “roughly 3,000 multiple-choice and short answer questions designed to test A.I. systems’ abilities in areas ranging from analytic philosophy to rocket engineering,” questions “along the upper range of what one might see in a graduate exam.” If researchers can sufficiently restrict the sharing of these questions and their answers, this exam might last a while without an AI solution, or it could topple within a year or two.
A new look at an old subject, “Is Artificial Intelligence Really Worth the Hype?”, was written by Jeff Sommer and published February 7th in The New York Times. Its main cause for concern was DeepSeek, and, as of that date, investors were “re-evaluating prominent companies swept up in A.I. fever, including Nvidia, Meta, Alphabet, Microsoft, Amazon, Tesla and the private start-up OpenAI.”
In the week since this piece came out, serious concerns have been voiced about the legitimacy and credibility of DeepSeek’s claims, and, since there is no other focus of unease in the article, we can’t accept its conclusion yet. There are still plentiful reasons to think AI will not even approximate its highest expectations, and there will be more stories with similar titles, but the DeepSeek controversy (and that’s what it is) has not been resolved yet. Neither has a clear vision of what AI will be doing, and not doing, even a few years from now. We’ll see; that’s all.