Charles Q. Choi writes: “Artificial intelligence chatbots such as ChatGPT and other applications powered by large language models have found widespread use, but are infamously unreliable. A common assumption is that scaling up the models driving these applications will improve their reliability—for instance, by increasing the amount of data they are trained on, or the number of parameters they use to process information. However, more recent and larger versions of these language models have actually become more unreliable, not less, according to a new study,” largely because newer models are less likely to acknowledge when they don’t know an answer for certain.