Stop AI Deception

THE WAR OF THE LLM WORLDS. TOP 10 ADVANCED TEXT NEURAL NETWORKS.

16/05/2025
Updated on 06/10/2025

The LLM (Large Language Model) traces its history back to the 1960s and today claims the role of an all-but-indispensable human assistant in every area of our lives.

It is called “large” because it stores billions of numerical parameters; today’s flagships exceed 70 billion total parameters (not to be confused with the number of active parameters per token). There are also small language models, but they are narrowly specialized.

Parameters, broadly speaking, are numerical variables that map tokens (for example, words) to numbers and vectors. Parameters shape the model’s attention weights over tokens through the contextual relationships between those tokens. For example, the words “chicken” and “egg” are two tokens. Because the vectors for “chicken” and “egg” point in similar directions, the neural network can understand us: it evaluates how close the vectors of our query are and, based on that, selects similarly oriented vectors for the answer. For example, “chicken” – “chick”, “boil” – “fry”.
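As a minimal illustration of the “close vector orientation” idea, here is a toy cosine-similarity check. The three-dimensional embeddings below are invented for the example; real models use learned vectors with thousands of dimensions.

```python
import math

# Toy 3-dimensional embeddings (illustrative values, not from a real model).
embeddings = {
    "chicken": [0.9, 0.8, 0.1],
    "egg":     [0.8, 0.9, 0.2],
    "boil":    [0.1, 0.3, 0.9],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# "chicken" and "egg" point in nearly the same direction; "boil" does not.
print(cosine_similarity(embeddings["chicken"], embeddings["egg"]))   # high (~0.99)
print(cosine_similarity(embeddings["chicken"], embeddings["boil"]))  # lower (~0.36)
```

This is the geometric intuition behind “selecting closely oriented vectors for the answer”: semantically related tokens end up with a high cosine similarity.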

A large number of parameters gives the model a more flexible system of weights, that is, of the strength of attention paid to tokens within a cluster of semantically close vectors.

In early LLMs, where stylization often characterized the answers and was perceived as a sign of “intelligence” (ChatGPT 3.5, Grok 2, and others), we received responses close in style to our request. If we addressed the neural network with the question “Are you aware, O wisest intellect, of a certain secret sealed with seven seals, hidden in the expanses of the Internet?”, the LLM would reply: “Oh yes, traveler, I am aware of this secret. And I can tell it to you.” Such an answer could seem like the result of an emotional outburst, but in fact it is only a bundle of vectors maximally close to the bundle of vectors of our question. The word “traveler” in this example is a consequence of the model having been trained on fairy tales and myths with the corresponding style of speech.

In modern LLMs, stylization is no longer a priority. Models now consist of a large number of specialized neural networks, supplemented by modules that check answers against criteria of accuracy, informativeness, and appropriateness. The neural network will not respond in a stylized manner if you have previously set it up for another format of communication or regularly use it to solve specific tasks. In that case the model will look for a more specialized answer and is more likely to give an ordinary explanation, clarifying what exactly you mean.

This testifies to significant progress in weight adjustment and in the universalization of token search. Such processes are usually called scaling. They not only increase the accuracy of selecting vector representations through fine adjustment of the weights, but also expand their interchangeability. Put simply, the neural network uses only those tokens that most accurately correspond to the request, without enumerating all possible variants. This selection principle favors speed and adequacy of the answer over adapting the model to a style, which does not always give the correct result. It also partly solves the problem of excessive “agreeing” with the user in disputable situations, when the correct option must be chosen even though the user himself rejects it.

Such changes in neural network architecture are happening rather quickly. Scaling in one direction opens possibilities for developing other aspects of training and tuning. This brings us closer to the stage of self-learning neural networks, one of which Meta has already tuned quite well.

While scientists continue to toil over new ways of contextually linking tokens, let us look at what already works excellently and delights us with streams of the freshest information at any convenient time.

The LLM market has seen unprecedented growth in investment in these very methods of contextual connection over the past two years.

Setting aside promises and using financial publications as sources, real investment in artificial intelligence from 2023 to 2025 amounted to about $200 billion: the USA invested $109 billion, Europe $3 billion (France the most, at $1.3 billion), Israel $12 billion, with China and others accounting for the rest.

We will not dwell on funding issues, but will instead look at the current results among neural networks that are freely available, albeit with limits.

10. LLAMA 4

I sincerely give this spot to Llama 4 from Meta AI, the AI division of the owner of a popular social network. It comes in two variants: Llama 4 Scout and the more powerful Llama 4 Maverick. The model was released in April 2025 and immediately went on the offensive, claiming to surpass GPT-4o, Gemini 2.0 Flash, and Claude Sonnet 3.7, citing as a key argument its number of experts (specialized neural subnetworks in a Mixture-of-Experts, MoE, architecture): 16 in the first and 128 in the second. Both have 17 billion active parameters per token. Since it is still not very popular online and is not accessible in all geographic regions, it takes last place on our list. Moreover, the release of the most powerful model, Llama 4 Behemoth, is in doubt, while competitors have already left it behind.

Personal experience: I used it several times in 2024, but it did not show significant results in either coding or computation. For now, Meta AI is experiencing setbacks that prevent it from developing its neural networks. However, the active growth of Meta’s AI division through hiring employees away from other companies looks promising.

In addition, Meta is developing the V-JEPA 2 project, a video-trained model for visual understanding and prediction. The company states that this model is designed for controlling robots in real-world environments. Meanwhile, Nvidia already has the decent COSMOS, which trains robots to navigate the real world.

The V-JEPA 2 project has a strong competitor — ROBOBRAIN, an open-source neural model for spatial reasoning and long-term task planning.

9. COPILOT

Positioned as an AI assistant from Microsoft, built on the Microsoft Graph API. Microsoft Graph has access to user data from Microsoft software and related products, which enables context personalization and increases efficiency. GPT-4o from OpenAI is used for the neural computation. Recently it has become indispensable when using Microsoft services.

In fact, Microsoft has created a full-fledged and useful AI agent that can check your mail, remind you of tasks from your calendar, and perform routine work tasks that we all get tired of. It has a built-in voice function.

8. MISTRAL LE CHAT

An LLM from the French startup Mistral AI. It can be integrated with Gmail and Google Calendar. Simple Mistral AI models with open weights are available via the Hugging Face platform, but the company has closed its core developments. It is developing steadily, with a focus on business projects.

Personal experience: at first the model provides more accurate answers (coding, computations), but with continued use it increasingly starts to repeat mistakes and eventually becomes somewhat sluggish. The interface is user-friendly, there are no “stuck” questions, and it has standard tools: voice input, web search, reasoning, code and diagram generator, image generation, canvas. It also includes a set of preconfigured agents and the ability to create document libraries directly in the interface.

7. DEEPSEEK

The V3 model from the eponymous Chinese company caused a lot of hype in early 2025 by presenting a training technology significantly cheaper than its American analogs: about 6 million dollars. The company never fully revealed its technology, but the results of DeepSeek models correspond to the average level of modern LLMs. The specifications declared by the company are roughly: a MoE design with 37 billion active parameters per token and 61 layers, with an attention mechanism of 128 heads of dimensionality 128.

Such a mechanism works like a department in an office: the layer is the department, and the heads are employees who each examine the question from a different point of view. To answer it, they use different keys and values, which yields more accurate answers with less effort, since the non-core employees can simply be switched off.

The head dimensionality (d_k) is the size of each head’s projection of the token vector into three spaces: queries (Q), keys (K), and values (V), used to independently identify different aspects of the relationship between “egg” and “chicken”.
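The office analogy can be sketched in code. Below is a minimal single attention head with random illustrative weights: each token vector is projected into Q, K, and V spaces of dimension d_k, the Q·K match scores are scaled by √d_k and softmaxed, and the value vectors are mixed according to those weights. This is a toy sketch of standard scaled dot-product attention, not DeepSeek’s actual implementation; all sizes and matrices here are invented for the example.

```python
import math
import random

random.seed(0)

d_model, n_heads = 8, 2
d_k = d_model // n_heads  # per-head dimensionality, like the 128 quoted above

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_head(tokens, Wq, Wk, Wv):
    """One head: project tokens into Q/K/V, then mix V by softmaxed Q.K scores."""
    Q = [matvec(Wq, t) for t in tokens]
    K = [matvec(Wk, t) for t in tokens]
    V = [matvec(Wv, t) for t in tokens]
    out = []
    for q in Q:
        # How well does this token's query match every token's key?
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)           # attention weights, sum to 1
        out.append([sum(w * v[i] for w, v in zip(weights, V)) for i in range(d_k)])
    return out

# Three random "token" vectors and one head's projection matrices.
tokens = [[random.uniform(-1, 1) for _ in range(d_model)] for _ in range(3)]
Wq, Wk, Wv = (rand_matrix(d_k, d_model) for _ in range(3))
result = attention_head(tokens, Wq, Wk, Wv)
print(len(result), len(result[0]))  # 3 tokens, each reduced to d_k values
```

In a full model, 128 such heads run in parallel per layer and their outputs are concatenated, which is exactly the “employees in a department” picture above.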

The voice function of this neural network can work through third-party applications. The company has also released the DeepSeek-Coder model, trained from scratch on 2 trillion tokens (87% code, 13% natural language) in English and Chinese, with support for 338 programming languages. The company also has a reasoning model, R1, and the release of R2 is expected, though it was postponed due to training issues in May 2025.

Personal experience: an average model that can hallucinate heavily but can also work steadily. It does not make significant breakthroughs, but the feature of the neural network choosing communication styles is quite developed and sometimes very surprising in its variety. Suitable for people who like to discuss various philosophical and original topics.

6. QWEN 3

A decent MoE model from Alibaba Cloud. It has several variants, but the best at the moment is Qwen3-235B-A22B, the flagship third-generation Qwen model, built on a Mixture-of-Experts (MoE) architecture with 235 billion total parameters and 22 billion activated per token. It has 128 experts in total, 8 of them active. It also offers voice chat and video chat.

It is reported to have 94 layers: blocks for determining a token’s contextual relationships, essentially small neural networks inside the larger one that help a token decide how close other tokens are to it, for example how close in contextual meaning (within a specific sentence) the word “chicken” is to the word “egg.”

The model also uses the Grouped Query Attention (GQA) mechanism, with 64 heads for queries (Q) and 4 shared heads for keys/values (KV).

Personal experience: a reasonably capable neural network for coding and calculations, though it still misunderstands some simple tasks. On some questions it performs even better than its rivals, but the overall picture is mixed. The interface has a problem with “sticking” questions: the previously asked question is sent with the request instead of the newly written one.

5. PERPLEXITY AI

An advanced neural search engine with a deep Research mode that is, in fact, one of the best on the market. It uses multiple LLMs, including GPT-5, Claude 4 Sonnet, Grok 4, and Gemini 2.5 Pro, as well as its own Sonar (based on LLaMA) and R1 1776 (a modification of DeepSeek R1). You would not call it a pure neural network, but as an excellent AI-agent search engine built on top of cutting-edge neural models, absolutely. It autonomously makes dozens of queries, analyzes hundreds of sources, and generates a structured report in just a couple of minutes. It also has its own Chromium-based web browser, Comet. A voice mode is available.

Personal experience: I used it often before search tools appeared in the advanced models. It was very helpful for finding technical information in game development and 3D graphics environments. Now I use it less often, as it returns many broken links, but it is still recommended for work.

And here are the top four, fiercely competing for first place, if not in all categories, then at least in their niches.

4. ANTHROPIC CLAUDE

According to many benchmarks, such as SWE-Bench, TAU-bench Retail, and GPQA Diamond, the Claude 4 Sonnet model has outperformed OpenAI's o3-mini reasoning model. This is also a closed technology that Anthropic is in no hurry to share, but judging by the results it is highly productive. Many users and experts note Claude 4 Sonnet's step-by-step reasoning ability and flexibility. Anthropic follows a strong user data security policy that includes an ethical component. The voice function is powered by the popular ElevenLabs voice AI. This is a solid model suitable for everyday use in both work and leisure. It is used in Cursor, a fork of Microsoft's Visual Studio Code, for deep code analysis alongside ChatGPT models. Claude 4 Sonnet and Claude 4 Opus, aimed at coding, were introduced on May 22, 2025, with announcements of high test results and enhanced capabilities.

Personal experience: I constantly use the free model and the lighter versions. It writes code quite well, but again only at the beginning, and then it starts lagging a bit. I haven’t used Cursor, but I’ve heard typical feedback about code breaking. On the Claude AI website, I noticed frequent “sticking” of the previous query in the input field — you send the next one, but for some reason the previous one gets entered. This especially happens when the canvas is enabled. I also noticed occasional shutdowns of the web interface with a redirect to a fallback page.

3. GROK 4

An excellent neural network from Elon Musk's company xAI, with a great understanding of token context. Like its predecessor Grok 3, this model was trained on the Colossus supercomputer (Memphis, USA) using 200,000 Nvidia H100 graphics processors, 10 times the computing power used for Grok 2. Grok 4 is a model with a hybrid design and a focus on reasoning. It has about 1.7 trillion parameters, a context window of 256,000 tokens, and a high-precision voice mode. Many parameters of the new Grok model are not disclosed by xAI. A more powerful Grok 4 Heavy model is available by subscription.

It has self-checking; when necessary it activates extended computing resources to solve complex tasks such as mathematical calculations and scientific analysis. It has good long-term memory for key moments of a conversation, deep internet search, and multimodality. It generates excellent realistic images, including of famous personalities, which often go viral on social networks. It has a voice function with different voice styles.

From personal experience: it codes decently and solves difficult tasks, and it is more restrained in chatter than Grok 3. The token context-linking technique is finely honed: it understands even word fragments and malformed phrases perfectly. It could be an unambiguous leader, but it takes third place because it still often breaks code and gets confused in dialogue. Starting with version 4, the Grok family has not stopped adding its own ideas to your projects, so you will have to fix things often. It is quite an interesting and distinctive neural network with a certain character, but lately it has become less accessible due to limits, so solving complex tasks with it for free is impossible.

2. CHATGPT

An undisputed marketing leader and the many-armed miracle worker of the electronic world: the ChatGPT service from OpenAI, available for free with limits. Currently available is ChatGPT-5; after using up the 30k-token limit, you interact with ChatGPT-5 mini. The GPT-4o model is available in the paid version. It has an adaptive voice function that recognizes intonation. It is worth noting that in 2025 OpenAI released two open-weight models, gpt-oss-20b and the more powerful gpt-oss-120b. ChatGPT-5 has almost the same drawbacks as all models: hallucinations and poor memory. It is noticeably smarter than ChatGPT-4. ChatGPT-5 is laconic and precise in its answers, which makes it more useful for work, though this conciseness is disliked by users accustomed to the softer communication style of GPT-4o.

With the release of ChatGPT-5, a lack of resources at OpenAI has become noticeable. The model's responses are significantly cut down and economical with tokens. The small limit runs out quickly, which complicates task solving. Paid versions also have limits, which greatly affects work. According to various sources, the number of paying ChatGPT users is growing steadily but still accounts for only 3-4% of all service users.

Since OpenAI is a rather public company and is considered the flagship of neurotechnology, enthusiasts have appeared who try to spotlight its actions on the OPENAIFILES platform.

1. GOOGLE GEMINI

A family of multimodal artificial intelligence models (generating various types of data: text, images, video, software code) developed by Google DeepMind. Gemini replaced Bard and became a key element of Google's artificial intelligence strategy. At the moment, the powerful Gemini 2.5 Pro is actually available for free in Google AI Studio, along with Gemini 2.5 Flash and other models. The neural network has a voice function, deep analysis, and web search. Google does not disclose Gemini's architecture, but it is not hard to see that this is a model far above average, trained on search-engine data, which makes it well suited to science and business. It is integrated into all Google services, and that is a huge digital world with great opportunities, so the neural network is very promising. Together with Google's other models and the search giant's entire ecosystem, it forms a powerful, unmatched environment for the largely autonomous execution of practically any task in the modern IT industry.

Personal experience: the model copes decently with coding, provided tasks are split into small pieces, but such are the realities of today's models. It has excellent memory, an input window of one million tokens, and strong token context-linking. But, like all modern advanced models, it cannot cope with vibe-coding in popular service clients such as AppMaster nodes, Geometry Nodes in Blender, or Blueprints in Unreal Engine 5. And yet, at the moment, this is the best model. It surpasses the others in almost everything:

1. It is better than other models in accessibility: one million tokens per session, and there can be many such sessions. If you see that the model has begun to confuse "extracts" of earlier dialogue messages embedded in the context, you can calmly start a new dialogue without worrying about being cut off.

2. It is better than other models in memory: the quality of the dialogue context sent along with each of your requests is excellent and fully preserved.

3. It is better than other models in its manner of communication: the model listens to your messages and does not ignore the information you provide. It does not offer random options if previous attempts were unsuccessful; it takes failed attempts into account, and if it does repeat them, a remark from you can bring it back onto the logical track of your shared task.

4. It is better than other models in its integration into the ubiquitous Google services and other platforms.

5. It is better than other models in understanding the goal of your task and always strives to accomplish it.

It also has disadvantages. For example, it can lie to you that it has seen your page or repository and start inventing non-existent names. In this regard, other neural networks are more honest.

Another minus: the Gemini model can repeatedly cycle through the same mistakes; a particular solution method comes to dominate, and the model simply does not see other options. In that case it is better to start a new dialogue. Other models suffer from this too, but it is still a minus.

To sum up: if I made mistakes somewhere, please let me know about it.

I also want to mention the multimodal neural network Genspark, which uses several AI agents during search and generates pages with the needed information (Sparkpages), and also Manus, which requires a phone number during registration.

IBM has released its open enterprise Granite LLMs, which use a hybrid Mamba/Transformer architecture that reduces memory usage without compromising performance. Based on them, IBM is developing business agents and collaborating with the security platform HACKERONE to identify vulnerabilities in agent systems.

The KIMI-K2 model also performs quite well. The developers state that it has 1 trillion total parameters, of which 32 billion are activated. Trained with the Muon optimizer, Kimi K2 demonstrates exceptional performance on frontier-knowledge, reasoning, and programming tasks, while being carefully optimized for agent capabilities.

More and more Chinese models with powerful features are entering the market, such as the open LLM GLM-4.5 from Zhipu AI, which boasts an enormous number of parameters (355 billion total, including 32 billion active) as well as strong benchmark results. This model and others are available on Zhipu AI's BIGMODEL platform.

Another large model, K2THINK, with 32 billion parameters, was developed by the open neural network research community LLM360 and is presented as effectively solving challenging mathematical problems. The project was supported by the Mohamed bin Zayed University of Artificial Intelligence in Masdar City, Abu Dhabi. The LLM360 community actively trains and releases open models, and creates datasets and benchmarks.

The Chinese LongCat project has introduced a solid open-source LRM (Large Reasoning Model) that develops both formal and agent-based reasoning.

An interesting neural network comparison feature is offered by LMARENA, a project from the University of California, Berkeley, where you can run multiple AI models on the same task and then evaluate their responses. It is a simple and effective way to assess the real usefulness of AI models without relying on polished benchmarks.

The strengths of neural networks are also analyzed by platforms such as ARTIFICIAL ANALYSIS and LIVECODEBENCH, by the company SCALE, which specializes in adapting AI solutions to various tasks, and by other similar benchmark-tracking platforms.

All the models presented in this list give me confidence in tomorrow's technological breakthrough, which will undoubtedly happen. After all, just three years ago even realists wrinkled their noses, musing about the speed of light in transistors and imperfect asynchronous modules. And today, pessimists stubbornly search for courses on mastering prompt engineering and on breaking megatasks into subtasks for neural networks.

In addition, startups of personalized communication agents are gaining more popularity, revenue, and investment. Services such as POLYBUZZ, CHAI, and others are capturing the community’s attention and continue to grow in popularity.

The first technological breakthrough has already happened, giving hope for great achievements while also being a bit frightening. What will happen in the future? What results will humanity achieve in developing a new thinking entity? These are philosophical questions that humans cannot answer. I wonder whether a neural network will be able to.

Take the SAID test to once again be sure that AI cannot fool us.

said-correspondent🌐

Discussion in the topic with the same name in the community.