DeepSeek V3:The $5.5M Trained Model Beats GPT-4o & Llama 3.1
Model
Arena-Hard
AlpacaEval 2.0
DeepSeek-V2.5-0905
76.2
50.5
Qwen2.5-72B-Instruct
81.2
49.1
LLaMA-3.1 405B
69.3
40.5
GPT-4o-0513
80.4
51.1
Claude-Sonnet-3.5-1022
85.2
52.0
DeepSeek-V3
85.5
70.0
Arena-Hard Performance:
DeepSeek-V3 ranks highest with 85.5, narrowly surpassing Claude-Sonnet-3.5 (85.2) and significantly outperforming DeepSeek-V2.5 (76.2).
This shows its exceptional ability to generate well-rounded, context-aware responses in difficult scenarios.
AlpacaEval 2.0 Performance:
DeepSeek-V3 leads with 70.0, far ahead of Claude-Sonnet-3.5 (52.0), the second-best performer.
This demonstrates significant improvements in user preference and overall quality of open-ended outputs, showcasing better alignment with user expectations.
Comparison with Competitors:
Qwen2.5 (Arena-Hard: 81.2, AlpacaEval: 49.1):
Performs reasonably well on Arena-Hard but falls behind significantly in user preference, indicating weaker alignment with user-friendly response styles.
GPT-4-0513 (Arena-Hard: 80.4, AlpacaEval: 51.1):
Competitive on both metrics but doesn’t match the user-centered quality of DeepSeek-V3.
LLaMA-3.1 (Arena-Hard: 69.3, AlpacaEval: 40.5):
Scores lower on both benchmarks, highlighting weaker open-ended generation capabilities.
Here are the Aider Polyglot Benchmark Results, which evaluate models on their ability to complete tasks correctly. The evaluation is divided into two output formats:
Diff-like format (shaded bars): Tasks where outputs resemble code diffs or small updates.
Whole format (solid bars): Tasks requiring the generation of an entire response.
Key Observations
Top Performers:
o1-2024-11-12 (Tingli) leads the benchmark with nearly 65% accuracy in the whole format, showing exceptional performance across tasks.
DeepSeek Chat V3 Preview and Claude-3.5 Sonnet-2024-1022 follow closely, with scores in the range of 40–50%, demonstrating solid task completion in both formats.
Mid-Performers:
Gemini+exp-1206 and Claude-3.5 Haiku-2024-1022 score moderately in both formats, highlighting balanced but average performance.
DeepSeek Chat V2.5 and Flash-2.0 sit in the lower mid-range, showing weaker task resolution abilities compared to the leading models.
Lower Performers:
y-lightning, Qwen2.5-Coder 32B-Instruct, and GPT-4o-mini 2024-07-18 have the lowest scores, with accuracies under 10–15%. This indicates significant limitations in handling both diff-like and whole format tasks.
Format Comparison:
Models generally perform slightly better in the Whole format than the Diff-like format, implying that full-response generation is handled better than smaller, incremental changes.
The shaded bars (diff-like format) are consistently lower than their whole-format counterparts, indicating a consistent gap in this specific capability.
DeepSeek Chat V3 Preview’s Position:
Ranks among the top three performers.
Scores around 50% in the whole format and slightly lower in the diff-like format.
This shows strong capabilities in handling complete task generation but leaves room for improvement in diff-like tasks.
Insights:
The benchmark highlights the diverse strengths and weaknesses of the evaluated models.
Models like o1-2024-11-12 show dominance across both task formats, whereas others like DeepSeek Chat V3 Preview excel primarily in full-task generation.
Lower performers indicate a need for optimization in both nuanced and broader task-handling capabilities.
This ultimately reflects the versatility and specialized strengths of different AI systems in completing benchmark tasks.
DeepSeek V3’s Chat Website & API Platform
You can interact with DeepSeek-V3 through the official website: DeepSeek Chat.
Additionally, they offer an OpenAI-Compatible API on the DeepSeek Platform: Link. There is an API cost to it and it depends on the tokens:
How to Run DeepSeek V3?
If you prefer not to use the chat UI and want to directly work with the model, there’s an alternative for you. The model, DeepSeek-V3, has all its weights released on Hugging Face. You can access the SafeTensor files there.
Model Size and Hardware Requirements:
Firstly, the model is massive, with 671 billion parameters, making it challenging to run on standard consumer-grade hardware. If your hardware isn’t powerful enough, it’s recommended to use the DeepSeek platform for direct access. Wait for a Hugging Face Space if one becomes available.
How to Run Locally?
If you have sufficient hardware, you can run the model locally using the DeepSeek-Infer Demo, SGLang, LMDeploy, TensorRT-LLM, vLLM, AMD GPU, Huawei Ascend NPU.
Convert the model to a quantized version to reduce memory requirements, which is particularly helpful for lower-end systems.
Here’s how you can convert FP8 weights to BF16:
Conversion script if you need bf16
cd inference
python fp8_cast_bf16.py --input-fp8-hf-path /path/to/fp8_weights --output-bf16-hf-path /path/to/bf16_weights
Setup Process with DeepSeek-Infer Demo
Hugging Face’s transformers library does not directly support the model yet. To set it up, you’ll need to:
If you have powerful hardware, try running the model locally to fully explore its potential.
Otherwise, use the DeepSeek.com chat UI or their API platform for seamless access.
LLM DeepSeek Plugin
You can also use llm-deepseek plugin to access the model:
# Step 1: Install the llm package
!pip install llm
# Step 2: Install the DeepSeek plugin
!llm install llm-deepseek
# Step 3: Set the DeepSeek API key
!llm keys set deepseek
# Paste your API key when prompted
# Step 4: Run the DeepSeek Chat Model
!llm -m deepseek-chat 'five great names for a pet ocelot'
# Step 5: Run the DeepSeek Coder Model
!llm -m deepseek-coder 'how to reverse a linked list in python'
DeepSeek V3 Experimentations
On the DeepSeek platform, you can explore the web and engage in deep reasoning to solve complex problems, similar to OpenAI’s approach.
Here are some prompts and outputs:
First Experiment
Prompt: Generate an SVG of a kid and a football
Output:
Here’s a simple SVG representation of a kid playing with a football. You can copy and paste this code into an HTML file or an SVG viewer to see the image:
Explanation:
The kid is represented with simple shapes: a circle for the head, a rectangle for the body, and lines for the arms and legs.
The football is an ellipse with two white lines to mimic the classic football design.
You can adjust the colours, sizes, and positions to customize the image further.
Let me know if you'd like to enhance or modify this!
Second Experiment
Prompt: Create the classic snake game
Output:
Creating the classic Snake game involves using HTML, CSS, and JavaScript . Below is a simple implementation:
Snake Game
body {
display: flex;
justify-content: center;
align-items: center;
height: 100vh;
margin: 0;
background-color: #000;
}
canvas {
border: 1px solid #fff;
}
JavaScript (snake.js)const canvas = document.getElementById("gameCanvas");const ctx = canvas.getContext("2d");const gridSize = 20;const tileCount = canvas.width / gridSize;let snake = [{ x: 10, y: 10 }];let direction = { x: 0, y: 0 };let food = { x: Math.floor(Math.random() * tileCount), y: Math.floor(Math.random() * tileCount) };let score = 0;function gameLoop() { update(); draw(); setTimeout(gameLoop, 100);}function update() { const head = { x: snake[0].x + direction.x, y: snake[0].y + direction.y }; // Check for collision with walls or itself if (head.x < 0 || head.x >= tileCount || head.y < 0 || head.y >= tileCount || snake.some(segment => segment.x === head.x && segment.y === head.y)) { resetGame(); return; } snake.unshift(head); // Check if snake eats the food if (head.x === food.x && head.y === food.y) { score++; food = { x: Math.floor(Math.random() * tileCount), y: Math.floor(Math.random() * tileCount) }; } else { snake.pop(); }}function draw() { // Clear the canvas ctx.fillStyle = "#000"; ctx.fillRect(0, 0, canvas.width, canvas.height); // Draw the snake ctx.fillStyle = "#0F0"; snake.forEach(segment => ctx.fillRect(segment.x * gridSize, segment.y * gridSize, gridSize, gridSize)); // Draw the food ctx.fillStyle = "#F00"; ctx.fillRect(food.x * gridSize, food.y * gridSize, gridSize, gridSize); // Draw the score ctx.fillStyle = "#FFF"; ctx.font = "20px Arial"; ctx.fillText("Score: " + score, 10, 30);}function resetGame() { snake = [{ x: 10, y: 10 }]; direction = { x: 0, y: 0 }; score = 0; food = { x: Math.floor(Math.random() * tileCount), y: Math.floor(Math.random() * tileCount) };}window.addEventListener("keydown", e => { switch (e.key) { case "ArrowUp": if (direction.y === 0) direction = { x: 0, y: -1 }; break; case "ArrowDown": if (direction.y === 0) direction = { x: 0, y: 1 }; break; case "ArrowLeft": if (direction.x === 0) direction = { x: -1, y: 0 }; break; case "ArrowRight": if (direction.x === 0) direction = { x: 1, y: 0 }; break; }});gameLoop();How It Works:Canvas Setup: The game is drawn on a Snake Movement: The snake is represented as an array of segments. The head moves based on the direction, and the tail follows.Food: A random position is generated for the food. When the snake eats the food, it grows, and the score increases.Collision Detection: The game checks if the snake collides with the walls or itself, and resets if it does.Controls: Arrow keys are used to change the snake's direction.How to Run:Save the HTML and JavaScript code into files (index.html and snake.js).Open the index.html file in a browser.Use the arrow keys to control the snake.Let me know if you'd like to add more features or improve the game!
Third Experiment
Prompt: How to clone the git repository?
Output
Firstly, it thinks for 6 seconds and then gives the output
Cloning a Git repository involves creating a copy of an existing repository from a remote location (such as GitHub, GitLab, or Bitbucket) to your local machine. This copy includes all the project files, commit history, and branches.Here are the steps to clone a Git repository:1. Install Git Ensure that Git is installed on your machine. If it's not installed, download and install it from the official Git website.2. Open Terminal or Command Prompt Open your terminal (Linux/Mac) or command prompt (Windows).3. Navigate to the Desired Directory Use the cd command to navigate to the directory where you want to clone the repository.bash Copy cd /path/to/directory 4. Clone the Repository Use the git clone command followed by the URL of the repository.bash Copy git clone https://github.com/username/repository.git Replace https://github.com/username/repository.git with the actual URL of the repository you want to clone.5. Enter the Cloned Directory After cloning, navigate into the repository directory.bash Copy cd repository 6. Verify the Clone You can verify that the repository was cloned correctly by checking the remote URL.bash Copy git remote -v This should display the URL of the remote repository.Example bash Copy # Navigate to the desired directory cd ~/Documents# Clone the repository git clone https://github.com/username/repository.git# Enter the repository directory cd repository# Verify the remote URL git remote -v Optional: Cloning Using SSH If you have set up SSH keys, you can clone the repository using an SSH URL:bash Copy git clone [email protected]:username/repository.git Optional: Shallow Clone If you only need a part of the history (e.g., the latest commit), you can perform a shallow clone:bash Copy git clone --depth 1 https://github.com/username/repository.git This is useful for large repositories where you don't need the full history.Conclusion You have now successfully cloned a Git repository to your local machine. You can start working on the project, make changes, and commit them as needed.
All the outputs are up to the mark, and the output speed was quite impressive. Additionally, DeepSeek allows you to ask reasoning questions to the model, making it a versatile and efficient tool for complex problem-solving and in-depth analysis.
We will give you a full-fledged comparison of this model in the upcoming articles!
Conclusion
DeepSeek V3 stands as a monumental achievement in the evolution of large-scale AI models, combining unprecedented scale with unmatched efficiency. With its innovative architecture, cost-effective training, and impressive 685 billion parameters, DeepSeek V3 redefines what’s possible in the AI space. The model’s ability to excel in diverse benchmarks, outperforming both open-source and closed-source competitors, highlights its extraordinary capabilities.
Not only does DeepSeek V3 deliver state-of-the-art performance in tasks like coding, reasoning, and mathematical problem-solving, but it also democratizes access to cutting-edge AI with its open-source availability. Developers, researchers, and businesses alike can leverage its immense power, supported by a permissive license that fosters innovation and collaboration.
By achieving exceptional results with a training cost of just $5.5 million, DeepSeek V3 proves that scalability and efficiency can coexist, setting a new standard for the future of AI development. This release marks a significant leap forward, not just for DeepSeek, but for the entire AI community, paving the way for breakthroughs in machine learning, natural language processing, and beyond.
Hi, I am Pankaj Singh Negi - Senior Content Editor | Passionate about storytelling and crafting compelling narratives that transform ideas into impactful content. I love reading about technology revolutionizing our lifestyle.
We use cookies essential for this site to function well. Please click to help us improve its usefulness with additional cookies. Learn about our use of cookies in our Privacy Policy & Cookies Policy.
Show details
Cookies
This site uses cookies to ensure that you get the best experience possible. To learn more about how we use cookies, please refer to our Privacy Policy & Cookies Policy.
brahmaid
It is needed for personalizing the website.
csrftoken
This cookie is used to prevent Cross-site request forgery (often abbreviated as CSRF) attacks of the website
Identityid
Preserves the login/logout state of users across the whole site.
sessionid
Preserves users' states across page requests.
g_state
Google One-Tap login adds this g_state cookie to set the user status on how they interact with the One-Tap modal.
MUID
Used by Microsoft Clarity, to store and track visits across websites.
_clck
Used by Microsoft Clarity, Persists the Clarity User ID and preferences, unique to that site, on the browser. This ensures that behavior in subsequent visits to the same site will be attributed to the same user ID.
_clsk
Used by Microsoft Clarity, Connects multiple page views by a user into a single Clarity session recording.
SRM_I
Collects user data is specifically adapted to the user or device. The user can also be followed outside of the loaded website, creating a picture of the visitor's behavior.
SM
Use to measure the use of the website for internal analytics
CLID
The cookie is set by embedded Microsoft Clarity scripts. The purpose of this cookie is for heatmap and session recording.
SRM_B
Collected user data is specifically adapted to the user or device. The user can also be followed outside of the loaded website, creating a picture of the visitor's behavior.
_gid
This cookie is installed by Google Analytics. The cookie is used to store information of how visitors use a website and helps in creating an analytics report of how the website is doing. The data collected includes the number of visitors, the source where they have come from, and the pages visited in an anonymous form.
_ga_#
Used by Google Analytics, to store and count pageviews.
_gat_#
Used by Google Analytics to collect data on the number of times a user has visited the website as well as dates for the first and most recent visit.
collect
Used to send data to Google Analytics about the visitor's device and behavior. Tracks the visitor across devices and marketing channels.
AEC
cookies ensure that requests within a browsing session are made by the user, and not by other sites.
G_ENABLED_IDPS
use the cookie when customers want to make a referral from their gmail contacts; it helps auth the gmail account.
test_cookie
This cookie is set by DoubleClick (which is owned by Google) to determine if the website visitor's browser supports cookies.
_we_us
this is used to send push notification using webengage.
WebKlipperAuth
used by webenage to track auth of webenagage.
ln_or
Linkedin sets this cookie to registers statistical data on users' behavior on the website for internal analytics.
JSESSIONID
Use to maintain an anonymous user session by the server.
li_rm
Used as part of the LinkedIn Remember Me feature and is set when a user clicks Remember Me on the device to make it easier for him or her to sign in to that device.
AnalyticsSyncHistory
Used to store information about the time a sync with the lms_analytics cookie took place for users in the Designated Countries.
lms_analytics
Used to store information about the time a sync with the AnalyticsSyncHistory cookie took place for users in the Designated Countries.
liap
Cookie used for Sign-in with Linkedin and/or to allow for the Linkedin follow feature.
visit
allow for the Linkedin follow feature.
li_at
often used to identify you, including your name, interests, and previous activity.
s_plt
Tracks the time that the previous page took to load
lang
Used to remember a user's language setting to ensure LinkedIn.com displays in the language selected by the user in their settings
s_tp
Tracks percent of page viewed
AMCV_14215E3D5995C57C0A495C55%40AdobeOrg
Indicates the start of a session for Adobe Experience Cloud
s_pltp
Provides page name value (URL) for use by Adobe Analytics
s_tslv
Used to retain and fetch time since last visit in Adobe Analytics
li_theme
Remembers a user's display preference/theme setting
li_theme_set
Remembers which users have updated their display / theme preferences
We do not use cookies of this type.
_gcl_au
Used by Google Adsense, to store and track conversions.
SID
Save certain preferences, for example the number of search results per page or activation of the SafeSearch Filter. Adjusts the ads that appear in Google Search.
SAPISID
Save certain preferences, for example the number of search results per page or activation of the SafeSearch Filter. Adjusts the ads that appear in Google Search.
__Secure-#
Save certain preferences, for example the number of search results per page or activation of the SafeSearch Filter. Adjusts the ads that appear in Google Search.
APISID
Save certain preferences, for example the number of search results per page or activation of the SafeSearch Filter. Adjusts the ads that appear in Google Search.
SSID
Save certain preferences, for example the number of search results per page or activation of the SafeSearch Filter. Adjusts the ads that appear in Google Search.
HSID
Save certain preferences, for example the number of search results per page or activation of the SafeSearch Filter. Adjusts the ads that appear in Google Search.
DV
These cookies are used for the purpose of targeted advertising.
NID
These cookies are used for the purpose of targeted advertising.
1P_JAR
These cookies are used to gather website statistics, and track conversion rates.
OTZ
Aggregate analysis of website visitors
_fbp
This cookie is set by Facebook to deliver advertisements when they are on Facebook or a digital platform powered by Facebook advertising after visiting this website.
fr
Contains a unique browser and user ID, used for targeted advertising.
bscookie
Used by LinkedIn to track the use of embedded services.
lidc
Used by LinkedIn for tracking the use of embedded services.
bcookie
Used by LinkedIn to track the use of embedded services.
aam_uuid
Use these cookies to assign a unique ID when users visit a website.
UserMatchHistory
These cookies are set by LinkedIn for advertising purposes, including: tracking visitors so that more relevant ads can be presented, allowing users to use the 'Apply with LinkedIn' or the 'Sign-in with LinkedIn' functions, collecting information about how visitors use the site, etc.
li_sugr
Used to make a probabilistic match of a user's identity outside the Designated Countries
MR
Used to collect information for analytics purposes.
ANONCHK
Used to store session ID for a users session to ensure that clicks from adverts on the Bing search engine are verified for reporting purposes and for personalisation
We do not use cookies of this type.
Cookie declaration last updated on 24/03/2023 by Analytics Vidhya.
Cookies are small text files that can be used by websites to make a user's experience more efficient. The law states that we can store cookies on your device if they are strictly necessary for the operation of this site. For all other types of cookies, we need your permission. This site uses different types of cookies. Some cookies are placed by third-party services that appear on our pages. Learn more about who we are, how you can contact us, and how we process personal data in our Privacy Policy.