Introduction
Mistral NeMo is an open-source large language model developed by Mistral AI in collaboration with NVIDIA, designed to deliver state-of-the-art natural language processing capabilities. The model has 12 billion parameters and offers a large context window of up to 128k tokens. Although it is larger than its predecessor, Mistral 7B, it is positioned as a drop-in replacement and delivers markedly stronger reasoning, world knowledge, and coding accuracy. This article explores the features, applications, and implications of Mistral NeMo.
Overview
- Mistral NeMo, a collaboration between Mistral AI and NVIDIA, is a cutting-edge open-source language model with 12 billion parameters and a 128k token context window.
- It outperforms its predecessor, Mistral 7B, in reasoning, world knowledge, and coding accuracy.
- It excels in multiple languages, including English, French, German, and Spanish, and supports complex multi-turn conversations.
- It uses the Tekken tokenizer, which is more efficient at compressing text and source code in over 100 languages than previous models.
- It is available through Hugging Face, Mistral AI’s API, Vertex AI, and the Mistral AI website for a variety of applications.
- It is suitable for tasks like text generation and translation, and measures are in place to reduce bias and enhance safety, though user discretion is advised.
Mistral NeMo: A Multilingual Model
Designed for global, multilingual applications, the model excels at function calling and offers a large context window. It performs particularly well in English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi, a significant step towards making advanced AI models accessible in many languages. Mistral NeMo has undergone advanced fine-tuning and alignment, making it significantly better than Mistral 7B at following precise instructions, reasoning, handling multi-turn conversations, and generating code. With its 128k context length, Mistral NeMo can maintain long-range dependencies and track complex, multi-turn conversations, setting it apart in various applications.
Tokenizer
Mistral NeMo incorporates Tekken, a new tokenizer based on Tiktoken, trained on over 100 languages. It compresses natural language text and source code more efficiently than the SentencePiece tokenizer used in previous Mistral models. Tekken is approximately 30% more efficient at compressing source code, Chinese, Italian, French, German, Spanish, and Russian, and roughly 2x and 3x more efficient at compressing Korean and Arabic, respectively. Compared to the Llama 3 tokenizer, Tekken compresses text more effectively for about 85% of all languages.
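To make these efficiency figures concrete: a tokenizer that represents the same text with fewer tokens is more efficient, and the percentage gain can be computed from the two token counts. The sketch below uses made-up counts purely for illustration; they are not measured values for Tekken or SentencePiece.

```python
# Illustration of tokenizer compression efficiency (hypothetical counts).
# Gain = baseline_tokens / new_tokens - 1, so a tokenizer that needs 100
# tokens where the baseline needed 130 is ~30% more efficient.

def compression_gain(baseline_tokens: int, new_tokens: int) -> float:
    """Relative efficiency gain of a new tokenizer over a baseline."""
    return baseline_tokens / new_tokens - 1

# Hypothetical counts for the same source-code snippet:
sentencepiece_tokens = 130  # baseline tokenizer
tekken_tokens = 100         # new tokenizer

gain = compression_gain(sentencepiece_tokens, tekken_tokens)
print(f"Relative gain: {gain:.0%}")  # → Relative gain: 30%
```

The same arithmetic explains the 2x and 3x figures for Korean and Arabic: a 2x gain means the baseline needed twice as many tokens for the same text.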
How to Access Mistral NeMo?
You can access and use the Mistral NeMo LLM in the following ways:
1. Hugging Face
Model Hub: Mistral NeMo is available on the Hugging Face Model Hub. To use it, follow these steps:
from transformers import AutoModelForCausalLM, AutoTokenizer

# Instruction-tuned checkpoint; a base variant (Mistral-Nemo-Base-2407) also exists
model_id = "mistralai/Mistral-Nemo-Instruct-2407"

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load the model
model = AutoModelForCausalLM.from_pretrained(model_id)
2. Mistral AI’s Official API:
Mistral AI offers an API for interacting with their models. To get started, sign up for an account and obtain your API key.
import requests

API_URL = "https://api.mistral.ai/v1/chat/completions"
API_KEY = "your_api_key_here"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
data = {
    "model": "open-mistral-nemo",  # Mistral NeMo on the Mistral AI API
    "messages": [{"role": "user", "content": "Hello! How are you?"}],
    "temperature": 0.7,
}

response = requests.post(API_URL, headers=headers, json=data)
print(response.json())
3. Vertex AI
Google Cloud’s Vertex AI provides a managed service for deploying Mistral NeMo. Here’s a brief overview of the deployment process:
- Import the model from the Model Hub within the Vertex AI console.
- After importing, create an endpoint and deploy the model.
- Once deployed, utilize the AI Platform Predict service to send requests to your model.
4. Directly from Mistral AI
You can also access Mistral Nemo directly from the official Mistral AI website. The website provides a chat interface for interacting with the model.
Using Mistral Chat
You can access Mistral NeMo here: Mistral Chat
Set the model to NeMo, and you’re good to prompt.
I asked, “What are agents?” and received a detailed and comprehensive response. You can try it for yourself with different questions.
Using Mistral NeMo with Vertex AI
First, install httpx and google-auth, have your Google Cloud project ID ready, and make sure Mistral NeMo is enabled in Vertex AI.
pip install httpx google-auth
Imports
import os
import httpx
import google.auth
from google.auth.transport.requests import Request
- os: Provides a way to use operating system-dependent functionality like reading or writing to environment variables.
- httpx: A library for making HTTP requests, similar to requests but with more features and support for asynchronous operations.
- google.auth: A library to handle Google authentication.
- google.auth.transport.requests.Request: A class that provides methods to refresh Google credentials using HTTP requests.
Set the Environment Variables
os.environ['GOOGLE_PROJECT_ID'] = ""
os.environ['GOOGLE_REGION'] = ""
- os.environ: This is used to set environment variables for the Google Cloud Project ID and Region. These should be filled with appropriate values.
Function: get_credentials()
def get_credentials():
    credentials, project_id = google.auth.default(
        scopes=["https://www.googleapis.com/auth/cloud-platform"]
    )
    credentials.refresh(Request())
    return credentials.token
- google.auth.default(): Fetches the default Google Cloud credentials, optionally specifying scopes.
- credentials.refresh(Request()): Refreshes the credentials to ensure they are up-to-date.
- return credentials.token: Returns the OAuth 2.0 token that is used to authenticate API requests.
Function: build_endpoint_url()
def build_endpoint_url(
    region: str,
    project_id: str,
    model_name: str,
    model_version: str,
    streaming: bool = False,
):
    base_url = f"https://{region}-aiplatform.googleapis.com/v1/"
    project_fragment = f"projects/{project_id}"
    location_fragment = f"locations/{region}"
    specifier = "streamRawPredict" if streaming else "rawPredict"
    model_fragment = f"publishers/mistralai/models/{model_name}@{model_version}"
    url = f"{base_url}{'/'.join([project_fragment, location_fragment, model_fragment])}:{specifier}"
    return url
- base_url: Constructs the base URL for the API endpoint using the Google Cloud region.
- project_fragment, location_fragment, model_fragment: Constructs different parts of the URL based on project ID, location (region), and model details.
- specifier: Chooses between streamRawPredict (for streaming responses) and rawPredict (for non-streaming).
- url: Builds the full endpoint URL by concatenating the base URL with project, location, and model details.
Retrieve Google Cloud Project ID and Region
project_id = os.environ.get("GOOGLE_PROJECT_ID")
region = os.environ.get("GOOGLE_REGION")
- os.environ.get(): Retrieves the Google Cloud Project ID and Region from the environment variables.
Retrieve Google Cloud Credentials
access_token = get_credentials()
- Calls the get_credentials function to obtain an access token for authentication.
Define Model and Streaming Options
model = "mistral-nemo"
model_version = "2407"
is_streamed = False # Change to True to stream token responses
- model: The name of the model to use.
- model_version: The version of the model to use.
- is_streamed: A flag indicating whether to stream responses or not.
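If is_streamed is set to True, the endpoint returns the answer incrementally rather than as one JSON body. The helper below sketches how such a stream could be reassembled into a single string; it assumes an OpenAI-style server-sent-events format (lines of `data: {...}` carrying `choices[0].delta.content`, terminated by `data: [DONE]`), which is an assumption about the wire format rather than documented behavior.

```python
import json


def collect_stream_text(lines):
    """Join assistant text out of SSE-style streaming lines.

    Assumes each event looks like:
        data: {"choices": [{"delta": {"content": "..."}}]}
    and the stream ends with 'data: [DONE]' (an assumed, OpenAI-style
    format; adjust if the actual streaming payload differs).
    """
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip keep-alives and other event lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)


# Hypothetical stream:
sample = [
    'data: {"choices": [{"delta": {"content": "Claude "}}]}',
    'data: {"choices": [{"delta": {"content": "Monet"}}]}',
    "data: [DONE]",
]
print(collect_stream_text(sample))  # → Claude Monet
```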
Build URL
url = build_endpoint_url(
    project_id=project_id,
    region=region,
    model_name=model,
    model_version=model_version,
    streaming=is_streamed,
)
- Calls the build_endpoint_url function to construct the URL for making the API request.
headers = {
    "Authorization": f"Bearer {access_token}",
    "Accept": "application/json",
}
- Authorization: Contains the Bearer token for authentication.
- Accept: Specifies that the client expects a JSON response.
Define POST Payload
data = {
    "model": model,
    "messages": [{"role": "user", "content": "Who is the best French painter?"}],
    "stream": is_streamed,
}
- model: The model to be used in the request.
- messages: The input message or query for the model.
- stream: Whether to stream responses or not.
Make the API Call
with httpx.Client() as client:
    resp = client.post(url, json=data, headers=headers, timeout=None)
    print(resp.text)
- httpx.Client(): Creates a new HTTP client session.
- client.post(url, json=data, headers=headers, timeout=None): Sends a POST request to the specified URL with the JSON payload and headers. The timeout=None means there is no timeout limit for the request.
- print(resp.text): Prints the response from the API call.
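Printing resp.text dumps the raw JSON body. If you only want the assistant’s answer, you can parse it out; the sketch below assumes the non-streamed response follows the standard chat-completions schema (`choices[0].message.content`), which is an assumption about the payload shape, and the sample body is made up for illustration.

```python
import json


def extract_answer(body: str) -> str:
    """Pull the assistant message out of a chat-completions-style JSON body.

    Assumes the rawPredict response mirrors the standard chat-completions
    schema (an assumption; adjust if the actual payload differs).
    """
    payload = json.loads(body)
    return payload["choices"][0]["message"]["content"]


# Hypothetical response body:
sample = '{"choices": [{"message": {"role": "assistant", "content": "Claude Monet is often cited."}}]}'
print(extract_answer(sample))  # → Claude Monet is often cited.
```

In the script above, this would replace print(resp.text) with print(extract_answer(resp.text)).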
My question was, “Who is the best French painter?” The model responded with a detailed answer, including 5 renowned painters and their backgrounds.
Conclusion
Mistral NeMo is a robust and versatile open-source language model created by Mistral AI that marks notable progress in natural language processing. With multilingual support and the efficient Tekken tokenizer, NeMo excels across numerous tasks, making it an appealing option for developers who want high-quality language tools with modest resource requirements. Available through Hugging Face, Mistral AI’s API, Vertex AI, and the Mistral AI website, NeMo lets users leverage its capabilities across multiple platforms.
Frequently Asked Questions
Q1. What is Mistral NeMo?
Ans. Mistral NeMo is an advanced language model crafted by Mistral AI to generate and interpret human-like text based on the inputs it receives.
Q2. What makes Mistral NeMo stand out?
Ans. Mistral NeMo is notable for its rapid response times and efficiency. It combines quick processing with precise results, thanks to its training on a broad dataset that enables it to handle diverse subjects effectively.
Q3. What tasks can Mistral NeMo handle?
Ans. Mistral NeMo is versatile and can handle a range of tasks, such as generating text, translating languages, answering questions, and more. It can also assist with creative writing or coding tasks.
Q4. Is Mistral NeMo safe and unbiased?
Ans. Mistral AI has implemented measures to reduce bias and enhance safety in Mistral NeMo. Yet, as with all AI models, it might occasionally produce biased or inappropriate outputs. Users should use it responsibly and review its responses critically, with ongoing improvements being made by Mistral AI.
Q5. How can I use Mistral NeMo in my own applications?
Ans. You can access it through an API to integrate it into your applications. It is also available on platforms like Hugging Face Spaces, or you can run it locally if you have the required setup.