LLMSharp.OpenAi.Tokenizer 2.0.3

.NET CLI
dotnet add package LLMSharp.OpenAi.Tokenizer --version 2.0.3

Package Manager
NuGet\Install-Package LLMSharp.OpenAi.Tokenizer -Version 2.0.3
This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.

PackageReference
<PackageReference Include="LLMSharp.OpenAi.Tokenizer" Version="2.0.3" />
For projects that support PackageReference, copy this XML node into the project file to reference the package.

Paket CLI
paket add LLMSharp.OpenAi.Tokenizer --version 2.0.3

Script & Interactive
#r "nuget: LLMSharp.OpenAi.Tokenizer, 2.0.3"
The #r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or the source code of the script to reference the package.

Cake
// Install LLMSharp.OpenAi.Tokenizer as a Cake Addin
#addin nuget:?package=LLMSharp.OpenAi.Tokenizer&version=2.0.3

// Install LLMSharp.OpenAi.Tokenizer as a Cake Tool
#tool nuget:?package=LLMSharp.OpenAi.Tokenizer&version=2.0.3

LLMSharp Tokenizers

  • LLMSharp.Anthropic.Tokenizer: Unofficial .NET implementation of the tokenizer for Anthropic Claude. Install this NuGet package to encode text with the Claude tokenizer.
  • LLMSharp.OpenAi.Tokenizer: Unofficial .NET implementation of the tokenizer for GPT-3.5/GPT-4 models. Install this NuGet package to encode text with the tokenizer used by the GPT chat completions models.

Usage

  • Install the latest version of the NuGet package you need
dotnet add package LLMSharp.Anthropic.Tokenizer

dotnet add package LLMSharp.OpenAi.Tokenizer
  • Create an instance of the tokenizer
// Claude tokenizer
using LLMSharp.Anthropic.Tokenizer;

var tokenizer = new ClaudeTokenizer();

// OpenAI chat completions models tokenizer
// (a separate snippet: declare only one 'tokenizer' per scope)
using LLMSharp.OpenAi.Tokenizer;

var tokenizer = new OpenAiChatCompletionsTokenizer();
  • Encode: tokenizes the given text. This is the default implementation; it throws an exception if the text contains any special tokens.
var encodedTokens = tokenizer.Encode("hello world");
  • CountTokens: counts the tokens in the given text. This is the default implementation; it throws an exception if the text contains any special tokens.
var tokenCount = tokenizer.CountTokens("hello world");
  • EncodeWithSpecialTokens: tokenizes the given text, allowing either all special tokens or only specific ones
// Passing null for allowedSpecial tokenizes all special tokens.
var encodedWithAllSpecial = tokenizer.EncodeWithSpecialTokens(
    text: "<META_START>some data<META_END>",
    allowedSpecial: null,
    disallowedSpecial: null);

// Passing an array of strings for allowedSpecial tokenizes only those special tokens;
// any other special token found in the text causes an exception.
var encodedWithAllowedSpecial = tokenizer.EncodeWithSpecialTokens(
    text: "<META_START>some data<META_END>",
    allowedSpecial: new string[] { "<META_START>", "<META_END>" },
    disallowedSpecial: null);
  • CountWithSpecialTokens: counts the tokens in the given text, allowing either all special tokens or only specific ones
var tokenCount = tokenizer.CountWithSpecialTokens(
    text:"<META_START>some data<META_END>",
    allowedSpecial: new string[]{"<META_START>", "<META_END>"},
    disallowedSpecial: null);
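
The counting methods above can be combined into a small token-budget check. What follows is a minimal sketch rather than anything from the package's documentation. It assumes that CountTokens and CountWithSpecialTokens return an integer count, and that the failure on special tokens can be caught as a general Exception (the specific exception type is not stated here). The 4096-token budget and the Report helper are hypothetical names introduced for illustration.

```csharp
using System;
using LLMSharp.OpenAi.Tokenizer;

// Hypothetical budget for illustration; use the limit that applies to your model.
const int MaxPromptTokens = 4096;

var tokenizer = new OpenAiChatCompletionsTokenizer();
string text = "<META_START>some data<META_END>";

// Hypothetical helper: prints a token count and reports whether it fits the budget.
// Assumption: the tokenizer's count converts implicitly to long.
bool Report(long count)
{
    bool fitsBudget = count <= MaxPromptTokens;
    Console.WriteLine($"{count} tokens, fits budget: {fitsBudget}");
    return fitsBudget;
}

bool withinBudget;
try
{
    // Default path: throws if the text contains special tokens.
    withinBudget = Report(tokenizer.CountTokens(text));
}
catch (Exception ex)
{
    // Assumption: the exact exception type is unknown, so this catch is deliberately broad.
    Console.WriteLine($"CountTokens failed ({ex.GetType().Name}); retrying with special tokens allowed.");
    withinBudget = Report(tokenizer.CountWithSpecialTokens(
        text: text,
        allowedSpecial: null,
        disallowedSpecial: null));
}

Console.WriteLine(withinBudget ? "Prompt can be sent." : "Prompt must be shortened.");
```

The sketch behaves sensibly whether or not the sample markers are special tokens for the chosen tokenizer: if they are not, the first call succeeds and the fallback never runs. In real code, narrow the catch clause once the exception type thrown by the library is known.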

Benchmarks

Encoding and CountTokens for 4200 tokens (~16 KB) of text

Linux


BenchmarkDotNet v0.13.7, Ubuntu 22.04.3 LTS (Jammy Jellyfish)
Intel Xeon Platinum 8370C CPU 2.80GHz, 1 CPU, 1 logical core and 1 physical core
.NET SDK 7.0.110
  [Host]   : .NET 7.0.10 (7.0.1023.36801), X64 RyuJIT AVX2
  .NET 6.0 : .NET 6.0.21 (6.0.2123.36801), X64 RyuJIT AVX2
  .NET 7.0 : .NET 7.0.10 (7.0.1023.36801), X64 RyuJIT AVX2


| Method | Job | Runtime | StringToEncode | Mean |
|---|---|---|---|---|
| OpenAiChatCompletionsTokenizerEncode | .NET 6.0 | .NET 6.0 | Con(...)e.\n [16926] | 1.328 ms |
| OpenAiChatCompletionsTokenizerEncode | .NET 7.0 | .NET 7.0 | Con(...)e.\n [16926] | 1.239 ms |
| OpenAiChatCompletionsTokenizerCountTokens | .NET 6.0 | .NET 6.0 | Con(...)e.\n [16926] | 1.274 ms |
| OpenAiChatCompletionsTokenizerCountTokens | .NET 7.0 | .NET 7.0 | Con(...)e.\n [16926] | 1.142 ms |
| ClaudeTokenizerEncode | .NET 6.0 | .NET 6.0 | Con(...)e.\n [16926] | 1.343 ms |
| ClaudeTokenizerEncode | .NET 7.0 | .NET 7.0 | Con(...)e.\n [16926] | 1.188 ms |
| ClaudeTokenizerCountTokens | .NET 6.0 | .NET 6.0 | Con(...)e.\n [16926] | 1.270 ms |
| ClaudeTokenizerCountTokens | .NET 7.0 | .NET 7.0 | Con(...)e.\n [16926] | 1.160 ms |

macOS


BenchmarkDotNet v0.13.7, macOS Ventura 13.4.1 (c) (22F770820d) [Darwin 22.5.0]
Apple M2, 1 CPU, 8 logical and 8 physical cores
.NET SDK 7.0.304
  [Host]   : .NET 7.0.7 (7.0.723.27404), Arm64 RyuJIT AdvSIMD
  .NET 6.0 : .NET 6.0.21 (6.0.2123.36311), Arm64 RyuJIT AdvSIMD
  .NET 7.0 : .NET 7.0.7 (7.0.723.27404), Arm64 RyuJIT AdvSIMD


| Method | Job | Runtime | StringToEncode | Mean |
|---|---|---|---|---|
| OpenAiChatCompletionsTokenizerEncode | .NET 6.0 | .NET 6.0 | Con(...)e.\n [16926] | 1,133.5 μs |
| OpenAiChatCompletionsTokenizerEncode | .NET 7.0 | .NET 7.0 | Con(...)e.\n [16926] | 738.2 μs |
| OpenAiChatCompletionsTokenizerCountTokens | .NET 6.0 | .NET 6.0 | Con(...)e.\n [16926] | 1,071.3 μs |
| OpenAiChatCompletionsTokenizerCountTokens | .NET 7.0 | .NET 7.0 | Con(...)e.\n [16926] | 709.5 μs |
| ClaudeTokenizerEncode | .NET 6.0 | .NET 6.0 | Con(...)e.\n [16926] | 1,186.3 μs |
| ClaudeTokenizerEncode | .NET 7.0 | .NET 7.0 | Con(...)e.\n [16926] | 703.5 μs |
| ClaudeTokenizerCountTokens | .NET 6.0 | .NET 6.0 | Con(...)e.\n [16926] | 1,143.9 μs |
| ClaudeTokenizerCountTokens | .NET 7.0 | .NET 7.0 | Con(...)e.\n [16926] | 711.3 μs |

Windows


BenchmarkDotNet v0.13.7, Windows 11 (10.0.22621.2134/22H2/2022Update/SunValley2)
Intel Core i7-9700K CPU 3.60GHz (Coffee Lake), 1 CPU, 8 logical and 8 physical cores
.NET SDK 7.0.400
  [Host]   : .NET 7.0.10 (7.0.1023.36312), X64 RyuJIT AVX2
  .NET 6.0 : .NET 6.0.21 (6.0.2123.36311), X64 RyuJIT AVX2
  .NET 7.0 : .NET 7.0.10 (7.0.1023.36312), X64 RyuJIT AVX2


| Method | Job | Runtime | StringToEncode | Mean |
|---|---|---|---|---|
| OpenAiChatCompletionsTokenizerEncode | .NET 6.0 | .NET 6.0 | Con(...).\r\n [17157] | 1.270 ms |
| OpenAiChatCompletionsTokenizerEncode | .NET 7.0 | .NET 7.0 | Con(...).\r\n [17157] | 1.226 ms |
| OpenAiChatCompletionsTokenizerCountTokens | .NET 6.0 | .NET 6.0 | Con(...).\r\n [17157] | 1.212 ms |
| OpenAiChatCompletionsTokenizerCountTokens | .NET 7.0 | .NET 7.0 | Con(...).\r\n [17157] | 1.138 ms |
| ClaudeTokenizerEncode | .NET 6.0 | .NET 6.0 | Con(...).\r\n [17157] | 1.266 ms |
| ClaudeTokenizerEncode | .NET 7.0 | .NET 7.0 | Con(...).\r\n [17157] | 1.174 ms |
| ClaudeTokenizerCountTokens | .NET 6.0 | .NET 6.0 | Con(...).\r\n [17157] | 1.242 ms |
| ClaudeTokenizerCountTokens | .NET 7.0 | .NET 7.0 | Con(...).\r\n [17157] | 1.156 ms |
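
The mean times reported above can be turned into throughput figures, which are often easier to compare. The following is a rough sketch of that arithmetic using the stated 4200-token input and three of the Encode means for OpenAiChatCompletionsTokenizer. The TokensPerSecond helper is a hypothetical name introduced for illustration, and the results are only as representative as the single benchmark input they come from.

```csharp
using System;

// Token count stated in the benchmark description.
const double Tokens = 4200;

// Hypothetical helper: converts a mean time in seconds into tokens per second.
static double TokensPerSecond(double tokenCount, double seconds) => tokenCount / seconds;

// Mean Encode times for OpenAiChatCompletionsTokenizer, copied from the benchmark results.
double linuxNet7 = 1.239e-3;   // Linux, .NET 7.0: 1.239 ms
double macNet6 = 1133.5e-6;    // macOS, .NET 6.0: 1,133.5 μs
double macNet7 = 738.2e-6;     // macOS, .NET 7.0: 738.2 μs

// Roughly 3.39 million tokens per second.
Console.WriteLine($"Linux .NET 7.0: {TokensPerSecond(Tokens, linuxNet7):N0} tokens/s");

// Roughly 5.69 million tokens per second.
Console.WriteLine($"macOS .NET 7.0: {TokensPerSecond(Tokens, macNet7):N0} tokens/s");

// Ratio of the two macOS means: about 1.54x faster on .NET 7.0.
Console.WriteLine($"macOS speedup, .NET 6.0 to 7.0: {macNet6 / macNet7:F2}x");
```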
| Product | Compatible and additional computed target framework versions |
|---|---|
| .NET | net5.0 was computed. net5.0-windows was computed. net6.0 was computed. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 was computed. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. |
| .NET Core | netcoreapp2.0 was computed. netcoreapp2.1 was computed. netcoreapp2.2 was computed. netcoreapp3.0 was computed. netcoreapp3.1 was computed. |
| .NET Standard | netstandard2.0 is compatible. netstandard2.1 was computed. |
| .NET Framework | net461 was computed. net462 was computed. net463 was computed. net47 was computed. net471 was computed. net472 was computed. net48 was computed. net481 was computed. |
| MonoAndroid | monoandroid was computed. |
| MonoMac | monomac was computed. |
| MonoTouch | monotouch was computed. |
| Tizen | tizen40 was computed. tizen60 was computed. |
| Xamarin.iOS | xamarinios was computed. |
| Xamarin.Mac | xamarinmac was computed. |
| Xamarin.TVOS | xamarintvos was computed. |
| Xamarin.WatchOS | xamarinwatchos was computed. |

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories

This package is not used by any popular GitHub repositories.

| Version | Downloads | Last updated |
|---|---|---|
| 2.0.3 | 3,355 | 10/4/2023 |
| 2.0.2 | 3,184 | 9/6/2023 |
| 2.0.1 | 261 | 8/25/2023 |
| 2.0.0 | 141 | 8/22/2023 |
| 1.0.1 | 153 | 8/21/2023 |