LLMSharp.OpenAi.Tokenizer
2.0.1
dotnet add package LLMSharp.OpenAi.Tokenizer --version 2.0.1
NuGet\Install-Package LLMSharp.OpenAi.Tokenizer -Version 2.0.1
<PackageReference Include="LLMSharp.OpenAi.Tokenizer" Version="2.0.1" />
paket add LLMSharp.OpenAi.Tokenizer --version 2.0.1
#r "nuget: LLMSharp.OpenAi.Tokenizer, 2.0.1"
// Install LLMSharp.OpenAi.Tokenizer as a Cake Addin
#addin nuget:?package=LLMSharp.OpenAi.Tokenizer&version=2.0.1
// Install LLMSharp.OpenAi.Tokenizer as a Cake Tool
#tool nuget:?package=LLMSharp.OpenAi.Tokenizer&version=2.0.1
LLMSharp Tokenizers
- LLMSharp.Anthropic.Tokenizer : An unofficial .NET implementation of the tokenizer for Anthropic Claude. Install this NuGet package to encode text with the Claude tokenizer.
- LLMSharp.OpenAi.Tokenizer : An unofficial .NET implementation of the tokenizer for GPT-3.5/GPT-4 models. Install this NuGet package to encode text with the GPT Chat Completions model tokenizer.
Usage
- Install the latest version of the NuGet package
dotnet add package LLMSharp.Anthropic.Tokenizer
dotnet add package LLMSharp.OpenAi.Tokenizer
- Create an instance of the tokenizer
// Claude Tokenizer
using LLMSharp.Anthropic.Tokenizer;
var tokenizer = new ClaudeTokenizer();
// OpenAi ChatCompletion Models Tokenizer
using LLMSharp.OpenAi.Tokenizer;
var tokenizer = new OpenAiChatCompletionsTokenizer();
- Encode : tokenizes the given text. This is the default implementation, which throws an exception if the text contains any special tokens.
var encodedTokens = tokenizer.Encode("hello world");
- CountTokens : counts the tokens in the given text. This is the default implementation, which throws an exception if the text contains any special tokens.
var tokenCount = tokenizer.CountTokens("hello world");
- EncodeWithSpecialTokens : tokenizes the given text, including all or only specific special tokens
// Passing null for allowedSpecial tokenizes all special tokens
var encodedBytes = tokenizer.EncodeWithSpecialTokens(
text:"<META_START>some data<META_END>",
allowedSpecial: null,
disallowedSpecial: null);
// Passing an array of strings for allowedSpecial tokenizes only those special tokens;
// any other special token found in the text throws an exception
var encodedBytes = tokenizer.EncodeWithSpecialTokens(
text:"<META_START>some data<META_END>",
allowedSpecial: new string[]{"<META_START>", "<META_END>"},
disallowedSpecial: null);
- CountWithSpecialTokens : counts the tokens in the given text, including all or only specific special tokens
var tokenCount = tokenizer.CountWithSpecialTokens(
text:"<META_START>some data<META_END>",
allowedSpecial: new string[]{"<META_START>", "<META_END>"},
disallowedSpecial: null);
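The difference between the default methods and the special-token variants can be sketched as follows. This is a minimal illustration, not part of the package documentation: it assumes the OpenAI tokenizer, reuses only the calls shown above, and catches the base `Exception` type because the README does not name the exception that is thrown. Which strings count as special tokens depends on the tokenizer; if the sample text contains none, the first branch simply succeeds.

```csharp
using System;
using LLMSharp.OpenAi.Tokenizer;

// Assumption: OpenAiChatCompletionsTokenizer is used here; the README shows the
// same method names being called on either tokenizer instance.
var tokenizer = new OpenAiChatCompletionsTokenizer();
var text = "<META_START>some data<META_END>";

try
{
    // Default path: documented to throw if the text contains special tokens.
    var tokenCount = tokenizer.CountTokens(text);
    Console.WriteLine($"Token count (default): {tokenCount}");
}
catch (Exception ex)
{
    // The concrete exception type is not documented, so the base type is caught.
    Console.WriteLine($"Default count failed: {ex.Message}");

    // Fallback: passing null for allowedSpecial allows all special tokens,
    // as described in the usage notes above.
    var tokenCount = tokenizer.CountWithSpecialTokens(
        text: text,
        allowedSpecial: null,
        disallowedSpecial: null);
    Console.WriteLine($"Token count (special tokens allowed): {tokenCount}");
}
```

Falling back only after a failure keeps the stricter default behavior for ordinary input, which may be preferable when untrusted text should not be able to inject special tokens silently.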
Benchmarks
Encoding and CountTokens for 4200 tokens (~16 KB) of text
Linux
BenchmarkDotNet v0.13.7, Ubuntu 22.04.3 LTS (Jammy Jellyfish)
Intel Xeon Platinum 8370C CPU 2.80GHz, 1 CPU, 1 logical core and 1 physical core
.NET SDK 7.0.110
[Host] : .NET 7.0.10 (7.0.1023.36801), X64 RyuJIT AVX2
.NET 6.0 : .NET 6.0.21 (6.0.2123.36801), X64 RyuJIT AVX2
.NET 7.0 : .NET 7.0.10 (7.0.1023.36801), X64 RyuJIT AVX2
Method | Job | Runtime | StringToEncode | Mean |
---|---|---|---|---|
OpenAiChatCompletionsTokenizerEncode | .NET 6.0 | .NET 6.0 | Con(...)e.\n [16926] | 1.328 ms |
OpenAiChatCompletionsTokenizerEncode | .NET 7.0 | .NET 7.0 | Con(...)e.\n [16926] | 1.239 ms |
OpenAiChatCompletionsTokenizerCountTokens | .NET 6.0 | .NET 6.0 | Con(...)e.\n [16926] | 1.274 ms |
OpenAiChatCompletionsTokenizerCountTokens | .NET 7.0 | .NET 7.0 | Con(...)e.\n [16926] | 1.142 ms |
ClaudeTokenizerEncode | .NET 6.0 | .NET 6.0 | Con(...)e.\n [16926] | 1.343 ms |
ClaudeTokenizerEncode | .NET 7.0 | .NET 7.0 | Con(...)e.\n [16926] | 1.188 ms |
ClaudeTokenizerCountTokens | .NET 6.0 | .NET 6.0 | Con(...)e.\n [16926] | 1.270 ms |
ClaudeTokenizerCountTokens | .NET 7.0 | .NET 7.0 | Con(...)e.\n [16926] | 1.160 ms |
macOS
BenchmarkDotNet v0.13.7, macOS Ventura 13.4.1 (c) (22F770820d) [Darwin 22.5.0]
Apple M2, 1 CPU, 8 logical and 8 physical cores
.NET SDK 7.0.304
[Host] : .NET 7.0.7 (7.0.723.27404), Arm64 RyuJIT AdvSIMD
.NET 6.0 : .NET 6.0.21 (6.0.2123.36311), Arm64 RyuJIT AdvSIMD
.NET 7.0 : .NET 7.0.7 (7.0.723.27404), Arm64 RyuJIT AdvSIMD
Method | Job | Runtime | StringToEncode | Mean |
---|---|---|---|---|
OpenAiChatCompletionsTokenizerEncode | .NET 6.0 | .NET 6.0 | Con(...)e.\n [16926] | 1,133.5 μs |
OpenAiChatCompletionsTokenizerEncode | .NET 7.0 | .NET 7.0 | Con(...)e.\n [16926] | 738.2 μs |
OpenAiChatCompletionsTokenizerCountTokens | .NET 6.0 | .NET 6.0 | Con(...)e.\n [16926] | 1,071.3 μs |
OpenAiChatCompletionsTokenizerCountTokens | .NET 7.0 | .NET 7.0 | Con(...)e.\n [16926] | 709.5 μs |
ClaudeTokenizerEncode | .NET 6.0 | .NET 6.0 | Con(...)e.\n [16926] | 1,186.3 μs |
ClaudeTokenizerEncode | .NET 7.0 | .NET 7.0 | Con(...)e.\n [16926] | 703.5 μs |
ClaudeTokenizerCountTokens | .NET 6.0 | .NET 6.0 | Con(...)e.\n [16926] | 1,143.9 μs |
ClaudeTokenizerCountTokens | .NET 7.0 | .NET 7.0 | Con(...)e.\n [16926] | 711.3 μs |
Windows
BenchmarkDotNet v0.13.7, Windows 11 (10.0.22621.2134/22H2/2022Update/SunValley2)
Intel Core i7-9700K CPU 3.60GHz (Coffee Lake), 1 CPU, 8 logical and 8 physical cores
.NET SDK 7.0.400
[Host] : .NET 7.0.10 (7.0.1023.36312), X64 RyuJIT AVX2
.NET 6.0 : .NET 6.0.21 (6.0.2123.36311), X64 RyuJIT AVX2
.NET 7.0 : .NET 7.0.10 (7.0.1023.36312), X64 RyuJIT AVX2
Method | Job | Runtime | StringToEncode | Mean |
---|---|---|---|---|
OpenAiChatCompletionsTokenizerEncode | .NET 6.0 | .NET 6.0 | Con(...).\r\n [17157] | 1.270 ms |
OpenAiChatCompletionsTokenizerEncode | .NET 7.0 | .NET 7.0 | Con(...).\r\n [17157] | 1.226 ms |
OpenAiChatCompletionsTokenizerCountTokens | .NET 6.0 | .NET 6.0 | Con(...).\r\n [17157] | 1.212 ms |
OpenAiChatCompletionsTokenizerCountTokens | .NET 7.0 | .NET 7.0 | Con(...).\r\n [17157] | 1.138 ms |
ClaudeTokenizerEncode | .NET 6.0 | .NET 6.0 | Con(...).\r\n [17157] | 1.266 ms |
ClaudeTokenizerEncode | .NET 7.0 | .NET 7.0 | Con(...).\r\n [17157] | 1.174 ms |
ClaudeTokenizerCountTokens | .NET 6.0 | .NET 6.0 | Con(...).\r\n [17157] | 1.242 ms |
ClaudeTokenizerCountTokens | .NET 7.0 | .NET 7.0 | Con(...).\r\n [17157] | 1.156 ms |
Product | Versions Compatible and additional computed target framework versions. |
---|---|
.NET | net5.0 was computed. net5.0-windows was computed. net6.0 was computed. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 was computed. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. |
.NET Core | netcoreapp2.0 was computed. netcoreapp2.1 was computed. netcoreapp2.2 was computed. netcoreapp3.0 was computed. netcoreapp3.1 was computed. |
.NET Standard | netstandard2.0 is compatible. netstandard2.1 was computed. |
.NET Framework | net461 was computed. net462 was computed. net463 was computed. net47 was computed. net471 was computed. net472 was computed. net48 was computed. net481 was computed. |
MonoAndroid | monoandroid was computed. |
MonoMac | monomac was computed. |
MonoTouch | monotouch was computed. |
Tizen | tizen40 was computed. tizen60 was computed. |
Xamarin.iOS | xamarinios was computed. |
Xamarin.Mac | xamarinmac was computed. |
Xamarin.TVOS | xamarintvos was computed. |
Xamarin.WatchOS | xamarinwatchos was computed. |
- .NETStandard 2.0
  - Google.Protobuf (>= 3.24.1)