Lokad.Tokenizers
0.1.0
dotnet add package Lokad.Tokenizers --version 0.1.0
NuGet\Install-Package Lokad.Tokenizers -Version 0.1.0
<PackageReference Include="Lokad.Tokenizers" Version="0.1.0" />
paket add Lokad.Tokenizers --version 0.1.0
#r "nuget: Lokad.Tokenizers, 0.1.0"
// Install Lokad.Tokenizers as a Cake Addin
#addin nuget:?package=Lokad.Tokenizers&version=0.1.0
// Install Lokad.Tokenizers as a Cake Tool
#tool nuget:?package=Lokad.Tokenizers&version=0.1.0
Overview
Lokad.Tokenizers is a C#/.NET library that provides tokenization functionality similar to the rust-tokenizers library.
It is designed to work with various tokenization models, including the XLMRobertaTokenizer model used for multilingual-e5-large (text embedding).
Installation
To install Lokad.Tokenizers, you can use the NuGet package manager:
> Install-Package Lokad.Tokenizers
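If you prefer to declare the dependency in the project file, the package reference can sit in an SDK-style project as sketched below. The package ID and version are taken from this page; the target framework (net8.0) is one of the two listed as compatible, and the rest of the project layout (OutputType, Nullable) is only an illustrative assumption.

```xml
<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <!-- Illustrative settings; adjust to your own project. -->
    <OutputType>Exe</OutputType>
    <TargetFramework>net8.0</TargetFramework>
    <Nullable>enable</Nullable>
  </PropertyGroup>

  <ItemGroup>
    <!-- Package ID and version as published on NuGet. -->
    <PackageReference Include="Lokad.Tokenizers" Version="0.1.0" />
  </ItemGroup>

</Project>
```

Running `dotnet restore` afterwards fetches the package along with its Google.Protobuf dependency.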
Usage
Here is an example of how to use the XLMRobertaTokenizer:
using Lokad.Tokenizers.Tokenizer;
// ...
var vocab_path = TestUtils.DownloadFileToCache("https://cdn.huggingface.co/xlm-roberta-large-finetuned-conll03-english-sentencepiece.bpe.model");
// Create an instance of the XLMRobertaTokenizer
var xlmRobertaTokenizer = new XLMRobertaTokenizer(vocab_path, false);
// Define the input text to be tokenized
var inputText = "Hello, world!";
// Tokenize the input text
var result = xlmRobertaTokenizer.Encode(inputText, null, 128, TruncationStrategy.LongestFirst, 0);
// Access the tokenized output
var tokenIds = result.TokenIds;
var tokenOffsets = result.TokenOffsets;
// Process the tokenized output as needed
// ...
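One common way to process the tokenized output is to pad the token IDs to a fixed length and build a matching attention mask before passing them to an embedding model. The sketch below is not part of Lokad.Tokenizers: `TokenPadding` and `PadToLength` are hypothetical names, the code assumes the token IDs can be read as a list of 64-bit integers, and it assumes that 1 is the padding ID in the XLM-RoBERTa vocabulary, which you should verify against your model.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Lokad.Tokenizers.
public static class TokenPadding
{
    // Assumption: 1 is the <pad> ID in the XLM-RoBERTa vocabulary.
    public const long DefaultPadTokenId = 1;

    // Pads tokenIds on the right up to maxLength and returns the padded IDs
    // together with an attention mask (1 = real token, 0 = padding).
    public static (long[] InputIds, long[] AttentionMask) PadToLength(
        IReadOnlyList<long> tokenIds,
        int maxLength,
        long padTokenId = DefaultPadTokenId)
    {
        if (tokenIds == null)
            throw new ArgumentNullException(nameof(tokenIds));
        if (tokenIds.Count > maxLength)
            throw new ArgumentException(
                "Input is longer than maxLength; truncate it first.", nameof(tokenIds));

        var inputIds = new long[maxLength];
        var attentionMask = new long[maxLength];

        for (int i = 0; i < maxLength; i++)
        {
            bool isRealToken = i < tokenIds.Count;
            inputIds[i] = isRealToken ? tokenIds[i] : padTokenId;
            attentionMask[i] = isRealToken ? 1 : 0;
        }

        return (inputIds, attentionMask);
    }
}
```

With the made-up IDs `{ 0, 10, 20, 2 }` and a `maxLength` of 6, `PadToLength` returns input IDs `{ 0, 10, 20, 2, 1, 1 }` and attention mask `{ 1, 1, 1, 1, 0, 0 }`. In the example above, you would pass `result.TokenIds` and the same maximum length (128) that was given to `Encode`, assuming the property's type is compatible with `IReadOnlyList<long>`.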
Contributing
Contributions are welcome! Please feel free to submit a pull request or open an issue on GitHub.
License
MIT License
References
Compatible and additional computed target framework versions:
Product | Versions |
---|---|
.NET | net7.0 is compatible. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. |
Dependencies
- net7.0
  - Google.Protobuf (>= 3.25.3)
- net8.0
  - Google.Protobuf (>= 3.25.3)
NuGet packages
This package is not used by any NuGet packages.
GitHub repositories
This package is not used by any popular GitHub repositories.
Version | Downloads | Last updated |
---|---|---|
0.1.0 | 133 | 10/1/2024 |