Edi.WordFilter 2.2.0

There is a newer version of this package available.
See the version list below for details.

dotnet add package Edi.WordFilter --version 2.2.0

NuGet\Install-Package Edi.WordFilter -Version 2.2.0

This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.

<PackageReference Include="Edi.WordFilter" Version="2.2.0" />

For projects that support PackageReference, copy this XML node into the project file to reference the package.

paket add Edi.WordFilter --version 2.2.0

The NuGet Team does not provide support for this client. Please contact its maintainers for support.

#r "nuget: Edi.WordFilter, 2.2.0"

#r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.

// Install Edi.WordFilter as a Cake Addin
#addin nuget:?package=Edi.WordFilter&version=2.2.0

// Install Edi.WordFilter as a Cake Tool
#tool nuget:?package=Edi.WordFilter&version=2.2.0

The NuGet Team does not provide support for this client. Please contact its maintainers for support.

Edi.WordFilter

A basic word filter used in my blog system to mask offensive words (e.g. insults and profanity).

Install

dotnet add package Edi.WordFilter

Usage

Prepare a text file of banned words separated by a delimiter, for example |. Like this:

fuck|shit|ass

Then use it like this:

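// Constants.DataDirectory is a constant defined by the host application, not by this package.
// Path.Combine uses the platform's directory separator instead of a hard-coded "\\".
var dataDirectory = AppDomain.CurrentDomain.GetData(Constants.DataDirectory)?.ToString() ?? string.Empty;
var wordFilterDataFilePath = Path.Combine(dataDirectory, "BannedWords.txt");
var filter = new TrieTreeWordFilter(wordFilterDataFilePath);

// Every character of a matched word is replaced with an asterisk.
var output = filter.FilterContent("Go fuck yourself and eat some shit!");
// output: Go **** yourself and eat some ****!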

Design Details

TrieTreeWordFilter

This was written by ChatGPT; it is faster than my HashTableWordFilter.

The FilterContent() method in the TrieTreeWordFilter class is used to filter sensitive words from a given string content. It uses a Trie data structure to efficiently find and replace sensitive words with asterisks (*).

Here's a step-by-step explanation of how it works:

1. It initializes a result array with the same length as the input content. This array will hold the filtered content.
2. It sets two pointers, slowIndex and fastIndex, to track the start and end of a potential sensitive word in the content.
3. It then enters a loop that continues until fastIndex has traversed the entire content.
4. Inside the loop, it checks if the current character exists as a child in the current Trie node. If it does, it means the current character could be part of a sensitive word.
5. If the current Trie node marks the end of a word (node.IsEndOfWord is true), it means a complete sensitive word has been found. The method then replaces all characters of this word in the result array with asterisks (*).
6. If the current Trie node does not mark the end of a word, it means the current character could be part of a longer sensitive word. The method then moves to the next character and continues the loop.
7. If the current character does not exist as a child in the current Trie node, it means the current character is not part of a sensitive word. The method then copies this character to the result array.
8. After the loop, it copies any remaining characters in the content to the result array.
9. Finally, it returns the result array as a new string, which is the filtered content.

This method is efficient for filtering sensitive words, especially when there is a large set of words to filter. However, it assumes that the Trie tree has been properly initialized with all the sensitive words.
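The walkthrough above can be sketched roughly as follows. This is a minimal illustration written for this explanation, not the package's actual source: the names SketchTrieNode and SketchTrieFilter are hypothetical, the constructor takes the words directly rather than a file path, and the sketch assumes that the longest match starting at each position is the one that gets masked.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical node type for this sketch (not the package's own class).
public sealed class SketchTrieNode
{
    public Dictionary<char, SketchTrieNode> Children { get; } = new();
    public bool IsEndOfWord { get; set; }
}

// Hypothetical filter that follows the slowIndex / fastIndex idea described above.
public sealed class SketchTrieFilter
{
    private readonly SketchTrieNode _root = new();

    public SketchTrieFilter(IEnumerable<string> bannedWords)
    {
        // Insert every banned word into the trie, one character per level.
        foreach (var word in bannedWords)
        {
            var node = _root;
            foreach (var c in word)
            {
                if (!node.Children.TryGetValue(c, out var next))
                {
                    next = new SketchTrieNode();
                    node.Children[c] = next;
                }
                node = next;
            }
            node.IsEndOfWord = true;
        }
    }

    public string FilterContent(string content)
    {
        // Start from a copy of the input; only matched ranges are overwritten.
        var result = content.ToCharArray();
        var slowIndex = 0; // start of the candidate word

        while (slowIndex < content.Length)
        {
            var node = _root;
            var fastIndex = slowIndex; // scans ahead through the trie
            var matchEnd = -1;         // exclusive end of the longest match so far

            while (fastIndex < content.Length &&
                   node.Children.TryGetValue(content[fastIndex], out var next))
            {
                node = next;
                fastIndex++;
                if (node.IsEndOfWord)
                {
                    matchEnd = fastIndex;
                }
            }

            if (matchEnd > slowIndex)
            {
                // A complete word was found: mask it and resume after it.
                Array.Fill(result, '*', slowIndex, matchEnd - slowIndex);
                slowIndex = matchEnd;
            }
            else
            {
                // No word starts here: keep the character and move on.
                slowIndex++;
            }
        }

        return new string(result);
    }
}
```

With the words "fuck", "shit" and "ass" loaded, calling FilterContent("Go fuck yourself and eat some shit!") on this sketch returns "Go **** yourself and eat some ****!". Note that the sketch matches substrings, so a word such as "class" would come back as "cl***"; whether the package behaves the same way is not stated here.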

Trie Tree

A Trie, also known as a prefix tree, is a tree-like data structure that is used to store a collection of strings. Each node of the Trie represents a character of a string and the root of the Trie represents an empty string or the start of a string. The strings are stored in a way that all the descendants of a node have a common prefix of the string associated with that node. Here's a simple example of how a Trie might look when storing the words "car", "cat", and "dog":

root
├── c
│   └── a
│       ├── r
│       └── t
└── d
    └── o
        └── g

In this example, each path from the root to a node represents a string. For instance, the path from the root to the node 'r' represents the string "car".

Tries are particularly useful for operations that involve prefix matching, such as autocomplete features in text editors or web browsers, as they allow for efficient retrieval of all keys with a given prefix. They are also used in word filtering, as in the FilterContent() method described above.

However, Tries can be memory-intensive, as each node may need to store pointers to many children. There are variations of the Trie data structure, such as the compressed Trie (also known as a Radix tree or Patricia tree), which help to mitigate this issue by merging nodes with a single child.
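To make the prefix-matching use concrete, here is a small sketch of a trie that stores "car", "cat" and "dog" and returns every stored word with a given prefix. PrefixTrie, Insert and WordsWithPrefix are hypothetical names chosen for this illustration; they are not part of Edi.WordFilter.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical trie for illustration only.
public sealed class PrefixTrie
{
    private sealed class Node
    {
        // SortedDictionary keeps children in character order,
        // so results come back alphabetically.
        public SortedDictionary<char, Node> Children { get; } = new();
        public bool IsEndOfWord { get; set; }
    }

    private readonly Node _root = new();

    public void Insert(string word)
    {
        var node = _root;
        foreach (var c in word)
        {
            if (!node.Children.TryGetValue(c, out var next))
            {
                next = new Node();
                node.Children[c] = next;
            }
            node = next;
        }
        node.IsEndOfWord = true;
    }

    public List<string> WordsWithPrefix(string prefix)
    {
        var results = new List<string>();
        var node = _root;

        // Walk down to the node that represents the prefix, if it exists.
        foreach (var c in prefix)
        {
            if (!node.Children.TryGetValue(c, out var next))
            {
                return results; // no stored word has this prefix
            }
            node = next;
        }

        Collect(node, prefix, results);
        return results;
    }

    // Depth-first walk that gathers every complete word below the node.
    private static void Collect(Node node, string current, List<string> results)
    {
        if (node.IsEndOfWord)
        {
            results.Add(current);
        }
        foreach (var (c, child) in node.Children)
        {
            Collect(child, current + c, results);
        }
    }
}
```

After inserting "car", "cat" and "dog", WordsWithPrefix("ca") returns "car" and "cat", WordsWithPrefix("d") returns "dog", and WordsWithPrefix("x") returns an empty list. The lookup only visits nodes under the prefix, which is what makes tries a good fit for this kind of query.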

HashTableWordFilter (obsolete)

Each disharmony (banned) word is split into characters and stored in nested Dictionary objects: a Key holds one character, and its Value is the next Dictionary, whose Keys are the characters that may follow. When filtering content, the search begins at the first Dictionary; if every character of a word can be matched by following this chain, it is a disharmony word.

For example, given the disharmony words "FUCK,FS,ABC", the following structure is created:

Each blue box represents a Dictionary; each character of a disharmony word is stored as a Key that points to the next Dictionary.

For example, if the user inputs "FUCK FAKE", the flow is:

"F" is found in the first-level Dictionary (H0), and "U" is found in the Dictionary (H1) that the value of "F" refers to. In the same way, "C" and "K" are found in H2 and H3, so "FUCK" is a disharmony word.

For the word "FAKE", "F" is found in Dictionary (H0), but the Dictionary that "F" points to has no "A". The Dictionary that "A" points to has no "K", and "K" does not exist in the first-level Dictionary either, so "FAKE" is not a disharmony word.
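The nested-Dictionary layout can be sketched as below. This is a hedged illustration, not the obsolete class's real code: DictionaryLevel, Build and IsBannedWord are hypothetical names, and the sketch assumes that a word counts as complete when the walk ends on a level with no further keys, because the description above does not say how the end of a word is marked.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type: each level maps a character (Key) to the next level (Value).
public sealed class DictionaryLevel : Dictionary<char, DictionaryLevel>
{
}

public static class NestedDictionarySketch
{
    // Builds the first-level Dictionary (H0) from a comma-separated word list.
    public static DictionaryLevel Build(string commaSeparatedWords)
    {
        var h0 = new DictionaryLevel();
        var words = commaSeparatedWords.Split(',', StringSplitOptions.RemoveEmptyEntries);

        foreach (var word in words)
        {
            var level = h0;
            foreach (var c in word)
            {
                if (!level.TryGetValue(c, out var next))
                {
                    next = new DictionaryLevel();
                    level[c] = next;
                }
                level = next;
            }
        }

        return h0;
    }

    // Assumption for this sketch: a candidate is a banned word when every
    // character is found along the chain and the final level has no keys.
    public static bool IsBannedWord(DictionaryLevel h0, string candidate)
    {
        var level = h0;
        foreach (var c in candidate)
        {
            if (!level.TryGetValue(c, out var next))
            {
                return false;
            }
            level = next;
        }
        return level.Count == 0;
    }
}
```

With the structure built from "FUCK,FS,ABC", IsBannedWord returns true for "FUCK" and "FS" and false for "FAKE", because the level reached through "F" contains only "U" and "S". One limit of the leaf-based assumption is that a banned word that is also a prefix of a longer banned word would not be detected, which is one reason a trie with an explicit end-of-word flag is easier to reason about.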

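Disclaimer

This project (Edi.WordFilter) and its accompanying components are free, open-source products intended only for learning and exchange. They do not directly provide services to China; users in China should delete them immediately after downloading.

No organization or individual within China may use this project (Edi.WordFilter) or its accompanying components to build any form of website or service aimed at users within China.

It must not be used for any purpose that violates the laws and regulations of the People's Republic of China (including Taiwan Province) or of the region where the user is located.

The author (myself) has only developed the code and released it as open source (open source meaning that anyone can download and use it), and has never taken part in any user's operations or profit-making activities.

The author also does not know what the program's source code will later be used for, so any legal liability arising from its use is borne by the user alone.

"Open-source software has a vulnerability: is the author responsible? Yes!"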

Product	Compatible and additional computed target framework versions.
.NET	net6.0 is compatible. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed.

net6.0
- No dependencies.
net8.0
- No dependencies.

NuGet packages (1)

Showing the top 1 NuGet packages that depend on Edi.WordFilter:

Package	Downloads
MoongladePure.Comments Package Description	1.2K

GitHub repositories

This package is not used by any popular GitHub repositories.

Version	Downloads	Last updated
2.3.0	186	1/3/2024
2.2.0	112	12/17/2023
2.1.0	90	12/15/2023
2.0.1	87	12/14/2023
2.0.0	89	12/14/2023
1.8.0	122	10/11/2023
1.7.1	87	10/10/2023
1.7.0	2,336	11/9/2022
1.6.1	2,199	8/6/2022
1.6.0	1,624	10/21/2021
1.5.0	399	9/14/2021
1.4.0	3,305	12/18/2020
1.3.1	537	11/27/2020
1.3.0	548	11/11/2020
1.2.3	2,378	2/25/2020
1.2.2	644	12/5/2019
1.2.1	617	10/21/2019
1.2.0	1,097	4/23/2019
1.1.0	909	12/2/2018
1.0.0	692	11/7/2018

Total 18.1K

Current version 112

Per day average 9
