Lofcz.Forks.HtmlToOpenXml 3.2.3

.NET CLI:
    dotnet add package Lofcz.Forks.HtmlToOpenXml --version 3.2.3

Package Manager:
    NuGet\Install-Package Lofcz.Forks.HtmlToOpenXml -Version 3.2.3
This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.

PackageReference:
    <PackageReference Include="Lofcz.Forks.HtmlToOpenXml" Version="3.2.3" />
For projects that support PackageReference, copy this XML node into the project file to reference the package.

Paket CLI:
    paket add Lofcz.Forks.HtmlToOpenXml --version 3.2.3

Script & Interactive:
    #r "nuget: Lofcz.Forks.HtmlToOpenXml, 3.2.3"
The #r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.

Cake:
    // Install Lofcz.Forks.HtmlToOpenXml as a Cake Addin
    #addin nuget:?package=Lofcz.Forks.HtmlToOpenXml&version=3.2.3

    // Install Lofcz.Forks.HtmlToOpenXml as a Cake Tool
    #tool nuget:?package=Lofcz.Forks.HtmlToOpenXml&version=3.2.3

MIT License

What is HtmlToOpenXml?

HtmlToOpenXml is a small .NET library that converts simple or advanced HTML to plain OpenXml components. The project started in 2009, initially to convert users' comments into Word documents.

This library supports .NET Framework 4.6.2, .NET Standard 2.0 and .NET 8, which are all LTS.

Depends on DocumentFormat.OpenXml and AngleSharp.

Supported Html tags

Refer to the w3schools tag list for the meaning of each tag.

  • a
  • h1-h6
  • abbr and acronym
  • b, i, u, s, del, ins, em, strike, strong
  • br and hr
  • img, figcaption and svg
  • table, td, tr, th, tbody, thead, tfoot, caption and col
  • cite
  • div, span, time, font and p
  • pre
  • sub and sup
  • ul, ol and li
  • dd and dt
  • q, blockquote, dfn
  • article, aside, section are considered like div

JavaScript (script), CSS style, meta, comments, buttons and input controls are ignored. Other tags are treated like div.

In v1 and v2, JavaScript (script), CSS style, meta, comments and other unsupported tags do not generate an error; they are simply ignored.

Html Parser

In v3, HTML parsing relies on the AngleSharp package, which follows the W3C specifications and actively supports HTML5.

In v1 and v2, HTML parsing was done with a custom Regex-based enumerator. It was more flexible, but the resulting code was complex and hard to maintain.

How to implement or debug features

My reference bibles cover both OpenXml and HTML:

Open MS Word or Apple Pages and design your expected output. Save it as a DOCX file, then rename it with a .zip extension. Extract the content and inspect these files: document.xml, numbering.xml (for lists) and styles.xml.
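
The inspection step above can also be scripted. The following is a minimal sketch in Python (not part of this library), assuming the three files sit under the `word/` folder of the package, which is where Word normally writes them. The helper name `read_docx_parts` is hypothetical. Because a DOCX is already a ZIP archive, the script reads it directly and no rename is needed.

```python
import zipfile

# Parts named in the text above. Assumption: they are stored under "word/",
# the usual location in a WordprocessingML package.
PARTS_OF_INTEREST = (
    "word/document.xml",
    "word/numbering.xml",
    "word/styles.xml",
)


def read_docx_parts(docx, names=PARTS_OF_INTEREST):
    """Return {part name: XML text} for each requested part found in the DOCX.

    `docx` may be a file path or a binary file-like object.
    Parts that are not present in the archive are skipped.
    """
    parts = {}
    with zipfile.ZipFile(docx) as package:
        available = set(package.namelist())
        for name in names:
            if name in available:  # e.g. numbering.xml may be absent
                parts[name] = package.read(name).decode("utf-8")
    return parts
```

For example, `read_docx_parts("expected.docx")["word/document.xml"]` would return the body markup to compare against what the converter generates; the file name here is only an illustration.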

Acknowledgements

Thank you to all the contributors who shared their bug fixes (in no particular order): scwebgroup, ddforge, daviderapicavoli, worstenbrood, jodybullen, BenBurns, OleK, scarhand, imagremlin, antgraf, mdeclercq, pauldbentley, xjpmauricio, jairoXXX, giorand, bostjanKlemenc, AaronLS, taishmanov. And thanks to David Podhola for the NuGet package.

Logo provided with the permission of Enhanced Labs Design Studio.

Support

This project is open source and I do my best to support it in my spare time. I'm always happy to receive pull requests and grateful for the time you have taken. Please target the dev branch only. If you have questions, don't hesitate to get in touch with me!

Product Compatible and additional computed target framework versions.
.NET net5.0 was computed.  net5.0-windows was computed.  net6.0 was computed.  net6.0-android was computed.  net6.0-ios was computed.  net6.0-maccatalyst was computed.  net6.0-macos was computed.  net6.0-tvos was computed.  net6.0-windows was computed.  net7.0 was computed.  net7.0-android was computed.  net7.0-ios was computed.  net7.0-maccatalyst was computed.  net7.0-macos was computed.  net7.0-tvos was computed.  net7.0-windows was computed.  net8.0 is compatible.  net8.0-android was computed.  net8.0-browser was computed.  net8.0-ios was computed.  net8.0-maccatalyst was computed.  net8.0-macos was computed.  net8.0-tvos was computed.  net8.0-windows was computed. 
.NET Core netcoreapp2.0 was computed.  netcoreapp2.1 was computed.  netcoreapp2.2 was computed.  netcoreapp3.0 was computed.  netcoreapp3.1 was computed. 
.NET Standard netstandard2.0 is compatible.  netstandard2.1 was computed. 
.NET Framework net461 was computed.  net462 is compatible.  net463 was computed.  net47 was computed.  net471 was computed.  net472 was computed.  net48 was computed.  net481 was computed. 
MonoAndroid monoandroid was computed. 
MonoMac monomac was computed. 
MonoTouch monotouch was computed. 
Tizen tizen40 was computed.  tizen60 was computed. 
Xamarin.iOS xamarinios was computed. 
Xamarin.Mac xamarinmac was computed. 
Xamarin.TVOS xamarintvos was computed. 
Xamarin.WatchOS xamarinwatchos was computed. 

NuGet packages (1)

Showing the top 1 NuGet packages that depend on Lofcz.Forks.HtmlToOpenXml:

Html2DocxCore

Version   Downloads   Last updated
3.2.3     423         11/13/2024
3.2.2     132         11/7/2024

# Changelog
## 3.2.2
- Supports a feature to disable heading numbering #175
- Support center image with margin auto #171
- Support deprecated align attribute for block #171
- Fix parsing of style attribute with a key with no value
- Improve parsing of style attribute to avoid an extra call to HtmlDecode
- Extend support of nested list for non-W3C compliant html #173
## 3.2.1
- Fix indentation of numbering list #166
- Bordered container must render its content with one bordered frame #168
- Fix serialisation of the "Harvard" style for lower-roman list
- Fix ParseHeader/Footer where input with multiple paragraphs output only the last one
- Ensure the default style is applied to paragraphs, so that a paragraph between 2 lists is not mis-guessed
## 3.2.0
- Add new public API to allow parsing into Header and Footer #162. Some API methods have been flagged as obsolete with a clear message of what to use instead.
This is not a breaking change as it keeps the existing behaviour.
- Add support for `SVG` format (either from img src or the SVG node tag)
- Automatically create the `_top` bookmark if needed
- Fix a crash when a hyperlink contains both `img` and `figcaption`
- Fix a crash when `li` is empty #161
## 3.1.1
- Fix respecting layout with `div`/`p` ending with line break #158
- Prevent crash when header/footer is incomplete and parsing image #159
- Fix combining 2 runs separated by a break, 2nd line should not be prefixed by a space
## 3.1.0
- Fix table cell borders being wrongly applied on the run #156
- Correctly handle RTL layout for text, list, table and document scope #86 #66
- Support property line-height #52
- Fallback to `background` style attribute as many users use this simplified attribute version
- In `HtmlDomExpression.CreateFromHtmlNode`, use the correct casting to `IElement` rather than `IHtmlElement`, to prevent crash if `svg` node is encountered
## 3.0.1
- Ensure to count existing images from header and footer too #113
- Preserve line breaks in `pre` for OSX/Windows
- Prevent a crash when the provided style is missing its type
- Defensive code to avoid 2 rowSpan+colSpan with a cell in between to crash #59
## 3.0.0
- AngleSharp is now the backend parser for Html
- Refactoring to use the Interpreter/Composite design pattern, which eases code maintenance
- Lots of new unit test cases (190+)
- Rewriting of `list` (correct handling of nested style, restarting numbers and consecutive)
- Rewriting of `table` (row span, col span, col tags driving styles)
- Parallel download of images at early stage of the parsing.
## 2.4.2
- Fix signing the assembly
- Enable Nullable reference types
- support latest version of OpenXML SDK (3.1.0), which introduces breaking changes but also supports embedding SVG and JPEG2000 files.
- fix caching the provisioned images
- drop support for .Net Standard 1.3
## 2.4.0 and 2.4.1
Do not use: signing of the assembly failed #138
## 2.3.0
- better table border style
- keep processing html even if downloading image generates an error
- support for styling OL, UL and LI elements
## 2.2.0
- support latest version of OpenXML SDK (2.12.0) which introduces an API to add an OpenXmlElement to the correct XSD order
- restore support for .NET 4.6+, Net Standard 1.3+
- use cleaner name for base-64 images description
## 2.1.0
- support latest version of OpenXML SDK (2.11.0+), which fixes a fatal issue
- drop support for .NET 4.0, .Net Standard 1.4
## 2.0.3
- optimize number of nested list numbering (thanks to BenGraf)
- fix an issue where some styles weren't being applied
- fix reading JPEG images with SOF2 progressive DCT encoding
## 2.0.2
- fix nested list numbering
## 2.0.1
- fix manual provisioning of images
- img respect both border attribute and border style attribute
## 2.0.0
This brings .Net Core support:
- better inline styling
- numbering list with nested list is more stable
- allow parsing unit with decimals
- color can be either rgb(a), hsl(a), hex or named color.
- parser is more stable
## Pre 1.6.0
- imported from codeplex.com