GroupDocs.Parser 22.6.0

Targets: .NET Standard 2.0, .NET Framework 2.0

Package Manager
Install-Package GroupDocs.Parser -Version 22.6.0

.NET CLI
dotnet add package GroupDocs.Parser --version 22.6.0

PackageReference
<PackageReference Include="GroupDocs.Parser" Version="22.6.0" />
For projects that support PackageReference, copy this XML node into the project file to reference the package.

Paket CLI
paket add GroupDocs.Parser --version 22.6.0
The NuGet Team does not provide support for this client. Please contact its maintainers for support.

Script & Interactive
#r "nuget: GroupDocs.Parser, 22.6.0"
The #r directive can be used in F# Interactive, C# scripting and .NET Interactive. Copy this into the interactive tool or the script's source code to reference the package.

Cake
// Install GroupDocs.Parser as a Cake Addin
#addin nuget:?package=GroupDocs.Parser&version=22.6.0

// Install GroupDocs.Parser as a Cake Tool
#tool nuget:?package=GroupDocs.Parser&version=22.6.0
The NuGet Team does not provide support for this client. Please contact its maintainers for support.

Document Parser .NET API


Product Page | Docs | Demos | API Reference | Examples | Blog | Search | Free Support | Temporary License

GroupDocs.Parser for .NET is an on-premise API for searching documents and extracting both formatted and raw text from a wide range of supported file formats.

Document Parser Processing Features

  • Parse documents by user-defined templates.
  • Extract plain and structured text.
  • Extract text areas with coordinates, text styles, and other information.
  • Search text by keyword or regular expression, and extract the text surrounding each match.
  • Extract HTML or Markdown (MD) formatted text for a fast preview.
  • Increase performance by extracting raw text.
  • Extract formatted text, metadata, images, containers, and attachments.
  • Extract table of contents for some supported document formats.
  • Parse form data from PDF documents.
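As an illustration of the keyword search feature, here is a minimal sketch modeled on the GroupDocs.Parser documentation. It assumes the `Parser.Search(string)` overload, which returns `null` when the format does not support search, and the `SearchResult.Position` and `SearchResult.Text` properties. The file name `sample.docx`, the keyword `invoice`, and the helper `FindMatches` are hypothetical placeholders.

```csharp
using System;
using System.Collections.Generic;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;

static class SearchExample
{
    // Hypothetical helper: returns "position: text" strings for each keyword hit,
    // or null if the document format doesn't support text search.
    public static List<string> FindMatches(string path, string keyword)
    {
        using (Parser parser = new Parser(path))
        {
            // Search returns null when search isn't supported for the format
            IEnumerable<SearchResult> results = parser.Search(keyword);
            if (results == null)
            {
                return null;
            }

            var matches = new List<string>();
            // Enumerate inside the using block, while the parser is still open
            foreach (SearchResult result in results)
            {
                // Position is the character offset of the match; Text is the matched text
                matches.Add($"{result.Position}: {result.Text}");
            }
            return matches;
        }
    }

    static void Main()
    {
        // "sample.docx" and "invoice" are placeholders for your own file and keyword
        List<string> matches = FindMatches("sample.docx", "invoice");
        if (matches == null)
        {
            Console.WriteLine("Text search isn't supported");
            return;
        }
        matches.ForEach(Console.WriteLine);
    }
}
```

Results are copied into a list before the `Parser` is disposed, so callers never enumerate over a closed document.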

Parse Document by Template

Word Processing: DOC, DOT, DOCX, DOCM, DOTX, DOTM, ODT, OTT, RTF, TXT
Spreadsheet: XLS, XLT, XLSX, XLSM, XLSB, XLTX, XLTM, ODS, OTS, XLA, XLAM, NUMBERS
Presentation: PPS, POT, PPTX, PPTM, POTX, POTM, PPSX, PPSM, ODP, OTP
Portable: PDF

Extract Text (Accurate)

Word Processing: DOC, DOT, DOCX, DOCM, DOTX, DOTM, ODT, OTT, RTF
Spreadsheet: XLS, XLT, XLSX, XLSM, XLSB, XLTX, XLTM, ODS, OTS, CSV, XLA, XLAM, NUMBERS
Presentation: PPS, POT, PPTX, PPTM, POTX, POTM, PPSX, PPSM, ODP, OTP
Email: EML, EMLX, MSG
Markup: XHTML, MHTML, MD, XML
eBook: CHM, EPUB, FB2
Portable: PDF
OneNote: ONE
Databases: supported via ADO.NET; to work with a particular database, install its ADO.NET provider.
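The following is a minimal sketch of plain-text extraction in the default (accurate) mode, following the pattern in the GroupDocs.Parser documentation. It assumes that `Parser.GetText()` returns a `TextReader`, or `null` when the format does not support text extraction. The path `sample.docx` and the helper `ExtractText` are hypothetical placeholders.

```csharp
using System;
using System.IO;
using GroupDocs.Parser;

static class AccurateTextExample
{
    // Hypothetical helper: returns the document text, or null if the
    // format doesn't support text extraction.
    public static string ExtractText(string path)
    {
        using (Parser parser = new Parser(path))
        // GetText() returns null when text extraction isn't supported;
        // a using statement with a null resource is a no-op on dispose
        using (TextReader reader = parser.GetText())
        {
            return reader?.ReadToEnd();
        }
    }

    static void Main()
    {
        // "sample.docx" is a placeholder path
        string text = ExtractText("sample.docx");
        Console.WriteLine(text ?? "Text extraction isn't supported");
    }
}
```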

Extract Text (Raw)

Spreadsheet: XLS, XLT, XLSX, XLSM, XLTX, XLTM, XLA, XLAM
Presentation: PPT, PPS, POT, PPTX, PPTM, POTX, POTM, PPSX, PPSM
Portable: PDF
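For the formats above, raw mode can be requested when speed matters more than layout fidelity (the feature list notes that raw text extraction increases performance). This sketch assumes the `TextOptions(bool useRawModeIfPossible)` constructor described in the GroupDocs.Parser documentation. The path `sample.pdf` and the helper `ExtractRawText` are hypothetical placeholders. Raw output may differ from the accurate-mode result.

```csharp
using System;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Options;

static class RawTextExample
{
    // Hypothetical helper: extracts text, asking for raw mode where the format allows it.
    // Returns null if text extraction isn't supported.
    public static string ExtractRawText(string path)
    {
        using (Parser parser = new Parser(path))
        // TextOptions(true) requests raw mode if possible
        using (TextReader reader = parser.GetText(new TextOptions(true)))
        {
            return reader?.ReadToEnd();
        }
    }

    static void Main()
    {
        // "sample.pdf" is a placeholder path
        string text = ExtractRawText("sample.pdf");
        Console.WriteLine(text ?? "Text extraction isn't supported");
    }
}
```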

Extract Structured Text and Formatted Text

Word Processing: DOC, DOT, DOCX, DOCM, DOTX, DOTM, ODT, OTT, RTF
Spreadsheet: XLS, XLT, XLSX, XLSM, XLTX, XLTM, XLA, XLAM
Presentation: PPT, PPS, POT, PPTX, PPTM, POTX, POTM, PPSX, PPSM, ODP, OTP
Email: EML, EMLX, MSG
Markup: MD (formatted text extraction is not supported)
eBook: CHM, EPUB, FB2
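A minimal sketch of formatted-text extraction as HTML might look like the following. It assumes `Parser.GetFormattedText(FormattedTextOptions)` and the `FormattedTextMode.Html` value as described in the GroupDocs.Parser documentation, with a `null` result when the format does not support formatted text. The path `sample.docx` and the helper `ExtractHtml` are hypothetical placeholders.

```csharp
using System;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Options;

static class FormattedTextExample
{
    // Hypothetical helper: returns the document text formatted as HTML,
    // or null if formatted text extraction isn't supported.
    public static string ExtractHtml(string path)
    {
        using (Parser parser = new Parser(path))
        // Request HTML output; FormattedTextMode also offers other modes such as Markdown
        using (TextReader reader = parser.GetFormattedText(new FormattedTextOptions(FormattedTextMode.Html)))
        {
            return reader?.ReadToEnd();
        }
    }

    static void Main()
    {
        // "sample.docx" is a placeholder path
        string html = ExtractHtml("sample.docx");
        Console.WriteLine(html ?? "Formatted text extraction isn't supported");
    }
}
```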

See the Supported Document Formats page for the complete list.

Platform Independence

GroupDocs.Parser for .NET does not require any external software or third-party tools. It supports any 32-bit or 64-bit operating system on which .NET or the Mono framework is installed, including:

Microsoft Windows: Microsoft Windows Desktop (x86, x64) (XP & up), Microsoft Windows Server (x86, x64) (2000 & up), Windows Azure
Mac OS: Mac OS X
Linux: Linux (Ubuntu, OpenSUSE, CentOS and others)
Development Environments: Microsoft Visual Studio (2010 & up), Xamarin.Android, Xamarin.iOS, Xamarin.Mac, MonoDevelop 2.4 and later.
Supported Frameworks: GroupDocs.Parser for .NET supports .NET and Mono frameworks.

Get Started

Ready to try GroupDocs.Parser for .NET? Run Install-Package GroupDocs.Parser from the Package Manager Console in Visual Studio to fetch and reference the GroupDocs.Parser assembly in your project. If you already have GroupDocs.Parser for .NET and want to upgrade it, run Update-Package GroupDocs.Parser to get the latest version.

Please check the GitHub Repository for other common usage scenarios.

Extract all Images and Save them in PNG Format via C# Code

using System;
using System.Collections.Generic;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;
using GroupDocs.Parser.Options;

// create an instance of Parser class
// (Constants.SampleZip is a sample-file path defined in the GitHub examples project)
using (Parser parser = new Parser(Constants.SampleZip)) {
    // extract images from the document
    IEnumerable<PageImageArea> images = parser.GetImages();
    // GetImages returns null if image extraction isn't supported for this format
    if (images == null) {
        Console.WriteLine("Image extraction isn't supported");
        return;
    }
    // create the options to save images in PNG format
    ImageOptions options = new ImageOptions(ImageFormat.Png);
    int imageNumber = 0;
    // iterate over images
    foreach (PageImageArea image in images) {
        // save each image to a numbered PNG file (0.png, 1.png, ...)
        image.Save(imageNumber.ToString() + ".png", options);
        imageNumber++;
    }
}


Product Versions
.NET net5.0 net5.0-windows net6.0 net6.0-android net6.0-ios net6.0-maccatalyst net6.0-macos net6.0-tvos net6.0-windows
.NET Core netcoreapp2.0 netcoreapp2.1 netcoreapp2.2 netcoreapp3.0 netcoreapp3.1
.NET Standard netstandard2.0 netstandard2.1
.NET Framework net20 net35 net40 net403 net45 net451 net452 net46 net461 net462 net463 net47 net471 net472 net48
MonoAndroid monoandroid
MonoMac monomac
MonoTouch monotouch
Tizen tizen40 tizen60
Xamarin.iOS xamarinios
Xamarin.Mac xamarinmac
Xamarin.TVOS xamarintvos
Xamarin.WatchOS xamarinwatchos
Learn more about Target Frameworks and .NET Standard.

NuGet packages (1)

Showing the top 1 NuGet packages that depend on GroupDocs.Parser:

Package Downloads
Conholdate.Total

Conholdate.Total for .NET is a complete package to work with a large number of file formats from Microsoft Word, Excel, PowerPoint, Outlook, Project, Visio, Adobe Acrobat, Illustrator, Photoshop, AutoCAD, OpenOffice and many more. Conholdate.Total for .NET allows you to use any API released under Aspose and GroupDocs for .NET in order to create, convert, read, edit, update and print popular document formats. Moreover, you may view, annotate, watermark, assemble, classify, search, redact, parse, merge and compare documents without needing to install the native applications. It helps you in file format manipulation and document automation via simple API. Conholdate.Total for .NET also includes specialized APIs to read and create barcodes, extract text from images using OCR as well as extract human marked data from questionnaires, surveys, quizzes, MCQ papers and feedback forms.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version Downloads Last updated
22.6.0 2,669 6/7/2022
22.2.0 14,307 2/25/2022
21.5.0 21,926 5/31/2021
21.2.0 4,737 2/22/2021
20.12.0 4,281 12/30/2020
20.10.0 21,726 10/27/2020
20.8.0 2,798 8/19/2020
20.6.1 2,534 6/30/2020
20.6.0 1,205 6/19/2020
20.5.0 2,136 5/8/2020
20.3.0 2,493 3/19/2020
20.1.0 1,888 1/31/2020
19.12.0 1,269 12/27/2019
19.11.0 829 11/22/2019
19.9.0 510 9/27/2019
19.5.0 684 5/29/2019
18.12.0 843 12/11/2018
18.11.0 620 11/8/2018
18.10.0 681 10/10/2018
18.9.0 622 9/5/2018
18.8.0 727 8/7/2018
18.7.0 782 7/3/2018
18.5.0 869 5/23/2018