Scraping framework containing:
- a web client able to simulate a web browser.
- an HtmlAgilityPack extension to select elements using CSS selectors (like jQuery).
Turn unstructured HTML pages into structured data. The OpenScraping library can extract information from HTML pages using a JSON config file with XPath rules. It can even scrape multi-level complex objects such as tables and forum posts.
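The idea of driving extraction from a JSON config of XPath rules can be sketched as follows. This is a minimal illustration in Python using only the standard library, not OpenScraping's actual API or config schema: the `_xpath` key, the `extract` and `text_of` helpers, and the sample page are all assumptions made for the example, and `xml.etree.ElementTree` only supports a limited XPath subset on well-formed markup.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical config: plain strings are XPath rules for single values,
# nested objects describe repeated items located by "_xpath" (assumed key).
CONFIG = json.loads("""
{
  "title": ".//h1",
  "rows": {
    "_xpath": ".//table[@id='prices']/tr",
    "name": "./td[1]",
    "price": "./td[2]"
  }
}
""")

# Sample page (well-formed, so the stdlib XML parser can read it).
PAGE = """
<html><body>
  <h1>Price list</h1>
  <table id="prices">
    <tr><td>Apple</td><td>1.20</td></tr>
    <tr><td>Pear</td><td>0.90</td></tr>
  </table>
</body></html>
"""

def text_of(node, xpath):
    """Return the stripped text of the first match, or None if nothing matches."""
    found = node.find(xpath)
    return "".join(found.itertext()).strip() if found is not None else None

def extract(node, rules):
    """Apply the rules to a node; nested rule objects produce lists of records."""
    result = {}
    for field, rule in rules.items():
        if isinstance(rule, dict):
            children = node.findall(rule["_xpath"])
            sub_rules = {k: v for k, v in rule.items() if k != "_xpath"}
            result[field] = [extract(child, sub_rules) for child in children]
        else:
            result[field] = text_of(node, rule)
    return result

data = extract(ET.fromstring(PAGE), CONFIG)
print(json.dumps(data, indent=2))
```

Keeping the rules in data rather than code means a new page layout only needs a new config, which is the appeal of the approach described above.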
A .NET Standard library to extract the main content of a web page, based on a port of the Readability library by Mozilla. It also determines and gathers metadata about the content, such as language, author, main image, etc.
SgmlReader for Portable Library.
SgmlReader is an SGML markup language parser derived from System.Xml.XmlReader in the .NET CLR.
Its most popular use, however, is as an HTML parser (that is, as a scraper).
/* Use SgmlReader in Html parse mode. */
XDocument document = SgmlReader.Parse(stream);
Done!
dcsoup is a .NET library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jQuery-like methods.
This library is basically a port of jsoup, a Java HTML parser library. See also: http://jsoup.org/
API reference is...
Search results via SERP API. Hash, JSON, and HTML formats are supported for Google, Bing, Baidu, Yandex, eBay, Google Product, YouTube, Walmart, and more...
libvideo (aka VideoLibrary) is a modern .NET library for downloading YouTube videos. It is portable to most platforms and is very lightweight.
Find us on GitHub at https://github.com/omansak/libvideo
This client library enables working with robots.txt.
Key features:
- Parse robots.txt into a typed object.
- Look up Allow/Disallow/Crawl-delay rules based on User-Agent.
- Traverse the sitemaps listed in robots.txt for URLs.
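The three features above map onto standard robots.txt handling. As a rough sketch of the same workflow (not this library's API), Python's standard `urllib.robotparser` module can parse the file, answer allow/disallow and crawl-delay questions per user agent, and list sitemap URLs. The sample file contents, the `MyBot` user agent, and the `load_rules` helper are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt content (illustrative, not from a real site).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5

Sitemap: https://example.com/sitemap.xml
"""

def load_rules(text):
    """Parse robots.txt text into a RobotFileParser object."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

rules = load_rules(ROBOTS_TXT)

# Allowed/disallowed lookup for a given user agent and URL.
allowed = rules.can_fetch("MyBot", "https://example.com/index.html")
blocked = rules.can_fetch("MyBot", "https://example.com/private/data.html")

# Crawl delay (seconds) that applies to this user agent, or None if unset.
delay = rules.crawl_delay("MyBot")

# Sitemap URLs declared in the file (requires Python 3.8+), or None if absent.
sitemaps = rules.site_maps()

print(allowed, blocked, delay, sitemaps)
```

A crawler would typically check `can_fetch` before each request and sleep for `delay` seconds between requests to the same host.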
For more info see:...
ExcavatorSharp is a multi-threaded server for scraping web data. It converts HTML code into a structured array of data. The library allows data scraping from multiple sites in parallel mode, within a single running application. Create scraping tasks and perform data extraction on a schedule.
The...