Scraping Framework containing :
- a web client able to simulate a web browser.
- an HtmlAgilityPack extension to select elements using css selector (like JQuery)
Turn unstructured HTML pages into structured data. The OpenScraping library can extract information from HTML pages using a JSON config file with xPath rules. It can scrape even multi-level complex objects such as tables and forum posts.
dcsoup is a .NET library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jquery-like methods.
This library is basically a port of jsoup, a Java HTML parser library. see also: http://jsoup.org/
API reference is...
More information
libvideo is a fast, clean way to download YouTube videos. It is fully portable and has no dependencies.
Find us on GitHub at https://github.com/omansak/libvideo
Iron WebScraper is a C# web scraping library, allowing developers to simulate & automate human browsing behavior to extract content, files & images from web applications as native .Net objects. Iron Web Scraper manages politeness & multithreading in the background, leaving a developer’s own...
More information
SgmlReader for Portable Library.
SgmlReader is "SGML" markup language parser, and derived from System.Xml.XmlReader in .NET CLR.
But, most popular usage the "HTML" parser. (It's scraper!!)
/* Use SgmlReader in Html parse mode. */
XDocument document = SgmlReader.Parse(stream);
Done!
ExcavatorSharp is a multi-threaded server for scraping web data. It converts HTML code into a structured array of data. The library allows data scraping from multiple sites in parallel mode, within a single running application. Create scraping tasks and perform data extraction on a schedule.
The...
More information