CrawlSharp 1.0.0

.NET CLI:
dotnet add package CrawlSharp --version 1.0.0

Package Manager Console (Visual Studio):
NuGet\Install-Package CrawlSharp -Version 1.0.0

PackageReference (for projects that support it, copy this XML node into the project file):
<PackageReference Include="CrawlSharp" Version="1.0.0" />

Paket CLI:
paket add CrawlSharp --version 1.0.0

F# Interactive and Polyglot Notebooks (#r directive):
#r "nuget: CrawlSharp, 1.0.0"

Cake:
// Install CrawlSharp as a Cake Addin
#addin nuget:?package=CrawlSharp&version=1.0.0

// Install CrawlSharp as a Cake Tool
#tool nuget:?package=CrawlSharp&version=1.0.0


CrawlSharp


CrawlSharp is a library and integrated webserver for crawling basic web content.

New in v1.0.x

  • Initial release

Bugs, Feedback, or Enhancement Requests

Please feel free to start an issue or a discussion!

Simple Example, Embedded

Embedding CrawlSharp into your application is simple and requires minimal configuration. Refer to the Test project for a full example.

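using System;
using CrawlSharp;

// Configure the crawl; StartUrl is the page where crawling begins.
Settings settings = new Settings();
settings.Crawl.StartUrl = "http://www.mywebpage.com";

WebCrawler crawler = new WebCrawler(settings);

// Crawl() returns an asynchronous stream of WebResource objects.
await foreach (WebResource resource in crawler.Crawl())
    Console.WriteLine(resource.Status + ": " + resource.Url);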

Web Resources

Each WebResource returned by the crawler has the following properties:

  • Url - the URL from which the resource was retrieved
  • ParentUrl - the URL from which the Url was identified
  • Depth - the depth level at which the Url was identified
  • Status - the HTTP status code returned when retrieving the Url
  • ContentLength - the content length of the body returned when retrieving the Url
  • ContentType - the content type returned when retrieving the Url
  • Headers - a NameValueCollection with the headers returned when retrieving the Url
  • Data - a byte[] containing the data returned when retrieving the Url
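
The properties above can be read directly inside the crawl loop. The following is a minimal sketch rather than code from the CrawlSharp repository: it reuses the Settings, WebCrawler, and WebResource types from the embedded example and touches only the properties listed above. Treating HTML bodies as UTF-8 is an assumption made for illustration, and the content type is formatted through string interpolation so the sketch does not depend on that property's exact type.

```csharp
using System;
using System.Text;
using CrawlSharp;

Settings settings = new Settings();
settings.Crawl.StartUrl = "http://www.mywebpage.com";

WebCrawler crawler = new WebCrawler(settings);

await foreach (WebResource resource in crawler.Crawl())
{
    // Summary lines built only from the properties documented above.
    Console.WriteLine($"[depth {resource.Depth}] {resource.Status} {resource.Url} (from {resource.ParentUrl})");
    Console.WriteLine($"  {resource.ContentType}, {resource.ContentLength} bytes");

    // Headers is a NameValueCollection, so its keys are enumerated via AllKeys.
    if (resource.Headers != null)
    {
        foreach (var key in resource.Headers.AllKeys)
            Console.WriteLine($"  {key}: {resource.Headers[key]}");
    }

    // Assumption: HTML bodies are UTF-8 encoded; real code should check the charset.
    bool isHtml = $"{resource.ContentType}".StartsWith("text/html", StringComparison.OrdinalIgnoreCase);
    if (isHtml && resource.Data != null)
    {
        string html = Encoding.UTF8.GetString(resource.Data);
        Console.WriteLine($"  starts with: {html.Substring(0, Math.Min(80, html.Length))}");
    }
}
```

Checking Headers and Data for null before use is a defensive choice here; whether the library can return null for either is not stated in this README.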

REST API

CrawlSharp includes a project called CrawlSharp.Server which allows you to deploy a RESTful front-end for CrawlSharp. Refer to REST_API.md and also the Postman collection in the root of this repository for details.

CrawlSharp.Server will by default listen on host localhost and port 8000, meaning it will not accept requests from outside of the machine.

To change this, pass the hostname as the first argument and the port as the second, e.g. dotnet CrawlSharp.Server myhostname.com 8888.

$ dotnet CrawlSharp.Server 

                          _     _  _
   ___ _ __ __ ___      _| |  _| || |_
  / __| '__/ _` \ \ /\ / / | |_  ..  _|
 | (__| | | (_| |\ V  V /| | |_      _|
  \___|_|  \__,_| \_/\_/ |_|   |_||_|

(c)2025 Joel Christner


Usage:
  crawlsharp [hostname] [port]

Where:
  [hostname] is the hostname or IP address on which to listen
  [port] is the port number, greater than or equal to zero, and less than 65536

NOTICE
------
Configured to listen on local address 'localhost'
Service will not receive requests from outside of localhost

Webserver started on http://localhost:8000/

2025-03-01 20:39:17 joel-laptop Info [CrawlSharpServer] server started

Refer to REST_API.md for more information about using the RESTful API.

Running in Docker

A Docker image is available on Docker Hub under jchristn/crawlsharp. To run within Docker Compose, use the start (compose-up.sh and compose-up.bat) and stop (compose-down.sh and compose-down.bat) scripts in the Docker directory.

Version History

Please refer to CHANGELOG.md for version history.

Compatible and additional computed target framework versions

.NET: net8.0 is compatible. The following were computed: net8.0-android, net8.0-browser, net8.0-ios, net8.0-maccatalyst, net8.0-macos, net8.0-tvos, net8.0-windows, net9.0, net9.0-android, net9.0-browser, net9.0-ios, net9.0-maccatalyst, net9.0-macos, net9.0-tvos, net9.0-windows.


Version  Downloads  Last updated
1.0.2    159        3/4/2025
1.0.1    145        3/3/2025
1.0.0    66         3/2/2025
