CrawlSharp 1.0.2
dotnet add package CrawlSharp --version 1.0.2
NuGet\Install-Package CrawlSharp -Version 1.0.2
<PackageReference Include="CrawlSharp" Version="1.0.2" />
paket add CrawlSharp --version 1.0.2
#r "nuget: CrawlSharp, 1.0.2"
// Install CrawlSharp as a Cake Addin
#addin nuget:?package=CrawlSharp&version=1.0.2
// Install CrawlSharp as a Cake Tool
#tool nuget:?package=CrawlSharp&version=1.0.2
<img src="https://raw.githubusercontent.com/jchristn/CrawlSharp/refs/heads/main/assets/icon.png" width="256" height="256">
CrawlSharp
CrawlSharp is a library and integrated webserver for crawling basic web content.
New in v1.0.x
- Initial release
Bugs, Feedback, or Enhancement Requests
Please feel free to start an issue or a discussion!
Simple Example, Embedded
Embedding CrawlSharp into your application is simple and requires minimal configuration. Refer to the Test
project for a full example.
```csharp
using System;
using CrawlSharp;

// Configure the crawl with a starting URL
Settings settings = new Settings();
settings.Crawl.StartUrl = "http://www.mywebpage.com";

// Enumerate crawled resources asynchronously
WebCrawler crawler = new WebCrawler(settings);
await foreach (WebResource resource in crawler.Crawl())
{
    Console.WriteLine(resource.Status + ": " + resource.Url);
}
```
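For a quick overview of a crawl, the resources can be tallied by a key such as the status code. The sketch below is a hypothetical helper (`CrawlSummary.CountByAsync` is not part of CrawlSharp); it assumes only that `crawler.Crawl()` yields an `IAsyncEnumerable<T>`, which the `await foreach` usage above suggests.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper, not part of CrawlSharp.
public static class CrawlSummary
{
    // Counts how many items of an async sequence map to each key.
    public static async Task<SortedDictionary<string, int>> CountByAsync<T>(
        IAsyncEnumerable<T> items,
        Func<T, string> keySelector)
    {
        var counts = new SortedDictionary<string, int>(StringComparer.Ordinal);

        await foreach (T item in items)
        {
            // Group items without a usable key under a placeholder
            string key = keySelector(item) ?? "(none)";
            counts[key] = counts.TryGetValue(key, out int n) ? n + 1 : 1;
        }

        return counts;
    }
}
```

With the `crawler` object from the example above, a call such as `await CrawlSummary.CountByAsync(crawler.Crawl(), r => $"{r.Status}")` would return the number of resources seen per status value.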
Web Resources
Objects crawled using CrawlSharp have the following properties:
- `Url` - the URL from which the resource was retrieved
- `ParentUrl` - the URL from which the `Url` was identified
- `Depth` - the depth level at which the `Url` was identified
- `Status` - the HTTP status code returned when retrieving the `Url`
- `ContentLength` - the content length of the body returned when retrieving `Url`
- `ContentType` - the content type returned while retrieving `Url`
- `Headers` - a `NameValueCollection` with the headers returned while retrieving `Url`
- `Data` - a `byte[]` containing the data returned while retrieving `Url`
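Because `Data` is a raw `byte[]`, textual content has to be decoded before it can be inspected. The following is a minimal sketch of one way to do that, assuming `ContentType` holds the raw `Content-Type` header value as a string (an assumption, not confirmed above). `BodyDecoder.TryDecodeText` is a hypothetical helper, not part of CrawlSharp, and the list of media types treated as text is illustrative.

```csharp
using System;
using System.Net.Mime;
using System.Text;

// Hypothetical helper, not part of CrawlSharp.
public static class BodyDecoder
{
    // Illustrative set of non-"text/*" media types that are still textual
    private static readonly string[] TextLikeTypes =
    {
        "application/json", "application/xml", "application/xhtml+xml"
    };

    // Decodes a body to text when the content type is textual.
    // Returns false for empty bodies, binary types, or malformed headers.
    public static bool TryDecodeText(byte[] data, string contentType, out string text)
    {
        text = string.Empty;
        if (data == null || data.Length == 0) return false;
        if (string.IsNullOrWhiteSpace(contentType)) return false;

        ContentType parsed;
        try
        {
            parsed = new ContentType(contentType);
        }
        catch (FormatException)
        {
            return false; // header value could not be parsed
        }

        bool isText =
            parsed.MediaType.StartsWith("text/", StringComparison.OrdinalIgnoreCase)
            || Array.Exists(TextLikeTypes,
                t => t.Equals(parsed.MediaType, StringComparison.OrdinalIgnoreCase));
        if (!isText) return false;

        // Fall back to UTF-8 when no charset is given or it is not supported
        Encoding encoding = Encoding.UTF8;
        if (!string.IsNullOrEmpty(parsed.CharSet))
        {
            try { encoding = Encoding.GetEncoding(parsed.CharSet); }
            catch (ArgumentException) { /* unknown charset, keep UTF-8 */ }
        }

        text = encoding.GetString(data);
        return true;
    }
}
```

Inside a crawl loop this could be called as `BodyDecoder.TryDecodeText(resource.Data, resource.ContentType, out string body)`, skipping resources for which it returns `false`.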
REST API
CrawlSharp includes a project called `CrawlSharp.Server` which allows you to deploy a RESTful front-end for CrawlSharp. Refer to `REST_API.md` and also the Postman collection in the root of this repository for details.
`CrawlSharp.Server` will by default listen on host `localhost` and port `8000`, meaning it will not accept requests from outside of the machine.
To change this, specify the hostname as the first argument and the port as the second, e.g. `dotnet CrawlSharp.Server myhostname.com 8888`.
```
$ dotnet CrawlSharp.Server

                         _     _  _
  ___ _ __ __ ___      _| |  _| || |_
 / __| '__/ _` \ \ /\ / / | |_  ..  _|
| (__| | | (_| |\ V  V /| | |_      _|
 \___|_|  \__,_| \_/\_/ |_|   |_||_|

(c)2025 Joel Christner

Usage:
  crawlsharp [hostname] [port]

Where:
  [hostname] is the hostname or IP address on which to listen
  [port] is the port number, greater than or equal to zero, and less than 65536

NOTICE
------
Configured to listen on local address 'localhost'
Service will not receive requests from outside of localhost

Webserver started on http://localhost:8000/
2025-03-01 20:39:17 joel-laptop Info [CrawlSharpServer] server started
```
Refer to `REST_API.md` for more information about using the RESTful API.
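The usage text above describes two optional positional arguments with documented defaults and a port range. As an illustration only, an application wrapping the server could validate such arguments along these lines. `ListenerArgs.Parse` is a hypothetical helper and is not the actual `CrawlSharp.Server` code; the defaults and range are taken from the text above.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not the actual CrawlSharp.Server implementation.
public static class ListenerArgs
{
    // Parses "[hostname] [port]", falling back to the documented defaults.
    public static (string Hostname, int Port) Parse(string[] args)
    {
        string hostname = "localhost"; // documented default host
        int port = 8000;               // documented default port

        if (args.Length >= 1 && !string.IsNullOrWhiteSpace(args[0]))
        {
            hostname = args[0];
        }

        if (args.Length >= 2)
        {
            // Digits only; the usage text requires 0 <= port < 65536
            bool ok = int.TryParse(
                args[1], NumberStyles.None, CultureInfo.InvariantCulture, out port);
            if (!ok || port < 0 || port > 65535)
            {
                throw new ArgumentException(
                    "Port must be an integer between 0 and 65535.", nameof(args));
            }
        }

        return (hostname, port);
    }
}
```

For example, `ListenerArgs.Parse(new[] { "myhostname.com", "8888" })` yields the host `myhostname.com` and port `8888`, while an empty argument array yields the defaults.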
Running in Docker
A Docker image is available in Docker Hub under `jchristn/crawlsharp`. Use the Docker Compose start (`compose-up.sh` and `compose-up.bat`) and stop (`compose-down.sh` and `compose-down.bat`) scripts in the `Docker` directory if you wish to run within Docker Compose.
Version History
Please refer to `CHANGELOG.md` for version history.
Product | Versions Compatible and additional computed target framework versions. |
---|---|
.NET | net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. |
- net8.0
  - HtmlAgilityPack (>= 1.11.74)
  - RestWrapper (>= 3.1.4)
  - SerializationHelper (>= 2.0.3)