Orobouros 1.1.3

The Orobouros Framework

Orobouros is a C# framework for scraping the web. Many web scrapers have been written in various languages, but Orobouros takes a different approach: its patented OrobourosModule™ system allows anyone to write their own plugin for any website.

Installation

Orobouros is available as a NuGet package and from the GitHub Actions page. Keep in mind that the pre-compiled builds on GitHub do not include dependencies. If you prefer the .NET CLI, you can simply run:

dotnet add package Orobouros

On its own, Orobouros does nothing; it needs modules to function. A list of publicly available modules is maintained on the GitHub repository.

Building

If you insist on compiling this yourself, all you need is the .NET 8 SDK. I would not recommend running the tests, as they require specific configurations I use in debugging.

Development

Take a look at the TestModule project included in this repo to get a general idea of how to use this framework. XML annotations are also provided. At some point I will create a wiki with relevant information, but for now the core functionality takes priority.

Obfuscated code is allowed (as in, the framework won't refuse to execute it) but strongly discouraged due to malware concerns. If you really feel the need to keep your source code hidden, just don't share your module.

Example code to submit a scrape request to the loaded module stack:

// Only call this once, at the entry point of your application.
ScrapingManager.InitializeModules();
// Content you want to request from the modules. How this is handled is entirely dependent on the module's developer.
List<ModuleContent> requestedInfo = new List<ModuleContent> { ModuleContent.Text };
// Perform the scrape request and wait for the returned data.
ModuleData? data = ScrapingManager.ScrapeURL("https://www.test.com/posts/posthere", requestedInfo);
// Stop background methods. This should be called at least once when the application is exiting.
ScrapingManager.FlushSupplementaryMethods();
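
Because ScrapeURL returns a nullable ModuleData, and FlushSupplementaryMethods should run when the application exits, one way to arrange the calls is a null check inside a try/finally. The following is only a sketch, not the framework's prescribed pattern. The types at the bottom are hypothetical stand-ins written so the file compiles on its own; they are not the real Orobouros classes, and the meaning given to a null result, as well as Content exposing a Count, are assumptions.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Initialize once at the entry point of the application.
ScrapingManager.InitializeModules();
try
{
    var requestedInfo = new List<ModuleContent> { ModuleContent.Text };
    ModuleData? data = ScrapingManager.ScrapeURL("https://www.test.com/posts/posthere", requestedInfo);

    if (data is null)
    {
        // Assumption: null means no loaded module produced data for this URL.
        Console.WriteLine("No data returned.");
    }
    else
    {
        // Assumption: Content is a list-like collection with a Count.
        Console.WriteLine($"Received {data.Content.Count} item(s).");
    }
}
finally
{
    // Runs even if the scrape throws, so background methods are always stopped.
    ScrapingManager.FlushSupplementaryMethods();
}

// ---- Hypothetical stand-ins, NOT the real Orobouros types. ----
// In a real project, delete everything below and reference the NuGet package instead.
enum ModuleContent { Text }

class ProcessedScrapeData { }

class ModuleData
{
    public List<ProcessedScrapeData> Content { get; } = new();
}

static class ScrapingManager
{
    public static void InitializeModules() { }

    // The stand-in always returns null; the real method asks the loaded modules.
    public static ModuleData? ScrapeURL(string url, List<ModuleContent> content) => null;

    public static void FlushSupplementaryMethods() { }
}
```

With the stand-ins in place, the program takes the null branch. Against the real package, the result depends on which modules are loaded.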

Example code to return a simple line of text from a module's scrape method:

ModuleData data = new ModuleData();
ProcessedScrapeData exampleInstance = new ProcessedScrapeData(ModuleContent.Text, parameters.URL, "Hello World!");
data.Content.Add(exampleInstance);
return data;

Please consult the XML documentation or the TestModule project for further code examples.

This repository takes no responsibility for any modules programmers develop for this framework. No copyrighted content is included in this repo, and none ever will be. If someone has made a module for your website and you don't like it, I cannot help you; you must contact the module's author to resolve such matters. This also applies to potentially illicit or illegal content scraped with modules created by the community.

TODO:

  • Dynamic module loading
  • Raw HTTP support
  • Downloader service
  • Attribute scanning
  • Custom attributes
  • Module init method
  • Module supplementary methods
  • Module scrape method
  • Module options
  • Module return data
  • Module GUIDs
  • Custom library support
  • Referenced library support
  • SQLite support
  • Dynamic database support
  • Website API support (separate from raw HTTP)
  • Cross-module support
  • XML annotations
  • Module security checks
  • Module sanity checks
  • Multiple modules for same website support
  • Improved module error handling
  • SQLite module integration
  • Public module downloader tool
  • Better data sanitizing & JSON storage
  • General framework configuration class
  • Cross-language support (extremely advanced)
  • Data language translation toolkit
  • Overhaul download class & integrate better
  • Logging overhaul
  • Module developer web toolkit
  • Framework-level exception handling
  • Bulk data downloading functions (stored in RAM)

Credits

  • Branden Stober - Main Project Lead
  • ImSoupp - Reflection Help & Database Help
  • CTAG - Database Help

Target frameworks

net8.0 is compatible. The following were computed: net8.0-android, net8.0-browser, net8.0-ios, net8.0-maccatalyst, net8.0-macos, net8.0-tvos, net8.0-windows.

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version Downloads Last updated
1.1.3 149 4/3/2024
1.1.2 123 3/25/2024
1.1.1 111 3/24/2024
1.1.0 121 3/11/2024
1.0.12 148 3/10/2024
1.0.11 124 3/10/2024
1.0.10 148 3/8/2024
1.0.9 125 3/6/2024
1.0.8 126 3/5/2024
1.0.7 121 3/5/2024
1.0.6 120 3/4/2024
1.0.5 118 3/4/2024
1.0.4 113 3/4/2024
1.0.3 119 3/4/2024
1.0.2 112 3/3/2024
1.0.1 119 3/3/2024
1.0.0 126 3/3/2024