Web Scraper

A simple and modern web application that allows users to extract HTML content from any public webpage. Built with React, this tool provides a clean interface for scraping web pages and downloading the results as text files.

Overview

Web Scraper is a lightweight React application that fetches the HTML of any publicly accessible URL and offers it as a downloadable text file. The interface is responsive, includes a dark/light mode toggle, and reports clear error messages when a request fails.

Tech Stack

The application is built using the following technologies:

Features

Core Functionality

User Experience

Screenshots

Light Mode

[Screenshot: Web Scraper Light Mode]

Dark Mode

[Screenshot: Web Scraper Dark Mode]

Project Structure

The project follows a standard Create React App structure:

webscrapper/
├── public/
│   ├── index.html          # HTML template
│   ├── manifest.json       # PWA manifest
│   ├── favicon.ico         # Application icon
│   └── ...                 # Other static assets
├── src/
│   ├── App.js              # Main application component
│   ├── App.css             # Application styles
│   ├── index.js            # React entry point
│   ├── index.css           # Global styles
│   ├── serviceWorker.js    # Service worker for PWA
│   └── downloadjs/         # Legacy download utility (not in use)
├── package.json            # Dependencies and scripts
└── README.md               # This file

Installation and Setup

Prerequisites

Installation Steps

  1. Clone the repository to your local machine:
    git clone https://github.com/pappater/webscrapper.git
    cd webscrapper
    
  2. Install the required dependencies:
    npm install
    
  3. Start the development server:
    npm start
    

The application will automatically open in your default browser at http://localhost:3000.

Usage Guide

Basic Usage

  1. Enter a valid URL in the input field (e.g., https://example.com)
  2. Click the “Scrape” button to initiate the scraping process
  3. Wait for the scraping to complete (a loading indicator is displayed while the request runs)
  4. Once complete, click the “Download” button to save the HTML content
  5. The file will be downloaded with a name based on the URL
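
Steps 1 and 5 above can be sketched as two small helpers: one that checks whether the input is a valid http(s) URL, and one that derives a download file name from it. The function names and the naming scheme (host plus path, with unsafe characters replaced by underscores) are assumptions for illustration, not necessarily what src/App.js does.

```javascript
// Hypothetical helpers; names and naming scheme are assumptions,
// not the application's actual implementation.

// Returns true only for well-formed http(s) URLs.
function isValidHttpUrl(input) {
  try {
    const { protocol } = new URL(input);
    return protocol === "http:" || protocol === "https:";
  } catch (err) {
    return false; // new URL() throws on malformed input
  }
}

// Builds a filesystem-friendly file name from the host and path of a URL.
function filenameFromUrl(input) {
  const { hostname, pathname } = new URL(input);
  const slug = (hostname + pathname)
    .replace(/[^a-zA-Z0-9.-]+/g, "_") // collapse runs of unsafe characters
    .replace(/^_+|_+$/g, "");         // trim leading/trailing underscores
  return `${slug}.txt`;
}
```

Under this scheme, https://example.com would be saved as example.com.txt. The query string is ignored, so two URLs that differ only in their query would map to the same file name.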

Theme Toggle

Click the theme toggle button in the top-right corner to switch between dark and light modes. On first load, the application follows your system’s color scheme preference.
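
Reading the system preference on first load is usually done with the prefers-color-scheme media query. The sketch below shows one way to do it; initialTheme and toggleTheme are hypothetical names, and the matchMedia function is passed in as a parameter so the logic can also run outside a browser.

```javascript
// Hypothetical helpers; the application's own theme code may differ.

// Picks the initial theme from the OS-level color scheme preference.
// `matchMedia` is injected so the function does not depend on `window`.
function initialTheme(matchMedia) {
  if (typeof matchMedia !== "function") return "light"; // no media-query support
  return matchMedia("(prefers-color-scheme: dark)").matches ? "dark" : "light";
}

// Flips between the two supported themes.
function toggleTheme(current) {
  return current === "dark" ? "light" : "dark";
}
```

In a browser this would be called as initialTheme((query) => window.matchMedia(query)); wrapping the call keeps matchMedia bound to window.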

Error Handling

The application provides clear error messages for common issues:

Available Scripts

In the project directory, you can run the following commands:

npm start

Runs the application in development mode. Open http://localhost:3000 to view it in the browser. The page will reload automatically when you make changes.

npm run build

Builds the application for production to the build folder. The build is optimized and minified for best performance.

npm test

Launches the test runner in interactive watch mode.

npm run deploy

Deploys the built application to GitHub Pages.

Deployment

This application is configured for deployment to GitHub Pages.

Deployment Steps

  1. Ensure all changes are committed
  2. Build and deploy the application:
    npm run deploy
    
  3. The application will be available at: https://pappater.github.io/webscrapper

The deployment process automatically builds the application and pushes it to the gh-pages branch.

Live Demo

The application is ready for deployment. After merging this PR, run the following command to deploy:

npm run deploy

The application will then be available at: https://pappater.github.io/webscrapper

For detailed deployment instructions, see DEPLOYMENT.md

API Configuration

The application uses the Zenscrape API for web scraping functionality. The current implementation includes a demo API key with limited usage.

For Production Use

To use this application in production:

  1. Obtain your own API key from Zenscrape
  2. Replace the API key in src/App.js
  3. Consider implementing rate limiting and caching
  4. Set up your own CORS proxy or configure appropriate CORS headers
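
Step 2 is easier to maintain if the key is passed in rather than hard-coded. The following is a minimal sketch under stated assumptions: the endpoint URL, the url query parameter, and the apikey header reflect the Zenscrape API as commonly documented and should be verified against the current Zenscrape documentation. buildScrapeRequest is a hypothetical helper, not a function from src/App.js.

```javascript
// Assumed endpoint; confirm against the Zenscrape documentation before use.
const ZENSCRAPE_ENDPOINT = "https://app.zenscrape.com/api/v1/get";

// Hypothetical helper: builds the request URL and headers for one scrape.
function buildScrapeRequest(targetUrl, apiKey) {
  if (!apiKey) throw new Error("Missing Zenscrape API key");
  // URLSearchParams percent-encodes the target URL for use in a query string.
  const params = new URLSearchParams({ url: targetUrl });
  return {
    url: `${ZENSCRAPE_ENDPOINT}?${params.toString()}`,
    headers: { apikey: apiKey }, // assumed header name for authentication
  };
}
```

The result can be passed to fetch(request.url, { headers: request.headers }). In a Create React App project the key could come from an environment variable such as process.env.REACT_APP_ZENSCRAPE_KEY (a hypothetical name); such variables are inlined at build time, so the key is still visible in the shipped bundle.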

CORS Proxy Note

The application uses cors-anywhere.herokuapp.com as a CORS proxy for development. This service has rate limits and is not recommended for production use. Consider setting up your own proxy server for production deployments.
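
Making the proxy base configurable keeps the switch from the public proxy to a self-hosted one down to a single setting. The sketch below assumes a cors-anywhere style proxy, where the target URL is appended to the proxy's base URL; withCorsProxy is a hypothetical helper name.

```javascript
// Hypothetical helper: prefixes a target URL with a CORS proxy base,
// in the style used by cors-anywhere (proxy base + "/" + target URL).
function withCorsProxy(targetUrl, proxyBase) {
  if (!proxyBase) return targetUrl; // no proxy configured: call the target directly
  const base = proxyBase.endsWith("/") ? proxyBase : `${proxyBase}/`;
  return base + targetUrl;
}
```

For example, withCorsProxy("https://example.com", "https://cors-anywhere.herokuapp.com") yields https://cors-anywhere.herokuapp.com/https://example.com, and passing an empty proxy base returns the target URL unchanged.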

Browser Compatibility

The application is compatible with modern browsers:

Mobile browsers are fully supported with responsive design.

Contributing

Contributions are welcome! To contribute to this project:

  1. Fork the repository
  2. Create a new branch for your feature (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Contribution Guidelines

Known Limitations

Future Enhancements

Potential improvements for future versions:

License

This project is licensed under the MIT License. See the LICENSE file for details.

MIT License Summary

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files, to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software.

Acknowledgments

Support

For issues, questions, or suggestions:

Version History

Version 0.1.0 (Current)


Built with care by the Web Scraper team. Happy scraping!