A simple and modern web application that allows users to extract HTML content from any public webpage. Built with React, this tool provides a clean interface for scraping web pages and downloading the results as text files.
Web Scraper is a lightweight React application designed to fetch and download HTML content from any publicly accessible URL. The application features a responsive design, dark/light mode toggle, and robust error handling to provide a smooth user experience.
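The download step described above can be sketched roughly as follows. This is a minimal illustration, not the code in src/App.js: `filenameFromUrl` and `downloadText` are hypothetical helper names, and naming the file after the page's hostname is an assumption made for the example.

```javascript
// Hypothetical helper: derive a download filename from the scraped page URL.
// The hostname-based naming scheme is an assumption for this sketch.
function filenameFromUrl(pageUrl) {
  const { hostname } = new URL(pageUrl); // throws a TypeError for invalid URLs
  return `${hostname}.txt`;
}

// Hypothetical helper: offer a string as a text-file download.
// Browser only: it relies on `document`, so it is not called in this sketch.
function downloadText(text, filename) {
  const blob = new Blob([text], { type: 'text/plain' });
  const objectUrl = URL.createObjectURL(blob);
  const link = document.createElement('a');
  link.href = objectUrl;
  link.download = filename; // suggests the filename to the browser
  document.body.appendChild(link);
  link.click();
  link.remove();
  URL.revokeObjectURL(objectUrl); // release the temporary object URL
}
```

In a component, the two would be combined once the HTML has been fetched, for example `downloadText(html, filenameFromUrl(url))`.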
The application is built using the following technologies:
The project follows a standard Create React App structure:
webscrapper/
├── public/
│   ├── index.html        # HTML template
│   ├── manifest.json     # PWA manifest
│   ├── favicon.ico       # Application icon
│   └── ...               # Other static assets
├── src/
│   ├── App.js            # Main application component
│   ├── App.css           # Application styles
│   ├── index.js          # React entry point
│   ├── index.css         # Global styles
│   ├── serviceWorker.js  # Service worker for PWA
│   └── downloadjs/       # Legacy download utility (not in use)
├── package.json          # Dependencies and scripts
└── README.md             # This file
git clone https://github.com/pappater/webscrapper.git
cd webscrapper
npm install
npm start
The application will automatically open in your default browser at http://localhost:3000.
Example URL: https://example.com
Click the theme toggle button in the top-right corner to switch between dark and light modes. On first load, the application follows your system’s color scheme preference.
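One way the initial theme could be derived from the system preference is the standard `prefers-color-scheme` media query. The sketch below is an assumption about the approach, not the code in src/App.js; `getInitialTheme` and `toggleTheme` are hypothetical names, and the window object is passed in so the logic can also run outside a browser.

```javascript
// Hypothetical helper: pick the starting theme from the system preference.
// `win` is expected to be the browser `window`; falls back to 'light'
// when matchMedia is unavailable.
function getInitialTheme(win) {
  if (win && typeof win.matchMedia === 'function') {
    const prefersDark = win.matchMedia('(prefers-color-scheme: dark)').matches;
    return prefersDark ? 'dark' : 'light';
  }
  return 'light';
}

// Hypothetical helper: flip between the two themes on button click.
function toggleTheme(theme) {
  return theme === 'dark' ? 'light' : 'dark';
}
```

In a React component this might seed state once, for example `useState(() => getInitialTheme(window))`, with the toggle button calling `setTheme(toggleTheme)`.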
The application provides clear error messages for common issues:
In the project directory, you can run the following commands:
npm start
Runs the application in development mode. Open http://localhost:3000 to view it in the browser. The page will reload automatically when you make changes.
npm run build
Builds the application for production to the build folder. The build is optimized and minified for best performance.
npm test
Launches the test runner in interactive watch mode.
npm run deploy
Deploys the built application to GitHub Pages.
This application is configured for deployment to GitHub Pages.
npm run deploy
https://pappater.github.io/webscrapper
The deployment process automatically builds the application and pushes it to the gh-pages branch.
The application is ready for deployment. After merging this PR, run the following command to deploy:
npm run deploy
The application will then be available at: https://pappater.github.io/webscrapper
For detailed deployment instructions, see DEPLOYMENT.md
The application uses the Zenscrape API for web scraping functionality. The current implementation includes a demo API key with limited usage.
To use this application in production:
src/App.js
The application uses cors-anywhere.herokuapp.com as a CORS proxy for development. This service has rate limits and is not recommended for production use. Consider setting up your own proxy server for production deployments.
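To make the proxy and API key swappable per environment, they could be read from configuration instead of being hard-coded. The following is a sketch under stated assumptions: the variable names `REACT_APP_CORS_PROXY` and `REACT_APP_ZENSCRAPE_KEY` and both helper functions are hypothetical and do not exist in the project. Create React App only exposes environment variables whose names start with `REACT_APP_`.

```javascript
// Hypothetical helper: read proxy and API key settings from an env object.
// Both variable names are assumptions made for this sketch.
function getScraperConfig(env) {
  return {
    // Falls back to the public development proxy named in this README.
    proxyBase: env.REACT_APP_CORS_PROXY || 'https://cors-anywhere.herokuapp.com/',
    apiKey: env.REACT_APP_ZENSCRAPE_KEY || '',
  };
}

// Hypothetical helper: cors-anywhere style proxies take the target URL
// appended directly after the proxy base.
function buildProxiedUrl(proxyBase, targetUrl) {
  const base = proxyBase.endsWith('/') ? proxyBase : `${proxyBase}/`;
  return `${base}${targetUrl}`;
}
```

In the app this might be called as `getScraperConfig(process.env)`. Note that Create React App embeds these values into the built bundle, so a key supplied this way remains visible to anyone who inspects the deployed code; keeping the key on your own proxy server avoids that.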
The application is compatible with modern browsers:
Mobile browsers are fully supported with responsive design.
Contributions are welcome! To contribute to this project:
git checkout -b feature/amazing-feature
git commit -m 'Add some amazing feature'
git push origin feature/amazing-feature
Potential improvements for future versions:
This project is licensed under the MIT License. See the LICENSE file for details.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files, to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software.
For issues, questions, or suggestions:
Built with care by the Web Scraper team. Happy scraping!