HiAnimez.to Scraper: An Anime Data Extractor API in Python

Don’t want to read everything first? Here’s the fastest way to get a scraper working right now.

1. Install the libraries:
pip install requests beautifulsoup4 lxml

2. Copy this script and save it as scraper.py:

import requests
from bs4 import BeautifulSoup

# Fetch the page
url = "https://books.toscrape.com/"
response = requests.get(url)

# Parse the HTML
soup = BeautifulSoup(response.text, "lxml")

# Extract book titles and prices
for book in soup.select("article.product_pod"):
    title = book.select_one("h3 a")["title"]
    price = book.select_one(".price_color").text.strip()
    print(f"{title} - {price}")

3. Run it:

python scraper.py

Expected output:

A Light in the Attic - £51.77
Tipping the Velvet - £53.74
Soumission - £50.10
...

What Is Web Scraping?

Every time you visit a website, your browser downloads HTML, CSS, and JavaScript and turns them into the page you see on screen. Web scraping means writing a program that downloads the same HTML but, instead of displaying the page, reads it and pulls out the data you care about.

Before You Start: Is It Legal?

This is the first question every beginner should ask, and the honest answer is: it depends on the site, the data, and where you live.

Here are a few ground rules to keep you on the right side of things:

  • Check robots.txt – Visit https://example.com/robots.txt. This file tells you which parts of a site the owner doesn’t want bots to access. Respect it.
  • Read the Terms of Service – Some sites explicitly prohibit scraping. If they do, don’t scrape them.
  • Don’t overload servers – Add delays between your requests. A flood of rapid requests can crash a server and land you in legal trouble.
  • Don’t scrape personal data – Names, emails, and private information are protected in many countries.
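The robots.txt check in the first rule can be automated with Python’s standard library. The sketch below is one way to do it, not the only one: the robots.txt content is a made-up sample inlined so the example runs offline, and the bot name my-scraper is a placeholder. For a live site you would call set_url() and read() on the parser instead of parse().

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example runs without a network.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

USER_AGENT = "my-scraper"  # placeholder bot name, not from the original guide


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# Allowed: the path is not under any Disallow rule.
print(parser.can_fetch(USER_AGENT, "https://example.com/catalogue/page-1.html"))
# Disallowed: the path starts with /private/.
print(parser.can_fetch(USER_AGENT, "https://example.com/private/data.html"))
# Delay in seconds requested by the site, or None if it doesn't set one.
print(parser.crawl_delay(USER_AGENT))
```

If can_fetch() returns False for a URL, skip it. If crawl_delay() returns a number, it is a reasonable value to pass to time.sleep() between requests.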

Tools You’ll Need

Python is the go-to language for web scraping, thanks to its simple syntax and powerful libraries. Here’s what we’ll use in this guide:

Library         What it does
requests        Fetches the raw HTML of a web page
BeautifulSoup   Parses HTML and lets you search through it
lxml            A fast HTML parser used alongside BeautifulSoup

Install them all in one command:

pip install requests beautifulsoup4 lxml

Fetching a Web Page

The first step is downloading the HTML of the page you want to scrape. The requests library makes this dead simple.

import requests

url = "https://books.toscrape.com/"
response = requests.get(url)

print(response.status_code)  # 200 means success
print(response.text[:500])   # Print first 500 characters of HTML

A status code of 200 means the request succeeded. A 403 means the server refused the request, which often means the site is blocking automated clients. A 404 means the page doesn’t exist.
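In a real scraper you usually want to branch on the status code rather than just print it. Here is a small sketch of that idea. The function name classify_status and the labels it returns are illustrative choices, and the 429 and 5xx branches go beyond the three codes discussed above.

```python
def classify_status(code: int) -> str:
    """Map an HTTP status code to a rough next step for a scraper."""
    if 200 <= code < 300:
        return "ok"            # safe to parse the response body
    if code in (401, 403):
        return "blocked"       # the server refused the request
    if code == 404:
        return "not found"     # the page doesn't exist; skip or stop
    if code == 429:
        return "rate limited"  # too many requests; slow down and retry later
    if 500 <= code < 600:
        return "server error"  # often temporary; a retry may work
    return "unexpected"


for code in (200, 403, 404, 429, 503):
    print(code, "->", classify_status(code))
```

You would call it as classify_status(response.status_code). If you would rather get an exception for any 4xx or 5xx response, requests also offers response.raise_for_status().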

Parsing HTML with BeautifulSoup

Raw HTML is messy and hard to work with directly. BeautifulSoup turns it into a structured object you can navigate easily.

from bs4 import BeautifulSoup

soup = BeautifulSoup(response.text, "lxml")

# Get the page title
title = soup.find("title")
print(title.text)  # e.g., "All products | Books to Scrape"

Think of soup as a smart document you can ask questions like: “Find me all the <h3> tags” or “Give me every link on this page.”

Finding Elements

BeautifulSoup gives you two main tools for finding elements:

find() returns the first match

first_book = soup.find("article", class_="product_pod")
print(first_book)

find_all() returns all matches as a list

all_books = soup.find_all("article", class_="product_pod")
print(f"Found {len(all_books)} books on this page")

CSS Selectors with select()

If you’re familiar with CSS, you can use selectors directly:

# Select every <a> inside an <h3> inside an article with class "product_pod"
titles = soup.select("article.product_pod h3 a")
for t in titles:
    print(t["title"])  # Get the "title" attribute
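To see how find(), find_all(), and select() relate, it helps to run them side by side on a tiny document. The HTML below is a made-up sample that imitates the structure of the book listing, and the sketch uses Python’s built-in "html.parser" so it runs even where lxml isn’t installed.

```python
from bs4 import BeautifulSoup

# Made-up HTML that imitates the structure of the book listing page.
SAMPLE_HTML = """
<section>
  <article class="product_pod">
    <h3><a href="a.html" title="Book A">Book A</a></h3>
  </article>
  <article class="product_pod">
    <h3><a href="b.html" title="Book B">Book B</a></h3>
  </article>
</section>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() returns the first matching tag (or None if nothing matches).
first = soup.find("article", class_="product_pod")
first_title = first.select_one("h3 a")["title"]

# find_all() and select() both return every match, in document order.
all_articles = soup.find_all("article", class_="product_pod")
titles = [a["title"] for a in soup.select("article.product_pod h3 a")]

print(first_title)        # Book A
print(len(all_articles))  # 2
print(titles)             # ['Book A', 'Book B']
```

Which one you use is mostly a matter of taste: find() and find_all() read well for simple tag-plus-class lookups, while select() is more compact once you need nested conditions.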

Extract the Data You Want

Let’s put it all together and extract the title and price of every book on the page.

import requests
from bs4 import BeautifulSoup

url = "https://books.toscrape.com/"
response = requests.get(url)
soup = BeautifulSoup(response.text, "lxml")

books = []

for article in soup.select("article.product_pod"):
    title = article.select_one("h3 a")["title"]
    price = article.select_one(".price_color").text.strip()
    rating = article.select_one("p.star-rating")["class"][1]  # e.g. "Three"

    books.append({
        "title": title,
        "price": price,
        "rating": rating,
    })

for book in books[:5]:
    print(book)

Output:

{'title': 'A Light in the Attic', 'price': '£51.77', 'rating': 'Three'}
{'title': 'Tipping the Velvet', 'price': '£53.74', 'rating': 'One'}
...
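The scraped values are all strings, which makes sorting or averaging awkward. A short cleaning step can turn the price into a float and the rating word into a number. This is a sketch under the assumption that ratings are always one of the five words "One" to "Five"; the helper names parse_price and parse_rating are illustrative.

```python
import re
from typing import Optional

# Assumed mapping from the rating class name to a number.
RATING_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


def parse_price(text: str) -> float:
    """Pull the first decimal number out of a price string such as '£51.77'."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"No number found in {text!r}")
    return float(match.group())


def parse_rating(word: str) -> Optional[int]:
    """Convert a rating word to an int, or None if the word is unknown."""
    return RATING_WORDS.get(word)


raw = {"title": "A Light in the Attic", "price": "£51.77", "rating": "Three"}
clean = {
    **raw,
    "price": parse_price(raw["price"]),
    "rating": parse_rating(raw["rating"]),
}
print(clean)
# {'title': 'A Light in the Attic', 'price': 51.77, 'rating': 3}
```

Because the regular expression only looks for digits, it ignores whatever currency symbol or stray characters come before the number.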

Handling Multiple Pages

Most real-world sites spread data across multiple pages. Here’s how to loop through them automatically.

import requests
from bs4 import BeautifulSoup
import time

BASE_URL = "https://books.toscrape.com/catalogue/"
all_books = []
page = 1

while True:
    url = f"{BASE_URL}page-{page}.html"
    response = requests.get(url)

    # Stop if the page doesn't exist
    if response.status_code == 404:
        break
    response.raise_for_status()  # Fail loudly on any other error (403, 500, ...)

    soup = BeautifulSoup(response.text, "lxml")

    for article in soup.select("article.product_pod"):
        title = article.select_one("h3 a")["title"]
        price = article.select_one(".price_color").text.strip()
        all_books.append({"title": title, "price": price})

    print(f"Scraped page {page} - {len(all_books)} books so far")
    page += 1
    time.sleep(1)  # Be polite - wait 1 second between requests

print(f"\nTotal books scraped: {len(all_books)}")
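Counting pages until a 404 works when URLs follow a predictable pattern. Another common approach is to follow the page’s own "next" link until there isn’t one. The sketch below assumes the link lives at li.next a, which is how the pager on this practice site appears to be marked up; check the selector in your browser’s dev tools before relying on it. The HTML snippets are made-up samples so the example runs offline.

```python
from typing import Optional
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def next_page_url(soup: BeautifulSoup, current_url: str) -> Optional[str]:
    """Return the absolute URL of the next page, or None on the last page."""
    link = soup.select_one("li.next a")  # assumed selector for the pager link
    if link is None or not link.get("href"):
        return None
    # The href is usually relative, so resolve it against the current URL.
    return urljoin(current_url, link["href"])


# Made-up pager markup for a middle page and for the last page.
MIDDLE_PAGE = '<ul class="pager"><li class="next"><a href="page-3.html">next</a></li></ul>'
LAST_PAGE = '<ul class="pager"><li class="previous"><a href="page-49.html">previous</a></li></ul>'

current = "https://books.toscrape.com/catalogue/page-2.html"
print(next_page_url(BeautifulSoup(MIDDLE_PAGE, "html.parser"), current))
# https://books.toscrape.com/catalogue/page-3.html
print(next_page_url(BeautifulSoup(LAST_PAGE, "html.parser"), current))
# None
```

In the loop, you would replace the page counter with url = next_page_url(soup, url) and stop when it returns None.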

Saving Your Data

Once you’ve collected data, you’ll want to save it. CSV and JSON are the most common formats.

Save as CSV

import csv

with open("books.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["title", "price"])
    writer.writeheader()
    writer.writerows(all_books)

print("Saved to books.csv")

Save as JSON

import json

with open("books.json", "w", encoding="utf-8") as f:
    json.dump(all_books, f, indent=2, ensure_ascii=False)

print("Saved to books.json")
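It’s worth reading a file back once to confirm it contains what you expect. The sketch below writes two sample records to a temporary folder in both formats and loads them again. The records are made-up stand-ins for all_books, and the temporary folder is only there to keep the example from leaving files behind.

```python
import csv
import json
import tempfile
from pathlib import Path

# Stand-in for the list built by the scraper.
all_books = [
    {"title": "A Light in the Attic", "price": "£51.77"},
    {"title": "Tipping the Velvet", "price": "£53.74"},
]

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "books.csv"
    json_path = Path(tmp) / "books.json"

    # Write both formats, exactly as in the snippets above.
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["title", "price"])
        writer.writeheader()
        writer.writerows(all_books)
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(all_books, f, indent=2, ensure_ascii=False)

    # Read them back: DictReader yields one dict per row, keyed by the header.
    with open(csv_path, newline="", encoding="utf-8") as f:
        from_csv = list(csv.DictReader(f))
    with open(json_path, encoding="utf-8") as f:
        from_json = json.load(f)

print(from_csv == all_books)   # True
print(from_json == all_books)  # True
```

One difference to keep in mind: CSV stores everything as text, so numbers come back as strings, while JSON preserves numbers, lists, and nested dictionaries.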

Common Pitfalls & How to Avoid Them

Getting Blocked

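One way sites detect bots is by inspecting request headers. By default, requests identifies itself with a User-Agent like python-requests/2.x, which many sites reject. Sending a browser-style User-Agent avoids that check: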

headers = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0.0.0 Safari/537.36"
    )
}
response = requests.get(url, headers=headers)
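If you make many requests, passing headers every time gets repetitive. A requests.Session can hold them once and also reuses the underlying connection. This is a minimal sketch; the helper name make_session and the extra Accept-Language header are illustrative additions, not something the site requires.

```python
import requests

BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/120.0.0.0 Safari/537.36"
)


def make_session(user_agent: str) -> requests.Session:
    """Create a session that sends the same headers with every request."""
    session = requests.Session()
    session.headers.update({
        "User-Agent": user_agent,
        "Accept-Language": "en-US,en;q=0.9",  # illustrative extra header
    })
    return session


session = make_session(BROWSER_UA)
print(session.headers["User-Agent"])
```

After that, session.get(url, timeout=15) works like requests.get() but carries the headers automatically.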

JavaScript-Rendered Pages

Some sites load their content with JavaScript after the initial page load, so requests gets back HTML that doesn’t contain the data you want. In that case, you’ll need a tool such as Playwright or Selenium that controls a real browser.

pip install playwright
playwright install chromium
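Once Playwright is installed, a rough sketch of fetching a rendered page looks like this. The function name render_html is illustrative, and waiting for "networkidle" is just one possible strategy; some pages need you to wait for a specific element instead. The import sits inside the function so the rest of your script still loads on machines without Playwright.

```python
def render_html(url: str, timeout_ms: int = 15000) -> str:
    """Return the page's HTML after a headless browser has run its JavaScript."""
    # Imported lazily so this module still loads where Playwright is absent.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            # Wait until network activity settles before reading the DOM.
            page.goto(url, wait_until="networkidle", timeout=timeout_ms)
            return page.content()
        finally:
            browser.close()
```

The string it returns can be passed to BeautifulSoup just like response.text, so the rest of your parsing code stays the same.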

Fragile Selectors

If a site changes its HTML layout, your selectors will break. Add error handling so your scraper doesn’t crash:

price_tag = article.select_one(".price_color")
price = price_tag.text.strip() if price_tag else "N/A"

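Missing or None Values

select_one() and find() return None when nothing matches. Always check that an element exists before accessing its text or attributes, or you’ll get an AttributeError (for .text) or a TypeError (for ["title"]).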
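Writing the same None check for every field gets tedious, so it can be wrapped in small helpers. The names safe_text and safe_attr below are illustrative, and the sample HTML is made up: it deliberately leaves out the price so you can see the fallback in action.

```python
from bs4 import BeautifulSoup


def safe_text(parent, selector: str, default: str = "N/A") -> str:
    """Return the stripped text of the first match, or default if none."""
    tag = parent.select_one(selector)
    return tag.get_text(strip=True) if tag is not None else default


def safe_attr(parent, selector: str, attr: str, default: str = "N/A"):
    """Return an attribute of the first match, or default if tag or attr is missing."""
    tag = parent.select_one(selector)
    if tag is None:
        return default
    return tag.get(attr, default)


# Made-up listing with a title but no price element.
SAMPLE_HTML = """
<article class="product_pod">
  <h3><a href="a.html" title="Book A">Book A</a></h3>
</article>
"""

article = BeautifulSoup(SAMPLE_HTML, "html.parser").select_one("article.product_pod")

print(safe_attr(article, "h3 a", "title"))  # Book A
print(safe_text(article, ".price_color"))   # N/A
```

A "N/A" placeholder keeps the scraper running, but it also hides layout changes, so consider logging a warning whenever a fallback is used.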

A Complete, Reusable Scraper Template

Here’s a clean template you can adapt for almost any scraping project:

import requests
import json
import time
import logging
from bs4 import BeautifulSoup

logging.basicConfig(level=logging.INFO, format="%(levelname)s | %(message)s")
log = logging.getLogger(__name__)

HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0.0.0 Safari/537.36"
    )
}

def fetch(url: str, retries: int = 3):
    for attempt in range(1, retries + 1):
        try:
            time.sleep(1.5)
            r = requests.get(url, headers=HEADERS, timeout=15)
            r.raise_for_status()
            return BeautifulSoup(r.text, "lxml")
        except requests.RequestException as e:
            log.warning("Attempt %d/%d failed: %s", attempt, retries, e)
            if attempt == retries:
                return None
            time.sleep(attempt * 2)

def scrape():
    data = []
    # --- your scraping logic here ---
    return data

if __name__ == "__main__":
    results = scrape()
    with open("output.json", "w", encoding="utf-8") as f:
        json.dump(results, f, indent=2, ensure_ascii=False)
    log.info("Done! Saved %d records.", len(results))
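To show how the placeholder in scrape() might be filled in, here is one possible version for a book listing. It is a sketch under a few assumptions: scrape() takes the fetch function and a URL as parameters (the template’s version takes none), and the example uses a stand-in fake_fetch with made-up HTML so it runs offline. In your own script you would pass the template’s fetch instead.

```python
from bs4 import BeautifulSoup


def scrape(fetch, url: str) -> list:
    """Collect title/price records from one listing page.

    fetch is any callable that returns a BeautifulSoup object or None.
    """
    soup = fetch(url)
    if soup is None:  # fetch() gave up after its retries
        return []

    data = []
    for article in soup.select("article.product_pod"):
        link = article.select_one("h3 a")
        price = article.select_one(".price_color")
        data.append({
            "title": link.get("title", "N/A") if link is not None else "N/A",
            "price": price.get_text(strip=True) if price is not None else "N/A",
        })
    return data


# Stand-in for the template's fetch(): serves canned HTML for one URL.
SAMPLE_HTML = """
<article class="product_pod">
  <h3><a href="a.html" title="Book A">Book A</a></h3>
  <p class="price_color">£10.00</p>
</article>
"""


def fake_fetch(url: str):
    if url == "https://example.com/ok":
        return BeautifulSoup(SAMPLE_HTML, "html.parser")
    return None  # simulate a page that could not be fetched


print(scrape(fake_fetch, "https://example.com/ok"))
# [{'title': 'Book A', 'price': '£10.00'}]
print(scrape(fake_fetch, "https://example.com/down"))
# []
```

Passing fetch in as an argument is a design choice: it lets you test the parsing logic against saved HTML without touching the network.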

Summary

Web scraping with Python boils down to three steps:

  1. Fetch the page HTML using requests.
  2. Parse the HTML using BeautifulSoup.
  3. Extract the data using CSS selectors or find()/find_all().

Add polite delays, handle errors gracefully, and always check the site’s terms before you start. With just these tools, you can collect data from most static, publicly available pages; JavaScript-heavy sites need a browser tool such as Playwright or Selenium.
