The next Gen of AI Scrapers IS HERE!

Introduction to AI Scraping Tools

Overview of Universal Web Scraper and AI Scraper

  • The speaker opens by recalling the universal web scraper covered in previous videos.
  • Acknowledges viewer interest in an AI scraper capable of handling pagination and actions like logging in or filtering before scraping.
  • Clarifies that implementing these features into the existing universal web scraper would require significant effort.

Introducing Agent QL

Features of Agent QL

  • Introduces Agent QL as a tool that can both perform page actions and scrape data, which the universal web scraper cannot do.
  • Notes that the video is not sponsored, although sponsorship was offered; it is presented as a review.

Getting Started with Agent QL

Setting Up API Key

  • Guides viewers to obtain an API key from agentql.com by logging in and creating a new key.
  • Instructs users to add the Agent QL extension from the Chrome Web Store for easier access during scraping tasks.

Using Agent QL for Data Extraction

Semantic Element Detection

  • Demonstrates how to find elements semantically using simple commands instead of complex XPaths.
  • Shows how to search for specific listings (e.g., dentists in Nashville), emphasizing ease of use.

Scraping Data

  • Explains how to scrape all relevant data from a page by defining desired fields such as business name, phone number, address, etc.
  • Highlights functionality allowing users to exclude unwanted elements (like ads) during scraping.
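
The field list described above maps onto a nested query. The following is a minimal sketch, assuming the curly-bracket query syntax that Agent QL documents, where a name followed by [] asks for a list of items. The helper build_list_query and the field names are hypothetical illustrations, not part of the library.

```python
def build_list_query(list_name, fields):
    """Return an Agent QL-style query asking for a list of items with the given fields."""
    # Each field sits on its own line inside the list's curly brackets.
    body = "\n".join(f"        {field}" for field in fields)
    return f"{{\n    {list_name}[] {{\n{body}\n    }}\n}}"


# Hypothetical field names for a business directory page.
LISTINGS_QUERY = build_list_query(
    "listings", ["business_name", "phone_number", "address"]
)
print(LISTINGS_QUERY)
```

Building the query from a list of names keeps the fields in one place, so adding or dropping a field does not require editing the string by hand.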

Setting Up Your Coding Environment

Project Initialization

  • Advises on setting up a new project in VS Code and creating a virtual environment for coding.
  • Recommends storing the API key securely within the project files after copying it from the website.
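
One common way to keep the key inside the project without hard-coding it is a .env file that is excluded from version control. The sketch below assumes the key is stored under the variable name AGENTQL_API_KEY (an assumption here; check the Agent QL documentation for the expected name) and that python-dotenv is the dotenv package in use. The helper load_api_key is hypothetical.

```python
import os


def load_api_key(var_name="AGENTQL_API_KEY"):
    """Load a .env file if python-dotenv is available, then return the key."""
    try:
        from dotenv import load_dotenv  # third-party: pip install python-dotenv

        load_dotenv()  # copies KEY=value pairs from a .env file into os.environ
    except ImportError:
        pass  # fall back to variables already exported in the shell

    key = os.getenv(var_name)
    if not key:
        raise RuntimeError(f"{var_name} is not set; add it to .env or export it")
    return key
```

Failing early with a clear message is a design choice here: a missing key otherwise tends to surface later as a less obvious error from the library.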

Coding with Agent QL

Importing Necessary Libraries

  • Lists the libraries the script imports: agentql, Playwright's sync API, the os module, and dotenv for loading environment variables.

Important Code Structure

  • Emphasizes three lines of code that initialize Playwright with headless mode set to false, so the browser window stays visible.

Integrating with Agent QL Functions

  • Describes wrapping the Playwright page with Agent QL so that the page gains Agent QL's query methods.
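
The imports, the Playwright start-up, and the page wrapping can be sketched together as follows. This is not the video's exact code: the function name scrape_once, the constant EXAMPLE_QUERY, and its field names are hypothetical, and the sketch assumes the agentql and playwright packages are installed and an API key is available in the environment.

```python
# Hypothetical query; the field names are for illustration only.
EXAMPLE_QUERY = """
{
    listings[] {
        business_name
        phone_number
    }
}
"""


def scrape_once(url, query=EXAMPLE_QUERY, headless=False):
    """Open url in Chromium, run one Agent QL data query, and return the result."""
    # Third-party imports sit inside the function so this module can be
    # imported even where the packages are not installed.
    import agentql
    from playwright.sync_api import sync_playwright

    with sync_playwright() as playwright:
        # headless=False opens a visible browser window.
        browser = playwright.chromium.launch(headless=headless)
        # Wrapping the page adds Agent QL's query methods to it.
        page = agentql.wrap(browser.new_page())
        page.goto(url)
        data = page.query_data(query)
        browser.close()
    return data


# Example call (needs network access and a valid API key):
# print(scrape_once("https://www.yellowpages.com"))
```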

Querying Data Using Defined Syntax

Executing Queries

How to Create a Universal Web Scraper

Introduction to the Scraper

  • The speaker demonstrates a web scraper that opens a website, scrapes data, and displays it in the console. This process is achieved with minimal code.
  • The scraper is described as universal; it can adapt to different websites as long as the same elements are present.

Automating Data Entry

  • The next step involves navigating to Yellow Pages, entering search criteria (business type and location), and clicking the "find" button.
  • Introduces the query_elements function, which returns the page elements named in a query so the script can interact with them.

Filling Out Search Criteria

  • The speaker shows how to write the query as a multi-line string, with curly brackets grouping the elements to find.
  • Variables are created for business type and address, allowing dynamic input into the search fields before executing a click on the find button.
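
The form-filling step can be sketched as below. It assumes page is a Playwright page already wrapped by Agent QL, and that the object returned by query_elements exposes each queried element as an attribute with the usual Playwright fill and click methods. The element names in SEARCH_QUERY and the function submit_search are hypothetical.

```python
# Hypothetical element names; Agent QL matches them to the page semantically.
SEARCH_QUERY = """
{
    search_input
    location_input
    find_button
}
"""


def submit_search(page, business_type, location):
    """Fill both search boxes on a wrapped page, then click the find button."""
    form = page.query_elements(SEARCH_QUERY)
    form.search_input.fill(business_type)
    form.location_input.fill(location)
    form.find_button.click()


# Example call on a wrapped page:
# submit_search(page, "dentist", "Nashville, TN")
```

Passing the business type and location as arguments keeps the search reusable for other queries without touching the query string.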

Pagination and Data Scraping

  • After submitting the search form, pagination logic is implemented to scrape multiple pages of results (limited to three pages).
  • The script locates the "next page" button with page.query_elements and clicks it, moving through the results page by page.
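
A pagination loop along these lines might look like the sketch below. It assumes a page wrapped by Agent QL, and it assumes that an element the query cannot find comes back as None. The query names, the function scrape_pages, and the pause_ms parameter are hypothetical; the default pause of 2000 ms mirrors the two-second wait used in the video.

```python
LISTINGS_QUERY = """
{
    listings[] {
        business_name
        phone_number
        address
    }
}
"""

NEXT_PAGE_QUERY = """
{
    next_page_button
}
"""


def scrape_pages(page, max_pages=3, pause_ms=2000):
    """Collect listing data from up to max_pages result pages."""
    results = []
    for page_number in range(1, max_pages + 1):
        results.append(page.query_data(LISTINGS_QUERY))
        if page_number == max_pages:
            break  # page limit reached, so do not click any further
        nav = page.query_elements(NEXT_PAGE_QUERY)
        if nav.next_page_button is None:
            break  # assumed: a missing element is returned as None
        nav.next_page_button.click()
        page.wait_for_timeout(pause_ms)  # Playwright pause, in milliseconds
    return results
```

The loop stops either at the page limit or when no next button is found, so it also works on searches with fewer than three pages of results.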

Finalizing Data Collection

  • A timeout of 2 seconds is added between requests to prevent being flagged as a bot by the website due to rapid scraping.
  • Results are printed out with page numbers for clarity during execution. An output file (output.txt) captures scraped data for review.
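
Printing the results with page numbers and saving them to output.txt can be sketched with the standard library alone. The function names and the "--- Page N ---" header format are hypothetical; results is assumed to be a list holding one JSON-serializable object per scraped page.

```python
import json


def format_results(results):
    """Return one text block per page, each headed by its page number."""
    blocks = []
    for page_number, data in enumerate(results, start=1):
        blocks.append(f"--- Page {page_number} ---\n{json.dumps(data, indent=2)}")
    return "\n\n".join(blocks)


def save_results(results, path="output.txt"):
    """Print the formatted results and write the same text to path."""
    text = format_results(results)
    print(text)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text + "\n")
    return text
```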

Conclusion of Scraping Process

  • The scraper successfully navigates through multiple pages, demonstrating its functionality despite minor loading issues on some pages.

Video description

Hello Everyone, This is a library I used and found interesting and wanted to share with you.

Links

  • Discord: https://discord.gg/jUe948xsv4
  • LinkedIn: https://www.linkedin.com/in/reda-marzouk-rpa/
  • Instagram: https://www.instagram.com/redamarzouk.rpa/
  • YouTube: https://www.youtube.com/@redamarzouk/videos
  • Website: https://www.automation-campus.com/