How to Actually Scrape Twitter/X Data with n8n
Introduction to the Workflow
- The speaker introduces a workflow designed for scraping an unlimited number of tweets from X, emphasizing its utility for market analysis, competitor insights, and staying updated on industry trends.
- The resources, including the workflow and Google Sheet template, are offered for free in exchange for joining a community.
Live Demo of the Workflow
- A live demonstration begins with the workflow setup; the workflow loops through a specified number of runs (default is three), scraping tweets on each pass.
- The process involves setting variables and pagination before looping back to continue scraping until the set limit is reached.
- After completing the runs, 58 tweets are successfully scraped and displayed in a Google Sheet with relevant data such as Tweet ID, URL, content, likes, retweets, replies, quotes, views, and date.
Analyzing Scraped Data
- The speaker navigates through the Google Sheet showcasing individual tweets along with their metrics like views and engagement.
- Emphasis is placed on verifying that real information is being captured accurately in the data sheet.
Setting Up API Calls
- Transitioning into API setup details; Twitter's API will be used for accessing tweet data.
- The API's pricing structure is discussed ($0.15 per 1,000 tweets), and a referral link providing a $6 credit upon signup is mentioned.
Understanding API Documentation
- Overview of different endpoints available within Twitter’s API documentation: user actions (tweets/followers), tweet endpoints (ID retrieval), and advanced search functionalities.
- Simplification of navigating API documentation; copying curl commands can streamline setting up HTTP requests in workflows.
Importing Curl Commands
- Instructions on importing curl commands into new workflows to auto-populate necessary fields for making requests efficiently.
- Explanation of method types (GET/POST), endpoint structures based on functionality (e.g., advanced search vs. user info).
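As a rough sketch of what an imported curl command expands to, the request can be modeled as a method plus endpoint pair; the host and paths below are placeholders for illustration, not verified documentation:

```javascript
// Hypothetical mapping from an imported curl command to the fields an HTTP
// node needs. The host/path strings are assumptions, not the real API docs.
function requestFromCurl(kind) {
  // Read-style endpoints use GET; action-style endpoints would use POST.
  const endpoints = {
    advancedSearch: { method: "GET", url: "https://api.example.com/twitter/tweet/advanced_search" },
    userInfo:       { method: "GET", url: "https://api.example.com/twitter/user/info" },
  };
  return endpoints[kind];
}

console.log(requestFromCurl("advancedSearch").method); // "GET"
```

Importing the curl command in n8n fills these fields automatically, which is why copying it from the docs is faster than typing each field by hand.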
API Key Configuration and Twitter Search Setup
Understanding API Key Authorization
- The authorization process involves a key-value pair where the key is `x-api-key` and the value is your specific API key. This header format is required for authentication.
- Users are instructed to access their dashboard, click on their profile, and copy the API key for use in subsequent steps.
Setting Up Authentication in n8n
- Instead of manually entering the API key each time, users can save it under the authentication tab for future use. This streamlines the process significantly.
- The configuration requires selecting "header" as the credential type, since that aligns with the documentation. Users must input `x-api-key` as the key and paste their copied API key as its value.
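A minimal sketch of the header this credential attaches to each request; the header name follows the video, and the key value is a placeholder:

```javascript
// Sketch: build the auth header that n8n's header credential would attach.
// "x-api-key" is the header name given in the video; the value is a placeholder.
function buildAuthHeaders(apiKey) {
  return { "x-api-key": apiKey };
}

const headers = buildAuthHeaders("YOUR_API_KEY");
console.log(headers["x-api-key"]);
```

Saving this as a reusable credential means the key is attached automatically instead of being pasted into every HTTP node.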
Configuring Query Parameters for Twitter Search
- Two required fields need to be filled: `query` (the search term) and `queryType` (latest or top tweets). For example, searching for "OpenAI" retrieves relevant tweets.
- The query type can be set to either "latest" or "top," affecting which tweets are returned based on performance metrics like views and likes.
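Assembling those two parameters into a query string can be sketched as follows; the parameter names (`query`, `queryType`) follow the narration and may differ from the live API docs:

```javascript
// Sketch: build the two required query parameters from the video.
// Parameter names are assumptions based on the narration.
const params = new URLSearchParams({
  query: "OpenAI",     // the search term
  queryType: "Latest", // or "Top" for the best-performing tweets
});

console.log(params.toString()); // "query=OpenAI&queryType=Latest"
```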
Executing a Test Search
- After setting up parameters, users can initiate a test search. The demo shows how to retrieve recent tweets about trending topics such as "Manis."
- Pagination through results is discussed; each page typically returns around 20 tweets. Users can adjust settings if they want more results.
Analyzing Returned Data
- Upon executing a search, users receive data containing tweet IDs and URLs. They are encouraged to validate this information by checking external sources like Google.
- A demonstration of refining data extraction follows, with instructions on accessing additional resources via community links provided in video descriptions.
Additional Resources
- Viewers are directed to join a free Skool community for code snippets related to workflow setups. Links will guide them to downloadable resources that enhance understanding of workflows used in demonstrations.
Join the Community for Hands-On Learning
Overview of the Paid Community
- The speaker encourages joining a paid community for a more hands-on approach to learning about building agents, vector databases, APIs, and HTTP requests.
- The community is designed for learners at all levels, not just experts, aiming to simplify complex topics.
Extracting Twitter Data
Initial Data Extraction
- The speaker discusses configuring a code node to extract data from tweets, mentioning that they can pull various objects like tweet IDs and URLs.
- A total of 23 tweets were extracted with details such as content, like count, and view count; one tweet notably gained significant views.
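The extraction step in the Code node might look roughly like this; the response shape (a `tweets` array) and field names such as `likeCount` and `viewCount` are assumptions based on the metrics listed, not confirmed response keys:

```javascript
// Sketch of an n8n Code node that flattens an API response into one item per
// tweet. The response shape and field names are assumed for illustration.
const response = {
  tweets: [
    { id: "1", url: "https://x.com/i/status/1", text: "hello", likeCount: 5, viewCount: 1200 },
    { id: "2", url: "https://x.com/i/status/2", text: "world", likeCount: 2, viewCount: 300 },
  ],
};

// n8n Code nodes return an array of { json: {...} } items.
const items = response.tweets.map((t) => ({
  json: {
    tweetId: t.id,
    url: t.url,
    content: t.text,
    likes: t.likeCount,
    views: t.viewCount,
  },
}));

console.log(items.length);
```

Each flattened item then maps one-to-one onto a row in the Google Sheet.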
Storing Data in Google Sheets
- The next step involves using a Google Sheets node to append rows with the extracted Twitter data.
- The process includes mapping columns in the Google Sheet to corresponding values from the extracted data.
Formatting and Verifying Data
Preparing Data for Submission
- The speaker explains how they formatted dates in a more human-readable way before sending data to Google Sheets.
- After hitting play on the workflow, all 23 tweets are successfully populated into the sheet with clickable links for verification.
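The date-formatting step could be sketched like this; the raw timestamp format is an assumption, and the time zone is pinned to UTC so rows stay consistent:

```javascript
// Sketch: convert a tweet's raw timestamp into a human-readable date before
// appending it to the sheet. The input format is an assumption.
function formatDate(raw) {
  const d = new Date(raw);
  // Fixed locale and UTC time zone keep every row formatted the same way.
  return d.toLocaleDateString("en-US", {
    timeZone: "UTC",
    year: "numeric",
    month: "short",
    day: "numeric",
  });
}

console.log(formatDate("2025-03-05T12:00:00Z"));
```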
Implementing Scheduled Triggers
Expanding Data Collection
- Discussion shifts towards automating data scraping by setting up scheduled triggers to collect more tweets daily (e.g., 100 tweets).
Understanding Pagination
- The speaker references API documentation regarding cursor-based pagination necessary for retrieving additional pages of results.
Setting Up Cursor Logic
Configuring Cursor Parameters
- Explanation of how cursors work in pagination; each request returns a new cursor value used for subsequent requests.
Initializing Count Variable
- A hardcoded count variable is set to one initially; this will be used throughout the scraping process alongside cursor values.
Making API Calls
Structuring API Requests
- Details on constructing an API call similar to previous examples but now including cursor parameters for fetching top tweets based on queries.
Understanding Dynamic Cursor Management in API Calls
Overview of the Process
- The discussion begins with an explanation of how to retrieve top results for the "OpenAI" query, highlighting the use of a cursor and a counter that increments with each run.
- The concept of looping back is introduced, emphasizing the importance of referencing the most recent cursor rather than an absolute node for accurate data retrieval.
Data Extraction and Counting
- The speaker details extracting tweets from three runs, noting that run one returned 18 tweets, run two returned 20, and run three also returned 20, totaling 58 tweets added to Google Sheets.
- A simple conditional check is implemented to end the process when the count reaches three. This relies on referencing the counter node for accuracy.
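The loop described in this section can be sketched as plain code; the page fetcher below is a stub standing in for the HTTP request, and the `next_cursor` field name is an assumption:

```javascript
// Sketch of the cursor + counter loop: each run fetches a page using the most
// recent cursor, collects its tweets, increments the counter, and stops once
// the counter passes the configured limit (three runs in the demo).
function scrape(fetchPage, maxRuns) {
  let count = 1;   // hardcoded starting count, as in the video
  let cursor = ""; // an empty cursor fetches the first page
  const all = [];

  while (count <= maxRuns) {
    const page = fetchPage(cursor); // stub standing in for the API call
    all.push(...page.tweets);
    cursor = page.next_cursor;      // always reference the newest cursor
    count += 1;                     // incremented dynamically on each run
  }
  return all;
}

// Stub pages mirroring the demo's three runs: 18 + 20 + 20 = 58 tweets.
const pageSizes = { "": 18, c1: 20, c2: 20 };
const order = ["", "c1", "c2"];
const fetchPage = (cursor) => ({
  tweets: Array(pageSizes[cursor]).fill(0),
  next_cursor: order[order.indexOf(cursor) + 1],
});

console.log(scrape(fetchPage, 3).length); // 58
```

The key design point is that `cursor` is always overwritten with the value from the latest response rather than read from a fixed earlier node.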
Setting Up Variables Dynamically
- The process involves setting items to one for clarity and dynamically adjusting the counter based on previous counts during each run.
- Each run's counter is incremented by one dynamically while ensuring that it references either JSON count or cursor correctly.
Importance of Dynamic References
- The speaker explains how dynamic settings can be confusing but are crucial for maintaining up-to-date information throughout multiple runs.
- Emphasis is placed on using the `$json` notation for flexibility when referencing the immediately preceding node versus using absolute node references.
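In n8n expression terms, the two referencing styles mentioned here look roughly like this (node names are placeholders):

```
{{ $json.cursor }}                     relative: a field on the item feeding this node
{{ $('Set Count').item.json.count }}   absolute: a field from a specific named node
```

The relative form keeps working as the loop feeds fresh data back in, which is why it is preferred for the cursor and counter.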
Conclusion and Recommendations
- The speaker encourages viewers to download templates and explore them hands-on to better understand dynamic variable management in API calls.
- A reminder about using `$json` for flexible referencing concludes this section, along with an invitation for viewer feedback on future content.