Scraping
All API requests must be authenticated with an API key sent in the x-api-key HTTP header. You can obtain an API key by signing up for an account at rocketscraper.com/signup.
POST /scrape
Extracts structured data from a webpage according to your specified schema.
Authentication
Include your API key in the x-api-key HTTP header:
x-api-key: YOUR_API_KEY
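As a rough illustration, the header can be attached to a raw HTTP request using only the Python standard library. This is a sketch rather than the official client: the endpoint URL is the one used in the curl example in this document, and `build_scrape_request` is a hypothetical helper name, not part of the rocketscraper package.

```python
import json
import urllib.request

# Endpoint as it appears in the curl example in this document.
SCRAPE_URL = "https://api.rocketscraper.com/scrape"


def build_scrape_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST /scrape request.

    Hypothetical helper for illustration only.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        SCRAPE_URL,
        data=body,
        method="POST",
        headers={
            "x-api-key": api_key,
            "Content-Type": "application/json",
        },
    )


request = build_scrape_request(
    "YOUR_API_KEY",
    {"url": "https://example.com/product", "schema": {"title": "string"}},
)
# urllib stores header names via str.capitalize(), so the lookup key is "X-api-key".
# HTTP header names are case-insensitive, so the server sees the same header.
print(request.get_header("X-api-key"))
```

Passing the resulting object to `urllib.request.urlopen(request)` would send it; the sketch stops short of that so it can be run without a real key.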
Request Parameters
Parameter | Type | Required | Description |
---|---|---|---|
url | string | Yes | The URL of the webpage to scrape |
schema | object | Yes | The structure definition for the data to be extracted |
task_description | string | No | Additional instructions for the AI system (recommended for complex tasks such as summarization, sentiment analysis, translation, etc.) |
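The table above maps naturally onto a small payload builder. The sketch below is an assumption about how a caller might assemble the JSON body on the client side; `build_scrape_payload` is a hypothetical name, and the checks it performs are local conveniences, not documented server behavior.

```python
from typing import Any, Dict, Optional


def build_scrape_payload(
    url: str,
    schema: Dict[str, Any],
    task_description: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble the JSON body for POST /scrape (hypothetical helper)."""
    # url and schema are marked "Required" in the parameter table.
    if not isinstance(url, str) or not url:
        raise ValueError("url is required and must be a non-empty string")
    if not isinstance(schema, dict):
        raise ValueError("schema is required and must be an object")

    payload: Dict[str, Any] = {"url": url, "schema": schema}
    # task_description is optional, so it is only sent when provided.
    if task_description is not None:
        payload["task_description"] = task_description
    return payload


payload = build_scrape_payload(
    "https://example.com/product",
    {"title": "string", "price": "number"},
)
print(sorted(payload))
```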
Schema Types
The schema defines the structure of the data you want to extract. Supported data types include:
Type | Description |
---|---|
boolean | Represents a true or false value |
integer | Represents an integer value |
number | Represents any numeric value, including integers and floating-point numbers |
string | Represents a sequence of characters |
array | Represents an ordered list of items |
object | Represents a JSON object, which is a collection of key-value pairs |
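Since the set of type names is small, a schema can be checked locally before it is sent. The following is a minimal sketch under two assumptions: nested objects are written as dicts and arrays as lists of item shapes, as in the examples in this document, and any leaf must be one of the six names listed above. `find_unsupported_types` is a hypothetical helper, not part of the API.

```python
from typing import Any, List, Tuple

# The six type names listed in the table above.
SUPPORTED_TYPES = {"boolean", "integer", "number", "string", "array", "object"}


def find_unsupported_types(schema: Any, path: str = "") -> List[Tuple[str, Any]]:
    """Return (path, value) pairs for schema leaves that are not supported type names."""
    problems: List[Tuple[str, Any]] = []
    if isinstance(schema, dict):
        # Nested object: check every field.
        for key, value in schema.items():
            child = f"{path}.{key}" if path else key
            problems.extend(find_unsupported_types(value, child))
    elif isinstance(schema, list):
        # Array: check each item shape given in the list.
        for index, item in enumerate(schema):
            problems.extend(find_unsupported_types(item, f"{path}[{index}]"))
    elif not isinstance(schema, str) or schema not in SUPPORTED_TYPES:
        problems.append((path, schema))
    return problems


example_schema = {
    "title": "string",
    "price": "float",  # not a supported name; "number" is
    "specs": [{"name": "string", "count": "int"}],  # "int" should be "integer"
}
print(find_unsupported_types(example_schema))
```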
Schema Best Practices
When defining your schema, use descriptive field names that clearly communicate your extraction requirements to the AI. The more specific and descriptive your field names are, the better the AI can understand and fulfill your requirements.
Examples of Good vs Basic Field Names
Basic Field | Better Field Name | Description |
---|---|---|
price | currentSalePriceUSD | Specifies currency and price type |
date | publicationDateISO | Indicates expected date format |
description | productShortDescription | Clarifies the type and length of description |
rating | averageUserRatingOutOf5 | Specifies the rating scale |
features | technicalSpecifications | More precise about the expected content |
Example with Descriptive Fields
{
  "productName": "string",
  "manufacturerBrandName": "string",
  "currentSalePriceUSD": "number",
  "originalRetailPriceUSD": "number",
  "productShortDescription": "string",
  "technicalSpecifications": [{
    "specificationName": "string",
    "specificationValue": "string"
  }],
  "averageUserRatingOutOf5": "number",
  "totalUserReviews": "integer",
  "inStockStatus": "boolean",
  "estimatedShippingDaysRange": {
    "minimum": "integer",
    "maximum": "integer"
  }
}
Basic Example
Here's a basic example of scraping product information. The AI model extracts the requested data points by analyzing the page's content, layout, and context, so no CSS selectors or XPath expressions are required:
Python:
from rocketscraper import RocketClient

try:
    client = RocketClient('YOUR_API_KEY')

    # Describe the fields to extract and their types
    schema = {
        "title": "string",
        "price": "number",
        "inStock": "boolean"
    }

    result = client.scrape('https://example.com/product', schema)
    print(result)
except Exception as e:
    print(f"Error: {e}")
Node.js:
import { createRocketClient } from 'rocketscraper';

try {
  const client = createRocketClient({ apiKey: 'YOUR_API_KEY' });

  // Describe the fields to extract and their types
  const schema = {
    title: 'string',
    price: 'number',
    inStock: 'boolean'
  };

  const result = await client.scrape({
    url: 'https://example.com/product',
    schema
  });
  console.log(result);
} catch (error) {
  console.error('Error:', error.message);
}
Curl:
curl -X POST \
  https://api.rocketscraper.com/scrape \
  -H 'x-api-key: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://example.com/product",
    "schema": {
      "title": "string",
      "price": "number",
      "inStock": "boolean"
    }
  }'
Example Response
{
  "title": "Wireless Bluetooth Headphones",
  "price": 79.99,
  "inStock": true
}
Advanced Example with Task Description
The task_description parameter allows you to provide detailed instructions to guide the AI system in complex extraction and analysis tasks. While simple data extraction might work well with just a schema, adding a task description becomes invaluable when dealing with nuanced requirements or when the desired output requires multiple processing steps.
Task descriptions are particularly effective for:
- Text Summarization and Analysis: When extracting article content, you can guide the AI to focus on specific aspects like key findings, methodology, and implications. For example, you might instruct the system to "Create a three-paragraph summary where the first paragraph covers the main announcement, the second details the methodology, and the third discusses potential industry impact."
- Sentiment Analysis with Custom Parameters: Rather than getting a simple positive/negative classification, you can specify exactly how sentiment should be evaluated. For instance: "Analyze sentiment by considering technical specifications, user reviews, and price-to-feature ratio, with extra weight given to professional reviewer opinions."
- Language Translation with Context: When dealing with multilingual content, you can provide context-specific translation instructions like "Translate product descriptions while maintaining technical terminology in English" or "Adapt idiomatic expressions to target culture while preserving the original meaning."
- Complex Data Relationships: For websites where related information isn't directly connected, you can guide the AI to make logical connections. For example: "Cross-reference product specifications with compatibility information listed in different sections of the page, and create a consolidated compatibility matrix."
- Custom Formatting and Validation Rules: You can specify exact formatting requirements: "Extract prices across different currencies, normalize them to USD using current exchange rates, and format them with exactly two decimal places."
Here's an example showing how to use task descriptions for complex scraping tasks like news summarization:
Python:
from rocketscraper import RocketClient

try:
    client = RocketClient('YOUR_API_KEY')

    # Fields to extract; key_points is an array of objects
    schema = {
        "title": "string",
        "content": "string",
        "summary": "string",
        "sentiment": "string",
        "key_points": [
            {
                "description": "string"
            }
        ]
    }

    # Step-by-step instructions for the analysis fields
    task_description = """
    Extract and analyze the article content following these steps:
    1. Create a concise 3-sentence summary that covers:
       - Main announcement or finding
       - Key technical details or methodology
       - Potential impact or implications
    2. Analyze the overall sentiment considering:
       - Language tone and word choice
       - Reported outcomes and implications
       - Expert opinions and quotes
       Return either 'positive', 'negative', or 'neutral'
    3. Extract 3-5 key points that:
       - Highlight major findings
       - Include relevant statistics or data
       - Capture significant implications
    """

    result = client.scrape(
        'https://example.com/news-article',
        schema,
        task_description=task_description
    )
    print(result)
except Exception as e:
    print(f"Error: {e}")
Node.js:
import { createRocketClient } from 'rocketscraper';

try {
  const client = createRocketClient({ apiKey: 'YOUR_API_KEY' });

  // Fields to extract; key_points is an array of objects
  const schema = {
    title: 'string',
    content: 'string',
    summary: 'string',
    sentiment: 'string',
    key_points: [
      {
        description: 'string'
      }
    ]
  };

  // Step-by-step instructions for the analysis fields
  const taskDescription = `
    Extract and analyze the article content following these steps:
    1. Create a concise 3-sentence summary that covers:
       - Main announcement or finding
       - Key technical details or methodology
       - Potential impact or implications
    2. Analyze the overall sentiment considering:
       - Language tone and word choice
       - Reported outcomes and implications
       - Expert opinions and quotes
       Return either 'positive', 'negative', or 'neutral'
    3. Extract 3-5 key points that:
       - Highlight major findings
       - Include relevant statistics or data
       - Capture significant implications
  `;

  const result = await client.scrape({
    url: 'https://example.com/news-article',
    schema,
    task_description: taskDescription
  });
  console.log(result);
} catch (error) {
  console.error('Error:', error.message);
}
Curl:
curl -X POST \
  https://api.rocketscraper.com/scrape \
  -H 'x-api-key: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://example.com/news-article",
    "schema": {
      "title": "string",
      "content": "string",
      "summary": "string",
      "sentiment": "string",
      "key_points": [
        {
          "description": "string"
        }
      ]
    },
    "task_description": "Extract the main article content. Then: 1) Create a concise 3-sentence summary focusing on the key findings and implications, 2) Analyze the overall sentiment (positive/negative/neutral) based on language and outcomes, 3) Extract 3-5 key points as bullet points"
  }'
Example Response
{
  "title": "Breaking News: Tech Innovation",
  "content": "Silicon Valley startup TechCorp unveiled a groundbreaking quantum computing breakthrough today. The new technology promises to solve complex calculations in seconds that would take traditional computers years to process. Early testing shows the system operating at unprecedented efficiency levels, with potential applications ranging from drug discovery to climate modeling.",
  "summary": "A groundbreaking quantum computing breakthrough was announced by TechCorp. The new system can perform complex calculations exponentially faster than traditional computers. Early testing demonstrates exceptional efficiency with wide-ranging potential applications.",
  "sentiment": "positive",
  "key_points": [
    {
      "description": "TechCorp announced a groundbreaking quantum computing breakthrough"
    },
    {
      "description": "The new technology promises to solve complex calculations exponentially faster than traditional computers"
    },
    {
      "description": "Early testing demonstrates exceptional efficiency with wide-ranging potential applications"
    }
  ]
}
Response Format
The API response always matches the structure defined in your schema, returning extracted data in the exact format and types you specified. Any field that cannot be extracted is returned as null.
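Because unextractable fields come back as null, callers will usually want to check a response before using it. The sketch below walks a schema and a decoded response side by side and reports nulls and type mismatches. `check_response` is a hypothetical helper, and two of its choices are assumptions rather than documented behavior: the mapping from schema type names to Python types, and treating a missing key the same as null.

```python
from typing import Any, List

# Assumed mapping from schema type names to Python types after JSON decoding.
_PYTHON_TYPES = {
    "boolean": bool,
    "integer": int,
    "number": (int, float),
    "string": str,
    "array": list,
    "object": dict,
}


def check_response(schema: Any, data: Any, path: str = "$") -> List[str]:
    """Return notes on null fields and type mismatches in a decoded response."""
    if data is None:
        return [f"{path}: null (could not be extracted)"]
    if isinstance(schema, dict):
        if not isinstance(data, dict):
            return [f"{path}: expected object"]
        notes: List[str] = []
        for key, sub_schema in schema.items():
            # A missing key is treated like null here (an assumption).
            notes += check_response(sub_schema, data.get(key), f"{path}.{key}")
        return notes
    if isinstance(schema, list):
        if not isinstance(data, list):
            return [f"{path}: expected array"]
        if not schema:
            return []  # no item shape given, nothing more to check
        notes = []
        for index, item in enumerate(data):
            notes += check_response(schema[0], item, f"{path}[{index}]")
        return notes
    expected = _PYTHON_TYPES.get(schema)
    if expected is None:
        return [f"{path}: unknown schema type {schema!r}"]
    # bool is a subclass of int in Python, so reject it for numeric fields.
    bool_as_number = isinstance(data, bool) and schema in ("integer", "number")
    if bool_as_number or not isinstance(data, expected):
        return [f"{path}: expected {schema}"]
    return []


schema = {"title": "string", "price": "number", "inStock": "boolean"}
partial = {"title": "Wireless Bluetooth Headphones", "price": None, "inStock": "yes"}
for note in check_response(schema, partial):
    print(note)
```

An empty list means every field was present and of the expected type; the basic example response shown earlier in this document would produce no notes.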