Scraping

All API requests must be authenticated with an API key sent in the x-api-key HTTP header. You can obtain an API key by signing up for an account at rocketscraper.com/signup.

POST /scrape

Extracts structured data from a webpage according to your specified schema.

Authentication

Include your API key in the x-api-key HTTP header:

x-api-key: YOUR_API_KEY
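
As a minimal sketch, the header can be assembled in Python like this. `auth_headers` is a hypothetical helper, not part of the rocketscraper SDK, and the `Content-Type` value is taken from the curl examples on this page.

```python
def auth_headers(api_key: str) -> dict[str, str]:
    """Return the HTTP headers for an authenticated JSON request.

    Hypothetical helper for illustration; not part of any SDK.
    """
    if not api_key or not api_key.strip():
        raise ValueError("an API key is required")
    return {
        "x-api-key": api_key.strip(),
        "Content-Type": "application/json",
    }


headers = auth_headers("YOUR_API_KEY")
print(headers["x-api-key"])
```

Failing early on an empty key keeps a misconfigured environment from producing an unauthenticated request.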

Request Parameters

Parameter	Type	Required	Description
`url`	string	Yes	The URL of the webpage to scrape
`schema`	object	Yes	The structure definition for the data to be extracted
`task_description`	string	No	Additional instructions for the AI system (recommended for complex tasks such as summarization, sentiment analysis, or translation)
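
The parameters above can be sketched as a request built with Python's standard library only. `build_scrape_request` is a hypothetical helper, and the endpoint URL is the one used in this page's curl examples; the optional `task_description` is left out of the body when it is not supplied, which is an assumption about how the API treats absent fields.

```python
import json
import urllib.request

# Endpoint as it appears in this page's curl examples.
SCRAPE_ENDPOINT = "https://api.rocketscraper.com/scrape"


def build_scrape_request(api_key, url, schema, task_description=None):
    """Build (but do not send) a POST /scrape request. Hypothetical helper."""
    if not url:
        raise ValueError("'url' is required")
    if not isinstance(schema, dict) or not schema:
        raise ValueError("'schema' is required and must be a non-empty object")

    payload = {"url": url, "schema": schema}
    if task_description is not None:
        # Optional parameter: only included when the caller provides it.
        payload["task_description"] = task_description

    return urllib.request.Request(
        SCRAPE_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


request = build_scrape_request(
    "YOUR_API_KEY",
    "https://example.com/product",
    {"title": "string", "price": "number"},
)
print(request.get_method(), request.full_url)
print(json.loads(request.data))
```

Passing the returned object to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be run without network access.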

Schema Types

The schema defines the structure of the data you want to extract. Supported data types include:

Type	Description
`boolean`	Represents a true or false value
`integer`	Represents an integer value
`number`	Represents any numeric value, including integers and floating-point numbers
`string`	Represents a sequence of characters
`array`	Represents an ordered list of items
`object`	Represents a JSON object, which is a collection of key-value pairs
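
A schema can be checked locally before it is sent. The sketch below is an illustration, not the API's own validation: `schema_errors` is a hypothetical helper, and it assumes (from the examples on this page) that nested objects are written as dictionaries and lists as a one-element list holding the item template.

```python
SUPPORTED_TYPES = {"boolean", "integer", "number", "string", "array", "object"}


def schema_errors(schema, path="schema"):
    """Return a list of human-readable problems found in a schema."""
    errors = []
    if isinstance(schema, str):
        # Leaf: must be one of the six supported type names.
        if schema not in SUPPORTED_TYPES:
            errors.append(f"{path}: unsupported type {schema!r}")
    elif isinstance(schema, dict):
        for name, value in schema.items():
            errors.extend(schema_errors(value, f"{path}.{name}"))
    elif isinstance(schema, list):
        # Assumption: a list holds exactly one item template.
        if len(schema) != 1:
            errors.append(f"{path}: a list should hold exactly one item template")
        for item in schema:
            errors.extend(schema_errors(item, f"{path}[]"))
    else:
        errors.append(f"{path}: expected a type name, object, or list")
    return errors


print(schema_errors({"title": "string", "price": "float"}))
```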

Schema Best Practices

When defining your schema, use descriptive field names that clearly communicate your extraction requirements to the AI. The more specific a field name is, the less the AI has to guess about units, formats, and scope.

Examples of Good vs Basic Field Names

Basic Field	Better Field Name	Description
`price`	`currentSalePriceUSD`	Specifies currency and price type
`date`	`publicationDateISO`	Indicates expected date format
`description`	`productShortDescription`	Clarifies the type and length of description
`rating`	`averageUserRatingOutOf5`	Specifies the rating scale
`features`	`technicalSpecifications`	More precise about the expected content
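
One way to apply the table is a small lint that flags the generic names before a schema is sent. This is only a sketch: `generic_field_names` is a hypothetical helper, and the set of names it flags is just the "Basic Field" column above, not a list defined by the API.

```python
# Generic names from the "Basic Field" column of the table above.
GENERIC_NAMES = {"price", "date", "description", "rating", "features"}


def generic_field_names(schema, path=""):
    """Return the dotted paths of schema fields that use a generic name."""
    found = []
    if isinstance(schema, dict):
        for name, value in schema.items():
            full = f"{path}.{name}" if path else name
            if name.lower() in GENERIC_NAMES:
                found.append(full)
            found.extend(generic_field_names(value, full))
    elif isinstance(schema, list):
        # Descend into list item templates.
        for item in schema:
            found.extend(generic_field_names(item, f"{path}[]"))
    return found


schema = {
    "price": "number",
    "currentSalePriceUSD": "number",
    "specs": [{"description": "string"}],
}
print(generic_field_names(schema))
```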

Example with Descriptive Fields

{
  "productName": "string",
  "manufacturerBrandName": "string",
  "currentSalePriceUSD": "number",
  "originalRetailPriceUSD": "number",
  "productShortDescription": "string",
  "technicalSpecifications": [{
    "specificationName": "string",
    "specificationValue": "string"
  }],
  "averageUserRatingOutOf5": "number",
  "totalUserReviews": "integer",
  "inStockStatus": "boolean",
  "estimatedShippingDaysRange": {
    "minimum": "integer",
    "maximum": "integer"
  }
}

Basic Example

Here's a basic example of scraping product information. The AI model extracts the requested data points by analyzing the webpage's content, layout, and context, so no CSS selectors or XPath expressions are required:

Python

from rocketscraper import RocketClient

try:
    client = RocketClient('YOUR_API_KEY')

    # Map each field to extract to its expected type
    schema = {
        "title": "string",
        "price": "number",
        "inStock": "boolean"
    }

    # Scrape the page and print the extracted data
    result = client.scrape('https://example.com/product', schema)
    print(result)
except Exception as e:
    print(f"Error: {e}")

Node.js

import { createRocketClient } from 'rocketscraper';

try {
  const client = createRocketClient({ apiKey: 'YOUR_API_KEY' });

  // Map each field to extract to its expected type
  const schema = {
    title: 'string',
    price: 'number',
    inStock: 'boolean'
  };

  // Scrape the page and print the extracted data
  const result = await client.scrape({
    url: 'https://example.com/product',
    schema
  });
  console.log(result);
} catch (error) {
  console.error('Error:', error.message);
}

Curl

curl -X POST \
  https://api.rocketscraper.com/scrape \
  -H 'x-api-key: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://example.com/product",
    "schema": {
      "title": "string",
      "price": "number",
      "inStock": "boolean"
    }
  }'

Example Response

{
  "title": "Wireless Bluetooth Headphones",
  "price": 79.99,
  "inStock": true
}

Advanced Example with Task Description

The task_description parameter lets you give the AI system detailed instructions for complex extraction and analysis tasks. Simple data extraction often works well with a schema alone; a task description helps when the requirements are nuanced or the desired output requires multiple processing steps.

Task descriptions are particularly effective for:

Text Summarization and Analysis: When extracting article content, you can guide the AI to focus on specific aspects like key findings, methodology, and implications. For example, you might instruct the system to "Create a three-paragraph summary where the first paragraph covers the main announcement, the second details the methodology, and the third discusses potential industry impact."

Sentiment Analysis with Custom Parameters: Rather than getting a simple positive/negative classification, you can specify exactly how sentiment should be evaluated. For instance: "Analyze sentiment by considering technical specifications, user reviews, and price-to-feature ratio, with extra weight given to professional reviewer opinions."

Language Translation with Context: When dealing with multilingual content, you can provide context-specific translation instructions like "Translate product descriptions while maintaining technical terminology in English" or "Adapt idiomatic expressions to target culture while preserving the original meaning."

Complex Data Relationships: For websites where related information isn't directly connected, you can guide the AI to make logical connections. For example: "Cross-reference product specifications with compatibility information listed in different sections of the page, and create a consolidated compatibility matrix."

Custom Formatting and Validation Rules: You can specify exact formatting requirements: "Extract prices across different currencies, normalize them to USD using current exchange rates, and format them with exactly two decimal places."
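
Multi-step instructions like these are easier to maintain when they are assembled from a list rather than written as one long string. The sketch below shows one possible approach; `build_task_description` is a hypothetical helper, and the numbered layout is a convention chosen for readability, not a format the API is known to require.

```python
def build_task_description(intro, steps):
    """Format an intro line and (instruction, details) steps as numbered text."""
    lines = [intro.strip(), ""]
    for number, (instruction, details) in enumerate(steps, start=1):
        lines.append(f"{number}. {instruction.strip()}")
        # Each detail becomes an indented bullet under its step.
        lines.extend(f"   - {detail.strip()}" for detail in details)
    return "\n".join(lines)


task_description = build_task_description(
    "Extract and analyze the article content following these steps:",
    [
        (
            "Create a concise 3-sentence summary that covers:",
            ["Main announcement or finding", "Potential impact or implications"],
        ),
        ("Analyze the overall sentiment and return 'positive', 'negative', or 'neutral'.", []),
    ],
)
print(task_description)
```

The resulting string can be passed as the task_description value in the request body.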

Here's an example showing how to use task descriptions for complex scraping tasks like news summarization:

Python

from rocketscraper import RocketClient

try:
    client = RocketClient('YOUR_API_KEY')

    # key_points is a list of objects, each with one description field
    schema = {
        "title": "string",
        "content": "string",
        "summary": "string",
        "sentiment": "string",
        "key_points": [
            {
                "description": "string"
            }
        ]
    }

    # Step-by-step instructions that guide the summary, sentiment, and key points
    task_description = """
    Extract and analyze the article content following these steps:

    1. Create a concise 3-sentence summary that covers:
       - Main announcement or finding
       - Key technical details or methodology
       - Potential impact or implications

    2. Analyze the overall sentiment considering:
       - Language tone and word choice
       - Reported outcomes and implications
       - Expert opinions and quotes
       Return either 'positive', 'negative', or 'neutral'

    3. Extract 3-5 key points that:
       - Highlight major findings
       - Include relevant statistics or data
       - Capture significant implications
    """

    result = client.scrape(
        'https://example.com/news-article',
        schema,
        task_description=task_description
    )
    print(result)
except Exception as e:
    print(f"Error: {e}")

Node.js

import { createRocketClient } from 'rocketscraper';

try {
  const client = createRocketClient({ apiKey: 'YOUR_API_KEY' });

  // key_points is a list of objects, each with one description field
  const schema = {
    title: 'string',
    content: 'string',
    summary: 'string',
    sentiment: 'string',
    key_points: [
      {
        description: 'string'
      }
    ]
  };

  // Step-by-step instructions that guide the summary, sentiment, and key points
  const taskDescription = `
    Extract and analyze the article content following these steps:

    1. Create a concise 3-sentence summary that covers:
       - Main announcement or finding
       - Key technical details or methodology
       - Potential impact or implications

    2. Analyze the overall sentiment considering:
       - Language tone and word choice
       - Reported outcomes and implications
       - Expert opinions and quotes
       Return either 'positive', 'negative', or 'neutral'

    3. Extract 3-5 key points that:
       - Highlight major findings
       - Include relevant statistics or data
       - Capture significant implications
  `;

  const result = await client.scrape({
    url: 'https://example.com/news-article',
    schema,
    task_description: taskDescription
  });
  console.log(result);
} catch (error) {
  console.error('Error:', error.message);
}

Curl

curl -X POST \
  https://api.rocketscraper.com/scrape \
  -H 'x-api-key: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://example.com/news-article",
    "schema": {
      "title": "string",
      "content": "string",
      "summary": "string",
      "sentiment": "string",
      "key_points": [
        {
          "description": "string"
        }
      ]
    },
    "task_description": "Extract the main article content. Then: 1) Create a concise 3-sentence summary focusing on the key findings and implications, 2) Analyze the overall sentiment (positive/negative/neutral) based on language and outcomes, 3) Extract 3-5 key points as bullet points"
  }'

Example Response

{
  "title": "Breaking News: Tech Innovation",
  "content": "Silicon Valley startup TechCorp unveiled a groundbreaking quantum computing breakthrough today. The new technology promises to solve complex calculations in seconds that would take traditional computers years to process. Early testing shows the system operating at unprecedented efficiency levels, with potential applications ranging from drug discovery to climate modeling.",
  "summary": "A groundbreaking quantum computing breakthrough was announced by TechCorp. The new system can perform complex calculations exponentially faster than traditional computers. Early testing demonstrates exceptional efficiency with wide-ranging potential applications.",
  "sentiment": "positive",
  "key_points": [
    {
      "description": "TechCorp announced a groundbreaking quantum computing breakthrough"
    },
    {
      "description": "The new technology promises to solve complex calculations exponentially faster than traditional computers"
    },
    {
      "description": "Early testing demonstrates exceptional efficiency with wide-ranging potential applications"
    }
  ]
}
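
A response shaped like the one above can be post-processed defensively, since the sentiment label comes from the model rather than from a fixed enum. The following is a sketch under that assumption; `summarize_article_result` is a hypothetical helper, and the three allowed labels are the ones requested in the task description.

```python
ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}


def summarize_article_result(result):
    """Return the sentiment label and a flat list of key point descriptions."""
    sentiment = result.get("sentiment")
    if sentiment not in ALLOWED_SENTIMENTS:
        sentiment = None  # treat unexpected or missing labels as unknown

    key_points = [
        point["description"]
        for point in (result.get("key_points") or [])
        if isinstance(point, dict) and point.get("description")
    ]
    return {"sentiment": sentiment, "key_points": key_points}


response = {
    "title": "Breaking News: Tech Innovation",
    "sentiment": "positive",
    "key_points": [
        {"description": "TechCorp announced a groundbreaking quantum computing breakthrough"},
    ],
}
print(summarize_article_result(response))
```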

Response Format

The API response always matches the structure defined in your schema, returning extracted data in the format and types you specified. Any field that cannot be extracted is returned as null.
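
Because unextractable fields come back as null, client code usually needs to distinguish "missing" from "wrong type". The sketch below shows one way to do that against a decoded response; `check_response` is a hypothetical helper, and treating an absent key the same as null is an assumption made for the illustration.

```python
# Python types accepted for each schema type name.
_PYTHON_TYPES = {
    "boolean": bool,
    "integer": int,
    "number": (int, float),
    "string": str,
    "array": list,
    "object": dict,
}


def check_response(schema, data, path="result"):
    """Return (null_paths, mismatched_paths) for a decoded response."""
    nulls, mismatches = [], []

    def visit(spec, value, where):
        if value is None:
            nulls.append(where)  # null, or a key that is absent entirely
        elif isinstance(spec, dict):
            if not isinstance(value, dict):
                mismatches.append(where)
                return
            for name, child in spec.items():
                visit(child, value.get(name), f"{where}.{name}")
        elif isinstance(spec, list):
            if not isinstance(value, list):
                mismatches.append(where)
                return
            if spec:  # compare each item against the single item template
                for index, item in enumerate(value):
                    visit(spec[0], item, f"{where}[{index}]")
        else:
            expected = _PYTHON_TYPES.get(spec)
            # bool is a subclass of int in Python, so reject it for numeric types.
            is_bool = isinstance(value, bool)
            if expected is None or not isinstance(value, expected) or (is_bool and spec != "boolean"):
                mismatches.append(where)

    visit(schema, data, path)
    return nulls, mismatches


schema = {"title": "string", "price": "number", "inStock": "boolean"}
data = {"title": "Wireless Bluetooth Headphones", "price": 79.99, "inStock": None}
print(check_response(schema, data))
```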
