LLMWorkbook

unpack_json_responses Utility Documentation

Overview

The unpack_json_responses utility function in LLMWorkbook is designed to extract structured data from JSON responses generated by an LLM. When the LLM returns responses in JSON format, this function helps unpack nested structures into individual DataFrame columns.

It supports configurable error handling, flattening of nested JSON objects, and selective extraction of specific fields.

Key Features

✔ Extracts JSON responses into separate columns within a DataFrame.
✔ Handles JSON parsing errors with configurable strategies (warn, error, ignore).
✔ Flattens nested structures (configurable).
✔ Supports selective extraction of specific JSON fields.
✔ Option to add column prefixes for better clarity.
✔ Works with DataFrames, Lists, and NumPy arrays.
✔ Can retain or drop the original JSON column after extraction.

Function Definition

from typing import List, Optional

import pandas as pd


def unpack_json_responses(
    df: pd.DataFrame,
    response_column: str = "llm_response",
    error_handling: str = "warn",
    flatten_nested: bool = True,
    prefix: Optional[str] = None,
    columns_to_extract: Optional[List[str]] = None,
    drop_original: bool = True
) -> pd.DataFrame:
    """
    Unpacks JSON responses from a specified DataFrame column into separate columns.
    
    Args:
        df (pd.DataFrame): DataFrame containing the LLM responses in JSON format.
        response_column (str): Column name where JSON responses are stored.
        error_handling (str): Strategy for handling JSON parsing errors:
                             'warn' (default): Replace with None and log warning
                             'error': Raise exception
                             'ignore': Replace with None silently
        flatten_nested (bool): Whether to flatten nested JSON structures (default: True)
        prefix (str, optional): Prefix to add to all extracted columns
        columns_to_extract (List[str], optional): Specific fields to extract from JSON, 
                                                 extracts all if None
        drop_original (bool): Whether to drop the original JSON column (default: True)

    Returns:
        pd.DataFrame: DataFrame with unpacked JSON response columns.
    """

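The three error_handling strategies can be pictured with a small standalone sketch. This is not LLMWorkbook's implementation: parse_response is a hypothetical helper that only mirrors the behaviour described in the docstring, using the standard json and warnings modules. Raising ValueError in "error" mode is an assumption; the library may raise a different exception type.

```python
import json
import warnings
from typing import Any, Optional


def parse_response(raw: Any, error_handling: str = "warn") -> Optional[Any]:
    """Hypothetical helper: parse one JSON string using the documented strategies."""
    if error_handling not in {"warn", "error", "ignore"}:
        raise ValueError(f"Unknown error_handling: {error_handling!r}")
    try:
        return json.loads(raw)
    except (TypeError, ValueError) as exc:  # JSONDecodeError subclasses ValueError
        if error_handling == "error":
            # Assumption: the exception type used by the library is not documented.
            raise ValueError(f"Could not parse response: {raw!r}") from exc
        if error_handling == "warn":
            warnings.warn(f"Invalid JSON replaced with None: {raw!r}")
        return None  # both "warn" and "ignore" fall back to None


print(parse_response('{"valid_json": "Yes"}'))
print(parse_response("INVALID JSON STRING", error_handling="ignore"))
```

The only difference between "warn" and "ignore" in this sketch is whether a warning is emitted; both leave None in place of the unparseable value.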
How It Works

1. Parses JSON strings in the response column
   - Attempts to convert JSON-formatted strings into Python dictionaries.
   - Handles missing or non-JSON values gracefully.
2. Handles errors based on the user’s preference
   - "warn" (default) → Logs a warning and replaces invalid entries with None.
   - "error" → Raises an exception on any JSON parsing failure.
   - "ignore" → Silently replaces invalid JSON values with None.
3. Extracts and normalizes data
   - If columns_to_extract is specified, extracts only the given fields.
   - Otherwise, extracts all available fields.
   - If flatten_nested=True, nested JSON fields are flattened into dotted column names (for example, user.info.age).
4. Merges extracted data into the DataFrame
   - Adds the new columns to the original DataFrame.
   - Drops the original JSON column when drop_original=True (the default).
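The steps above can be approximated in plain pandas. The following is a minimal sketch under stated assumptions, not the library's source: unpack_sketch is a hypothetical name, invalid rows are simply treated as empty records (no warn/error modes), and the "_" separator after the prefix is a guess. pd.json_normalize is a real pandas function, but this page does not say whether LLMWorkbook uses it internally.

```python
import json

import pandas as pd


def unpack_sketch(df, response_column="llm_response", prefix=None, drop_original=True):
    """Illustrative only: parse, flatten, and join, roughly as the steps describe."""

    def safe_load(raw):
        # Invalid or missing values become an empty record (all-NaN row).
        try:
            parsed = json.loads(raw)
        except (TypeError, ValueError):
            return {}
        return parsed if isinstance(parsed, dict) else {}

    records = [safe_load(raw) for raw in df[response_column]]
    # json_normalize flattens nested dicts into dotted column names.
    extracted = pd.json_normalize(records)
    extracted.index = df.index  # align row-for-row before joining
    if prefix:
        # Assumption: the separator between prefix and field name is "_".
        extracted = extracted.add_prefix(f"{prefix}_")
    result = df.join(extracted)
    return result.drop(columns=[response_column]) if drop_original else result


sample = pd.DataFrame({
    "id": [1, 2],
    "llm_response": ['{"user": {"name": "Alice"}}', "INVALID JSON STRING"],
})
print(unpack_sketch(sample))
```

Joining on the original index keeps every input row, so a row whose response could not be parsed still appears in the result with empty values in the extracted columns.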

Usage Examples

Example 1: Basic JSON Unpacking

import pandas as pd
from llmworkbook import unpack_json_responses

# Sample DataFrame with LLM responses in JSON format
df = pd.DataFrame({
    "id": [1, 2, 3],
    "llm_response": [
        '{"name": "Alice", "age": 30, "city": "New York"}',
        '{"name": "Bob", "age": 25, "city": "San Francisco"}',
        '{"name": "Charlie", "age": 35, "city": "Los Angeles"}'
    ]
})

# Unpack JSON responses
clean_df = unpack_json_responses(df)
print(clean_df)

Output:

   id     name  age           city
0   1    Alice   30       New York
1   2      Bob   25  San Francisco
2   3  Charlie   35    Los Angeles

✔ JSON fields (name, age, city) are extracted into separate columns.
✔ The original llm_response column is dropped by default.

Example 2: Retaining the Original JSON Column

# Reuses the df from Example 1
clean_df = unpack_json_responses(df, drop_original=False)
print(clean_df)

✔ Keeps the llm_response column while extracting values.

Example 3: Extracting Only Specific Fields

# Reuses the df from Example 1
clean_df = unpack_json_responses(df, columns_to_extract=["name", "city"])
print(clean_df)

Output:

   id     name           city
0   1    Alice       New York
1   2      Bob  San Francisco
2   3  Charlie    Los Angeles

✔ Only name and city fields are extracted.
✔ The age field is ignored.
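Conceptually, selective extraction amounts to filtering each parsed record before it becomes columns. The sketch below uses a hypothetical select_fields helper on a single record; filling a missing field with None is an assumption, since this page does not say how the library treats requested fields that are absent from a response.

```python
import json
from typing import Any, Dict, List, Optional


def select_fields(
    record: Dict[str, Any],
    columns_to_extract: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: keep only the requested top-level fields."""
    if columns_to_extract is None:
        return dict(record)  # no filter: keep every field
    # Assumption: a requested field missing from the record becomes None.
    return {key: record.get(key) for key in columns_to_extract}


record = json.loads('{"name": "Alice", "age": 30, "city": "New York"}')
print(select_fields(record, ["name", "city"]))
# {'name': 'Alice', 'city': 'New York'}
```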

Example 4: Handling Nested JSON

df = pd.DataFrame({
    "llm_response": [
        '{"user": {"name": "Alice", "info": {"age": 30, "city": "New York"}}}',
        '{"user": {"name": "Bob", "info": {"age": 25, "city": "San Francisco"}}}'
    ]
})

clean_df = unpack_json_responses(df, flatten_nested=True)
print(clean_df)

Output:

   user.name  user.info.age  user.info.city
0      Alice             30        New York
1        Bob             25   San Francisco

✔ Nested fields are flattened into column names (user.name, user.info.age, etc.).
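To see where names such as user.info.age come from, here is a minimal pure-Python sketch of dotted-key flattening. The flatten function is a hypothetical illustration, not the library's code; it assumes "." as the separator (matching the output above) and leaves lists and scalar values untouched.

```python
from typing import Any, Dict


def flatten(record: Dict[str, Any], parent: str = "", sep: str = ".") -> Dict[str, Any]:
    """Hypothetical helper: collapse nested dicts into one level of dotted keys."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        path = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, path, sep))  # recurse into nested objects
        else:
            flat[path] = value  # lists and scalars are kept as-is
    return flat


nested = {"user": {"name": "Alice", "info": {"age": 30, "city": "New York"}}}
print(flatten(nested))
# {'user.name': 'Alice', 'user.info.age': 30, 'user.info.city': 'New York'}
```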

Example 5: Handling JSON Parsing Errors

df = pd.DataFrame({
    "llm_response": [
        '{"valid_json": "Yes"}',
        "INVALID JSON STRING",
        '{"valid_json": "Yes"}'
    ]
})

# Raise error on invalid JSON
try:
    unpack_json_responses(df, error_handling="error")
except Exception as e:
    print(f"Error: {e}")

# Ignore errors and replace invalid JSON with None
clean_df = unpack_json_responses(df, error_handling="ignore")
print(clean_df)

✔ Invalid JSON values are replaced with None or raise errors based on settings.

Example 6: Using with LLM Integration

import pandas as pd
from llmworkbook import LLMConfig, LLMRunner, LLMDataFrameIntegrator, unpack_json_responses

# 1. Create DataFrame with LLM prompts
df = pd.DataFrame({
    "id": [1, 2],
    "prompt_text": [
        "Extract details from this sentence: 'Alice, aged 30, lives in New York.'",
        "Extract details from this sentence: 'Bob, aged 25, lives in San Francisco.'"
    ]
})

# 2. Configure LLM to return JSON
config = LLMConfig(
    provider="openai",
    system_prompt="Process the prompt and return structured JSON.",
    options={"model": "gpt-4o-mini", "response_format": {"type": "json_object"}}
)

# 3. Run LLM
runner = LLMRunner(config)
integrator = LLMDataFrameIntegrator(runner=runner, df=df)

df = integrator.add_llm_responses(prompt_column="prompt_text", response_column="llm_response")

# 4. Unpack JSON responses
clean_df = unpack_json_responses(df)

print(clean_df)

✔ Ensures structured extraction of LLM-generated JSON.
✔ Easily integrates with LLM workflows.

📌 Use unpack_json_responses to quickly transform LLM JSON responses into a structured format for analysis or downstream tasks! 🚀
