The `unpack_json_responses` utility function in LLMWorkbook extracts structured data from JSON responses generated by an LLM. When the LLM returns its responses as JSON, this function unpacks the nested structures into individual DataFrame columns. It supports error handling, flattening of nested JSON objects, and selective extraction of specific fields, making it a convenient tool for working with structured LLM outputs.
✔ Extracts JSON responses into separate columns within a DataFrame.
✔ Handles JSON parsing errors with configurable strategies (`warn`, `error`, `ignore`).
✔ Flattens nested structures (configurable).
✔ Supports selective extraction of specific JSON fields.
✔ Option to add column prefixes for better clarity.
✔ Works with DataFrames, Lists, and NumPy arrays.
✔ Can retain or drop the original JSON column after extraction.
```python
def unpack_json_responses(
    df: pd.DataFrame,
    response_column: str = "llm_response",
    error_handling: str = "warn",
    flatten_nested: bool = True,
    prefix: Optional[str] = None,
    columns_to_extract: Optional[List[str]] = None,
    drop_original: bool = True
) -> pd.DataFrame:
    """
    Unpacks JSON responses from a specified DataFrame column into separate columns.

    Args:
        df (pd.DataFrame): DataFrame containing the LLM responses in JSON format.
        response_column (str): Column name where JSON responses are stored.
        error_handling (str): Strategy for handling JSON parsing errors:
            'warn' (default): Replace with None and log a warning.
            'error': Raise an exception.
            'ignore': Replace with None silently.
        flatten_nested (bool): Whether to flatten nested JSON structures (default: True).
        prefix (str, optional): Prefix to add to all extracted columns.
        columns_to_extract (List[str], optional): Specific fields to extract from the JSON;
            extracts all if None.
        drop_original (bool): Whether to drop the original JSON column (default: True).

    Returns:
        pd.DataFrame: DataFrame with unpacked JSON response columns.
    """
```
"warn"
(default) → Logs warnings for JSON errors but replaces invalid entries with None
."error"
→ Raises an exception for any JSON parsing failure."ignore"
→ Silently replaces invalid JSON values with None
.columns_to_extract
is specified, extracts only the given fields.flatten_nested=True
, deeply nested JSON fields are flattened into column names.drop_original=True
```python
import pandas as pd
from llmworkbook import unpack_json_responses

# Sample DataFrame with LLM responses in JSON format
df = pd.DataFrame({
    "id": [1, 2, 3],
    "llm_response": [
        '{"name": "Alice", "age": 30, "city": "New York"}',
        '{"name": "Bob", "age": 25, "city": "San Francisco"}',
        '{"name": "Charlie", "age": 35, "city": "Los Angeles"}'
    ]
})

# Unpack JSON responses
clean_df = unpack_json_responses(df)
print(clean_df)
```
Output:

```
   id     name  age           city
0   1    Alice   30       New York
1   2      Bob   25  San Francisco
2   3  Charlie   35    Los Angeles
```
✔ JSON fields (`name`, `age`, `city`) are extracted into separate columns.
✔ The original `llm_response` column is dropped by default.
To keep the original JSON column alongside the extracted fields, pass `drop_original=False`:

```python
clean_df = unpack_json_responses(df, drop_original=False)
print(clean_df)
```

✔ Keeps the `llm_response` column while extracting values.
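The responses do not have to live in the default `llm_response` column; `response_column` points the function at any column name. A minimal sketch, assuming a DataFrame whose JSON sits in a hypothetical `model_output` column:

```python
# "model_output" is an illustrative, non-default column name.
df_alt = pd.DataFrame({
    "id": [1, 2],
    "model_output": [
        '{"name": "Alice", "age": 30}',
        '{"name": "Bob", "age": 25}'
    ]
})

clean_df = unpack_json_responses(df_alt, response_column="model_output")
print(clean_df)
```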
To extract only specific fields, pass `columns_to_extract`:

```python
clean_df = unpack_json_responses(df, columns_to_extract=["name", "city"])
print(clean_df)
```

Output:

```
   id     name           city
0   1    Alice       New York
1   2      Bob  San Francisco
2   3  Charlie    Los Angeles
```

✔ Only the `name` and `city` fields are extracted.
✔ The `age` field is ignored.
Nested JSON structures are flattened into dotted column names when `flatten_nested=True`:

```python
df = pd.DataFrame({
    "llm_response": [
        '{"user": {"name": "Alice", "info": {"age": 30, "city": "New York"}}}',
        '{"user": {"name": "Bob", "info": {"age": 25, "city": "San Francisco"}}}'
    ]
})

clean_df = unpack_json_responses(df, flatten_nested=True)
print(clean_df)
```

Output:

```
  user.name  user.info.age  user.info.city
0     Alice             30        New York
1       Bob             25   San Francisco
```

✔ Nested fields are flattened into column names (`user.name`, `user.info.age`, etc.).
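The `prefix` option from the signature above is not demonstrated elsewhere on this page. The sketch below assumes the given string is simply prepended to every extracted column name:

```python
# Assumption: `prefix` is prepended verbatim to each extracted column name,
# e.g. "resp_" + "user.name" -> "resp_user.name".
clean_df = unpack_json_responses(df, prefix="resp_")
print(clean_df.columns.tolist())
```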
Invalid JSON is handled according to the `error_handling` strategy:

```python
df = pd.DataFrame({
    "llm_response": [
        '{"valid_json": "Yes"}',
        "INVALID JSON STRING",
        '{"valid_json": "Yes"}'
    ]
})

# Raise an error on invalid JSON
try:
    unpack_json_responses(df, error_handling="error")
except Exception as e:
    print(f"Error: {e}")

# Ignore errors and replace invalid JSON with None
clean_df = unpack_json_responses(df, error_handling="ignore")
print(clean_df)
```

✔ Invalid JSON values are replaced with `None` or raise an error, depending on the chosen strategy.
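If you need to follow up on the rows that failed to parse, the sketch below assumes failed entries end up as `NaN` in the extracted columns:

```python
# Assumption: rows whose JSON could not be parsed are left as NaN in the
# extracted columns, so a standard isna() filter isolates them.
failed_rows = clean_df[clean_df["valid_json"].isna()]
print(failed_rows)
```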
The function slots directly into an LLMWorkbook pipeline: run prompts through an LLM configured to return JSON, then unpack the responses.

```python
import pandas as pd
from llmworkbook import LLMConfig, LLMRunner, LLMDataFrameIntegrator, unpack_json_responses

# 1. Create DataFrame with LLM prompts
df = pd.DataFrame({
    "id": [1, 2],
    "prompt_text": [
        "Extract details from this sentence: 'Alice, aged 30, lives in New York.'",
        "Extract details from this sentence: 'Bob, aged 25, lives in San Francisco.'"
    ]
})

# 2. Configure the LLM to return JSON
config = LLMConfig(
    provider="openai",
    system_prompt="Process the prompt and return structured JSON.",
    options={"model": "gpt-4o-mini", "response_format": {"type": "json_object"}}
)

# 3. Run the LLM
runner = LLMRunner(config)
integrator = LLMDataFrameIntegrator(runner=runner, df=df)
df = integrator.add_llm_responses(prompt_column="prompt_text", response_column="llm_response")

# 4. Unpack JSON responses
clean_df = unpack_json_responses(df)
print(clean_df)
```
✔ Ensures structured extraction of LLM-generated JSON.
✔ Easily integrates with LLM workflows.
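Because the unpacked values originate from JSON strings, some columns may need an explicit dtype conversion before analysis. The `age` column below is only illustrative, since the actual fields depend on what the model returns:

```python
# Illustrative post-processing: coerce a numeric field to a proper dtype.
# "age" is a hypothetical column name; substitute whatever the model returned.
clean_df["age"] = pd.to_numeric(clean_df["age"], errors="coerce")
print(clean_df.dtypes)
```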
📌 Use `unpack_json_responses` to quickly transform LLM JSON responses into a structured format for analysis or downstream tasks! 🚀