This feature provides tools to transform your data into a format ready for Large Language Models (LLMs). It offers three concrete implementations to handle different data source types:
WrapDataFrame – For transforming a pandas DataFrame.
WrapDataArray – For transforming a 2D array-like structure (e.g., a NumPy array or a list of lists).
WrapPromptList – For transforming a list of prompts (a 1D list of strings).

The module wraps each row of data into an XML-like structure that LLMs can easily consume. For every row, the following transformation is applied:
The data values are wrapped in a <data> tag containing individual <cell> elements (or elements with custom tag names).
The prompt is wrapped in a <prompt> tag.

The module also provides utility methods to preview the transformed data and export it to CSV, JSON, or Excel formats.
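The per-row transformation described above can be sketched in plain Python. This is an illustrative approximation, not the library's implementation: the helper name wrap_row, the lack of whitespace between tags, and the escaping of special characters are all assumptions.

```python
from xml.sax.saxutils import escape


def wrap_row(values, prompt, cell_tag="cell"):
    """Hypothetical helper: wrap one row's data values and its prompt.

    Assumption: the layout follows the description above, a <data> block of
    per-value elements followed by a <prompt> element. The library's exact
    output (whitespace, escaping) may differ.
    """
    cells = "".join(
        f"<{cell_tag}>{escape(str(value))}</{cell_tag}>" for value in values
    )
    return f"<data>{cells}</data><prompt>{escape(prompt)}</prompt>"


wrapped = wrap_row([10, "A"], "Prompt 1")
print(wrapped)
```

Escaping characters such as < and & keeps the wrapped text well-formed when a cell value happens to contain markup.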
All three wrappers inherit from an abstract base class that provides the core functionality:
wrap(): Returns the LLM-ready DataFrame.
transform_and_export(): Exports the transformed DataFrame to a file (supports CSV, JSON, and Excel).
preview(): Prints a preview of the wrapped data.

WrapDataFrame is used when your data is in a pandas DataFrame. It allows you to:
Specify which column contains the prompt.
Choose which columns to wrap as data.
Use column headers as tag names in place of the generic <cell> tags.

Parameters:
df (DataFrame): The input DataFrame.
prompt_column (str): The name of the column in df that contains the prompt. Default: "prompt_column"
data_columns (Optional[List[str]]): A list of column names to wrap as data. If not provided, all columns except the prompt column are used.
use_column_header (bool): If True, each cell is wrapped with its column header as the tag name. Default: False
column_header_index (int): Row index used for applying column headers. Default: 0

import pandas as pd
from llmworkbook import WrapDataFrame
# Sample DataFrame with data and prompt columns
df = pd.DataFrame({
    'data1': [10, 20, 30],
    'data2': ['A', 'B', 'C'],
    'prompt_column': ["Prompt 1", "Prompt 2", "Prompt 3"]
})
# Initialize the wrapper
wrapper_df = WrapDataFrame(
    df,
    prompt_column='prompt_column',
    data_columns=['data1', 'data2'],
    use_column_header=True,
    column_header_index=0
)
# Generate the wrapped DataFrame
wrapped_output_df = wrapper_df.wrap()
# Preview the result
wrapper_df.preview()
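To make the effect of use_column_header concrete, the following minimal sketch builds both variants of the data element for the sample rows. It uses only the standard library; the helper wrap_record and the exact tag layout are assumptions for illustration and may not match what WrapDataFrame actually emits.

```python
from xml.sax.saxutils import escape

# Records equivalent to the sample DataFrame's data columns; with pandas
# these could come from df[["data1", "data2"]].to_dict("records").
records = [
    {"data1": 10, "data2": "A"},
    {"data1": 20, "data2": "B"},
    {"data1": 30, "data2": "C"},
]


def wrap_record(record, use_column_header=True):
    """Hypothetical helper: wrap one record's values in a <data> element."""
    parts = []
    for column, value in record.items():
        # Use the column name as the tag, or fall back to the generic <cell>.
        tag = column if use_column_header else "cell"
        parts.append(f"<{tag}>{escape(str(value))}</{tag}>")
    return "<data>" + "".join(parts) + "</data>"


for record in records:
    print(wrap_record(record))
```

Column names that contain spaces or start with a digit are not valid XML element names; how the library treats such headers is not covered here.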
WrapDataArray is designed to work with 2D array-like structures such as NumPy arrays or lists of lists. It lets you specify, by column index, which column holds the prompt and which columns are wrapped as data.

Parameters:
arr (Union[np.ndarray, list]): The input 2D array.
prompt_index (int): The column index in the array that contains the prompt. Default: 0
data_indices (Optional[List[int]]): A list of column indices to wrap as data. If not provided, all columns except the prompt column are used.

import numpy as np
from llmworkbook import WrapDataArray
# Sample 2D array (or list of lists)
data_array = np.array([
    ["Prompt A", 100, 200],
    ["Prompt B", 300, 400],
    ["Prompt C", 500, 600]
])
wrapper_array = WrapDataArray(
    data_array,
    prompt_index=0,
    data_indices=[1, 2]
)
# Generate the wrapped output DataFrame
wrapped_output_array = wrapper_array.wrap()
# Preview the wrapped output
wrapper_array.preview()
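One detail of the sample array above is worth knowing: when a NumPy array is built from rows that mix strings and integers, NumPy converts every element to a string. The short sketch below shows this and one way to keep the original Python types. Whether WrapDataArray treats values differently by type is not stated here, so this is only a note on the input data.

```python
import numpy as np

rows = [
    ["Prompt A", 100, 200],
    ["Prompt B", 300, 400],
]

# Default construction: NumPy picks one common dtype for all elements,
# which for mixed strings and integers is a Unicode string type.
coerced = np.array(rows)
print(coerced.dtype.kind)  # "U" denotes a Unicode string dtype

# dtype=object keeps each element as the original Python object.
preserved = np.array(rows, dtype=object)
print(type(preserved[0, 1]))
```

Passing a plain list of lists instead of an array avoids the conversion as well, since no common dtype has to be chosen.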
WrapPromptList is a simple wrapper for when you only have a list of prompts (i.e., no associated data columns). It creates an empty DataFrame for data (to maintain row alignment) and wraps each prompt.
Parameters:
prompts (List[str]): A list of prompt strings.

from llmworkbook import WrapPromptList
# Sample list of prompts
prompt_list = [
    "How is the weather today?",
    "What is the capital of France?",
    "Summarize the following article."
]
# Initialize the wrapper
wrapper_prompts = WrapPromptList(prompt_list)
# Generate the wrapped DataFrame
wrapped_output_prompts = wrapper_prompts.wrap()
# Preview the wrapped output
wrapper_prompts.preview()
All wrapper classes (by inheritance) provide the following methods:
wrap()
Transforms the data/prompt input into a single-column DataFrame with wrapped XML-like content.
transform_and_export(file_path: str, file_format: str = "excel")
Exports the wrapped output to a file.
file_path: Path to save the output.
file_format: Output format – choose between "csv", "json", or "excel".
preview(n: int = 5)
Prints the first n rows of the wrapped output for a quick look.

# Assume wrapper_df is an instance of one of the wrapper classes (e.g., WrapDataFrame)
wrapper_df.transform_and_export("wrapped_data.xlsx", file_format="excel")
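Because file_format is passed separately from the path, the two can drift apart (for example, a .csv path with file_format="excel"). One possible safeguard is to derive the format from the file extension. The helpers below are hypothetical and not part of llmworkbook; the extension mapping is an assumption based on the three formats listed above.

```python
from pathlib import Path

# Assumed mapping from file extension to the file_format values listed above.
EXTENSION_TO_FORMAT = {".csv": "csv", ".json": "json", ".xlsx": "excel"}


def infer_file_format(file_path):
    """Hypothetical helper: choose a file_format value from the extension."""
    suffix = Path(file_path).suffix.lower()
    try:
        return EXTENSION_TO_FORMAT[suffix]
    except KeyError:
        raise ValueError(f"Unsupported extension: {suffix!r}") from None


def export_by_extension(wrapper, file_path):
    """Call transform_and_export with a format derived from the path."""
    wrapper.transform_and_export(
        file_path, file_format=infer_file_format(file_path)
    )


print(infer_file_format("wrapped_data.xlsx"))
```

With these in place, export_by_extension(wrapper_df, "wrapped_data.csv") would request the "csv" format without the caller naming it explicitly.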