Structured Output

Posted by lightsong on 2024-11-28


https://python.langchain.com/v0.1/docs/modules/model_io/chat/structured_output/

It is often crucial to have LLMs return structured output. This is because oftentimes the outputs of the LLMs are used in downstream applications, where specific arguments are required. Having the LLM return structured output reliably is necessary for that.

There are a few different high level strategies that are used to do this:

  • Prompting: This is when you ask the LLM (very nicely) to return output in the desired format (JSON, XML). This is nice because it works with all LLMs. It is not nice because there is no guarantee that the LLM returns the output in the right format.
  • Function calling: This is when the LLM is fine-tuned to be able to not just generate a completion, but also generate a function call. The functions the LLM can call are generally passed as extra parameters to the model API. The function names and descriptions should be treated as part of the prompt (they usually count against token counts, and are used by the LLM to decide what to do).
  • Tool calling: A technique similar to function calling, but it allows the LLM to call multiple functions at the same time.
  • JSON mode: This is when the LLM is guaranteed to return well-formed JSON (see the sketch after this list).
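
As a concrete example of JSON mode, here is a minimal sketch against the OpenAI Python SDK (the API requires the word "JSON" to appear somewhere in the messages, and the guarantee covers well-formed JSON only, not any particular schema):

import json
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",  # JSON mode requires a model that supports response_format
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": "Reply in JSON with keys 'name' and 'age'."},
        {"role": "user", "content": "John Doe is 30 years old."},
    ],
)
print(json.loads(response.choices[0].message.content))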

Different models may support different variants of these, with slightly different parameters. In order to make it easy to get LLMs to return structured output, we have added a common interface to LangChain models: .with_structured_output.

By invoking this method (and passing in a JSON schema or a Pydantic model), the model will add whatever model parameters and output parsers are necessary to get back structured output. There may be more than one way to do this (e.g., function calling vs. JSON mode); you can configure which one to use by passing a method argument to .with_structured_output.
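
For example, .with_structured_output gives you a model-like runnable that returns parsed objects directly. A minimal sketch, assuming langchain-openai is installed (on some LangChain v0.1 versions the Pydantic imports must come from langchain_core.pydantic_v1 instead):

from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field


class Person(BaseModel):
    name: str = Field(description="The person's name.")
    age: int = Field(description="The person's age.")


llm = ChatOpenAI(model="gpt-3.5-turbo")
# method defaults to function calling; method="json_mode" is also supported
structured_llm = llm.with_structured_output(Person)
person = structured_llm.invoke("Anna is 29 years old.")
print(person)  # e.g. Person(name='Anna', age=29)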

https://github.com/horosin/langchain-json-output-python/blob/main/main.py

import os

from langchain.prompts import ChatPromptTemplate, HumanMessagePromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.output_parsers import PydanticOutputParser, OutputFixingParser
from langchain.schema import OutputParserException

from pydantic import BaseModel, Field
from typing import List


# Define a new Pydantic model with field descriptions and tailored for Twitter.
class TwitterUser(BaseModel):
    name: str = Field(description="Full name of the user.")
    handle: str = Field(description="Twitter handle of the user, without the '@'.")
    age: int = Field(description="Age of the user.")
    hobbies: List[str] = Field(description="List of hobbies of the user.")
    email: str = Field(description="Email address of the user.")
    bio: str = Field(description="Bio or short description about the user.")
    location: str = Field(description="Location or region where the user resides.")
    is_blue_badge: bool = Field(
        description="Boolean indicating if the user has a verified blue badge."
    )
    joined: str = Field(description="Date the user joined Twitter.")
    gender: str = Field(description="Gender of the user.")
    appearance: str = Field(description="Physical description of the user.")
    avatar_prompt: str = Field(
        description="Prompt for generating a photorealistic avatar image. The image should capture the essence of the user's appearance description, ideally in a setting that aligns with their interests or bio. Use professional equipment to ensure high quality and fine details."
    )
    banner_prompt: str = Field(
        description="Prompt for generating a banner image. This image should represent the user's hobbies, interests, or the essence of their bio. It should be high-resolution and captivating, suitable for a Twitter profile banner."
    )


# Instantiate the parser with the new model.
parser = PydanticOutputParser(pydantic_object=TwitterUser)

# Update the prompt to match the new query and desired format.
prompt = ChatPromptTemplate(
    messages=[
        HumanMessagePromptTemplate.from_template(
            "Answer the user's question as best as possible.\n{format_instructions}\n{question}"
        )
    ],
    input_variables=["question"],
    partial_variables={
        "format_instructions": parser.get_format_instructions(),
    },
)

# Generate the input using the updated prompt.
user_query = (
    "Generate a detailed Twitter profile of a random realistic user with a diverse background, "
    "from any country in the world, original name, including prompts for images. Come up with "
    "real name, never use most popular placeholders like john smith and john doe."
)

chat_model = ChatOpenAI(
    model="gpt-3.5-turbo", openai_api_key=os.getenv("OPENAI_API_KEY"), max_tokens=1000
)

if __name__ == "__main__":
    _input = prompt.format_prompt(question=user_query)
    output = chat_model.invoke(_input.to_messages())

    # Parse and fix output if necessary.
    try:
        parsed = parser.parse(output.content)
    except OutputParserException as e:
        print(e)
        new_parser = OutputFixingParser.from_llm(parser=parser, llm=ChatOpenAI())
        parsed = new_parser.parse(output.content)
        print("Fixed parsing errors.")

    print(parsed)
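
The mechanism here is purely prompt-based: parser.get_format_instructions() injects the JSON schema of TwitterUser into the prompt, and PydanticOutputParser validates the raw completion against it. You can inspect the injected text directly:

# Prints the format instructions embedded into the prompt above; the
# TwitterUser JSON schema is included in that text.
print(parser.get_format_instructions())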

https://python.langchain.com/v0.1/docs/modules/agents/how_to/agent_structured/

https://github.com/sunny2309/langchain_tutorials/blob/main/Structured%20Output%20from%20LLMs.ipynb

https://github.com/instructor-ai/instructor

Instructor makes it easy to get structured data like JSON from LLMs like GPT-3.5, GPT-4, GPT-4-Vision, and open-source models including Mistral/Mixtral, Ollama, and llama-cpp-python.

It stands out for its simplicity, transparency, and user-centric design, built on top of Pydantic. Instructor helps you manage validation context, retries with Tenacity, and streaming Lists and Partial responses.

import instructor
from pydantic import BaseModel
from openai import OpenAI


# Define your desired output structure
class UserInfo(BaseModel):
    name: str
    age: int


# Patch the OpenAI client
client = instructor.from_openai(OpenAI())

# Extract structured data from natural language
user_info = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=UserInfo,
    messages=[{"role": "user", "content": "John Doe is 30 years old."}],
)

print(user_info.name)
#> John Doe
print(user_info.age)
#> 30
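
Because the response model is plain Pydantic, validators participate directly: when validation fails, instructor can feed the error back to the model and retry. A minimal sketch (the uppercase rule is an arbitrary demonstration constraint, not part of the instructor API):

import instructor
from openai import OpenAI
from pydantic import BaseModel, field_validator


class UserDetail(BaseModel):
    name: str

    @field_validator("name")
    @classmethod
    def name_must_be_uppercase(cls, v: str) -> str:
        # Arbitrary rule used only to trigger validation-driven retries.
        if v != v.upper():
            raise ValueError("name must be uppercase")
        return v


client = instructor.from_openai(OpenAI())
user = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=UserDetail,
    max_retries=3,  # validation errors are sent back to the model for another attempt
    messages=[{"role": "user", "content": "Extract: jason is 25 years old."}],
)
print(user.name)
#> JASON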

https://cookbook.openai.com/examples/structured_outputs_intro

from enum import Enum
from textwrap import dedent

import openai
from openai import OpenAI
from pydantic import BaseModel

# The cookbook defines these earlier in the notebook; added here so the
# snippet runs standalone. The model name is an assumption: any
# tool-calling-capable model works.
client = OpenAI()
MODEL = "gpt-4o-2024-08-06"

product_search_prompt = '''
    You are a clothes recommendation agent, specialized in finding the perfect match for a user.
    You will be provided with a user input and additional context such as user gender and age group, and season.
    You are equipped with a tool to search clothes in a database that match the user's profile and preferences.
    Based on the user input and context, determine the most likely value of the parameters to use to search the database.
    
    Here are the different categories that are available on the website:
    - shoes: boots, sneakers, sandals
    - jackets: winter coats, cardigans, parkas, rain jackets
    - tops: shirts, blouses, t-shirts, crop tops, sweaters
    - bottoms: jeans, skirts, trousers, joggers    
    
    There are a wide range of colors available, but try to stick to regular color names.
'''

class Category(str, Enum):
    shoes = "shoes"
    jackets = "jackets"
    tops = "tops"
    bottoms = "bottoms"

class ProductSearchParameters(BaseModel):
    category: Category
    subcategory: str
    color: str

def get_response(user_input, context):
    response = client.chat.completions.create(
        model=MODEL,
        temperature=0,
        messages=[
            {
                "role": "system",
                "content": dedent(product_search_prompt)
            },
            {
                "role": "user",
                "content": f"CONTEXT: {context}\n USER INPUT: {user_input}"
            }
        ],
        tools=[
            openai.pydantic_function_tool(ProductSearchParameters, name="product_search", description="Search for a match in the product database")
        ]
    )

    return response.choices[0].message.tool_calls
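
The returned tool calls carry their arguments as a JSON string, so they can be validated back into the Pydantic model. A usage sketch with hypothetical input and context values:

tool_calls = get_response(
    "I'm looking for a new coat for the winter",
    "Gender: female, Age group: 40-50, Season: winter",
)
for tool_call in tool_calls:
    # function.arguments is a JSON string conforming to ProductSearchParameters
    params = ProductSearchParameters.model_validate_json(tool_call.function.arguments)
    print(params.category, params.subcategory, params.color)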
