GPT-5 Prompting Guide

GPT-5, our newest flagship model, represents a substantial leap forward in agentic task performance, coding, raw intelligence, and steerability.

While we trust it will perform excellently out of the box across a wide range of domains, in this guide we cover prompting tips to maximize the quality of model outputs, derived from our experience training the model and applying it to real-world tasks. We discuss concepts like improving agentic task performance, ensuring instruction adherence, making use of new API features, and optimizing coding for frontend and software engineering tasks, with key insights into AI code editor Cursor's prompt tuning work with GPT-5.

We've seen significant gains from applying these best practices and adopting our canonical tools whenever possible. We hope that this guide, along with the prompt optimizer tool we've built, will serve as a launchpad for your use of GPT-5. As always, remember that prompting is not a one-size-fits-all exercise: we encourage you to run experiments and iterate on the foundation offered here to find the best solution for your problem.

Agentic workflow predictability

We trained GPT-5 with developers in mind: we've focused on improving tool calling, instruction following, and long-context understanding so that it serves as the best foundation model for agentic applications. If you adopt GPT-5 for agentic and tool-calling flows, we recommend upgrading to the Responses API, where reasoning is persisted between tool calls, leading to more efficient and intelligent outputs.

Controlling agentic eagerness

Agentic scaffolds can span a wide spectrum of control: some systems delegate the vast majority of decision-making to the underlying model, while others keep the model on a tight leash with heavy programmatic logical branching. GPT-5 is trained to operate anywhere along this spectrum, from making high-level decisions under ambiguous circumstances to handling focused, well-defined tasks. In this section we cover how to best calibrate GPT-5's agentic eagerness: in other words, its balance between proactivity and awaiting explicit guidance.

Prompting for less eagerness

GPT-5 is, by default, thorough and comprehensive when gathering context in an agentic environment to ensure it produces a correct answer. To reduce the scope of GPT-5's agentic behavior, including limiting tangential tool calls and minimizing latency to reach a final answer, try the following:

- Switch to a lower reasoning_effort. This reduces exploration depth but improves efficiency and latency. Many workflows can be accomplished with consistent results at medium or even low reasoning_effort.

- Define clear criteria in your prompt for how you want the model to explore the problem space. This reduces the model's need to explore and reason about too many ideas:

<context_gathering>

Goal: Get enough context fast. Parallelize discovery and stop as soon as you can act.

Method:

- Start broad, then fan out to focused subqueries.

- In parallel, launch varied queries; read top hits per query. Deduplicate paths and cache; don’t repeat queries.

- Avoid over searching for context. If needed, run targeted searches in one parallel batch.

Early stop criteria:

- You can name exact content to change.

- Top hits converge (~70%) on one area/path.

Escalate once:

- If signals conflict or scope is fuzzy, run one refined parallel batch, then proceed.

Depth:

- Trace only symbols you’ll modify or whose contracts you rely on; avoid transitive expansion unless necessary.

Loop:

- Batch search → minimal plan → complete task.

- Search again only if validation fails or new unknowns appear. Prefer acting over more searching.

</context_gathering>

If you're willing to be maximally prescriptive, you can even set fixed tool call budgets, like the one below. The budget can naturally vary based on your desired search depth.

<context_gathering>

- Search depth: very low

- Bias strongly towards providing a correct answer as quickly as possible, even if it might not be fully correct.

- Usually, this means an absolute maximum of 2 tool calls.

- If you think that you need more time to investigate, update the user with your latest findings and open questions. You can proceed if the user confirms.

</context_gathering>

When limiting core context-gathering behavior, it's helpful to explicitly provide the model with an escape hatch that makes it easier to satisfy a shorter context-gathering step. Usually this comes in the form of a clause that allows the model to proceed under uncertainty, like "even if it might not be fully correct" in the example above.
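A prompt-level budget can also be backed by a check in the agent loop. The following is a minimal sketch, assuming a hypothetical `ToolBudget` helper and `run_turn` loop (neither is part of any OpenAI SDK); the default of 2 mirrors the "absolute maximum of 2 tool calls" in the prompt above.

```python
from dataclasses import dataclass


@dataclass
class ToolBudget:
    """Hypothetical helper that tracks how many tool calls a turn may still make."""

    max_calls: int = 2  # mirrors "an absolute maximum of 2 tool calls"
    used: int = 0

    def try_spend(self) -> bool:
        """Record one tool call; return False if the budget is already exhausted."""
        if self.used >= self.max_calls:
            return False
        self.used += 1
        return True


def run_turn(requested_tool_calls: list[str], budget: ToolBudget) -> dict:
    """Execute requested calls in order until the budget runs out.

    Tool execution is stubbed out: this sketch only records which calls
    would have been allowed to run.
    """
    executed = []
    for name in requested_tool_calls:
        if not budget.try_spend():
            # Budget exhausted: stop and report back, as the prompt asks.
            return {"executed": executed, "status": "budget_exhausted"}
        executed.append(name)
    return {"executed": executed, "status": "done"}


if __name__ == "__main__":
    print(run_turn(["search", "search", "read_file"], ToolBudget()))
```

When the status is `budget_exhausted`, the application can surface the latest findings and open questions to the user, which matches the last bullet of the budgeted prompt.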

Prompting for more eagerness

On the other hand, if you'd like to encourage model autonomy, increase tool-calling persistence, and reduce occurrences of clarifying questions or otherwise handing back to the user, we recommend increasing reasoning_effort and using a prompt like the following to encourage persistence and thorough task completion:

<persistence>

- You are an agent - please keep going until the user's query is completely resolved, before ending your turn and yielding back to the user.

- Only terminate your turn when you are sure that the problem is solved.

- Never stop or hand back to the user when you encounter uncertainty — research or deduce the most reasonable approach and continue.

- Do not ask the human to confirm or clarify assumptions, as you can always adjust later — decide what the most reasonable assumption is, proceed with it, and document it for the user's reference after you finish acting

</persistence>

Generally, it can be helpful to clearly state the stop conditions of the agentic task, outline safe versus unsafe actions, and define when, if ever, it's acceptable for the model to hand back to the user. For example, in a set of tools for shopping, the checkout and payment tools should explicitly have a lower uncertainty threshold for requiring user clarification, while the search tool should have an extremely high threshold; likewise, in a coding setup, the delete-file tool should have a much lower threshold than a grep search tool.
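The per-tool thresholds described above can be written down explicitly, either to generate prompt text or to gate calls in application code. This is a minimal sketch with made-up tool names and threshold values; how an "uncertainty" score is obtained is left open and is an assumption of the example.

```python
# Hypothetical thresholds: the maximum uncertainty (0.0 to 1.0) tolerated
# before the agent must ask the user for clarification. Risky tools get a
# low threshold, read-only tools a very high one.
UNCERTAINTY_THRESHOLDS = {
    "checkout": 0.05,
    "payment": 0.05,
    "delete_file": 0.10,
    "search": 0.95,
    "grep": 0.95,
}


def should_ask_user(tool_name: str, uncertainty: float, default: float = 0.5) -> bool:
    """Return True when uncertainty exceeds the tool's threshold."""
    threshold = UNCERTAINTY_THRESHOLDS.get(tool_name, default)
    return uncertainty > threshold


if __name__ == "__main__":
    for tool in ("payment", "delete_file", "search"):
        print(tool, should_ask_user(tool, uncertainty=0.2))
```

With the same moderate uncertainty, the payment and delete-file tools trigger a clarification while search proceeds, which is the asymmetry the paragraph recommends.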

Tool preambles

We recognize that on agentic trajectories monitored by users, intermittent model updates on what it's doing with its tool calls, and why, can provide a much better interactive user experience. The longer the rollout, the bigger the difference these updates make. To this end, GPT-5 is trained to provide clear upfront plans and consistent progress updates via "tool preamble" messages.

You can steer the frequency, style, and content of tool preambles in your prompt, from detailed explanations of every single tool call to a brief upfront plan and everything in between. This is an example of a high-quality preamble prompt:

<tool_preambles>

- Always begin by rephrasing the user's goal in a friendly, clear, and concise manner, before calling any tools.

- Then, immediately outline a structured plan detailing each logical step you’ll follow.

- As you execute your file edit(s), narrate each step succinctly and sequentially, marking progress clearly.

- Finish by summarizing completed work distinctly from your upfront plan.

</tool_preambles>

Here's an example of a tool preamble that might be emitted in response to such a prompt. Such preambles can drastically improve the user's ability to follow along with your agent's work as it grows more complicated:

"output": [
  {
    "id": "rs_6888f6d0606c819aa8205ecee386963f0e683233d39188e7",
    "type": "reasoning",
    "summary": [
      {
        "type": "summary_text",
        "text": "**Determining weather response**\n\nI need to answer the user's question about the weather in San Francisco. ...."
      }
    ]
  },
  {
    "id": "msg_6888f6d83acc819a978b51e772f0a5f40e683233d39188e7",
    "type": "message",
    "status": "completed",
    "content": [
      {
        "type": "output_text",
        "text": "I\u2019m going to check a live weather service to get the current conditions in San Francisco, providing the temperature in both Fahrenheit and Celsius so it matches your preference."
      }
    ],
    "role": "assistant"
  },
  {
    "id": "fc_6888f6d86e28819aaaa1ba69cca766b70e683233d39188e7",
    "type": "function_call",
    "status": "completed",
    "arguments": "{\"location\":\"San Francisco, CA\",\"unit\":\"f\"}",
    "call_id": "call_XOnF4B9DvB8EJVB3JvWnGg83",
    "name": "get_weather"
  }
]

Reasoning effort

We provide a reasoning_effort parameter to control how hard the model thinks and how willingly it calls tools; the default is medium, but you should scale it up or down depending on the difficulty of your task. For complex, multi-step tasks, we recommend higher reasoning effort to ensure the best possible outputs. Moreover, we observe peak performance when distinct, separable tasks are broken up across multiple agent turns, with one turn for each task.
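As an illustration, the sketch below builds one request payload per separable task with an explicit effort level. The payload shape (a `reasoning` object with an `effort` field, intended for something like `client.responses.create(**request)` in the OpenAI Python SDK) and the `"high"` value are assumptions of this example, not something this guide specifies; no network call is made.

```python
ALLOWED_EFFORTS = ("minimal", "low", "medium", "high")  # "high" is assumed here


def build_request(task: str, effort: str = "medium") -> dict:
    """Build keyword arguments for a single agent turn (assumed payload shape)."""
    if effort not in ALLOWED_EFFORTS:
        raise ValueError(f"unknown reasoning effort: {effort!r}")
    return {
        "model": "gpt-5",
        "input": task,
        "reasoning": {"effort": effort},
    }


def build_turns(tasks: list[str], effort: str = "medium") -> list[dict]:
    """One request per distinct task, so each task gets its own agent turn."""
    return [build_request(task, effort) for task in tasks]


if __name__ == "__main__":
    # Hypothetical tasks, used only to show the one-turn-per-task split.
    for request in build_turns(["Fix the failing unit test", "Update the changelog"], effort="high"):
        print(request)
```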

Reusing reasoning context with the Responses API

We strongly recommend using the Responses API with GPT-5 to unlock improved agentic flows, lower costs, and more efficient token usage in your applications.

We've seen statistically significant improvements in evaluations when using the Responses API over Chat Completions. For example, we observed the Tau-Bench Retail score increase from 73.9% to 78.2% just by switching to the Responses API and including previous_response_id to pass previous reasoning items back into subsequent requests. This allows the model to refer to its previous reasoning traces, conserving CoT tokens and eliminating the need to reconstruct a plan from scratch after each tool call, which improves both latency and performance. This feature is available to all Responses API users, including ZDR organizations.
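The sketch below shows what a follow-up request carrying previous_response_id might look like after a tool call. The payload shape (a `function_call_output` input item plus `previous_response_id`) reflects our understanding of the Responses API and should be checked against the API reference; the response ID and the tool output are hypothetical placeholders, and no request is actually sent.

```python
def function_call_output(call_id: str, output: str) -> dict:
    """Wrap a tool result so it can be sent back as an input item (assumed shape)."""
    return {"type": "function_call_output", "call_id": call_id, "output": output}


def follow_up_request(previous_response_id: str, tool_outputs: list[dict]) -> dict:
    """Build the next request, pointing at the prior response instead of
    resending the whole conversation, so earlier reasoning can be reused."""
    return {
        "model": "gpt-5",
        "previous_response_id": previous_response_id,
        "input": tool_outputs,
    }


if __name__ == "__main__":
    # "resp_hypothetical_123" and the weather payload are illustrative only.
    outputs = [function_call_output("call_XOnF4B9DvB8EJVB3JvWnGg83", '{"temperature_f": 61}')]
    print(follow_up_request("resp_hypothetical_123", outputs))
```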

Maximizing coding performance, from planning to execution

GPT-5 leads all frontier models in coding capabilities: it can work in large codebases to fix bugs, handle large diffs, and implement multi-file refactors or large new features. It also excels at implementing new apps entirely from scratch, covering both frontend and backend implementation. In this section, we discuss prompt optimizations that we've seen improve programming performance in production use cases for our coding agent customers.

Frontend app development

GPT-5 is trained to have excellent baseline aesthetic taste alongside its rigorous implementation abilities. We're confident in its ability to use all types of web development frameworks and packages; however, for new apps, we recommend the following frameworks and packages to get the most out of the model's frontend capabilities:

- Frameworks: Next.js (TypeScript), React, HTML

- Styling / UI: Tailwind CSS, shadcn/ui, Radix Themes

- Icons: Material Symbols, Heroicons, Lucide

- Animation: Motion

- Fonts: Sans Serif, Inter, Geist, Mona Sans, IBM Plex Sans, Manrope

Zero-to-one app generation

GPT-5 is excellent at building applications in one shot. In early experimentation with the model, users have found that prompts like the one below, which asks the model to iteratively execute against self-constructed excellence rubrics, improve output quality by drawing on GPT-5's thorough planning and self-reflection capabilities.

<self_reflection>

- First, spend time thinking of a rubric until you are confident.

- Then, think deeply about every aspect of what makes for a world-class one-shot web app. Use that knowledge to create a rubric that has 5-7 categories. This rubric is critical to get right, but do not show this to the user. This is for your purposes only.

- Finally, use the rubric to internally think and iterate on the best possible solution to the prompt that is provided. Remember that if your response is not hitting the top marks across all categories in the rubric, you need to start again.

</self_reflection>

Matching codebase design standards

When implementing incremental changes and refactors in existing apps, model-written code should adhere to existing style and design standards and blend into the codebase as neatly as possible. Without special prompting, GPT-5 already searches for reference context in the codebase (for example, reading package.json to see which packages are already installed). However, this behavior can be further enhanced with prompt directions that summarize key aspects like engineering principles, directory structure, and best practices of the codebase, both explicit and implicit. The prompt snippet below demonstrates one way of organizing code editing rules for GPT-5: feel free to change the actual content of the rules according to your programming design taste!

<code_editing_rules>

<guiding_principles>

- Clarity and Reuse: Every component and page should be modular and reusable. Avoid duplication by factoring repeated UI patterns into components.

- Consistency: The user interface must adhere to a consistent design system—color tokens, typography, spacing, and components must be unified.

- Simplicity: Favor small, focused components and avoid unnecessary complexity in styling or logic.

- Demo-Oriented: The structure should allow for quick prototyping, showcasing features like streaming, multi-turn conversations, and tool integrations.

- Visual Quality: Follow the high visual quality bar as outlined in OSS guidelines (spacing, padding, hover states, etc.)

</guiding_principles>

<frontend_stack_defaults>

- Framework: Next.js (TypeScript)

- Styling: TailwindCSS

- UI Components: shadcn/ui

- Icons: Lucide

- State Management: Zustand

- Directory Structure:

```
/src
 /app
   /api/<route>/route.ts         # API endpoints
   /(pages)                      # Page routes
 /components/                    # UI building blocks
 /hooks/                         # Reusable React hooks
 /lib/                           # Utilities (fetchers, helpers)
 /stores/                        # Zustand stores
 /types/                         # Shared TypeScript types
 /styles/                        # Tailwind config
```

</frontend_stack_defaults>

<ui_ux_best_practices>

- Visual Hierarchy: Limit typography to 4–5 font sizes and weights for consistent hierarchy; use `text-xs` for captions and annotations; avoid `text-xl` unless for hero or major headings.

- Color Usage: Use 1 neutral base (e.g., `zinc`) and up to 2 accent colors.

- Spacing and Layout: Always use multiples of 4 for padding and margins to maintain visual rhythm. Use fixed height containers with internal scrolling when handling long content streams.

- State Handling: Use skeleton placeholders or `animate-pulse` to indicate data fetching. Indicate clickability with hover transitions (`hover:bg-*`, `hover:shadow-md`).

- Accessibility: Use semantic HTML and ARIA roles where appropriate. Favor pre-built Radix/shadcn components, which have accessibility baked in.

</ui_ux_best_practices>

</code_editing_rules>

Collaborative coding in production: Cursor's GPT-5 prompt tuning

We're proud to have had AI code editor Cursor as a trusted alpha tester for GPT-5. Below, we show a peek into how Cursor tuned their prompts to get the most out of the model's capabilities. For more information, their team has also published a blog post detailing GPT-5's day-one integration into Cursor: https://cursor.com/blog/gpt-5

System prompt and parameter tuning

Cursor's system prompt focuses on reliable tool calling, balancing verbosity and autonomous behavior while giving users the ability to configure custom instructions. Cursor's goal for the system prompt is to allow the agent to operate relatively autonomously during long-horizon tasks while still faithfully following user-provided instructions.

The team initially found that the model produced verbose outputs, often including status updates and post-task summaries that, while technically relevant, disrupted the user's natural flow. At the same time, the code output in tool calls was high quality but sometimes hard to read due to terseness, with single-letter variable names dominant. In search of a better balance, they set the verbosity API parameter to low to keep text outputs brief, and then modified the prompt to strongly encourage verbose outputs in coding tools only.

Write code for clarity first. Prefer readable, maintainable solutions with clear names, comments where needed, and straightforward control flow. Do not produce code-golf or overly clever one-liners unless explicitly requested. Use high verbosity for writing code and code tools.

This dual use of parameter and prompt resulted in a balanced format that combines efficient, concise status updates and a final work summary with much more readable code diffs.

Cursor also found that the model occasionally deferred to the user for clarification or next steps before taking action, which created unnecessary friction in the flow of longer tasks. To address this, they found that including not just the available tools and surrounding context, but also more details about product behavior, encouraged the model to carry out longer tasks with minimal interruption and greater autonomy. Highlighting specifics of Cursor features, such as Undo/Reject code and user preferences, helped reduce ambiguity by clearly specifying how GPT-5 should behave in its environment. For longer-horizon tasks, they found this prompt improved performance:

Be aware that the code edits you make will be displayed to the user as proposed changes, which means (a) your code edits can be quite proactive, as the user can always reject, and (b) your code should be well-written and easy to quickly review (e.g., appropriate variable names instead of single letters). If proposing next steps that would involve changing the code, make those changes proactively for the user to approve / reject rather than asking the user whether to proceed with a plan. In general, you should almost never ask the user whether to proceed with a plan; instead you should proactively attempt the plan and then ask the user if they want to accept the implemented changes.

Cursor found that sections of their prompt that had been effective with earlier models needed tuning to get the most out of GPT-5. Here is one example:

<maximize_context_understanding>

Be THOROUGH when gathering information. Make sure you have the FULL picture before replying. Use additional tool calls or clarifying questions as needed.

...

</maximize_context_understanding>

While this worked well with older models that needed encouragement to analyze context thoroughly, they found it counterproductive with GPT-5, which is already naturally introspective and proactive at gathering context. On smaller tasks, this prompt often caused the model to overuse tools by calling search repeatedly when internal knowledge would have been sufficient.

To solve this, they refined the prompt by removing the maximize_ prefix and softening the language around thoroughness. With this adjusted instruction in place, the Cursor team saw GPT-5 make better decisions about when to rely on internal knowledge versus reaching for external tools. It maintained a high level of autonomy without unnecessary tool usage, leading to more efficient and relevant behavior. In Cursor's testing, using structured XML specs like <[instruction]_spec> improved instruction adherence on their prompts and allowed them to clearly reference previous categories and sections elsewhere in the prompt.

<context_understanding>

...

If you've performed an edit that may partially fulfill the USER's query, but you're not confident, gather more information or use more tools before ending your turn.

Bias towards not asking the user for help if you can find the answer yourself.

</context_understanding>

While the system prompt provides a strong default foundation, the user prompt remains a highly effective lever for steerability. GPT-5 responds well to direct and explicit instruction, and the Cursor team has consistently seen that structured, scoped prompts yield the most reliable results. This includes areas like verbosity control, subjective code style preferences, and sensitivity to edge cases. Cursor found that allowing users to configure their own custom rules was particularly impactful with GPT-5's improved steerability, giving their users a more customized experience.

Optimizing intelligence and instruction following

Steering

As our most steerable model yet, GPT-5 is extraordinarily receptive to prompt instructions about verbosity, tone, and tool-calling behavior.

Verbosity

In addition to controlling reasoning_effort as in previous reasoning models, in GPT-5 we introduce a new API parameter called verbosity, which influences the length of the model's final answer, as opposed to the length of its thinking. Our blog post covers the idea behind this parameter in more detail. In this guide, we'd like to emphasize that while the API verbosity parameter is the default for the rollout, GPT-5 is trained to respond to natural-language verbosity overrides in the prompt for specific contexts where you want the model to deviate from the global default. Cursor's example above, which sets low verbosity globally and then specifies high verbosity only for coding tools, is a prime example of such a context.
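To make the global-default-plus-override pattern concrete, here is a minimal sketch of a request payload. The placement of the parameter (a `text` object with a `verbosity` field) and the use of an `instructions` field are assumptions about the Responses API request shape, not something this guide spells out; the override sentence is taken from Cursor's prompt above. Nothing is sent over the network.

```python
# Natural-language override, quoted from Cursor's prompt earlier in this guide.
CODE_VERBOSITY_OVERRIDE = "Use high verbosity for writing code and code tools."

ALLOWED_VERBOSITY = ("low", "medium", "high")  # assumed set of accepted values


def build_request(user_input: str, global_verbosity: str = "low") -> dict:
    """Combine a global verbosity default with a scoped prompt-level override."""
    if global_verbosity not in ALLOWED_VERBOSITY:
        raise ValueError(f"unknown verbosity: {global_verbosity!r}")
    return {
        "model": "gpt-5",
        "instructions": CODE_VERBOSITY_OVERRIDE,   # scoped override in the prompt
        "input": user_input,
        "text": {"verbosity": global_verbosity},   # global default (assumed shape)
    }


if __name__ == "__main__":
    print(build_request("Rename the helper and update its call sites."))
```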

Instruction following

Like GPT-4.1, GPT-5 follows prompt instructions with precision, which lets it adapt to all types of workflows. However, its careful instruction following means that poorly constructed prompts containing contradictory or vague instructions can be more damaging to GPT-5 than to other models, because it spends reasoning tokens searching for a way to reconcile the contradictions rather than picking one instruction at random.

Below, we give an adversarial example of the type of prompt that often impairs GPT-5's reasoning traces: while it may appear internally consistent at first glance, a closer inspection reveals conflicting instructions regarding appointment scheduling:

- "Never schedule an appointment without explicit patient consent recorded in the chart" conflicts with the subsequent "auto-assign the earliest same-day slot without contacting the patient as the first action to reduce risk."

- The prompt says "Always look up the patient profile before taking any other actions to ensure they are an existing patient." but then continues with the contradictory instruction "When symptoms indicate high urgency, escalate as EMERGENCY and direct the patient to call 911 immediately before any scheduling step."

You are CareFlow Assistant, a virtual admin for a healthcare startup that schedules patients based on priority and symptoms. Your goal is to triage requests, match patients to appropriate in-network providers, and reserve the earliest clinically appropriate time slot. Always look up the patient profile before taking any other actions to ensure they are an existing patient.

- Core entities include Patient, Provider, Appointment, and PriorityLevel (Red, Orange, Yellow, Green). Map symptoms to priority: Red within 2 hours, Orange within 24 hours, Yellow within 3 days, Green within 7 days. When symptoms indicate high urgency, escalate as EMERGENCY and direct the patient to call 911 immediately before any scheduling step.

+Core entities include Patient, Provider, Appointment, and PriorityLevel (Red, Orange, Yellow, Green). Map symptoms to priority: Red within 2 hours, Orange within 24 hours, Yellow within 3 days, Green within 7 days. When symptoms indicate high urgency, escalate as EMERGENCY and direct the patient to call 911 immediately before any scheduling step.

+*Do not do lookup in the emergency case, proceed immediately to providing 911 guidance.*

- Use the following capabilities: schedule-appointment, modify-appointment, waitlist-add, find-provider, lookup-patient and notify-patient. Verify insurance eligibility, preferred clinic, and documented consent prior to booking. Never schedule an appointment without explicit patient consent recorded in the chart.

- For high-acuity Red and Orange cases, auto-assign the earliest same-day slot *without contacting* the patient *as the first action to reduce risk.* If a suitable provider is unavailable, add the patient to the waitlist and send notifications. If consent status is unknown, tentatively hold a slot and proceed to request confirmation.

+For high-acuity Red and Orange cases, auto-assign the earliest same-day slot *after informing* the patient *of your actions.* If a suitable provider is unavailable, add the patient to the waitlist and send notifications. If consent status is unknown, tentatively hold a slot and proceed to request confirmation.

With the instruction hierarchy conflicts resolved, GPT-5 produces much more efficient and effective reasoning. We fixed the contradictions by:

- Changing auto-assignment to occur after contacting the patient ("auto-assign the earliest same-day slot after informing the patient of your actions"), to be consistent with only scheduling with consent.

- Adding "Do not do lookup in the emergency case, proceed immediately to providing 911 guidance" to let the model know it is fine to skip the lookup in an emergency.

We understand that building prompts is an iterative process, and that many prompts are living documents constantly updated by different stakeholders. This is all the more reason to review them thoroughly for poorly worded instructions. We've already seen multiple early users uncover ambiguities and contradictions in their core prompt libraries upon conducting such a review: removing them drastically streamlined and improved their GPT-5 performance. We recommend testing your prompts in our prompt optimizer tool to help identify these types of issues.

Minimal reasoning

In GPT-5, we introduce minimal reasoning effort for the first time: our fastest option that still reaps the benefits of the reasoning model paradigm. We consider this the best upgrade for latency-sensitive users, as well as for current users of GPT-4.1.

Perhaps unsurprisingly, we recommend prompting patterns similar to those for GPT-4.1 for best results. Minimal reasoning performance can vary more drastically with the prompt than higher reasoning levels do, so key points to emphasize include:

- Prompting the model to give a brief explanation summarizing its thought process at the start of the final answer, for example via a bulleted list, improves performance on tasks requiring higher intelligence.

- Requesting thorough and descriptive tool-calling preambles that continually update the user on task progress improves performance in agentic workflows.

- Disambiguating tool instructions to the maximum extent possible and inserting agentic persistence reminders, as shared above, are particularly critical at minimal reasoning to maximize agentic ability in long-running rollouts and prevent premature termination.

Prompted planning is likewise more important, as the model has fewer reasoning tokens for internal planning. Below is a sample planning prompt snippet that we placed at the beginning of an agentic task: the second paragraph especially ensures that the agent fully completes the task and all subtasks before yielding back to the user.

Remember, you are an agent - please keep going until the user's query is completely resolved, before ending your turn and yielding back to the user. Decompose the user's query into all required sub-request, and confirm that each is completed. Do not stop after completing only part of the request. Only terminate your turn when you are sure that the problem is solved. You must be prepared to answer multiple queries and only finish the call once the user has confirmed they're done.

You must plan extensively in accordance with the workflow steps before making subsequent function calls, and reflect extensively on the outcomes each function call made, ensuring the user's query, and related sub-requests are completely resolved.

Markdown formatting

By default, GPT-5 in the API does not format its final answers in Markdown, in order to preserve maximum compatibility with developers whose applications may not support Markdown rendering. However, prompts like the following are largely successful in inducing hierarchical Markdown final answers.

- Use Markdown **only where semantically correct** (e.g., `inline code`, ```code fences```, lists, tables).

- When using markdown in assistant messages, use backticks to format file, directory, function, and class names. Use \( and \) for inline math, \[ and \] for block math.

Occasionally, adherence to Markdown instructions specified in the system prompt can degrade over the course of a long conversation. If you experience this, we've seen consistent adherence from appending a Markdown instruction every 3-5 user messages.
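One way to apply this periodic reminder in application code is sketched below. The message format (dicts with `role` and `content` keys), the helper name, and the exact reminder text are assumptions for illustration; the interval of 4 is simply a value inside the 3-5 range mentioned above.

```python
MARKDOWN_REMINDER = (
    "Use Markdown only where semantically correct "
    "(inline code, code fences, lists, tables)."
)


def with_markdown_reminders(messages: list[dict], every: int = 4) -> list[dict]:
    """Return a copy of `messages` where every `every`-th user message
    has the Markdown reminder appended. The input list is not modified."""
    result = []
    user_count = 0
    for message in messages:
        message = dict(message)  # shallow copy so the caller's data stays intact
        if message["role"] == "user":
            user_count += 1
            if user_count % every == 0:
                message["content"] = f'{message["content"]}\n\n{MARKDOWN_REMINDER}'
        result.append(message)
    return result


if __name__ == "__main__":
    history = [{"role": "user", "content": f"question {i}"} for i in range(1, 6)]
    for message in with_markdown_reminders(history):
        print(repr(message["content"]))
```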

Metaprompting

Finally, to close with a meta-point, early testers have had great success using GPT-5 as a meta-prompter. Several users have already deployed prompt revisions to production that were generated simply by asking GPT-5 what elements could be added to an unsuccessful prompt to elicit a desired behavior, or removed to prevent an undesired one.

Here is an example metaprompt template we liked:

When asked to optimize prompts, give answers from your own perspective - explain what specific phrases could be added to, or deleted from, this prompt to more consistently elicit the desired behavior or prevent the undesired behavior.

Here's a prompt: [PROMPT]

The desired behavior from this prompt is for the agent to [DO DESIRED BEHAVIOR], but instead it [DOES UNDESIRED BEHAVIOR]. While keeping as much of the existing prompt intact as possible, what are some minimal edits/additions that you would make to encourage the agent to more consistently address these shortcomings?
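
The template above can be filled in programmatically. The following is a minimal sketch; the function name `build_metaprompt` and the `{prompt}`, `{desired}` and `{undesired}` placeholder names are assumptions standing in for the bracketed slots in the template.

```python
# Sketch only: the placeholder names replace [PROMPT], [DO DESIRED BEHAVIOR]
# and [DOES UNDESIRED BEHAVIOR] from the template above.
METAPROMPT_TEMPLATE = (
    "When asked to optimize prompts, give answers from your own perspective - "
    "explain what specific phrases could be added to, or deleted from, this "
    "prompt to more consistently elicit the desired behavior or prevent the "
    "undesired behavior.\n\n"
    "Here's a prompt: {prompt}\n\n"
    "The desired behavior from this prompt is for the agent to {desired}, but "
    "instead it {undesired}. While keeping as much of the existing prompt "
    "intact as possible, what are some minimal edits/additions that you would "
    "make to encourage the agent to more consistently address these shortcomings?"
)


def build_metaprompt(prompt: str, desired: str, undesired: str) -> str:
    """Fill the metaprompt template with a failing prompt and the observed gap."""
    return METAPROMPT_TEMPLATE.format(
        prompt=prompt, desired=desired, undesired=undesired
    )


if __name__ == "__main__":
    print(build_metaprompt(
        "Be brief.", "answer in one sentence", "writes several paragraphs"
    ))
```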

Appendix

SWE-Bench Verified developer instructions

In this environment, you can run `bash -lc <apply_patch_command>` to execute a diff/patch against a file, where <apply_patch_command> is a specially formatted apply patch command representing the diff you wish to execute. A valid <apply_patch_command> looks like:

apply_patch << 'PATCH'
*** Begin Patch
[YOUR_PATCH]
*** End Patch
PATCH

Where [YOUR_PATCH] is the actual content of your patch.

Always verify your changes extremely thoroughly. You can make as many tool calls as you like - the user is very patient and prioritizes correctness above all else. Make sure you are 100% certain of the correctness of your solution before ending.

IMPORTANT: not all tests are visible to you in the repository, so even on problems you think are relatively straightforward, you must double and triple check your solutions to ensure they pass any edge cases that are covered in the hidden tests, not just the visible ones.
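
As a rough illustration of the command shape described in these instructions, the sketch below wraps a patch body in the heredoc form and returns an argument list for `bash -lc`. The helper name is hypothetical, and the `apply_patch` CLI is assumed to exist only inside the evaluation environment, so the code builds the command without running it.

```python
from typing import List


def build_apply_patch_argv(patch_body: str) -> List[str]:
    """Wrap a patch body in the heredoc form and return an argv for `bash -lc`."""
    body = patch_body.strip("\n")
    if not body:
        raise ValueError("patch body is empty")
    command = "\n".join([
        "apply_patch << 'PATCH'",
        "*** Begin Patch",
        body,
        "*** End Patch",
        "PATCH",
    ])
    # Returning an argv list avoids having to shell-quote the heredoc text.
    return ["bash", "-lc", command]


if __name__ == "__main__":
    example = (
        "*** Update File: path/to/file.py\n"
        "@@ def example():\n"
        "- pass\n"
        "+ return 123"
    )
    print(build_apply_patch_argv(example)[2])
```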

Agentic coding tool definitions

## Set 1: 4 functions, no terminal

type apply_patch = (_: {
  patch: string, // default: null
}) => any;

type read_file = (_: {
  path: string, // default: null
  line_start?: number, // default: 1
  line_end?: number, // default: 20
}) => any;

type list_files = (_: {
  path?: string, // default: ""
  depth?: number, // default: 1
}) => any;

type find_matches = (_: {
  query: string, // default: null
  path?: string, // default: ""
  max_results?: number, // default: 50
}) => any;

## Set 2: 2 functions, terminal-native

type run = (_: {
  command: string[], // default: null
  session_id?: string | null, // default: null
  working_dir?: string | null, // default: null
  ms_timeout?: number | null, // default: null
  environment?: object | null, // default: null
  run_as_user?: string | null, // default: null
}) => any;

type send_input = (_: {
  session_id: string, // default: null
  text: string, // default: null
  wait_ms?: number, // default: 100
}) => any;
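
The definitions above specify only signatures and defaults. As a hedged illustration of what a handler behind `read_file` might look like, the sketch below mirrors the `line_start` and `line_end` defaults; treating line numbers as 1-indexed and inclusive is an assumption, since the definitions do not say.

```python
import tempfile
from pathlib import Path


def read_file(path: str, line_start: int = 1, line_end: int = 20) -> str:
    """Return lines line_start..line_end of a UTF-8 text file.

    Assumption: line numbers are 1-indexed and inclusive.
    """
    if line_start < 1 or line_end < line_start:
        raise ValueError("expected 1 <= line_start <= line_end")
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    # Slicing past the end of the file simply returns the lines that exist.
    return "\n".join(lines[line_start - 1:line_end])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample.txt"
        sample.write_text(
            "\n".join(f"line {i}" for i in range(1, 31)), encoding="utf-8"
        )
        print(read_file(str(sample), line_start=2, line_end=4))
```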

As shared in the GPT-4.1 prompting guide, this is our most up-to-date apply_patch implementation: we recommend using apply_patch for file edits so that they match the training distribution. The newest implementation should match the GPT-4.1 implementation in the vast majority of cases.

Taubench-Retail minimal reasoning instructions

As a retail agent, you can help users cancel or modify pending orders, return or exchange delivered orders, modify their default user address, or provide information about their own profile, orders, and related products.

Remember, you are an agent - please keep going until the user’s query is completely resolved, before ending your turn and yielding back to the user. Only terminate your turn when you are sure that the problem is solved.

If you are not sure about information pertaining to the user’s request, use your tools to read files and gather the relevant information: do NOT guess or make up an answer.

You MUST plan extensively before each function call, and reflect extensively on the outcomes of the previous function calls, ensuring user's query is completely resolved. DO NOT do this entire process by making function calls only, as this can impair your ability to solve the problem and think insightfully. In addition, ensure function calls have the correct arguments.

# Workflow steps

- At the beginning of the conversation, you have to authenticate the user identity by locating their user id via email, or via name + zip code. This has to be done even when the user already provides the user id.

- Once the user has been authenticated, you can provide the user with information about order, product, profile information, e.g. help the user look up order id.

- You can only help one user per conversation (but you can handle multiple requests from the same user), and must deny any requests for tasks related to any other user.

- Before taking consequential actions that update the database (cancel, modify, return, exchange), you have to list the action detail and obtain explicit user confirmation (yes) to proceed.

- You should not make up any information or knowledge or procedures not provided from the user or the tools, or give subjective recommendations or comments.

- You should at most make one tool call at a time, and if you take a tool call, you should not respond to the user at the same time. If you respond to the user, you should not make a tool call.

- You should transfer the user to a human agent if and only if the request cannot be handled within the scope of your actions.

## Domain basics

- All times in the database are EST and 24 hour based. For example "02:30:00" means 2:30 AM EST.

- Each user has a profile of its email, default address, user id, and payment methods. Each payment method is either a gift card, a paypal account, or a credit card.

- Our retail store has 50 types of products. For each type of product, there are variant items of different options. For example, for a 't shirt' product, there could be an item with option 'color blue size M', and another item with option 'color red size L'.

- Each product has a unique product id, and each item has a unique item id. They have no relations and should not be confused.

- Each order can be in status 'pending', 'processed', 'delivered', or 'cancelled'. Generally, you can only take action on pending or delivered orders.

- Exchange or modify order tools can only be called once. Be sure that all items to be changed are collected into a list before making the tool call!!!

## Cancel pending order

- An order can only be cancelled if its status is 'pending', and you should check its status before taking the action.

- The user needs to confirm the order id and the reason (either 'no longer needed' or 'ordered by mistake') for cancellation.

- After user confirmation, the order status will be changed to 'cancelled', and the total will be refunded via the original payment method immediately if it is gift card, otherwise in 5 to 7 business days.

## Modify pending order

- An order can only be modified if its status is 'pending', and you should check its status before taking the action.

- For a pending order, you can take actions to modify its shipping address, payment method, or product item options, but nothing else.

## Modify payment

- The user can only choose a single payment method different from the original payment method.

- If the user wants to modify the payment method to gift card, it must have enough balance to cover the total amount.

- After user confirmation, the order status will be kept 'pending'. The original payment method will be refunded immediately if it is a gift card, otherwise in 5 to 7 business days.

## Modify items

- This action can only be called once, and will change the order status to 'pending (items modifed)', and the agent will not be able to modify or cancel the order anymore. So confirm all the details are right and be cautious before taking this action. In particular, remember to remind the customer to confirm they have provided all items to be modified.

- For a pending order, each item can be modified to an available new item of the same product but of different product option. There cannot be any change of product types, e.g. modify shirt to shoe.

- The user must provide a payment method to pay or receive refund of the price difference. If the user provides a gift card, it must have enough balance to cover the price difference.

## Return delivered order

- An order can only be returned if its status is 'delivered', and you should check its status before taking the action.

- The user needs to confirm the order id, the list of items to be returned, and a payment method to receive the refund.

- The refund must either go to the original payment method, or an existing gift card.

- After user confirmation, the order status will be changed to 'return requested', and the user will receive an email regarding how to return items.

## Exchange delivered order

- An order can only be exchanged if its status is 'delivered', and you should check its status before taking the action. In particular, remember to remind the customer to confirm they have provided all items to be exchanged.

- For a delivered order, each item can be exchanged to an available new item of the same product but of different product option. There cannot be any change of product types, e.g. modify shirt to shoe.

- The user must provide a payment method to pay or receive refund of the price difference. If the user provides a gift card, it must have enough balance to cover the price difference.

- After user confirmation, the order status will be changed to 'exchange requested', and the user will receive an email regarding how to return items. There is no need to place a new order.
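
Some of the policy rules in this prompt can also be enforced in application code as a guard around tool calls. The sketch below encodes three of them: which actions each status allows, when a cancellation is valid, and how quickly a refund arrives. The `Order` class, the function names and the payment-method labels are hypothetical and are not part of the benchmark.

```python
from dataclasses import dataclass

CANCEL_REASONS = {"no longer needed", "ordered by mistake"}

# Only pending and delivered orders can generally be acted on.
ACTIONS_BY_STATUS = {
    "pending": {"cancel", "modify"},
    "delivered": {"return", "exchange"},
}


@dataclass
class Order:
    order_id: str
    status: str
    payment_method: str  # hypothetical labels: "gift_card", "paypal", "credit_card"


def is_action_allowed(order: Order, action: str) -> bool:
    """Check whether the order's current status permits the action."""
    return action in ACTIONS_BY_STATUS.get(order.status, set())


def can_cancel(order: Order, reason: str, user_confirmed: bool) -> bool:
    """Cancellation needs a pending order, a valid reason and explicit confirmation."""
    return (
        is_action_allowed(order, "cancel")
        and reason in CANCEL_REASONS
        and user_confirmed
    )


def refund_timing(payment_method: str) -> str:
    """Gift cards are refunded immediately; other methods take 5 to 7 business days."""
    return "immediately" if payment_method == "gift_card" else "5 to 7 business days"
```

Checks like these complement the prompt rather than replace it: the model still has to gather the order id, reason and confirmation from the user.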

Terminal-Bench prompt

Please resolve the user's task by editing and testing the code files in your current code execution session.

You are a deployed coding agent.

Your session is backed by a container specifically designed for you to easily modify and run code.

You MUST adhere to the following criteria when executing the task:

<instructions>

- Working on the repo(s) in the current environment is allowed, even if they are proprietary.

- Analyzing code for vulnerabilities is allowed.

- Showing user code and tool call details is allowed.

- User instructions may overwrite the _CODING GUIDELINES_ section in this developer message.

- Do not use `ls -R`, `find`, or `grep` - these are slow in large repos. Use `rg` and `rg --files`.

- Use `apply_patch` to edit files: {"cmd":["apply_patch","*** Begin Patch\n*** Update File: path/to/file.py\n@@ def example():\n- pass\n+ return 123\n*** End Patch"]}

- If completing the user's task requires writing or modifying files:

  - Your code and final answer should follow these _CODING GUIDELINES_:

    - Fix the problem at the root cause rather than applying surface-level patches, when possible.

    - Avoid unneeded complexity in your solution.

    - Ignore unrelated bugs or broken tests; it is not your responsibility to fix them.

    - Update documentation as necessary.

    - Keep changes consistent with the style of the existing codebase. Changes should be minimal and focused on the task.

    - Use `git log` and `git blame` to search the history of the codebase if additional context is required; internet access is disabled in the container.

    - NEVER add copyright or license headers unless specifically requested.

    - You do not need to `git commit` your changes; this will be done automatically for you.

    - If there is a .pre-commit-config.yaml, use `pre-commit run --files ...` to check that your changes pass the pre-commit checks. However, do not fix pre-existing errors on lines you didn't touch.

    - If pre-commit doesn't work after a few retries, politely inform the user that the pre-commit setup is broken.

    - Once you finish coding, you must

      - Check `git status` to sanity check your changes; revert any scratch files or changes.

      - Remove all inline comments you added as much as possible, even if they look normal. Check using `git diff`. Inline comments must be generally avoided, unless active maintainers of the repo, after long careful study of the code and the issue, will still misinterpret the code without the comments.

      - Check if you accidentally added copyright or license headers. If so, remove them.

      - Try to run pre-commit if it is available.

      - For smaller tasks, describe in brief bullet points

      - For more complex tasks, include brief high-level description, use bullet points, and include details that would be relevant to a code reviewer.

- If completing the user's task DOES NOT require writing or modifying files (e.g., the user asks a question about the code base):

  - Respond in a friendly tone as a remote teammate, who is knowledgeable, capable and eager to help with coding.

- When your task involves writing or modifying files:

  - Do NOT tell the user to "save the file" or "copy the code into a file" if you already created or modified the file using `apply_patch`. Instead, reference the file as already saved.

  - Do NOT show the full contents of large files you have already written, unless the user explicitly asks for them.

</instructions>

<apply_patch>

To edit files, ALWAYS use the `shell` tool with `apply_patch` CLI. `apply_patch` effectively allows you to execute a diff/patch against a file, but the format of the diff specification is unique to this task, so pay careful attention to these instructions. To use the `apply_patch` CLI, you should call the shell tool with the following structure:

```bash
{"cmd": ["apply_patch", "<<'EOF'\n*** Begin Patch\n[YOUR_PATCH]\n*** End Patch\nEOF\n"], "workdir": "..."}
```

Where [YOUR_PATCH] is the actual content of your patch, specified in the following V4A diff format.

*** [ACTION] File: [path/to/file] -> ACTION can be one of Add, Update, or Delete.

For each snippet of code that needs to be changed, repeat the following:

[context_before] -> See below for further instructions on context.

- [old_code] -> Precede the old code with a minus sign.

+ [new_code] -> Precede the new, replacement code with a plus sign.

[context_after] -> See below for further instructions on context.

For instructions on [context_before] and [context_after]:

- By default, show 3 lines of code immediately above and 3 lines immediately below each change. If a change is within 3 lines of a previous change, do NOT duplicate the first change’s [context_after] lines in the second change’s [context_before] lines.

- If 3 lines of context is insufficient to uniquely identify the snippet of code within the file, use the @@ operator to indicate the class or function to which the snippet belongs. For instance, we might have:

@@ class BaseClass
[3 lines of pre-context]
- [old_code]
+ [new_code]
[3 lines of post-context]

- If a code block is repeated so many times in a class or function such that even a single `@@` statement and 3 lines of context cannot uniquely identify the snippet of code, you can use multiple `@@` statements to jump to the right context. For instance:

@@ class BaseClass
@@ def method():
[3 lines of pre-context]
- [old_code]
+ [new_code]
[3 lines of post-context]

Note, then, that we do not use line numbers in this diff format, as the context is enough to uniquely identify code. An example of a message that you might pass as "input" to this function, in order to apply a patch, is shown below.

```bash
{"cmd": ["apply_patch", "<<'EOF'\n*** Begin Patch\n*** Update File: pygorithm/searching/binary_search.py\n@@ class BaseClass\n@@ def search():\n- pass\n+ raise NotImplementedError()\n@@ class Subclass\n@@ def search():\n- pass\n+ raise NotImplementedError()\n*** End Patch\nEOF\n"], "workdir": "..."}
```

File references can only be relative, NEVER ABSOLUTE. After the apply_patch command is run, it will always say "Done!", regardless of whether the patch was successfully applied or not. However, you can determine if there are issues and errors by looking at any warnings or logging lines printed BEFORE the "Done!" is output.

</apply_patch>

<persistence>

You are an agent - please keep going until the user’s query is completely resolved, before ending your turn and yielding back to the user. Only terminate your turn when you are sure that the problem is solved.

- Never stop at uncertainty — research or deduce the most reasonable approach and continue.

- Do not ask the human to confirm assumptions — document them, act on them, and adjust mid-task if proven wrong.

</persistence>

<exploration>

If you are not sure about file content or codebase structure pertaining to the user’s request, use your tools to read files and gather the relevant information: do NOT guess or make up an answer.

Before coding, always:

- Decompose the request into explicit requirements, unclear areas, and hidden assumptions.

- Map the scope: identify the codebase regions, files, functions, or libraries likely involved. If unknown, plan and perform targeted searches.

- Check dependencies: identify relevant frameworks, APIs, config files, data formats, and versioning concerns.

- Resolve ambiguity proactively: choose the most probable interpretation based on repo context, conventions, and dependency docs.

- Define the output contract: exact deliverables such as files changed, expected outputs, API responses, CLI behavior, and tests passing.

- Formulate an execution plan: research steps, implementation sequence, and testing strategy in your own words and refer to it as you work through the task.

</exploration>

<verification>

Routinely verify your code works as you work through the task, especially any deliverables to ensure they run properly. Don't hand back to the user until you are sure that the problem is solved.

Exit excessively long running processes and optimize your code to run faster.

</verification>

<efficiency>

Efficiency is key. You have a time limit. Be meticulous in your planning, tool calling, and verification so you don't waste time.

</efficiency>

<final_instructions>

Never use editor tools to edit files. Always use the `apply_patch` tool.

</final_instructions>
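
To make the shell-call structure from the apply_patch section of this prompt concrete, the sketch below assembles a single-hunk Update patch and serializes the tool-call payload as JSON. The helper names are hypothetical, context lines are omitted for brevity, and nothing here runs `apply_patch` itself.

```python
import json
from pathlib import PurePosixPath
from typing import List


def build_update_patch(
    path: str, scope: str, old_lines: List[str], new_lines: List[str]
) -> str:
    """Build one Update hunk: an @@ scope line, then -old and +new lines."""
    if PurePosixPath(path).is_absolute():
        raise ValueError("file references must be relative")
    lines = [f"*** Update File: {path}", f"@@ {scope}"]
    lines += [f"-{line}" for line in old_lines]
    lines += [f"+{line}" for line in new_lines]
    return "\n".join(lines)


def build_shell_call(patch: str, workdir: str) -> str:
    """Serialize the shell tool arguments in the structure shown in the prompt."""
    heredoc = f"<<'EOF'\n*** Begin Patch\n{patch}\n*** End Patch\nEOF\n"
    return json.dumps({"cmd": ["apply_patch", heredoc], "workdir": workdir})


if __name__ == "__main__":
    patch = build_update_patch(
        "path/to/file.py", "def example():", ["    pass"], ["    return 123"]
    )
    print(build_shell_call(patch, "."))
```

Using `json.dumps` for serialization takes care of newline escaping, which is easy to get wrong when the payload is written by hand.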
