In today’s column, I showcase a powerful new prompting technique called chain-of-draft (CoD) that adds to the ever-expanding list of best practices in prompt engineering.

Readers might recall that I previously posted an in-depth depiction of over 80 prompt engineering techniques and methods (see the link here). Top-notch prompt engineers realize that learning a wide array of researched and proven prompting techniques is the best way to get the most out of generative AI.

The newly devised chain-of-draft leverages and improves upon the classic chain-of-thought (CoT) method. Advantages of the chain-of-draft technique include that it tends to work faster, produces more concentrated results than chain-of-thought, and is less costly since it consumes fewer processing cycles and tokens. A potential downside is that the results are relatively succinct, so the technique is best used when you don’t need an elaborate response from the AI.

Chain-of-draft ought to be in the skillset of all prompt engineers.

Let’s talk about it.

This analysis of an innovative AI breakthrough is part of my ongoing Forbes column coverage on the latest in AI, including identifying and explaining various impactful AI complexities (see the link here).

Fundamentals Of Chain-of-Thought

Before we dive into chain-of-draft, it is important to be familiar with the chain-of-thought method. This will set the stage for identifying and detailing the new CoD technique.

One of the handiest prompting techniques consists of telling generative AI to use a chain-of-thought approach. All you need to do is include in your prompt an instruction to proceed one step at a time, and the AI will then lay out the logical steps that it performed to reach an answer. The AI will get your gist and quickly shift into the revered chain-of-thought mode.

For various examples of chain-of-thought prompting, see my discussion at the link here.

Numerous research studies have substantiated that the use of CoT tends to spur generative AI toward better answers than if CoT is not being used. This is partially because the AI slows down to carefully specify each step of a solving process. Greater depth and focus occur. Most of the AI makers have tilted their AI toward being super-fast, more so than being necessarily accurate or correct. By giving a prompt that explicitly tells the AI to do CoT, you are giving the AI permission to methodically attempt to answer your query.

The New Chain-Of-Draft

Chain-of-draft is an offshoot of CoT and cleverly leverages chain-of-thought as a helpful launching pad. There are many other quite handy CoT variant prompting methods such as logic-of-thoughts (see LoT described at the link here), tree-of-thoughts (see ToT at the link here), skeleton-of-thoughts (see SoT at the link here), and others.

The deal with chain-of-draft is that you are going to provide mindful boundaries to how the stepwise processing is going to take place. The conventional chain-of-thought is a bit overly loose and inadvertently allows AI to wander and be verbose. The CoD prompt will rein in CoT and keep it tight and firm.

A chain-of-draft prompt instructs generative AI to perform a stepwise process, which is similar to a CoT instruction, and then adds an accompanying bounding condition. For example, here’s a chain-of-draft prompt:

  • Chain-of-draft prompt: “Think step-by-step to answer the following question but only keep a minimum draft for each thinking step.”

Notice that the bounding condition is that the AI is to keep to just a minimum draft for each of the thinking steps. This is a crucial means of suppressing the usual wandering and longwindedness of conventional CoT efforts.

You can further expand the bounding condition to include additional constraints if so desired. It all depends on what you are aiming to achieve. If you want a really terse response that solely contains the nuts and bolts of an answer, you can stipulate how many words are allowed per step. For example, you could include an indication that each step is only allowed to consist of five words at most.
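If you do stipulate a per-step word cap, it can be handy to check whether the AI actually honored it. Here is a minimal sketch of such a check. It assumes that each sentence or line of the response counts as one step, which is my simplification and not something the technique itself defines, and the function names `split_steps` and `steps_over_limit` are hypothetical.

```python
import re
from typing import List


def split_steps(response: str) -> List[str]:
    """Split a response into steps.

    Assumption: one step per sentence or per line. Real responses may
    need a different rule (numbered lists, bullets, and so on).
    """
    parts = re.split(r"(?<=[.!?])\s+|\n+", response.strip())
    return [p.strip() for p in parts if p and p.strip()]


def steps_over_limit(response: str, max_words: int = 5) -> List[str]:
    """Return the steps that exceed the per-step word cap."""
    return [s for s in split_steps(response) if len(s.split()) > max_words]


sample = (
    "Scope core features. Split into three phases. "
    "Assign four roles to the whole team now."
)
# Only the last step (8 words) exceeds a 5-word cap.
print(steps_over_limit(sample, max_words=5))
```

A non-empty result suggests either tightening the wording of the constraint or accepting that the AI treats the cap as a guideline rather than a hard rule.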

To illustrate the chain-of-draft, I present here a conventional chain-of-thought instruction, followed by a basic version of chain-of-draft, and then followed by a more advanced version of chain-of-draft that encompasses an augmented constraint:

  • (1) Conventional CoT: “Think step-by-step to answer the following question.”
  • (2) CoD basic: “Think step-by-step to answer the following question but only keep a minimum draft for each thinking step.”
  • (3) CoD advanced: “Think step-by-step to answer the following question but only keep a minimum draft for each thinking step, with 5 words at most.”
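The three instructions above differ only in the bounding condition, so they are easy to assemble programmatically. The following is a small sketch under my own assumptions: the helper name `build_prompt`, its `style` and `max_words` parameters, and the choice to place the question after a blank line are all illustrative and not part of the CoD technique.

```python
from typing import Optional

COT_INSTRUCTION = "Think step-by-step to answer the following question."
COD_INSTRUCTION = (
    "Think step-by-step to answer the following question "
    "but only keep a minimum draft for each thinking step"
)


def build_prompt(question: str, style: str = "cod",
                 max_words: Optional[int] = None) -> str:
    """Prefix a question with a CoT or CoD instruction.

    style: "cot" for conventional chain-of-thought, "cod" for chain-of-draft.
    max_words: optional per-step word cap (only used for "cod").
    """
    if style == "cot":
        instruction = COT_INSTRUCTION
    elif style == "cod":
        instruction = COD_INSTRUCTION
        if max_words is not None:
            instruction += f", with {max_words} words at most"
        instruction += "."
    else:
        raise ValueError("style must be 'cot' or 'cod'")
    return f"{instruction}\n\n{question}"


question = "How long will a 3-month project take with half the team?"
print(build_prompt(question, style="cot"))
print(build_prompt(question, style="cod"))
print(build_prompt(question, style="cod", max_words=5))
```

Keeping the instruction text in one place makes it simple to switch between CoT and CoD for the same question and compare the responses.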

Putting CoD Into Practice

Let’s see what happens when you use the new CoD versus a conventional CoT approach.

I went ahead and opted to ask generative AI to help me with a project that I’m going to be working on. I’m putting together a team of four that will develop a new financial app, and I estimate that the project will take about 3 months to get done. I want a quick overview of a sensible plan for this project.

First, I entered a prompt that is the plain vanilla chain-of-thought and asked for a devised plan:

  • Conventional chain-of-thought prompt: “Think step-by-step to plan the launch of a new financial app that requires a team of four people and will take 3 months to undertake. Show me the devised plan.”
  • Generative AI response: “At the start, you will need to define the core features of the app. This will help scope the work and allocate responsibilities. Then, you should set a rough timeline by carefully dividing the three months into three phases: (1) design, (2) development, and (3) testing and launch. You will have four people, so we need to divide up their responsibilities into the type of work each person will do. One can handle the frontend development, another the backend of the development, one for the interfacing UI/UX, and one for doing the marketing about the app. In the first month, you should focus on the overall design and prototyping. The second month is for building the minimum viable product (MVP). The third month is for testing, bug fixes, and then launching the marketing push. Weekly sprints and daily stand-ups will help with coordination and staying on track.”

You might be satisfied with that answer in the sense that it is worded amply and you certainly get the crux of what the plan consists of. I looked at the processing consumed and found that this required 120 tokens, costing me accordingly based on the rate that I am paying per token for my AI usage.

The thing is, I’ve led many app development projects, and I didn’t need all the embellishment that the AI provided. I only needed the guts of the plan. Unfortunately, since I am paying per token, I paid for a lot of “fluff” that from my perspective wasn’t worth paying for.

So, I started a new conversation with the AI, making sure things were starting fresh, and asked the question again, but this time I used the chain-of-draft to provide a bounding condition:

  • Chain-of-draft basic prompt: “Think step-by-step to plan the launch of a new financial app that requires a team of four people and will take 3 months to undertake but only keep a minimum draft for each thinking step. Show me the devised plan.”
  • Generative AI response: “First, scope the core app features. Breakdown 3 months into design, build, test/launch. Assign 4 roles: Frontend, backend, UI/UX, marketing. Month 1 is mockups and architecture. Month 2 is dev. Month 3 is QA and launch prep. Do weekly sprints.”

I like the layout of the devised plan. The response was much terser, which is to my liking in this circumstance, and I knew instantly what the AI was talking about. Importantly, I checked to see how much processing this took, and it was merely 35 tokens. That is roughly 29 percent of the 120 tokens that the CoT response consumed, a bit more than one-fourth, which cuts the token cost by about 71 percent.
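The comparison is simple arithmetic on the two token counts reported above (120 for CoT, 35 for CoD). Here is a tiny sketch of the calculation. The function name `token_savings` is hypothetical, and the sketch assumes that cost scales linearly with tokens, which depends on how your AI provider bills you.

```python
from typing import Tuple


def token_savings(cot_tokens: int, cod_tokens: int) -> Tuple[float, float]:
    """Return (CoD share of CoT tokens, fractional reduction)."""
    if cot_tokens <= 0:
        raise ValueError("cot_tokens must be positive")
    share = cod_tokens / cot_tokens
    return share, 1.0 - share


# Token counts taken from the example above.
share, reduction = token_savings(cot_tokens=120, cod_tokens=35)
print(f"CoD used {share:.1%} of the CoT tokens, a {reduction:.1%} reduction.")
# prints: CoD used 29.2% of the CoT tokens, a 70.8% reduction.
```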

When Best To Use CoD

You should use CoD when you want the AI to keep things short and sweet.

If the potential response doesn’t have to be especially elaborate and the steps can be compact, you would turn to using CoD over the conventional CoT. You don’t have to commit to one over the other; simply use the right one for the right situation.

For those who aren’t paying to use generative AI, I suppose it might seem that it doesn’t matter whether the prompt ends up consuming more tokens than you otherwise need. Well, even those who don’t pay per token are usually kept on a token-oriented budget by the AI maker, and when you go over some allowed amount, they cut you off and tell you that you’ll need to wait a few hours before your allotment gets reset.

Thus, token consumption likely matters to you either way.

Here’s my overarching advice on chain-of-draft usage. My preference is to use chain-of-draft most of the time so that I keep things sharp. I will then at times switch away from CoD to a conventional CoT when I am widely exploring a topic or want an embellished answer. The other action I sometimes take is to use CoD first, look at the response, and if the response seems overly curt, enter the question again, this time using a fully unencumbered CoT prompt.
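The CoD-first, CoT-second habit can be sketched as a small fallback routine. This is an illustration under stated assumptions and not a definitive implementation: `ask` is a hypothetical callable that sends one prompt in a fresh conversation and returns the reply text, `fake_ask` is a stub standing in for a real AI call, and the `min_words` threshold is a crude stand-in for my own judgment about whether a response is overly curt.

```python
from typing import Callable

COT_INSTRUCTION = "Think step-by-step to answer the following question."
COD_INSTRUCTION = (
    "Think step-by-step to answer the following question "
    "but only keep a minimum draft for each thinking step."
)


def cod_then_cot(question: str, ask: Callable[[str], str],
                 min_words: int = 20) -> str:
    """Try CoD first; rerun with plain CoT if the draft looks too curt.

    Assumption: a reply shorter than `min_words` words counts as curt.
    """
    draft = ask(f"{COD_INSTRUCTION}\n\n{question}")
    if len(draft.split()) >= min_words:
        return draft
    return ask(f"{COT_INSTRUCTION}\n\n{question}")


def fake_ask(prompt: str) -> str:
    """Stub for demonstration only: terse for CoD, wordy for CoT."""
    if "minimum draft" in prompt:
        return "Scope. Build. Ship."
    return "word " * 30


# The 3-word CoD draft falls below the threshold, so CoT is used instead.
answer = cod_then_cot("Plan an app launch.", fake_ask, min_words=10)
print(len(answer.split()))
```

In practice, you would replace `fake_ask` with a call to whichever generative AI you use. Keep in mind that the fallback spends tokens on both attempts.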

Research Validates Chain-Of-Draft

My approach to recommending prompt engineering techniques is that I always insist on first knowing that some form of empirical research has been undertaken to substantiate that the technique is bona fide. There are tons of wanton claims about new potential prompting techniques and few that withstand the intense scrutiny of heads-down truth-seeking research.

In the case of chain-of-draft, there’s an insightful research paper that brought forth the CoD technique, namely “Chain of Draft: Thinking Faster by Writing Less” by Silei Xu, Wenhao Xie, Lingxiao Zhao, Pengcheng He, arXiv, March 3, 2025, that had this to say about CoD (excerpts):

  • “In this work, we propose Chain of Draft (CoD), a novel paradigm inspired by human cognitive processes, where LLMs generate minimalistic yet informative intermediate reasoning outputs while solving tasks.”
  • “Instead of verbose intermediate steps, Chain of Draft encourages LLMs to generate concise, dense-information outputs at each step. This approach reduces latency and computational costs without sacrifice of accuracy, making LLMs more practical for real-world applications where efficiency is paramount.”
  • “By reducing verbosity and focusing on critical insights, CoD matches or surpasses CoT in accuracy while using as little as only 7.6% of the tokens, significantly reducing cost and latency across various reasoning tasks.”

You can cheerfully use chain-of-draft knowing that it has a sound underlying basis.

Prompting Skillfully Is Wise

Chain-of-draft is like the proverbial hammer, meaning that you should use a hammer when warranted and not go around hammering everything, such as trying to force screws into a block of wood. Hammer nails instead. You should always have a hammer in your toolkit, and then use it astutely.

The same goes for chain-of-draft as a valued member of your prompt engineering toolkit and skillset. CoD should be comfortably infused in your prompting mindset when using generative AI. Make sure you are sufficiently versed in using the CoD technique. This can be done by playing around with it.

As they say, the best way to get to Carnegie Hall is three words, which also applies to the smart use of chain-of-draft, notably consisting of practice, practice, practice.
