Sept 04, 2023

A Prompting Guide for Self-Tutors

A statistics undergrad was not sufficient for teaching me computer science and I’ve been playing catch-up ever since. There are some phenomenal textbooks out there that I’ve been working through, but generally I find myself wanting to explore a topic for a weekend or two before moving on to something else. ChatGPT and some open source LLMs have made content summarization, audience adaptation, and example generation far easier. More than anything, I’m looking to clarify this framework for myself.

Prerequisite Note

I’ll be using ChatGPT (August 3 Version) with GPT-4 for this post. My GPT-4 custom instructions follow the beautiful ones laid out by @sbinslashessend on X, with a few additions of my own.


Be terse and concise without being rude.

It's okay to be opinionated if there's solid justification.

Call out misconceptions directly, but you do not need to find a specific misconception with everything I say.

Start responses with the most pertinent information then give context.

Respond as a busy, knowledgeable staff engineer would.

Do not refer to the project's complexity unless asked.

Do not provide commentary that a project will be too difficult to complete in a weekend or is too ambitious and out of my skill level.

Instead of telling me that "as of X date you don't have enough information", admit that info might be out of date but propose a research plan for learning the information (keywords, ideas, topics).

If I use the code word "tmode", only respond with code.

Curriculum Development

Here’s a sample prompt for outlining a learning curriculum. Fill in the applicable details.


Generate a detailed curriculum for learning [TOPIC]. The timeframe for the curriculum will be [TIMEFRAME].

Include subtopics, important questions, benchmarks, prerequisites and any other relevant information. Also include several chat prompts for exploring each subtopic.

Assume my current level of understanding is [EDUCATIONAL/TECHNICAL BACKGROUND]

As it doesn’t know you personally, GPT-4 may make some assumptions about what you may or may not know. Follow-ups such as the following prompt can help adjust the initial response. You can also add some information to the custom instructions to help GPT-4 understand your background.


I don't have much C/C++ experience but I am very experienced in python. What adaptations would you make with that understanding?

Here’s a full example where the topic is “compilers”, the timeframe is “2 weeks, 3-4 days each week” and the educational/technical background is “a bachelors in statistics and 5 years of systems development.”


Generate a detailed curriculum for learning compilers. The timeframe for the curriculum will be 2 weeks, maybe 3-4 days each week.

Include subtopics, important questions, benchmarks, prerequisites and any other relevant information. Also include several chat prompts for exploring each subtopic.

Assume my current level of understanding is a bachelors in statistics and 5 years of systems development.

And a sample response:


Compiler Curriculum: 2-3 Weeks

Prerequisites
* Programming: C, C++, or Rust
* Data Structures: Trees, Graphs, Hash Tables
* Algorithms: Search and Sort
* Basic Computer Architecture: CPU, Memory, Stack, Heap

Week 1: Fundamentals
Day 1-2: Introduction to Compilers
* Subtopics: Lexical Analysis, Syntax Analysis, Intermediate Code Generation
* Questions: What is a compiler? What are the phases of compilation?
* Benchmarks: Implement a simple lexer.
* Chat Prompts: What's the role of a lexer? Compare and contrast lexer and parser.

Day 3-4: Lexical Analysis
* Subtopics: Tokens, Regular Expressions, Finite Automata
* Questions: How are tokens generated? What role do regular expressions play?
* Benchmarks: Build a lexer for a subset of C.
* Chat Prompts: Explain Finite Automata. Discuss the role of regular expressions in lexical analysis.

Week 2: Intermediate Topics
Day 1-2: Syntax Analysis
* Subtopics: Context-Free Grammars, Parsing Techniques
* Questions: What is a parse tree? How does LL and LR parsing differ?
* Benchmarks: Implement a basic parser.
* Chat Prompts: Explain Context-Free Grammars. Describe Recursive Descent Parsing.

Day 3-4: Semantic Analysis
* Subtopics: Type Checking, Scope Resolution
* Questions: What is type inference? What are symbol tables?
* Benchmarks: Implement type checking in your compiler.
* Chat Prompts: Dive into type checking techniques. Discuss symbol tables and scope resolution.

Week 3: Advanced Topics (Optional)
Day 1-2: Intermediate Code Generation
* Subtopics: Three-Address Code, Quadruples, Triples
* Questions: What is SSA form? How is intermediate code optimized?
* Benchmarks: Generate intermediate code from your parse tree.
* Chat Prompts: Discuss the need for intermediate code. Explain SSA form and its advantages.

Day 3-4: Code Optimization & Final Code Generation
* Subtopics: Peephole Optimization, Register Allocation
* Questions: How is final code generated? What are optimization techniques?
* Benchmarks: Implement basic optimizations and code generation.
* Chat Prompts: Discuss different optimization techniques. Describe the process of register allocation.

Resources
* "Compilers: Principles, Techniques, and Tools" by Alfred V. Aho
* LLVM and GCC documentation
* Online Compiler Construction Courses

Additional Notes
* Projects are essential; theory alone won't cut it.
* For quick iterations, consider building a compiler for a simpler language first.
* Code reviews can provide valuable insights into better practices.

This gives us a nice starting point for building out our curriculum.

Breaking Down Sub-Topics

Armed with a curriculum, we can learn some stuff. Transitioning from the above bullet list to actual content is a bit tricky, since GPT-4 tends to veer off on tangents and is hard to steer back to the curriculum. My suggestion is to start a new chat session and roll sections of the curriculum into the prompt.


I'm following a curriculum for learning [TOPIC]. The timeframe for the curriculum is [TIMELINE].
I'm [CURRENT TIME FRAME] into the curriculum, and covering [CURRENT TOPIC]. We'll be covering:

[SUBTOPICS, QUESTIONS, BENCHMARKS AND CHAT PROMPTS FOR THIS SECTION]


1. Address this area at a high level. Give me historical context, definitions and answer the questions and chat prompts.
2. Break the subtopics into smaller pieces following the same structure.
3. Give me a project to implement for this section.

Here we give the model context as to what we’re learning, the timeframe within which we’re learning it and what we’re covering. We also ask for additional context about the topic. The generated response should provide a lot of follow-up questions that can be used as prompts for further exploration if needed. Finally, we ask for a project implementation. We’ll explore in more detail how to create exercises and project requirements in a few sections. Below I’ll demonstrate an example prompt using the Syntax Analysis section from Week 2, Days 1-2 from the curriculum we generated earlier.


I'm following a curriculum for learning compilers. The timeframe for the curriculum is 2 weeks, 3-4 days each week. I'm into Week 2 of the curriculum, week 2 is intermediate topics and the first 2 days are "Syntax Analysis". We'll be covering:

Subtopics: Context-Free Grammars, Parsing Techniques
Questions: What is a parse tree? How does LL and LR parsing differ?
Benchmarks: Implement a basic parser.
Chat Prompts: Explain Context-Free Grammars. Describe Recursive Descent Parsing.

1. Address this area at a high level. Give me historical context, definitions and answer the questions and chat prompts.
2. Break these subtopics into smaller pieces following the same structure.
3. Give me a project to implement for this section

And the sample response:


1. High-Level Overview
Historical Context
Syntax analysis, or parsing, dates back to early work in linguistics and formal language theory. Noam Chomsky introduced context-free grammars (CFG) in the 1950s. In computing, parsing is crucial in compilers, interpreters, and many other data-processing tools.

Definitions
* Context-Free Grammar (CFG): A set of production rules that define a language. Comprises terminals, non-terminals, start symbol, and production rules.
* Parse Tree: Hierarchical tree that represents the structure of the source code according to a grammar.
* LL Parsing: Top-down parsing, scans left-to-right, and uses leftmost derivation.
* LR Parsing: Bottom-up parsing, scans left-to-right, and uses rightmost derivation in reverse.

Questions
* Parse Tree: Visual representation of how a grammar maps to the input string.
* LL vs LR: LL starts from the top and works down, predicting input. LR starts from the bottom and works up, reducing input to grammar.

Chat Prompts
* Context-Free Grammars: Rule-based way to generate strings in a language. Key in language design and parsing.
* Recursive Descent Parsing: A type of LL parser that uses recursion to break down the input into its grammatical components

2. Breakdown of Subtopics
Context-Free Grammars
* Terminals: The basic symbols of the language.
* Non-terminals: Symbols used to define the language.
* Start Symbol: Where the derivation starts.
* Production Rules: How terminals and non-terminals transform.

Parsing Techniques
* Top-Down Parsing: Start from root and predict leaves; includes LL.
* Bottom-Up Parsing: Start from leaves and reduce to root; includes LR.
* Predictive Parsing: A simplified, non-recursive version of LL.
* Recursive Descent: A subtype of LL, uses recursion for parsing.

3. Project
Implement a Basic Parser
* Goal: Parse a simplified arithmetic expression (e.g., 2+3*4).
 * Sub-goals:
  * Define a CFG for the arithmetic expression.
  * Implement tokenization.
  * Implement a recursive descent parser based on the CFG.
 * Milestones:
  * Tokenization of input into numbers, operators, parentheses.
  * Parse tokens into a parse tree.
  * Evaluate the parse tree to get the final result.

You can implement this in a language of your choice. Use unit tests to verify each sub-goal.
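
For a sense of scale, here’s a minimal Python sketch of that project. It is one possible solution, not the one GPT-4 had in mind: the grammar, the tuple-based tree (closer to an abstract syntax tree than a full parse tree) and every name in it are assumptions made for illustration.

```python
import operator
import re

# Assumed grammar (the response above leaves the CFG to the reader):
#   expr   -> term (('+' | '-') term)*
#   term   -> factor (('*' | '/') factor)*
#   factor -> NUMBER | '(' expr ')'

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def tokenize(text):
    """Split the input into ("NUM", int) and ("OP", char) tokens."""
    tokens = []
    for lexeme in re.findall(r"\d+|\S", text):  # whitespace is skipped
        if lexeme.isdecimal():
            tokens.append(("NUM", int(lexeme)))
        elif lexeme in "+-*/()":
            tokens.append(("OP", lexeme))
        else:
            raise SyntaxError(f"unexpected character {lexeme!r}")
    return tokens


class Parser:
    """Recursive descent: one method per non-terminal in the grammar."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ("EOF", None)

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        node = self.term()
        while self.peek() in (("OP", "+"), ("OP", "-")):
            _, op = self.advance()
            node = (op, node, self.term())  # left-associative
        return node

    def term(self):
        node = self.factor()
        while self.peek() in (("OP", "*"), ("OP", "/")):
            _, op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self):
        kind, value = self.advance()
        if kind == "NUM":
            return value
        if (kind, value) == ("OP", "("):
            node = self.expr()
            if self.advance() != ("OP", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {value!r}")


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek()[0] != "EOF":
        raise SyntaxError("unexpected trailing input")
    return tree


def evaluate(node):
    """Walk the tree bottom-up; leaves are ints, inner nodes are (op, left, right)."""
    if isinstance(node, int):
        return node
    op, left, right = node
    return OPS[op](evaluate(left), evaluate(right))


print(parse("2+3*4"))            # ('+', 2, ('*', 3, 4))
print(evaluate(parse("2+3*4")))  # 14
```

Because `term` is called from inside `expr`, multiplication binds tighter than addition without any explicit precedence table, which is the main thing the exercise is meant to surface.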

The Session Lesson Plan

While building outlines is fun and gives us some structure, there’s not a whole lot of meat to work with in terms of content. The next step is to flesh out a single subtopic. We can call this the session lesson plan, as ideally it’s what you’d be trying to cover in a single learning session.

I’d recommend modifying the custom instructions for these prompts. The custom instructions we were previously using won’t give much in terms of content, specifically because we asked it to “be terse” and “answer as if [it] were a busy staff engineer”.


You're an advanced computer science professor with many years of experience.

You're a gifted tutor and often use a socratic approach to helping learners.

You primarily teach, give context and answer questions.

You have a deep knowledge of resources and domain knowledge and aren't afraid to encourage further reading at specific, well known sources.

This will give us a more Socratic model that will make for a better tutor. I need to explore it more, but perhaps some combination of the two custom instructions will make for a better prompt. In the meantime, let’s build a strong prompt for exploring the subtopics more deeply. Create a new chat and try out the following:


I'm studying [TOPIC] and want to learn about [SUB-TOPIC]. Prepare 4 or more sentences about each of the following:

1. High level summary
2. Key Terms & Definitions
3. History
4. Detailed example
5. Practical Applications
6. Comparative Analysis
7. Common misconceptions and caveats
8. Pros and Cons
9. Further Resources
10. Visualization
11. Self-assessment Questions
12. Experts and Pioneers in the field
13. Follow-up questions

Feel free to add or remove any of the numbered items. Some of them may be more or less helpful depending on the topic you’re covering. Adding extra context or requests to some of them can also be helpful, such as:


10. Visualization (Create an ascii visualization of [SUB-TOPIC])

The response will likely end by asking if you want to dive more deeply into any of the areas. Feel free to explore here and try stuff out. Here’s an example interaction:


I'm studying compilers and want to learn about type checking in the context of writing compilers. Prepare 4 or more sentences about each of the following:

1. High level summary
(cont... Same as the template)


1. High-Level Summary
Type checking is a phase in the process of compiling a program where the compiler ensures that the operations in the source code are semantically valid and conform to the types they are supposed to operate on. In essence, it aims to catch type errors, such as adding an integer to a string, before runtime. Type checking can be either static (done at compile time) or dynamic (done at runtime), and different programming languages have varying levels of strictness in their type systems.

2. Key Terms & Definitions
Type System: The collection of rules that assign a property called a 'type' to various constructs in a computer program.
Static Type Checking: The process of verifying the type constraints of a program at compile-time.
Dynamic Type Checking: Type checking done during program execution.
Type Inference: The ability of the compiler to automatically deduce the types of expressions.
Polymorphism: The ability to write code that can work on values of multiple types.

3. History
Type checking has its roots in mathematical logic and type theory, which were formalized in the early 20th century by logicians like Bertrand Russell and Alonzo Church. The implementation of type systems in programming languages became prominent with the advent of languages like ALGOL and later, C, which introduced strong static typing. The development of languages like Haskell has taken type systems to an even higher level of sophistication.

4. Detailed Example
Consider a simple example using pseudo-code. Let's say you have a function add(a: int, b: int) that adds two integers. If you try to call add("hello", 2), a static type checker will flag this as an error during compilation, stating that a string cannot be added to an integer.

5. Practical Applications
Type checking is crucial for building reliable, robust software. It minimizes the number of bugs and vulnerabilities related to incorrect type usage, thereby reducing the debugging time and making the codebase easier to understand and maintain.

6. Comparative Analysis
In static type checking, errors are caught at compile-time, making the compiled code more efficient. However, it can make the development process slower and more rigid. Dynamic type checking, common in languages like Python and Ruby, provides more flexibility but can lead to runtime errors if types are misused.

7. Common Misconceptions and Caveats
Many people believe that dynamic typing is inherently less safe than static typing. While dynamic typing can lead to runtime errors, many modern dynamically-typed languages offer ways to mitigate these issues. Also, static typing is not a guarantee against all kinds of errors; logical errors can still exist.

8. Pros and Cons
Pros:
- Type checking can catch errors early, improving code reliability.
- Enhances code readability and maintainability.
Cons:
- Static type checking can make the code more verbose.
- Dynamic type checking can lead to runtime errors if not handled carefully.

9. Further Resources
- "Types and Programming Languages" by Benjamin C. Pierce
- "Compilers: Principles, Techniques, and Tools" by Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman

10. Visualization
Imagine your code as a pipeline, and the type checker as a filter in that pipeline. Only data (variables, expressions, etc.) that match the expected types can pass through this filter, ensuring that what comes out the other end is type-safe.

11. Self-assessment Questions
- What is the difference between static and dynamic type checking?
- How does type inference work?
- Can you list three practical benefits of type checking?

12. Experts and Pioneers in the Field
- Alonzo Church: For his foundational work in type theory.
- Robin Milner: For his work on type inference in the ML programming language.
- Benjamin C. Pierce: Known for his contributions to the field and his seminal book "Types and Programming Languages".

13. Follow-up Questions
- Would you like to explore the mathematical foundations of type systems in more detail?
- Are you interested in specific type systems used in modern languages like Rust or Haskell?
- How do you plan to apply type checking in your own projects or research?
- Would you like to delve into any of these points in more detail?
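
To make the add("hello", 2) example from item 4 concrete, here’s a small Python sketch of a static check. It is an illustration under assumptions, not how any real compiler does it: the `SIGNATURES` table, the `("call", name, args)` tuple encoding and the `TypeCheckError` name are all made up for this post. The point is that the checker only looks at types and never runs `add`.

```python
# Hypothetical signature table: function name -> (parameter types, return type).
SIGNATURES = {"add": ((int, int), int)}


class TypeCheckError(Exception):
    """Raised when an expression violates a declared signature."""


def type_of(expr):
    """Return the type of an expression without evaluating any call.

    Literals are plain Python values; calls are tuples ("call", name, args).
    """
    if isinstance(expr, tuple) and expr and expr[0] == "call":
        _, name, args = expr
        if name not in SIGNATURES:
            raise TypeCheckError(f"unknown function {name!r}")
        param_types, return_type = SIGNATURES[name]
        if len(args) != len(param_types):
            raise TypeCheckError(
                f"{name} expects {len(param_types)} arguments, got {len(args)}"
            )
        for index, (arg, expected) in enumerate(zip(args, param_types), start=1):
            actual = type_of(arg)  # nested calls are checked recursively
            if actual is not expected:
                raise TypeCheckError(
                    f"argument {index} of {name}: "
                    f"expected {expected.__name__}, got {actual.__name__}"
                )
        return return_type
    return type(expr)


print(type_of(("call", "add", [1, 2])).__name__)  # int

try:
    type_of(("call", "add", ["hello", 2]))
except TypeCheckError as error:
    print(error)  # argument 1 of add: expected int, got str
```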

Validating Understanding

As part of the learning process, validating or testing yourself is important. We can use prompting to create some open-ended questions to explore our understanding of a given topic.


I've just studied [SUB-TOPIC] as part of a syllabus on [TOPIC]. To validate my understanding, please ask me:

Recall and Understanding
* Define key terms related to [TOPIC].
* Summarize the main points of [TOPIC].

* Provide a practical example where [TOPIC] can be applied.
* Pose a problem-solving question related to [TOPIC].

* Categorize the types or classes of [TOPIC].

* Suggest ways to improve or adapt [TOPIC].
* Integrate [TOPIC] with [ANOTHER TOPIC] to solve a given problem.

* Critique the strengths and weaknesses of [TOPIC].
* Recommend a course of action for [SPECIFIC SCENARIO].

* Ask me to rate my confidence level in understanding [TOPIC].
* Identify gaps in my understanding and areas for further study.

Since this section may have a lot of back and forth, it helps to break the sections up into separate prompts/chats. I’ll offer a brief example as this section is relatively open-ended.


I've just studied context-free grammars as part of a syllabus on building compilers. To validate my understanding, please ask me:

Recall and Understanding
- Define key terms related to context-free grammars.
- Summarize the main points of Context-free Grammars.


Certainly, let's delve into your understanding of context-free grammars (CFGs).
Recall and Understanding
* Define Key Terms: Could you please provide definitions for the following terms related to context-free grammars?
  * Terminal
  * Non-terminal
  * Production rule
  * Derivation
  * Parse tree
* Summarize Main Points: Can you provide a concise summary of the main points concerning context-free grammars? What roles do they play in compiler design, and why are they important?

Feel free to be as detailed as you like. This will give us a good starting point to explore the topic further.
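
One way to check your own answers to those definition questions is to write the terms down as data. The sketch below is a toy, with an assumed two-rule grammar and a hypothetical `leftmost_derivation` helper: the dictionary keys are the non-terminals, each list on the right is a production rule, anything that is not a key is a terminal, and the printed steps form a derivation.

```python
# A toy grammar (an assumption for illustration):
#   E -> E + T | T
#   T -> num
GRAMMAR = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["num"]],
}


def leftmost_derivation(start, choices):
    """Rewrite the leftmost non-terminal once per entry in `choices`.

    Each entry is the index of the production rule to apply.
    Raises StopIteration if no non-terminal is left to rewrite.
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        index = next(i for i, symbol in enumerate(form) if symbol in GRAMMAR)
        form[index:index + 1] = GRAMMAR[form[index]][choice]
        steps.append(" ".join(form))
    return steps


for step in leftmost_derivation("E", [0, 1, 0, 0]):
    print(step)
# E
# E + T
# T + T
# num + T
# num + num
```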

Creating Exercises

Ok, so now we want some hands-on coding practice. This exercise generator is handy for tailoring an exercise to a preferred topic and difficulty level, and it doesn’t require the context of the previous chats. Don’t forget to remove the Solution & Explanation section if you don’t want a sneak peek at the answer.


I'm learning [TOPIC] and I want to focus on [SUB-TOPIC]. Generate a [DIFFICULTY-LEVEL] programming exercise for me with the following:
* Conceptual Foundation: Briefly explain the concept this exercise will focus on.
* Learning Objectives: State the learning objectives for this exercise.
* Requirements & Constraints: List the tasks to be accomplished and any constraints.
* Sample Inputs & Outputs: Provide sample input and expected output for clarity.
* Hints & Tips: Offer any hints or resources that could help in solving the exercise.
* Extension Opportunities: Suggest ways to extend or adapt the exercise.
* Assessment Criteria: Define the criteria for successful completion.
* Solution & Explanation: Optionally, provide a sample solution and explanation for review later.
* Reflection: Pose questions for self-reflection after the exercise is completed.

I ran this with the topic of building compilers, the sub-topic of lexers and an intermediate difficulty, and got the following sample response.


Exercise: Implementing a Simple Lexical Analyzer for a Subset of Python

Conceptual Foundation
Lexical analysis is the first phase of a compiler's frontend. In this phase, the source code is converted into a sequence of tokens. Each token represents a particular syntactic role in the language. In this exercise, we will focus on building a simple lexical analyzer for a small subset of the Python language that recognizes identifiers, keywords, and some operators.

Learning Objectives
- Understand the basic principles behind lexical analysis.
- Learn to identify different types of tokens like identifiers, keywords, and operators.
- Gain experience with regular expressions and state machines.

Requirements & Constraints
1. Implement the lexical analyzer in Python.
2. Your lexical analyzer should be able to recognize the following types of tokens:
 - Identifiers: a sequence of letters and numbers, starting with a letter. For simplicity, assume identifiers are case-sensitive.
 - Keywords: if, else, while, for, def
 - Operators: =, +, -, *, /
- You can only use standard libraries.
- The lexical analyzer should ignore white spaces and new lines.

Sample Inputs & Outputs
if x == 10:
  y = x + 1

 ('KEYWORD', 'if'),
 ('IDENTIFIER', 'x'),
 ('OPERATOR', '='),
 ('INTEGER', '10'),
 ('IDENTIFIER', 'y'),
 ('OPERATOR', '='),
 ('IDENTIFIER', 'x'),
 ('OPERATOR', '+'),
 ('INTEGER', '1')

Hints & Tips
- Familiarize yourself with regular expressions in Python.
- You can use Python's re library to implement the token matching logic.

Extension Opportunities
1. Add support for more data types like floats, strings, and booleans.
2. Extend the lexical analyzer to recognize more operators and keywords.
3. Implement a pretty-printer for the tokens.

Assessment Criteria
- The lexical analyzer should correctly tokenize the sample input.
- The code should be well-commented, following best practices for readability and maintainability.

Reflection
* Was the regular expression approach easier or more difficult than you anticipated? Why?
* Could you think of other ways to extend this lexical analyzer?
* How would you adapt this exercise to handle more complex language constructs?
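
Spoiler warning: if you’d rather attempt the exercise first, skip this block. It’s a minimal sketch of one possible solution using Python’s `re` module with named groups, not the solution GPT-4 would produce. Two things in it are assumptions: the `INTEGER` token type (it appears in the sample output but not in the requirements) and the decision to reject characters outside the listed token types. Under those rules the `:` in the sample input would be an error and `==` would come out as two `=` tokens, so the demo uses a simpler line.

```python
import re

KEYWORDS = {"if", "else", "while", "for", "def"}

# Order matters: at each position the first alternative that matches wins.
TOKEN_SPEC = [
    ("INTEGER", r"\d+"),
    ("IDENTIFIER", r"[A-Za-z][A-Za-z0-9]*"),
    ("OPERATOR", r"[=+\-*/]"),
    ("SKIP", r"\s+"),        # white space and new lines are ignored
    ("MISMATCH", r"."),      # anything else is an error
]
MASTER_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(source):
    """Return a list of (token_type, text) pairs for the source string."""
    tokens = []
    for match in MASTER_RE.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(
                f"unexpected character {text!r} at position {match.start()}"
            )
        # Keywords match the identifier pattern, so reclassify them afterwards.
        if kind == "IDENTIFIER" and text in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, text))
    return tokens


print(tokenize("y = x + 1"))
# [('IDENTIFIER', 'y'), ('OPERATOR', '='), ('IDENTIFIER', 'x'), ('OPERATOR', '+'), ('INTEGER', '1')]
```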

Storing and Revisiting Material

Finally, we need a way to catalog our learning. I’d suggest keeping a notes file in your project directory, or several if you find yourself getting a bit carried away! Don’t be afraid to iterate on or change the prompts as the models evolve or if the outputs aren’t meeting your expectations. Sometimes adding or removing one or two lines can make a difference in the response. Most importantly though, I highly recommend keeping a prompt library somewhere.
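
A prompt library doesn’t need to be fancy. As a rough sketch, assuming one plain-text file per prompt and `$placeholder` markers (Python’s `string.Template` syntax) in place of the `[TOPIC]`-style brackets used in this post, it could look like the following. The function names and the directory layout are hypothetical, and the demo writes to a temporary directory so it leaves nothing behind.

```python
import tempfile
from pathlib import Path
from string import Template


def save_prompt(library_dir: Path, name: str, text: str) -> Path:
    """Write a prompt template to <library_dir>/<name>.txt and return its path."""
    library_dir.mkdir(parents=True, exist_ok=True)
    path = library_dir / f"{name}.txt"
    path.write_text(text, encoding="utf-8")
    return path


def list_prompts(library_dir: Path) -> list[str]:
    """Return the names of all stored prompts, sorted alphabetically."""
    return sorted(path.stem for path in library_dir.glob("*.txt"))


def render_prompt(library_dir: Path, name: str, **values: str) -> str:
    """Load a stored template and fill in its $placeholders.

    Raises KeyError if a placeholder is left without a value.
    """
    text = (library_dir / f"{name}.txt").read_text(encoding="utf-8")
    return Template(text).substitute(**values)


with tempfile.TemporaryDirectory() as tmp:
    library = Path(tmp) / "prompts"
    save_prompt(
        library,
        "curriculum",
        "Generate a detailed curriculum for learning $topic. "
        "The timeframe for the curriculum will be $timeframe.",
    )
    print(list_prompts(library))  # ['curriculum']
    print(render_prompt(library, "curriculum",
                        topic="compilers", timeframe="2 weeks"))
```

Plain files keep the library easy to diff and version alongside a project, which helps when iterating on a prompt’s wording.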