```mermaid
flowchart LR
    A["You describe<br/>a goal"] --> B["Agent plans<br/>the steps"]
    B --> C["Agent executes<br/>(read, write, run)"]
    C --> D["Agent checks<br/>results"]
    D -->|"Not done"| B
    D -->|"Done"| E["Returns<br/>output"]
```
Lecture 12: Responsible Use of Coding Agents

Last class we used Claude Code to turn a CV into a Quarto website. Today we go deeper: understanding how agentic tools work under the hood, using them responsibly, and applying them to two real tasks—data analysis and literature research.
Agentic concepts
What makes an agent “agentic”?
A regular chatbot generates text. An agentic tool takes actions—reading files, running code, searching the web, calling APIs. The difference is autonomy: you describe a goal, and the agent figures out the steps.
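That plan-execute-check loop can be sketched in a few lines of Python. This is a conceptual illustration, not a real agent API; `plan`, `execute`, and `check` are hypothetical callables standing in for the model's decisions:

```python
# Minimal sketch of an agentic loop (hypothetical helpers, not a real API):
# the agent plans steps toward a goal, executes them, and checks the result,
# repeating until the goal is met or the step budget runs out.
def run_agent(goal, plan, execute, check, max_rounds=10):
    """Plan steps toward `goal`, execute them, and re-plan until done."""
    for _ in range(max_rounds):
        steps = plan(goal)                     # agent decides what to do
        results = [execute(s) for s in steps]  # read files, write code, run commands
        if check(goal, results):               # agent verifies its own work
            return results
    raise RuntimeError("Goal not reached within the step budget")
```

The loop is the important part: unlike a chatbot, the agent keeps going until its own check says the goal is met.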
Three concepts make modern agents powerful: skills, MCP servers, and slash commands.
Skills
A skill is a reusable chunk of instructions that tells the agent how to do something specific. Think of it as a recipe card. When you invoke a skill, the agent loads those instructions and follows them.
For example, Claude Code has a built-in `/commit` skill. When you type `/commit`, it doesn't just run `git commit`; it reads a detailed set of instructions that tell it to:
- Check `git status` and `git diff`
- Draft a concise commit message summarizing the "why"
- Stage the right files
- Avoid committing secrets
Without the skill, you’d have to explain all of this every time. Skills encode best practices so you don’t have to repeat yourself.
MCP: Model Context Protocol
MCP (Model Context Protocol) is an open standard that lets agents connect to external services. An MCP server is a small program that exposes a set of tools the agent can call—like searching a database, querying a bioinformatics platform, or fetching data from the web.
```mermaid
flowchart LR
    A["Agent<br/>(Claude Code)"] --> B["MCP Server<br/>(Galaxy)"]
    A --> C["MCP Server<br/>(PubMed)"]
    A --> D["MCP Server<br/>(File system)"]
    B --> E["Search tools<br/>Run workflows"]
    C --> F["Search papers<br/>Fetch abstracts"]
    D --> G["Read/write<br/>local files"]
```
Why does this matter? Without MCP, an agent can only do what’s built into it. With MCP, it can plug into any service that provides an MCP server. This is the same idea as USB—a universal standard that lets different devices work together without custom adapters.
MCP was introduced by Anthropic in late 2024 and is now supported by multiple AI platforms. It’s an open protocol—anyone can build an MCP server for their service.
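As a concrete sketch, Claude Code can read MCP server registrations from a project-level `.mcp.json` file. The server names and commands below are illustrative placeholders, not endorsed packages:

```json
{
  "mcpServers": {
    "pubmed": {
      "command": "uvx",
      "args": ["pubmed-mcp-server"]
    },
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "./data"]
    }
  }
}
```

Each entry tells the agent how to launch a server; once running, the server's tools appear in the agent's toolbox alongside its built-in ones.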
Slash commands
Slash commands are shortcuts you type in the agent’s prompt to trigger specific behaviors. They’re the user-facing interface to skills and tools.
Some examples in Claude Code:
| Command | What it does |
|---|---|
| `/commit` | Stages changes and creates a Git commit |
| `/review-pr` | Reviews a pull request |
| `/help` | Shows available commands |
Slash commands keep you in flow—instead of writing a multi-sentence prompt, you type a short command and the agent knows exactly what to do.
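You can also define your own slash commands as Markdown files under `.claude/commands/` in your repository. The command below is a made-up example of what such a file might contain:

```markdown
<!-- .claude/commands/check-refs.md — invoked by typing /check-refs -->
Verify every citation in the current document:
1. Extract all DOIs and URLs.
2. Confirm each DOI resolves and the title matches.
3. Report any reference that cannot be verified.
```

The file name becomes the command name, and the file body becomes the instructions the agent follows, which is exactly how skills encode best practices.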
Putting it together
The key insight: skills define how to do things, MCP servers define what services are available, and slash commands let you trigger them quickly. Together, they make agents reliable and extensible. Instead of hoping the AI figures out the right approach, you give it structured tools and instructions.
In Claude we trust?
Agentic AI is developing at an unprecedented pace. Before we use these tools for real work, we need to talk about what they are—and what they aren’t.
The current moment
Two recent opinion pieces capture the tension:
Paul Ford (New York Times, Feb 18 2026): AI and the Future of Software — a software professional describes the moment when AI could do the technical work of his job, and what that means for the field.
Peggy Noonan (Wall Street Journal, Feb 13 2026): Brace Yourself for the AI Tsunami — on the warnings from AI creators themselves, including Anthropic CEO Dario Amodei, that this is moving faster than expected and carries risks of “terrible empowerment.”
A hammer doesn’t decide where to strike. An agent does. It reads your files, writes code, runs commands, and makes decisions about what to do next. This autonomy is what makes agents powerful—and what makes them dangerous when used carelessly.
Modern models are extremely capable, but they still make mistakes. They hallucinate references, invent data, produce plausible-looking but incorrect code, and confidently assert things that are wrong.
In research contexts—writing papers, reporting results, citing literature—you must never trust AI output without verification. Every claim needs a source. Every number needs a check. Every citation needs to be confirmed as real. The agent is a collaborator, not an authority.
Setting things up
Before we start, let’s set up a fresh workspace.
Step 1: Log into GitHub
Go to github.com and sign in.
Step 2: Create a new repository
1. Click “+” → “New repository” in the top-right corner
2. Name it something like research-initiated
3. Check “Add a README file”
4. Click “Create repository”
Step 3: Start a Codespace
1. In your new repo, click Code → Codespaces → “+”
2. Wait for the Codespace to spin up and open the terminal
Step 4: Install Claude Code and set your API key
Follow the same setup as Lecture 11 — Steps 2 through 4:
1. Install Claude Code:

```shell
curl -fsSL https://claude.ai/install.sh | bash
```

2. Restart the terminal (or run `source ~/.bashrc`)
3. Set your API key (replace with the key I give you):

```shell
export ANTHROPIC_API_KEY="your-key-here"
```

4. Configure Claude Code for API key authentication:

```shell
mkdir -p ~/.claude
echo 'echo ${ANTHROPIC_API_KEY}' > ~/.claude/anthropic_key_helper.sh
chmod +x ~/.claude/anthropic_key_helper.sh

LAST20=$(echo -n "$ANTHROPIC_API_KEY" | tail -c 20)
cat > ~/.claude.json << EOF
{
  "customApiKeyResponses": {
    "approved": ["$LAST20"],
    "rejected": []
  },
  "hasCompletedOnboarding": true
}
EOF

cat > ~/.claude/claude.json << EOF
{
  "apiKeyHelper": "$HOME/.claude/anthropic_key_helper.sh"
}
EOF
```

5. Start Claude Code:

```shell
claude
```

Data analysis with plan files
Why plan files?
In the previous lecture we typed a single prompt and let Claude Code run. That works for simple tasks, but for data analysis we want something better:
- Reproducibility: anyone can read the plan and understand exactly what was done
- Iteration: you can refine the plan, re-run it, and compare results
- Documentation: the plan itself becomes part of your analysis record
Instead of ad-hoc prompts, we’ll write plan files—structured Markdown documents that describe what we want the agent to do. The agent reads the plan and produces a Jupyter notebook as output.
The dataset: Anscombe’s quartet
Anscombe’s quartet is a famous set of four datasets that have nearly identical summary statistics (mean, variance, correlation, regression line) but look completely different when plotted. It’s a classic demonstration of why visualization matters.
Step 1: Download the data
Open a second terminal in the Codespace so you can keep Claude Code running. Click the “+” button in the top-right corner of the terminal panel, or use the shortcut Ctrl+Shift+`. Then download the data:
```shell
curl -o anscombe_quartet.tsv https://raw.githubusercontent.com/nekrut/bda/main/data/anscombe_quartet.tsv
```

Take a look at what we downloaded:

```shell
head anscombe_quartet.tsv
```

You should see three columns: dataset, x, and y, with four groups (I, II, III, IV).
Step 2: Write a plan file
We’ll write the plan together in class. Create a file called anscombe_plan.md in the Codespace editor and describe what you want the analysis to do. Think about:
- What summary statistics to compute for each group
- What visualizations to create
- What the output should be (a Jupyter notebook)
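As a starting point, a minimal plan file might look like the sketch below. Treat it as an illustration to adapt, not a required format:

```markdown
# Plan: Anscombe's quartet analysis

## Input
- `anscombe_quartet.tsv` (columns: dataset, x, y; groups I-IV)

## Steps
1. Load the TSV and split by `dataset`.
2. For each group, compute the mean, variance, correlation, and
   least-squares regression line.
3. Plot all four groups as a 2x2 scatter grid with fitted lines.

## Output
- A Jupyter notebook with code, a summary-statistics table, the
  plots, and a short written interpretation.
```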
Step 3: Execute the plan
Switch back to the terminal running Claude Code and feed it your plan:
```
Read anscombe_plan.md and execute it. Create a Jupyter notebook with the complete analysis.
```

Watch what Claude Code does: it reads the plan, writes code, and produces the notebook.
Step 4: Review and iterate
Open the generated notebook and check:
- Are the summary statistics correct?
- Do the plots look right?
- Is the interpretation reasonable?
If something needs fixing, update the plan file and re-run. This is the power of plan-based workflows: you refine the specification, not the code.
The plan file approach works for any analysis, not just Anscombe’s quartet. For your own research, write a plan describing your data, your questions, and your expected outputs. Then let the agent do the implementation while you focus on the science.
Literature research with agentic tools
The challenge
Finding and summarizing relevant papers is one of the most time-consuming parts of research. An agentic tool can help—but only if you verify everything it produces.
In this example, we’ll research recent papers on a topic of your choice. I’ll demonstrate with the CDKN2A gene across mammals—a tumor suppressor gene (also known as p16) that plays a critical role in cell cycle regulation. Studying it across species helps us understand cancer resistance in long-lived mammals.
Step 1: Write a literature search plan
We’ll write this plan together in class. Create a file (e.g., lit_search_plan.md) describing:
- What topic you’re searching for
- What search terms to use
- What information to extract per paper (title, authors, DOI, summary)
- What verification steps to perform
The key requirement: every paper found must be verified. The plan should explicitly instruct the agent to confirm that DOIs resolve, titles match, and authors are correct.
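Part of that verification can itself be scripted. The sketch below checks DOI syntax and builds the resolver URL to open; the function names are illustrative, and the final human check is deliberately left to you, since a resolving URL alone does not prove the title or authors match:

```python
import re

# Real DOIs start with a "10." directory prefix, a registrant code of
# 4-9 digits, a slash, and a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

def is_plausible_doi(doi: str) -> bool:
    """Cheap syntactic check; a well-formed DOI can still be fabricated."""
    return bool(DOI_PATTERN.match(doi))

def resolver_url(doi: str) -> str:
    """URL to open in a browser to confirm the DOI leads to a real paper."""
    return f"https://doi.org/{doi}"
```

A hallucinated reference often fails even this syntactic check; one that passes still needs the browser test and a title-and-author comparison.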
Step 2: Execute the plan
Feed the plan to Claude Code and let it search for papers and create a literature review document.
The agent will find papers and write summaries. Some will be real. Some might be hallucinated—fake papers with plausible-sounding titles and author names that don’t actually exist.
You must check every single reference. Click every DOI. Confirm every title. This is not optional—citing a non-existent paper in a thesis or publication is a serious academic integrity issue.
Step 3: Validate the output
For each paper in the generated review:
1. Click the DOI or URL — does it lead to a real paper?
2. Does the title match?
3. Are the authors correct?
4. Does the summary accurately reflect the paper’s findings?
Mark any discrepancies. If a paper doesn’t exist, delete it. If details are wrong, correct them.
This verification step is not a one-time annoyance—it’s the core skill. The agent does the tedious initial search. You do the critical thinking. This division of labor is what makes agentic tools useful rather than dangerous.
Summary
What we covered today
- Agentic concepts: skills, MCP servers, and slash commands—how agents extend their capabilities
- AI safety: the current moment, risks of agentic AI, and why verification is non-negotiable
- Plan-based data analysis: writing structured plans for reproducible analysis with Anscombe’s quartet
- Literature research: using agents to find papers while maintaining academic rigor through verification
Key takeaways
- Plan files give you reproducibility and documentation for free—use them instead of ad-hoc prompts
- MCP servers let agents connect to external services through a universal protocol
- Skills and slash commands encode best practices so agents behave consistently
- Never trust agent output in research contexts without independent verification
- The agent is a collaborator, not an authority—you are always responsible for the final result