Python syntax reference
General notes I keep for stdlib data work: parsing, grouping, counting, sorting, aggregation, dates.
File IO
with open("data.csv") as f:
    text = f.read()              # whole file as one string
lines = text.splitlines()        # list of lines, no trailing \n (a second f.read() on the same handle would return "")
with open("data.csv") as f:
    for line in f:               # iterate line by line; each line keeps its trailing \n
        line = line.strip()      # drop surrounding whitespace, including the \n
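with open("out.txt", "w") as f:  # "w" overwrites (truncates) the file, "a" appends
    f.write("hello\n")           # write() does not add a newline for you
A with block closes the file automatically when it exits, even if an exception is raised. f.read() returns the whole file as one string. f.readlines() returns a list of lines with \n kept. Iterating over f reads one line at a time, so it is memory-friendly for big files.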
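A quick round trip to make the read/write modes concrete. It assumes a throwaway file in a temporary directory (the path and file contents are illustrative, not from these notes) and also shows that a second f.read() on the same handle returns an empty string.

```python
import os
import tempfile

# Hypothetical scratch path; any writable location works.
path = os.path.join(tempfile.mkdtemp(), "out.txt")

with open(path, "w") as f:          # "w" creates or truncates the file
    f.write("alpha\nbeta\n")
with open(path, "a") as f:          # "a" appends to the end
    f.write("gamma\n")

with open(path) as f:
    text = f.read()                 # whole file as one string
    again = f.read()                # cursor is already at EOF, so this is ""

with open(path) as f:
    kept = f.readlines()            # newline kept on each line

with open(path) as f:
    stripped = [line.strip() for line in f]  # streamed one line at a time

print(repr(text))    # 'alpha\nbeta\ngamma\n'
print(repr(again))   # ''
print(kept)          # ['alpha\n', 'beta\n', 'gamma\n']
print(stripped)      # ['alpha', 'beta', 'gamma']
```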
Approach for a data problem
- Confirm the input shape (list of dicts? CSV string? list of strings?).
- Note the edge cases up front: empty input, ties, missing fields, malformed rows.
- Say the plan in one sentence before writing: parse, then group, then aggregate, then sort/select.
- Talk through each step while writing it.
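The four-step plan above, sketched end to end. This is a minimal illustration under assumptions: the function name top_regions, the region/amount fields, and the sample rows are all made up. It returns an empty list for empty input, skips malformed rows, and breaks ties by name.

```python
from collections import defaultdict

def top_regions(data, k=2):
    """Hypothetical helper: parse -> group -> aggregate -> sort/select."""
    # Edge case: empty or whitespace-only input gives an empty result.
    lines = data.strip().splitlines()
    if not lines:
        return []

    # 1. Parse: the first line is the header.
    header = lines[0].split(",")
    records = []
    for line in lines[1:]:
        row = line.split(",")
        if len(row) != len(header):      # malformed row: wrong field count
            continue
        record = dict(zip(header, row))
        try:
            record["amount"] = int(record["amount"])
        except ValueError:               # malformed row: non-numeric amount
            continue
        records.append(record)

    # 2. Group and 3. aggregate: total amount per region.
    totals = defaultdict(int)
    for r in records:
        totals[r["region"]] += r["amount"]

    # 4. Sort/select: highest total first, ties broken by region name A-Z.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]

data = """region,amount
west,12
east,7
north,oops
east,3
south,10
bad row
"""
print(top_regions(data, k=3))   # [('west', 12), ('east', 10), ('south', 10)]
print(top_regions(""))          # []
```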
Parse a CSV string
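records = []
lines = data.strip().splitlines()
header = lines[0].split(",")
for line in lines[1:]:
    row = line.split(",")
    record = dict(zip(header, row))
    record["amount"] = int(record["amount"])  # fields arrive as strings; cast numeric ones
    records.append(record)
The first line is the header. dict(zip(header, row)) pairs each column name with its value to build the record. Cast numeric fields, since split() always yields strings. For free text, line.split() splits on any run of whitespace. For key=value data such as "a=1;b=2", split(";") first, then split("=") on each piece.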
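A sketch of the other two input shapes mentioned above, plus one swapped-in alternative: the stdlib csv module. Plain split(",") breaks when a quoted field contains a comma, while csv.DictReader respects the quotes. The sample strings are made up for illustration.

```python
import csv
import io

# Free text: split() with no argument splits on any run of whitespace.
words = "  the quick   brown fox ".split()
print(words)                     # ['the', 'quick', 'brown', 'fox']

# Key=value pairs: split on ";" first, then split each pair on the first "=".
raw = "name=ann;age=31;city=oslo"
pairs = dict(part.split("=", 1) for part in raw.split(";") if part)
pairs["age"] = int(pairs["age"])  # cast numeric fields, as with the CSV records
print(pairs)                     # {'name': 'ann', 'age': 31, 'city': 'oslo'}

# Quoted commas: DictReader uses the first row as the header and honors quotes.
data = 'name,amount,note\nann,5,"late, but paid"\nbob,7,ok\n'
records = []
for record in csv.DictReader(io.StringIO(data)):
    record["amount"] = int(record["amount"])  # every field still arrives as a str
    records.append(record)
print(records[0]["note"])        # late, but paid
```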
sorted
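sorted(xs, key=lambda x: x["amount"])
sorted() returns a new list and leaves xs unchanged. Pass reverse=True for descending order. key=lambda x: (-x["amount"], x["name"]) sorts by amount descending, then by name ascending to break ties (negation only works on numeric fields). key=lambda w: w.lower() sorts strings case-insensitively.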
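A small worked example on made-up records showing the key patterns above. It also shows that sorted() is stable: records with equal keys keep their input order, including with reverse=True.

```python
people = [
    {"name": "cara", "amount": 5},
    {"name": "abe", "amount": 9},
    {"name": "bo", "amount": 5},
]

# Ascending by amount; the two 5s keep their input order (cara before bo).
asc = sorted(people, key=lambda x: x["amount"])
print([p["name"] for p in asc])      # ['cara', 'bo', 'abe']

# Descending by amount; stability still holds for the tie.
desc = sorted(people, key=lambda x: x["amount"], reverse=True)
print([p["name"] for p in desc])     # ['abe', 'cara', 'bo']

# Amount descending, then name ascending to break ties.
ranked = sorted(people, key=lambda x: (-x["amount"], x["name"]))
print([p["name"] for p in ranked])   # ['abe', 'bo', 'cara']

# Case-insensitive: by default uppercase letters sort before lowercase.
words = ["banana", "apple", "Cherry"]
print(sorted(words))                       # ['Cherry', 'apple', 'banana']
print(sorted(words, key=lambda w: w.lower()))  # ['apple', 'banana', 'Cherry']
```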
defaultdict
from collections import defaultdict
d = defaultdict(int)
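d["a"] += 5                      # missing key "a" starts at 0, so d["a"] is now 5
defaultdict(int) starts missing keys at 0. defaultdict(list) starts them at [] (add items with .append). defaultdict(lambda: defaultdict(int)) gives a two-level mapping for grouping by two keys.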
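A sketch of the three defaultdict patterns on hypothetical (region, person, amount) tuples. One gotcha worth remembering: merely reading a missing key inserts it with the default value.

```python
from collections import defaultdict

# Hypothetical (region, person, amount) rows.
sales = [("west", "ann", 10), ("east", "bob", 7), ("west", "bob", 3), ("west", "ann", 2)]

# Sum per key: a missing region starts at 0.
total = defaultdict(int)
for region, _, amount in sales:
    total[region] += amount

# Group per key: a missing region starts at [].
names = defaultdict(list)
for region, name, _ in sales:
    names[region].append(name)

# Two keys: region -> person -> total.
nested = defaultdict(lambda: defaultdict(int))
for region, name, amount in sales:
    nested[region][name] += amount

print(dict(total))           # {'west': 15, 'east': 7}
print(dict(names))           # {'west': ['ann', 'bob', 'ann'], 'east': ['bob']}
print(dict(nested["west"]))  # {'ann': 12, 'bob': 3}

# Gotcha: just reading a missing key inserts it with the default.
print(total["north"], "north" in total)  # 0 True
```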
Counter
from collections import Counter
c = Counter(["a", "b", "a"])
c.most_common(2)                 # [('a', 2), ('b', 1)]
Counter(list) counts occurrences. Counter(dict) wraps existing counts. .most_common(k) returns the top k as [(item, count), ...], highest count first.
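A short example on a made-up sentence. Two behaviors worth knowing: looking up a missing item returns 0 without raising or inserting it, and most_common() keeps first-seen order among items with equal counts.

```python
from collections import Counter

words = "the cat and the hat and the bat".split()
c = Counter(words)

print(c["the"], c["dog"])   # 3 0  (missing item counts as 0; nothing is inserted)
print(c.most_common(2))     # [('the', 3), ('and', 2)]
print(c.most_common(3))     # [('the', 3), ('and', 2), ('cat', 1)]  ties keep first-seen order

# Counter(dict) wraps counts you already have; update() adds more occurrences.
stock = Counter({"apple": 2, "pear": 1})
stock.update(["apple", "fig"])
print(dict(stock))          # {'apple': 3, 'pear': 1, 'fig': 1}
```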
statistics
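import statistics
statistics.mean(xs)    # arithmetic mean; raises StatisticsError on empty xs
statistics.median(xs)  # middle value (mean of the two middle values for even length)
statistics.stdev(xs)   # sample standard deviation; needs at least 2 values
sum(xs)                # builtins from here down, no import needed
min(xs)                # min/max raise ValueError on empty xs (pass default=... to avoid it)
max(xs)
round(x, 2)            # round to 2 decimal places
len(xs)
datetime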
from datetime import datetime, timedelta
now = datetime.now() # current time
dt = datetime.fromisoformat("2026-05-28T09:00:00") # from ISO string
dt = datetime(2026, 5, 28, 9, 30) # from numbers: y, m, d, h, min
dt = datetime.strptime("05/28/2026", "%m/%d/%Y") # from non-ISO string
now - timedelta(days=30) # point minus duration -> point (30 days ago)
now + timedelta(hours=2) # point plus duration -> point
gap = dt2 - dt1 # point minus point -> duration (timedelta)
gap.days # whole days in the duration
gap.total_seconds() # entire duration in seconds, as a float
dt < other_dt # datetimes compare directly
dt.year, dt.month, dt.day, dt.hour # individual fields
dt.weekday() # 0=Monday ... 6=Sunday
A timedelta is a duration (a length of time); a datetime is a point in time. A timedelta on its own is not tied to any date: add it to (or subtract it from) a datetime to get a point in time again.
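A sketch putting the datetime pieces together on hypothetical event timestamps: build a 30-day cutoff, filter by direct comparison, and measure the gap between two events. now is pinned to a fixed date (an assumption for reproducibility; real code would call datetime.now()).

```python
from datetime import datetime, timedelta

# Pinned "now" so the output is reproducible; real code would use datetime.now().
now = datetime(2026, 5, 28, 9, 0)

# Hypothetical (event, ISO timestamp) pairs.
events = [
    ("signup", "2026-04-01T12:00:00"),
    ("login", "2026-05-20T08:30:00"),
    ("purchase", "2026-05-27T18:45:00"),
]

# Point minus duration -> point: the start of the 30-day window.
cutoff = now - timedelta(days=30)
print(cutoff)                          # 2026-04-28 09:00:00

# Parse each timestamp, then compare datetimes directly.
recent = [name for name, ts in events if datetime.fromisoformat(ts) >= cutoff]
print(recent)                          # ['login', 'purchase']

# Point minus point -> duration (a timedelta).
gap = datetime.fromisoformat(events[2][1]) - datetime.fromisoformat(events[1][1])
print(gap)                             # 7 days, 10:15:00
print(gap.days, gap.total_seconds())   # 7 641700.0

print(now.weekday())                   # 3 (Thursday, since 0=Monday)
```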