Introduction
Have you heard of the Apple Refurbished Store? It’s Apple’s official channel that sells returned and display items at 15–20% off retail — but inventory appears and disappears without notice. I found myself refreshing the page several times a day looking for a specific item, which led to one thought:
“Why not just automate this?”
That’s how DeReel was born. Short for Data Extraction & REEL Engine, it’s a personal monitoring bot that tracks prices and inventory in real time and sends alerts via Telegram.
This post covers Phase 1-A — how I implemented the Apple Refurbished stock monitoring feature.
Design Goal: No Server, No Cost
My first principle was that Phase 1 must cost $0/month. Running a persistent server such as AWS EC2 for a personal project means paying even on days when nothing happens. Looking for alternatives, I landed on GitHub Actions.
- Public repositories get GitHub Actions completely free with unlimited minutes
- Supports cron scheduling for periodic execution
- The runner environment fully supports Python and Playwright
There was one problem to solve. The crawler needs to remember the previous inventory state to detect “newly stocked” items, but GitHub Actions spins up a fresh VM on every run — there’s no persistent state.
The solution turned out to be simple: save state as a JSON file and have GHA auto-commit it. The repository itself acts as the database.
```
⏰ GitHub Actions (runs every hour)
↓
🐍 Python crawler (runs on runner)
↓
💾 data/apple_refurb_state.json ← previous stock snapshot
↓
📱 Telegram alert (on change only)
↓
🤖 GHA bot auto-commits changed data/
```
Apple Refurbished Crawler Implementation
The Problem: JavaScript Rendering
The Apple Refurbished page is rendered with React, so requests + BeautifulSoup only returns an HTML shell without the product data. Playwright is needed to load the page in a real (headless) browser.
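```python
from playwright.async_api import async_playwright

async def fetch_html(url: str) -> str:
    async with async_playwright() as pw:
        browser = await pw.chromium.launch(headless=True)
        page = await browser.new_page()
        # wait until the network goes quiet so React has rendered the product grid
        await page.goto(url, wait_until="networkidle", timeout=60_000)
        html = await page.content()
        await browser.close()
        return html
```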
The wait_until="networkidle" option is critical. With it, `goto()` returns only after there have been no network connections for at least 500 ms, which gives React time to fetch and render the product data.
Discovery: Bootstrap JSON
I was about to parse the DOM directly when I spotted something far better while analyzing the page source.
```html
<script>
window.REFURB_GRID_BOOTSTRAP = {"tiles": [...], "totalResults": 42, ...};
</script>
```
This is the JSON Apple embeds to initialize the page. Reading it directly is far more reliable than DOM parsing, because it does not depend on CSS class names or markup layout.
```python
import json
import re

# Capture the object literal assigned to REFURB_GRID_BOOTSTRAP (lazy match up to "};" at end of line)
_BOOTSTRAP_RE = re.compile(
    r"window\.REFURB_GRID_BOOTSTRAP\s*=\s*(\{.+?\});\s*\n",
    re.DOTALL,
)

def _parse(self, html: str) -> list[StockResult]:  # method on the crawler class
    m = _BOOTSTRAP_RE.search(html)
    if not m:
        raise ValueError("REFURB_GRID_BOOTSTRAP not found — possible page structure change")
    tiles = json.loads(m.group(1)).get("tiles") or []
    results = []
    for tile in tiles:
        results.append(StockResult(
            site="apple_refurb",
            product_id=tile["partNumber"],
            name=tile["title"],
            # drop query parameters so the same product always maps to the same URL
            url="https://www.apple.com" + tile["productDetailsUrl"].split("?")[0],
            price=float(tile["price"]["currentPrice"]["raw_amount"]),
            currency=tile["price"].get("priceCurrency", "KRW"),
            in_stock=True,  # every listed tile is treated as in stock
        ))
    return results
```
Regex vs. BeautifulSoup
When the target is JSON embedded inside a script tag, a regex is simpler than BeautifulSoup. The catch is that if Apple changes the format, the regex simply stops matching, and an empty result would look exactly like "nothing in stock". Always raise a clear error on parse failure so you find out immediately.
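Because the regex is the part most likely to break, it helps to be able to exercise it offline. The sketch below packages the same regex and field mapping into a standalone function and runs it against SAMPLE_HTML. That sample is a made-up snippet shaped like the fields the parser reads; the real bootstrap object is far larger. StockResult here is a hypothetical stand-in dataclass, and `parse_refurb_html` is an illustrative name.

```python
import json
import re
from dataclasses import dataclass

# Same pattern as the parser above
_BOOTSTRAP_RE = re.compile(
    r"window\.REFURB_GRID_BOOTSTRAP\s*=\s*(\{.+?\});\s*\n",
    re.DOTALL,
)


@dataclass
class StockResult:
    # Hypothetical stand-in for DeReel's result type (fields taken from the parser above)
    site: str
    product_id: str
    name: str
    url: str
    price: float
    currency: str
    in_stock: bool


def parse_refurb_html(html: str) -> list[StockResult]:
    m = _BOOTSTRAP_RE.search(html)
    if not m:
        # Fail loudly: a silent empty list would look like "nothing in stock"
        raise ValueError("REFURB_GRID_BOOTSTRAP not found (possible page structure change)")
    tiles = json.loads(m.group(1)).get("tiles") or []
    return [
        StockResult(
            site="apple_refurb",
            product_id=t["partNumber"],
            name=t["title"],
            url="https://www.apple.com" + t["productDetailsUrl"].split("?")[0],
            price=float(t["price"]["currentPrice"]["raw_amount"]),
            currency=t["price"].get("priceCurrency", "KRW"),
            in_stock=True,
        )
        for t in tiles
    ]


# Made-up sample data, not real Apple output
SAMPLE_HTML = """<script>
window.REFURB_GRID_BOOTSTRAP = {"tiles": [{"partNumber": "TEST1", "title": "Refurbished AirPods",
"productDetailsUrl": "/kr/shop/product/TEST1?fnode=abc",
"price": {"currentPrice": {"raw_amount": "189000.00"}, "priceCurrency": "KRW"}}], "totalResults": 1};
</script>"""
```

Running `parse_refurb_html(SAMPLE_HTML)` yields one result with the query string stripped from the URL. Passing HTML without the bootstrap variable raises ValueError instead of returning an empty list.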
Stock Change Detection Logic
Once the crawler returns the current inventory list, the Comparator checks it against the previous snapshot.
```python
async def compare_stock(self, site: str, current: list[StockResult]) -> None:
    previous = self._storage.load_state(site)  # load previous snapshot
    newly_stocked = [
        r for r in current
        if r.in_stock and not previous.get(r.product_id, False)
    ]
    for result in newly_stocked:
        alert_key = f"{site}:{result.product_id}:stock"
        if self._alert_history.can_alert(alert_key):
            await self._notifier.send(self._format_message(result))
            self._alert_history.record(alert_key)
    # save current state for the next comparison
    self._storage.save_state(site, {r.product_id: r.in_stock for r in current})
```
The key condition is “in stock now AND not in stock before”. Without it, you’d get an alert on every crawl for items that were already available.
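The detection rule is easiest to check as a pure function, separate from storage and Telegram. In this sketch, Item is a hypothetical minimal stand-in for StockResult. `find_newly_stocked` and `next_snapshot` are illustrative names that mirror the list comprehension and the `save_state` payload in `compare_stock`.

```python
from dataclasses import dataclass


@dataclass
class Item:
    # Hypothetical minimal stand-in for StockResult
    product_id: str
    in_stock: bool


def find_newly_stocked(previous: dict[str, bool], current: list[Item]) -> list[Item]:
    """Items in stock now that were absent or out of stock in the previous snapshot."""
    return [r for r in current if r.in_stock and not previous.get(r.product_id, False)]


def next_snapshot(current: list[Item]) -> dict[str, bool]:
    # Only items seen this run are kept, so an item that vanishes and later
    # reappears is treated as newly stocked again.
    return {r.product_id: r.in_stock for r in current}
```

For example, take previous = {"A": True, "B": False} with A, B, and C all in stock now. Only B and C are reported. Feeding the resulting snapshot back in with the same items reports nothing.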
24-Hour Alert Deduplication
Sometimes an item goes in stock → out of stock → back in stock all within the same day. Repeated alerts for this would be exhausting. AlertHistory manages a 24-hour cooldown.
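```python
from datetime import UTC, datetime, timedelta

def can_alert(self, alert_key: str) -> bool:  # AlertHistory method
    record = self._storage.get_alert_record(alert_key)
    if record is None:
        return True  # first-ever alert
    last_sent = record["last_sent_at"]
    if isinstance(last_sent, str):
        # JSON has no datetime type, so the timestamp may come back as an ISO 8601 string
        last_sent = datetime.fromisoformat(last_sent)
    elapsed = datetime.now(UTC) - last_sent
    return elapsed >= timedelta(hours=24)
```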
Cooldown records are also persisted in a JSON file (data/apple_refurb_alerts.json), so they survive across GHA runs.
interval_hours: Controlling Crawl Frequency
The GHA cron runs every hour (0 * * * *), but I only wanted to actually crawl Apple Refurbished every 4 hours. This is controlled per target by interval_hours in the targets list of config/stock.yaml.
```yaml
# config/stock.yaml
targets:
  - site: apple_refurb
    interval_hours: 4
    url: "https://www.apple.com/kr/shop/refurbished/airpods"
    enabled: true
```
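```python
from datetime import UTC, datetime

for target in targets:  # entries from config/stock.yaml
    site, interval_hours = target["site"], target["interval_hours"]
    schedule_key = site  # assumption: one schedule entry per site
    last_crawled = storage.get_last_crawled_at(schedule_key)
    if last_crawled:
        elapsed_hours = (datetime.now(UTC) - last_crawled).total_seconds() / 3600
        if elapsed_hours < interval_hours:
            logger.debug(f"[{site}] {elapsed_hours:.1f}h/{interval_hours}h — skipping")
            continue
    # ...crawl the site
```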
Why GHA cron + interval_hours together?
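Different sites warrant different crawl intervals: apple_refurb every 4 hours, steam every 3, gog every 6. A workflow can list several cron expressions, but they all start the same job. Per-site scheduling would therefore still need branching inside the workflow, and every new site would mean editing the YAML. Instead, the workflow runs at the shortest interval (every hour) and each site checks its own elapsed time internally. That way, all intervals live in one config file.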
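The per-site check can be pulled out into a pure function that decides, on each hourly run, which targets are due. This is a sketch under assumptions:

- `targets` is the list loaded from config/stock.yaml. The steam and gog entries are illustrative, using the intervals mentioned above.
- `last_crawled` maps each site to a timezone-aware datetime.
- `due_targets` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone


def due_targets(
    targets: list[dict],
    last_crawled: dict[str, datetime],
    now: datetime,
) -> list[str]:
    """Return the sites whose interval_hours has elapsed since their last crawl.

    `targets` mirrors the YAML entries (site, interval_hours, enabled);
    `last_crawled` maps site -> timezone-aware datetime of the last crawl.
    """
    due = []
    for t in targets:
        if not t.get("enabled", True):
            continue  # disabled targets are never crawled
        prev = last_crawled.get(t["site"])
        # Never crawled, or at least interval_hours since the last crawl
        if prev is None or now - prev >= timedelta(hours=t["interval_hours"]):
            due.append(t["site"])
    return due
```

Suppose all three sites were last crawled at 00:00. The 03:00 run crawls only steam, and the 06:00 run crawls all three.

One caveat: scheduled GitHub Actions runs can start late under load. If a crawl itself started a few minutes late, the next check can fall just short of interval_hours, and the crawl slips by a full hour. Subtracting a few minutes of slack from the interval is one way to absorb that.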
GitHub Actions Setup
Workflow Essentials
```yaml
# .github/workflows/crawl.yml
on:
  schedule:
    - cron: "0 * * * *"
  workflow_dispatch:  # manual trigger also available

concurrency:
  group: dereel-crawlers
  cancel-in-progress: false  # never cancel in-progress runs (prevents state corruption)
```
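cancel-in-progress: false matters here. The concurrency group ensures that only one run touches the state files at a time. If the next cron fires while the previous run is still going, the new run waits instead of cancelling the one in progress. A cancellation midway through reading or writing the JSON files could corrupt the state.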
Auto-committing State Files
```yaml
- name: Commit state files
  run: |
    git config user.name "github-actions[bot]"
    git config user.email "github-actions[bot]@users.noreply.github.com"
    git add data/
    git diff --cached --quiet || (
      git commit -m "chore: update state [skip ci]" &&
      git pull --rebase origin main &&
      git push
    )
```
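The step works as follows:

- git diff --cached --quiet exits with a non-zero status only when something is staged, so a commit is created only when the state actually changed.
- git pull --rebase picks up anything pushed to main while the run was in progress, before pushing.
- [skip ci] in the commit message tells GitHub Actions not to start push-triggered workflows for this commit.

If the repository's default token permissions are read-only, the job also needs permissions: contents: write for git push to succeed.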
Consecutive Failure Detection
If the crawler fails 3 times in a row, a Telegram alert fires.
```python
try:
    ...  # crawl the site and compare stock (see above)
except Exception as e:
    failures_count = storage.increment_failures(site, str(e))
    logger.error(f"[{site}] crawl failed ({failures_count} consecutive) — {e}")
    if failures_count >= 3:
        await notifier.send(
            f"🚨 [DeReel Alert] Consecutive crawler failures\n"
            f"Site: {site}\nError: {e}\nCount: {failures_count}"
        )
```
Failure counts are stored in data/crawl_schedule.json, so the system self-monitors without any external service.
Results
In practice, I got a Telegram notification at 6 AM that refurbished AirPods were back in stock, and was able to buy them immediately. Mission accomplished.
One thing worth noting from production: the REFURB_GRID_BOOTSTRAP data has been remarkably stable. As long as Apple doesn’t make major page structure changes, the parser holds up.
Phase 1-A Summary:
| Item | Details |
|---|---|
| Infrastructure cost | $0 / month |
| Crawl interval | 4 hours |
| Alert channel | Telegram |
| State storage | GitHub repo JSON |
| Alert deduplication | 24-hour cooldown |
Up Next
In Phase 1-B, I added price monitoring for Steam, GOG, and Epic. I’ll share how I worked around Steam’s bundle API returning 403 by falling back to HTML scraping, and the NoneType runtime error I hit in Epic’s free game API.
Source code is available on GitHub.