Summary
“ArXiv Watchdog” is a lightweight Python application that monitors an arXiv RSS feed for new papers matching user-defined keywords and notifies you via Telegram. Running as a simple background process on any Linux machine, it turns daily literature surveillance in fields such as whole brain emulation, connectomics, and high-resolution neural mapping into an automated, hands-free workflow, so you never miss an important development.
Project Description
This project consists of a single Python script (main.py) that:
1. Fetches the Computer Vision (cs.CV) RSS feed from arXiv at regular intervals.
2. Parses each entry and filters it against a customizable list of keywords.
3. Sends a Telegram message for every newly detected paper, including the title and abstract link.
4. Downloads the corresponding PDF into a local folder for offline reading.
Dependencies are minimal—just requests and feedparser—and it leverages the standard Telegram Bot API for notifications. The entire setup can be deployed in minutes and maintained as a background process via nohup or similar tools.
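The keyword filtering described above is a plain case-insensitive substring check against the paper title. Sketched in isolation (the keyword values here are illustrative):

```python
KEYWORDS = ["whole brain emulation", "connectomics", "high-resolution"]

def matches(title: str, keywords=KEYWORDS) -> bool:
    """True if any keyword appears in the title, case-insensitively."""
    title_lower = title.lower()
    return any(kw in title_lower for kw in keywords)

print(matches("A Survey of Connectomics Pipelines"))  # → True
print(matches("Diffusion Models for Image Editing"))  # → False
```

Substring matching is deliberately loose: "high-resolution" also matches "High-Resolution Neural Mapping", which is usually what you want for alerting.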
Purpose
- Automate Awareness: Replace manual RSS checks with instant alerts, freeing you from repeatedly visiting arXiv.
- Focus on Relevance: Only get notified about papers that match your research criteria, reducing noise and information overload.
- Seamless Integration: Leverage Telegram, on desktop or mobile, as your central notification hub, without needing complex infrastructure.
- Offline Access: Automatically download PDFs so you can review new papers anytime, even without internet.
Prerequisites
- Linux machine with Python 3.8+ installed
- A Telegram Bot Token, obtained via @BotFather
- Your personal Chat ID (use @userinfobot or the getUpdates API to retrieve it)
- Basic command-line proficiency (bash)
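If you take the getUpdates route, message your bot once and the chat ID appears in the JSON response under result[].message.chat.id. A minimal sketch of pulling it out (the sample payload below is illustrative, not a real API response):

```python
import json

# Illustrative, truncated getUpdates payload; your real response will
# contain your own chat ID after you send the bot one message.
sample = """{"ok": true, "result": [
  {"update_id": 1,
   "message": {"message_id": 10,
               "chat": {"id": 123456789, "type": "private"},
               "text": "hi"}}]}"""

def extract_chat_ids(raw: str) -> list:
    """Collect chat IDs from a getUpdates JSON payload."""
    data = json.loads(raw)
    return [u["message"]["chat"]["id"]
            for u in data.get("result", []) if "message" in u]

print(extract_chat_ids(sample))  # → [123456789]
```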
Installation

```bash
# Create project directory
mkdir -p ~/arxiv-watchdog
cd ~/arxiv-watchdog

# (Optional) Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install requests feedparser
```
- requests: HTTP client for fetching RSS feeds and PDFs
- feedparser: XML/RSS parsing
Populate ~/arxiv-watchdog/main.py with the following:
```python
import os
import time

import requests
import feedparser

# ─── Configuration ─────────────────────────────────────────────────────────
TELEGRAM_TOKEN = "YOUR_TELEGRAM_BOT_TOKEN"
CHAT_ID = "YOUR_CHAT_ID"
KEYWORDS = [
    "whole brain emulation",
    "connectomics",
    "high-resolution",
    "molecular-resolution",
]
DOWNLOAD_DIR = "papers"
os.makedirs(DOWNLOAD_DIR, exist_ok=True)

# ─── Telegram Notification Helper ──────────────────────────────────────────
def send_telegram(message: str):
    url = f"https://api.telegram.org/bot{TELEGRAM_TOKEN}/sendMessage"
    payload = {"chat_id": CHAT_ID, "text": message}
    try:
        requests.post(url, json=payload, timeout=10)
    except requests.RequestException as e:
        print(f"[ERROR] Telegram send failed: {e}")

# ─── RSS Scraper Function ──────────────────────────────────────────────────
def fetch_arxiv():
    """
    Fetches the arXiv RSS feed for cs.CV and yields (title, link)
    pairs whose title matches any keyword.
    """
    rss_url = "https://rss.arxiv.org/rss/cs.CV"
    try:
        response = requests.get(rss_url, timeout=10)
    except requests.RequestException as e:
        print(f"[ERROR] RSS fetch failed: {e}")
        return
    if response.status_code != 200:
        print(f"[ERROR] RSS HTTP {response.status_code}")
        return
    feed = feedparser.parse(response.content)
    for entry in feed.entries:
        title_lower = entry.title.lower()
        if any(kw in title_lower for kw in KEYWORDS):
            yield entry.title, entry.link

# ─── PDF Download Function ─────────────────────────────────────────────────
def download_pdf(arxiv_link: str):
    """
    Given an arXiv abstract link, constructs the PDF URL and downloads it.
    """
    pdf_url = arxiv_link.replace("/abs/", "/pdf/") + ".pdf"
    try:
        response = requests.get(pdf_url, timeout=20)
    except requests.RequestException as e:
        print(f"[ERROR] PDF download failed: {e}")
        return
    if response.status_code == 200:
        filename = arxiv_link.split("/")[-1] + ".pdf"
        path = os.path.join(DOWNLOAD_DIR, filename)
        with open(path, "wb") as f:
            f.write(response.content)

# ─── Main Logic ────────────────────────────────────────────────────────────
seen = set()

def main():
    """
    Checks for new matching papers and:
      1. Sends a Telegram alert for each new paper.
      2. Downloads its PDF into DOWNLOAD_DIR.
    """
    for title, link in fetch_arxiv():
        if link not in seen:
            seen.add(link)
            send_telegram(f"🔔 New paper: {title}\n{link}")
            download_pdf(link)

# ─── Continuous Watchdog Loop ──────────────────────────────────────────────
if __name__ == "__main__":
    # Adjust interval as needed (e.g., 2 hours = 2*3600)
    INTERVAL_SECONDS = 2 * 3600
    while True:
        main()
        time.sleep(INTERVAL_SECONDS)
```
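One caveat: seen lives only in memory, so restarting the script re-notifies you about papers it has already reported. A minimal sketch of persisting it to disk between runs (the file name seen.txt is an assumption, not part of the script above):

```python
import os

SEEN_FILE = "seen.txt"  # hypothetical persistence file

def load_seen(path: str = SEEN_FILE) -> set:
    """Read previously seen links, one per line; empty set if no file yet."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}

def save_seen(seen: set, path: str = SEEN_FILE) -> None:
    """Write the seen links back out, one per line."""
    with open(path, "w") as f:
        f.write("\n".join(sorted(seen)))
```

You would call load_seen() once at startup and save_seen(seen) at the end of each polling pass.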
To keep the script running even after closing your shell:

```bash
cd ~/arxiv-watchdog
nohup python3 main.py > watchdog.log 2>&1 &
```

- nohup ensures the process survives logout.
- & sends it to the background.
- watchdog.log captures all stdout/stderr for later inspection.

Monitor and manage the process:

```bash
# Verify the process is running
ps aux | grep main.py

# Follow logs in real time
tail -f ~/arxiv-watchdog/watchdog.log
```

To stop the watcher, find its PID (ps aux | grep main.py) and:

```bash
kill <PID>
```
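If your machine runs systemd, a user service is a sturdier alternative to nohup: it restarts the script on failure and can start it at boot. The unit below is a sketch; the paths assume the layout from the installation steps above:

```ini
# ~/.config/systemd/user/arxiv-watchdog.service
[Unit]
Description=ArXiv Watchdog

[Service]
WorkingDirectory=%h/arxiv-watchdog
ExecStart=%h/arxiv-watchdog/venv/bin/python main.py
Restart=on-failure

[Install]
WantedBy=default.target
```

Enable it with `systemctl --user daemon-reload && systemctl --user enable --now arxiv-watchdog`.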
Customization
- Adjust the polling interval by changing INTERVAL_SECONDS.
- Change the RSS category by modifying the feed URL https://rss.arxiv.org/rss/cs.CV (e.g., cs.NE for Neural and Evolutionary Computing).
- Expand keywords or load them from a config file.
- Add more feeds by calling fetch_arxiv() for multiple URLs.
- Improve formatting (e.g., Markdown, emojis) to suit your Telegram preferences.
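For the config-file route, a sketch of loading KEYWORDS from JSON (the file name keywords.json and its schema are assumptions):

```python
import json

def load_keywords(path: str) -> list:
    """Load and lower-case the keyword list from a JSON config file."""
    with open(path) as f:
        return [kw.lower() for kw in json.load(f)["keywords"]]

# Example: write a config, then load it back.
with open("keywords.json", "w") as f:
    json.dump({"keywords": ["Connectomics", "Whole Brain Emulation"]}, f)

print(load_keywords("keywords.json"))  # → ['connectomics', 'whole brain emulation']
```

Lower-casing at load time keeps the substring match in fetch_arxiv() case-insensitive without further changes.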