Summary
“ArXiv Watchdog” is a lightweight Python application that monitors an arXiv RSS feed for new papers matching user-defined keywords and notifies you via Telegram. Running as a simple background process on any Linux machine, it turns daily literature surveillance in fields such as whole brain emulation, connectomics, and high-resolution neural mapping into an automated, hands-free workflow, so you never miss an important development.
Project Description
This project consists of a single Python script (main.py) that:
1. Fetches the Computer Vision (cs.CV) RSS feed from arXiv at regular intervals.
2. Parses each entry and filters it against a customizable list of keywords.
3. Sends a Telegram message for every newly detected paper, including the title and abstract link.
4. Downloads the corresponding PDF into a local folder for offline reading.
Dependencies are minimal—just requests and feedparser—and it leverages the standard Telegram Bot API for notifications. The entire setup can be deployed in minutes and maintained as a background process via nohup or similar tools.
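The keyword filtering described above is a plain case-insensitive substring check against the paper title. Sketched in isolation (the keyword values here are illustrative):

```python
KEYWORDS = ["whole brain emulation", "connectomics", "high-resolution"]

def matches(title: str, keywords=KEYWORDS) -> bool:
    """True if any keyword appears in the title, case-insensitively."""
    title_lower = title.lower()
    return any(kw in title_lower for kw in keywords)

print(matches("A Survey of Connectomics Pipelines"))  # → True
print(matches("Diffusion Models for Image Editing"))  # → False
```

Substring matching is deliberately loose: "high-resolution" also matches "High-Resolution Neural Mapping", which is usually what you want for alerting.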
Purpose
- Automate Awareness: Replace manual RSS checks with instant alerts, freeing you from repeatedly visiting arXiv.
- Focus on Relevance: Only get notified about papers that match your research criteria, reducing noise and information overload.
- Seamless Integration: Leverage Telegram, on desktop or mobile, as your central notification hub, without needing complex infrastructure.
- Offline Access: Automatically download PDFs so you can review new papers anytime, even without internet.
Prerequisites
- Linux machine with Python 3.8+ installed
- A Telegram Bot Token, obtained via @BotFather
- Your personal Chat ID (use @userinfobot or the getUpdates API to retrieve it)
- Basic command-line proficiency (bash)
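If you take the getUpdates route, message your bot once and the chat ID appears in the JSON response under result[].message.chat.id. A minimal sketch of pulling it out (the sample payload below is illustrative, not a real API response):

```python
import json

# Illustrative, truncated getUpdates payload; your real response will
# contain your own chat ID after you send the bot one message.
sample = """{"ok": true, "result": [
  {"update_id": 1,
   "message": {"message_id": 10,
               "chat": {"id": 123456789, "type": "private"},
               "text": "hi"}}]}"""

def extract_chat_ids(raw: str) -> list:
    """Collect chat IDs from a getUpdates JSON payload."""
    data = json.loads(raw)
    return [u["message"]["chat"]["id"]
            for u in data.get("result", []) if "message" in u]

print(extract_chat_ids(sample))  # → [123456789]
```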
Installation

```bash
# Create project directory
mkdir -p ~/arxiv-watchdog
cd ~/arxiv-watchdog

# (Optional) Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install requests feedparser
```
- requests: HTTP client for fetching RSS feeds and PDFs
- feedparser: XML/RSS parsing
Populate ~/arxiv-watchdog/main.py with the following:
```python
import os
import time

import requests
import feedparser

# ─── Configuration ─────────────────────────────────────────────────────────
TELEGRAM_TOKEN = "YOUR_TELEGRAM_BOT_TOKEN"
CHAT_ID = "YOUR_CHAT_ID"
KEYWORDS = [
    "whole brain emulation",
    "connectomics",
    "high-resolution",
    "molecular-resolution",
]
DOWNLOAD_DIR = "papers"
os.makedirs(DOWNLOAD_DIR, exist_ok=True)

# ─── Telegram Notification Helper ──────────────────────────────────────────
def send_telegram(message: str):
    url = f"https://api.telegram.org/bot{TELEGRAM_TOKEN}/sendMessage"
    payload = {"chat_id": CHAT_ID, "text": message}
    try:
        requests.post(url, json=payload, timeout=10)
    except requests.RequestException as e:
        print(f"[ERROR] Telegram send failed: {e}")

# ─── RSS Scraper Function ──────────────────────────────────────────────────
def fetch_arxiv():
    """
    Fetches the arXiv RSS feed for cs.CV and yields (title, link)
    pairs whose title matches any keyword.
    """
    rss_url = "https://rss.arxiv.org/rss/cs.CV"
    try:
        response = requests.get(rss_url, timeout=10)
    except requests.RequestException as e:
        print(f"[ERROR] RSS fetch failed: {e}")
        return
    if response.status_code != 200:
        print(f"[ERROR] RSS HTTP {response.status_code}")
        return
    feed = feedparser.parse(response.content)
    for entry in feed.entries:
        title_lower = entry.title.lower()
        if any(kw in title_lower for kw in KEYWORDS):
            yield entry.title, entry.link

# ─── PDF Download Function ─────────────────────────────────────────────────
def download_pdf(arxiv_link: str):
    """
    Given an arXiv abstract link, constructs the PDF URL and downloads it.
    """
    pdf_url = arxiv_link.replace("/abs/", "/pdf/") + ".pdf"
    try:
        response = requests.get(pdf_url, timeout=20)
    except requests.RequestException as e:
        print(f"[ERROR] PDF download failed: {e}")
        return
    if response.status_code == 200:
        filename = arxiv_link.split("/")[-1] + ".pdf"
        path = os.path.join(DOWNLOAD_DIR, filename)
        with open(path, "wb") as f:
            f.write(response.content)

# ─── Main Logic ────────────────────────────────────────────────────────────
seen = set()

def main():
    """
    Checks for new matching papers and:
      1. Sends a Telegram alert for each new paper.
      2. Downloads its PDF into DOWNLOAD_DIR.
    """
    for title, link in fetch_arxiv():
        if link not in seen:
            seen.add(link)
            send_telegram(f"🔔 New paper: {title}\n{link}")
            download_pdf(link)

# ─── Continuous Watchdog Loop ──────────────────────────────────────────────
if __name__ == "__main__":
    # Adjust interval as needed (e.g., 2 hours = 2*3600)
    INTERVAL_SECONDS = 2 * 3600
    while True:
        main()
        time.sleep(INTERVAL_SECONDS)
```
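One caveat: seen lives only in memory, so restarting the script re-notifies you about papers it has already reported. A minimal sketch of persisting it to disk between runs (the file name seen.txt is an assumption, not part of the script above):

```python
import os

SEEN_FILE = "seen.txt"  # hypothetical persistence file

def load_seen(path: str = SEEN_FILE) -> set:
    """Read previously seen links, one per line; empty set if no file yet."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}

def save_seen(seen: set, path: str = SEEN_FILE) -> None:
    """Write the seen links back out, one per line."""
    with open(path, "w") as f:
        f.write("\n".join(sorted(seen)))
```

You would call load_seen() once at startup and save_seen(seen) at the end of each polling pass.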
To keep the script running even after closing your shell:

```bash
cd ~/arxiv-watchdog
nohup python3 main.py > watchdog.log 2>&1 &
```

- nohup ensures the process survives logout.
- & sends it to the background.
- watchdog.log captures all stdout/stderr for later inspection.

Monitor and manage the process:

```bash
# Verify the process is running
ps aux | grep main.py

# Follow logs in real time
tail -f ~/arxiv-watchdog/watchdog.log
```

To stop the watcher, find its PID (ps aux | grep main.py) and:

```bash
kill <PID>
```
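If your machine runs systemd, a user service is a sturdier alternative to nohup: it restarts the script on failure and can start it at boot. The unit below is a sketch; the paths assume the layout from the installation steps above:

```ini
# ~/.config/systemd/user/arxiv-watchdog.service
[Unit]
Description=ArXiv Watchdog

[Service]
WorkingDirectory=%h/arxiv-watchdog
ExecStart=%h/arxiv-watchdog/venv/bin/python main.py
Restart=on-failure

[Install]
WantedBy=default.target
```

Enable it with `systemctl --user daemon-reload && systemctl --user enable --now arxiv-watchdog`.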
Customization
- Adjust the polling interval by changing INTERVAL_SECONDS.
- Change the RSS category by modifying the feed URL https://rss.arxiv.org/rss/cs.CV (e.g., cs.NE for Neural and Evolutionary Computing).
- Expand keywords or load them from a config file.
- Add more feeds by calling fetch_arxiv() for multiple URLs.
- Improve formatting (e.g., Markdown, emojis) to suit your Telegram preferences.
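For the config-file route, a sketch of loading KEYWORDS from JSON (the file name keywords.json and its schema are assumptions):

```python
import json

def load_keywords(path: str) -> list:
    """Load and lower-case the keyword list from a JSON config file."""
    with open(path) as f:
        return [kw.lower() for kw in json.load(f)["keywords"]]

# Example: write a config, then load it back.
with open("keywords.json", "w") as f:
    json.dump({"keywords": ["Connectomics", "Whole Brain Emulation"]}, f)

print(load_keywords("keywords.json"))  # → ['connectomics', 'whole brain emulation']
```

Lower-casing at load time keeps the substring match in fetch_arxiv() case-insensitive without further changes.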