Advanced Google News scrapers with Python

A free, open-source Google News API alternative. Scrape Google News without getting blocked and collect data for your next news mining project.

We're going to use the pygooglenews package, which returns structured news articles from any Google News page.

Disclaimer: the NewsCatcher team created this Python package. If you want to know more about how it works, read this article:

Google News RSS. The missing documentation
Get Google News RSS feeds by keyword, geo position, time range, or topic. Get RSS feeds for websites that do not support it. Or, scrape Google News without limits.

PyGoogleNews package overview

pygooglenews is a Python wrapper around the Google News RSS feed.

kotartemiy/pygooglenews
If Google News had a Python library. Contribute to kotartemiy/pygooglenews development by creating an account on GitHub.

In a nutshell, it exploits the fact that Google News data, including custom search results, can also be accessed via RSS. Months of testing have shown that Google won't block your IP even if you access the RSS feed 100k+ times per day. I believe this is because RSS is designed to be read by other machines. It is also a very lightweight page (about 30 kB, compared to 1 MB+ for the Google News UI).
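
To see why RSS is so convenient for machines, here is a minimal sketch that parses a tiny RSS document with nothing but Python's standard library. The sample feed and the `parse_items` helper are made up for illustration and are not part of pygooglenews. A real Google News feed carries more fields than this.

```python
import xml.etree.ElementTree as ET

# A made-up, minimal RSS 2.0 document standing in for a real feed response.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First headline</title>
      <link>https://example.com/first</link>
      <pubDate>Mon, 01 Jan 2024 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second headline</title>
      <link>https://example.com/second</link>
      <pubDate>Mon, 01 Jan 2024 11:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def parse_items(rss_text):
    """Return a list of {title, link, published} dicts from an RSS 2.0 string."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),
        })
    return items


articles = parse_items(SAMPLE_RSS)
for article in articles:
    print(article["published"], "|", article["title"])
```

Every article is already a structured record, so there is no HTML layout to reverse-engineer.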

What data you can access with pygooglenews

  1. Top news
  2. News articles by topic (business, politics, etc)
  3. News articles by town, country, location
  4. News by your custom search

Demo

Installation

pip install pygooglenews --upgrade

1. Top Google News articles

from pygooglenews import GoogleNews

# default GoogleNews instance
gn = GoogleNews(lang = 'en', country = 'US')

top_news = gn.top_news()

To learn more about the supported languages and countries, check the pygooglenews documentation.
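
As far as we can tell, each call returns a dictionary with the parsed articles under an `entries` key, each with fields such as `title` and `link`. The sketch below assumes that shape and uses a hand-written stand-in instead of a live response, so it runs without network access. `summarize_entries` is a hypothetical helper, not a pygooglenews function.

```python
def summarize_entries(result, limit=5):
    """Return (title, link) pairs for the first `limit` entries of a result dict."""
    entries = result.get("entries", [])
    return [(entry.get("title", ""), entry.get("link", "")) for entry in entries[:limit]]


# Hand-written stand-in for the dictionary we assume gn.top_news() returns.
sample_result = {
    "feed": {"title": "Top stories"},
    "entries": [
        {"title": "Story A", "link": "https://example.com/a"},
        {"title": "Story B", "link": "https://example.com/b"},
        {"title": "Story C", "link": "https://example.com/c"},
    ],
}

pairs = summarize_entries(sample_result, limit=2)
for title, link in pairs:
    print(title, "->", link)
```

With the package installed, passing the value of `gn.top_news()` as `result` should work the same way, provided the returned structure matches the assumption above.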

2. Google News articles by topic

Accepted topics are:

  • WORLD
  • NATION
  • BUSINESS
  • TECHNOLOGY
  • ENTERTAINMENT
  • SCIENCE
  • SPORTS
  • HEALTH

from pygooglenews import GoogleNews

# default GoogleNews instance
gn = GoogleNews(lang = 'en', country = 'US')

business = gn.topic_headlines('BUSINESS')

In addition to these preset topics, you can also parse custom ones, such as "COVID-19". Check more in this part of the documentation.
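
If you want headlines for several topics at once, a loop over the topic names is enough. The sketch below takes the fetching function as an argument so it can be tried offline with a fake. In real use you would pass `gn.topic_headlines` instead. `collect_titles_by_topic` and `fake_fetch` are hypothetical names, and the `entries` and `title` keys are our assumption about the returned structure.

```python
PRESET_TOPICS = (
    "WORLD", "NATION", "BUSINESS", "TECHNOLOGY",
    "ENTERTAINMENT", "SCIENCE", "SPORTS", "HEALTH",
)


def collect_titles_by_topic(fetch, topics=PRESET_TOPICS, limit=3):
    """Call fetch(topic) for each topic and keep the first `limit` titles."""
    titles = {}
    for topic in topics:
        result = fetch(topic)
        entries = result.get("entries", [])[:limit]
        titles[topic] = [entry.get("title", "") for entry in entries]
    return titles


def fake_fetch(topic):
    """Offline stand-in that mimics the assumed shape of a topic response."""
    return {"entries": [{"title": f"{topic} story {i}"} for i in range(1, 6)]}


by_topic = collect_titles_by_topic(fake_fetch, topics=("BUSINESS", "HEALTH"), limit=2)
for topic, headlines in by_topic.items():
    print(topic, headlines)
```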

3. Google News articles by geolocation

from pygooglenews import GoogleNews

# GoogleNews instance for Ukrainian-language news in Ukraine
gn = GoogleNews(lang = 'uk', country = 'UA')

kyiv = gn.geo_headlines('kyiv')
# or 
kyiv = gn.geo_headlines('kiev')
# or
kyiv = gn.geo_headlines('киев')
# or
kyiv = gn.geo_headlines('Київ')

All four options above return the same news feed about Kyiv, Ukraine, because Google News "autoparses" the place name. The lookup also seems to be language-agnostic, but that doesn't mean a feed exists for every place in every language.

4. Google News articles by custom search

from pygooglenews import GoogleNews

# default GoogleNews instance
gn = GoogleNews(lang = 'en', country = 'US')

# find all latest news about NFT
s = gn.search('NFT')

Here you can pass any keywords that you want.

pygooglenews handles all the URL escaping that Google News requires.
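
To show what URL escaping means here, this minimal sketch builds a search URL by hand with the standard library. The URL layout (the `/rss/search` path and the `hl`, `gl`, and `ceid` parameters) is our assumption about what a Google News RSS search URL looks like. `build_search_url` is a hypothetical helper, not a pygooglenews function.

```python
from urllib.parse import quote_plus


def build_search_url(query, lang="en", country="US"):
    """Build a Google News RSS search URL (assumed layout) with an escaped query."""
    escaped = quote_plus(query)  # spaces become '+', quotes become '%22', etc.
    return (
        "https://news.google.com/rss/search"
        f"?q={escaped}&hl={lang}&gl={country}&ceid={country}:{lang}"
    )


url = build_search_url('"Elon Musk" -Tesla')
print(url)

# Non-ASCII keywords are percent-encoded as UTF-8 bytes.
print(quote_plus("Київ"))
```

The package does this for you, so in normal use you pass the raw query string to `gn.search` and never touch the URL.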

Some advanced search parameters that you might want to add (check this part of the documentation):

  • restrict the search to a particular date or time range
  • exclude/include keywords
  • exact match
  • require keywords to be present in the title

Check advanced examples to have a better understanding.
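
The options above mostly come down to how the query string is written. Here is a minimal sketch that assembles one, using the operators as we understand them: double quotes for an exact match, a leading minus to exclude a word, and `intitle:` to require a word in the title. `build_query` is a hypothetical helper, so check the documentation for the full list of operators.

```python
def build_query(keywords, exact=None, exclude=(), in_title=None):
    """Assemble a search string from plain keywords and optional operators."""
    parts = [keywords] if keywords else []
    if exact:
        parts.append(f'"{exact}"')            # exact phrase match
    parts.extend(f"-{word}" for word in exclude)  # words to exclude
    if in_title:
        parts.append(f"intitle:{in_title}")   # word must appear in the title
    return " ".join(parts)


query = build_query("boeing", exact="737 MAX", exclude=["airbus"], in_title="safety")
print(query)
```

The resulting string can then be passed to `gn.search`. To our knowledge, the date restriction is handled by keyword arguments of `search` itself, such as `when` (for example `when = '7d'`) or the `from_` and `to_` pair, and not by the query text.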


If you liked this post or you're using our package, please share it! This will help us get better SEO.