TL;DR: You can get a filtered Google News RSS feed of aggregated news: search by keyword, geographic location, time range, topic, etc. You just need to know the syntax. Unfortunately, Google does not provide any official documentation, so we'll try to fill the gap.
We are open-sourcing a lot of our work and building our company in public. In this post, we would like to share all of our findings about the Google News RSS feed (which turned out to be much more useful than we initially thought).
How can I use Google News RSS?
- To integrate it into your RSS feed reader
- Web scraping, or maybe "smart web scraping". Google's RSS feed contains the same data as the Google News UI (except the thumbnail images); however, it is:
  - much easier to scrape
  - super light
  - less likely to get you blocked for making many requests (at least not as fast as with the UI)
4 types of Google News RSS
There are 4 main feeds that can be generated. Here is a one-liner for each:
Top headlines - get the latest trending news headlines for your country.
Headlines by topic - get the latest topic-oriented news headlines for your country.
Location headlines - get the latest location-oriented news headlines (city, state, country, etc.).
News by your search criteria - use the full power of Google's search engine: search by keywords, websites, dates, or any combination of these.
Things common to all Google News RSS feed types
- 100 articles max - no matter what you want to do, a single call to Google News RSS will not give you more than 100 articles per search.
- Country & language - not all countries & languages are supported. To see the available country & language combinations, check the bottom left of the Google News UI.
- Google News RSS URLs always start with https://news.google.com/rss
1. Top Headlines
Copy-paste https://news.google.com/rss into your browser and you will be forwarded to the main Google News feed for your country & language. If you are in the US, you will most likely end up with:
https://news.google.com/rss?hl=en-US&gl=US&ceid=US:en
Here hl is the language, gl is the country, and ceid is country:language.
You can modify these to switch the feed to your country and language.
That is pretty much all you need to do to get the latest headlines in RSS.
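If you generate these URLs from code, a tiny helper keeps the three parameters in sync. This is a minimal Python sketch, not an official API: the name top_headlines_url is ours, and it assumes every locale follows the language-COUNTRY / COUNTRY:language pattern of the US-English feed, which may not hold for all of them.

```python
from urllib.parse import urlencode

BASE = "https://news.google.com/rss"


def top_headlines_url(country: str = "US", language: str = "en") -> str:
    """Build a Top Headlines feed URL (hypothetical helper, no network calls)."""
    params = {
        "hl": f"{language}-{country}",    # language, e.g. en-US
        "gl": country,                    # country, e.g. US
        "ceid": f"{country}:{language}",  # country:language, e.g. US:en
    }
    # safe=":" keeps the colon in ceid readable instead of encoding it as %3A
    return f"{BASE}?{urlencode(params, safe=':')}"


print(top_headlines_url())            # https://news.google.com/rss?hl=en-US&gl=US&ceid=US:en
print(top_headlines_url("GB", "en"))  # https://news.google.com/rss?hl=en-GB&gl=GB&ceid=GB:en
```

Whether a given country/language pair is actually supported is not checked here; the Google News UI remains the place to confirm that.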
2. Headlines By Topic
Accepted topics are: WORLD, NATION, BUSINESS, TECHNOLOGY, ENTERTAINMENT, SCIENCE, SPORTS, HEALTH.
For each allowed country+language combination, you can get these topic-oriented feeds.
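US-English BUSINESS topic example:
https://news.google.com/rss/headlines/section/topic/BUSINESS?hl=en-US&gl=US&ceid=US:en
To break it down:
- the base part: https://news.google.com/rss
- topic part: /headlines/section/topic/BUSINESS
- parameters (only the country-language): ?hl=en-US&gl=US&ceid=US:en
Just change the <TOPIC> part (BUSINESS here) to any of the 8 allowed topics to get specialized feeds.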
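A minimal sketch of a topic-feed builder that refuses anything outside the 8 names listed above. The function names are hypothetical, the TOPICS set is our own list rather than an official one, and the locale helper assumes the same US-English naming pattern.

```python
from urllib.parse import urlencode

BASE = "https://news.google.com/rss"

# The 8 topic names we have seen accepted (our list, not an official one).
TOPICS = {
    "WORLD", "NATION", "BUSINESS", "TECHNOLOGY",
    "ENTERTAINMENT", "SCIENCE", "SPORTS", "HEALTH",
}


def locale_params(country: str = "US", language: str = "en") -> str:
    """hl / gl / ceid query string, assuming the US-English naming pattern."""
    return urlencode(
        {"hl": f"{language}-{country}", "gl": country, "ceid": f"{country}:{language}"},
        safe=":",
    )


def topic_headlines_url(topic: str, country: str = "US", language: str = "en") -> str:
    """Build /headlines/section/topic/<TOPIC>, rejecting names outside TOPICS."""
    topic = topic.upper()
    if topic not in TOPICS:
        raise ValueError(f"unknown topic {topic!r}, expected one of {sorted(TOPICS)}")
    return f"{BASE}/headlines/section/topic/{topic}?{locale_params(country, language)}"


print(topic_headlines_url("business"))
```

Validating locally means a typo fails fast instead of producing a URL that Google may silently redirect or reject.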
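Yes, there are "hidden" topics. If you already tried the "BUSINESS" URL from the section above in your browser, you might have noticed that it is forwarded to another URL:
https://news.google.com/rss/topics/CAAqKggKIiRDQkFTRlFvSUwyMHZNRGx6TVdZU0JXVnVMVlZUR2dKVlV5Z0FQAQ?hl=en-US&gl=US&ceid=US:en
To break it down:
- the base part: https://news.google.com/rss
- a different topic part: /topics/
- "mysterious" topic hash: CAAqKggKIiRDQkFTRlFvSUwyMHZNRGx6TVdZU0JXVnVMVlZUR2dKVlV5Z0FQAQ
- parameters (only the country-language): ?hl=en-US&gl=US&ceid=US:en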
Apparently, this hash string (CAAqKggKIiRDQkFTRlFvSUwyMHZNRGx6TVdZU0JXVnVMVlZUR2dKVlV5Z0FQAQ) is how Google News identifies the BUSINESS topic.
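Because the hash is just another path segment, building a feed URL from it is simple. A minimal sketch, assuming the redirect target has the form https://news.google.com/rss/topics/<hash> plus the usual locale parameters; topic_hash_url is a hypothetical helper name.

```python
from urllib.parse import quote, urlencode

# Topic hash copied from the BUSINESS redirect (US-English).
BUSINESS_HASH = "CAAqKggKIiRDQkFTRlFvSUwyMHZNRGx6TVdZU0JXVnVMVlZUR2dKVlV5Z0FQAQ"


def topic_hash_url(topic_hash: str, country: str = "US", language: str = "en") -> str:
    """Build https://news.google.com/rss/topics/<hash>?hl=...&gl=...&ceid=..."""
    locale = urlencode(
        {"hl": f"{language}-{country}", "gl": country, "ceid": f"{country}:{language}"},
        safe=":",
    )
    # The hashes we have seen only use URL-safe characters, so quote() leaves
    # them untouched; it is there in case a pasted hash contains anything else.
    return f"https://news.google.com/rss/topics/{quote(topic_hash, safe='')}?{locale}"


print(topic_hash_url(BUSINESS_HASH))
print(topic_hash_url(BUSINESS_HASH, "GB", "en"))  # same hash, another locale
```

The second call reuses the same hash with a different locale; whether the feed content follows the locale is something to check in your own tests.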
Initially, I thought those 8 topics were "special" because they work for all country & language combinations while other topics do not. But a topic hash that works for one language seems to work for all the others as well.
You can go to the UI version of Google News and start typing something into the search bar. If what you are searching for is available as a topic, you can simply copy its topic hash and use it with RSS.
So, our US election-oriented RSS URL would look like:
3. Location Headlines
Find news that talks about a specific place.
US-English New York example:
https://news.google.com/rss/headlines/section/geo/NY?hl=en-US&gl=US&ceid=US:en
https://news.google.com/rss/headlines/section/geo/New%20York?hl=en-US&gl=US&ceid=US:en
https://news.google.com/rss/headlines/section/geo/NewYork?hl=en-US&gl=US&ceid=US:en
All three links above are redirected to:
So locations are also topics; however, Google will help you find the right one even when you're using RSS!
Once again, you may copy the topic hash string, and use it for any country & language combination.
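Since a location name goes straight into the URL path, it needs percent-encoding (a raw space is not valid in a URL). A minimal sketch; geo_headlines_url is a hypothetical helper name, and it simply relies on Google's redirect to resolve the name into a topic.

```python
from urllib.parse import quote, urlencode


def geo_headlines_url(location: str, country: str = "US", language: str = "en") -> str:
    """Build /headlines/section/geo/<location> with the location percent-encoded."""
    locale = urlencode(
        {"hl": f"{language}-{country}", "gl": country, "ceid": f"{country}:{language}"},
        safe=":",
    )
    # quote() turns "New York" into "New%20York"; safe="" also encodes "/"
    return f"https://news.google.com/rss/headlines/section/geo/{quote(location, safe='')}?{locale}"


for name in ("NY", "New York", "NewYork"):
    print(geo_headlines_url(name))
```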
4. Advanced Search
Everything up to this point was more or less known when we started our "investigation". This part took 90-95% of the time we spent figuring out what we could actually achieve with the Google News RSS feed.
In short, you can search for news indexed by Google's engine via RSS. It is a big deal because you can scrape news links from Google by loading a 30KB RSS page instead of its 1MB+ UI version.
Let's start with a simple search. Say we want to read the latest articles about Elon Musk:
https://news.google.com/rss/search?q=Elon%20Musk&hl=en-US&gl=US&ceid=US:en
q=Elon%20Musk is the part we are interested in.
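A minimal sketch of a search-URL builder (search_url is our hypothetical name). It encodes spaces as %20 rather than +, so that a literal plus sign in the query (the include operator described below) survives as %2B instead of being read as a space.

```python
from urllib.parse import quote, urlencode


def search_url(query: str, country: str = "US", language: str = "en") -> str:
    """Build https://news.google.com/rss/search?q=...&hl=...&gl=...&ceid=..."""
    params = {
        "q": query,
        "hl": f"{language}-{country}",
        "gl": country,
        "ceid": f"{country}:{language}",
    }
    # quote_via=quote gives %20 for spaces; safe=":" keeps ceid (and operators
    # such as intitle:) readable.
    return "https://news.google.com/rss/search?" + urlencode(params, safe=":", quote_via=quote)


print(search_url("Elon Musk"))
# https://news.google.com/rss/search?q=Elon%20Musk&hl=en-US&gl=US&ceid=US:en
```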
4.1. q parameter advanced options
4.1.1. Boolean OR Search
The default behavior of Google News RSS is to put AND between the terms you put into the q parameter. So, Elon Musk is actually Elon AND Musk. If you want at least one of the terms to match, use the OR operator. For example, to search for articles that mention SpaceX or Boeing:
q=SpaceX%20OR%20Boeing (q=SpaceX OR Boeing)
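When queries are assembled programmatically, it helps to make the AND/OR choice explicit. A minimal sketch with two hypothetical helpers, all_of and any_of; it only builds the q string and relies on the default-AND behavior described above.

```python
from urllib.parse import quote


def all_of(*terms: str) -> str:
    """Space-separated terms: Google News treats the gaps as AND."""
    return " ".join(terms)


def any_of(*terms: str) -> str:
    """Terms joined with OR: at least one of them has to match."""
    return " OR ".join(terms)


print(quote(all_of("Elon", "Musk")))      # Elon%20Musk
print(quote(any_of("SpaceX", "Boeing")))  # SpaceX%20OR%20Boeing
```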
4.1.2. Exact Match ("your exact match search")
Use quotes to perform exact-match querying. This is a must when working with company names, people, and places.
4.1.3. Exclude Query Term (-)
"The exclude (-) query term restricts results for a particular search request to documents that do not contain a particular word or phrase. To use the exclude query term, you would preface the word or phrase to be excluded from the matching documents with "-" (a minus sign)."
4.1.4. Include Query Term (+)
"The include (+) query term specifies that a word or phrase must occur in all documents included in the search results. To use the include query term, you would preface the word or phrase that must be included in all search results with "+" (a plus sign)."
The URL-escaped version of + (a plus sign) is %2B.
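The exact-match, include and exclude operators compose naturally, and URL-escaping takes care of the quotes and the plus sign. A minimal sketch with hypothetical helpers exact, include and exclude; the example terms are arbitrary.

```python
from urllib.parse import quote


def exact(phrase: str) -> str:
    """Wrap a phrase in double quotes for exact matching."""
    return f'"{phrase}"'


def include(term: str) -> str:
    """Prefix with + so the word or phrase must appear."""
    return f"+{term}"


def exclude(term: str) -> str:
    """Prefix with - so the word or phrase must not appear."""
    return f"-{term}"


q = " ".join([exact("Elon Musk"), include("Tesla"), exclude(exact("Model Y"))])
print(q)         # "Elon Musk" +Tesla -"Model Y"
print(quote(q))  # %22Elon%20Musk%22%20%2BTesla%20-%22Model%20Y%22
```

Note that quote() turns + into %2B automatically, which is exactly the escaping the include operator needs.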
4.2. Advanced search with a time range
As mentioned before, a Google News RSS page returns at most 100 results. So, if you want to scrape data for your project, you will probably need more than that. How? By splitting your query into time ranges and iterating over them.
The before and after parameters let you search by date. Unfortunately, you can narrow down your search only by day (specifying a time of day is not allowed). So, if more than 100 articles match your query on a single day, you will not be able to retrieve all of them.
For example, if we want to find articles about Boeing for the first of July, 2020:
The query part:
You can also use just one of the two to make an open-ended time search.
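Here is a minimal sketch of the day-by-day iteration, which is how you can collect more than 100 results in total. The helper names are hypothetical, and it assumes a single day is expressed as after:D before:D+1; we have not pinned down whether either bound is inclusive, so compare a few windows against the UI before relying on it.

```python
from datetime import date, timedelta
from typing import Iterator, Optional, Tuple
from urllib.parse import quote


def daily_windows(start: date, end: date) -> Iterator[Tuple[date, date]]:
    """Yield (after, before) pairs one day apart, from start up to end (exclusive)."""
    day = start
    while day < end:
        yield day, day + timedelta(days=1)
        day += timedelta(days=1)


def dated_query(query: str, after: Optional[date] = None, before: Optional[date] = None) -> str:
    """Append after:/before: dates (YYYY-MM-DD); pass just one for an open-ended range."""
    parts = [query]
    if after is not None:
        parts.append(f"after:{after.isoformat()}")
    if before is not None:
        parts.append(f"before:{before.isoformat()}")
    return " ".join(parts)


# One feed URL per day; each request can return up to 100 articles.
for after, before in daily_windows(date(2020, 7, 1), date(2020, 7, 4)):
    q = dated_query("Boeing", after, before)
    print(f"https://news.google.com/rss/search?q={quote(q, safe=':')}&hl=en-US&gl=US&ceid=US:en")
```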
The when parameter sets the time range for the publication datetime. I could not find any documentation for this option, but here is what I deduced:
- h for hours (for me, it worked for up to …). For example, when=12h will search only for articles that match the search criteria and were published in the last 12 hours.
- m for months (for me, it worked for up to …).
For example, all articles about Boeing for the past hour:
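A small helper can validate the window before it goes into a request. This sketch makes two assumptions: only the h and m units listed above are accepted, and the window is written inside q as a when: operator (for example Boeing when:1h). If that form does not narrow your results, try passing it as a separate when= URL parameter instead. with_when is a hypothetical name.

```python
import re
from urllib.parse import quote

# Units listed above: h = hours, m = months. Other units are rejected here.
WINDOW = re.compile(r"(\d+)([hm])")


def with_when(query: str, window: str) -> str:
    """Append a when: time window such as '12h' to the query (assumed syntax)."""
    if WINDOW.fullmatch(window) is None:
        raise ValueError(f"expected a window like '12h' or '3m', got {window!r}")
    return f"{query} when:{window}"


q = with_when("Boeing", "1h")
print(f"https://news.google.com/rss/search?q={quote(q, safe=':')}&hl=en-US&gl=US&ceid=US:en")
# https://news.google.com/rss/search?q=Boeing%20when:1h&hl=en-US&gl=US&ceid=US:en
```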
4.3. Not just the
"The allintext: query term requires each document in the search results to contain all of the words in the search query in the body of the document. The query should be formatted as allintext: followed by the words in your search query.
If your search query includes the allintext: query term, Google will only check the body text of documents for the words in your search query, ignoring links in those documents, document titles and document URLs."
"The intitle: query term restricts search results to documents that contain a particular word in the document title. The search query should be formatted as intitle:WORD with no space between the intitle: query term and the following word."
"The allintitle: query term restricts search results to documents that contain all of the query words in the document title. To use the allintitle: query term, include "allintitle:" at the start of your search query.
Putting allintitle: at the beginning of a search query is equivalent to putting intitle: in front of each word in the search query."
"The inurl: query term restricts search results to documents that contain a particular word in the document URL. The search query should be formatted as inurl:WORD with no space between the inurl: query term and the following word."
"The allinurl: query term restricts search results to documents that contain all of the query words in the document URL. To use the allinurl: query term, include allinurl: at the start of your search query."
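These operators are plain strings inside q, so they are easy to generate. A minimal sketch with hypothetical helpers; the operator semantics come from the Google documentation quoted above, and we assume Google News RSS honors them the same way. Following the quoted equivalence, allintitle: is expanded into one intitle: per word, and the single-word operators refuse input containing a space.

```python
from urllib.parse import quote


def _single(operator: str, word: str) -> str:
    # No space is allowed between the operator and the word.
    if not word or " " in word:
        raise ValueError(f"{operator} takes exactly one word, got {word!r}")
    return f"{operator}{word}"


def intitle(word: str) -> str:
    return _single("intitle:", word)


def inurl(word: str) -> str:
    return _single("inurl:", word)


def allintitle(*words: str) -> str:
    """allintitle: is documented as intitle: in front of each word, so expand it."""
    return " ".join(intitle(w) for w in words)


def allintext(*words: str) -> str:
    """allintext: goes at the start, followed by the words that must be in the body."""
    return "allintext:" + " ".join(words)


q = " ".join([allintitle("Boeing", "737"), inurl("reuters")])
print(q)                   # intitle:Boeing intitle:737 inurl:reuters
print(quote(q, safe=":"))  # intitle:Boeing%20intitle:737%20inurl:reuters
```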
We published a Python library that does all of this for you: