Scraping with Nokogiri for dynamic page(?)

aris · June 13, 2012, 5:23am

Ruby 1.9

I’m trying to scrape a part of a web page,

http://www3.nhk.or.jp/nhkworld/chinese/top/index.html

(Apologies, most of you probably can't read it. It's the Chinese-language
page of a Japanese news site.)

I hope you can see the portion which I want in the attached file.

The XPath for that portion should be

/html/body[@id='nhkworld-language-template-index']/div[@id='mainBox']/div[@id='mainBoxL']/div[@id='news']/h2/span[@class='update']

The code would be

require 'nokogiri'
require 'open-uri'

url_date = "http://www3.nhk.or.jp/nhkworld/chinese/top/index.html"
doc_init = Nokogiri::HTML(open(url_date))
# Select the <span class="update"> that should hold the update time.
date = doc_init.xpath("/html/body[@id='nhkworld-language-template-index']/div[@id='mainBox']/div[@id='mainBoxL']/div[@id='news']/h2/span[@class='update']")
p date.text

But it does not get anything. The expected outcome is something like

更新 6月12日 21:34（日本时间）

showing the date and time of update, which of course varies depending on
when you execute it.

Looking at the source of this page, line 96 is the place. It seems that
the JavaScript file 'update_news.js' writes the date and time dynamically.

Is there any way to get that particular portion of the page?

soichi

soujiro0725 · June 13, 2012, 4:09pm

Have you looked at the file?

http://www3.nhk.or.jp/nhkworld/chinese/top/update_news.js

It basically just writes out the date; just get it from there.
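
A minimal sketch of that idea, not tested against the real file. It assumes the script emits the date with a document.write('...') call and that the file is UTF-8; both are guesses, so check the file and adjust the pattern. The names DOCUMENT_WRITE, extract_written_text and fetch_update_text are made up for illustration.

```ruby
require 'net/http'
require 'uri'

# URL of the script mentioned above.
UPDATE_JS_URL = 'http://www3.nhk.or.jp/nhkworld/chinese/top/update_news.js'

# Assumption: the date is written with a call shaped like
#   document.write('...');
# The real file may look different; adjust this pattern after reading it.
DOCUMENT_WRITE = /document\.write\(\s*(["'])(.*?)\1\s*\)/m

# Returns the first string literal passed to document.write, with any
# HTML tags removed, or nil when no such call is found.
def extract_written_text(js_source)
  match = DOCUMENT_WRITE.match(js_source)
  return nil unless match
  match[2].gsub(/<[^>]+>/, '').strip
end

# Downloads the script and extracts the written text.
# Assumption: the file is encoded as UTF-8.
def fetch_update_text(url = UPDATE_JS_URL)
  js = Net::HTTP.get(URI(url)).force_encoding('UTF-8')
  extract_written_text(js)
end
```

Calling fetch_update_text performs the download. Nokogiri is not needed for this step, because it only parses the HTML it is given and never runs the page's JavaScript.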

– Matma R.

soujiro0725 · June 14, 2012, 2:14am

Thanks, that was simple.

soichi