Scraping with Nokogiri for dynamic page(?)

Ruby 1.9

I'm trying to scrape part of a web page:

http://www3.nhk.or.jp/nhkworld/chinese/top/index.html

(Sorry, most of you probably can't read it. It is the Chinese-language
page of a Japanese news site.)

I hope you can see the portion which I want in the attached file.

The XPath for that portion should be

/html/body[@id='nhkworld-language-template-index']/div[@id='mainBox']/div[@id='mainBoxL']/div[@id='news']/h2/span[@class='update']

The code would be

require 'nokogiri'
require 'open-uri'

url_date = "http://www3.nhk.or.jp/nhkworld/chinese/top/index.html"
doc_init = Nokogiri::HTML(open(url_date))
# Select the span that should hold the update timestamp
date = doc_init.xpath("/html/body[@id='nhkworld-language-template-index']/div[@id='mainBox']/div[@id='mainBoxL']/div[@id='news']/h2/span[@class='update']")
p date.text

But it does not get anything. The expected outcome is something like

更新 6月12日 21:34(日本时间)

showing the date and time of update, which of course varies depending on
when you execute it.

Line 96 of the page source,

新闻 ("News")

is the place. It looks like the JavaScript file 'update_news.js'
fills in the date and time dynamically.

Is there any way to get this particular portion of the page?

soichi

Have you looked at the file?

http://www3.nhk.or.jp/nhkworld/chinese/top/update_news.js

It basically just writes out the date; just get it from there.

– Matma R.
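
A minimal sketch of that suggestion, assuming update_news.js contains a call of the form document.write('...') with the date inside a single string literal. The actual contents of the file are not shown in this thread, so the regular expression and the helper names extract_written_text and fetch_update_text are hypothetical and may need adjusting.

```ruby
require 'net/http'
require 'uri'

JS_URL = 'http://www3.nhk.or.jp/nhkworld/chinese/top/update_news.js'

# Hypothetical helper: return the first string literal passed to
# document.write(...), or nil if the source contains no such call.
def extract_written_text(js_source)
  match = js_source.match(/document\.write\(\s*(["'])(.*?)\1\s*\)/m)
  match && match[2]
end

# Hypothetical helper: download the script and extract the written text.
# Assumption: the file is encoded as UTF-8.
def fetch_update_text(url = JS_URL)
  body = Net::HTTP.get(URI.parse(url))
  text = extract_written_text(body)
  text && text.force_encoding('UTF-8')
end

# Offline check with a made-up sample; the real file may look different.
sample = %q{document.write('UPDATED 12 Jun 21:34');}
puts extract_written_text(sample)
```

Calling fetch_update_text performs the live request. If the written string turns out to contain HTML markup rather than plain text, it can be passed to Nokogiri in turn.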

Thanks, that was simple.

soichi