Ruby 1.9
I’m trying to scrape part of a web page,
http://www3.nhk.or.jp/nhkworld/chinese/top/index.html
(Sorry, the language will be unfamiliar to most of you. It’s the Chinese
page of a Japanese news site.)
I hope you can see the portion which I want in the attached file.
The XPath for the portion should be
/html/body[@id='nhkworld-language-template-index']/div[@id='mainBox']/div[@id='mainBoxL']/div[@id='news']/h2/span[@class='update']
The code would be
require 'nokogiri'
require 'open-uri'

url_date = "http://www3.nhk.or.jp/nhkworld/chinese/top/index.html"
# Fetch and parse the static HTML as served (no JavaScript is executed)
doc_init = Nokogiri::HTML(open(url_date))
date = doc_init.xpath("/html/body[@id='nhkworld-language-template-index']/div[@id='mainBox']/div[@id='mainBoxL']/div[@id='news']/h2/span[@class='update']")
p date.text
But it does not get anything. The expected outcome is something like
更新 6月12日 21:34(日本时间)
showing the date and time of update, which of course varies depending on
when you execute it.
Looking at the source of this page, line 96 is the relevant place.
It seems that a JavaScript file, ‘update_news.js’, fills in the date
and time dynamically.
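To check which external scripts the page pulls in, here is a minimal sketch using only the standard library. The sample HTML and the relative path '../js/update_news.js' are assumptions for illustration (only the file name comes from the real page), and the helper name script_urls is hypothetical.

```ruby
require 'uri'

# Hypothetical stand-in for the fetched page source. The real markup and
# the script path are assumptions; only the file name is from the page.
SAMPLE_HTML = <<HTML
<html><body id="nhkworld-language-template-index">
<div id="news"><h2><span class="update"></span></h2></div>
<script type="text/javascript" src="../js/update_news.js"></script>
</body></html>
HTML

# Return the absolute URLs of all external scripts referenced in the HTML.
def script_urls(html, base_url)
  html.scan(/<script[^>]*\bsrc=["']([^"']+)["']/i).flatten.map do |src|
    URI.join(base_url, src).to_s
  end
end

base = "http://www3.nhk.or.jp/nhkworld/chinese/top/index.html"
urls = script_urls(SAMPLE_HTML, base)

# For the assumed path this prints:
# http://www3.nhk.or.jp/nhkworld/chinese/js/update_news.js
puts urls.grep(/update_news\.js/)
```

With the real page source in place of SAMPLE_HTML, the resulting URL could then be fetched with open-uri to see how the script produces the date. Whether the timestamp can be read directly from that file depends on what the script actually does, which I have not verified.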
Is there any way to get this particular portion of the page?
soichi