How to get all image, PDF, and other file links from a website?

dubstep · January 4, 2012, 12:26pm

I have to develop an application that fetches all the links to images,
PDFs, CGI scripts, and files with other extensions from a website.

Can anybody guide me on where I should begin?

cyber_y · January 4, 2012, 12:46pm

You can find useful information at

Especially Mechanize.
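
To give a rough idea of the link-extraction step, here is a minimal sketch. It does not use Mechanize; it relies only on Python's standard library (html.parser and urllib.parse). The names WANTED_EXTENSIONS, LinkCollector, and file_links are made up for illustration, and the extension list is just an assumption based on the file types mentioned in the question.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
import posixpath

# Hypothetical set of extensions to collect; adjust to your needs.
WANTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".pdf", ".cgi"}


class LinkCollector(HTMLParser):
    """Collects absolute URLs found in href and src attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Skip attributes without a value, e.g. a bare <a href>.
            if name in ("href", "src") and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def file_links(html, base_url, extensions=WANTED_EXTENSIONS):
    """Return links whose path ends in one of the wanted extensions."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    result = []
    for link in parser.links:
        # Look only at the path, so query strings do not hide the extension.
        ext = posixpath.splitext(urlparse(link).path)[1].lower()
        if ext in extensions and link not in result:
            result.append(link)
    return result
```

To use it on a live page, you would fetch the HTML first (for example with urllib.request.urlopen) and pass the text together with the page URL to file_links. Note that it only sees links present in the raw HTML, not ones added later by JavaScript.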

Regards,

Felipe Fontoura
Computer Engineer
http://www.felipefontoura.com


cyber_y · January 4, 2012, 1:11pm

Well, wget has a mirror mode (--mirror) that will clone a website,

or you could look at Nutch (an Apache Software Foundation project), which is a
web crawler for building search engines.
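
If you would rather write the crawling part yourself, the basic idea is a breadth-first walk over the pages of one host. The sketch below is not how wget or Nutch work internally; it is a simplified illustration using only Python's standard library. The names crawl, fetch_html, and the max_pages limit of 50 are assumptions made up for this example, and it ignores robots.txt and rate limiting, which a real crawler should respect.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class _Hrefs(HTMLParser):
    """Collects raw href/src attribute values from one page."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        self.values.extend(v for k, v in attrs if k in ("href", "src") and v)


def fetch_html(url):
    """Default fetcher: page text, or "" for non-HTML or failed requests."""
    try:
        with urlopen(url, timeout=10) as resp:
            if "html" not in resp.headers.get("Content-Type", ""):
                return ""
            return resp.read().decode("utf-8", errors="replace")
    except OSError:
        return ""


def crawl(start_url, fetch=fetch_html, max_pages=50):
    """Breadth-first crawl limited to start_url's host; returns every URL seen."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    fetched = 0
    while queue and fetched < max_pages:
        page = queue.popleft()
        fetched += 1
        parser = _Hrefs()
        parser.feed(fetch(page))
        for raw in parser.values:
            # Make the link absolute and drop any #fragment.
            url = urldefrag(urljoin(page, raw)).url
            if url in seen or urlparse(url).scheme not in ("http", "https"):
                continue
            seen.add(url)
            # Record external links, but only follow links on the same host.
            if urlparse(url).netloc == host:
                queue.append(url)
    return sorted(seen)
```

The fetch parameter is there so the crawler can be tried against a dictionary of canned pages before pointing it at a real site. The returned list can then be filtered by file extension to keep only the images, PDFs, and so on.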