Wednesday, June 30, 2010

How to download an entire Web site with wget

A memo, since I may need this someday.

$ wget \
--recursive \
--no-clobber \
--page-requisites \
--html-extension \
--convert-links \
--restrict-file-names=windows \
--domains website.org \
--no-parent www.website.org/tutorials/html/

Option descriptions
--recursive: Download the site recursively by following links (to a depth of 5 levels by default).
--no-clobber: Don't overwrite existing files (useful when an interrupted download is resumed).
--page-requisites: Get all the elements that compose the page (images, CSS, and so on).
--html-extension: Save HTML files with the .html extension (newer wget versions name this option --adjust-extension).
--convert-links: Convert links so that they work locally, offline.
--restrict-file-names=windows: Modify file names so that they also work on Windows.
--domains website.org: Don't follow links outside website.org.
--no-parent: Don't follow links outside the directory tutorials/html/.
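
To reuse the command for other sites, it could be wrapped in a small shell function. The following is only a sketch: the function name mirror_site and the DRY_RUN variable are made up for this memo and are not part of wget. It passes exactly the options listed above and takes the domain and the start URL as arguments.

```shell
#!/usr/bin/env bash
# Sketch only: mirror_site and DRY_RUN are hypothetical names, not part of wget.

# mirror_site DOMAIN URL
# Runs wget with the options from this memo.
# If DRY_RUN is set to a non-empty value, the command is printed instead of run.
mirror_site() {
  local domain="$1" url="$2"
  # ${DRY_RUN:+echo} expands to "echo" when DRY_RUN is non-empty,
  # and to nothing otherwise, so wget runs directly in the normal case.
  ${DRY_RUN:+echo} wget \
    --recursive \
    --no-clobber \
    --page-requisites \
    --html-extension \
    --convert-links \
    --restrict-file-names=windows \
    --domains "$domain" \
    --no-parent \
    "$url"
}

# Usage (the second line contacts the network, so both are commented out):
# DRY_RUN=1 mirror_site website.org www.website.org/tutorials/html/
# mirror_site website.org www.website.org/tutorials/html/
```

Running it once with DRY_RUN=1 is a cheap way to check the final command line before starting a long download.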

Reference:
http://www.linuxjournal.com/content/downloading-entire-web-site-wget

