webhazer-spider
Spider Module for the webhazer project.
The module works using the following concept:
- Get the current page from the queue.
- Search for all HTML links (`<a> ... </a>`) in the current page.
- Find the `href` key and extract the attribute (the URL).
- Submit the found URL to the queue.
Using this method, all the links on one page can be appended to the queue and the directory structure that lies behind the page can be partially disclosed.
To prevent loops, the queue should check whether a URL is already present before adding it. If it is, the URL should not be appended.
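One way to get this behaviour is to pair the queue with a set of URLs it has already accepted. The sketch below is an illustration under assumptions, not the project's actual queue: the names `Queue`, `NewQueue`, `Push`, and `Pop` are hypothetical. It also remembers URLs that have already been popped, on the assumption that a page visited earlier should not be queued again.

```go
package main

// Queue is a hypothetical FIFO of URLs that remembers every URL it has
// accepted, so the same page is never queued twice.
type Queue struct {
	pending []string            // URLs waiting to be crawled, oldest first
	seen    map[string]struct{} // every URL ever accepted by Push
}

// NewQueue returns an empty queue ready for use.
func NewQueue() *Queue {
	return &Queue{seen: make(map[string]struct{})}
}

// Push appends url unless it was accepted before.
// It reports whether the URL was actually appended.
func (q *Queue) Push(url string) bool {
	if _, ok := q.seen[url]; ok {
		return false // duplicate: skip it to avoid crawl loops
	}
	q.seen[url] = struct{}{}
	q.pending = append(q.pending, url)
	return true
}

// Pop removes and returns the oldest pending URL.
// ok is false when the queue is empty.
func (q *Queue) Pop() (url string, ok bool) {
	if len(q.pending) == 0 {
		return "", false
	}
	url = q.pending[0]
	q.pending = q.pending[1:]
	return url, true
}
```

Comparing URLs as plain strings treats `/a` and `/a/` as different entries. Normalizing URLs before calling `Push` would tighten the duplicate check, at the cost of extra parsing.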
### reqs:
$ go get golang.org/x/net/html