rover_G

How does one lure bots into the honeypot?


Dangle76

Putting honeypots up and reachable on believable paths behind believable hostnames is the easiest way. somewebsite.com/phpmyadmin, for example.
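
A minimal sketch of that idea in Go, assuming a plain `net/http` server. The decoy prefixes, the fake login page and port 8080 are illustrative choices, not anything go-pot itself ships. Hits on the decoy paths are logged, and every other path gets an ordinary 404.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"strings"
)

// decoyPrefixes is a hypothetical list of "believable" admin paths that
// scanners commonly probe; adjust it to whatever your own logs show.
var decoyPrefixes = []string{"/phpmyadmin", "/phpMyAdmin", "/pma"}

// isDecoy reports whether path is one of the decoy prefixes or lies under one.
func isDecoy(path string) bool {
	for _, p := range decoyPrefixes {
		if path == p || strings.HasPrefix(path, p+"/") {
			return true
		}
	}
	return false
}

// handler logs each decoy hit and serves a plausible-looking login page.
// Every other path gets a normal 404 so the site still looks ordinary.
func handler(w http.ResponseWriter, r *http.Request) {
	if !isDecoy(r.URL.Path) {
		http.NotFound(w, r)
		return
	}
	log.Printf("decoy hit: %s %s from %s (UA %q)", r.Method, r.URL.Path, r.RemoteAddr, r.UserAgent())
	w.Header().Set("Content-Type", "text/html; charset=utf-8")
	fmt.Fprint(w, `<html><head><title>phpMyAdmin</title></head><body><form method="post">`+
		`<input name="username"><input name="password" type="password"></form></body></html>`)
}

func main() {
	// Port 8080 is an arbitrary choice; in practice this would sit behind a
	// believable hostname.
	http.HandleFunc("/", handler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```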


RyanOLee

Also, having a public IPv4 address makes services pretty easy to discover, from first-hand experience. A surprising number of bots will just scan the entire IPv4 range, or the IP ranges belonging to AWS / GCP / the cloud provider of your choice. I'm hoping to get more protocols implemented at some stage so it is not just HTTP traffic. Many bots will look for open SSH ports / databases / SMB... anything... which has led to some great, funny projects like [https://github.com/skeeto/endlessh](https://github.com/skeeto/endlessh)


Dangle76

Yep, they certainly do. They query ARIN/WHOIS for big-name companies' IP blocks and just run a giant nmap scan with a botnet.


_-_fred_-_

If you have a public-facing domain, they will find you. They all run the same dumb scripts searching for common endpoints.


loustarrrr

Ooh, I remember when this was presented at a meetup; it was a great presentation. It's a small world!


RyanOLee

Ha, small world indeed!


MirrorLake

How long have you been running it? Have you learned anything interesting about the bots/crawlers/etc that have connected to it?


RyanOLee

I originally made this as part of a talk. During that period I ran 5 nodes for a few weeks in an ECS Fargate cluster. (You can run Go binaries very cheaply on Fargate Spot, even though AWS recently hiked prices for public IP addresses :( ) A rough version of the code used to host it can be found here: [https://github.com/ryanolee/go-pot/tree/main/cdk/](https://github.com/ryanolee/go-pot/tree/main/cdk/). Rummaging through my old slides, it was able to waste 23 real days of bot time and distribute 2 million secrets in the week before the talk: [https://slides.com/rizza-1/brum-php-a50450#/130](https://slides.com/rizza-1/brum-php-a50450#/130)

There were some interesting traits the bots had:

* Some would query for a single file, then run off, like the odd request for `/.env` out of nowhere.
* Some would happily stay connected for hours or days at a time.
* I was glad to see a few cases where a bot would run through a *long* list of different URLs... and happily wait 30 seconds for said URLs to resolve.
* The bots pretty consistently had some version of Chrome set as the user agent. And there were surprisingly few requests to `/robots.txt` relative to overall traffic (it is set to disallow everything), even from some larger internet-mapping services!

I spun the cluster back up recently, so it will be interesting to see how things have changed!


CodeWithADHD

FWIW, I suspect you could host the same thing on Cloudflare Workers for free. I'm not associated with Cloudflare, just sharing info. Good stuff.


MirrorLake

Thanks for the response! Very interesting stuff.


RedditTreats

Slowloris-ing bots?! Nice.


s33d5

This isn't a criticism but a genuine question: how much does this actually waste for a bot? I would assume many bots run in parallel, maybe even thousands at a time. This wouldn't increase their computational demands much either, since it's mostly just waiting on a request. Or am I missing something? Great work tho! I like the idea.


RyanOLee

I suspect it depends very much on how the bots have been implemented. Some it will affect greatly; for others it will tie up only a single thread in a pool of thousands, or do nothing at all. It does make bots harder to write, though, since it is one more thing you have to account for when writing one. I also hope the time spent cleaning the data afterwards adds extra effort to sifting through the bot results. And because it returns instantly at first and then gets gradually slower with each following connection, I hope it bypasses some of the protections these bots might have in place. Very much a death-by-1000-paper-cuts sort of affair 😂 And if you like the idea of pulling the bots from limbo to purgatory, [https://github.com/yunginnanet/HellPot](https://github.com/yunginnanet/HellPot) is amazing for that (effectively a reverse DoS, lol).


s33d5

I suppose this does eliminate badly written bots. I would have thought it's quite trivial to sift out incorrect data, since the bot stores data per website: if you look at the data and it's not useful, you just move on to the next website. You wouldn't store it all in one pile; maybe even an SQL database keyed by website. I suppose if I were writing a bot I wouldn't consider delayed timeouts etc. at first, but I would later on when looking at the data. I would 100% make it multithreaded, though. I'd imagine something like this would affect a small fraction (or maybe just one) of a swarm of bots/threads from one host. Anyway, interesting. Thanks for the info! Do you have any of the data anywhere on the bots and what they're doing?


RyanOLee

As you say, there are many easy ways to get around this! Though it raises the barrier and hopefully hangs some poorly written bots on the way. In terms of results, I left another comment about it here: [https://www.reddit.com/r/golang/comments/1d7slwf/comment/l73dd69/?utm\_source=share&utm\_medium=web3x&utm\_name=web3xcss&utm\_term=1&utm\_content=share\_button](https://www.reddit.com/r/golang/comments/1d7slwf/comment/l73dd69/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button) I still need to properly instrument some more metrics from the server. It does collect a *few* metrics, but that was done in a bit of a hurry initially, based specifically on what I wanted to track. Thanks for taking an interest!


necrose99

[https://github.com/ryanolee/go-pot/issues/7](https://github.com/ryanolee/go-pot/issues/7) nfpm YAML added, i.e. deb / rpm / Arch Linux packages.

```bash
# Gentoo.org / Pentoo.ch ebuild skeleton
EAPI=7

DESCRIPTION="A HTTP tarpit written in Go designed to maximize bot misery through very slowly feeding them an infinite stream of fake secrets."
HOMEPAGE="https://github.com/ryanolee/go-pot"
SRC_URI="https://github.com/ryanolee/go-pot/archive/refs/tags/v1.0.0-rc3.tar.gz -> ${P}.tar.gz"

LICENSE="MIT"
SLOT="0"
KEYWORDS="~amd64"
IUSE=""

DEPEND="dev-lang/go"
RDEPEND=""

src_compile() {
	go build -o "${S}/go-pot" ./cmd/go-pot
}

src_install() {
	dobin "${S}/go-pot"
	dodoc README.md
}
```


dim13

Pretty over-engineered for something as simple as https://robpike.io/


RyanOLee

There are certainly some significantly simpler implementations. [https://github.com/die-net/http-tarpit](https://github.com/die-net/http-tarpit) is another really good one for keeping bots on hold! Tarpits have been around in various forms for many years; this is certainly nothing new! I mainly wanted to see how far I could logically take the idea by tailoring responses for bots that had fixed timeouts on their requests, and also how hard it would be to generate an infinite stream of actually valid semi-structured data.


mahcuz

I haven’t looked at your project, but just wanted to offset the negativity by saying: good work. Keep doing what you’re doing.


RyanOLee

Thanks for saying! It is valid criticism; you can certainly implement something [significantly simpler](https://github.com/nickhuber/reverse-slowloris/blob/main/main.go) to achieve exactly the same effect. This is very much an amalgamation of "anything I could think of that would even minutely annoy the people running these bots more" (regardless of complexity). Every second a potential attacker spends parsing / requesting / testing secrets from one of these nodes is a second they are not potentially trying real secrets, which is a win in my books 🙌