**pamela** @pamela@bsd.network · 2022-11-04T22:06:13Z

Hacky folks, please resist finding ways to scrape the fediverse, build archives, automate tools and connect to people via bot without their consent.

Just resist the urge. Because you're not going to think to check for robots.txt, but you ought to, that's how we communicate we don't want to be involved. You're probably not going to bother to read the various terms of service for the instances you touch, many of which explicitly ask you not to do any scraping or automated activity beyond normal use of the service. You're not going to know to respect people using the NoBot hashtag that was meant to prevent automated follows. You're certainly not going to parse user profiles and pinned posts to learn how people like to be approached, what will get you blocked, or even think to heavily throttle your activity because instances are falling over in response to load.

Whatever your thing is, make it 100% opt-in. Make it appropriate for a significantly more at-risk user than you are. Make sure it forgets things, purges info about servers it can't contact, can't operate in any sort of logged-in mode where consent is an issue.

We will straight up help advertise your cool thing if it respects users properly and takes the time to consider the safety and preferences of every person involved. There are a lot of fun, thoughtfully-designed toys! And there are a lot of people really tired of having to come and tell you off when you wanted to help, honestly. Help yourself and ask around before you flip on your cool new thing, let folks point out what you're missing.
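A minimal sketch of the robots.txt check mentioned above, using Python's standard `urllib.robotparser`. The bot name `example-fedi-tool` and the sample rules are hypothetical, and fetching the file from the instance is left out; a real tool would load `/robots.txt` from each host first and, in keeping with the opt-in approach asked for here, could treat a failed fetch as a "no".

```python
from urllib.robotparser import RobotFileParser

# Hypothetical bot name, for illustration only.
USER_AGENT = "example-fedi-tool"


def allowed_by_robots(robots_txt: str, url: str, user_agent: str = USER_AGENT) -> bool:
    """Return True only if the given robots.txt text permits fetching url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; nothing is fetched over the network here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Sample rules from an instance that does not want any automated access.
sample_rules = "User-agent: *\nDisallow: /\n"
print(allowed_by_robots(sample_rules, "https://example.social/@someone"))
```

Passing robots.txt is only the first gate: it says nothing about an instance's terms of service, a NoBot hashtag, or what an individual profile asks for.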

**cdc** @cdc@ioc.exchange · Nov 04, 2022, 22:08

@pamela There is a setting to prevent indexing by search engines, and there are options to make your timeline private. Is that not sufficient to gauge consent?

**pamela** @pamela@bsd.network · Nov 04, 2022, 22:29

@cdc historically, many just do not bother to check. Or users don't have control of things they think they do to express a desire to opt out of things. Like RSS feeds: they go out even if you've turned off indexing, even when the local timeline is hidden. Public entries only, so people think who cares, right? Well, the people who don't know those exist sure might, or the people whose post and account deletions aren't being honored because someone didn't think about the possibility that someone might badly need to delete specific information for their safety.

It's just a lot of potential nasty surprises that are totally avoidable with a little warning and review from the outside.

**cdc** @cdc@ioc.exchange · Nov 04, 2022, 22:35

@pamela Not being able to ensure deletion across the fediverse is a known problem and a major drawback for sure.

**maya ⛓️** @maya@occult.institute · Nov 04, 2022, 22:28

@cdc @pamela No, it is not. You can go ahead and write "people are going to do this eventually, like it or not", but you're going to have to live with: not! People may not like it! That's kind of what's being expressed here – and ignoring that people are expressing their opt-out in things like instance ToS/CoC, in ways other than the ways one finds convenient... is IMO at very least a jerk move and probably violating some European law or other.

**cdc** @cdc@ioc.exchange · Nov 04, 2022, 22:32

@maya Darn, I know people not liking a thing goes a long way to preventing someone from doing something... but are the existing private/indexing opt-outs insufficient? Ignoring those controls would definitely constitute a "dick move."

**maya ⛓️** @maya@occult.institute · Nov 04, 2022, 22:39

@cdc FWIW the history here is valuable: https://www.sunclipse.org/wp-content/downloads/2020/01/open-letter.html

I think it's accurate to say that the ambient attitude (not just pamela's view) is that stuff should be pretty specific and opt-in. IMO it will suck if people have to nuke discoverability of their instances to avoid arbitrarily invasive automated scraping projects that decide opt-in is too much overhead.

**cdc** @cdc@ioc.exchange · Nov 04, 2022, 22:44

@maya I've talked a bunch about how the real test for Mastodon will be adversarial pressure, and this seems like a great example. When someone you don't want is going to scrape your content, what can you do?

Does Mastodon make it clear that you MUST opt out for any attempt at privacy? Much like it should ditch DMs because of how woeful the security implications are? That it shouldn't even attempt to imply that DMs have any kind of privacy from instance owners?

**maya ⛓️** @maya@occult.institute · Nov 04, 2022, 22:50

@cdc Privacy isn't binary. Asserting that something is wrong doesn't imply naivete about its remaining technically possible.

**Phil Landmeier** @shuttersparks@yiff.co · Nov 05, 2022, 00:46

@pamela You can be sure this has been done since day one by governments and big data corporations.

**Daniel Bohrer** @daniel_bohrer@chaos.social · Nov 05, 2022, 01:16

@shuttersparks @pamela yes. We can be sure because we've already seen it happen, a few years ago. Not by big corporations, or governments, but by ordinary people. And it was a hell of a debate, and a lot of trust was lost in the community.

**The Real Coffee** @coffee@kakafe.ga · Nov 05, 2022, 21:35

@pamela Someone will do it, and you should consider what you post as such.
If you don't want something to be archived, don't share it publicly.

(And no, while I'm a #datahoarder I don't do such a thing in the fediverse... Yet...)

#datahoarding

**Lucy Osmond** @naln1@social.naln1.ca · Nov 05, 2022, 22:24

@pamela Yesterday my server ground to a halt because someone whose User-Agent was “RSS Discovery Engine 0.1” (which is apparently Quakkels’ RSS Discovery Engine) was making dozens of requests per second to my poor Raspberry Pi 3+, because Quakkels’ RSS Discovery Engine doesn’t even bother to check robots.txt ;-; It’s important to remember that, barring all the ethical and moral issues, there are a lot of small instances on Fedi that just don’t have the computing power to handle more than a few requests per second, and that you’re literally just DoSing them.
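For anyone building a client, here is a rough sketch of the kind of per-host throttling that would avoid this. The class name `HostThrottle` and the five-second default are assumptions made for illustration, not a recommended value from this thread; a small instance may need a far longer gap.

```python
import time
from urllib.parse import urlsplit


class HostThrottle:
    """Enforce a minimum delay between requests to the same host (hypothetical helper)."""

    def __init__(self, min_interval: float = 5.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between hits on one host
        self._clock = clock               # injectable so the logic can be tested
        self._sleep = sleep
        self._last = {}                   # host -> time of the most recent request

    def wait(self, url: str) -> float:
        """Block until url's host may be contacted again; return the seconds waited."""
        host = urlsplit(url).netloc
        now = self._clock()
        last = self._last.get(host)
        delay = 0.0
        if last is not None:
            delay = max(0.0, self.min_interval - (now - last))
        if delay > 0:
            self._sleep(delay)
        # Record when this request is actually allowed to go out.
        self._last[host] = now + delay
        return delay
```

Calling `throttle.wait(url)` before every request keeps a single small server from seeing more than one hit per interval, no matter how many URLs on it are queued.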

**pamela** @pamela@bsd.network · Nov 06, 2022, 00:43

@naln1 yes! We were all supposed to be able to run this from whatever machine we could access, and so people do just that. And I have to imagine it must be seriously frustrating sometimes when others forget...

**Adam ♿** @voltagex@aus.social · Apr 13, 2023, 02:13

@naln1 @pamela an issue from a year ago - https://github.com/quakkels/rssdiscoveryengine/issues/26

Grounds for dropping all traffic matching that user agent IMO, or for returning a small RSS feed explaining the issue.

**Lucy Osmond** @naln1@social.naln1.ca · Apr 13, 2023, 02:16

@voltagex @pamela I'm actually the sole upvote on that issue :3

**Voline** @Voline@kolektiva.social · Nov 10, 2022, 19:57

@pamela
Pamela is one of the wisest old heads on the Fediverse. Heed her!

**r3g_5z** @r3g_5z@plem.sapphic.site · Nov 12, 2022, 21:39

@pamela i don’t really understand the idea behind wanting to scrape the fediverse in the first place. what is the purpose? because i can only think of bad ideas coming from it where people will target marginalised groups, when the fediverse mainly consists of marginalised groups, by being able to search for their toots and harass them with it.

**JustRosy** @JustRosy@postpandemicparty.org · Nov 12, 2022, 23:14

@pamela What I'd like to know is how you posted beyond the 500 char limit. How did you do this?

**pamela** @pamela@bsd.network · Nov 13, 2022, 01:44

@JustRosy We changed the limit for our server in the settings. :) A lot of servers do this to allow real blogging and longer conversations...we gave everyone about a page of text.

**Ed Bennett** @edbennett@mastodon.social · Nov 12, 2022, 23:59

@pamela
Good advice here. The Fediverse is not Twitter.

**Jons Mostovojs** @jonn@social.doma.dev · Nov 13, 2022, 12:12

@pamela if I understand it correctly, only the toots from exactly the set of accounts that someone on the instance follows or interacts with will end up in the federated timeline (and thus be effectively scraped by the instance)?

**Keira (She/Her)** @keira_reckons@aus.social · Apr 13, 2023, 02:15

@pamela I find it fascinating that the same people who will say things like "users are too stupid to be trusted with X" will make the case that the ordinary user of a social media site meaningfully understands how it works, and can understand that scraping even exists.

Most people in my life have zero concept even that an admin or dev might see their content, or that things are kept *forever*, or that someone could access their toots while not using a normal GUI.

There is *no* ethical or moral basis for opt out privacy, unless you are *only* dealing with data/dev/cyber people.

**Keira (She/Her)** @keira_reckons@aus.social · Apr 13, 2023, 02:16

@pamela aaah I just realised how old this toot is! Must have seen it reboosted. Sorry for dredging it up!

**pamela** @pamela@bsd.network · Apr 13, 2023, 03:06

@keira_reckons you've made good points, and sadly it's no less an issue than it's always been! but at least it's not something new every week like it was for a while

**sagebiel** @msgbi@mastodon.social · Apr 13, 2023, 20:56

@pamela @charles_ex @aloa5 just my API calls and not unfollow anymore @Ginger149
