Hacky folks, please resist finding ways to scrape the fediverse, build archives, automate tools and connect to people via bot without their consent. 

Just resist the urge. Because you're not going to think to check for robots.txt, but you ought to, that's how we communicate we don't want to be involved. You're probably not going to bother to read the various terms of service for the instances you touch, many of which explicitly ask you not to do any scraping or automated activity beyond normal use of the service. You're not going to know to respect people using the NoBot hashtag that was meant to prevent automated follows. You're certainly not going to parse user profiles and pinned posts to learn how people like to be approached, what will get you blocked, or even think to heavily throttle your activity because instances are falling over in response to load.
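For anyone building a tool anyway, the checks listed above can be sketched roughly as follows. This is a minimal illustration, not a complete consent mechanism: the user agent string `example-fedi-tool`, the helper names `may_fetch` and `opted_out`, and the `Throttle` class are all made up for this example, and matching `#nobot` in profile text is only one assumed way of honoring the NoBot hashtag. Terms of service and profile notes still have to be read by a human.

```python
import re
import time
from urllib.robotparser import RobotFileParser

# Hypothetical user agent for an imaginary tool; not a real project.
USER_AGENT = "example-fedi-tool"


def may_fetch(robots_txt, url, user_agent=USER_AGENT):
    """Return True only if the given robots.txt text allows fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def opted_out(profile_text):
    """Assumed convention: a #NoBot hashtag anywhere in the profile text."""
    return re.search(r"(?<!\w)#nobot\b", profile_text, re.IGNORECASE) is not None


class Throttle:
    """Enforce a minimum interval between requests to one server."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        # Sleep for whatever is left of the interval since the last request.
        if self._last is not None:
            remaining = self.min_interval - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()
```

A caller would fetch each instance's `/robots.txt` once, pass its text to `may_fetch` before every request, skip any account for which `opted_out` is true, and call `Throttle.wait()` before each request with an interval measured in seconds rather than milliseconds.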

Whatever your thing is, make it 100% opt-in. Make it appropriate for a significantly more at-risk user than you are. Make sure it forgets things, purges info about servers it can't contact, can't operate in any sort of logged-in mode where consent is an issue.
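One possible reading of "forgets things" and "purges info about servers it can't contact" is sketched below. Everything here is an assumption made for illustration: the class name `ForgetfulStore`, the `max_age` and `max_failures` parameters, and the rule that three consecutive failed contacts trigger a purge are not taken from any existing tool.

```python
class ForgetfulStore:
    """In-memory store that expires old records and drops unreachable servers."""

    def __init__(self, max_age, max_failures=3):
        self.max_age = max_age            # seconds any record may be kept
        self.max_failures = max_failures  # consecutive failures before a purge
        self._records = {}                # server -> list of (timestamp, item)
        self._failures = {}               # server -> consecutive failure count

    def remember(self, server, item, now):
        # Storing data implies the server was just reached, so reset failures.
        self._records.setdefault(server, []).append((now, item))
        self._failures[server] = 0

    def record_failure(self, server):
        self._failures[server] = self._failures.get(server, 0) + 1
        if self._failures[server] >= self.max_failures:
            self.forget_server(server)

    def forget_server(self, server):
        self._records.pop(server, None)
        self._failures.pop(server, None)

    def expire(self, now):
        # Drop every record older than max_age; drop servers left empty.
        for server in list(self._records):
            fresh = [(ts, item) for ts, item in self._records[server]
                     if now - ts < self.max_age]
            if fresh:
                self._records[server] = fresh
            else:
                del self._records[server]

    def items(self, server):
        return [item for _, item in self._records.get(server, [])]
```

Purging on unreachability matters because a server that has gone away can no longer deliver deletion requests, so anything kept from it can never be corrected by the people it belongs to.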

We will straight up help advertise your cool thing if it respects users properly and takes the time to consider the safety and preferences of every person involved. There are a lot of fun, thoughtfully-designed toys! And there are a lot of people really tired of having to come and tell you off when you wanted to help, honestly. Help yourself and ask around before you flip on your cool new thing, let folks point out what you're missing.

@pamela There is a setting to prevent indexing by search engines, and there are options to make your timeline private. Is that not sufficient to gauge consent?

@cdc historically, many just do not bother to check. Or users don't have control of things they think they do to express a desire to opt-out of things. Like rss feeds, they go out even if you've turned off indexing, even when the local timeline is hidden. Public entries only, so people think who cares, right? Well, the people who don't know those exist sure might, or the people whose post and account deletions aren't being honored because someone didn't think about the possibility that someone might badly need to delete specific information for their safety.

It's just a lot of potential nasty surprises that are totally avoidable with a little warning and review from the outside.

@pamela Not being able to ensure deletion across the fediverse is a known problem and a major drawback for sure.

@cdc @pamela No, it is not. You can go ahead and write "people are going to do this eventually, like it or not", but you're going to have to live with: not! People may not like it! That's kind of what's being expressed here – and ignoring that people are expressing their opt-out in things like instance ToS/CoC, in ways other than the ways one finds convenient... is IMO at very least a jerk move and probably violating some European law or other.

@maya Darn, I know people not liking a thing goes a long way to preventing someone from doing it... but are the existing private/indexing opt-outs insufficient? Ignoring those controls would definitely constitute a "dick move."

@cdc FWIW the history here is valuable : sunclipse.org/wp-content/downl

I think it's accurate to say that the ambient attitude (not just pamela's view) is that stuff should be pretty specific and opt-in. IMO it will suck if people have to nuke discoverability of their instances to avoid arbitrarily invasive automated scraping projects that decide opt-in is too much overhead.

@maya I've talked a bunch about where the real test for Mastodon will be adversarial pressure and this seems like a great example. When someone you don't want is going to scrape your content, what can you do?

Does Mastodon make it clear that you MUST opt out for any attempt at privacy? Much like it should ditch DMs because of how woeful the security implications are? That it shouldn't even attempt to imply that DMs have any kind of privacy from instance owners?

@cdc Privacy isn't binary. Asserting that something is wrong doesn't imply naivete about its remaining technically possible.

@pamela You can be sure this has been done since day one by governments and big data corporations.

@shuttersparks @pamela yes. We can be sure because we've already seen it happen, a few years ago. Not by big corporations, or governments, but by simple people. And it was hell of a debate, and a lot of trust was lost in the community.

@pamela Someone will do it, and you should consider what you post as such.
If you don't want something to be archived don't share it publicly

(And no, while I'm a #datahoarder i don't do such thing in the fediverse... Yet...)

#datahoarding

@pamela Yesterday my server ground to a halt because someone whose User-Agent was “RSS Discovery Engine 0.1” (which is apparently Quakkels’ RSS Discovery Engine) was making dozens of requests per second to my poor Raspberry Pi 3+, because Quakkels’ RSS Discovery Engine doesn’t even bother to check robots.txt ;-; It’s important to remember that, all the ethical and moral issues aside, there are a lot of small instances on Fedi that just don’t have the computing power to handle more than a few requests per second, and that you’re literally just DoSing them.

@naln1 yes! We were all supposed to be able to run this from whatever machine we could access, and so people do just that. And I have to imagine it must be seriously frustrating sometimes when others forget...

@naln1 @pamela an issue from a year ago - github.com/quakkels/rssdiscove

Grounds for dropping all traffic matching that user agent IMO: or return a small RSS explaining the issue.
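The second option, answering a misbehaving client with a small RSS feed that explains the problem, could look roughly like this as WSGI middleware. This is a sketch under assumptions: the names `block_agents`, `explain_feed` and `BLOCKED_AGENT_SUBSTRINGS` are invented here, the wording of the message is made up, and in practice dropping the traffic outright is usually done in the reverse proxy or firewall rather than in application code.

```python
from xml.sax.saxutils import escape

# Substrings of User-Agent headers to intercept (assumed list for illustration).
BLOCKED_AGENT_SUBSTRINGS = ("RSS Discovery Engine",)


def explain_feed(message):
    """Build a minimal RSS 2.0 document carrying a single explanatory item."""
    text = escape(message)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<rss version="2.0"><channel>'
        "<title>Requests blocked</title>"
        "<link>https://example.invalid/</link>"
        f"<description>{text}</description>"
        f"<item><title>Requests blocked</title><description>{text}</description></item>"
        "</channel></rss>\n"
    )


def block_agents(app, blocked=BLOCKED_AGENT_SUBSTRINGS):
    """Wrap a WSGI app; matching user agents get the explanatory feed instead."""

    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(fragment.lower() in agent for fragment in blocked):
            body = explain_feed(
                "This client ignores robots.txt and overloads small servers. "
                "Please honor robots.txt and throttle your requests."
            ).encode("utf-8")
            # 200 is used so that a feed reader actually displays the message.
            start_response("200 OK", [
                ("Content-Type", "application/rss+xml; charset=utf-8"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        return app(environ, start_response)

    return middleware
```

Serving a tiny static body is far cheaper than rendering a real feed, which is the point on hardware like a Raspberry Pi.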

@pamela
Pamela is one of the wisest of wisest old heads on the Fediverse. Heed her!

@pamela i don’t really understand the idea behind wanting to scrape the fediverse in the first place. what is the purpose? because i can only think of bad ideas coming from it where people will target marginalised groups, when the fediverse mainly consists of marginalised groups, by being able to search for their toots and harass them with it.

@pamela What I'd like to know is how you posted beyond the 500 char limit. How did you do this?

@JustRosy We changed the limit for our server in the settings. :) A lot of servers do this to allow real blogging and longer conversations...we gave everyone about a page of text.

@pamela
Good advice here. The Fediverse is not Twitter.

@pamela if I understand it correctly, only the toots from exactly the set of accounts that someone on the instance follows or interacts with will end up in the federated timeline (and thus be effectively scraped by the instance)?

@pamela I find it fascinating that the same people who will say things like "users are too stupid to be trusted with X" will make the case that the ordinary user of a social media site meaningfully understands how it works, and can understand that scraping even exists.

Most people in my life have zero concept even that an admin or dev might see their content, or that things are kept *forever*, or that someone could access their toots while not using a normal GUI.

There is *no* ethical or moral basis for opt out privacy, unless you are *only* dealing with data/dev/cyber people.

@pamela aaah I just realised how old this toot is! Must have seen it reboosted. Sorry for dredging it up!

@keira_reckons you've made good points, and sadly it's no less an issue than it's always been! :flan_tired: but at least it's not something new every week like it was for a while

@pamela @charles_ex @aloa5 just my API calls and not unfollow anymore @Ginger149

BSD Network

bsd.network is a *BSD-adjacent Mastodon Instance. We have a code of conduct.