Path To Citus Con, for developers who love Postgres | Transcript: Working in public on open source with Simon Willison & Marco Slot


June 29, 2023 / 01:05:02/E1

Working in public on open source
===

Claire: To those of you in the audience and listening to the recording later, welcome to Path To Citus Con. It is a new live show on Discord. This is episode number one. The text chat is gonna be happening in the #cituscon channel; specifically in the Path To Citus Con Episode 01-E01 thread.

We've got a Code of Conduct, as you might expect, and there's a Code of Conduct for this Discord server. But in addition, we tend to follow the Citus Con Code of Conduct, which you can find at aka.ms/cituscon-conduct. And my name is Claire Giordano. I'm a Citus open source champion here at Microsoft.

And I'm co-chair of the Citus Con: An Event for Postgres event which is virtual and is happening in a couple of weeks. And I'm here with my co-host, Pino de Candia, who is an engineering manager at Microsoft, working on Postgres. And we're super excited to introduce our two guests. I didn't mean to cut you off there, Pino.

Pino: No, no, no. That's all right. Actually, I wanted to ask do you wanna shout out to the producers now?

Claire: That's a great point. In the background, Aaron Wislang and Carol Smith are producing, and the show wouldn't be happening without them and all their work behind the scenes. So you can say hello to them on chat or give them any feedback during or afterwards as well.

And I think Teresa Giacomini is also here. Teresa is my co-chair for the bigger Citus Con event happening in a couple weeks. Cool. All right. So without further ado, let's get started. Simon Willison is here with us to talk about working in public on open source.

Simon: Hi.

Claire: Hey Simon.

So I just have a few things I want people to know about you. Obviously, you're a keynote speaker for the Americas livestream for Citus Con happening later in April, and officially, I think people would say you're an independent researcher and developer. When I think of you, I think back, what is it, 20 years, to the fact that you were a co-creator of Django.

Simon: Wow. Yeah. Yep. 20 years ago. Well, 19 and a half years ago at this point, I think.

Claire: And more recently, you created Datasette, and I'm sure we'll talk about that in a few minutes so you can explain to people what it is and why they might care. You're on the PSF Board of Directors, and most recently you've been talking a lot about large language models like ChatGPT and even the new Bing.

Simon: Yeah. They're beguiling. I can't pull myself away from them. They're just too fascinating and incredibly distracting.

Claire: I have been following you for years online and one of the things that you've been talking about for a few years is this concept of working in public and the benefits of working in public and that's part of what inspired us to choose that topic for today.

But before we dive in, we should introduce Marco.

Pino: And I have the honor of doing that. So first of all, hi Marco.

Marco: Hey, how's it going?

Pino: For those of you who don't know Marco Slot, our second guest today: he's the keynote speaker for the EMEA livestream of Citus Con: An Event for Postgres.

The EMEA livestream will be happening in Europe on the morning of April 19th. Marco is the lead architect for the Citus Open Source project. Citus is a Postgres extension. It allows you to scale Postgres to multiple nodes, so you can start small with one or a few nodes and scale to many, many nodes to a large distributed cluster.

He's also the creator of the popular pg_cron extension. And all of Marco's Citus Engine work is done out in the open in public on GitHub in the Citus open source repo.

Marco: Yeah, definitely. You can pretty much see anything I did in the last few years on GitHub. But thanks also for the pg_cron shout-out. That is my own project. It's much simpler than Citus and I spend way less time on it. But it's funny how these kinds of simple pieces of software, which you build and then just put out there in the open, can have a huge amount of impact if they solve a specific problem that people have.

So I'm always happy to hear people using pg_cron.

Claire: So if we wanna dive into today's topic, I guess I'll just start with a super open-ended question. Like what do you both see as the benefits of working in public? Start with the positives.

Simon: So the thing for me is that the work that I do, I never want to have to solve the same problem twice, ever.

The most frustrating thing is when you sit down to solve a problem and you're like, "wow, I solved this before," and now I'm gonna have to do it again. I may have to waste my time figuring it out all over again. And a lot of the problems that I solve when I'm engineering could be captured in some kind of form. Maybe it's a commit message on a commit that updates something. Maybe it's a few notes, maybe it's just a sketch in an issue description of the approach I was going to take. And I've found that capturing those in a system, and defaulting to putting them in public, massively increases my productivity.

Partly, it's sort of an insurance scheme. You know, I've worked for companies where I did everything in private, and then I left those companies and lost all of that work. I'm gonna have to reinvent things and solve the same problems again that I've already solved. But everything that I do in public has an open source license attached to it, and it's just out there.

I will never have to think about those things ever again. That's a problem that I've solved once and will never have to go back and revisit. And I love that. You know, I feel like the work that I'm doing constantly adds up to me having more capabilities and more tools in my tool belt.

Claire: So that's a really Simon-centric perspective. Like I kind of expect --

Simon: It's very selfish.

Absolutely.

Claire: I kind of expected you to talk about the benefits of sharing your learnings with others, how we can build on top of each other's learnings, and how other people benefit when you share your Today I Learned notes.

Simon: Yeah, no, again, it's very selfish.

So I have this website, my TIL website, which I'll drop a link to in the chat, and I just published my 400th note there. On the one hand it is for other people, so that if somebody else needs to figure out how to copy a table from one SQLite database to another and they do a Google search, they'll land on my site and it'll solve the problem for them.

But mainly it's for me. The fact that I'm publishing causes me to increase the quality of the notes a little bit so they make more sense to other people, but they also make more sense to me when I come back in a year's time and I've forgotten everything. So, yeah, I feel like you can actually be very selfish in your motivations and still do all of this stuff in public in a way that benefits other people.
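For readers curious about the kind of problem Simon mentions, here is a minimal Python sketch of copying one table between SQLite files, using the standard-library `sqlite3` module and SQLite's `ATTACH DATABASE`. It is an illustration only, not the approach from Simon's TIL note; the helper name `copy_table` is hypothetical, and the sketch assumes the table name is trusted and that the destination does not already contain a table with that name.

```python
import sqlite3


def copy_table(src_path: str, dst_path: str, table: str) -> None:
    """Copy one table (schema and rows) from the SQLite file at src_path into dst_path.

    Hypothetical helper for illustration. Assumes `table` is a trusted name and
    that dst_path does not already contain a table with that name.
    """
    conn = sqlite3.connect(dst_path)
    try:
        # Make the source file visible to this connection under the schema name "src".
        conn.execute("ATTACH DATABASE ? AS src", (src_path,))
        # Reuse the original CREATE TABLE statement so column types and constraints carry over.
        rows = conn.execute(
            "SELECT sql FROM src.sqlite_master WHERE type = 'table' AND name = ?",
            (table,),
        ).fetchall()
        if not rows:
            raise ValueError(f"table {table!r} not found in {src_path}")
        # An unqualified CREATE TABLE lands in the "main" (destination) database.
        conn.execute(rows[0][0])
        # Copy every row across; the identifier is quoted, not parameterized.
        conn.execute(f'INSERT INTO main."{table}" SELECT * FROM src."{table}"')
        conn.commit()
    finally:
        # Closing the connection also detaches "src".
        conn.close()
```

Because the copy happens inside a single SQLite connection, the rows never pass through Python, which is typically simpler than reading and re-inserting them by hand.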

Pino: I really like that, because I hadn't thought of open source beyond code, documentation, and design. And you're actually preserving your thought process so that you can look at it later. That is a huge part of what we lose over time.

Claire: Well, there's this concept-

Simon: Absolutely. Yeah.

Claire: That comes up a lot in recent years.

I mean, maybe people talked about this 10 years ago and it just went right over my head. But there's the concept of doing something for future Claire, right? Not just present Claire. And I definitely think about that. I had conversations with people just this week about, "Hmm, you know what, let's go update that document," so next year when we're planning for Citus Con, we don't have to re-solve this problem. Right. We'll remember this bug and it will have been addressed.

Pino: So I wanna hear... Sorry. Go ahead, Simon.

Simon: I was just gonna say that's actually really important. I think with publishing, you are writing these notes anyway, right?

Like, to be productive in our lives, we need to make meticulous notes about things. The cost of publishing them is pretty tiny compared to the effort of putting them together in the first place. So you may as well default to publishing them so they benefit other people, which benefits you because you get a reputation as a useful person and all of that kind of stuff.

But yeah, it's a very small incremental cost on top of that. You should be keeping meticulous notes anyway, so why not publish them?

Claire: That reputation as a useful person is an interesting concept. So Marco, I'm gonna turn to you. Do you think your work in open source gives you a reputation as a useful person?

Marco: Well, I work in more of a team capacity on Citus, I guess. So I don't know if it would single me out as a useful person.

And if you're just maintaining a very actively used open source repo by yourself, you're mostly gonna get a lot of complaints a lot of the time, to be honest. But I do agree with what Simon says. Especially if you've found a good way of doing things, it's good to just put it out there.

I mean, it can be on a blog, it can be on GitHub. One thing I try to do a lot, especially when writing code, is to remember that the main job of code is to explain the problem. I would say good code explains the problem and solves it as a side effect.

Like you kind of wanna structure it and comment on it in a way that future Marco or future team member will look at it and say, okay, this makes sense. I can work with this. So, it's a bit more of a narrow scope to do that within, let's say, code comments and code structure of a particular project.

But yeah, for me, a big part of working in public is the feedback mechanism. It's hard to write good software that's reliable and solves the right problem for people. If you're working on a proprietary product, eventually your customers might complain, or maybe you don't get customers and there's no one to complain.

But in open source it can go really quick. Right. I have that sometimes: I push out a new version and 15 minutes later someone says, "Hey, something broke for me." And that's extremely useful. But you also get this constant feedback like, "man, I really wish I could do this," on GitHub issues or on our Slack channel.

And that feedback really also just helps you do better work. And I mean, there's also a kind of developer experience side to it, which is part selfish and part almost kind of promotional, where it's just easier to use stuff that's open source because you don't have to set up a lot of authentication mechanisms and VPNs.

You just do a git clone, you compile, and it works. That developer experience of open source is, I think, a very important aspect.

Pino: Marco, Marco, when you earlier said that you get a lot of complaints, I thought you were going to point out that that's difficult, but then you pointed out its value.

Does it also have a negative side? Something difficult to deal with?

Marco: I'm not sure there are many negatives. I mean, you shouldn't take it personally, I guess. If many people are complaining about the same thing, then you know it's a good thing to fix. If one person is complaining about the same thing over and over again but no one else is, I guess it can be a bit annoying.

But most of the time it's just useful, and most people are well intentioned anyway. But yeah, sometimes there's also the challenge that you get too much feedback, right? You have to start saying "no" to 90% of what comes in, and then the project becomes more popular.

So you get even more feedback and you need to say no even more. You need to say no to 99% of requests. So that can get a little bit difficult sometimes.

Simon: I've got a workaround for that which is kind of fun. Oh yeah, so, my main project Datasette, one of the big features it has is a plug-in system so you can write plug-ins for it.

And the joy of plug-ins is that people can add features to my software without me being involved at all, even if I think those features are a terrible idea. And this actually plays well for me because I come up with features that I think are a terrible idea and I can still build them. Like I can go and build a plug-in that does something kind of ridiculous and silly because it doesn't harm the core project because it's a separate thing.

So what I've found is that when people ask how they can contribute to my software, I tell them: write plug-ins for it. I won't even have to review their pull requests. You know, they can work completely independently of me and explore new things. And maybe my software gets a really cool new feature as a result.

Claire: Well, that's really interesting, because that connects to a capability that Postgres has. If you go back to the very first paper that was published about Postgres (when was that, Marco? 1985, '86?), one of the primary design constructs in the database was that it be extensible. And so there's this ability to create Postgres extensions, and Citus is, in fact, a Postgres extension. It's enabled all this innovation to flourish that wouldn't have been able to be put into Postgres core.

Right. It kind of gives people runway to go off and make things happen without having to get them in.

Simon: pg_cron is, yeah, pg_cron is another fantastic example of how well that kind of thing works.

Pino: What about the community fragmentation aspect of that? Particularly with Datasette, Simon, I wanted to ask you: if someone goes off and writes a plug-in, do they continue to have the conversation in the context of Datasette, or do they end up splintering off?

And how do you bring those conversations back together?

Simon: So, right now the community is small enough that we have a Discord that we hang out on, and that's not been a problem yet. But the project is also young. Datasette itself is five years old now. The plug-in system is maybe four years old, but it's only in the past year that people have really started building quite elaborate plug-ins on top of it, which is super exciting and is the position I've wanted the project to get to.

But I feel like there are gonna be growing pains going forward that we haven't encountered yet. Right now it works because the community is small enough. I do worry about things like wanting to make changes to Datasette which could break plug-ins. That's fine when I wrote the plug-ins, because I can upgrade them.

But now I've got external volunteer maintainers building their own plug-ins, so I have to think more carefully about that kind of thing. But yeah, I feel like it's a well-trodden path. My inspiration was WordPress, where plug-ins have been around for 15 years, and I feel they're a big part of the reason WordPress has been so successful. The design and architecture challenges are really interesting: designing a plug-in system that lets people do flexible things without tying your hands in terms of the future of the project itself takes a lot of practice and work.

And I still feel like I'm figuring out those patterns as I go along, and it's difficult. There's not much guidance out there on how to design your plug-in or extension model for maximum power and minimum friction.

Pino: So what about the Postgres community? How has that been?

Obviously the extensions are massively successful. There are lots of really popular extensions. Was fragmentation a problem? I'm fairly new to the Postgres community, so I'd love to know a little bit about the history of extensions and how the community avoided fragmentation.

Marco: Yeah, I wouldn't know what the first extensions were.

I mean, I think the first really major one was PostGIS, which adds geospatial data types and functions to Postgres without changing a line of Postgres code. And it's very interesting. You can have these polygons on maps that you store in your database, but you can then also create spatial indexes on them.

That works because the indexing system in Postgres is also very pluggable. I think most places where you can run Postgres, you usually also have PostGIS. Adding a new data type like that is a very clean interface in Postgres. But there are certain other types of extensions.

They don't really have a name. I refer to them as deep extensions, because they really alter the behavior. Citus is one of those, and TimescaleDB, and there's this new graph extension called AGE, which I think was previously AgensGraph. It's basically a graph database on top of Postgres. These go a lot deeper into the extension interfaces that Postgres offers.

I think sometimes it happens in the Postgres community that someone wants to add a new feature, but it's not clear how to do it and what the design should look like. So they just add a function pointer and say, "okay, build your own extension and do it your way." And that gives an enormous amount of power.

There's nothing like it in any other database that you can just change the planner into something completely different. But that part is also a little less cleanly layered. So it's hard to layer certain extensions that mess with the planner on top of each other. Sometimes it works, sometimes it doesn't.

You can get into binary compatibility issues when extensions mess with the data structures in incompatible ways. But so far it hasn't really become a huge problem. Most of the time people use one or two of the deep extensions, but not a big combination.

And then there's the vast majority, which add new data types and new functions. Those extensions usually compose really nicely.

Claire: So when we were brainstorming topics for today's first episode Postgres extensions was absolutely one of the things we considered. We could spend a whole hour talking about it.

And in fact, next week in episode two, I think we're gonna be back here Wednesday at 10:00 AM Pacific time as well, with some different guests. Our topic is something like how to get Postgres ready for the next hundred million users. And I'm sure Postgres extensions will come up in that conversation too.

I wanna circle us back to working in public for a second. Simon and Marco, you both have talked about some of the benefits of doing it, but I wanted to drill into something, because one of you kind of planted the seed in my mind: what has been surprising about working in public? Has there been anything surprising?

Simon: I think the surprises are the nice little ones, when somebody says, "oh, I really appreciated that little note you put out," and it's something that you threw out six months ago, promptly forgot about, and honestly didn't think anyone would ever read.

So that's the real delight of it. You know, it's when somebody says, that thing that you wrote helped me solve a problem or was useful to me, and somebody will come talk to you at a conference or whatever. And that's always delightful because honestly, I publish so much stuff. Like on any given day, I probably publish one or two TILs, like half a dozen issues, a bunch of commits.

There's a massive volume of it, so I assume that nobody is going to see most of it or any of it, because who's got the time to look at my latest GitHub commits or whatever. And so when people do, that's kind of lovely, because like I said earlier, I do this selfishly, it's mainly for me, but anytime it's valuable to someone else, it's always a treat.

It's always delightful to hear that it did have an impact in the world beyond just me having my own notes.

Claire: Well, the reason you're here is because I've been following you for years and have learned so many different things from you that I finally mustered the courage to introduce myself and invite you to be a keynoter at Citus Con.

So yeah, I think oftentimes people do appreciate the work that gets shared publicly, but we're not wired to necessarily express that gratitude or say thank you. So when you do get a compliment, it is kind of cool.

Simon: Yeah.

Claire: When you think about it, on GitHub, there's a tab for issues. There's no tab for gratitude or appreciation or compliments or accolades.

It's not there. We're all focused on what's wrong and how can we make it better.

Marco: Well, there's the star button.

Claire: Oh, that's true. There are stars. Yeah. I love it when people star the Citus GitHub repo. But I try not to ask for that too much because it seems so shameless.

Marco: I suppose comments could be nice. But yeah, you were asking, were there any downsides?

Claire: No, I was asking what was surprising.

Marco: Ah, what was surprising. Yeah. So I think one of the surprising things is that if you just push some software out there, a lot of the usage is actually very silent, for a long time. Just like Simon said, someone comes out of the blue and says they read this message.

And the same thing happens with software, where you suddenly realize there's this enormous user that has been doing very interesting things with what you built. The funniest anecdote we have to share is about pg_cron. Our team is in Turkey, and supposedly the Turkish government uses pg_cron to schedule turning the street lights on and off.

Simon: Amazing.

Marco: I don't know the exact mechanics of it, but it seemed brilliant. I totally love it. Those are always the nice surprises, because they tend to happen quietly for years, and then it turns out your project has been used.

I also remember: I come from a university where we had this professor, Andy Tanenbaum, and he had this operating system called MINIX, which before Linux was kind of the main open operating system that people were using. And he had these long debates with Linus about operating system architecture.

In the end, Linux became much bigger and MINIX became much less important. But then at some point it turned out that Intel had put MINIX in one of their chips, and it was one of the most widely deployed operating systems in the world. These things often come really quietly: it turns out there can be a massive impact from the things you've done in public.

So that's like one of the really nice things.

Simon: I think one thing I'll say there is that people often ask, "how can I contribute to open source?" And there's this idea that, oh, now you need to fork the code base, fix bugs, and submit pull requests. Totally forget about that. There is so much you can do for an open source project that comes way before you're actually sending in patches and trying to commit code.

And one of the most valuable initial things is just tell the people who built the project what you did with it. Because, I can guarantee for the vast majority of open source projects, weeks will go by with maybe a bug report or two, but no real evidence that it's being used because people can take it and use it for free independently.

And so if somebody says to me, "Hey, I use Datasette to build this thing", and honestly, it doesn't matter what this thing is, that will make my day, I will be absolutely thrilled to see evidence that people are engaging and using it. So yeah, just telling people that you use their stuff is great.

Even better than that, write about the thing that you did. If you tweet a screenshot of something you've built with my software, again, that's social proof. I can promote that to people, and I get to see what's going on. So there are very tiny things you can do to support an open source project, just in terms of talking about what you're doing with it, that are way more valuable than you might expect.

Claire: About 10 years ago Josh Berkus gave a talk. I think the title was something like "50 Ways to Love Your Project" and it was about all those non-code ways that people can contribute to open source. And I love the two that you just mentioned. Tell the people what you did with it and write about it, tweet about it.

I ended up doing a reprise of Josh's talk at a couple of conferences in the last year, with Fibonacci spirals and all these ways to contribute to Postgres beyond code. But telling people what you did with it is probably one of my favorites, because it helps other prospective users, right?

It helps other people who are thinking about using it in that way learn from your experience. And then it makes the creator's day. I mean, your dog probably got an extra walk that day, or maybe not. Does that not happen? When you're really excited, do you not take the dog out for a walk?

I don't know. It's what I do.

Marco: We have a cat, he walks himself.

Claire: Marco, what's a two-flight project since you were talking about pg_cron earlier?

Marco: Oh, sorry. A two-flight project.

Claire: Yeah. Isn't that how you describe pg_cron? No.

Marco: Oh, two-flight. Oh, yes. A two-flight project. Oh, yeah. You asked me this before. Yes. So I don't know.

I don't really travel by plane much anymore, but I used to fly a lot for work, and that was always the best place for me to write code. You're disconnected from the internet and people and email and chats. So I guess writing pg_cron initially took me two flights to the US.

But I had a specific intention there because, you know, it's not part of my day job. I tried to think carefully: if people are gonna rely on this, how do I make it as reliable as possible without me doing a lot of work? And that begins with keeping it very simple and small and focused on a specific problem.

And there are quite a few feature requests, which I'm not ignoring; I'm just weighing them extremely carefully. For example, for a few years I didn't add the ability to schedule jobs in seconds, because it's not something that cron does. But recently so many people were asking for it that it's like, okay, well, this solves a real problem.

I'll pay the cost of doing it. And you also have to be careful what you sign up for. If you put something out there that people then start relying on, you have the choice of either building a community around it, being on the hook for it yourself, or abandoning it.

And so currently I'm sort of keeping myself on the hook for it, but keeping it as simple as possible as well.

Simon: I feel like what you're describing there is the hardest problem in software development: prioritization. It's deciding what to build next, deciding which features are worth paying the ongoing maintenance tax for, and just figuring out what the most valuable thing I could be building is.

I find that incredibly difficult because I'm independent; I don't have a boss or investors or anything. So I don't really have a forcing function to help me make those decisions. I can get to the end of the day having built a new thing, which is fine, but it wasn't the thing I intended to do with my time to reach my larger goals.

Pino: In that case, I was going to ask you about that earlier. You publish week notes, so you clearly have a discipline and a habit of doing certain things. You explained the motivation before as selfishness, but then there's also this aspect of prioritization. Since you've gotta decide across multiple projects, what's your day like?

Do you sit down and, I don't know, pre-prioritize everything you could potentially do? Is that weekly? I'd like to hear about your habits.

Simon: Yes. So on a good day when things are going well, I have a slot between 9 and 9:30 in the morning where I make my plan for the day and I figure out, okay.

I try and say, I'll go for one big thing and two small things that I want to achieve. Then I'll check in at the end of the day and see if I did those. And once a week I look at my larger goals and try to use them to guide my planning. That's when things are going well. The past couple of months, things have been going disastrously wrong, because every sodding morning some new AI thing has happened, which distracts me for half an hour, and I miss my planning session and so forth.

So honestly, there are periods of time when it's all working really well and I'm prioritizing well. And then there are periods where it's just complete dumb luck if I get something useful achieved by the end of the day. The week notes are a good cover for that, because every week or two I publish notes on what I've been doing, and it always looks like I've been super productive.

But if you actually look at the strategy and say, hang on a second, were the things he wrote about the things he most wanted or needed to get done? They often aren't. So, yeah, it's an ongoing struggle for me. I've learned that sometimes I do this stuff well and sometimes I don't.

And that seems to be a cycle that I can't break out of. So as long as I don't have six months of complete sort of productive unproductivity, I'll be okay. But at the moment, I really need to break away from the AI research world and get the next alpha of the Datasette 1.0 release out.

That's been at the top of my priority list for like a week and a half, and I still need to push forward and get it done.

Claire: Are you saying that it's distracting when Elon Musk tweets out a link to a blog post that you've written?

Simon: It was a bit distracting that day. Yeah. That was quite, that was a few weeks ago.

I wrote a story about Bing when Bing had just launched and it was going completely off the rails and, you know, threatening people and blackmailing people and all of that kind of stuff. And yeah, so I wrote a blog entry about what had been happening and Elon Musk tweeted a link to my blog entry and I got a million readers in the next 24 hours, because it was two days after he'd tweaked the Twitter algorithm so that everything he said was shown to everybody.

So, yeah, that was something. And that was, what, four weeks ago? I've been distracted by AI stuff ever since, because stuff just keeps on building on top of that. On the one hand it's fascinating, but on the other hand it's definitely delaying my ability to get a whole bunch of stuff done that I wanted to get done.

Claire: Okay. I wanna circle back to working in public again and ask a different question to each of you, Marco and Simon. Think about engineers who are new to working in public: maybe they're fresh outta college, or maybe they've been working in a proprietary context for a number of years.

What makes them uncomfortable with working in public on open source?

Simon: I can speak for myself here actually. The biggest one is when I'm employed. I feel like the work I'm doing is private to that company. And so I've had periods of my career where I've done very little stuff in public because I'm working for an organization who pays for my time and they pay for my code.

And I signed a thing when I joined saying that they had the intellectual property and so forth. And that really held me back. Then there was a point a few years ago where I realized, hang on a second, I am allowed to work on things on weekends and stuff. You know, I don't have to stick to that so rigidly.

And also these companies will usually say, you know, approve it with your manager. And with the vast majority of managers, if you say, "Hey, I want to do this thing", they'll say "yes", because why would they say no? So for me, I think it's partly because I'm a habitual rule follower. I sort of stifled myself by sticking too closely to the feeling that, no, my employer should get all of my intellectual output.

And when I relaxed that I was way happier and I was way more productive.

Claire: Makes sense. Marco, what about you?

Marco: Yeah, I think it partly depends on the culture of the company and how you experience that. So for example, a long time ago I worked for Amazon, and they had a, I wouldn't say anti-open source policy, but they liked using open source without contributing back.

There was a time when, if you wanted to contribute to open source, you had to ask for VP approval. I mean, that has radically changed at Amazon. And Microsoft, well, I mean, the new Microsoft, let's say in the past few years, has a very pro-open source policy. But not everyone necessarily feels comfortable making the decision:

I wrote some useful code and I'm gonna push it out there on GitHub. Like, what if someone, somewhere in the company went, well, I disagree with that? In practice, if it concerns high-stakes software that, say, customers are paying a lot of money for, it's obviously not a good idea.

But the company's very open to it and it's not hard to get approval and for many things you don't really need to ask. Even like, Microsoft actually encourages it. So it really depends a bit on the company and also on how you've experienced the company so far.

Is it a common practice in your team, in your organization, to just put stuff on GitHub or work on other projects? I think in our immediate teams, it's pretty common. You see a bug in, let's say, PgBouncer, you go fix the bug. Nobody's going to ask, "Should I ask permission to fix this bug?"

But I don't know if that's the case for all teams at Microsoft. Probably not, or, I mean, I think they could, but I don't think they necessarily feel comfortable doing that.

Simon: So I do have a suggestion for things to write about that I feel are safe no matter what. And I just dropped a link into the chat to this, but basically TILs, this idea of writing about things that you have learned, I feel is the sort of lowest risk form of online publishing that there is.

Because the great thing about saying "today I learned how to do a for loop in Bash" or whatever is that you're setting expectations upfront. You're not promising to rock somebody's world and give them new insight. This is just, I learned to do this thing today and I'm writing about it. And if that's useful to you, then fine.

If it's not useful to you, then that's okay. This wasn't for you. And so I started publishing those myself a few years ago and I love it. It's so liberating, because I don't get that writer's block anymore, where I'm like, wow, do I really have something unique and interesting to say about this topic?

Now I'm like, no, I just figured out for loops in Bash. I'm gonna write up two paragraphs of text and a sample of code, and I'm gonna publish it and I'm gonna move on. And I love that. And I feel like if I was working for a company with very stringent sort of "no, you can't release code" rules, I'd still feel okay writing about things I'd learned.

You know, that feels like a very safe category of notes to be making and sharing with the world. And then the other thing is: I've set myself a rule that anytime I do a project, the price for doing that project is I have to write about it. And this is good for me because, like I said earlier, I'm very easily distracted, and I can get to the end of the day having had an idea for a project and built it, and that wasn't on my list, but at least now I have to pay for it.

And the payment is, I have to write it up. And the writeup can be just, like, a README in a GitHub repo explaining what the thing is in four paragraphs of text, and then always add screenshots. I feel like anything you build, you should take a screenshot, because the code won't work in 10 years' time, but the screenshot will last forever.

So I'm a huge fan of screenshotting your work as a way to illustrate it. But yeah, I feel like if all you ever do online is write up notes about what you learned anytime you learn something, and put up a quick post saying what you built, with a screenshot, anytime you build something, that will put you in the top 1% of internet users in terms of the quantity of content you're producing.

And it's great content, and it's all very low risk. Like, if you put out a blog post saying, "Hey, this is the way Agile should be done," lots of horrible people are going to tell you that you're wrong about it. If you put out a blog post saying, "I figured out for loops in Bash," I kind of feel like it's a waste of their time for people to be super critical of that.

Pino: Simon, you make it sound so easy. Could I just ask how much time each of those examples takes you? You said two paragraphs on "I learned," the TIL, and the README...

Simon: I've been writing. Yeah. So, I've been writing online for 20 years and so I'm very fast at it.

So a TIL post will take me between 5 and 20 minutes generally, and that's partly as well because anytime I'm figuring something out, I'm actually making notes as I go along. I use GitHub issues for this. I've got public issues in public repos. I also have a private repo called notes, which I just use for when I'm figuring something out.

And so often when I get to a writeup, it's literally copying and pasting Markdown from my issue notes into a TIL document and hitting go. So a lot of the time I've kind of written the notes already; the public writeup is just cleaning it up a tiny bit and adding a little extra context.

But yeah, I feel like for the vast majority of people, it's not going to take 5 to 15 minutes at first, because you've got to get into the swing of it, find your voice, and learn how to write productively, but over time it just keeps on getting faster. And these are crucial skills when you talk about becoming a senior engineer. The path to becoming a senior engineer, I think, is through writing, through written communication.

Like, that's the difference between seniors and juniors: the seniors are better at communicating about their work. So developing writing skills is a crucial professional skill anyway.

Claire: So one of the other things,

Marco: Oh, and thanks for writing about for loops in Bash, because it's one of those things that I have to Google every single time, along with for loops in PL/pgSQL.

That one I cannot remember. Do you have a TIL on that? That would be great.

Simon: I have to admit, for loops in Bash, I will never write one ever again, because ChatGPT writes my Bash scripts for me. So I'm just like, "Hey, write a script that loops through every file in this folder, runs FFmpeg to extract some frames, and puts them in a zip," and it does it.

And so no, I no longer feel like I need to dedicate even a corner of my brain to thinking about Bash, because ChatGPT knows all of the Bash that I'll ever need to know.

Pino: So, this brings up a topic for me. Marco earlier talked about writing code on flights and not flying as much these days, and now we've touched on ChatGPT.

I wanted to ask about changes in working in public, working on open source in the last five years, both technological changes and cultural changes. What comes first to mind and maybe Marco, I'll go to you first if that's okay.

Marco: Yeah. Obviously ChatGPT and AI are gonna constitute a huge change.

I've used Vim and Bash for most of my career. And I'm starting to think I should probably be using VS Code, because that's kind of where all the good integrations will be happening for things like Copilot. So that's gonna be a big change for me personally.

I mean, less technically, I guess a big change is that large organizations, both tech companies and just large enterprises, are massively embracing open source, both in terms of usage and in terms of contribution. Is that...

Claire: Oh, you, are you cutting out just for me or for everybody?

Simon: Yeah, that cut out for me.

Claire: Uh-oh. Yeah. You're cutting out Marco. I don't know why. Okay, well, try again and then we'll circle back to you if it doesn't work this time.

Yeah. It's still not working. Okay. I'm gonna jump in and say that one of the changes I've observed is more of a long-term change. If we go back to when I first started in my career, the way people wrote was a little bit different. Whether it was a paper they were writing or just a lengthy email, there was almost an expectation that their audience would read every word.

That may not have been true, but the way they wrote felt like that, right? There were these long, chunky paragraphs, and maybe things were written at the 16th-grade level or something like that. And I know that when I advise people about how to blog, and when I write my own stuff, now I think about scannability and browsability.

I assume people are not gonna read the whole thing. I assume they might jump to the screenshot, right? Like, just read a couple of the section headlines, then go to the screenshot and read the caption underneath it. And so, at least when I think about writing for people, I think about the fact that they're busy and how to make it easier for them to digest and to scan.

And I didn't used to think about that 20 years ago. So.

Simon: Yeah, there's definitely something very important there. I mean, that's part of why you want to write a lot: the more you write, the more you develop those instincts for what's actually going to work. And yeah, I had a sort of moment of crisis professionally a few years ago, when I'd gotten into the habit of writing these enormously detailed documents about project proposals and stuff, and then I had a hunch that maybe nobody was reading them. I started polling around and I couldn't find anyone who'd read these documents. And yeah, it made me rethink things: okay, actually, screenshots, illustrating things, more animated demos. I love having a live demo. I feel like a quick live prototype of an idea is worth a thousand text documents, because the moment people start playing with it, they can have a much richer conversation about it.

Another thing I found, getting back to the ChatGPT/AI side of things: a realization I had the other day is that there will never be documentation that is better in quality than what I can get out of ChatGPT and tools like that, provided they have the underlying knowledge.

My favorite example is FFmpeg. I did this project the other day where I had a video and I wanted to spit out a JPEG frame of that video every 10 seconds. It was a video of a thermometer over time, and I wanted to do OCR on it to extract the readings. And so, how do you use FFmpeg to spit out one JPEG for every 10 seconds?

I cannot tell you that, but I have done it, because I said to ChatGPT: use FFmpeg to spit out a JPEG for every 10 seconds. And it gave me this incomprehensible set of command-line flags, which just worked. And I cannot imagine FFmpeg documentation good enough to answer that question for me as quickly as a chatbot that has been trained on that documentation.

And so, on the one hand, it's weird: there's this new world we live in where a chatbot provides better documentation than the best-crafted documentation possible. But it also speaks to skills we need to develop as writers. We almost need to write our documentation so that chatbots can interpret it correctly and accurately to help answer people's questions.

And then as users of this stuff, the skill that we need to develop is getting really good at getting these language models to spit out the right information to help us solve problems; spotting all of the times they make stuff up, which is a huge problem, but checking; and then just losing that fear.

I'm no longer afraid of FFmpeg because I know that something can show me how to use it, whereas previously I very rarely used it because it's a notoriously complicated piece of software.

Claire: So, I'm gonna pivot from "how have things changed?" back to the working in public thing again.

As I was preparing, I reached out to Scott Hanselman last night, and he says hello, by the way, Simon.

Simon: Hey.

Claire: And one of the questions he suggested I ask, and actually somebody, Olaf on Mastodon, suggested something very similar, is: as you work in public, how do you stay positive in the face of assholes? Like, right?

There are critics out there on the internet, there are haters, there are complainers. How do you stay positive when faced with that?

Simon?

Simon: I think part of it is that you develop a very thick skin. You know, if you're online for 20 years, you get to the point where somebody's mean to you and you kind of think, they're probably in their early twenties; they wouldn't be that overconfident and mean if they had actual real-life experience.

So that helps me to a certain extent. But also, a lot of it's about self-confidence. Like, I am confident enough now that I know my stuff that if somebody says, "no, you're clearly an idiot because you got this wrong," I'll be like, "yeah, but I'm better at Django than you are," you know?

So that helps me a lot. But it comes down to a personality thing as well. You know, I think if you want to really expose yourself on the internet in this kind of way, it does help to have quite a robust ego, and to have that sort of confidence in your own abilities, because yeah, people can knock you down and they will, and if they succeed, it's miserable.

Pino: But to put it in context. Does it happen much? I feel like people have become more aware of the consequences that even words can have. Is it getting better? Does it still happen?

Simon: Maybe it is getting better. Yeah, maybe.

Claire: Oh, it still happens. Well, at least I see it. Like, I share information on Reddit because I espouse the philosophy of meet developers where they are.

Right? And so, when Marco writes some brand new brilliant Citus-related blog post that I want Postgres users to see in case it's useful to them, I will share it on Reddit and, a lot of times comments are supportive and positive, but sometimes they're definitely not.

Simon: I mean, I'm in the most privileged position you can be. You know, I'm white, male, you know, I don't have any of the, like, there are all sorts of aspects of sexism and racism and stuff that I just don't see. So, you know, it's a lot easier I think in that respect.

Pino: Marco.

Claire: Marco, are you, Marco, are you back?

Marco: Yeah. Let's, let's see. Can you hear me?

Claire: Yeah, we can hear you. Perfect.

Pino: So repeat that question too.

Claire: Yeah. How, how do you stay positive in the face of critics on the internet? Has that been an issue for you?

Marco: Well, I don't spend a lot of time on Twitter and those kinds of things, but the main thing for me is just to focus on what you're doing and believe in what you're doing.

It's like, if someone comes and criticizes your project, you know, I've already thought about this much more, and I kind of know we're doing the right thing, or that we're just working within the constraints that we've had. So it doesn't really bother me in that case.

But yeah, probably just like Simon, I'm also in this kind of privileged position, I guess. So I probably also see less of it, you know.

Pino: Now, what about the perspective of junior developers, or people who are new to open source? Some people in the open source community have reputations for being harsh and quick to critique a new piece of code or an idea.

Marco: I think the pgsql-hackers list is kind of interesting in that respect. It has a particular style, and it can be very critical of patches and designs. But it's, in some sense, for the good cause of making Postgres as good as possible.

And it's never about the person; it's always critiquing the code. But it can be very tough for a new person to come in and say, oh, I have this nice patch, and then it kinda gets criticized, and that can be a little bit tough.

Claire: Speaking of junior engineers, I went back and did a search on Twitter, Simon, for all the instances where you tweeted about working in public, and there is something that you tweeted back in July of 2021 where you said,

Simon: Oh, okay.

Claire: You said: if you wanna stand out from other candidates, having even one piece of writing or published piece of code that shows something you've built is a great way to do that. So would you still agree with that and do you still offer that advice?

Simon: Oh, 100%. So, I do mentoring for code bootcamps occasionally.

And yeah, one of the things is, in these bootcamps, the students will often have a final project that they do, and they'll put that up on GitHub. I always tell them, put up screenshots in your README, because the people evaluating these things are not going to click on the demo link, and the demo will be broken in six months anyway, because that's just what happens.

So, if your README has multiple paragraphs of text with interspersed screenshots and all of that kind of thing, that right there will be your resume for the next three years, and it'll work incredibly well. Because yeah, when I've been interviewing candidates for work, inevitably, you're gonna cyber-stalk your candidates a little bit. You're gonna check and see if they've got a GitHub repository and look at their LinkedIn and that kind of stuff. And for the vast majority of candidates, you won't find anything that helps you answer the question, "can this person do the job?"

When you stumble across a candidate who's got one project on GitHub with some screenshots that shows they can code, now I can skip the FizzBuzz interview question, you know, because I've seen their work, I've seen that they can do that. And if they've got a blog entry from five years ago with, like, six paragraphs discussing the internals of React or something, they're now in my mind an expert on this one subject.

So, yeah, if there are a hundred people applying for a job and you are the only one with a blog and your blog hasn't been updated in five years, and it has one article on it and a screenshot of something, that's still a leg up, that still makes you stand out from the crowd.

Claire: Marco, do you cyber-stalk your candidates when you're talking to them?

Marco: Yeah, sometimes, I mean it's nice to just review, basically do a code review before you work with someone rather than after hiring them. But yeah, it definitely is a leg up, like if you have some great projects on GitHub or very technical blog, it helps.

It's much more than the resume format. I always remember that one of the most senior and best engineers I've ever worked with was a forest firefighter before he got into software during the dotcom bubble. So it doesn't really matter.

I mean, your background matters a little bit, but if you can display your skill, it's just, you know, a hundred percent better than anything else.

Claire: So before we wrap today, I wanted to give each of you a chance to talk a little bit about your upcoming keynote. Now, I don't know if you've written your slides yet and if you're ready, but Simon, the title of your keynote at Citus Con, which is on Tuesday the 18th at nine o'clock Pacific Time (livestreamed, virtual), is: Big Opportunities in Small Data.

And I just thought that the backstory to why you're giving that talk and why you think it matters might be interesting for people to hear about.

Simon: Oh, so this is something. So, my day job, and I say job, I'm self-employed. So it's the thing I try to spend most of my time on, is building this open source project called Datasette.

But really the theme is tools for data journalism specifically. So I have a bit of a journalism background. I've worked at a couple of newspapers. Django came out of a newspaper 20 years ago. And data journalism is the bit of journalism that I think is the most interesting thing in the world, right?

It's where you work with journalists to try and tell stories with data. And, you know, anytime you see an infographic in the newspaper, any time you see a chart or a map or something, somebody went out and collected the data for that. That's a data-driven story. And when I worked at the Guardian newspaper in London, like 13, 14 years ago, we realized that there was a reporter at the Guardian called Simon Rogers who was the data expert.

He knew who to call at which government department to get data for any story that you wanted. And then he had all of these meticulous spreadsheets that he kept on a hard drive under his desk. And we got talking. We were like, we should do something with these meticulous spreadsheets about the world.

So we ended up starting a blog. We started a thing called the Guardian Data Blog, and the idea was to publish the data behind the stories. And we ended up doing that just using Google Sheets, because Google Sheets was free and it worked, and you could put data in it and people could copy that data back out again.

But I always felt like there should be a more effective way of publishing data, like a way of putting data online so people can browse and explore it, but also do API integrations with it and export it as different formats and all of that kind of stuff. And so that was the initial idea for Datasette.

It's a Python web application for publishing data online, and it's built on top of SQLite because SQLite is tiny and fast, and you can actually package a database with your underlying code when you deploy it so you don't have to think about even running a separate database server.

And this led me to this whole world of small data, where I realized that there's been lots of fuss in the industry over the past, sort of, 5 to 10 years about big data, which is measured in petabytes and which you need a giant data warehouse for. But actually, for the vast majority of people and organizations, what matters is the small data, the data that fits on a USB stick. As an individual, I care about things like my blood sugar levels over time and my step count and my tweets and emails and so on. As an organization, maybe I want to know who my customers are, which is probably like 50,000 rows of data.

You know, it's absolutely tiny and there's this space where I don't feel like people are investing enough effort in building these tools for small data. It shocks me that Microsoft Access has been kind of frozen in time for the past 20 years, when it should be one of the most powerful pieces of software in the Office suite.

But yeah, so I've been looking at small data, building tools for that, thinking about it from a data reporting and journalism point of view, and then watching how all of these governments are now releasing open data through these open data portals. So name a city in the United States, it probably has a data portal with CSV files full of trees and parking meters and all sorts of things like that.

And it's kind of just sat there, because the tooling isn't good enough for regular individuals, and for the reporters and journalists I'm building for, to take that data and turn it into stories. So yeah, I feel like there's a huge opportunity there. I would like to use Postgres for this stuff more than I do.

And so part of my keynote's going to be talking about the ways I've been solving these problems outside of the Postgres ecosystem. And then I also want to tie it into Postgres, by trying to inspire things that Postgres could do better, or that the Postgres community could build, that would make it even more applicable to solving these much smaller problems.

Claire: Well, you just answered my question, which is: what does this small data talk rooted in Datasette and SQLite have to do with Postgres? And there's the answer. So yeah, hopefully people will tune in for that. Obviously it'll be recorded online after the fact, but we'd love for people to join live, too, and ask you questions.

Okay. Marco, your keynote for the EMEA livestream on Wednesday, April 19th at nine o'clock Central European Summer Time is: The distributed Postgres problem and how Citus solves it. So in a nutshell,

Marco: Yeah.

Claire: What's that about?

Marco: I will update the title in my slides to "how Citus sometimes solves it," because it's actually not an entirely solvable problem.

So I wanna talk a bit about how there are different implementations appearing for distributed Postgres, but it's also been this thing that people have dreamed of for many years that never seems to quite happen. And I wanna talk a bit about what the technical problem behind it is.

Why hasn't someone come in and submitted a patch to Postgres so that it's now distributed? There are new distributed implementations of Postgres appearing. But in my mind, fundamentally, if you spread data across many machines, the first thing that happens is everything gets slow, because now you have to go to a different machine to get the data.

So it's no longer nicely compacted into one place. And this is a very important thing to understand, because Postgres is a relational database, so there are a lot of relationships within the data. And if all those relationships start spanning over a network, distribution doesn't help very much.

So, it's worth understanding that problem and then seeing, well, when does it help? When does distribution make a lot of sense, for what kind of applications, and what kind of patterns do they use to get there? So that's what I'm gonna talk about.

Claire: Awesome. We are a little bit over time, not that we're set in stone to end at the top of the hour. Pino, are there, are there more questions that we should have asked? Are there more things you wanted us to cover?

Pino: Not, not for me. I think we covered everything I thought of. But I, I wanna say thank you to our guests. This was really interesting.

Claire: Yes, Simon and Marco, I have been a fan of your work, both of you, for quite a number of years now, and it's totally an honor to have you be the first guests for Pino and me here on Path To Citus Con. So thank you, and again, big shout out to Aaron and Carol in the background, without whom this thing wouldn't even have happened today. And to everybody in the audience who participated in the chat, joined us, tweeted about this, and let their friends know: we really appreciate your support.

Pino: And Claire, do we have another event coming up?

Claire: We do, we do. So next Wednesday, April 12th, same time of day, 10:00 AM Pacific time, which is a nice time slot for those of you in Europe, because hopefully it's right before or right after dinner. Doesn't work for my friends in New Zealand, unfortunately.

But we have a number of guests: Melanie Plageman will be here, Samay Sharma, Abdullah Ustuner, whose last name I probably mispronounced, and Burak Y will be here also. And the topic is, again, how to get Postgres ready for the next eleventy-million users. We've got some people who work on Postgres open source amongst that group and others who work on Citus open source, and we thought that would make an interesting discussion, too. Somebody could probably drop the calendar invite for episode two in the chat in case any of you wanna put it on your calendars; I think it's aka.ms/pathtocituscon-ep02-cal or something like that. Anyway, Simon, Marco, thank you.

Simon: Thank you so much. This has been really fun.

Marco: Yeah, same. It's, it's been great. And thanks everyone in the audience for tuning in. Yeah. Hope we get more of these for next year's Citus Con.

Claire: It was great. It was very cool.

And Pino, I love collaborating with you. This is cool.

Pino: Same here. I had a lot of fun. I'm looking forward to doing it again.

Claire: All right. See you next Wednesday everybody. And Marco and Pino, we'll see you at Citus Con on the 18th and 19th.

Pino: Can't wait.

Marco: Yeah. See you at Citus Con.

Claire: That's a wrap.

Pino: Bye everyone.

Simon: Cool. Thanks a lot. Bye everyone.

Creators and Guests

Host

Claire Giordano

Claire Giordano is head of the Postgres & Citus open source community initiatives at Microsoft. Claire has served in leadership roles in engineering, product management, and product marketing at Sun Microsystems, Amazon/A9, and Citus Data. At Sun, Claire managed the engineering team that created Solaris Zones, and led the effort to open source Solaris.

Host

Pino de Candia

Pino de Candia has been a software dev manager at Microsoft since 2020 and is currently working on the Citus open source project. Pino previously worked on the managed PostgreSQL database service in Azure Cosmos DB for PostgreSQL, which includes Citus on Azure support for distributed PostgreSQL. Pino has lived in New Orleans since 2017.

Producer

Aaron Wislang

Open Source Engineering + Developer Relations at @Microsoft + @Azure ☁️ | @golang k8s 🐧 🐍 🦀 ☕ 🍷📷 🎹🇨🇦 | 😷 💉++ (inc. bivalent) | @aaronw.dev (on 🟦sky)

Producer

Carol Smith

Senior Program Manager at Microsoft in the Citus Community team. Previously at GitHub and Google. Horseback rider, cook, and armchair movie critic.
