• 3 Posts
  • 92 Comments
Joined 9 months ago
Cake day: September 27th, 2025


  • P.S. This is a hypothesis; I haven’t even designed the test for it, much less run it. What follows are my suppositions.

    I think whether or not it’s a good idea depends on how similar the models are. I don’t have a rigorous definition of “similar,” but things like similar training data, similar design methodologies, and similar QA processes would all contribute. Theoretically (I think), if they’re all dissimilar, they should each catch errors the others miss. However, the more similar they are, the more likely they are to share the same biases and weak spots. Then your error rate from a response plus verification may be the same as, or even higher than, the error rate for the original prompt alone, and you’d be unlikely to detect those errors using just two similar models. It can instill false confidence in the results: you’re doing something that should in theory increase the validity of the data, but in practice it might make no difference or even make the quality of responses worse.
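    One way to make that intuition concrete is a toy model, with every number and parameter name here an assumption rather than a measurement: the generator is wrong with probability `p_wrong`; on a wrong answer, the verifier shares the same blind spot with probability `shared` (a made-up stand-in for “similarity”) and approves it, and otherwise catches the error with probability `catch`. With `p_wrong=0.2` and `catch=0.8`, the undetected-error rate goes from 0.04 for a fully independent verifier to 0.2 for one that shares every blind spot, which is no better than skipping verification.

```python
import random

# Toy model (assumption, not measured): the generator answers wrong with
# probability p_wrong. On a wrong answer, the verifier shares the generator's
# blind spot with probability `shared` and approves it; otherwise it catches
# the error with probability `catch`. `shared` stands in for "similarity".


def expected_undetected(p_wrong: float, shared: float, catch: float) -> float:
    """Closed-form fraction of all answers that are wrong AND get approved."""
    return p_wrong * (shared + (1 - shared) * (1 - catch))


def simulate_undetected(p_wrong: float, shared: float, catch: float,
                        trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity."""
    rng = random.Random(seed)
    undetected = 0
    for _ in range(trials):
        if rng.random() >= p_wrong:
            continue  # generator got it right; nothing to catch
        if rng.random() < shared:
            undetected += 1  # same blind spot: verifier approves the error
        elif rng.random() >= catch:
            undetected += 1  # independent verifier still missed it
    return undetected / trials


if __name__ == "__main__":
    for shared in (0.0, 0.5, 0.9, 1.0):
        exact = expected_undetected(0.2, shared, 0.8)
        approx = simulate_undetected(0.2, shared, 0.8)
        print(f"shared={shared:.1f}  exact={exact:.3f}  simulated={approx:.3f}")
```

    This sketch only covers the “no better” end of the argument. The “even worse” case would need a verifier that also rejects or rewrites correct answers, which this model leaves out.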


  • I think it’s tricky. It’s a bit like adding LLMs together as vectors, hoping the sum softens, or at least reveals, the shortcomings of individual models. Is it a good idea? I don’t know; I think there are good reasons to think it’s a waste of time and resources. I’d certainly need a better explanation of what use it would be before I spent more time building it. But I still think about what use it would be from time to time; I haven’t decided that it’s a bad idea yet.


  • One of the projects I started and never brought to a satisfactory end state was basically that, plus a judging round. Every model would respond to the same prompt, then every model would evaluate every other model’s response for accuracy and completeness. Then the results would get logged to a spreadsheet.

    It’s simple enough, but for N models it requires N + N(N-1) = N^2 model calls (every model answers once, then grades the other N-1 answers), so it takes forever to run any decent dataset on consumer hardware. If I had the resources and a way to run it that didn’t fry the planet, I think it would be a cool running set of comparative benchmarks. IDK if it’d be useful at all, but I’m still interested to see the data.
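    A minimal sketch of that answer-then-judge loop, assuming a hypothetical `ask(model, prompt)` callable that wraps whatever local runner you use (it is not ollama’s actual API) and a CSV file standing in for the spreadsheet. The call counter makes the cost visible: with self-grading excluded, N models need N + N(N-1) = N^2 calls; with the hypothetical `include_self=True` flag, N + N^2.

```python
import csv
import io
from typing import Callable, TextIO

# Hypothetical signature: (model_name, prompt) -> response text.
# Wire this to your own local runner; it is not a real library API.
Ask = Callable[[str, str], str]


def run_round(models: list[str], prompt: str, ask: Ask, include_self: bool = False):
    """Every model answers the prompt, then every model grades the answers."""
    calls = 0
    answers: dict[str, str] = {}
    for model in models:
        answers[model] = ask(model, prompt)
        calls += 1

    judgments: dict[tuple[str, str], str] = {}
    for judge in models:
        for author in models:
            if author == judge and not include_self:
                continue  # skip grading its own answer
            judge_prompt = (
                f"Question:\n{prompt}\n\nAnswer:\n{answers[author]}\n\n"
                "Rate this answer for accuracy and completeness."
            )
            judgments[(judge, author)] = ask(judge, judge_prompt)
            calls += 1
    return answers, judgments, calls


def write_judgments(judgments: dict[tuple[str, str], str], out: TextIO) -> None:
    """Log one row per (judge, author) pair, e.g. for opening in a spreadsheet."""
    writer = csv.writer(out)
    writer.writerow(["judge", "author", "verdict"])
    for (judge, author), verdict in sorted(judgments.items()):
        writer.writerow([judge, author, verdict])


if __name__ == "__main__":
    stub = lambda model, prompt: f"{model} saw {len(prompt)} chars"  # fake model
    _, judgments, calls = run_round(["a", "b", "c"], "What is RAG?", stub)
    buf = io.StringIO()
    write_judgments(judgments, buf)
    print(calls)  # 3 answers + 3 * 2 judgments
    print(buf.getvalue())
```

    With a stub model and three entries in `models`, one round is 9 calls (12 with `include_self=True`). Once each call is a real model on consumer hardware, that quadratic growth is what makes a decent-sized dataset slow.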




  • Yup, ollama, various models. I initially downloaded it because I, along with thousands of other people, wanted to see what would happen if I made models debate with each other after RAGging them with various books (The Prince, The Art of War, The complete works of Shakespeare, etc.).

    The results were uninteresting and I abandoned the project pretty quickly. I’ll sometimes use them for code analysis but they’re too slow on my rig to be really useful.


  • I have a similar struggle. I grew up in a family where taking psychiatric medication was a matter of deepest shame. Even long after I consciously rejected the idea, the underlying shame (and accompanying stress) was still very present and would sometimes be enough to convince me that I should be strong enough not to rely on crutches like medication just to get through the day. That impulse has never guided me wisely and it’s been a decades-long process of not letting it affect my behavior.

    Talk therapy helps. Addiction peer groups help, sometimes.



  • I’ve never had this particular model, but I’ve had pretty good success printing to Brother printers with the generic print drivers; I don’t think I’ve used the proprietary downloads in a while.

    Of note: I don’t have occasion to do scans all that often, so I can’t say if that works. Ditto the fax function; if that’s important, all I can say is you have my pity. But I’ve used the print function with good success on a couple of different machines.

    Still I’d recommend testing it before committing to permanent changes, if possible. Printers are mysterious and capricious.


  • I do too. I think general-purpose compute has become a too-cheap way to solve problems that have more durable (and repairable) mechanical solutions. It makes the sticker price lower even if the total cost over the lifetime of the vehicle (or laptop, or washing machine, etc.) ends up higher.

    I think it would be nice to have a law that certain hardware needs to have user-auditable and user-replaceable control software. If you want to ship your hardware product with some preinstalled software, the source code must be publicly available. I don’t know how it would get passed in America, because it would make consumer electronics more expensive to manufacture, but I think it would be helpful in the long term to legally decouple hardware from closed-source software.


  • It surprises me too, but I actually think that’s fine. Lots of people have different skills and not everyone has to be the tech expert, as long as everyone acknowledges that it is an expertise and that that expertise deserves weight. But most technical mistakes in an office setting don’t result in serious injury or death, whereas that happens a lot in mechanic shops. People who decide whether dangerous machines are safe enough would need a higher level of technical professionalism than your average desk jockey.

    I think that attitude is pretty common among people who make software that can get people killed (e.g. medical) but my experience is limited to a few secondhand conversations so I don’t know how well-established that culture is. I’ve only ever worked on software that, if it failed, meant that a few people would get very upset and a bunch of people would get mildly upset, then we’d fix it and everyone would move on pretty quickly.






  • There are good reasons for hiding a paper trail. Specifically in a self-hosting community, I understand operators wanting to hide their particular technical details from those who would wish to target them. This can be government agencies who like to arrest or kill dissidents, or freelance assholes who just like to attack queer infra where they can. I don’t think deleting posts is particularly effective, and the privacy concerns would be better addressed with a safe alt or a burner account, but I get why some people do it. Privacy is hard and when the stakes are high, people tend to over-secure rather than risk under-securing.


  • I have mixed feelings on post deletion. On the one hand, historical technical forum conversations are an incredibly valuable resource, and /c/selfhosted is a technical community. The value comes from having a history in context, and deleting part of that context damages the whole corpus and makes it less useful. It also allows incorrect or outdated information to fester when there isn’t a strong historical context that can be referenced.

    On the other hand, people are right to be concerned about leaving large tracts of text available on the open internet, where it can be scraped, profiled, and possibly de-anonymized. I am very sympathetic to those who delete out of concern for their own privacy, and I don’t know what a good solution is.

    Maybe a compromise would be (on user “delete”) to leave the contents of a post intact, but simply delete the username from the post, and the post from the user’s history? Deletion on the fediverse is a bit of a sham anyway, and it would leave valuable discussions intact for other users.
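    A minimal sketch of that compromise, using a hypothetical in-memory `Forum` model (not Lemmy’s actual schema or API; all names here are made up): “deleting” detaches the username from the post and removes the post from the author’s history, while the body stays in the thread.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Post:
    post_id: int
    body: str
    author: Optional[str]  # None once the author has "deleted" the post


@dataclass
class Forum:
    posts: dict[int, Post] = field(default_factory=dict)
    history: dict[str, list[int]] = field(default_factory=dict)  # user -> post ids

    def publish(self, author: str, post_id: int, body: str) -> None:
        self.posts[post_id] = Post(post_id, body, author)
        self.history.setdefault(author, []).append(post_id)

    def delete_as_anonymize(self, username: str, post_id: int) -> None:
        """Hypothetical 'delete': keep the content, sever the link to the user."""
        post = self.posts[post_id]
        if post.author != username:
            raise PermissionError("only the author can delete this post")
        post.author = None  # detach the username; the body stays in the thread
        self.history[username].remove(post_id)  # drop it from the user's profile
```

    It only acts on one server’s copy, which is part of why deletion across federated instances is leaky, and the preserved body text can still be scraped and profiled on its own.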


  • Yep. Actually ran the rollout for Cursor at my last job, right before I got laid off lmao. I trained a bunch of devs on what to do and what not to do. A bunch of my recent interviews have incorporated variations of the question “Do you think you could manage an LLM orchestration that would replace our junior devs?” and I could, but I don’t think I can muster the enthusiasm for it that people are looking for. Maybe that’s why I haven’t made it through the interview gauntlet. So much senior hiring right now seems to be looking for people to be the scapegoat for LLM bullshit, and I ain’t looking for that kinda work.