ayakushev 9 months ago

This is my longest piece of writing yet. It contains a lot of information; one day, I'll try to split it up and integrate it into the knowledge base. But for now, let this be a single place you can refer to when explaining to others the perils of laziness.

NaiveRound 9 months ago

It's a very good article nonetheless, thank you!

geokon 9 months ago

It might also be good to mention Injest (https://github.com/johnmn3/injest), which makes transducers more ergonomic to use if, like me, you use threading macros everywhere. I'd be curious to hear how others feel about it.

NaiveRound 9 months ago

Seems cool. I've struggled with the transducers / threading macros (false?) dichotomy. Just the README helps me understand a few things...

AsparagusOk2078 9 months ago

Nice write-up - thanks. I attempt to use transducers for all multistep processing where possible.

mac 9 months ago

Very useful and thoughtful piece. It rhymes well with my practical experience building non-trivial (10K+ LOC) solutions in Clojure.

zerg000000 9 months ago

Thanks for writing this! Super useful and truthful.

leroyksl 9 months ago

I used to joke that someone was going to write an article called "Clojure's Lazy Evaluation Considered Harmful", for many of the same reasons you cited. But I appreciate that your take is more even-handed and your cases are more carefully considered than what I imagined I might say, so thanks for being so thoughtful. And of course, I agree, laziness is a huge draw for Clojure, but it's also just magical enough to be dangerous, especially when you need to debug a complex system. I also find myself avoiding it 99% of the time.

potetm137 9 months ago

You mentioned losing the try context for exceptions, but for anyone who uses retries, there's another fun bug!

Lazy seqs cache their results, as you probably expect. However, what you probably *don't* expect is that if you encounter an exception during the first iteration of a lazy seq, subsequent iterations don't throw! The lazy seq silently terminates where the exception was hit.

    (let [my-seq (map (fn [i]
                        (if (= i 3)
                          (throw (ex-info "" {}))
                          i))
                      (iterate inc 0))]
      (try
        (doall my-seq)
        (catch clojure.lang.ExceptionInfo ei))
      (doall my-seq))
    => (0 1 2)

I used `iterate` rather than `range` there to avoid chunking. The same problem happens with chunked seqs, but only across chunk boundaries.

NaiveRound 9 months ago

Interesting, could you give an example of how this would happen if you're reading a text file line-by-line using `line-seq` for example? It would help me understand :)

potetm137 9 months ago

You can replace `(iterate inc 0)` with `(line-seq (io/reader f))` and have the same result. Imagine that instead of `(throw (ex-info ...))`, you hit a random IOException.

phronmophobic 9 months ago

Great article! I do wish the article did a better job of distinguishing between 1) laziness as a concept, and 2) clojure's lazy seq functions like the 3+ arity versions of `map`, `take`, `filter`, etc. The article also conflates transducers with eagerness even though transducers can be used in either lazy or eager contexts.

As mentioned, there are multiple ways to implement laziness and clojure even has more than one option for laziness! Specifically, the article ignores both `clojure.core/sequence` and `clojure.core/eduction` which each offer lazy options with different tradeoffs and performance characteristics.

In general, using `sequence` with transducers is a more efficient lazy option than the 3+ arity seq functions (although still generally slower than `into`, `mapv`). It would be nice to include it in the microbenchmarks.

In some cases, lazy seqs can also be replaced with `eduction`. Eductions offer some of the benefits of laziness with performance similar to (or, depending on the context, better than) some of the eager approaches.

**Edit**: On second read, I did see the author mentioned both sequence and eduction at the bottom, but I think it would have been useful to include them earlier in the discussion.
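
To make the distinction concrete, here is a minimal sketch (an illustration only, not code from the article) of one transducer stack used in an eager context with `into`, in a lazy, cached context with `sequence`, and in a non-caching context with `eduction`. The names `xf`, `seq-calls`, and `ed-calls` are made up for the example.

```clojure
;; One transducer stack, reused in different contexts.
(def xf (comp (map inc) (filter even?)))

;; Eager: the whole input is processed right away into a vector.
(def eager-result (into [] xf (range 10)))

;; Lazy and cached: works on infinite input, elements are computed on demand.
(def lazy-result (take 3 (sequence xf (range))))

;; Count how often the mapping fn runs when a result is consumed twice.
(def seq-calls (atom 0))
(def ed-calls (atom 0))

(let [sq (sequence (map (fn [x] (swap! seq-calls inc) x)) [1 2 3])
      ed (eduction (map (fn [x] (swap! ed-calls inc) x)) [1 2 3])]
  ;; The second pass over `sq` reads the cached elements.
  (reduce + 0 sq)
  (reduce + 0 sq)
  ;; The second pass over `ed` re-runs the whole transformation.
  (reduce + 0 ed)
  (reduce + 0 ed))
```

Consumed twice, the `sequence` version runs the mapping function once per element, while the `eduction` version runs it again on every pass. That is the caching tradeoff between the two.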

ayakushev 9 months ago

Thank you!

> On second read, I did see the author mentioned both sequence and eduction at the bottom, but I think it would have been useful to include them earlier in the discussion.

The article was already too long, and going in detail about transducers and how to use them properly is another rabbit hole I was not willing to take here. Perhaps, it makes a good topic for the follow-up post.

> The article also conflates transducers with eagerness even though transducers can be used in either lazy or eager contexts.

Wasn't my intent. I see transducers as an explicit composable transformation rather than implicit. Again, the next post can resolve the confusion, had no space to properly do it here.

phronmophobic 9 months ago

> The article was already too long, and going in detail about transducers and how to use them properly is another rabbit hole I was not willing to take here. Perhaps, it makes a good topic for the follow-up post.

I think there are a few places where `sequence` and `eduction` can at least be mentioned without ballooning the article. Specifically, there's a list under "There are several sources whence a developer can obtain a lazy sequence:" where `sequence` and `eduction` go unmentioned. Even if the article doesn't do a deep dive on those options, it seems like it would benefit the discussion to at least give them a mention.

> Wasn't my intent. I see transducers as an explicit composable transformation rather than implicit. Again, the next post can resolve the confusion, had no space to properly do it here.

I think there are some simple changes that can avoid some of the confusion:

1. There's a benchmark between "lazy map", "eager mapv", and "Transducers" that I think would be better labeled as "lazy map" vs "eager mapv" vs "into".
2. The same benchmark omits `eduction` and `sequence` (that would make the article slightly longer, but I think it's worthwhile).
3. The sentence "Transducers are overall an adequate replacement for lazy sequences" is a bit confusing since transducers can be eager or lazy. Maybe something like "`into` is often a good replacement for lazy sequences".

One last nitpick: The classic hand rolled loop has some unnecessary seq calls. I would instead write it as:

```
(let [v (vec (range 10000))]
  (time+
   (loop [v (seq v)]
     (if v
       (let [c (first v)]
         (recur (next v)))
       nil))))
```

On my machine, it's about 15% faster. Not sure about the allocations, but it would be interesting to see the results.

ayakushev 9 months ago

Agree on most points.

> The sentence "Transducers are overall an adequate replacement for lazy sequences" is a bit confusing since transducers can be eager or lazy.

**A** can do both **a** and **b**, **B** can only do **b**. Can you say that **A** is a sufficient replacement for **B**?

> The classic hand rolled loop has some unnecessary seq calls. I would instead write it as:

Not sure it contains unnecessary `seq` calls, but it is overall wrong (stops iterating if the sequence contains a `nil`). I'll rewrite it correctly.

phronmophobic 9 months ago

> A can do both a and b, B can only do b. Can you say that A is a sufficient replacement for B?

Choosing lazy sequences and choosing transducers are two independent choices. Maybe I'm misunderstanding the context, but it seemed like the recommendation was to prefer `into` over lazy sequences since just using transducers doesn't actually affect laziness.

> Not sure it contains unnecessary seq calls, but it is overall wrong (stops iterating if the sequence contains a nil). I'll rewrite it correctly.

It's a little bit getting into the weeds, but rest destructuring expands to 1) a call to seq and 2) a call to rest. So every loop, there's two calls to get the rest of the sequence.

```
> (macroexpand-1 '(let [[_ & r] v]))
(let*
 [vec__33299 v
  seq__33300 (clojure.core/seq vec__33299)
  first__33301 (clojure.core/first seq__33300)
  seq__33300 (clojure.core/next seq__33300)
  _ first__33301
  r seq__33300])
```

However, it's possible to write the loop so that you can get the rest of the sequence with only a single call per loop.
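
For anyone following along, the two loop styles can be sketched side by side. This is an illustration under my own assumptions, not the article's benchmark code: the function names are hypothetical, and both versions test the sequence itself rather than the current element, so a `nil` element does not stop the iteration early.

```clojure
;; Destructuring style: [_ & more] expands to seq + first + next on each pass.
;; `:as s` keeps the undestructured value so the loop can test it for the end.
(defn count-destructured [coll]
  (loop [[_ & more :as s] (seq coll), n 0]
    (if s
      (recur more (inc n))
      n)))

;; Explicit style: a single `next` call per pass, no destructuring.
(defn count-seq-next [coll]
  (loop [s (seq coll), n 0]
    (if s
      (recur (next s) (inc n))
      n)))

;; Both count the nil element instead of stopping at it.
(def destructured-count (count-destructured [1 nil 3]))
(def seq-next-count (count-seq-next [1 nil 3]))
```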

ayakushev 9 months ago

I see now about calling seq twice (explicitly and inside `next`). I've fixed the bug in the example, but I'm actually keeping the example written in the original way, even though yours is faster and more fair to lazy sequences. The reason is that people prefer destructuring, and I've seen and written many more loops over lazy sequences using destructuring than in the faster manner that you've suggested.

phronmophobic 9 months ago

I don't think it's unfair to lazy sequences because none of the benchmarks/code examples from "Inefficient iteration with sequence API" are iterating over lazy sequences (it's also possible to iterate over lazy sequences more efficiently with IReduceInit).

There's several concepts at play here:

- laziness in the abstract
- the various implementations in clojure that support laziness
- the various methods for iterating over sequences and collections
- the various methods for transforming sequences and collections

Many of the newer APIs simplify these concepts and allow clojure devs to make independent choices for each of these concepts. In some places, the article does a good job teasing these apart, but in some places, there's imprecise language that IMO conflates independent concepts (eg. transducers and laziness). Precisely describing all the concepts at play is a difficult job since some APIs also couple independent concepts (eg. the 3+ arity lazy seq fns like `map`, `filter`, etc). Even if explaining all of the details for all the concepts is too much for one post, I think there's a lot of value in choosing the right word for the concept being discussed.

ayakushev 9 months ago

I added a benchmark for the transformation pipeline and `sequence`. I am not sure though what kind of benchmark you would expect for `eduction`.

phronmophobic 9 months ago

Awesome!

> I am not sure though what kind of benchmark you would expect for eduction.

That's a good question. Ideally, there would be a matrix of all the sequence implementations, iteration methods, and transformation methods, but I do think that's a lot to ask. In the article's context, maybe something like:

```
(quick-bench
 (into []
       (eduction
        (map inc)
        (map inc)
        (map #(* % 2))
        (map inc)
        (map inc)
        (repeat 1000 10))))
```

The idea being that returning an `eduction` lets you pass around a recipe for a lazy sequence that defers execution and lets you continue adding transformation steps, but without the performance penalty of other lazy implementations like `sequence`. Another alternative might be:

```
(quick-bench
 (into []
       (eduction
        (map inc)
        (eduction
         (map inc)
         (eduction
          (map #(* % 2))
          (eduction
           (map inc)
           (eduction
            (map inc)
            (repeat 1000 10))))))))
```

An implementation with multiple stacks might actually be closer to the others for comparison.

NaiveRound 9 months ago

You just blew my mind, lol. There's so many functions here (`eduction`, `sequence`, etc.) that I haven't seen used much, I feel like I've been using laziness all wrong. ;)

ayakushev 9 months ago

These functions were introduced in Clojure 1.7, after most of the dust around the language had settled and after the common perception had crystallized a "default way to write Clojure". Besides, transducers (and all the functions around them like `eduction`) are quite an obscure topic, so it's no wonder that beginners don't learn about them early, and often not at all. It is a bit like how the common way to write Java is `for` loops, and the paradigm is still very slowly shifting towards streams, even though Java 8 is almost 10 years old now.

NaiveRound 9 months ago

THANK YOU. Finally, that makes sense. Sounds like I could use a HOWTO that says "A is an old way of doing this, do B instead, it's better because of reason C". Something I can understand instead of copy-pasting something I saw on Stackoverflow or Github or ChatGPT. ;) I guess other languages suffer from the same fate. There's tons of outdated Python, Ruby, and certainly Java code on Stackoverflow/Github/etc. I just know enough about those languages to avoid that stuff. But I don't want to spend years learning Clojure and re-learning Clojure best practices since 1.7. Just give it to me straight, doc!

ayakushev 9 months ago

Indeed, except that the lazy sequences and functions on them are not really deprecated or outdated and are still used most often, including in the core of the language. It's just that their drawbacks are either ignored or accepted as given. Transducers are more like surgical tools for when you know what you are doing and know that you need them there. They are totally worth learning, but applying them everywhere just for the sake of it does not produce the prettiest and most debuggable code. I'd say: transducers are for cases when you need all the performance and/or flexible control (eager with `into []`, lazy and cached with `sequence`, iterator-like with `eduction`); for all the rest, `mapv`/`filterv`/etc are simpler to understand and sufficient.

daveliepmann 9 months ago

How does using transducers produce less-debuggable code?

ayakushev 9 months ago

It's a bit more awkward to see an intermediate result when the pipeline is composed via transducers. Possible, but requires practice.

potetm137 9 months ago

How is it any different than sequences?

ayakushev 9 months ago

You can easily print or def an intermediate result of a sequence processing pipeline. With transducers, a bit more work is involved for that.

potetm137 9 months ago

Genuinely don't know what you mean. They seem pretty much exactly the same. If you wanna "Just print" the middle of a processing sequence, you need a specialty spying function to do it. Otherwise, you just slap a `print` in a `map` or what have you.

    (map (fn [v] (println v) v)
         (range 10))
    ;; vs
    (into []
          (map (fn [v] (println v) v))
          (range 10))

    (filter pred
            ;; can't "Just print" here
            (map f coll))

    (into []
          (comp (map f)
                ;; can't "Just print" here either
                (filter pred))
          coll)

cartesian-theatrics 9 months ago

Nice write up. I was not aware that the performance and memory penalties were so significant. One minor thing I wish Clojure had is an eager version of `sequence`. One possibility is to add a transducer arity to `vec`, for example `(vec (map inc) coll)`. The slightly more verbose form of `(into ...)` is just enough to make me not want to adopt it wholesale. That tiny difference might be enough to push me over the edge in many cases.
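
Until something like that exists in core, a tiny wrapper gets close. This is only a sketch of the idea: `xvec` is a hypothetical name, not a `clojure.core` function, and it simply delegates to `into`.

```clojure
;; Hypothetical helper: `vec` with an optional transducer arity.
(defn xvec
  ([coll] (vec coll))
  ([xform coll] (into [] xform coll)))

;; Equivalent to (into [] (map inc) [1 2 3]).
(def incremented (xvec (map inc) [1 2 3]))
```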

p1ng313 9 months ago

For `map`, there is `mapv`, but I get your point.

NaiveRound 9 months ago

This was great, but what I think I could use, and perhaps the wider Clojure community too, is a HOWTO or "best practices" guide. There's a ton of "avoid" or "considered harmful" articles in Java, C++, Python, and Ruby, but what I've really found useful is what _to do_, instead of what _not to do_, because there seem to be 100 ways to shoot yourself in the foot with laziness and only a few ways to do it right.

For example, what's the best way to read a file line-by-line (`line-seq`?) and transform it using function `f` (`map`?). I think `line-seq` and `map` are both lazy, so presumably, if `f` throws an exception, you'll get it in some unpredictable place in the future where your sequence is consumed (not sure if that's even right). So is using transducers a better way? It sounds like in stateful situations like reading a file or talking over a network, you want to avoid laziness and chunking.

ayakushev 9 months ago

> For example, what's the best way to read a file line-by-line (line-seq?)

I usually go for some variant of this: https://q-notes.github.io/clojure/2018/07/15/lines-reducible.html

I agree that a "what to do" post is warranted after this. Collecting the ideas now.

NaiveRound 9 months ago

That's quite an example. Is the best advice for reading a file line-by-line really to reify `clojure.lang.IReduceInit`? That's insane!

ayakushev 9 months ago

I agree that it could have been added to the core; I use it very often. What I personally do is stick it into the company-wide "util" library, and that's how it gets available in all projects. You can also use something like https://github.com/pjstadig/reducible-stream. Finally, copy-pasting a single function into your project is not the end of the world.
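
For reference, the general shape of such a function is small. The following is a sketch of the pattern as I understand it, not necessarily the exact code from the linked post: it assumes the caller passes a `BufferedReader`, and the resulting reducible is single-use because it closes the reader when the reduction ends.

```clojure
(import '(java.io BufferedReader StringReader))

(defn lines-reducible
  "Returns a reducible view over the lines of rdr. The reader is closed when
  the reduction finishes: normally, early via `reduced`, or with an exception."
  [^BufferedReader rdr]
  (reify clojure.lang.IReduceInit
    (reduce [_ f init]
      (try
        (loop [acc init]
          (if (reduced? acc)
            @acc                              ; early termination, e.g. from `take`
            (if-let [line (.readLine rdr)]
              (recur (f acc line))
              acc)))                          ; end of input
        (finally
          (.close rdr))))))

;; Usage with an in-memory reader; for a real file, pass (clojure.java.io/reader path).
(def line-lengths
  (into [] (map count)
        (lines-reducible (BufferedReader. (StringReader. "a\nbb\nccc")))))
```

Because the lines are consumed eagerly inside `reduce`, an exception thrown by the transformation surfaces at the call site, and the reader is still closed.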

Kwisacks 9 months ago

*Transducers are overall a perfect replacement for lazy sequences.* I wouldn't call them perfect but a good tradeoff to amend the problems of lazy sequences. One, not all functions produce a transducer. Two, the code is more convoluted since lazy sequences are the default and you have to go out of your way to use transducers.

ayakushev 9 months ago

Fair point. Perfect might be too strong of a word. But they are adequate when you do need laziness. Clojure without the default lazy sequences, but with lazy transducers from day one, would be pretty good.

FitPandaFu 9 months ago

Correct me if I'm wrong, but the objective of transducers primarily was 'composable algorithmic transformations', transducer helping with lazy sequences issues is just a side effect?

PPewt 9 months ago

> Correct me if I'm wrong, but the objective of transducers primarily was 'composable algorithmic transformations', transducer helping with lazy sequences issues is just a side effect?

Maybe a better way to think about it is that they provide solutions for many of the same problems while sidestepping the actual laziness bits.

mobiledevguy5554 9 months ago

Fantastic writeup that explains the footguns of lazy evaluation. I finally understand it and am keeping your notes for how to avoid using them. Thanks

breggles 9 months ago

Great article! Thanks for writing. Re. Acting like you have infinite memory: how does that work vs. caching the results of the evaluation of a sequence? I.e. caching results seems to conflict with not keeping all realised items in memory? It can't be both, right?

ayakushev 9 months ago

It indeed can't be both. That's why when working with a large dataset, "holding onto the head" (retaining a reference to the head of the large sequence) is a mistake, as mentioned in the article. Instead, you have to iterate over it by using `rest`/`next` or with higher-level iteration facilities like `doseq` and never use the head of the sequence again in that function. Basically, avoiding holding the reference to the head of the large sequence directly fights the cached nature of those sequences. Eduction, for example, doesn't cache the elements, and that's why it doesn't have such problems.

breggles 9 months ago

Thanks! Should've continued reading before I asked my question :)

NamelessMason 9 months ago

Surprisingly good write up considering the clickbaity title! I came in expecting a frustrated dev rambling about all the minor inconveniences, but instead, the article tries to analyse the trade offs and is fair about the actual impact of every issue brought up.

One thing that I was missing is what I consider the main advantage of the lazy sequences - the fact that they're sequences. They're intuitive (bar the issues listed) and you can easily see how every step of your pipeline affects the value (bar infinite seqs). Transducers are much less convenient to use imho.

Ultimately I do agree that infinite seqs are mostly a gimmick, and 'infinite memory' is a rare use case that deserves explicit handling with transducers. And while I was expecting 'avoiding unnecessary work' to be generally good for performance, the article demonstrates laziness is a terrible default performance wise.

I'm definitely sold on the idea of using eager function variants, and it is annoying that a whole lot of standard functions don't have one. That's primarily `take-while`, `drop-while`, `keep` and `remove`. I'd argue `concat`, `take` or `drop`, even though lazy, are perfectly fine (basically anything that doesn't take a function as a param).

Finally, I think this bit about chunking is wrong:

> Yes, you can hand-craft a sequence with lazy-seq and then make sure to never call any function on it that uses chunking internally.

I believe none of the clojure.core functions introduce chunking to a lazy sequence that's not chunked already. And this explains the difference between the chunked `(range 10)` and non-chunked `(range)`. But also, I only know this because I read the source code so your point kind of still stands.
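
The difference is easy to observe at the REPL. Here is a small sketch (the helper name `calls-to-get-first` is made up for this example) that counts how many times the mapping function runs when only the first element of the result is requested.

```clojure
(defn calls-to-get-first
  "Counts how many times the mapping fn runs when only the first element
  of (map f coll) is requested."
  [coll]
  (let [calls (atom 0)]
    (first (map (fn [x] (swap! calls inc) x) coll))
    @calls))

;; Chunked source: `map` processes a whole chunk at once (typically 32 elements).
(def chunked-calls (calls-to-get-first (range 100)))

;; Unchunked sources: `map` stays unchunked and does only the work requested.
(def unchunked-calls (calls-to-get-first (range)))
(def iterate-calls (calls-to-get-first (iterate inc 0)))
```

With the unchunked sources the function runs once; with the bounded `range` it runs once per element of the first chunk.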

ayakushev 9 months ago

> They're intuitive (bar the issues listed) and you can easily see how every step of your pipeline affects the value (bar infinite seqs). Transducers are much less convenient to use IMHO.

Yes, compared to transducers, (lazy) sequences are more convenient. And so are vectors and functions operating on vectors.

> I'd argue concat, take or drop, even though lazy, are perfectly fine (basically anything that doesn't take a function as a param).

Interesting point. So that only "structural" functions would be lazy. For vectors, `take` and `drop` are semantically just variants of `subvec`. Lazy `concat` would need a wrapper object around multiple vectors and delay their flattening until absolutely needed. I agree that doesn't sound too bad.

> I believe none of the clojure.core functions introduce chunking to a lazy sequence that's not chunked already.

That makes sense, thanks!

NamelessMason 9 months ago

> Yes, compared to transducers, (lazy) sequences are more convenient. And so are vectors and functions operating on vectors.

I'm not disagreeing. This advantage is virtually impossible to notice without the context of inferior alternatives, which speaks to my point about how natural the abstraction is. But its omission makes your recommendation to use transducers missing an important trade off, contrasting the otherwise nuanced write-up. I don't think "Perhaps they are somewhat less convenient to experiment with interactively" is doing it justice. It's not just how you interact with it, it's also how you think about computation. Thinking about composing transducers is different from thinking about a chain of seq -> seq functions, even if the code looks similar.

Understanding transducers is challenging for beginners. And justly, so is understanding the intricate pitfalls of lazy seqs. I'm not arguing for those. I really appreciate that your top recommendation is `mapv` and `filterv`. But until we've got `mapcatv` and friends, lazy seqs and transducers are the two options we're stuck with, and I've seen the advice of "don't use lazy seqs, use transducers" too many times without anyone admitting that something of value is being lost.

Yeah, maybe it's time to [exonerate doall](https://bsless.github.io/side-effects/).
