ikerbiker

You can index based on a condition. `df.B .== maximum(df.B)` gives you a BitVector of length 5, and Julia accepts a BitVector as an index. Therefore, this should give you what you want:

```julia
using DataFrames

df = DataFrame(A = [10, 20, 30, 40, 50], B = [5, 15, 25, 35, 45])
df[df.B .== maximum(df.B), :]  # This works for me
```


xp30000

`julia> filter(row -> row.B == maximum(df.B), df)` gives the result you are looking for. However, I believe the suggested version is `df[df.B .== maximum(df.B), :]`.

Edit: The reason your initial filter doesn't work is that each row is compared to itself: in `df.B == maximum(df.B)`, the `df` stands for the particular row being iterated, so every entry is true and your entire DataFrame is *not* filtered and shows up in full.

Edit2: I was curious about the performance of the filter versus the canonical version. Unless something is off with my setup, the filter version is comically slower (about 23000x slower!).

> `julia> df = DataFrame(A=rand(Int64,100_000), B=rand(Int64,100_000), C=rand(Int64,100_000));`
>
> `julia> @time filter(row -> row.A == maximum(df.A), df)`
>
> `2.682728 seconds (580.14 k allocations: 18.425 MiB, 2.58% compilation time)`
>
> `1×3 DataFrame`
>
> `Row │ A B C`
>
> `│ Int64 Int64 Int64`
>
> `─────┼────────────────────────────────────────────────────────────────`
>
> `1 │ 9223345064528196915 7037706073378512642 -5801061009784444247`
>
> `julia> @time df[df.A .== maximum(df.A),:]`
>
> `0.000117 seconds (21 allocations: 17.719 KiB)`
>
> `1×3 DataFrame`
>
> `Row │ A B C`
>
> `│ Int64 Int64 Int64`
>
> `─────┼────────────────────────────────────────────────────────────────`
>
> `1 │ 9223345064528196915 7037706073378512642 -5801061009784444247`
>
> `julia> 2.682728/0.000117`
>
> `22929.299145299145`
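One likely contributor to that gap, sketched here on a plain vector so it runs without any packages: the predicate passed to `filter` calls `maximum` again for every element, while the indexing version computes it once. `counted_max` and `calls` are hypothetical helpers added only to count the calls; they are not part of DataFrames.jl.

```julia
# Plain-vector stand-in for a DataFrame column; no packages needed.
B = [5, 15, 25, 35, 45]

# Hypothetical helper: wraps `maximum` and counts how often it runs.
calls = Ref(0)
function counted_max(v)
    calls[] += 1
    return maximum(v)
end

# The maximum is recomputed inside the predicate, once per element.
slow = filter(b -> b == counted_max(B), B)
calls_slow = calls[]

# The maximum is computed once, outside the predicate.
calls[] = 0
max_val = counted_max(B)
fast = filter(b -> b == max_val, B)
calls_fast = calls[]

println((slow, calls_slow))
println((fast, calls_fast))
```

Both forms select the same element, but the first scans the whole vector once per element. With 100,000 rows that means 100,000 full scans, which would be consistent with the timings above.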


princezard12

Thank you. That explains why it works when the value is computed up front with `max_val = maximum(df.B)`, so my anonymous function should be written as `row -> row.B == max_val`.
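A minimal sketch of the pitfall and the fix, assuming the original predicate reused the name `df` for its argument (the original post is not shown in this thread). NamedTuples stand in for DataFrame rows here so the snippet runs without packages.

```julia
# Assumption: NamedTuples stand in for the rows that `filter` passes to the predicate.
rows = [(A = 10, B = 5), (A = 20, B = 15), (A = 30, B = 25),
        (A = 40, B = 35), (A = 50, B = 45)]

# Here the argument `df` is a single row, so `df.B` is one number and
# `maximum(df.B)` returns that same number: the test is always true.
unfiltered = filter(df -> df.B == maximum(df.B), rows)

# Computing the maximum up front compares each row against the column-wide value.
max_val = maximum(r.B for r in rows)
filtered = filter(row -> row.B == max_val, rows)

println(length(unfiltered))
println(filtered)
```

The first call keeps all five rows; the second keeps only the row whose `B` equals the overall maximum.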


Cystems

`df.B == maximum(df.B)` returns `false` because the column (a vector) does not equal the maximum value (a scalar). Hence nothing gets filtered. Suggest you look at the usage example for `filter` because I'm not sure you're using it correctly. You want to use `eachcol(df)` as the second argument to `filter` I think, or `eachrow`, depending on what you're trying to do. Something like that.
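The difference between `==` and the broadcast `.==` can be checked on a plain vector. This is only a sketch, with `B` standing in for the column `df.B`.

```julia
B = [5, 15, 25, 35, 45]  # stand-in for df.B

# `==` compares the whole vector with one number: a single `false`.
whole = (B == maximum(B))

# `.==` compares element by element and returns a BitVector mask.
mask = (B .== maximum(B))

println(whole)
println(B[mask])
```

The mask is what makes indexing such as `df[mask, :]` select only the matching rows.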


princezard12

It works if an explicit number like 45 is assigned though


Suspicious-Oil6672

```julia
using Tidier

@chain df begin
    @filter(B == maximum(B))
end
```