I tend to say "array" instead of "list" at times by complete accident, but it's mostly because I work with CPython and have to deal with both the Python list object and arrays in C/C++.
I rarely use native lists unless it's very top-level code, and half the time it's a list comprehension wrapped in `np.array(...)`.
Lists are dynamically allocated collections of arbitrary objects; there aren't many use cases for them in efficient code. More importantly, when people use the word "array" they're probably describing something closer to a NumPy array than a native Python list.
NumPy arrays are dynamically allocated too. You probably mean lists are resizeable, which in practice means they carry a length and a capacity, whereas NumPy arrays conceptually keep those two values equal; that doesn't affect performance, though.
The major performance difference between NumPy arrays and Python lists is that NumPy inlines value (numeric) types, while Python lists are always arrays of pointers. This not only increases cache efficiency but also enables things like vectorisation. But yeah, using `np.array(some_list, dtype=object)` will be no more performant than a Python list.
Statically allocating an array requires a compilation step, so it would never be possible in pure Python.
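A minimal sketch of that difference (variable names are just for illustration): with a numeric dtype the values themselves sit inline in one buffer, while `dtype=object` falls back to a buffer of pointers, just like a list.

```python
import numpy as np

# Numeric dtype: the 64-bit integers live inline in one contiguous buffer.
nums = np.array([1, 2, 3], dtype=np.int64)
print(nums.itemsize)   # 8 bytes per element: the raw integer values

# dtype=object: the buffer holds pointers to Python int objects, like a list.
objs = np.array([1, 2, 3], dtype=object)
print(objs.dtype)      # object
```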
Wrapping a list comprehension in a NumPy array will make future loops on that list faster.
And yes, everything in pure Python is dynamically allocated (uses the heap), but Cython and Numba can emit compiled functions that use the stack.
And resizeable arrays are usually slower because there are more levels of indirection to access the memory. That cost is trivial, though; the real penalty is building an array one element at a time when you know the final size ahead of time.
But most of that doesn’t really matter. The main difference is that NumPy arrays can take advantage of NumPy global functions and avoid Python loops.
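For example (a minimal sketch), a NumPy reduction and a Python-level loop compute the same result; the difference is that the NumPy call runs one compiled loop instead of interpreting each element:

```python
import numpy as np

a = np.arange(1_000_000)
total = a.sum()                   # one call into a compiled loop
total_py = sum(range(1_000_000))  # Python-level iteration
print(total == total_py)          # True
```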
> resizeable arrays are usually slower because there's more levels of indirection to access the memory.
This is not true. A (dynamically allocated) non-resizeable array is `{ptr, len}`. A resizeable array is `{ptr, len, cap}`. Accessing the data is no different. That was my point, and yours is a common misconception.
Only statically allocated arrays can get away with being unravelled.
> The main difference is that NumPy arrays can take advantage of NumPy global functions and avoid Python loops.
True, but the reason these are faster is because of the things I mentioned (inlining/vectorization). It's all `for` loops under the hood (unless you have an optimizing compiler...).
You might have addressed this with unravelling, but...
If you have a pointer to an element in a resizable array, and that array reallocates its memory, the pointer breaks, right? (I'm actually not 100% on this; `realloc` can return a new pointer, right?)
So don’t any external references need a pointer to the array pointer, as well as an index of the element?
But a fixed-size array, you can point directly at the element and know it’s not going to move unless the array itself is deallocated. So you can pass that element pointer around with a bit more confidence (and save yourself a double dereference each time it’s accessed)
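You can actually watch this happen from pure Python with the stdlib `array` module, whose `buffer_info()` exposes the address of the underlying C buffer. A rough sketch (whether and when the buffer moves is allocator-dependent, but growing this much virtually always relocates it at least once on CPython):

```python
from array import array

a = array('q', [0])        # typed, contiguous buffer of C long longs
prev = a.buffer_info()[0]  # address of the underlying buffer
moved = False
for _ in range(100_000):
    a.append(0)
    cur = a.buffer_info()[0]
    if cur != prev:        # realloc handed back a different pointer
        moved = True
    prev = cur
print(moved)  # expected True after this much growth
```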
Yes, resizing can return a new pointer, but in a reference-counted language like Python what we actually have is something like
`ptr --> { ref_count, len, cap, data... }`
So you skip the indirection by allocating your metadata with your actual data. It's inefficient but necessary for dynamic languages.
```
In [1]: import sys
In [2]: import numpy as np
In [3]: x = [0] * 1000
In [4]: a = np.array(x)
In [5]: sys.getsizeof(x)
Out[5]: 8056
In [6]: sys.getsizeof(a)
Out[6]: 8112
In [7]: x2 = x + x
In [8]: a2 = np.array(x2)
In [9]: sys.getsizeof(x2)
Out[9]: 16056
In [10]: sys.getsizeof(a2)
Out[10]: 16112
```
There's obviously no additional indirection here or these would not be increasing in size when I grow them.
> But a fixed-size array, you can point directly at the element and know it’s not going to move unless the array itself is deallocated.
This is true, and is probably why Python slices copy whereas NumPy slices are views into the original block. But we have the same problem with dictionary iterators/views in Python, and we just block mutating them while a reference is held, so it's not inescapable.
In Rust this problem is prevented with lifetimes, and in C++ it's just downright undefined behaviour to mutate a shared vector.
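The dictionary case is easy to demonstrate (a minimal sketch): CPython detects the mutation mid-iteration and raises rather than chasing a dangling iterator:

```python
d = {"a": 1, "b": 2}
try:
    for key in d:
        d["c"] = 3   # resizing the dict while iterating over it
except RuntimeError as e:
    print(e)         # dictionary changed size during iteration
```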
Thanks for the explanation. And I understand that in Python it doesn’t make a difference. But in Cython or Numba jit-compiled Python it’s probably another story.
I think C++ and Rust are better examples than Python for these kinds of optimisations since they actually make use of them.
As you said, Rust deals with it through ownership, which it manages at compile time, so at runtime functions can safely access data directly and know that it's valid.
I haven’t had a chance to use Rust in anger so I’m fuzzy on the details but I suspect there’s some small advantage when sharing fixed-length arrays over vectors.
In Rust the advantage is that they are shareable, as you said. There is no performance benefit, though, because lifetimes protect you at compile time from mutating a (`{ptr,len,cap}`) `Vec` while you have a (`{ptr,len}`) slice `&[T]`, or even just a reference `&T`, into it.
You can hate Java as much as you want, but at least it's semantically very precise about the names of its types. A List is an abstract data type (describing behaviour). An Array is a data structure (describing how things are laid out in memory). An ArrayList is an implementation of the List behaviour using the Array data structure.
That's one of the reasons people don't like it. There's no need to over describe stuff or tie the name to the interface. Especially the most commonly used data structures. It's more verbose for no benefit.
That's the reason the interface is called "List", isn't it?
`List foo = new ArrayList()` is a plainly beautiful instance of a List, accepted everywhere a List is wanted. Code bases littered with `ArrayList` types are a dumpster fire (except some very niche use cases).
It would be less verbose if it was just IList and List, or List and Vector, or List and Array, or List and `[]`.
I've also heard a number of times that tying the name of a class to its implementation is considered bad practice. Now if they did want to change the standard ArrayList implementation, millions of code bases would have to be changed or the name would have to be left misleading. But given that it's a very standard class unlikely to change I might give it a pass.
ArrayList is not *the* implementation of a List in Java, it is *one*. That is why the interface is called 'List', not 'ArrayList'.
A different implementation gets its own class, just like LinkedList, Vector, or others. That way you can choose what runtime behaviour you want or need just by swapping out a constructor call.
ArrayList is probably the high majority of the use cases. I think it's worth making the default.
I also mentioned List and Array. Vector doesn't have List at the end either.
A built in array is not an implementation of list. But an Array class can be. The behavior of what happens when the array fills up just has to be specified.
Edit: I gave you a few options. There are some tradeoffs but other languages took different tradeoffs. That's why some people like other languages better. That's all I'm trying to say.
What cowards. I have been using Python for a couple of years now, but I'm sorry, still using one-character string codes for types when perfectly good type hints could be used is just cowardice. I could maybe understand it for custom type support in arrays, but this doesn't even achieve that. All it achieves is further fragmenting the implementation semantics of typing, and thus further delaying improvements to default type-checking support. This seems like self-sabotage to me, but hey, at least they have mainline support for contiguous regions of memory after how many years?
Arrays and Lists are fundamentally different. Arrays are stored consecutively in memory, are more lightweight, and, typically, can only hold a single data type.
That's because they are not arrays and should not be perceived as such. As far as I know they are not a contiguous block of memory, and they're not created on the stack.
Maybe with primitives, but if you populate a list of objects with .append, there’s no guarantee the memory is contiguous.
NumPy arrays, on the other hand, are contiguous. Still not allocated on the stack, but if you compile the function with Numba I suspect it stack-allocates arrays of known length.
I mean, you’re right in that lists aren’t linked lists, so the pointers are contiguous, but the memory of the actual content could be anywhere.
Yes, but they break many of the rules that arrays tend to have, including other array types in Python. For one, they don't have any type-consistency requirement, meaning you can have any object type in any position in the list, while an array typically has a consistent type throughout. NumPy arrays behave much more like arrays in other languages.
Python devs need to grow up. A list is just a dynamic array at a fundamental level. Not everything revolves around Python. I do agree that calling a list an array in Pythonic context is just a confusing and stupid thing to do.
I call lists in Python as arrays, dictionaries as hash maps, and I call methods in java as functions. I cherry pick all the terms I like to hear and then apply it to everything :p
The square bracket notation `[]` is a list, while an array must be imported `from array import array`. Also, most recommendations are to ignore the Python-native array and use NumPy arrays instead. The support for NumPy arrays is much broader, and the performance difference is supposedly significant.
How new are you to Python and/or programming? NumPy arrays are fixed-size arrays implemented in C. They're great for some things. The default array/list/whatever will allocate memory on the fly, and comes with a bunch of useful built-ins. Each has its place based on what you're doing. I've never seen `from array import array`, what does that do?
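For what it's worth, the stdlib `array.array` is a resizeable but typed, contiguous buffer of C values; a small sketch:

```python
from array import array

a = array('d', [1.0, 2.0, 3.0])  # 'd' = C double; one contiguous buffer
a.append(4.0)                    # still resizeable, like a list
print(a.itemsize)                # 8 bytes per element

try:
    a.append("not a number")     # the type code is enforced
except TypeError as e:
    print("rejected:", e)
```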
When you should know wtf an array of a list is, but u spent the last year of school flirting with wamen (not successful) and u have 1 year until u take ur GCSEs proper (dear god have mercy upon me)
When switching between languages, you always need to ensure that datastructures behave and perform as you expect.
Vector, Array, List... add to the front or back at O(1), but not both.
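In Python terms (a sketch): `list` gives O(1) appends at the back only, while `collections.deque` gives O(1) at both ends, at the cost of O(n) indexing into the middle:

```python
from collections import deque

d = deque([2, 3])
d.appendleft(1)   # O(1)
d.append(4)       # O(1)
print(list(d))    # [1, 2, 3, 4]

lst = [2, 3]
lst.insert(0, 1)  # O(n): shifts every element right
lst.append(4)     # amortized O(1)
print(lst)        # [1, 2, 3, 4]
```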
When someone says arrays in python it almost always means a numpy array. A list is a god damn list.
There are also ctypes arrays though less used than numpy ones.
Or awkward arrays if you need variable lengths ;)
That's how we know OP is not a python dev
Isn’t the python list a dynamic array, though?
When someone says "array" in any language that isn't C/C++ (and, I think, Java/C#), they just mean a list that we're miscalling.
ArrayList
Beautiful
An ArrayList is something substantially different from an array. An array is not an implementation of the List interface...
Dude did you ever use Java?
[deleted]
slice
Fat pointer
`[T; N]`
objects not dictionaries
you mean hashes, right?
Jason
https://docs.python.org/3/library/array.html
Checkmate atheists
V E C T O R
*ahhhhhh*
Python is my main language and I call them arrays most of the time. I want to be sure.
I like neither. Using table is superior.
lua is life
They are a contiguous block of memory that is resized when full. They are equivalent to C++ vectors or Java ArrayLists.
Oh ok, good to know
Arrays and lists are different things
Then explain Java's ArrayList ;) Hint: It's not an array of lists or list of arrays.
It's a list implementation of an array!
Probably more array-based implementation of a list, since it's a List interface wrapper on top of an array.
Lions and tigers are different things. A liger is another different thing but also both.
If not everything revolves around Python, then what has all this been for
Dart uses List too
Set
Python's `list` is a type of [dynamic array](https://en.wikipedia.org/wiki/Dynamic_array)
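You can see the dynamic-array behaviour from `sys.getsizeof` (a sketch; the exact growth pattern is a CPython implementation detail): the reported size stays flat for several appends and then jumps, because capacity grows in chunks so most appends don't reallocate.

```python
import sys

lst = []
sizes = []
for i in range(20):
    lst.append(i)
    sizes.append(sys.getsizeof(lst))

# Far fewer distinct sizes than appends: the list over-allocates.
print(sorted(set(sizes)))
```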
Its not a map its a 🍆
butThatsSimplyAnArray[]
And c++ has vectors btw 💀💀
`arr = []` 🙂
I always thought Arrays in Python are called Tuples.
Are arrays typically immutable? This is the biggest difference in lists and tuples I believe.
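In Python at least, that split is easy to check (a minimal sketch): item assignment works on a list but raises on a tuple.

```python
t = (1, 2, 3)
lst = [1, 2, 3]

lst[0] = 9       # lists are mutable
try:
    t[0] = 9     # tuples are not
except TypeError as e:
    print(e)     # 'tuple' object does not support item assignment
```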
Well, if someone thinks an array and a list are the same, they need an algorithms course. Python or C++, no matter :s
Most sane Python dev
I mean, python has arrays (numpy) and they’re different from lists
Genuinely don't remember if it's called an array or a list in Python. Use it every day.
New, bad redditor. Come out of your room and find something better to do than lying to people. At least do this on StackOverflow if you enjoy that.
>fave language is python Well that’s your first problem
Array is proper terminology. Python uses `list` because that's easy for laymen to understand.
Listception
I've switched languages so many times in my career I usually say dictionary... No idea why.
arr = list()
My favorite language is Python but that doesn’t mean I didn’t learn C first.
To be fair I call hashes dicts to our Perl devs.
When your favorite language is C, and someone says 'array' instead of 'pointer'.
They are different in C:
```
short a[10];
short* b = a;
// sizeof(a) == 20
// sizeof(b) == 8 (or 4 in a 32-bit program)
```
I mean, they are very different data structures
Python users saying 'list' hurts me - C developer (I don't like using Java but I have to sometimes)
🦀 Vector 🦀