Hi,
I switched from ‘my’ thread because this seems more appropriate to
this thread…
Previously “Re: Why Does Hash Apparently Reorder Its Internal
Representation And Other Associated Ponderings”
Or, Is It Possible To Have One’s Cake And Eat It Too?
First, a taxa of the different sorts of beasts of which we are
speaking might help:
a. associative array
i. unordered
ii. insertion order-ordered
iii. sort function-ordered
1. manual (adhoc)
2. automatic (upon insertion)
b. hash
i. unordered
ii. insertion order-ordered
iii. sort function-ordered
1. manual (adhoc)
2. automatic (upon insertion)
Does this cover it or is there a redundancy or something missing? For
instance are a.i. and a.ii. really the same thing by virtue of there
not being a hashing function? Does a.i. make any sense and can be
done away with from the get-go? And are a.ii. and b.ii, and a.iii.
and b.iii. really the same thing also? Not if the hash still retained
the hash as you would it expect it to.
And is there a term which could reasonably cover all of a. and b.? I
might have used the term ‘hash’ to refer to anything which is a
key-value pair, but this is apparently confusing or inaccurate.
I tried set, but that’s wrong. I tried enumerable, but that’s wrong
too.
I propose that the term, list, be used to cover anything which would
be able to mix in Enumerable and that the constituent parts of a list
are called items, rather than an element, which is a set/array term
and not appropriate for a hash or associative array. Still that
doesn’t give a name for associative lists. I love that when it just
comes out like that… AssociativeList it is then!
And so are these three sorts of data structures (array, associative
array, and hash) sufficiently different to warrant none subclassing
from the other? If so, then it would also probably warrant a syntax
addition and a core implementation (as discussed). And they are
presumably sufficiently similar that it might make sense to subclass
all of them from the same base class; list for instance, and then
mixin Enumerable there instead. Although one may as well just put the
methods in directly to List then however?
There have been concerns that a sort function-ordered or an
insertion-ordered, or just ordered hash will cause slowdowns and
memory bloat, so I was wondering if it was possible to adjust
according to need. So, thank you to Charles, as I am reminded of what
I did in compsci all that time ago!
Now given that it is possible or desirable to have a Hash redefine the
number of buckets, or rehash, is there any reason why it can’t also
reassign its internal type at a deeper level depending on access?
Thought experiment time…
What if all Hashes were able to be ordered, but rehashable for storage
type depending on how it is instantiated and how it is accessed? One
could specify storage organisation at instantiation or later, or allow
it to decide for itself what sort of hash it will be.
m = [:a => ‘a’] # m is for map. I could go for this term too as it
has brevity.
=> [:a => ‘a’]
m[0]
=> [:a => ‘a’] # m remains as an associative array.
m[:a]
=> {:a => ‘a’} # m becomes an explicit hash.
m[0]
=> {:a => ‘a’} # m becomes an associative array.
m[:a]
=> {:a => ‘a’} # m becomes an explicit hash.
m[[0]]
=> [:a => ‘a’] # m becomes an associative array.
{} = m, or
m {}= m, or
m.to_h
=> {:a => ‘a’}
This would be read as “make m equal to itself if it is already a hash,
else make it so”. Similarly, going the other way:
[]= m, or
m []= m, or
m.to_assoc
=> [:a => ‘a’]
Then perhaps one might like to also do something like this?..
n = [:b => ‘b’]
=> [:b => ‘b’]
o {}= n
=> {:b => ‘b’}
Useful? Anyhow, that was an aside.
Now, why not take this further?.. What about a generalized list? As
in:
l = [:a => ‘a’, :b, {:c => ‘c’}] # which is really, or at least
appears to be as if it were: {0 => [:a => ‘a’], 1 => :b, 2 => {:c =>
‘c’}}
The method of access determines how it is to be treated…
l{0}
=> {:a => ‘a’} # or should that be {0 => [:a => ‘a’]}?
l[[0]]
=> ‘a’
l[0]
=> [:a => ‘a’]
l{1}
=> {1 => :b} # or should that be {1 => {:b => nil}}?
l[1]
=> [1 => :b] # or should that simply be :b?
l[[1]]
=> :b
l[2]
=> {:c => ‘c’}
l[[2]]
=> ‘c’
l{2}
=> {2 => {:c => ‘c’}} # or should that be {:c => ‘c’}?
Having only lists might then do away with the literal instantiation of
an empty AssociativeArray problem (assuming a double bracket
notation):
a = [[]] # Is this a nested array or an associative array?
=> [[]]
list = []
array = [] # Really a list.
hash = {} # Explicit hash list sub-type declaration.
list = {:a => ‘a’}
=> {:a => ‘a’}
By assigning a non key-value pair to the hash it becomes a more
general list internally and this is represented by the form of
notation, from {} to [].
list[1] = ‘b’
=> [:a => ‘a’, ‘b’]
In a list the numeric value of an item is always present, so an
assignment which uses a numeric index will assume that it is a simple
placement at the position given by the numeric index of whatever
object is presented to the list.
An assignment which treats this as an associative array will work the
same as if it were an assignment without:
list[[2]] = ‘c’
=> [:a => ‘a’, ‘b’, ‘c’]
Another example might be good:
list[3] = {:d => ‘d’}
=> [:a => ‘a’, ‘b’, ‘c’, {:d => ‘d’}]
list[3]
=> {:d => ‘d’}
list[[4]] = {:e => ‘e’} # Allowable?
=> [:a => ‘a’, ‘b’, ‘c’, {:d => ‘d’}, {:e => ‘e’}]
list[4]
=> {:e => ‘e’}
list[[4]]
=> ‘e’
However,
list[0]
=> [:a => ‘a’]
list[[0]]
=> ‘a’
This is all of the top of my head, so it probably needs some work…
And as for a dynamically altering Hash, List also could change its
internal organisation on the fly depending on need. Such that if all
elements are homogenous, then the internal organisation would be
optimised for that homogeneity. If that changes to be heterogenous,
then a more general and less optimised form of internal organisation
can be chosen dynamically to accomodate.
I realise the overhead if types and methods of access changed all the
time, but does it?
t
P.S. The irony of my apparently having no thoughts about language
design is choking me.