Greetings, I have the following (fragment only):
devices = [{d1 => [0,1,0,1,0,0,…]},
{d2 => [0,0,1,1,0,1,…]},
{d3 => [1,0,1,1,0,1,…]},
…]
Each hash represents a device; the array of zeroes and ones represents
the presence (1) or absence (0) of a specific software executable on that
device. (The array index is the unique identifier of the executable.)
Scenario:
A list of desktop computers (~600) that includes all software
executables deployed on each machine. There are more than 1,000 distinct
executables deployed across these devices.
I am trying to identify common patterns of software deployments, looking
for an algorithm that will arrange the current state into a limited,
manageable set of discrete application profiles.
No two devices are exactly alike, even after filtering out all the
‘noise’ - different versions, common operating system and driver files
etc. So the comparison must proceed with some element of configurable
tolerance.
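To make "configurable tolerance" concrete, here is a minimal sketch of one
possible measure, the Jaccard index (executables present on both devices
divided by executables present on either). The names jaccard, similar? and
TOLERANCE are made up for illustration, the 0.75 threshold is arbitrary, and
the sample arrays are only the first six entries of the fragment above, so
this is a toy, not the real data.

```ruby
# Jaccard similarity between two equal-length 0/1 arrays:
# (executables present on both) / (executables present on either).
def jaccard(a, b)
  both = 0
  either = 0
  a.each_with_index do |bit, i|
    both += 1 if bit == 1 && b[i] == 1
    either += 1 if bit == 1 || b[i] == 1
  end
  # Two devices with no executables at all are treated as identical.
  either.zero? ? 1.0 : both.to_f / either
end

# Hypothetical tolerance: devices count as "alike" at or above this score.
TOLERANCE = 0.75

def similar?(a, b, threshold = TOLERANCE)
  jaccard(a, b) >= threshold
end

# Toy data: only the first six entries of each array from the fragment.
d1 = [0, 1, 0, 1, 0, 0]
d2 = [0, 0, 1, 1, 0, 1]
d3 = [1, 0, 1, 1, 0, 1]

puts jaccard(d1, d2)   # 0.25
puts jaccard(d2, d3)   # 0.75
puts similar?(d2, d3)  # true
```

Lowering the threshold makes the comparison more forgiving; raising it
towards 1.0 demands near-identical devices.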
I have had a look at the Vector and Matrix classes but haven’t found any
suitable built-in method that would allow me to submit a suitably
prepared list (an array of arrays? an array of vectors? a matrix?)
which is then rationalised into a few recognisably discrete groups.
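Failing a built-in method, the kind of grouping described above could be
roughed out by hand. The following is only a sketch of a naive greedy
"leader" pass, not something Vector or Matrix provides: each device joins the
first profile whose first member it resembles closely enough, otherwise it
starts a new profile. The helper names, the string keys, the 0.6 threshold
and the extra device d4 are all assumptions for illustration, and the result
depends on the order of the devices.

```ruby
# Jaccard similarity between two equal-length 0/1 arrays.
def jaccard(a, b)
  pairs = a.zip(b)
  both = pairs.count { |x, y| x == 1 && y == 1 }
  either = pairs.count { |x, y| x == 1 || y == 1 }
  either.zero? ? 1.0 : both.to_f / either
end

# Greedy "leader" grouping: a device joins the first profile whose leader
# (its first member) scores at or above the threshold, else founds a new one.
def group_devices(devices, threshold)
  profiles = [] # each profile: { leader: bits, members: [names] }
  devices.each do |device|
    device.each do |name, bits|
      home = profiles.find { |p| jaccard(p[:leader], bits) >= threshold }
      if home
        home[:members] << name
      else
        profiles << { leader: bits, members: [name] }
      end
    end
  end
  profiles
end

# Toy data: first six entries of the fragment, plus a hypothetical d4.
devices = [
  { "d1" => [0, 1, 0, 1, 0, 0] },
  { "d2" => [0, 0, 1, 1, 0, 1] },
  { "d3" => [1, 0, 1, 1, 0, 1] },
  { "d4" => [0, 1, 0, 1, 1, 0] }
]

profiles = group_devices(devices, 0.6)
profiles.each { |p| puts p[:members].join(", ") }
# d1, d4
# d2, d3
```

With ~600 devices and ~1,000 executables this does at most a few hundred
thousand comparisons, so speed should not be the issue; the quality of the
groups is what a proper clustering library would improve on.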
Are there any libraries out there I could leverage? I am not a maths /
statistics guy.
Regards, Rainer