Generating all possible combinations of a 5 digit pattern

zbrod · February 13, 2010, 8:01pm

This is probably childs play for most of you… But I lack the
experience/knowledge at this time, to do what I have in mind, in any
language (let alone Ruby). Side effect of working two jobs and being
very selective of what I do during downtime I suppose…

I guess I’ll start by what I am trying to do. I would make it a
point that I’m trying NOT to have prefab code handed to me, or others
do all the work - unless it really is that simple a task. But
recommended reading on functions, or good places I might find an
explanitory “learn as you copy the examples” type lesson on this
subject would be nice as well.

This is largely to be a one time, throw-away bit of code I guess, as
I only need to use it a couple times.

I will be reading in pattern matches from a Yatta (Yet Another
Telecide Tool for Anime) project file, and the goal is to look at
groups of 5 frames at a time for their pattern, and then mark a
specific frame for decimation, in an overrides file, for the
decimation utility I use. However manually parsing 10’s of
thousands of frames is a laborious task at the least and the big
problem with pattern matching is that no -current- method is fool
proof except the manual way. In Anime especially, the pattern can
shift at any time, or you may run into a section of 30fps video that
you want to avoid decimating at all.

YATTA is a great tool, but still too time consuming for me to use when
it comes to manual IVTC. So I am looking into the possibility of
writing a small utility that will import those pattern matches, in
blocks of 5 frames at a time, and compare them against a pre-defined
list of possible patterns, until the program finds an exact match, can
ID what frame that match is, and specify that frame number in the
overrides file.

The problem is… How do I generate all possible combinations of a 5
letter pattern, using only the letters “C” and “N” (the only
letters used by the particular IVTC filter I am using to ID frames)

i.e I may have a group of 20 frames that follows the standard
telecine pattern of

CCCNN CCCNN CCCNN CCCNN

But then it may shift out of nowhere into NNCNC or CCNCN or any
other possible 5 digit combination of those letters. Then it may
shift back again only after 5 frames, to the previous pattern. The
point is that the pattern can shift at any time, in any place,
sometimes even in the middle of a 5 frame section.

I think I can work out the program itself. But I am looking for a
quick and easy way to generate every possible 5 digit combination of
the letters " C" and “N” so I have the database I need to compare
each 5 frame segment.

Any helpful code, links, or suggestions would be appreciated.

-Zach

zbrod · February 13, 2010, 8:15pm

On 2010-02-13, Zach B. [email protected] wrote:

I think I can work out the program itself. But I am looking for a
quick and easy way to generate every possible 5 digit combination of
the letters " C" and “N” so I have the database I need to compare
each 5 frame segment.

That sounds like five bits.

Consider, then, using bit operations. Imagine that you have numbers
from
0 to 31. Each number must have a unique pattern of bits, and a total of
five bits. So…
0 00000 CCCCC
1 00001 CCCCN
2 00010 CCCNC
3 00011 CCCNN
…
31 11111 NNNNN

The question is, how do we turn a number into a set of binary digits?
One
way would be:
def letter(val, pos)
(val & (2 << pos)) ? ‘N’ : ‘C’
end

So that:
letter(16, 0) => ‘C’
letter(16, 1) => ‘C’
letter(16, 2) => ‘C’
letter(16, 3) => ‘C’
letter(16, 4) => ‘N’

So 16 => NCCCC.

-s

zbrod · February 13, 2010, 8:25pm

Hi,

that is very interesting and I hadn’t even considered using bits. Know
any good links where I could read more about bits / using bit
operations ?

If you would care to elaborate more on your example, that might help
me understand it better. Perhaps explaining what each line is doing? I
sort of get the idea, but I would like to understand the internals of
HOW it is doing it - in case I might want/need to modify the
example.

I think what I don’t understand the most, is the 2nd line

(val & (2 << pos)) ? ‘N’ : 'C

(inbetween the DEF and END) Where is it defining the maximum
number of letters to use in the generated combination, for example? Or
perhaps the example didn’t really cover all that and I’m mistaken.

Thank you for your assistance so far,
-Zach

zbrod · February 13, 2010, 9:40pm

Thanks again, for that.

I think I understand a lot better now. Also, I did kind of have a
“duh” moment when I was staring at the bottom section of code you
confirmed as responsible for how many letters to use.

Thanks again for your time. I will read over everything a couple of
times and play around (I just found out I don’t currently have an
interpreter installed, haha) when I get the chance.

And hopefully I will understand it at that point!

-Zach

zbrod · February 13, 2010, 9:15pm

On 2010-02-13, Zach B. [email protected] wrote:

that is very interesting and I hadn’t even considered using bits. Know
any good links where I could read more about bits / using bit
operations ?

I learned it before “links” existed, so I don’t know.

I think what I don’t understand the most, is the 2nd line

(val & (2 << pos)) ? ‘N’ : 'C

Okay.

(inbetween the DEF and END) Where is it defining the maximum
number of letters to use in the generated combination, for example? Or
perhaps the example didn’t really cover all that and I’m mistaken.

This part doesn’t cover that.

letter(16, 0) => ‘C’
letter(16, 1) => ‘C’
letter(16, 2) => ‘C’
letter(16, 3) => ‘C’
letter(16, 4) => ‘N’

This is where you decide how many letters to use – if you wanted to use
six
letters, you’d just add letter(x, 5).

Now, onto the core bit:
(which, by the way, has an OBVIOUS flaw in it. I missed it 'cuz I’m a C
programmer. And also a stupid typo)

(val & (2 << pos)) ? ‘N’ : 'C

You probably know about || (or) and && (and). “a || b” is true if
either a
is true or b is true. “a && b” is true if both a is true and b is true.

Now, imagine that you were to view a number as bits. The first bit has
the
value 1, the second 2, the third 4, the fourth 8, and so on. A number
is the
sum of the bits that are set in it; 16 is 0b1000, 15 is 0b0111, and so
on.

There are a few handy operations to perform on bits. Four common
logical
operations are used on bits. One is complement, written ~ in C. (I
don’t
even know off the top of my head whether Ruby has a complement operator,
but
I include it for completeness). Complement is also called “bitwise
not”,
because just as “!true == false” and “!false == true”, ~0 = 1 and ~1 =
0.

So if you had a four bit number x, and it were 16 (0b1000), ~x would be
0b0111, or 15. (Actually, in many cases, the top bit has special
meaning.
I’m ignoring that for now.)

The way bitwise operations are performed is by performing them
separately
on each bit, but & and | are just like && and || otherwise. 1 & 1 is 1,
1 & 0, 0 & 1, and 0 & 0 are all 0. Similarly, 1&anything is 1, 0&0 is
0.

So.

Let’s say you want to find out whether a number has the fourth bit set
in it.
You can use “x & 16”. Since 16 is 0b1000, every bit other than the 16s
bit
in the result is DEFINITELY zero. The 16s bit will be 1 if x had the
16s
bit set, and otherwise 0, so your result will be either 16 (if x had the
16s bit set) or 0 (if x didn’t have it set), no matter what other bits
were
set.

Now, in C, you could just use “x & 16” as a conditional, because 0 is
false
in C. But in Ruby, it’s not, so I should have written

((val & (2 << pos)) != 0) ? …

Now, you might be wondering about <<. <<, called “left shift”, means
“shift
all the bits left some number of times”. 0b0100 << 1 => 0b1000. 0b0001
<< 2
= 0b0100. There’s a corresponding right shift, which moves them the
other
way.

That means that 1 << x is the same as “the xth bit”. By contrast, “2 <<
x”
is a stupid typo.

So if you write
val & (1 << pos)
you get a non-zero value if val has the pos’th bit set, and otherwise
zero.
And that means that
(val & (1 << pos)) != 0
is true if val has the pos’th bit set, and otherwise false.
And that means that
((val & (1 << pos)) != 0) ? ‘N’ : ‘C’
is ‘N’ if val has the pos’th bit set, and otherwise ‘C’.

-s

zbrod · February 13, 2010, 11:00pm

A very (lazy) and bad way in 1.9:

([“B”, “C”]*5).combination(5).to_a.uniq.map(&:join)

Well, it’s a pity Enumerable#combination doesn’t work with n greater
than
Enumerable’s length.

That is the one-liner I had to give,

Regars,
BD

2010/2/13 Zach B. [email protected]

zbrod · February 13, 2010, 10:53pm

On Sat, Feb 13, 2010 at 1:00 PM, Zach B. [email protected] wrote:

I think I can work out the program itself. But I am looking for a
quick and easy way to generate every possible 5 digit combination of
the letters " C" and “N” so I have the database I need to compare
each 5 frame segment.

Any helpful code, links, or suggestions would be appreciated.

-Zach

Here is one, also based off of Seebs’ insight

digits = 5
format_string = “%0#{digits}d”
0.upto (1<<digits) - 1 do |i|
permutation = ( format_string % i.to_s(2)
).gsub(‘1’,‘N’).gsub(‘0’,‘C’)
puts permutation
end

First we find our format string “%0#{digits}d” The “%d” says to expect
an
integer (though it seems to understand that an integer represented as a
string is also valid), the “#{digits}” in between the ‘%’ and the ‘d’
tell
it there are however many digits in this integer. For example, if digits
was
equal to 5, the format string would be “%5d”, if it was equal to 10, it
would be “%10d”. And lastly, the 0 in front of the “#{digits}” tells it
that
if the number we pass in has fewer than the specified number of digits,
then
it should fill in the missing digits with zero. ie we get “00001”
instead of
" 1".
See module Kernel - RDoc Documentation for more on
format
strings

Second, it finds it’s range, 1<<5 gives us 32, but 32 has six binary
digits,
so we subtract 1 and get 31. So our range is 0 through 31. Note that you
could also write this as 0.upto 2**digits - 1 do |i|
See http://ruby-doc.org/core/classes/Fixnum.html#M001076
And Bitwise operation - Wikipedia

For each of those numbers, i, we convert it to a string, using a base 2
radix with i.to_s(2) this means that our number will be represented as
binary, and will be a string.
See http://ruby-doc.org/core/classes/Fixnum.html#M001050

Then we pass our base 2 (meaning binary) number into the format string.
with
format_string % i.to_s(2). So if our format string is “%05d” and i is 5,
then i.to_s(2) is “101” and “%05d” % “101” is “00101”
See class String - RDoc Documentation

Then we substitute the '1’s in our string with 'N’s and the '0’s in our
string with 'C’s, using gsub, which gives us the current permutation.
See class String - RDoc Documentation

zbrod · February 13, 2010, 11:44pm

On 02/13/2010 09:09 PM, Seebs wrote:

letter(16, 2) => ‘C’
(val & (2 << pos)) ? ‘N’ : 'C
even know off the top of my head whether Ruby has a complement operator, but

in C. But in Ruby, it’s not, so I should have written

So if you write
val & (1 << pos)
you get a non-zero value if val has the pos’th bit set, and otherwise zero.
And that means that
(val & (1 << pos)) != 0
is true if val has the pos’th bit set, and otherwise false.
And that means that
((val & (1 << pos)) != 0) ? ‘N’ : ‘C’
is ‘N’ if val has the pos’th bit set, and otherwise ‘C’.

I’m sorry, but this is overly complicated. If I want to know whether
the fourth bit is set I would use Fixnum#[]:

irb(main):002:0> 20.times {|i| printf “%3d %06b %d\n”, i, i, i[4]}
0 000000 0
1 000001 0
2 000010 0
3 000011 0
4 000100 0
5 000101 0
6 000110 0
7 000111 0
8 001000 0
9 001001 0
10 001010 0
11 001011 0
12 001100 0
13 001101 0
14 001110 0
15 001111 0
16 010000 1
17 010001 1
18 010010 1
19 010011 1
=> 20
irb(main):003:0>

There is also Bignum#[] with identical semantics, so you don’t need to
worry that exceeding Fixnum’s range creates problems:

irb(main):007:0> (1 << 40).class
=> Bignum
irb(main):008:0> (1 << 40)[0]
=> 0
irb(main):009:0> (1 << 40)[40]
=> 1
irb(main):010:0> (1 << 40)[41]
=> 0
irb(main):011:0>

If there are more than two alternatives things get more complicated. We
can use something like this to get the index value for each position
(example assumes 3 different values):

irb(main):014:0> 20.times {|i| a=i;x=[]
irb(main):015:1> until a == 0
irb(main):016:2> a, b = a.divmod 3; x.unshift b
irb(main):017:2> end
irb(main):018:1> printf “%3d %p\n”, i,x}
0 []
1 [1]
2 [2]
3 [1, 0]
4 [1, 1]
5 [1, 2]
6 [2, 0]
7 [2, 1]
8 [2, 2]
9 [1, 0, 0]
10 [1, 0, 1]
11 [1, 0, 2]
12 [1, 1, 0]
13 [1, 1, 1]
14 [1, 1, 2]
15 [1, 2, 0]
16 [1, 2, 1]
17 [1, 2, 2]
18 [2, 0, 0]
19 [2, 0, 1]
=> 20
irb(main):019:0>

Now combine that with storage of the values:

irb(main):019:0> val=%w{C N x}
=> [“C”, “N”, “x”]
irb(main):020:0> 20.times {|i| a=i;x=[]; until a==0;a,b=a.divmod
3;x.unshift val[b]; end;printf “%3d %p\n”, i,x.join}
0 “”
1 “N”
2 “x”
3 “NC”
4 “NN”
5 “Nx”
6 “xC”
7 “xN”
8 “xx”
9 “NCC”
10 “NCN”
11 “NCx”
12 “NNC”
13 “NNN”
14 “NNx”
15 “NxC”
16 “NxN”
17 “Nxx”
18 “xCC”
19 “xCN”
=> 20
irb(main):021:0>

A bit of adjusting needs to be done of course in order to fill leading
“zeros”.

Kind regards

robert

zbrod · February 14, 2010, 1:06am

Wow, maybe its just the size of your posting, but that looks even more
complicated, LOL…

I do appreciate the other ideas and opinions though. I am beginning
to wonder if I shouldn’t approach it differently, just from a method
of expedience (both in execution and iteration through pattern match
definitions). Or at least keep my options open. (not intending to
flip flop on what i’m doing after people have given me their time )

Would it make more sense to, for instance, parse through the project
file, in 5 frame groups as originally intended, but instead of
matching each group against pattern definitions, simply record the
pattern it is currently reading instead? If the pattern is new, then
a record of it and the frame ranges associated with it will be
recorded into some kind of data structure.

Like so.

“Pattern found: CCCNN” no existing definition, saving unique pattern
to database.

“Pattern found: CNCCN” no existing definition, saving… etc

And for any time it comes across a group of frames, if the pattern is
already there.

“Pattern found: CCCNN” Since it already exists as encountered
earlier, it simply notes the frame numbers for this particular
group, and saves them in a data structure referring to a specific
decimation type (i.e 5th frame). Every group matching that pattern
gets the 5’th frame decimated from the group, be it frame number 4,
9, 14, 49, 104, etc… (in video, 0 is actually first, like with array
indices )

I don’t necesarrily have to, or want to change from my original
proposal. But I got to thinking about it, and it seems like what I
was going to do was brute-force matching; and when I think of
brute-force, I think “slow”

Of course if my second idea might be easier to implement that would be
something to consider as well. But for now I’m just playing catch
up with the previous posts and studying the code examples. I
appreciate them greatly, even the one liners

-Zach

zbrod · February 14, 2010, 12:20am

On 2010-02-13, Robert K. [email protected] wrote:

I’m sorry, but this is overly complicated. If I want to know whether
the fourth bit is set I would use Fixnum#[]:

Oh, neat!

-s

zbrod · February 14, 2010, 2:43am

On 2/13/2010 7:05 PM, Zach B. wrote:

I saw a very basic one or two line example where a range of 3 letters
was given, equal to abc (I forgot how it was written) and it pretty
much spat out all the combos of those 3 letters.

I’m wondering if I made a big stink about nothing with my post now, if
there is a gem out there I could get, heh.

-Zach

This must be what you were talking about???
http://flori.github.com/permutation/doc/index.html
http://flori.github.com/permutation/

zbrod · February 14, 2010, 3:42am

Yes that is it. I played around with it but I got a lot of dupes,
and I’m not entirely sure I got all possible sequences of the two
letter’s C and N (but I’d have to mess around again to double check)

Expanding on the idea about simply storing unique patterns as I come
across them, I didn’t explain it fully I think.

The idea is basically to iterate through the project file, processing
every frame in blocks of 5. At the end I would then have a finite
list of every unique pattern in the video, and at that point I could
define the frame I want to decimate in each unique pattern.

OR I could just have it process each block of 5 frames, counting
the number of times C and N appears, and what their position is… This
is mostly where my proof of concept lies, based on some observations
I made (but have no coraborated with different source files) that
when you have an N frame that immediately proceeds a C frame, that N
frame is almost certainly a visual duplicate 99% of the time. Or in
the case of a group of N frames, or more than one N frame in the
pattern, the final and last occuring N proceeding a C frame,
likewise is the duplicate that needs to be removed.

I am thinking it would be perhaps be easier to write the program to
do this, and simply ID the position and frame number of the last
occuring N frame in a 5 frame pattern, and mark that frame number
for decimation.

That way no need to generate all these combinations, and set up a
matching algorithm to check every single one.

But this all hinges on the theory that I need to prove, that the
final N frame proceeding a C frame (even if its the 5th frame and the
next group of 5 begins with a C) is almost certain to be the
duplicate.

I think it would make a good proof of concept, and even if it varried
by source, I would think I could find ways to make it easily
adaptable depending on the stream and where the dupes are occuring…

What do you think?

-Zach

On Sat, 13 Feb 2010 20:39:26 -0500, Reid T.

zbrod · February 14, 2010, 4:07am

On 2/13/2010 7:05 PM, Zach B. wrote:

I saw a very basic one or two line example where a range of 3 letters
was given, equal to abc (I forgot how it was written) and it pretty
much spat out all the combos of those 3 letters.

You have only two letters, C and N. That is like have only two digits,
0 and 1. How many 5 digit combinations of 0 and 1 are there?

2**5.

So, count from 0 to 2**5-1, write the number in binary, then convert 0
to C and 1 to N:

irb(main):010:0> (0…(2**5-1)).map{|n| ("%05b" % n).tr(“01”, “CN”) }
=> [“CCCCC”, “CCCCN”, “CCCNC”, “CCCNN”, “CCNCC”, “CCNCN”, “CCNNC”,
“CCNNN”, “CNCCC”, “CNCCN”, “CNCNC”, “CNCNN”, “CNNCC”, “CNNCN”,
“CNNNC”, “CNNNN”, “NCCCC”, “NCCCN”, “NCCNC”, “NCCNN”, “NCNCC”,
“NCNCN”, “NCNNC”, “NCNNN”, “NNCCC”, “NNCCN”, “NNCNC”, “NNCNN”,
“NNNCC”, “NNNCN”, “NNNNC”, “NNNNN”]

Cheers,
Sam

zbrod · February 14, 2010, 4:30am

Sorry, I wrote “proceeding” when I meant “preceding” (as in N
frame comes directly before a C frame)

-Zach

zbrod · February 14, 2010, 1:08am

Also, to Josh:

I know I attrociously used the word “digit” in the title, and I think
once or twice in my initial post. I just wanted to make sure it was
understood I meant “letter” as in a 5 letter pattern. Since you
also spoke of digits I’m unsure if your code example as provided
would only work on actual alphanumeric digits, or would work on
characters to?.

I should have better proof read, so that’s my fault.
Also wanted to mention, I noticed the word “permutation” which
reminded me of something. Just before I discovered I in fact did
not have Ruby installed (but oddly I have RubyMine 2.0 installed) I
came across a function (or something of the sort) called permutate
or something close to that, via googling.

I saw a very basic one or two line example where a range of 3 letters
was given, equal to abc (I forgot how it was written) and it pretty
much spat out all the combos of those 3 letters.

I’m wondering if I made a big stink about nothing with my post now, if
there is a gem out there I could get, heh.

-Zach

zbrod · February 16, 2010, 10:03am

one more solution

p (0…(2**5-1)).map{|n| (n.to_s 2).tr(“01”, “CN”).rjust(5,“C”) }

zbrod · February 14, 2010, 12:16pm

Geee, now I feel -real- dumb. I appreciate that output, Sam.

-Zach

On Sat, 13 Feb 2010 22:02:51 -0500, Sam R. [email protected]