Need help with parsing a file and formatting matches

addis_a · October 22, 2014, 3:43am

What I’m trying to do is format matching text from a file. Currently I’ve
got the basic output working, but ideally I’d like to do two additional
things.

1. Sort the paragraphs of output alphabetically by group name.
2. Remove from the output any Editors who are also a Primary or Secondary.

For #2 I tried doing the following, which didn’t work. For some reason
/primary/ is not seen as a variable:

puts "Primary: " + primary = $1 if key =~ /owner/ &&
  value.match(/uid=(.*),ou/)
puts "Editor: " + $1 if key =~ /uniquemember/ &&
  value.match(/uid=(.*),ou/) && !~ /primary/
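In Ruby, /primary/ is a Regexp literal that matches the literal text "primary"; it never refers to a local variable called primary, and !~ also needs a left-hand operand. On top of that, a variable first assigned inside a block only lives for one call of that block, so a value saved while handling the owner line is gone by the time a uniquemember line is processed. A minimal sketch, assuming the owner line comes before the uniquemember lines (as in the sample file); the lines, primary and editors names are illustrative only:

```ruby
# Sketch: keep the Primary in a variable created OUTSIDE the loop,
# then compare each Editor against that value with == / != (not /primary/).
lines = [
  "owner: uid=user8,ou=people,dc=example,dc=com",
  "uniquemember: uid=user8,ou=people,dc=example,dc=com",
  "uniquemember: uid=user11,ou=people,dc=example,dc=com"
]

primary = nil   # defined before the block, so it survives between iterations
editors = []

lines.each do |line|
  key, value = line.split(": ", 2)
  uid = value[/uid=([^,]*)/, 1]   # capture group 1 of the regexp
  case key
  when "owner"
    primary = uid
  when "uniquemember"
    editors << uid unless uid == primary
  end
end

editors.each { |e| puts "Editor: #{e}" }
```

This only works because owner appears before uniquemember in each group; the replies below deal with the case where the order is not guaranteed.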

Below is where I’m currently at with this. Any help would be
appreciated.
Thanks.

[Current code]:

File.readlines('test_groups.txt').each do |line|
  unless line.strip.empty?
    myhash = Hash[*line.strip.split(":")]
    myhash.each do |key, value|
      puts "Group: " + $1 if key =~ /dn/ && value.match(/cn=(.*),ou/)
      puts "Primary: " + $1 if key =~ /owner/ &&
        value.match(/uid=(.*),ou/)
      puts "Secondary: " + $1 if key =~ /seeAlso/ &&
        value.match(/uid=(.*),ou/)
      puts "Editor: " + $1 if key =~ /uniquemember/ &&
        value.match(/uid=(.*),ou/)
    end
  end
  puts "\n" if line.strip.empty?
end

[Current output]:

Group: group2
Primary: user1
Secondary: user2
Editor: user1
Editor: user2

Group: group1
Primary: user8
Secondary: user6
Editor: user8
Editor: user11
Editor: user20
Editor: user6

[Preferred output]:

Group: group1
Primary: user8
Secondary: user6
Editor: user11
Editor: user20

Group: group2
Primary: user1
Secondary: user2

[File contents of: test_groups.txt]:

dn: cn=group2,ou=groups,dc=example,dc=com
owner: uid=user1,ou=people,dc=example,dc=com
seeAlso: uid=user2,ou=people,dc=example,dc=com
uniquemember: uid=user1,ou=people,dc=example,dc=com
uniquemember: uid=user2,ou=people,dc=example,dc=com

dn: cn=group1,ou=groups,dc=example,dc=com
owner: uid=user8,ou=people,dc=example,dc=com
seeAlso: uid=user6,ou=people,dc=example,dc=com
uniquemember: uid=user8,ou=people,dc=example,dc=com
uniquemember: uid=user11,ou=people,dc=example,dc=com
uniquemember: uid=user20,ou=people,dc=example,dc=com

kentbarber · October 22, 2014, 9:46am

On Tue, Oct 21, 2014 at 6:36 PM, Kenton B. wrote:

> What I’m trying to do is format matching text from a file. Currently I’ve
> got the basic output working, but ideally I’d like to do two additional
> things.

> Sort the paragraphs of output by group name, alphabetically.

In order to do this, you will have to accumulate all the groups in
some data structure and sort them after parsing the whole file. You
cannot print as you go, since a group that appears later in the file
might need to be printed before one you have already printed.
So, define a data structure to hold the group data, for example a
Struct or a full-blown class, and a container with one instance of
that class per group. Then you can sort them and print them all in one
go after the parsing.
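A minimal sketch of that idea, assuming the layout of test_groups.txt shown earlier (one "key: value" pair per line, groups separated by blank lines). The parse_groups helper and the UID constant are illustrative names, not from the thread; the sample text is an excerpt of the file above.

```ruby
# Sketch: accumulate one Struct per group, sort after parsing, then print.
Group = Struct.new(:name, :primary, :secondary, :editors)

UID = /uid=([^,]*)/

def parse_groups(text)
  # Paragraphs are separated by one or more blank lines.
  text.split(/\n\s*\n/).map do |paragraph|
    group = Group.new(nil, nil, nil, [])
    paragraph.each_line do |line|
      key, value = line.chomp.split(": ", 2)
      next unless value
      case key
      when "dn"           then group.name = value[/cn=([^,]*)/, 1]
      when "owner"        then group.primary = value[UID, 1]
      when "seeAlso"      then group.secondary = value[UID, 1]
      when "uniquemember" then group.editors << value[UID, 1]
      end
    end
    group
  end
end

text = "dn: cn=group2,ou=groups,dc=example,dc=com\n" \
       "owner: uid=user1,ou=people,dc=example,dc=com\n" \
       "\n" \
       "dn: cn=group1,ou=groups,dc=example,dc=com\n" \
       "owner: uid=user8,ou=people,dc=example,dc=com\n"

# Sorting is only possible once every group has been parsed.
groups = parse_groups(text).sort_by(&:name)
groups.each do |g|
  puts "Group: #{g.name}"
  puts "Primary: #{g.primary}"
  puts
end
```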

> Remove from output, Editors who are also a Primary or Secondary.

The problem here is that in every step of the loop you are only
looking at one line, so when you are looking at the line that contains
an Editor, you are already past the iteration in which you saw the
Primary or the Secondary. So, in order to skip the editors that are
also Primaries or Secondaries, you will have to store those outside of
the loop. Since you are already doing this accumulation as per point 1
above, I recommend doing the editor filtering after the full parsing.
Also, since you are explicitly checking each key in each step of the
loop, I presume that the order of the keys in each group is not fixed
and may change, so you might find a uniquemember line before the
seeAlso or the owner. If that is the case, it’s clearly better to parse
each group fully, and then filter out the editors that also have one
of the other roles.
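A sketch of the "parse the whole group, then filter" approach, assuming only that each line looks like "key: uid=...,ou=...". The filtered_editors helper is a hypothetical name; the sample lines deliberately put uniquemember entries before owner and seeAlso to show that the order no longer matters. Array#- drops every element that also appears in the other array:

```ruby
# Sketch: collect all roles of one group first (keys may come in any order),
# then drop editors that also hold the owner or seeAlso role.
def filtered_editors(paragraph_lines)
  roles = Hash.new { |h, k| h[k] = [] }   # key => list of uids seen for it
  paragraph_lines.each do |line|
    key, value = line.split(": ", 2)
    uid = value && value[/uid=([^,]*)/, 1]
    roles[key] << uid if uid
  end
  # Array#- removes every element that appears in the argument array.
  roles["uniquemember"] - roles["owner"] - roles["seeAlso"]
end

# uniquemember lines deliberately placed before owner/seeAlso
lines = [
  "uniquemember: uid=user8,ou=people,dc=example,dc=com",
  "uniquemember: uid=user11,ou=people,dc=example,dc=com",
  "owner: uid=user8,ou=people,dc=example,dc=com",
  "uniquemember: uid=user20,ou=people,dc=example,dc=com",
  "seeAlso: uid=user6,ou=people,dc=example,dc=com",
  "uniquemember: uid=user6,ou=people,dc=example,dc=com"
]
p filtered_editors(lines)   # => ["user11", "user20"]
```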

> For #2 I tried doing the following, which didn’t work. For some reason
> /primary/ is not seen as a variable:

> puts "Primary: " + primary = $1 if key =~ /owner/ &&

It’s usually better to use string interpolation than concatenation:

"Primary: #{$1}"
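One practical reason to prefer interpolation (not spelled out in the post): when a match fails, $1 is nil, and "#{...}" calls to_s on it, while String#+ raises TypeError. A small sketch:

```ruby
# A failed match sets $~ (and therefore $1) to nil.
"cn=group2".match(/uid=(.*)/)

interpolated = "Primary: #{$1}"   # nil.to_s is "", so no error

concatenated =
  begin
    "Primary: " + $1              # String#+ only accepts a String
  rescue TypeError => e
    "TypeError: #{e.message}"
  end

puts interpolated
puts concatenated
```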

I’ll try to make a working example with my ideas later, but hopefully
this can get you started.

Jesus.

kentbarber · October 23, 2014, 9:01pm

Thanks for the reply Jesús. I haven’t worked with Structs or Classes. I
came up with the following, which works, except it doesn’t sort the
output alphabetically by group. I’ll try and tackle that one tomorrow.
If anyone sees anything glaringly bad about the code, please let me
know.
Thanks,
Kent

unless STDIN.tty?
  myarray = []

  # read in ldapsearch output and create array
  ARGF.read.split("\n\n").reject { |l| l.empty? }.each do |paragraph|
    paragraph.each_line do |line|
      line.match(/(.*): .*=(.*),ou/)
      myarray.push("#{$1} #{$2}")
    end

    # delete any blank lines
    myarray.delete(" ")

    # get group, primary, secondary variables from array
    group = myarray[0].split.last
    primary = myarray[1].split.last
    secondary = myarray[2].split.last

    # print variables
    puts "Group: #{group}"
    puts "Primary: #{primary}"
    puts "Secondary: #{secondary}"

    # get editors and print
    myarray.each do |line|
      editor = line.split.last if line =~ /uniquemember/
      puts "Editor: #{editor}" if editor != nil && editor != primary &&
        editor != secondary
    end
    myarray.clear
    puts ""
  end
end
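For the remaining sorting step, one low-effort option (a sketch, not tested against real ldapsearch output) is to sort the raw paragraphs by their cn value before the existing loop runs, leaving the rest of the code unchanged. It assumes each paragraph has a "dn: cn=..." line, as in the sample; to_s makes a paragraph without one sort first instead of raising. This is a plain string comparison, so "group10" would sort before "group2".

```ruby
# Sketch: sort the raw paragraphs by group name before the existing loop.
input = "dn: cn=group2,ou=groups,dc=example,dc=com\n" \
        "owner: uid=user1,ou=people,dc=example,dc=com\n" \
        "\n" \
        "dn: cn=group1,ou=groups,dc=example,dc=com\n" \
        "owner: uid=user8,ou=people,dc=example,dc=com\n"

paragraphs = input.split("\n\n").reject(&:empty?)
# String#[] with a regexp and a capture index returns the cn value (or nil).
sorted = paragraphs.sort_by { |para| para[/^dn: cn=([^,]*)/, 1].to_s }

sorted.each do |paragraph|
  # the existing per-paragraph processing would go here
  puts paragraph.lines.first
end
```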

On Wed, Oct 22, 2014 at 2:45 AM, Jesús Gabriel y Galán wrote:

kentbarber · October 23, 2014, 11:11pm

On Thu, Oct 23, 2014 at 9:00 PM, Kenton B. wrote:

> Thanks for the reply Jesús. I haven’t worked with Structs or Classes.

I came up with my own solution, but then I had to keep up with work
stuff and forgot about it. Now it’s gone, because I saved it in /tmp
(ouch!). I will try to reproduce something similar based on your
solution, with the sort included, but I can’t test it right now.

> I came up with the following which works, except it doesn’t sort the output
> alphabetically by group. I’ll try and tackle that one tomorrow. If anyone
> sees anything glaringly bad about the code, please let me know.
> Thanks,
> Kent

Group = Struct.new(:name, :primary, :secondary, :editors) do
  def initialize(*args)
    super(*args)          # let Struct assign the fields passed in
    self.editors ||= []   # default editors to an empty array
  end
end

unless STDIN.tty?
  groups = []
  # read in ldapsearch output and create array
  ARGF.read.split("\n\n").reject { |l| l.empty? }.each do |paragraph|

Now, with the following code, you have clarified one of my questions:
the lines are always in order, so you know the first line is the
group, the second the primary, the third the secondary and the rest
are editors, correct?

#>   paragraph.each_line do |line|
#>     line.match(/(.*): .*=(.*),ou/)
#>     myarray.push("#{$1} #{$2}")
#>   end

Then, let’s replace this with:

    name, primary, secondary, *editors = paragraph.each_line.map { |line|
      m = line.match(/.*=([^,]*),ou/)
      m && m.captures.first   # nil for lines without a match
    }.compact
    group = Group.new(name, primary, secondary)
    group.editors.concat(editors)
    groups << group
  end

Here the loop ends. We iterate through all the paragraphs, pushing a
Struct (an object) for each group onto the array.
Now let’s sort and print them:

  groups.sort_by { |g| g.name }.each do |group|
    # print variables
    puts "Group: #{group.name}"
    puts "Primary: #{group.primary}"
    puts "Secondary: #{group.secondary}"

    # get editors and print,
    # removing editors that are also primary or secondary
    (group.editors - [group.primary, group.secondary]).each do |editor|
      puts "Editor: #{editor}"
    end
  end
end

This is untested, but it’s similar to what I wrote the other day
(although this is simpler because of the knowledge of the lines always
being in the correct order).
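One detail to check when running the untested code above: an initialize override inside a Struct block only keeps the positional fields if it calls super; an override that takes no arguments makes Group.new(name, primary, secondary) raise ArgumentError. A small sketch of a Struct whose editors field defaults to an empty array, combined with the array-difference filter used above:

```ruby
# Sketch: a Struct whose editors field defaults to an empty array.
Group = Struct.new(:name, :primary, :secondary, :editors) do
  def initialize(*args)
    super(*args)            # let Struct assign name, primary, secondary
    self.editors ||= []     # only default editors when it was not given
  end
end

g = Group.new("group1", "user8", "user6")
g.editors.concat(["user8", "user11", "user20", "user6"])

# Array#- removes the primary and secondary from the editor list.
remaining = g.editors - [g.primary, g.secondary]
p remaining   # => ["user11", "user20"]
```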

Hope this helps,

Jesus.

kentbarber · October 24, 2014, 1:25pm

On Fri, Oct 24, 2014 at 12:49 PM, Robert K. wrote:

> Usage is defined
>
>  - via a Symbol indicating a field
>  - via a Proc which is a piece of executable code
>
> The whole thing is quite verbose so I believe it may not really be
> worthwhile for this case. I just found it interesting to explore and
> that’s why I share it here. You can find it on GitHub:
> Script extracting group information from a text file · GitHub

It’s nice! Although for the loop, I prefer the approach of slicing
the input into paragraphs and processing a single group in each
iteration, because I find the bookkeeping about the current group and
so on a bit confusing.

Nice idea with the instruction hash!

Jesus.

kentbarber · October 24, 2014, 1:32pm

On Fri, Oct 24, 2014 at 1:24 PM, Jesús Gabriel y Galán wrote:

>> A regular expression which contains at least a capturing group is used

> It’s nice ! Although for the loop, I prefer the approach of slicing
> the input in paragraphs and processing a single group in each loop.
> Cause I find the stuff about the current group and so on a bit
> confusing.

In that case I would prefer a more efficient approach though:

File.foreach("test_groups.txt", "\n\n") do |block|
  block.each_line do |line|
    key = line[/^\w+/]
    # …
  end
end

This reads only one block at a time instead of the whole input.
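A runnable sketch of that streaming idea. The each_group helper, the per-block Hash of key => values and the temporary sample file are assumptions for illustration; the regexp simply takes the first value after "=" on each line (cn for dn lines, uid for the others). File.foreach keeps the "\n\n" separator at the end of each record, so every block still contains a trailing blank line, which is skipped:

```ruby
require "tempfile"

# Hypothetical helper: yields one Hash (key => list of values) per block.
def each_group(path)
  File.foreach(path, "\n\n") do |block|
    fields = Hash.new { |h, k| h[k] = [] }
    block.each_line do |line|
      key = line[/^\w+/] or next           # blank separator line: skip it
      fields[key] << line[/=([^,]*)/, 1]   # first value after "="
    end
    yield fields unless fields.empty?
  end
end

SAMPLE = "dn: cn=group2,ou=groups,dc=example,dc=com\n" \
         "owner: uid=user1,ou=people,dc=example,dc=com\n\n" \
         "dn: cn=group1,ou=groups,dc=example,dc=com\n" \
         "owner: uid=user8,ou=people,dc=example,dc=com\n"

results = Tempfile.create("groups") do |f|
  f.write(SAMPLE)
  f.flush                                  # make the data visible by path
  found = []
  each_group(f.path) { |g| found << [g["dn"].first, g["owner"].first] }
  found
end
p results   # => [["group2", "user1"], ["group1", "user8"]]
```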

> Nice idea with the instruction hash !

Thank you!

Kind regards

robert

kentbarber · October 24, 2014, 12:49pm

On Thu, Oct 23, 2014 at 11:11 PM, Jesús Gabriel y Galán wrote:

> On Thu, Oct 23, 2014 at 9:00 PM, Kenton B. wrote:
>
>> Thanks for the reply Jesús. I haven’t worked with Structs or Classes.

For the fun of it I created a version which uses a declarative
approach: the content of EXTRACT defines how data is extracted from
each line and how it is used. A regular expression containing at least
one capturing group is used to extract the data.

Usage is defined in one of two ways:

 - via a Symbol indicating a field
 - via a Proc, which is a piece of executable code

The whole thing is quite verbose so I believe it may not really be
worthwhile for this case. I just found it interesting to explore and
that’s why I share it here. You can find it on GitHub:

https://gist.github.com/rklemme/cc930f9f373489c0d9ce

get-groups.rb

#!/usr/bin/ruby -w

Group = Struct.new :name, :primary, :secondary, :editors

GET_UID = /uid=([^,]*)/

EXTRACT = {
  "dn" =>           [:name, /cn=([^,]*)/],
  "owner" =>        [:primary, GET_UID],
  "seeAlso" =>      [:secondary, GET_UID],

  # … (file truncated here; see the gist for the rest)
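Since the excerpt above is cut off, here is a separate, hypothetical sketch of the same declarative idea: a table mapping each key to a usage (a Symbol naming a field, or a Proc) and a regexp with one capturing group. The RULES table and the apply_rules helper are invented for illustration and are not the contents of the gist:

```ruby
# Hypothetical rule table in the spirit of EXTRACT (not the gist's code).
Group = Struct.new(:name, :primary, :secondary, :editors)

UID = /uid=([^,]*)/

# key => [usage, regexp]; usage is a Symbol (field to set) or a Proc.
RULES = {
  "dn"           => [:name, /cn=([^,]*)/],
  "owner"        => [:primary, UID],
  "seeAlso"      => [:secondary, UID],
  "uniquemember" => [->(group, uid) { group.editors << uid }, UID]
}.freeze

def apply_rules(group, line)
  key, value = line.chomp.split(": ", 2)
  usage, regexp = RULES[key]
  return unless usage && value && (m = regexp.match(value))
  data = m[1]
  if usage.is_a?(Symbol)
    group[usage] = data        # Struct#[]= accepts a member name
  else
    usage.call(group, data)    # the Proc decides what to do with the data
  end
end

group = Group.new(nil, nil, nil, [])
[
  "dn: cn=group1,ou=groups,dc=example,dc=com",
  "owner: uid=user8,ou=people,dc=example,dc=com",
  "uniquemember: uid=user11,ou=people,dc=example,dc=com"
].each { |line| apply_rules(group, line) }
p group
```

Adding a new attribute then means adding one table entry rather than another branch in the loop, which is the trade-off against the extra verbosity mentioned above.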

Kind regards

robert

kentbarber · October 28, 2014, 2:24pm

Thanks Robert and Jesús for your time and help! Looking at your code
and mine, it’s obvious I have a lot of learning to do.
Kent

On Fri, Oct 24, 2014 at 6:46 AM, Jesús Gabriel y Galán wrote:

kentbarber · October 24, 2014, 1:46pm

On Fri, Oct 24, 2014 at 1:32 PM, Robert K. wrote:

> File.foreach("test_groups.txt", "\n\n") do |block|
>   block.each_line do |line|
>     key = line[/^\w+/]
>     # ...
>   end
> end
>
> This will only read one block and not the whole input.

Wow, I forgot about the separator parameter of foreach!!!
That’s actually really nice and readable.
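A related detail of the same separator parameter: passing "" instead of "\n\n" selects paragraph mode, where any run of blank lines ends a record, whereas "\n\n" splits on every pair of newlines and can leave empty records behind. A small sketch using String#each_line, which takes the same kind of separator argument as File.foreach; the sample text with extra blank lines is made up:

```ruby
# Made-up input with three blank lines between the two groups.
text = "dn: cn=group2\nowner: uid=user1\n\n\n\ndn: cn=group1\nowner: uid=user8\n"

with_nn    = text.each_line("\n\n").to_a   # strict "\n\n" separator
paragraphs = text.each_line("").to_a       # paragraph mode

p with_nn.size      # the middle record is just "\n\n"
p paragraphs.size   # one record per group
```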

Thanks, I learned (or remembered) something today :).

Jesus.