Newbie: working with a text file and converting to xml

hi Guys,

I have a tab-delimited text file that I would like to convert into an
xml file that can be read/imported into Apple’s Final Cut Pro.

The text file is 2 columns.
The first column is the time (timecode)
The second column is text (for sub-titling)

I thought this might be a good starting project to get into Ruby

Any suggestions on how I might approach this?

Thanks!

Adam T.

look at XMLBuilder and FasterCSV

Set up FasterCSV to use a tab as the delimiter instead of the comma,
use it to read the input, and then use XMLBuilder to output
<timecode>data</timecode><sub-title>data</sub-title>

should be fairly simple, or you can avoid libraries and do it by
yourself to learn more about ruby without getting bogged down in 3rd
party libs

require 'builder'

x = Builder::XmlMarkup.new(:target => $stdout, :indent => 1)
x.instruct!
x.timecode data
x.tag!("sub-title", data)  # a hyphenated name cannot be written as a method call

etc

Kev
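
As a rough illustration of Kev's suggestion, the sketch below reads the tab-delimited text with Ruby's CSV library (FasterCSV was folded into the standard csv library in Ruby 1.9) and writes simple XML by hand. The element names (subtitles, entry, timecode, sub-title) and the helper names xml_escape and rows_to_xml are placeholders made up for this example; they are not Final Cut Pro's actual XML schema. Builder would do the escaping for you; a small helper stands in for it here so the example needs no gems beyond csv.

```ruby
require "csv"

# Characters that must not appear verbatim in XML text content.
XML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

# Replace each special character with its XML entity.
def xml_escape(text)
  text.to_s.gsub(/[&<>"]/, XML_ESCAPES)
end

# Turn tab-delimited text into a small XML document.
# The element names are hypothetical, not Final Cut Pro's format.
def rows_to_xml(text)
  rows = CSV.parse(text, col_sep: "\t")
  body = rows.map do |timecode, subtitle|
    "  <entry>\n" \
    "    <timecode>#{xml_escape(timecode)}</timecode>\n" \
    "    <sub-title>#{xml_escape(subtitle)}</sub-title>\n" \
    "  </entry>\n"
  end
  %(<?xml version="1.0" encoding="UTF-8"?>\n<subtitles>\n#{body.join}</subtitles>\n)
end

# Made-up sample rows; the second one shows why escaping matters.
sample = "00:00:42:21\tDurbar Square\n00:01:05:06\tTom & Jerry\n"
puts rows_to_xml(sample)
```

One caveat with this approach: CSV treats a double quote as a quoting character, so a subtitle containing a stray quote may raise a parse error. Splitting each line on the tab yourself avoids that.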

Could you send us 2 example files? I guess the text file format is
obvious (but better to work with a real-life example) but I am not so
sure about the Final Cut Pro XML (or is it just a plain simple XML?)

Until then, check out this code:

============================================================
# Sample input; \t marks the tab between the two columns.
input = <<INPUT
0.12\tSalut, Foo!
0.15\tHola Bar! Did you see Baz?
0.22\tI guess he is hanging around with Fluff and Ork.
INPUT

template = <<TEMPLATE
TIMECODE
SUB-TITLING
TEMPLATE

result = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n"

# Split each line on the tab and fill both placeholders in the template.
input.split(/\n/).each do |line|
  data = line.split(/\t/)
  result += template.sub('TIMECODE') { data[0] }.sub('SUB-TITLING') { data[1] }
end

result += ''

puts result

output:

<?xml version="1.0" encoding="ISO-8859-1"?>

0.12
Salut, Foo!
0.15
Hola Bar! Did you see Baz?
0.22
I guess he is hanging around with Fluff and Ork.

Cheers,
Peter

__
http://www.rubyrailways.com

Hi Kev & Peter!

Thanks for responding so quickly!

The text file looks pretty much like that

00:00:30:13 Swayambhunath Temple: building started 460AD
00:00:42:21 Durbar Square
00:01:05:06 Driving to Trisuli River for Rafting
00:01:55:22 Day 1 Trekking: Pokhara to Tirkhedhunga (1540m)
00:02:20:20 Day 2 Trekking: Tirkhedhunga to Ghorephani (2750m)
00:02:33:19 Day 3 Trekking: Ghorephani to Ghandruk (1940m)
00:02:42:04 Day 4 Trekking: Ghandruk to Pothana (1900m)
00:03:10:13 Day 5 Trekking: Pothana to Phedi (1130m)

It’ll take a while for your example to filter down into my brain - when
it does I’ll get back to you about it.

Awesome!

Thank you so much!

Adam

Adam T. wrote:

00:02:20:20 Day 2 Trekking: Tirkhedhunga to Ghorephani (2750m)
00:02:33:19 Day 3 Trekking: Ghorephani to Ghandruk (1940m)
00:02:42:04 Day 4 Trekking: Ghandruk to Pothana (1900m)
00:03:10:13 Day 5 Trekking: Pothana to Phedi (1130m)

It’ll take a while for your example to filter down into my brain - when
it does I’ll get back to you about it.


#!/usr/bin/ruby -w

# Sample input; \t marks the tab between timecode and subtitle.
data = <<EOL
00:00:30:13\tSwayambhunath Temple: building started 460AD
00:00:42:21\tDurbar Square
00:01:05:06\tDriving to Trisuli River for Rafting
00:01:55:22\tDay 1 Trekking: Pokhara to Tirkhedhunga (1540m)
00:02:20:20\tDay 2 Trekking: Tirkhedhunga to Ghorephani (2750m)
00:02:33:19\tDay 3 Trekking: Ghorephani to Ghandruk (1940m)
00:02:42:04\tDay 4 Trekking: Ghandruk to Pothana (1900m)
00:03:10:13\tDay 5 Trekking: Pothana to Phedi (1130m)
EOL

output = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n"

# String#each was removed in Ruby 1.9, so iterate with each_line.
data.each_line do |line|
  timecode, subtitle = line.strip.split("\t")
  xml = "#{timecode}#{subtitle}"
  output += xml + "\n"
end

File.open("output.xml", "w") { |f| f.write output }


The data block at the top can easily be replaced with a file reader
line:

data = File.read(filename)

Peter S. wrote:

I just noticed this. There is no closing tag for the XML header tag. The
XML header tag is the only exception to a strict rule about tag formatting
in XML (that a tag is either <tag/> or has a <tag> … </tag>).

Hi Peter,

I saved your code and called it convert.rb. I ran it (replacing
‘filename’ with the path of my text file - was that right to do?)

I got this error:
convert.rb:1: unknown regexp options - atal

Any ideas?

Also, do you know if there is any way to run a script from the
command line, like this?
./convert.rb mytextfile.txt
I made a shell script that used this kind of thing - it took the input
file as something like $ARGV (I think - sorry, I’m a super newbie!)
Make sense?

Thanks Peter!

Adam

Adam T. wrote:

The text file looks pretty much like that

Then it should be fine - as long as there are no tabs in the second
column. Of course even that would not be an unsolvable problem, but it
would not work with the code I sent you.

It’ll take a while for your example to filter down into my brain - when
it does I’ll get back to you about it.

Sure!

Awesome!
Yeah, Ruby is awesome! I am a beginner, too (picked up Ruby a few months
ago) and though I have very limited time to learn it, I can do a lot of
things already. The learning curve is really steep.

Cheers,
Peter
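
The concern about tabs in the second column can be handled by giving split a limit, so that only the first tab separates the columns. This is a minimal sketch; split_row is a hypothetical helper name and the sample line is made up for illustration.

```ruby
# Split on the first tab only, so any further tabs stay in the subtitle.
def split_row(line)
  timecode, subtitle = line.chomp.split("\t", 2)
  [timecode, subtitle]
end

# Made-up sample line with an extra tab inside the subtitle text.
p split_row("00:00:42:21\tDurbar Square\tKathmandu\n")
```

A line with no tab at all comes back with nil as the subtitle, which a real script would probably want to check for.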


Adam T. wrote:

Hi Peter,

I saved your code and called it convert.rb. I ran it (replacing
‘filename’ with the path of my text file - was that right to do?)

i got this error:
convert.rb:1: unknown regexp options - atal

Any ideas?

I guess you are referring to Paul’s solution, since I did not use any
files :) In any case, could you paste the code (convert.rb) here so I
can check what’s going on?

Also, do you know if there is any way to run a script from the
command line, like this?
./convert.rb mytextfile.txt

Sure. The array called ARGV contains all the command-line arguments.

------ test.rb
#!/usr/bin/ruby
puts ARGV[0]
puts ARGV[1]

Running

./test.rb foo bar

will output

foo
bar

Cheers,
Peter
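
Putting the ARGV tip together with the file reading from earlier in the thread, a script along these lines could take the input file name from the command line. It is only a sketch: parse_rows is a hypothetical helper, and the "->" output is a stand-in for whatever XML the script would eventually write.

```ruby
#!/usr/bin/env ruby

# Split tab-delimited text into [timecode, subtitle] pairs.
def parse_rows(text)
  text.each_line.map { |line| line.chomp.split("\t", 2) }
end

path = ARGV[0] # first command-line argument, or nil if none was given

if path && File.file?(path)
  parse_rows(File.read(path)).each do |timecode, subtitle|
    puts "#{timecode} -> #{subtitle}"
  end
else
  # No usable file name: print a usage hint to standard error.
  warn "usage: ./convert.rb mytextfile.txt"
end
```

To run it as ./convert.rb mytextfile.txt the file needs to be executable (chmod +x convert.rb); otherwise ruby convert.rb mytextfile.txt does the same thing.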


Hi,

However it only outputs the first line from my txt file:

<?xml version="1.0" encoding="ISO-8859-1"?>

00:00:30:13Swayambhunath Temple:
building started 460AD
00:00:42:21

Hmm, strange. I have cut’n’pasted this code and the data from your
previous mail, and for me it works perfectly (as do all of Paul’s other
solutions). Are you sure your input txt file is OK?

Are you on a Mac? Maybe there is something going on with the line breaks?

Apologies for my newbieness!

No need to apologize. In no time, you will be answering others’
questions :)

Peter


Ah, thanks Peter - yes, on OS X - you are right, there is something funny
with the line breaks! Weird!

Now I just have to work out how to add all the FCP XML stuff in there.

I appreciate all your help & encouraging words!!
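
The symptom in this thread (one timecode followed by the first subtitle and the next timecode) is what you would expect if the file uses bare carriage returns as line endings, as some older Mac applications write them: Ruby then sees the whole file as a single line. That is an assumption about Adam's file, not something confirmed here. A sketch that copes with all three common line endings, using a hypothetical split_lines helper:

```ruby
# Split text into lines whether they end in "\r\n" (Windows),
# "\n" (Unix) or a bare "\r" (classic Mac OS).
def split_lines(text)
  text.split(/\r\n|\r|\n/)
end

# Two rows from the thread, joined with bare carriage returns.
mac_text = "00:00:30:13\tSwayambhunath Temple\r00:00:42:21\tDurbar Square\r"

split_lines(mac_text).each do |line|
  timecode, subtitle = line.split("\t", 2)
  puts "#{timecode} => #{subtitle}"
end
```

The "\r\n" alternative comes first in the pattern so that a Windows line ending is treated as one break rather than two.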

doh! Sorry guys!

Peter - thanks for the ARGV tips!

I think I have Paul’s script going using ARGV:

#!/usr/bin/ruby -w

# Read the file named by the first command-line argument.
data = File.read(ARGV[0])

output = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n"

# String#each was removed in Ruby 1.9, so iterate with each_line.
data.each_line do |line|
  timecode, subtitle = line.strip.split("\t")
  xml = "#{timecode}#{subtitle}"
  output += xml + "\n"
end

File.open("output.xml", "w") { |f| f.write output }

However it only outputs the first line from my txt file:

<?xml version="1.0" encoding="ISO-8859-1"?>
00:00:30:13Swayambhunath Temple:
building started 460AD
00:00:42:21

Apologies for my newbieness!

Cheers guys!

Adam