"A" and "an" articles in front of words

On Tue, Nov 08, 2011 at 05:23:31PM +0900, Gonçalo C. Justino wrote:

does different pronunciation come from the subsequent letters? I’m
thinking uMBrella, uNCle, uRGent, uNDer, uGLy, uPPer, uRGe but uNIcorn,
eULogy (or is this “an eulogy”? now I’m confused)… I’m wondering if two
consonants make it “an” and at least one vowel makes it “a”. Maybe I’m just
rambling, this sounds so un-rubyesque :S

You’re right about unicorn and eulogy. I’m now interested in checking how
strongly the second and third letters correlate with vowels that become
consonants in pronunciation.
I’m pretty sure there are exceptions to these perceived rules, though,
in any case.

It seems likely that, most often, you’d get the following results, where
V means “vowel” and C means “consonant”. Lower case letters are
literals. In each case, two adjacent vowels are assumed to be
different vowels.

uCC: treat as vowel
uCV: treat as consonant
VVC: treat as consonant
yC: treat as vowel
yV: treat as consonant
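For what it's worth, these rules translate fairly directly into Ruby. This is a minimal sketch, assuming single-word input and treating only a, e, i, o and u as vowels; the method names starts_with_vowel_sound? and article_by_rules are made up for illustration, and any word the rules don't cover falls back to its first letter.

```ruby
VOWELS = "aeiou"

# Guess whether +word+ starts with a vowel *sound*, using the
# uCC / uCV / VVC / yC / yV rules of thumb; anything they don't
# cover falls back to the first letter.
def starts_with_vowel_sound?(word)
  w = word.downcase
  vowel     = ->(ch) { !ch.nil? && VOWELS.include?(ch) }
  consonant = ->(ch) { !ch.nil? && ch.match?(/[a-z]/) && !VOWELS.include?(ch) }

  if w[0] == "u" && consonant.(w[1]) && consonant.(w[2])
    true                                   # uCC: umbrella, uncle
  elsif w[0] == "u" && consonant.(w[1]) && vowel.(w[2])
    false                                  # uCV: unicorn, use
  elsif vowel.(w[0]) && vowel.(w[1]) && w[0] != w[1] && consonant.(w[2])
    false                                  # VVC: eulogy
  elsif w[0] == "y" && consonant.(w[1])
    true                                   # yC: yttrium
  elsif w[0] == "y" && vowel.(w[1])
    false                                  # yV: yellow
  else
    vowel.(w[0])                           # fallback: first letter
  end
end

def article_by_rules(word)
  starts_with_vowel_sound?(word) ? "an" : "a"
end

# prints: an umbrella, a unicorn, a eulogy, a yellow, a dog (one per line)
%w[umbrella unicorn eulogy yellow dog].each do |w|
  puts "#{article_by_rules(w)} #{w}"
end
```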

These are only my immediate impressions so far. Assuming for argument’s
sake that they’re correct for the general case, though, there would
almost certainly be exceptions for every one of these correlations, and
the question that arises then is whether the exceptions are rare enough
to warrant using these correlations as rules with a set of exceptions
used to override them, or numerous enough for it to make more sense to
just use an extensive dictionary to handle such matters.

If I get really bored, I may put together a really extensive dictionary
to cover this, then use it to determine the strength of such
correlations some day (or week or month), but not today.
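Measuring the strength of such a correlation mostly comes down to counting agreements against a labeled word list. A minimal sketch, assuming a hash of word => correct article and any heuristic that maps a word to "a" or "an"; the helper name agreement is hypothetical, and the six-word sample is hand-labeled for illustration, not a real corpus.

```ruby
# Fraction of labeled words on which +heuristic+ picks the labeled article.
def agreement(labeled, heuristic)
  hits = labeled.count { |word, article| heuristic.call(word) == article }
  hits.fdiv(labeled.size)
end

# Illustrative, hand-labeled sample (not a real corpus).
sample = {
  "umbrella" => "an", "uncle" => "an", "unicorn" => "a",
  "eulogy"   => "a",  "hour"  => "an", "dog"     => "a",
}

# Naive first-letter heuristic, used here only as something to measure.
first_letter = ->(word) { word.match?(/\A[aeiou]/i) ? "an" : "a" }

puts agreement(sample, first_letter)                       # prints 0.5
p sample.reject { |w, art| first_letter.(w) == art }.keys  # prints ["unicorn", "eulogy", "hour"]
```

The mismatch list is the interesting part: it is exactly the exception list you would have to maintain for that heuristic.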

Google hasn’t helped: does anyone have or know of a “complete” list of
English words?

On Nov 6, 2011, at 1:50 PM, Ryan D. wrote:

irb(main):022:0> article_for 'dog'
=> "a dog"
irb(main):023:0> article_for 'animal'
=> "an animal"
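The article_for method behind this irb session isn't shown in the thread. A minimal guess that reproduces the output above, assuming nothing more than a first-letter regex, might look like this (a hypothetical reconstruction, not Ryan's code):

```ruby
# Hypothetical reconstruction: "an" when the word's first letter is a
# vowel, "a" otherwise. Spelling only, no exception handling.
def article_for(word)
  word =~ /\A[aeiou]/i ? "an #{word}" : "a #{word}"
end

article_for "dog"      # => "a dog"
article_for "animal"   # => "an animal"
article_for "unicorn"  # => "an unicorn"  (wrong: sound, not spelling, decides)
```

The last line is the unicorn problem raised at the top of the thread.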

A regex is not that big of a hammer, and doing this in one method dispatch
instead of two has direct performance benefits (if that matters).

Yeah, but a lot of the performance improvement is because regex is a
first-class language feature, and thus implemented in C (try match()
versus =~ to measure method dispatch overhead). That said, eliminating a
method dispatch might not make as much difference when you start
fleshing out the functionality with things like exception lists (like
the Rails inflector) to handle cases like “unicorn” versus “uncle”.
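To try the match() versus =~ comparison yourself, something like the following times both, plus match? (available since Ruby 2.4), which does not set $~. It's a rough sketch: the word list and repetition count are arbitrary, and the numbers depend on Ruby version and machine, so no expected output is given.

```ruby
PATTERN = /\A[aeiou]/i
WORDS   = %w[dog animal unicorn uncle eulogy hour] * 50_000

# Run the block once and return elapsed wall-clock seconds (monotonic clock).
def elapsed
  t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0
end

timings = {
  "=~"     => elapsed { WORDS.each { |w| w =~ PATTERN } },
  "match"  => elapsed { WORDS.each { |w| w.match(PATTERN) } },
  "match?" => elapsed { WORDS.each { |w| w.match?(PATTERN) } },
}

timings.each { |name, secs| puts format("%-7s %.3fs", name, secs) }
```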

A complete dictionary shouldn’t be necessary, just exceptions. Look at
how Rails handles pluralization. You can use the algorithm:

  • if the word starts with a consonant, use “a”
  • if the word matches an entry in the exception list, use the designated article
  • else use “an” if it starts with a for-certain vowel ['a', 'e', 'i', 'o', 'u']

This way, you only do a lookup for words starting with possible vowels
['a', 'e', 'i', 'o', 'u', 'y', 'h'].

You might even extend the consonant searching algorithm to use some
heuristics as suggested in the email below:

  'a' if word =~ /^[^aeiouyh]/ || word =~ /^u[^aeiou][aeiou]/ || word =~ /^y[aeiou]/

The problem is that the choice between ‘a’ and ‘an’ has to do with the
way the word sounds in a given variety of English (e.g., American or British).
It is unlikely you will capture all the cases with a dictionary, hence the
suggestion that the algorithm use a set of commonly encountered
exceptions, accepting the fact that it will be incomplete and sometimes
a bit embarrassing, but no more so than the pronunciation of words by
my nav system’s text-to-speech :)

On Thu, Nov 10, 2011 at 02:36:13AM +0900, steve ross wrote:

  • else use “an” if it’s a for-certain vowel ['a', 'e', 'i', 'o', 'u']

This way, you only do a lookup for words starting with possible vowels
['a', 'e', 'i', 'o', 'u', 'y', 'h']

These two statements are contradictory.

Right. I got the order backwards.

  • Use “an” if it’s not a “disputable” vowel %w(u y h)
  • else do a lookup

Better?

On Thu, Nov 10, 2011 at 05:55:57AM +0900, steve ross wrote:

Right. I got the order backwards.

  • Use “an” if it’s not a “disputable” vowel %w(u y h)
  • else do a lookup

Better?

Yes, apart from the fact that “e”, at least, can fall into a consonant
niche under certain circumstances (as in “ewe” or “European”).
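Putting the corrected order together with the “e” caveat from the reply above, a minimal sketch might look like this. The exception table is a hypothetical, deliberately tiny stand-in for something like the Rails inflector's irregulars (it does not use Rails), and DISPUTABLE includes "e" on the strength of that reply. Disputable words missing from the table fall back to a guess from their first letter.

```ruby
# Hypothetical exception table; a real one would need many more entries.
ARTICLE_EXCEPTIONS = {
  "hour" => "an", "honest" => "an", "heir" => "an",
  "unicorn" => "a", "university" => "a", "use" => "a",
  "ewe" => "a", "european" => "a",
}.freeze

# First letters whose sound can't be trusted from spelling alone.
DISPUTABLE = %w[e u y h].freeze

def article_for(word)
  w = word.downcase
  first = w[0].to_s
  article =
    if DISPUTABLE.include?(first)
      # Only disputable starts pay for a lookup; unknown words get a guess.
      ARTICLE_EXCEPTIONS.fetch(w) { %w[e u].include?(first) ? "an" : "a" }
    elsif %w[a i o].include?(first)
      "an" # undisputed vowel, no lookup needed
    else
      "a"  # consonant
    end
  "#{article} #{word}"
end

%w[hour house unicorn umbrella ewe egg].each { |w| puts article_for(w) }
```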

What you want is the CMU Pronouncing Dictionary. You needn’t include
the whole dictionary and read it in real time; just do a
preprocessing run to find words whose spelling starts with a vowel but
whose pronunciation starts with a consonant, and vice versa.
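The preprocessing run can be sketched in Ruby too. This assumes the classic CMUdict text layout: ";;;" comment lines, then a word, an optional "(1)"-style marker for alternate pronunciations, and ARPAbet phones whose vowels carry a stress digit 0, 1 or 2. The parser splits on whitespace and downcases, but check the release you actually download. article_exceptions is a hypothetical name, and the sample lines are illustrative, not copied from the dictionary.

```ruby
# ARPAbet vowel phonemes, without the stress digit.
ARPABET_VOWELS = %w[AA AE AH AO AW AY EH ER EY IH IY OW OY UH UW].freeze

# Return { "word" => "a" | "an" } for words whose first letter and first
# sound disagree (vowel letter + consonant sound, or the reverse).
def article_exceptions(lines)
  exceptions = {}
  lines.each do |line|
    next if line.start_with?(";;;") || line.strip.empty?
    word, first_phone = line.split
    next if first_phone.nil?
    word = word.downcase.sub(/\(\d+\)\z/, "")  # drop "(1)"-style markers
    next unless word.match?(/\A[a-z]/)         # skip punctuation entries
    vowel_sound  = ARPABET_VOWELS.include?(first_phone.delete("012"))
    vowel_letter = word.match?(/\A[aeiou]/)
    next if vowel_sound == vowel_letter
    exceptions[word] ||= vowel_sound ? "an" : "a"  # first pronunciation wins
  end
  exceptions
end

sample = <<~CMU.lines
  ;;; illustrative lines in CMUdict layout, not copied from the real file
  DOG  D AO1 G
  HOUR  AW1 ER0
  UMBRELLA  AH0 M B R EH1 L AH0
  UNICORN  Y UW1 N AH0 K AO2 R N
  EULOGY  Y UW1 L AH0 JH IY0
CMU

p article_exceptions(sample)
```

With the real file you would feed in its lines (e.g. via File.foreach) once, save the resulting hash, and have the article code load only that.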

martin

2011/11/9 Gonçalo C. Justino [email protected]:

-----Original message-----
From: Martin DeMello [mailto:[email protected]]
Sent: Monday, 14 November 2011 02:44
To: ruby-talk ML
Subject: Re: “A” and “an” articles in front of words

What you want is the CMU Pronouncing Dictionary. You needn’t include the
whole dictionary and read it in real time; just do a preprocessing run
to find words whose spelling starts with a vowel but whose pronunciation
starts with a consonant, and vice versa.

martin

2011/11/9 Gonçalo C. Justino [email protected]:

wondering if two consonants make it “an” and at least one vowel
makes it “a”. Maybe I’m just rambling, this sounds so un-rubyesque :S

You’re right about unicorn and eulogy. I’m interested in checking
out the correlation between second-and-third letters and vowels that
become consonants in pronunciation now, to see how strong a correlation
that is.
VVC: treat as consonant
yC: treat as vowel
yV: treat as consonant

These are only my immediate impressions, so far. Assuming for
argument’s sake that they’re correct for the general case, though,
there would almost certainly be exceptions for every one of these
correlations, and the question that arises then is whether the
exceptions are rare enough to warrant using these correlations as
rules with a set of exceptions used to override them, or numerous
enough for it to make more sense to just use an extensive dictionary to
handle such matters.

If I get really bored, I may put together a really extensive
dictionary to cover this, then use it to determine the strength of
such correlations some day (or week or month), but not today.


Chad P. [ original content licensed OWL: http://owl.apotheon.org ]

