7 Pattern Matching with Regular Expressions


Finding Patterns of Text Without Regular Expressions



Say you want to find an American phone number in a string. You know the 
pattern if you’re American: three numbers, a hyphen, three numbers, a 
hyphen, and four numbers. Here’s an example: 415-555-4242.
Let’s use a function named isPhoneNumber() to check whether a string matches this pattern, returning either True or False. Open a new file editor tab and enter the following code; then save the file as isPhoneNumber.py:
def isPhoneNumber(text):
    if len(text) != 12:  # NNN-NNN-NNNN is exactly 12 characters
        return False
    for i in range(0, 3):  # area code: three digits
        if not text[i].isdecimal():
            return False
    if text[3] != '-':  # first hyphen
        return False
    for i in range(4, 7):  # next three digits
        if not text[i].isdecimal():
            return False
    if text[7] != '-':  # second hyphen
        return False
    for i in range(8, 12):  # last four digits
        if not text[i].isdecimal():
            return False
    return True  # every check passed

print('Is 415-555-4242 a phone number?')
print(isPhoneNumber('415-555-4242'))
print('Is Moshi moshi a phone number?')
print(isPhoneNumber('Moshi moshi'))
When this program is run, the output looks like this:
Is 415-555-4242 a phone number?
True
Is Moshi moshi a phone number?
False
The isPhoneNumber() function has code that does several checks to see whether the string in text is a valid phone number. If any of these checks fails, the function returns False. First the code checks that the string is exactly 12 characters. Then it checks that the area code (that is, the first three characters in text) consists of only numeric characters. The rest of the function checks that the string follows the pattern of a phone number: the number must have the first hyphen after the area code, three more numeric characters, then another hyphen, and finally four more numbers. If the program execution manages to get past all the checks, it returns True.
Calling isPhoneNumber() with the argument '415-555-4242' will return True. Calling isPhoneNumber() with 'Moshi moshi' will return False; the first test fails because 'Moshi moshi' is not 12 characters long.
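To see which check rejects a given string, the same tests can be restated so that they report a reason instead of a bare False. This is only a sketch: why_not_phone_number() is a hypothetical helper, not part of isPhoneNumber.py, and it walks the twelve positions left to right, so it fails at the same place the original function would.

```python
def why_not_phone_number(text):
    """Return None if text fits NNN-NNN-NNNN, else a short reason.

    Hypothetical helper for illustration; it applies the same checks
    as isPhoneNumber(), in the same left-to-right order.
    """
    if len(text) != 12:
        return 'length is %d, not 12' % len(text)
    for i in range(12):
        if i in (3, 7):
            # Positions 3 and 7 must hold the hyphens.
            if text[i] != '-':
                return 'expected a hyphen at index %d' % i
        elif not text[i].isdecimal():
            # Every other position must be a digit.
            return 'expected a digit at index %d' % i
    return None


for candidate in ['415-555-4242', 'Moshi moshi', '415 555 4242', '415-555-42x2']:
    print(candidate, '->', why_not_phone_number(candidate))
```

A result of None means every check passed, which corresponds to isPhoneNumber() returning True.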
If you wanted to find a phone number within a larger string, you would have to add even more code to find the phone number pattern. Replace the last four print() function calls in isPhoneNumber.py with the following:
message = 'Call me at 415-555-1011 tomorrow. 415-555-9999 is my office.'
for i in range(len(message)):
    chunk = message[i:i+12]  # the 12 characters starting at index i
    if isPhoneNumber(chunk):
        print('Phone number found: ' + chunk)
print('Done')
When this program is run, the output will look like this:
Phone number found: 415-555-1011
Phone number found: 415-555-9999
Done


On each iteration of the for loop, a new chunk of 12 characters from message is assigned to the variable chunk. For example, on the first iteration, i is 0, and chunk is assigned message[0:12] (that is, the string 'Call me at 4'). On the next iteration, i is 1, and chunk is assigned message[1:13] (the string 'all me at 41'). In other words, on each iteration of the for loop, chunk takes on the following values:

• 'Call me at 4'
• 'all me at 41'
• 'll me at 415'
• 'l me at 415-'
• . . . and so on.
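The chunk values listed above can be reproduced by slicing message directly. The short sketch below (the names first_chunks and last_chunk are only for illustration) also shows what happens near the end of the string: a slice that runs past the end is simply shorter than 12 characters, so the length check in isPhoneNumber() rejects it.

```python
message = 'Call me at 415-555-1011 tomorrow. 415-555-9999 is my office.'

# The first four 12-character chunks, for i = 0, 1, 2, 3.
first_chunks = [message[i:i+12] for i in range(4)]
for chunk in first_chunks:
    print(repr(chunk))

# On the final iteration the slice runs past the end of the string.
# Python does not raise an error; it returns the characters that exist.
last_index = len(message) - 1
last_chunk = message[last_index:last_index+12]
print(repr(last_chunk), len(last_chunk))
```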
You pass chunk to isPhoneNumber() to see whether it matches the phone number pattern, and if so, you print the chunk.
Continue to loop through message, and eventually the 12 characters in chunk will be a phone number. The loop goes through the entire string, testing each 12-character piece and printing any chunk it finds that satisfies isPhoneNumber(). Once we’re done going through message, we print Done.
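If the matches are needed later in the program rather than only printed, the same loop can collect them in a list. The following is a sketch under a few assumptions: find_phone_numbers() is a hypothetical name, isPhoneNumber() is restated compactly so the block runs on its own, and the loop stops 11 characters before the end because the trailing chunks are too short to match anyway.

```python
def isPhoneNumber(text):
    # Compact restatement of the chapter's checks for NNN-NNN-NNNN.
    if len(text) != 12:
        return False
    for i in range(12):
        if i in (3, 7):
            if text[i] != '-':
                return False
        elif not text[i].isdecimal():
            return False
    return True


def find_phone_numbers(message):
    """Return every 12-character chunk of message that passes isPhoneNumber()."""
    found = []
    # Only start positions with a full 12 characters remaining are tried;
    # for strings shorter than 12 the range is empty.
    for i in range(len(message) - 11):
        chunk = message[i:i+12]
        if isPhoneNumber(chunk):
            found.append(chunk)
    return found


message = 'Call me at 415-555-1011 tomorrow. 415-555-9999 is my office.'
numbers = find_phone_numbers(message)
print(numbers)
```

Returning a list keeps the search separate from the printing, so the caller decides what to do with the results.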
While the string in message is short in this example, it could be millions of characters long and the program would still run in less than a second. A similar program that finds phone numbers using regular expressions would also run in less than a second, but regular expressions make it quicker to write these programs.
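For comparison, here is a minimal sketch of the regular expression approach using Python’s standard re module. The pattern string is an assumption chosen to mirror the three-digits, hyphen, three-digits, hyphen, four-digits description above; it is not necessarily the exact pattern used elsewhere in this chapter.

```python
import re

message = 'Call me at 415-555-1011 tomorrow. 415-555-9999 is my office.'

# \d{3} means "three digit characters"; the hyphens match literally.
phone_pattern = re.compile(r'\d{3}-\d{3}-\d{4}')

# findall() returns all non-overlapping matches as a list of strings.
found = phone_pattern.findall(message)
for number in found:
    print('Phone number found: ' + number)
print('Done')
```

The whole pattern check fits in one line, in place of the length test, the three digit loops, and the two hyphen tests in isPhoneNumber().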
