Automating Fixture Information Retrieval: A Python Solution
During my tenure in the support team at Pitchero, we frequently received requests for fixture information from various leagues across the UK, with football being the predominant sport.
A substantial number of these requests concerned fixture details held in the FA Full-Time system. Notably, both during my time in support and at the time of writing, Full-Time remained a closed system with no API access. Consequently, the only way to obtain the information was to extract it manually from the website, copying and pasting it into a spreadsheet. This process was not only time-consuming but also prone to human error.
Motivated by the need to streamline this labor-intensive process, I took the initiative to create a tool that could automate the retrieval of information from the webpage. Leveraging Python and the Selenium library, the tool was designed to visit the webpage autonomously and extract the necessary information.
This solution proved effective, significantly reducing the time spent on the task and enabling us to promptly provide accurate information to the leagues. In essence, the automation tool alleviated the challenges posed by the closed nature of the FA Full-Time system, enhancing efficiency and accuracy in handling fixture information requests.
Code examples
The following code is the function used to scrape the data from the webpage. The full program isn't included here, as it is quite long and not all of it is relevant to this article.
The main function for obtaining the data
import re
import sys

def scrapeFixturesData(listingPgSrc):
    # dataCaptureFl (the CSV output file) and logFl (the log file) are
    # file handles opened elsewhere in the script.
    try:
        # Isolate the fixtures grid table from the page source.
        vehicEle = re.search(
            r'class="table-tab">Fixtures Grid</a>([\w\W]*?)</table>', listingPgSrc)
        if vehicEle is not None:
            allRows = vehicEle.group(1)
            allRowsData = re.findall(r'(<tr[\w\W]*?</tr>)', allRows)
            for curRow in allRowsData:
                try:
                    date = ''
                    home = ''
                    away = ''
                    kickofftime = ''
                    competition = ''
                    # Capture the date/time, home team, away team and
                    # competition cells; the two unnamed <td> groups are
                    # cells we don't need.
                    curEle = re.search(
                        r'<tr[\w\W]*?>[\w\W]*?<td>[\w\W]*?</td>[\w\W]*?<td>([\w\W]*?)</td>[\w\W]*?<td>([\w\W]*?)</td>[\w\W]*?<td>([\w\W]*?)</td>[\w\W]*?<td>[\w\W]*?</td>[\w\W]*?<td>([\w\W]*?)</td>[\w\W]*?</tr', curRow)
                    if curEle is not None:
                        dt = curEle.group(1).strip()
                        date = dt.split(' ')[0]
                        home = curEle.group(2).strip()
                        away = curEle.group(3).strip()
                        competition = curEle.group(4).strip()
                        if ' ' in dt:
                            kickofftime = dt.split(' ')[1]
                        if date != '':
                            # Write the row out as quoted CSV.
                            dataCaptureFl.write("\"" + date + "\",\"" + home + "\",\"" +
                                                away + "\",\"" + kickofftime + "\",\"" + competition + "\"\n")
                            dataCaptureFl.flush()
                            print(date + " | " + home + " | " + away +
                                  " | " + kickofftime + " | " + competition)
                except Exception as e:
                    exc_type, exc_obj, exc_tb = sys.exc_info()
                    print("Error Handler2: " + str(e) +
                          "\nLine info: " + str(exc_tb.tb_lineno))
                    logFl.write("\nError Handler2: " + str(e) +
                                "\nLine info: " + str(exc_tb.tb_lineno))
                    logFl.flush()
        dataCaptureFl.flush()
        logFl.flush()
    except Exception as e:
        exc_type, exc_obj, exc_tb = sys.exc_info()
        print("Error Handler on: scrapeFixturesData() " + str(e))
        print("Line info: " + str(exc_tb.tb_lineno))
        logFl.write("Error Handler on: scrapeFixturesData() " + str(e) + "\n")
        logFl.write("Line info: " + str(exc_tb.tb_lineno) + "\n")
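To show how the row-matching pattern behaves, here is a minimal, self-contained sketch run against a hand-made HTML fragment. The fragment is illustrative only (it mimics the shape of a Full-Time fixtures row, not real output from the site), and the pattern is the same one used in scrapeFixturesData(), just split across lines for readability:

```python
import re

# Hypothetical fragment mimicking one row of the fixtures grid;
# illustrative only, not real output from FA Full-Time.
sample_row = (
    '<tr>'
    '<td>1</td>'
    '<td>01/09/23 14:00</td>'
    '<td>Home FC</td>'
    '<td>Away FC</td>'
    '<td>v</td>'
    '<td>Sunday League Division 1</td>'
    '</tr>'
)

# Skip a cell, capture date/time, home, away, skip a cell,
# then capture the competition.
row_pattern = (
    r'<tr[\w\W]*?>[\w\W]*?<td>[\w\W]*?</td>'
    r'[\w\W]*?<td>([\w\W]*?)</td>'
    r'[\w\W]*?<td>([\w\W]*?)</td>'
    r'[\w\W]*?<td>([\w\W]*?)</td>'
    r'[\w\W]*?<td>[\w\W]*?</td>'
    r'[\w\W]*?<td>([\w\W]*?)</td>[\w\W]*?</tr'
)

match = re.search(row_pattern, sample_row)
dt = match.group(1).strip()
parts = dt.split(' ')
date = parts[0]
kickofftime = parts[1] if len(parts) > 1 else ''
home = match.group(2).strip()
away = match.group(3).strip()
competition = match.group(4).strip()
print(date, kickofftime, home, away, competition)
```

For a fixed table layout like this, the regex approach was good enough, though an HTML parser such as BeautifulSoup would be more robust against markup changes.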
Handling the pagination of the data
# pg_src, driver, pageCnt and logFl are set up earlier in the script.
# The next-page arrow image only appears when more results are available.
if '/client/images/advpagination-next1.gif' in pg_src:
    nextPageAvailable = True
    while nextPageAvailable:
        try:
            print("Next page found")
            pageCnt += 1
            print("Current page count: " + str(pageCnt))
            logFl.write("Current page count: " + str(pageCnt) + "\n")
            nextpageEle = driver.find_element_by_name('navnextpage1')
            nextpageEle.click()
            time.sleep(10)  # give the site time to load the next page
            pg_src = driver.page_source
            scrapeFixturesData(pg_src)
            nextPageAvailable = '/client/images/advpagination-next1.gif' in pg_src
        except Exception as e:
            nextPageAvailable = False
            exc_type, exc_obj, exc_tb = sys.exc_info()
            print("Error Handler1: " + str(e) +
                  "\nLine info: " + str(exc_tb.tb_lineno))
            logFl.write("\nError Handler1: " + str(e) +
                        "\nLine info: " + str(exc_tb.tb_lineno))
            logFl.flush()
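Because the pagination loop depends on a live Selenium driver, it is awkward to test in isolation. One way around that (a refactoring of my own, not code from the original tool) is to pull the loop into a function that accepts any driver-like object, which lets a simple stub stand in for the browser:

```python
import time

NEXT_MARKER = '/client/images/advpagination-next1.gif'

def paginate(driver, scrape, delay=0):
    """Scrape the current page, then follow the 'next' arrow while the
    next-page image appears in the page source. Returns the number of
    pages scraped."""
    pg_src = driver.page_source
    scrape(pg_src)
    page_cnt = 1
    while NEXT_MARKER in pg_src:
        driver.find_element_by_name('navnextpage1').click()
        time.sleep(delay)  # give the site time to render the next page
        pg_src = driver.page_source
        scrape(pg_src)
        page_cnt += 1
    return page_cnt

# A stub standing in for a Selenium driver: each click advances to the
# next canned page source.
class FakeDriver:
    def __init__(self, pages):
        self.pages = pages
        self.idx = 0

    @property
    def page_source(self):
        return self.pages[self.idx]

    def find_element_by_name(self, name):
        return self  # the stub itself acts as the clickable element

    def click(self):
        self.idx += 1

pages = [NEXT_MARKER + ' page one', NEXT_MARKER + ' page two', 'page three']
seen = []
paginate(FakeDriver(pages), seen.append)
```

Note that `find_element_by_name` was removed in Selenium 4; with a current driver the equivalent call is `driver.find_element(By.NAME, 'navnextpage1')`.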