How I get options data for free

An introduction to web scraping for finance

Have you ever wished you could get access to historical options data, but were blocked by a paywall? What if you just want it for research, for fun, or for developing a personal trading strategy?

In this tutorial, you'll learn how to use Python and BeautifulSoup to scrape financial data from the web and build your own dataset.

Getting started

You should have at least a working knowledge of Python and web technologies before beginning this tutorial. To build these up, I strongly recommend checking out a site like codecademy to learn new skills or brush up on old ones.

First, let's spin up your favorite IDE. I normally use PyCharm, but for a quick script like this, Repl.it will do the job too. Add a quick print("Hello world") to make sure your environment is set up correctly.

Now we need to figure out a data source.

Unfortunately, Cboe's awesome options chain data is fairly locked down, even for current delayed quotes. Luckily, Yahoo Finance has solid enough options data here. We'll use it for this tutorial, as web scrapers often need some awareness of a page's content, but it's easily adaptable to any data source you want.

Dependencies

We don't need many external dependencies. We just need the Requests and BeautifulSoup modules in Python. Add these to the top of your program:

from bs4 import BeautifulSoup
import requests
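If you don't already have these modules, both are available on PyPI (as requests and beautifulsoup4) and can be installed with pip.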

Create a main method:

def main():
    print("Hello World!")


if __name__ == "__main__":
    main()

Scraping HTML

Now you're ready to start scraping! Inside main(), add these lines to fetch the page's full HTML:

data_url = "https://finance.yahoo.com/quote/SPY/options"
data_html = requests.get(data_url).content
print(data_html)

This fetches the page's full HTML content, so we can find the data we want within it. Feel free to run it and observe the output.

Feel free to comment out print statements as you go; they're just there to help you understand what the program is doing at any given step.

BeautifulSoup is the perfect tool for working with HTML data in Python. Let's narrow the HTML down to just the pricing tables so we can better understand it:

content = BeautifulSoup(data_html, "html.parser")
# print(content)

options_tables = content.find_all("table")
print(options_tables)

That's still quite a lot of HTML; we can't make much of it, and Yahoo's code isn't the friendliest to web scrapers. Let's break it down into two tables, for calls and puts:

options_tables = []
tables = content.find_all("table")
for i in range(0, len(tables)):
    options_tables.append(tables[i])

print(options_tables)

Yahoo's data contains options that are fairly deep in and out of the money, which can be great for certain purposes. I'm only interested in near-the-money options, namely the two calls and two puts closest to the current price.
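For example, in the sample output later in this guide, SPY is trading between the 289 and 290 strikes: the 289 call is in the money and the 290 call is out of the money, with the reverse true for the puts.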

Let's find these using BeautifulSoup and Yahoo's different table entries for in-the-money and out-of-the-money options:

calls = options_tables[0].find_all("tr")[1:]  # first row is header

itm_calls = []
otm_calls = []

for call_option in calls:
    if "in-the-money" in str(call_option):
        itm_calls.append(call_option)
    else:
        otm_calls.append(call_option)

itm_call = itm_calls[-1]
otm_call = otm_calls[0]

print(str(itm_call) + " \n\n " + str(otm_call))

Now we have the HTML table entries for the two options closest to the money. Let's scrape the price data, volume, and implied volatility from the first call option:

itm_call_data = []
for td in BeautifulSoup(str(itm_call), "html.parser").find_all("td"):
    itm_call_data.append(td.text)

print(itm_call_data)

# cell indices follow Yahoo's column order at the time of writing: 0 contract name,
# 1 last trade date, 2 strike, 3 last price, 4 bid, 5 ask, 8 volume, 10 implied volatility
itm_call_info = {'contract': itm_call_data[0],
                 'last_trade': itm_call_data[1][:10],
                 'strike': itm_call_data[2],
                 'last': itm_call_data[3],
                 'bid': itm_call_data[4],
                 'ask': itm_call_data[5],
                 'volume': itm_call_data[8],
                 'iv': itm_call_data[10]}

print(itm_call_info)

Adapt this code for the next call option:

# otm call
otm_call_data = []
for td in BeautifulSoup(str(otm_call), "html.parser").find_all("td"):
    otm_call_data.append(td.text)

# print(otm_call_data)

otm_call_info = {'contract': otm_call_data[0],
                 'last_trade': otm_call_data[1][:10],
                 'strike': otm_call_data[2],
                 'last': otm_call_data[3],
                 'bid': otm_call_data[4],
                 'ask': otm_call_data[5],
                 'volume': otm_call_data[8],
                 'iv': otm_call_data[10]}

print(otm_call_info)

Give your program a run!

You now have dictionaries of the two call options nearest the money. All that's left is to scrape the put options table for the same data:

puts = options_tables[1].find_all("tr")[1:]  # first row is header

itm_puts = []
otm_puts = []

for put_option in puts:
    if "in-the-money" in str(put_option):
        itm_puts.append(put_option)
    else:
        otm_puts.append(put_option)

itm_put = itm_puts[0]
otm_put = otm_puts[-1]

# print(str(itm_put) + " \n\n " + str(otm_put) + "\n\n")

itm_put_data = []
for td in BeautifulSoup(str(itm_put), "html.parser").find_all("td"):
    itm_put_data.append(td.text)

# print(itm_put_data)

itm_put_info = {'contract': itm_put_data[0],
                'last_trade': itm_put_data[1][:10],
                'strike': itm_put_data[2],
                'last': itm_put_data[3],
                'bid': itm_put_data[4],
                'ask': itm_put_data[5],
                'volume': itm_put_data[8],
                'iv': itm_put_data[10]}

# print(itm_put_info)

# otm put
otm_put_data = []
for td in BeautifulSoup(str(otm_put), "html.parser").find_all("td"):
    otm_put_data.append(td.text)

# print(otm_put_data)

otm_put_info = {'contract': otm_put_data[0],
                'last_trade': otm_put_data[1][:10],
                'strike': otm_put_data[2],
                'last': otm_put_data[3],
                'bid': otm_put_data[4],
                'ask': otm_put_data[5],
                'volume': otm_put_data[8],
                'iv': otm_put_data[10]}

Congratulations! You've just scraped data for all the near-the-money options of the S&P 500 ETF, and you can view them like so:

 print("\n\n") print(itm_call_info) print(otm_call_info) print(itm_put_info) print(otm_put_info)

Give your program a run; you should get data like this printed to the console:

{'contract': 'SPY190417C00289000', 'last_trade': '2019-04-15', 'strike': '289.00', 'last': '1.46', 'bid': '1.48', 'ask': '1.50', 'volume': '4,646', 'iv': '8.94%'}
{'contract': 'SPY190417C00290000', 'last_trade': '2019-04-15', 'strike': '290.00', 'last': '0.80', 'bid': '0.82', 'ask': '0.83', 'volume': '38,491', 'iv': '8.06%'}
{'contract': 'SPY190417P00290000', 'last_trade': '2019-04-15', 'strike': '290.00', 'last': '0.77', 'bid': '0.75', 'ask': '0.78', 'volume': '11,310', 'iv': '7.30%'}
{'contract': 'SPY190417P00289000', 'last_trade': '2019-04-15', 'strike': '289.00', 'last': '0.41', 'bid': '0.40', 'ask': '0.42', 'volume': '44,319', 'iv': '7.79%'}

Setting up recurring data collection

Yahoo, by default, only returns the options for the date you specify. It's this part of the URL: https://finance.yahoo.com/quote/SPY/options?date=1555459200

This is a Unix timestamp, so we’ll need to generate or scrape one, rather than hardcoding it in our program.
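If you're curious, you can verify what that value encodes with a couple of lines of Python (the timestamp below is the one from the URL above; it decodes to midnight UTC on the expiration date):

import datetime

# decode the date parameter from the URL above
print(datetime.datetime.fromtimestamp(1555459200, tz=datetime.timezone.utc))
# 2019-04-17 00:00:00+00:00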

Add some dependencies:

import datetime, time

Let’s write a quick script to generate and verify a Unix timestamp for our next set of options:

def get_datestamp():
    options_url = "https://finance.yahoo.com/quote/SPY/options?date="
    today = int(time.time())
    # print(today)
    date = datetime.datetime.fromtimestamp(today)
    yy = date.year
    mm = date.month
    dd = date.day

The above code holds the base URL of the page we are scraping and breaks today's date into year, month, and day components for us to use in the next step.

Let’s increment this date by one day, so we don’t get options that have already expired.

    options_day = datetime.date(yy, mm, dd) + datetime.timedelta(days=1)  # timedelta rolls safely over month and year boundaries

Now, we need to convert it back into a Unix timestamp and make sure it’s a valid date for options contracts:

    datestamp = int(time.mktime(options_day.timetuple()))
    # print(datestamp)
    # print(datetime.datetime.fromtimestamp(datestamp))
    # vet timestamp, then return if valid
    for i in range(0, 7):
        test_req = requests.get(options_url + str(datestamp)).content
        content = BeautifulSoup(test_req, "html.parser")
        # print(content)
        tables = content.find_all("table")

        if tables != []:
            # print(datestamp)
            return str(datestamp)
        else:
            # print("Bad datestamp!")
            options_day += datetime.timedelta(days=1)
            datestamp = int(time.mktime(options_day.timetuple()))

    return str(-1)
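Note that if no options table shows up within a week of today, the method falls through and returns "-1" as a sentinel value, so a caller could check for that before building a URL with it.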

Let’s adapt our fetch_options method to use a dynamic timestamp to fetch options data, rather than whatever Yahoo wants to give us as the default.

Change this line:

data_url = "https://finance.yahoo.com/quote/SPY/options"

To this:

datestamp = get_datestamp()
data_url = "https://finance.yahoo.com/quote/SPY/options?date=" + datestamp

Congratulations! You just scraped real-world options data from the web.

Now we need to do some simple file I/O and set up a timer to record this data each day after market close.

Improving the program

Rename main() to fetch_options() and add these lines to the bottom:

options_list = {'calls': {'itm': itm_call_info, 'otm': otm_call_info},
                'puts': {'itm': itm_put_info, 'otm': otm_put_info},
                'date': datetime.date.fromtimestamp(time.time()).strftime("%Y-%m-%d")}

return options_list

Create a new method called schedule(). We’ll use this to control when we scrape for options, every twenty-four hours after market close. Add this code to schedule our first job at the next market close:

from apscheduler.schedulers.background import BackgroundScheduler

scheduler = BackgroundScheduler()


def schedule():
    scheduler.add_job(func=run, trigger="date", run_date=datetime.datetime.now())
    scheduler.start()
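Note that APScheduler is a third-party package; if you don't have it yet, it's available on PyPI as APScheduler.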

In your if __name__ == "__main__": statement, delete main() and add a call to schedule() to set up your first scheduled job.
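One caveat worth knowing: BackgroundScheduler runs jobs on a background thread, so if the main thread exits right after schedule() returns, no job will ever fire. A minimal way to keep the process alive (my sketch, not part of the original program) looks like this:

if __name__ == "__main__":
    schedule()

    # keep the main thread alive so the background scheduler can fire
    while True:
        time.sleep(1)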

Create another method called run(). This is where we’ll handle the bulk of our operations, including actually saving the market data. Add this to the body of run():

today = int(time.time())
date = datetime.datetime.fromtimestamp(today)
yy = date.year
mm = date.month
dd = date.day

# must use 12:30 for Unix time instead of 4:30 NY time
next_close = datetime.datetime(yy, mm, dd, 12, 30)

# do operations here
""" This is where we'll write our last bit of code. """

# schedule next job
scheduler.add_job(func=run, trigger="date", run_date=next_close)

print("Job scheduled! | " + str(next_close))
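A note on that hardcoded 12:30: it only matches the 4:30 PM New York close if the machine running the script happens to sit in the right timezone. If you'd rather make the close time explicit, here's a sketch using the standard library's zoneinfo module (an assumption on my part, and it requires Python 3.9 or later):

from zoneinfo import ZoneInfo

# build 4:30 PM New York time, then convert to naive local time for the scheduler
ny_close = datetime.datetime(yy, mm, dd, 16, 30, tzinfo=ZoneInfo("America/New_York"))
next_close = ny_close.astimezone().replace(tzinfo=None)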

The call to scheduler.add_job lets our code call itself in the future, so we can just put it on a server and build up our options data each day. Add this code to actually fetch data under """ This is where we'll write our last bit of code. """

options = {}

# ensures option data doesn't break the program if internet is out
try:
    if next_close > datetime.datetime.now():
        print("Market is still open! Waiting until after close...")
    else:
        # ensures program was run after market hours
        if next_close < datetime.datetime.now():
            next_close += datetime.timedelta(days=1)  # rolls safely over month and year boundaries
        options = fetch_options()
        print(options)

        # write to file
        write_to_csv(options)
except:
    print("Check your connection and try again.")

Saving data

You may have noticed that write_to_csv isn’t implemented yet. No worries — let’s take care of that here:

def write_to_csv(options_data):
    import csv
    with open('options.csv', 'a', newline='\n') as csvfile:
        spamwriter = csv.writer(csvfile, delimiter=',')
        spamwriter.writerow([str(options_data)])
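Because each row is just the string representation of a dictionary, a downstream program can turn it back into data with ast.literal_eval. Here's a minimal reader sketch (my addition, assuming the file was written by write_to_csv above):

import ast
import csv

# parse each stored row back into a real dictionary
with open('options.csv', newline='') as csvfile:
    for row in csv.reader(csvfile):
        record = ast.literal_eval(row[0])
        print(record['date'], record['calls']['itm']['strike'])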

Cleaning up

As options contracts are time-sensitive, we might want to add a field for their expiration date. That date isn't included in the raw HTML we scrape.

Add this line of code to save and format the expiration date towards the top of fetch_options():

expiration = datetime.datetime.fromtimestamp(int(get_datestamp())).strftime("%Y-%m-%d")

Add 'expiration': expiration to the end of each option_info dictionary like so:

itm_call_info = {'contract': itm_call_data[0],
                 'last_trade': itm_call_data[1][:10],
                 'strike': itm_call_data[2],
                 'last': itm_call_data[3],
                 'bid': itm_call_data[4],
                 'ask': itm_call_data[5],
                 'volume': itm_call_data[8],
                 'iv': itm_call_data[10],
                 'expiration': expiration}

Give your new program a run! It will scrape the latest options data and write it to a .csv file as the string representation of a dictionary. The .csv file will then be primed to be parsed by a backtesting program or served to users through a web app. Congratulations!