a hyperlocal django project
PyGotham 2011
Paul Winkler
Warning, there is not much code in this talk.
An open source (GPL) platform for hyperlocal news. Version 1.0.1.
Our mandate: Make it easier to get running.
You really really want to:
Anything with a location and a date. e.g.:
http://demo.openblockproject.org
Top down from home page
Search:
Map
Location type list
"Stay up to date"
News type list
Widgets
Let's make one for NYC from scratch.
Well, almost.
http://openblockproject.org/docs/install
dropdb openblock_nycblock; createdb -T template_postgis openblock_nycblock
django-admin.py syncdb --migrate
django-admin.py process_tasks &
django-admin.py runserver
from ebpub.settings_default import *

SHORT_NAME = 'new york'
DEFAULT_MAP_CENTER_LON = -73.949776
DEFAULT_MAP_CENTER_LAT = 40.741014
DEFAULT_MAP_ZOOM = 10

METRO_LIST = ({
    'extent': (-74.259567, 40.493959, -73.766384, 40.888601),
    'multiple_cities': True,
    'city_name': 'New York',
    # The SHORT_NAME in the settings file.
    'short_name': SHORT_NAME,
    'metro_name': 'New York',
    'state': 'NY',
    'state_name': 'New York',
    'time_zone': TIME_ZONE,
    'city_location_type': 'boroughs',
},)  # Trailing comma: without it this is a bare dict, not a tuple of dicts.
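A swapped longitude and latitude is an easy mistake in these settings. The sketch below is not part of OpenBlock: `center_in_extent` is a hypothetical helper, and it assumes the extent tuple is ordered (min lon, min lat, max lon, max lat), which is consistent with the values above.

```python
# Hypothetical helper, not part of OpenBlock: checks that the default map
# center lies inside the metro extent.
# Assumption: extent order is (min_lon, min_lat, max_lon, max_lat).

def center_in_extent(lon, lat, extent):
    min_lon, min_lat, max_lon, max_lat = extent
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat

NYC_EXTENT = (-74.259567, 40.493959, -73.766384, 40.888601)

# Values copied from the settings above.
ok = center_in_extent(-73.949776, 40.741014, NYC_EXTENT)

# The same numbers with longitude and latitude accidentally swapped.
swapped = center_in_extent(40.741014, -73.949776, NYC_EXTENT)

print(ok, swapped)
```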
http://localhost:8000/admin/db/locationtype/add/
http://localhost:8000/admin/db/location/upload-shapefile/
We do these first because we can then leverage them when loading blocks data, so blocks know what borough they are in.
Create a Boroughs location type. Then upload http://www.nyc.gov/html/dcp/download/bytes/nybb_10cav.zip
http://localhost:8000/admin/db/location/import-zip-shapefiles/
Paste in these:
10001 10002 10003 10005 10006 10007 10008 10009 10010 10012 10013 10014 10016 10017 10018 10019 10020 10021 10022 10023 10024 10025 10027 10028 10029 10030 10031 10032 10033 10034 10035 10036 10037 10038 10039 10040 10041 10055
bug: zip code import should replace existing, not barf
http://localhost:8000/admin/db/locationtype/add/
http://localhost:8000/admin/db/location/upload-shapefile/
Create a Neighborhood location type.
Be sure to select it before block import!
Zillow: ugh. It works, but it's pretty bad: things are missing, or in vastly wrong places.
Just Manhattan for now. This one takes ~ 8 minutes.
http://localhost:8000/admin/streets/block/import-blocks/
BASEURL is http://tigerline.census.gov/geo/tiger/TIGER2009/36_NEW_YORK/
Manhattan:
http://tigerline.census.gov/geo/tiger/TIGER2009/36_NEW_YORK/36061_New_York_County/tl_2009_36061_featnames.zip
http://tigerline.census.gov/geo/tiger/TIGER2009/36_NEW_YORK/36061_New_York_County/tl_2009_36061_faces.zip
http://tigerline.census.gov/geo/tiger/TIGER2009/36_NEW_YORK/36061_New_York_County/tl_2009_36061_edges.zip
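The three Manhattan URLs share one pattern, so they can be generated rather than pasted. This is a minimal sketch: the URL layout is copied from the links above, while `tiger_urls` and its arguments are names made up for illustration. Other counties would presumably follow the same layout with their own FIPS code and directory name, but check the TIGER site rather than trusting this.

```python
# Sketch: build the three TIGER/Line 2009 download URLs for one county.
# The layout is taken from the Manhattan links above; the function name
# and its arguments are illustrative only.

BASEURL = "http://tigerline.census.gov/geo/tiger/TIGER2009/36_NEW_YORK/"

def tiger_urls(fips, county_dir):
    """Return the featnames, faces and edges URLs for one county.

    fips: 5-digit state+county FIPS code, e.g. "36061".
    county_dir: directory name without the FIPS prefix,
        e.g. "New_York_County".
    """
    folder = "%s%s_%s/" % (BASEURL, fips, county_dir)
    return [
        "%stl_2009_%s_%s.zip" % (folder, fips, kind)
        for kind in ("featnames", "faces", "edges")
    ]

manhattan = tiger_urls("36061", "New_York_County")
for url in manhattan:
    print(url)
```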
Let's drop in some ready-made stuff: Meetups, Flickr photos, Open311 / SeeClickFix issues.
These are data sources that I know serve NYC, and that openblock has generic scraper scripts for.
First load a fixture that configures a news type to store each of these.
Blocks MUST be loaded before we can scrape.
Then we can scrape them
django-admin.py loaddata \
    ebdata/ebdata/scrapers/general/flickr/photos_schema.json
python ebdata/ebdata/scrapers/general/flickr/flickr_retrieval.py
update_aggregates
Uses Open311 API:
django-admin.py loaddata \
    ebdata/ebdata/scrapers/general/open311/open311_service_requests_schema.json
python ebdata/ebdata/scrapers/general/open311/georeportv2.py \
    --days-prior=10 \
    --html-url-template=http://seeclickfix.com/issues/{id} \
    http://seeclickfix.com/new-york/open311/v2
update_aggregates
There are lots of these:
django-admin.py loaddata \
    ebdata/ebdata/scrapers/general/meetup/meetup_schema.json
python ebdata/ebdata/scrapers/general/meetup/meetup_retrieval.py
update_aggregates
curl "http://localhost:8000/api/dev1/items.json?\
locationid=neighborhoods/midtown&limit=2"
Result is GeoJSON:
{"type": "FeatureCollection",
 "features": [
  {"geometry": {"type": "Point",
                "coordinates": [-73.991821000000002, 40.768695999999998]},
   "type": "Feature",
   "properties": {
     "location_name": "Btw 10th and 11th Ave at 52nd and 54th, New York, NY, 10019",
     "venue_name": "DeWitt Clinton Park Dog Run",
     "start_time": "11:30:00-05:00",
     "title": "Hells Kitchen Pug Meetup",
     "group_name": "The New York City Pug Meetup Group",
     "url": "http://www.meetup.com/NYCPugs/events/30986161/",
     "venue_phone": "",
     "item_date": "2011-09-18",
     ...
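Since the result is plain GeoJSON, any JSON library can consume it. Here is a small sketch using only the standard library; the sample is a trimmed copy of the response above (coordinates rounded, most properties dropped), and `summarize` is a made-up helper name.

```python
import json

# Trimmed copy of the response shown above: one feature, a few properties,
# coordinates rounded to six decimals.
SAMPLE = """
{"type": "FeatureCollection",
 "features": [
  {"type": "Feature",
   "geometry": {"type": "Point",
                "coordinates": [-73.991821, 40.768696]},
   "properties": {
     "title": "Hells Kitchen Pug Meetup",
     "venue_name": "DeWitt Clinton Park Dog Run",
     "item_date": "2011-09-18"}}
 ]}
"""

def summarize(geojson_text):
    """Return (title, item_date, lon, lat) for each Point feature."""
    collection = json.loads(geojson_text)
    rows = []
    for feature in collection["features"]:
        geom = feature["geometry"]
        if geom["type"] != "Point":
            continue  # this sketch only handles points
        lon, lat = geom["coordinates"]  # GeoJSON order is [lon, lat]
        props = feature["properties"]
        rows.append((props["title"], props["item_date"], lon, lat))
    return rows

rows = summarize(SAMPLE)
print(rows)
```

Note the coordinate order: GeoJSON puts longitude first, which is the opposite of the "latlng" pairs many mapping APIs use.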
The good parts are things that are good about Django: simple, straightforward design.
A quick look at two of the scarier corners...
NewsItems have "semi-extensible" metadata. E.g. "restaurant inspections" could have a different set of metadata fields than "police reports."
Designed for simple and fast data retrieval
Tedious to configure (we hid it behind an admin UI facade)
Complicated implementation
don't say I didn't warn you
class NewsItem(models.Model):
    schema = models.ForeignKey(Schema)
    title = models.CharField(max_length=255)
    description = models.TextField()
    # Treat it like a dict.
    attributes = AttributesDescriptor()
    ...

class Schema(models.Model):
    """Describes a type of NewsItem. A NewsItem has exactly one Schema,
    which describes its Attributes, via associated SchemaFields."""
    ...
    slug = models.SlugField(max_length=32, unique=True)
    ...

class Attribute(models.Model):
    """Extended metadata for NewsItems."""
    news_item = models.OneToOneField(NewsItem, primary_key=True, unique=True)
    schema = models.ForeignKey(Schema)
    varchar01 = models.CharField(
        max_length=4096, blank=True, null=True)
    varchar02 = models.CharField(
        max_length=4096, blank=True, null=True)
    ...
    date01 = models.DateField(blank=True, null=True)
    date02 = models.DateField(blank=True, null=True)
    ...

class SchemaField(models.Model):
    """Describes the meaning of one Attribute field for one Schema type."""
    schema = models.ForeignKey(Schema)
    name = models.SlugField(max_length=32)
    real_name = models.CharField(
        max_length=10,
        help_text="Column name for Attributes."
                  " 'varchar01', 'varchar02', etc.")
    ...
item = NewsItem.objects.get(schema__slug='foo', ...)
item.attributes['bar'] = 'ouch'

# Equivalent to...
# (.get() rather than .filter(): we need one SchemaField, not a queryset.)
schemafield = SchemaField.objects.get(
    schema__slug='foo', name='bar')
attrs = Attribute.objects.get(news_item=item)
setattr(attrs, schemafield.real_name, 'ouch')
attrs.save()
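Stripped of Django, the trick is just a dict-like wrapper that translates a friendly name into a generic column name. The following is a plain-Python sketch of that idea only; `AttributeRow`, `AttributeDict` and the `'reported'` field are invented for illustration, and OpenBlock's real AttributesDescriptor does considerably more.

```python
# Plain-Python sketch of the name-to-column mapping, with no Django involved.
# All class and field names here are hypothetical.

class AttributeRow(object):
    """Stands in for one row of the wide Attribute table."""
    def __init__(self):
        self.varchar01 = None
        self.varchar02 = None
        self.date01 = None

class AttributeDict(object):
    """Dict-like view that translates friendly names to real columns."""
    def __init__(self, row, name_to_column):
        self._row = row
        self._columns = name_to_column  # e.g. {'bar': 'varchar01'}

    def __getitem__(self, name):
        return getattr(self._row, self._columns[name])

    def __setitem__(self, name, value):
        setattr(self._row, self._columns[name], value)

# Roughly what the SchemaFields of a schema 'foo' would encode:
# SchemaField.name -> SchemaField.real_name.
foo_fields = {'bar': 'varchar01', 'reported': 'date01'}

row = AttributeRow()
attributes = AttributeDict(row, foo_fields)
attributes['bar'] = 'ouch'

print(row.varchar01)      # the value landed in the generic column
print(attributes['bar'])  # and reads back under the friendly name
```

A name that the schema does not define raises a KeyError here, which is one reason the metadata is only "semi-extensible".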
There is scarier stuff: "Lookups"
We need from PostGIS: multipolygons, intersection & containment queries, a few triggers.
Mongo has only a few geometry types, not multipolygon
Geocouch has all geometry types, but only bbox queries.
Any other options?
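To see why bounding-box-only queries fall short, picture an L-shaped neighborhood: a point can sit inside the shape's bounding box yet outside the shape itself. The sketch below uses a textbook ray-casting test in plain Python on made-up coordinates. It is an illustration of the difference, not a description of how PostGIS or any of these stores implements containment.

```python
# Illustration only: bounding-box test versus true polygon containment.
# The L-shaped ring and the test points are invented for this example.

L_SHAPE = [(0.0, 0.0), (4.0, 0.0), (4.0, 1.0),
           (1.0, 1.0), (1.0, 4.0), (0.0, 4.0)]

def in_bbox(x, y, ring):
    """True if (x, y) is inside the ring's bounding box."""
    xs = [px for px, _ in ring]
    ys = [py for _, py in ring]
    return min(xs) <= x <= max(xs) and min(ys) <= y <= max(ys)

def in_polygon(x, y, ring):
    """Ray casting: count crossings of a ray going right from (x, y)."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the ray's y value can be crossed;
        # this also guarantees y2 != y1 in the division below.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# (3, 3) is in the empty corner of the L; (0.5, 3) is in its vertical arm.
corner = (in_bbox(3.0, 3.0, L_SHAPE), in_polygon(3.0, 3.0, L_SHAPE))
arm = (in_bbox(0.5, 3.0, L_SHAPE), in_polygon(0.5, 3.0, L_SHAPE))
print(corner, arm)
```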
"EAV" pattern? (aka vertical tables)
OpenBlock's approach gives faster retrieval, with types. (But stores fewer values.)
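The trade-off is easier to see side by side. Below is a rough sketch using SQLite from the standard library; the table layouts and the restaurant-inspection values are invented for illustration and are not OpenBlock's actual schema. SQLite is loosely typed, so the "typed columns" point only really holds on a database like PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# EAV / vertical layout: one row per (item, attribute name), values as text.
cur.execute("CREATE TABLE eav (item_id INTEGER, name TEXT, value TEXT)")
cur.executemany(
    "INSERT INTO eav VALUES (?, ?, ?)",
    [(1, "violation", "mice"), (1, "inspected", "2011-09-01")],
)

# Wide layout in the spirit of the Attribute model: one row per item and a
# fixed set of generic columns.
cur.execute(
    "CREATE TABLE attribute ("
    " news_item_id INTEGER PRIMARY KEY,"
    " varchar01 TEXT, varchar02 TEXT, date01 DATE)"
)
cur.execute(
    "INSERT INTO attribute VALUES (?, ?, ?, ?)",
    (1, "mice", None, "2011-09-01"),
)

# EAV: several rows come back and have to be reassembled into one item.
cur.execute("SELECT name, value FROM eav WHERE item_id = 1 ORDER BY name")
eav_item = dict(cur.fetchall())

# Wide: a single row holds everything, but an item can never have more
# values than the table has columns.
cur.execute("SELECT varchar01, date01 FROM attribute WHERE news_item_id = 1")
wide_item = cur.fetchone()

print(eav_item)
print(wide_item)
conn.close()
```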
Two data stores, postgis plus nosql?
Mandate: fewer install / admin headaches, not more!
Solving this is not in the scope of our contract
too bad, it could have been fun
It would have taken a pretty big rewrite, and while the design may be odd, it works fine.
Try this:
curl -i --data-urlencode q@- http://demo.openblockproject.org/api/geotag/ <<EOF
I was on my way to 325 Massachusetts Ave and I met some guy named Bob Jones who said "isn't that near Back Bay?" and I said I think it's on Mass near Shawmut. And then I sang an Olivia Newton John song.
EOF
{ "locations": [
    { "city": "Boston",
      "zip": "02115",
      "latlng": [ 42.342514105263156, -71.084578526315781 ],
      "state": "MA",
      "address": "325 Massachusetts Ave.",
      "query": "325 Massachusetts Ave",
      "type": "address" },

    { "query": "Back Bay",
      "latlng": [ 42.350222043649772, -71.080826021602178 ],
      "type": "neighborhood",
      "name": "Back Bay",
      "city": "BOSTON" },

    { "city": "BOSTON",
      "zip": "02118",
      "latlng": [ 42.337331999999996, -71.078071999999992 ],
      "state": "MA",
      "address": "Massachusetts Ave. & Shawmut Ave.",
      "query": "Mass near Shawmut",
      "type": "address" }
A 125-line regex!
... into which a 130-line regex is interpolated
... 11 times
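For anyone who hasn't seen a regex built this way, here is the technique at toy scale: a street-name sub-pattern interpolated into a larger verbose pattern. Both patterns below are invented for this sketch and are nowhere near what OpenBlock's address parser actually handles.

```python
import re

# Toy sub-pattern for a street name such as "Massachusetts Ave".
# Invented for illustration; it uses only non-capturing groups so it can be
# dropped into a bigger pattern any number of times.
STREET = r"[A-Z][a-z]+(?:\s[A-Z][a-z]+)*\s(?:Ave|St|Blvd)\.?"

# The outer pattern interpolates STREET three times.
ADDRESS = re.compile(
    r"""
    (?P<number>\d+\s%(street)s)                          # 325 Main St
    |
    (?P<intersection>%(street)s\s(?:and|&)\s%(street)s)  # Main St and Elm St
    """ % {"street": STREET},
    re.VERBOSE,
)

text = ("Meet me at 325 Massachusetts Ave, "
        "or at Shawmut Ave and Tremont St tomorrow.")

# lastgroup names the alternative that matched.
matches = [(m.lastgroup, m.group()) for m in ADDRESS.finditer(text)]
times_interpolated = ADDRESS.pattern.count(STREET)

print(matches)
print(times_interpolated)
```

Now imagine the sub-pattern is 130 lines long and appears 11 times, and you have a feel for the real thing.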
Django Admin is still great
Scraping hypothesis: API is easier to deal with than framework
South migrations? -or- Multi-db? South!
Migrations must be frozen forever. So must any data they use, eg. fixtures.
Use a bit of boilerplate instead of fixtures:
def forwards(self, orm):
    def _create_or_update(model_id, key, attributes):
        Model = orm[model_id]
        params = {'defaults': attributes}
        params.update(key)
        ob, created = Model.objects.get_or_create(**params)
        for k, v in attributes.items():
            # get_or_create() ignores 'defaults' on updates.
            setattr(ob, k, v)
        ob.save()
        return ob

    _create_or_update('db.schema', {'slug': 'police-reports'},
                      {'map_icon_url': '/map_icons/police.png'})
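The reason for the setattr loop is easy to check without a database. The sketch below swaps in tiny stand-ins for South's frozen ORM: `FakeSchema`, `FakeManager`, `FakeModel` and the second icon path are all invented for the demonstration, and the fake `get_or_create` only imitates the one behavior we care about (defaults are used on create, ignored when the row already exists).

```python
# Stand-ins for a frozen ORM so the helper can run without Django.
# Everything named Fake* is hypothetical.

class FakeSchema(object):
    def __init__(self, **fields):
        self.__dict__.update(fields)

    def save(self):
        pass  # a real model would write to the database here

class FakeManager(object):
    def __init__(self):
        self.rows = []

    def get_or_create(self, defaults=None, **key):
        for row in self.rows:
            if all(getattr(row, k, None) == v for k, v in key.items()):
                return row, False  # existing row: defaults are NOT applied
        fields = dict(defaults or {})
        fields.update(key)
        row = FakeSchema(**fields)
        self.rows.append(row)
        return row, True

class FakeModel(object):
    objects = FakeManager()

orm = {'db.schema': FakeModel}

def _create_or_update(model_id, key, attributes):
    Model = orm[model_id]
    params = {'defaults': attributes}
    params.update(key)
    ob, created = Model.objects.get_or_create(**params)
    for k, v in attributes.items():
        setattr(ob, k, v)  # needed so a second run still updates the row
    ob.save()
    return ob

first = _create_or_update('db.schema', {'slug': 'police-reports'},
                          {'map_icon_url': '/map_icons/police.png'})
# Running again with a different (made-up) icon updates the same row.
second = _create_or_update('db.schema', {'slug': 'police-reports'},
                           {'map_icon_url': '/map_icons/police-v2.png'})

print(first is second, second.map_icon_url, len(FakeModel.objects.rows))
```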
Q?