Should You Build Your App From Scratch or Using Salesforce?

By Development, Django

An important decision every company faces before building an application is whether to use an app builder like Salesforce, where almost no prior coding experience is required, or a web development framework such as Ruby on Rails or Django, or a stack built with C# or Java (we will simply refer to Rails for now). It's an important decision that will determine the development path you follow. In this article, I'm going to go over the benefits of both Salesforce and Ruby on Rails to help you make an educated decision about which path you would like to follow.

Salesforce

 

Pros:

Streamlined – Salesforce strives to be an end-to-end software and data tool for managing your complete enterprise. You can easily connect all your employees using streamlined tools and applications that come out of the box. This requires fewer resources overall and allows you to manage your business from anywhere.

Mobile First – Salesforce also gives companies and developers an easy way to create mobile applications. In fact, it's touted as being built mobile-first. The instant you create an app for your browser, a clean-looking mobile version is generated that can be used by all users.

Simple – As you may be able to tell from the previous benefits of Salesforce, it is built to be straightforward. Salesforce is proud to note that their software can be managed by people who have less technical experience. This is especially true thanks to Salesforce's own Trailhead, which is "The fun way to learn Salesforce".

Cons:

Limited Customization – There isn't much customization for how Salesforce apps work and look. If you purchase a solution that is built on top of Salesforce, you are often limited in your ability to customize how it will work, as you now rely on Salesforce updates. For users on Salesforce, some functionality may break after a Salesforce update. Updates usually consist of large layout changes to the platform, causing problems for the businesses using it when important features become hard to find. This also means new Salesforce customers face a learning curve around the Salesforce system itself, not just their own business.

Expensive – While Salesforce seems inexpensive out of the gate, costs can quickly escalate out of control. If your ROI (return on investment) isn't there, Salesforce will be an anchor instead of a rocket boost. The rising cost of additional but much-needed capabilities often causes subscription creep that the company is saddled with.

Developers Needed – Many companies try to wedge in a Java or .NET programmer and tell them to code up Salesforce apps or integrations, only to find that they cannot do the work. To build and operate in Salesforce at a deeper technical level, you need experienced, certified Salesforce personnel on your team.

Building From Scratch – Ruby on Rails/Django

 

Pros:

Application Flexibility – If you are building a specialized software product, Rails is no doubt one of the best choices for designing a customized platform. Rails allows you to customize everything from design, to workflow, to API exposure. Because developers practically start from the ground up, you can put a very personal touch on your app.

Platform Flexibility – There are no platform limitations: if you can think of it, you can build it. Building your apps in Rails will also save you from worrying about third parties. With Rails, the only business you need to rely on is your own.

Cost – Smaller companies with development experience that are looking to start a new application can find Rails more affordable, with server usage being the only (and minimal) cost.

Cons:

Experienced Developers – Working on a Ruby on Rails web app will require one or more experienced developers. Connecting your users is also a greater hassle compared to Salesforce.

 

Conclusion

If you are looking for a flexible and inexpensive way to create a CRM, then building your application from scratch is your best choice, especially if you want a much more personalized CRM that works your way. On the other hand, if you are looking for something simple and streamlined, Salesforce is the choice for you, since anyone can learn it. In the end, it's whatever benefits your business the most, for whatever its needs may be. You could even do both, since you can integrate the Salesforce API into your Rails project.

GIS Crawling Done Easy Using Python With PostgreSQL

By AWS, Databases, Development, GIS, Python, RDS

Problem

The company "Help-i-copter" allows users to rent a ride in a helicopter for a lift in style. However, Help-i-copter is having trouble creating software that finds the best flat surface for its pilots to land on when picking up customers. A good solution would be to crawl data from websites and store it in a PostgreSQL database. What steps should they take?

Crawling

Crawling, in software terms, is the act of copying data from a website using a computer program. This data can be saved to a file, printed to your screen, or put into a database; it entirely depends on the project and its scale. At a much greater scale, where the data can be overwhelming, a database is the best option, which is why it's important to have one. You could have your grocery list stored online, but you don't want to log in every time to see it. You could instead crawl that information off the web with a command like:

soup.find_all('a', attrs={'class': 'list'})

  • Write a crawling program that collects data from multiple sources, to get the most accurate data. This program would most likely be written in Python as it has access to modules like Requests and BeautifulSoup.

import requests
from bs4 import BeautifulSoup

url = "http://www.placeforyourhelicopter.com"
r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")
surfaces = soup.find_all('a', attrs={'class': 'surface'})
for d in surfaces:
    # pull the surface information out of each link
    print(d.get_text(), d.get('href'))

  • Store the collected data in a PostgreSQL database. This allows the data to be stored and sorted, and once it's sorted, it can be queried quickly and efficiently.

import psycopg2

endpoint = "surfaces.amazonaws.com"
c = psycopg2.connect(host=endpoint, database="surfaces", user="admin", password="admin")
cursor = c.cursor()
query = ("INSERT INTO surfaces (airtraffic, windspeed, date_produced, lat, lon, state) "
         "VALUES (%s, %s, %s, %s, %s, %s);")
data = (3, 12.5, "2018-05-20", 39.28, -76.61, "MD")  # example values for one crawled surface
cursor.execute(query, data)
c.commit()  # persist the insert

  • Create an app that allows pilots to enter a variable they want to query and parameters for that variable, then use Python and the psycopg2 module to query the data accordingly. This app will allow non-programmers to access the database without having to learn PostgreSQL.

value = raw_input("Choose an air-traffic level: ")
# use a parameterized query so the user's input is escaped safely
cursor.execute("SELECT * FROM surfaces WHERE airtraffic LIKE %s", (value + '%',))
for row in cursor:
    print str(row) + "\n"

 

Databases

So why is it important to store large amounts of data in a database? Simply put, it gives you an easy way to find and query data. For example, let's say you have a list of employees you got from your company's website and added it to a PostgreSQL database. In such a database, finding data like "all employees with a first name that begins with an L" is much simpler, as databases are well organized. A simple condition like:

where employee_name LIKE 'L%'

would return “Larry, Liz, and Lucy” quickly and efficiently.
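As a rough sketch of running that same lookup from Python (the host, database, and table names here are hypothetical):

import psycopg2

conn = psycopg2.connect(host="localhost", database="company", user="admin", password="admin")
cursor = conn.cursor()
cursor.execute("SELECT employee_name FROM employees WHERE employee_name LIKE %s", ('L%',))
for (name,) in cursor:
    print(name)   # e.g. Larry, Liz, Lucy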

Bounding Box

 

This data could be used in a lot of ways. Namely, we could use our latitude and longitude coordinates to create a bounding box and get information about the areas within that box. This could help Help-i-copter in a number of ways: pilots could use a bounding box to find flat areas near them and sort those by air traffic, location, etc. The app would essentially ask the pilot for the minimum and maximum coordinates of the box and then look through the database for any surface that fits the description. Here's what that might look like in Python:

xmin = float(raw_input("What is the minimum longitude? "))
xmax = float(raw_input("What is the maximum longitude? "))
ymin = float(raw_input("What is the minimum latitude? "))
ymax = float(raw_input("What is the maximum latitude? "))
# turn each row's coordinates into a point and test it against the bounding box
cursor.execute(
    "SELECT * FROM surfaces "
    "WHERE ST_SetSRID(ST_MakePoint(lon, lat), 4326) && ST_MakeEnvelope(%s, %s, %s, %s, 4326)",
    (xmin, ymin, xmax, ymax))
for row in cursor:
    print str(row) + "\n"

Conclusion

 

As you can see, crawling and databases work very well together, especially when crawling large amounts of data. It would otherwise be inefficient and much slower to store the data in a plain document, or to re-crawl the page every time you run the program. Thanks to the power of PostgreSQL and Python, Help-i-copter is able to efficiently crawl a site, store the data in a database, and quickly query it back for a pilot.

AWS SageMaker – Predicting Monthly Gasoline Output

By Artificial Intelligence, AWS, Development, Python, Sagemaker

AWS continues to wow me with all of the services they are coming out with. What Amazon is doing is a very smart strategy: they are leveraging their technology stack to build more advanced solutions. In doing so, Amazon Web Services is following the "Profit From The Core" strategy to a T. Aside from following Amazon's world domination plan, I wanted to see how well their rollout of artificial intelligence tools, like SageMaker, went.

Background

There are many articles about how AI works. In some cases, an application is extraordinarily simple; in other cases, it is endlessly complex. We are going to stick with the simplest model. In this model, we have to do the following steps.

  1. Collect data
  2. Clean Data
  3. Build Model
  4. Train Model
  5. Predict Something

Amazon has tried to automate these steps as well as possible. From Amazon's site: "Amazon SageMaker is a fully-managed platform that enables developers and data scientists to quickly and easily build, train, and deploy machine learning models at any scale. Amazon SageMaker removes all the barriers that typically slow down developers who want to use machine learning."

Let's see how well they do. Gentle people… let's start our clocks. The time is 20 May 2018 @ 6:05pm.

Notebook Instances

The first thing that you do as part of your training is build notebooks. According to Project Jupyter, a notebook is an application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text.

You follow the simple tutorial and it looks something like this.

AWS Sage simple Jupyter Notebook

Time: 6:11:34 (so far so good)

Example Selection – Time Series Forecast

The first thing we want to do is go to the "SageMaker Examples" tab and make a copy of "linear_time_series_forecast_2019-05-20". I have had some experience predicting when events would happen and wanted to follow something that I already knew. If you aren't familiar, please check out this Coursera video.

Time: 6:20:17

Read Background

Forecasting is potentially the most broadly relevant machine learning topic there is. Whether predicting future sales in retail, housing prices in real estate, traffic in cities, or patient visits in healthcare, almost every industry could benefit from improvements in their forecasts. There are numerous statistical methodologies that have been developed to forecast time-series data. However, the process for developing forecasts tends to be a mix of objective statistics and subjective interpretations.

Properly modeling time-series data takes a great deal of care. What’s the right level of aggregation to model at? Too granular and the signal gets lost in the noise, too aggregate and important variation is missed. Also, what is the right cyclicality? Daily, weekly, monthly? Are there holiday peaks? How should we weight recent versus overall trends?

Linear regression with appropriate controls for trend, seasonality, and recent behavior remains a common method for forecasting stable time-series with reasonable volatility. This notebook will build a linear model to forecast weekly output for US gasoline products from 1991 to 2005. It will focus almost exclusively on the application. For a more in-depth treatment of forecasting in general, see Forecasting: Principles & Practice. In addition, because our dataset is a single time-series, we'll stick with SageMaker's Linear Learner algorithm. If we had multiple, related time-series, we would use SageMaker's DeepAR algorithm, which is specifically designed for forecasting. See the DeepAR Notebook for more detail.

Time: 6:24:13

S3 Setup

Let’s start by specifying:

  • The S3 bucket and prefix that you want to use for training and model data. This should be within the same region as the Notebook Instance, training, and hosting.
  • The IAM role ARN used to give training and hosting access to your data. See the documentation for how to create these. Note, if more than one role is required for notebook instances, training, and/or hosting, please replace the boto regexp with the appropriate full IAM role ARN string(s).

I set up a simple s3 bucket like this: 20180520-sage-test-v1-tm
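In code, that setup boils down to a couple of variables plus a role lookup; a minimal sketch (the bucket name is the one above, the prefix is an arbitrary choice of mine):

import sagemaker

bucket = "20180520-sage-test-v1-tm"       # the S3 bucket created above
prefix = "sagemaker/linear-time-series"   # arbitrary key prefix for this experiment
role = sagemaker.get_execution_role()     # IAM role the notebook instance runs under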

Import the Python libraries.
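The exact import list depends on the notebook version, but it is roughly the usual data-handling stack plus the AWS SDK, something like:

import boto3                      # low-level AWS SDK
import numpy as np                # numerics
import pandas as pd               # tabular data and time series handling
import matplotlib.pyplot as plt   # plotting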

Got distracted and played with all of the functions. Time: 6:38:07.

Data

Let’s download the data. More information about this dataset can be found here.

You can run some simple plots using Matplotlib and Pandas.

Sage time series gas plots
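A plot along those lines takes only a couple of lines of Pandas/Matplotlib, reusing the imports above and assuming the weekly data has been loaded into a pandas Series named gas (the variable name and axis label are my own):

gas.plot(figsize=(12, 4), title="US gasoline product output, weekly")
plt.ylabel("thousands of barrels per day")   # assumed units label
plt.show()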

 

Transform Data To Predictive Model

Next we’ll transform the dataset to make it look a bit more like a standard prediction model.

This stage isn't immediately clear. If you just click through the buttons, it takes a few seconds; if you want to read through these stages, it will take a lot longer. In the end, you should have the following files stored in S3.

Note that you can't review the contents of these files with a text editor; the data is stored in a binary format.
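The core of the transformation is turning the single series into a table of lagged values that a regression can learn from. A rough sketch of the idea with Pandas (the lag counts and column names are my own illustration, not the notebook's exact code):

df = pd.DataFrame({"y": gas})                  # `gas` is the weekly series from above
for lag in range(1, 5):                        # previous four weeks as features
    df["lag_%d" % lag] = df["y"].shift(lag)
df["same_week_last_year"] = df["y"].shift(52)  # simple seasonality feature
df = df.dropna()                               # drop rows that lack a full set of lags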

Time: 7:02:43

I normally don’t use a lot of notebooks. As a result, this took a little longer because I ran into some problems.

Training

Amazon SageMaker's Linear Learner actually fits many models in parallel, each with slightly different hyperparameters, and the model with the best fit is the one that gets used. This functionality is enabled automatically. We can influence it using parameters like the ones below (a short sketch of setting them follows the list):

  • num_models increases the total number of models run. The specified parameter values will always be among those models, but the algorithm also chooses models with nearby parameter values in case a nearby solution is more optimal. In this case, we're going to use the maximum of 32.
  • loss controls how we penalize mistakes in our model estimates. For this case, let's use absolute loss; since we haven't spent much time cleaning the data, absolute loss will adjust less to accommodate outliers.
  • wd or l1 control regularization. Regularization helps prevent model overfitting by keeping our estimates from becoming too finely tuned to the training data (which is why it's good to make sure your training data is a representative sample of the entire data set). In this case, we'll leave these parameters at their default, "auto".
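A minimal sketch of what setting those looks like with the SageMaker Python SDK (the estimator object, feature count, and batch size are placeholders; only the hyperparameter names come from the Linear Learner documentation):

# `linear` is assumed to be a sagemaker.estimator.Estimator built from the
# Linear Learner container image earlier in the notebook
linear.set_hyperparameters(
    predictor_type="regressor",
    feature_dim=5,          # placeholder: must match the number of lag features
    mini_batch_size=100,    # placeholder batch size
    num_models=32,          # fit 32 candidate models in parallel, keep the best
    loss="absolute_loss",   # penalizes outliers less than squared loss
    wd="auto",              # leave L2 regularization on auto
    l1="auto",              # leave L1 regularization on auto
)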

This part of the demo took a lot longer….

And it worked!

Ended at time: 7:21:54 pm.

 

The Forecast!

This is what we have all been waiting for!

For our example we'll keep things simple and use Median Absolute Percent Error (MdAPE), but we'll also compare it to a naive benchmark forecast (that week last year's demand, scaled by the year-over-year trend: that week last year / that week two years ago).
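MdAPE itself is only a couple of lines of NumPy; a quick sketch (the 52- and 104-week shifts encode the naive year-over-year benchmark described above, assuming gas is the weekly series):

import numpy as np

def mdape(actual, predicted):
    """Median Absolute Percent Error."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return np.median(np.abs((actual - predicted) / actual))

naive = gas.shift(52) * gas.shift(52) / gas.shift(104)   # naive benchmark forecast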

As we can see, our MdAPE is substantially better than the naive benchmark's. Additionally, we actually swing from a forecast that is too volatile to one that under-represents the noise in our data. However, the overall shape of the statistical forecast does appear to better represent the actual data.

Next, let's generate a multi-step-ahead forecast. To do this, we'll need to loop, invoking the endpoint one row at a time, and make sure the lags in our model are updated appropriately.
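A rough sketch of that loop (the endpoint object, lag ordering, and response shape are assumptions about how a deployed Linear Learner regressor is typically invoked, not the notebook's exact code):

# `predictor` is the deployed endpoint; `features` holds the most recent
# row of lag features, newest lag first
forecast = []
for step in range(26):                        # predict 26 weeks ahead
    result = predictor.predict(features)      # invoke the endpoint for one row
    yhat = result["predictions"][0]["score"]  # assumed response shape
    forecast.append(yhat)
    features = [yhat] + features[:-1]         # newest prediction becomes lag-1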

 

Conclusion

It does appear that, for pre-built scenarios, AWS's SageMaker works for linear time-series prediction! While it doesn't make you a master data scientist, it does give you a simple place to train and practice with data sets. If you wanted to master time series, you could simply plug in other datasets, conduct the same sort of analysis, and cross-check your work with other people's results. With SageMaker, you have a complete and working blueprint!

Wrap up time: 8:19:30pm (with some distractions and breaks)

 

Entity Extraction On a Website | AWS Comprehend

By AWS, Comprehend, Development, Python

Use Case

You want to better understand what entities are embedded in a company's website so you can see what that company is focused on. You can use a tool like this if you are prospecting, thinking about a partnership, etc. How do you do this in the most efficient way? There are some tools that have made this a lot easier.

1. Select Your Target

Here are the steps we used for http://www.magicinc.org. It is a simple Squarespace site; you can see this by checking out https://builtwith.com/magicinc.org

2. Get the data

For entity extraction, raw text is the goal. You want as much as you can get without having duplicates. Here is how you can pull everything that you need, using some command-line steps on a Mac.

  1. For the domain you want to search, change directories to a clean directory labeled YYYYMMDD_the_domain.
  2. Run this command: wget -p -k --recursive http://www.magicinc.org
  3. cd into the ./blog directory.
  4. Cat all of the blog articles out using this recursive command: find . -type f -exec cat {} >> ../catted_file ";"

3. Prep Query to an Entity Extraction Engine |  Comprehend

In this simple case, we are going to query AWS's Comprehend service. We will need to write some simple Python 3 code.

Since we can't submit more than 5000 bytes per document, we need to submit a batched job that breaks up our raw text into smaller batches. To do that, I wrote some very simple code:


temp = open('./body_output/catted_file', 'r').read()
strings = temp.split(" ")
aws_submission = ""
submission_counter = 0
aws_queued_objects = []
for word in strings:
    pre_add_submission = aws_submission
    aws_submission = aws_submission + " " + word
    # Comprehend rejects documents over 5000 bytes, so close out a chunk just before the limit
    if len(aws_submission.encode('utf-8')) > 5000:
        submission_counter = submission_counter + 1
        print("Number = " + str(submission_counter) + " with a byte size of " +
              str(len(pre_add_submission.encode('utf-8'))))
        aws_queued_objects.append(pre_add_submission)
        aws_submission = word  # start the next chunk with the word that overflowed
if aws_submission.strip():
    aws_queued_objects.append(aws_submission)  # don't lose the final partial chunk

Now we have to submit the batched job. This is very simple, assuming you have the boto3 library properly installed and your AWS config set up correctly.

import boto3
client = boto3.client('comprehend')
response = client.batch_detect_entities(
    TextList=aws_queued_objects, LanguageCode='en')
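Before visualizing anything, you will probably want to flatten the response into something countable. A short sketch, based on the documented batch_detect_entities response shape (a ResultList of documents, each with its own Entities list):

from collections import Counter

entity_counts = Counter()
for result in response['ResultList']:
    for entity in result['Entities']:
        if entity['Score'] > 0.8:              # keep only confident detections
            entity_counts[(entity['Type'], entity['Text'])] += 1

for (etype, text), count in entity_counts.most_common(20):
    print(etype, text, count)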

Analyze

Now, all you have to do is visualize the results. Note that you need to visualize the results outside of the Comprehend tool, because there is no way to import data into that viewer. This snapshot is what it looks like.

More importantly, the key work is to analyze.  We will leave that up to you!

 

Source Code

It was made to be as simple as possible without overcomplicating things.

Github: https://github.com/Bytelion/aws_comprehend_batched_job

 

Upgrading Python To Support Django 2.0

By Agile, Development, DevOps, Django, Python

Across the land, many developers, DevOps engineers, and software delivery managers are terrified of the big move from Python 2.7 to Python 3.6 (the current version at the time of writing). You can see all of the versions from the beginning of time. I am going to walk you through why the move is happening, how to plan for it, and, more importantly, how to upgrade your infrastructure in a systematic manner.

Assumptions:

  • There are multiple developers on the team
  • There are multiple Django instances in your organization

Why move? Django

The answer is pretty easy… Django, the core web programming framework associated with the Python programming language, won't support Python 2.7 in future releases. In fact, the Django 1.11.x series is the last to support Python 2.7.

Django 2.0 supports Python 3.4, 3.5, and 3.6. We highly recommend and only officially support the latest release of each series.

Note that, according to PEP 373, Python 2.7 is currently expected to remain supported by the core development team (receiving security updates and other bug fixes) until at least 2020 (10 years after its initial release, compared to the more typical support period of 18–24 months).

How to Plan?

When it comes time to plan for an event like this, it is important that you identify the critical components of your infrastructure.

Sample User Story

As a developer, I would like to have a complete list of technical assets that use Python so I can generate an upgrade plan that reduces risk to the company.

Acceptance Criteria: Generate a list of the following assets which include:

  • Your Django platforms, including dev, test, and production instances.
  • Other internal platforms (AWS Lambdas, small Flask instances)
  • External/Internal libraries
  • Cronjob or schedule tasks running Python
  • Continuous integration systems that build your code (e.g., Jenkins)
  • Unit testing
  • QA regression testing scenarios
  • Code repositories
  • Your development team list (each one of them will have to complete upgrades)

Step 1 – Select Your First System

Pick your first system for an end-to-end test. If you only have a single Django platform, then… your selection is done. We recommend that you clone your dev instance for your first test. If you don't have a dev instance, stop reading this and make sure you have dev, test, and production versions of your platform!

If you have multiple systems, we recommend that you select one that is much smaller in scope and will have as little impact on your operations as possible.

Sample User Story:

As a developer, I would like to select my first computing environment to upgrade Python on so I can minimize the impact on our operations.

Acceptance Criteria:

  • You select one of the least impactful systems in your ecosystem
  • You clone a working environment that is identical to this instance.
  • You notify the team of what you are doing and discuss any impacts that your testing might have with them.

Step 2 – Start your documentation

If you have other people on the team, you will want to make sure that you can guide them along the path of upgrading their computing environments and be able to discuss problems that you had.

Sample User Story:

As a developer, I would like to document the upgrade steps that I took so I can help others on the team upgrade their computing environments to Python 3.6 with Django.

Acceptance Criteria:

  • Document your findings in Confluence.

Step 3 – Review external dependencies

Not all PyPI libraries are Python 3.6.3 compliant. Many are only built for Python 2.7. The good news, however, is that most major libraries have been ported to 3.6.x. The complete list of PyPI libraries per version is listed here.

Individual Library Inspections

As a developer, you can see the details of any pip-installed library by calling this command: pip show LIBRARYNAME --verbose

Example:

(bytelion_env)~>pip show requests --verbose
Name: requests
Version: 2.13.0
Summary: Python HTTP for Humans.
Home-page: https://python-requests.org
Author: Kenneth Reitz
Author-email: me@kennethreitz.com
License: Apache 2.0
Location: /Users/terrancemacgregor/.virtualenvs/bytelion_env/lib/python2.7/site-packages
Requires:
Metadata-Version: 2.0
Installer: pip
Classifiers:
Development Status :: 5 - Production/Stable
Intended Audience :: Developers
Natural Language :: English
License :: OSI Approved :: Apache Software License
Programming Language :: Python
Programming Language :: Python :: 2.6
Programming Language :: Python :: 2.7
Programming Language :: Python :: 3
Programming Language :: Python :: 3.3
Programming Language :: Python :: 3.4
Programming Language :: Python :: 3.5
Programming Language :: Python :: 3.6
Programming Language :: Python :: Implementation :: CPython
Programming Language :: Python :: Implementation :: PyPy

Sample User Story

As a developer, I would like to know which PyPi libraries are not supported by my current system so I can determine a migration strategy.

Acceptance Criteria

  • List is generated and shared with the team
  • For each library that is not supported, identify an alternative.

 

Remember the 80/20 rule?

Many of the upgrades from 2.7 to 3.6 can be safely automated, but some other changes (notably those associated with Unicode handling) may require careful consideration, and preferably robust automated regression test suites, to migrate effectively.
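As a tiny illustration of the kind of Unicode change that automated tools cannot decide for you (the byte string below is just an example), Python 3 forces you to be explicit about bytes versus text:

payload = b"r\xc3\xa9sum\xc3\xa9"   # example raw bytes, e.g. read from a file or socket

# Python 2: mixing str and unicode triggers an implicit ASCII decode, which
#           raises UnicodeDecodeError as soon as non-ASCII bytes appear.
# Python 3: bytes + str is always a TypeError, so you must decode explicitly.
text = payload.decode("utf-8") + "!"
print(text)                          # résumé!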

 

Upgrading Server Code

 


Here is a list of what some other people experienced:

https://www.calazan.com/upgrading-to-ubuntu-1604-python-36-and-django-111/

https://blog.thezerobit.com/2014/05/25/python-3-is-killing-python.html