by Vas Mylko
This post is technical. It describes how we are building the Netflix of Trips. Back in 2020, we declared: “Curiosio is going to make any travel story and all travel stories modifiable and reusable according to your requirements and desires.” Right now we are working on that.
Trips for the Netflix of Trips
To provide a carousel of trips inspired by National Geographic, we must have a sufficient number of trip plans. A trip plan is a different level of travel planning than the stories written by the travelers. The same holds for a carousel inspired by Condé Nast Traveler: we need many trip plans that go a level beyond its written travel stories. And so forth for Travel+Leisure, Lonely Planet, Frommer’s, a dozen more, and many smaller sources.
Automated Supertrip
The creation of trip plans of Natgeo or Lonely Planet flavors must be automated. Curiosio has had a function called [Supertrip] for several years already. [Supertrip] is a button visible on any open trip plan. It lets you take the context of that plan and adjust the requirements to your needs and liking. When you hit [Supertrip], the web form is pre-filled for you, so you change only the fields you want. When ready, you hit [GET TRIP].
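The pre-fill-then-edit behavior can be illustrated with a minimal sketch. The TripForm class, its field names, and the supertrip helper are all hypothetical stand-ins invented for this illustration; they are not Curiosio's actual form or API.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class TripForm:
    """Hypothetical stand-in for the trip request form; field names are assumptions."""
    points: Tuple[str, ...]
    days: int
    budget_usd: int
    travelers: int


def supertrip(context: TripForm, **changes) -> TripForm:
    """Pre-fill from an existing plan, then apply only the fields the user changed."""
    return replace(context, **changes)


# Illustrative values only.
original = TripForm(points=("Palo Alto", "Santa Cruz", "Big Sur"),
                    days=5, budget_usd=2000, travelers=2)
mine = supertrip(original, days=7, travelers=4)
```

Here the points and the budget carry over from the original plan untouched, while the duration and the number of travelers are replaced.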
The problem is the context: how do we get at least the geo context from the article? If we could understand the context, we could pre-fill the web form and hit [GET TRIP]. In practice we would do this via our own API; the web form is only an analogy for explanation. How do we get the list of points and places from the article to fill the Curiosio web form? We did the initial research on the problem two years ago during the release of Curious Germany. Now we have completed a second iteration and are productizing the study.
Geo Soup
To make machines make trip plans in specific contexts, we must feed them geo soup: a list of points and places that define the context. We call this problem Geo NER, for Geographic Named Entity Recognition. The “Geo” part is the most important: the points should be specified to Curiosio explicitly, while Curiosio will likely recommend cool places at every point automatically.
The problem is hard. It consists of extracting the list of potential points and places, then geocoding them to understand what is where, and then linking them to our Knowledge Graph. Extraction can be done with spaCy, an NLP library that features NER; it worked well enough during our first iteration. Today it is worth trying something newer, like BERT NER. It would be cool to fine-tune it for Geo NER, but we are not going to mark up the data for that in the near future.
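As a rough illustration of the extraction step, here is a minimal sketch using spaCy. It assumes the small English model en_core_web_sm is installed and that the GPE, LOC, and FAC entity labels are the ones of interest; both choices are assumptions for this example, not a description of our Lab module.

```python
# Entity labels treated as geographic in this sketch (an assumption):
# GPE = countries, cities, states; LOC = other locations; FAC = buildings, roads.
GEO_LABELS = {"GPE", "LOC", "FAC"}


def keep_geo(entities, labels=GEO_LABELS):
    """Return sorted unique entity texts whose label is in `labels`."""
    return sorted({text.strip() for text, label in entities
                   if label in labels and text.strip()})


def extract_entities(text, model="en_core_web_sm"):
    """Run spaCy NER and return (text, label) pairs.

    Requires spaCy and the named model to be installed.
    """
    import spacy  # imported lazily so keep_geo works without spaCy
    nlp = spacy.load(model)
    return [(ent.text, ent.label_) for ent in nlp(text).ents]


# Hand-written (text, label) pairs standing in for real NER output.
sample = [("Palo Alto", "GPE"), ("Google", "ORG"),
          ("Big Sur", "LOC"), ("Palo Alto", "GPE")]
geo = keep_geo(sample)
```

Filtering by label this strictly would lose organizations such as Google or Stanford University that are also places to visit, which is one reason per-source heuristics are still needed.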
Truth be told, out-of-the-box Geo NER doesn’t work well enough. Custom code must be written, following special heuristics that are different for each web source. Let’s look at the raw output of our Geo NER module from the Lab. Here is a Natgeo story, Geek Retreat: The Best of Silicon Valley. Below is its geo soup:
['##gle', 'Big Sur', 'California', 'Carmel', 'Chocolate Garage', 'Clock Tower', 'Computer History Museum', 'Coupa Cafe', 'Earth', 'Garden Court Hotel', 'Google', 'Intelligent Travel', 'Los Angeles', 'Malibu', 'Mountain View', 'New York', 'Pacific Coast Highway', 'Palo Alto', 'Palo Alto Creamery', 'Palo Alto Farmers Market', 'Palo Alto Junior Museum', 'Palo Alto Junior Museum and Zoo', 'Philz Coffee', 'Reposado', 'San Francisco', 'Santa Barbara', 'Santa Cruz', 'Silicon Valley', 'South', 'Stanford', 'Stanford Memorial Church', 'Stanford University', 'Zoo']
Not state of the art, but already not bad. The article contains some noise around Google’s campus; nevertheless, the algorithm cracked the names well.
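Some of the obvious noise in such a list can be removed with a cheap pass before geocoding. The sketch below is one possible heuristic, not our production code: it drops subword fragments (BERT's WordPiece tokenizer marks continuation pieces with a leading "##") and a small, hypothetical stoplist of generic words.

```python
# Hypothetical stoplist of generic words that are not trip points on their own.
GENERIC = {"earth", "north", "south", "east", "west"}


def prefilter(soup, generic=GENERIC):
    """Drop WordPiece fragments and generic words; keep order, remove duplicates."""
    seen, cleaned = set(), []
    for name in soup:
        name = name.strip()
        # "##" marks a WordPiece continuation piece, e.g. "##gle" from "Google".
        if not name or name.startswith("##"):
            continue
        key = name.lower()
        if key in generic or key in seen:
            continue
        seen.add(key)
        cleaned.append(name)
    return cleaned


# A few entries taken from the geo soup above.
sample = ['##gle', 'Big Sur', 'Earth', 'Google', 'Palo Alto', 'South', 'Stanford', 'Zoo']
cleaned = prefilter(sample)
```

Entries such as 'Zoo' survive this pass; deciding whether they are real places is left to geocoding.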
Clean-up will happen during geocoding. Anomaly detection must be done after that: for example, drop New York from the list because the true cluster of points is in California. Once the points and places are identified, they can be fed to the algorithm.
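One simple way to drop a point like New York is to measure each geocoded point's distance from the median center of all points and discard the far ones. This is a minimal sketch of that idea, not necessarily the method we use; the 1000 km threshold is an assumed value, and the coordinates are rounded approximations for illustration.

```python
import math
from statistics import median


def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def drop_outliers(points, max_km=1000.0):
    """Keep points within max_km of the median center; max_km is an assumed threshold."""
    center = (median(lat for lat, _ in points.values()),
              median(lon for _, lon in points.values()))
    return {name: p for name, p in points.items()
            if haversine_km(p, center) <= max_km}


# Approximate coordinates, rounded for illustration only.
points = {
    "Palo Alto": (37.44, -122.14),
    "Mountain View": (37.39, -122.08),
    "San Francisco": (37.77, -122.42),
    "Santa Cruz": (36.97, -122.03),
    "Los Angeles": (34.05, -118.24),
    "New York": (40.71, -74.01),
}
kept = drop_outliers(points)
```

The median center is robust to a single distant point, so New York does not drag the center toward itself. Taking the median of longitudes directly would misbehave for trips that cross the antimeridian, which this sketch ignores.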
Web Search Engine
If the algorithm could understand the noisy article by National Geographic and compile metadata from it, could it understand others? It looks like it could. What if we make a search engine for road trips over all publicly available road trip stories? A web search engine for road tripping \m/
You search for road trips like you would on Google. You see the web search results in the familiar format. You click the link and see the red [Plan a Road Trip] button. Clicking the button pre-fills the form for you. A context similar to the travel story will be ready for your edits. It’s exactly the [Supertrip] function described earlier.
Here are examples of the interactive trip plans in this geo context (though for a different duration, budget, and number of travelers, because our NER algorithm doesn’t care about the duration or cost of the original story):
You may notice SpaceX and Pinnacles recommended by Curiosio. These places are not present in the original Natgeo story.
The Web for the Web Search
After the pilot with select web sources of travel stories (National Geographic and several more), we are going to generalize the web search engine to road trips published everywhere. We will crawl Medium, WordPress, Blogger, Substack, etc., and videos on YouTube.
What about seeing the carousel of relevant trip plans for EVERY link? You could review the alternatives to get inspired. You could start planning on your own. You could read the original article.
Below is an explanation of the design, with callouts showing what is what. There are several clickable elements: the web link, the red [Plan a Road Trip] button, the red “Go to the article↗” link, and the carousel. The carousel is clickable on its own, everywhere.
We are going to facelift the page of the country in Curiosio. You will have a “Create” block to plan trips from scratch. Today, Curiosio has only this block. We will add a “Find” block to search for road trips on the web. You will see the web search results there, with [Supertrip] hooked in. When we have enough trip plans for the links, we will hook in the carousels of trips there as well.
Summary
To build branded and themed travel channels for My Trips, we have to do a lot of NLP and HPC work. A byproduct of that is a web search engine for travel. Let’s see how this unfolds. Stay tuned and always follow your curiosity.