Hello.

I have a crawler that sends documents to a Solr trunk instance via
the ExtractingRequestHandler.

I can search on title and content and everything is OK.


Among other things, the documents usually contain text like:

- "...location: London, ..." or
- "...in Brighton..." or
- "...to Birmingham" etc.

So there is location information in the body/content of the document.
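One lightweight way to pick up candidates from phrases like the ones above is a handful of cue-word patterns. Below is a minimal Python sketch, assuming that a capitalised word (or run of capitalised words) after "location:", "in" or "to" is a likely place name; the pattern and the function name candidate_places are hypothetical and not part of Solr or any library.

```python
import re

# Hypothetical cue patterns modelled on the examples above:
# "location: London", "in Brighton", "to Birmingham".
# Matching is case-sensitive: the phrase after the cue must be capitalised.
CUE_PATTERN = re.compile(
    r"\b(?:location:\s*|in\s+|to\s+)([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)"
)


def candidate_places(text):
    """Return capitalised phrases that follow a location cue, in order of appearance."""
    return [m.group(1) for m in CUE_PATTERN.finditer(text)]


text = "...location: London, ... a branch in Brighton... moving to Birmingham"
print(candidate_places(text))  # ['London', 'Brighton', 'Birmingham']
```

Cue patterns alone will also match things like "to John", so the candidates would still need to be checked against a list of known places before being geocoded.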

What I would like to do is:

1- Extract the location/town names from the document body and add them to the
document as a separate metadata field, along with the longitude/latitude or
any other geospatial information needed.

2- Be able to run a geospatial search using a UK town/city name or postcode
plus a radius as input, and find the relevant documents.
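For point 2, Solr's spatial support includes a geofilt filter (parameters sfield, pt and d, with d in kilometres by default) and a geodist() function that can be used for sorting. Below is a minimal Python sketch of turning a town name plus radius into request parameters, assuming a hypothetical indexed point field called location and a tiny hand-made gazetteer with approximate coordinates, for illustration only. A postcode lookup would work the same way, given a postcode-to-coordinates dataset.

```python
# Approximate coordinates, for illustration only; a real setup would load
# a full UK town/postcode gazetteer instead of this hypothetical dict.
GAZETTEER = {
    "london": (51.5074, -0.1278),
    "brighton": (50.8225, -0.1372),
    "birmingham": (52.4862, -1.8904),
}


def radius_query_params(place, radius_km, field="location"):
    """Build Solr /select parameters for documents within radius_km of a named place.

    `field` is a hypothetical spatial field name; it must match the schema.
    """
    lat, lon = GAZETTEER[place.strip().lower()]
    pt = f"{lat},{lon}"
    return {
        "q": "*:*",
        # Doubled braces produce literal { } around the local-params filter.
        "fq": f"{{!geofilt sfield={field} pt={pt} d={radius_km}}}",
        # geodist() with no arguments reads sfield and pt from the request.
        "sfield": field,
        "pt": pt,
        "sort": "geodist() asc",
    }


params = radius_query_params("Brighton", 10)
print(params["fq"])  # {!geofilt sfield=location pt=50.8225,-0.1372 d=10}
```

These parameters would be sent to the normal /select handler; sorting on geodist() puts the nearest matching documents first.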


I came across
http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4 as well as the
demo application at
https://github.com/ryantxu/spatial-solr-sandbox


My first and main concern at the moment is point 1: how to extract the
town/city names from the documents, map them to geospatial coordinates and
tag the documents accordingly.
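For a fixed set of UK places, a dictionary (gazetteer) lookup is often lighter than a full named-entity model. Below is a minimal Python sketch, assuming a hypothetical in-memory GAZETTEER with approximate coordinates and hypothetical Solr fields place_name and location. It scans text that has already been extracted, so it would run wherever the plain text is available (for example in the crawler) before the document is indexed.

```python
import re

# Hypothetical gazetteer: lower-case name -> approximate (lat, lon).
GAZETTEER = {
    "london": (51.5074, -0.1278),
    "brighton": (50.8225, -0.1372),
    "birmingham": (52.4862, -1.8904),
}

# One alternation of all known names, longest first so that longer names
# take precedence when one name is a prefix of another.
_NAMES = sorted(GAZETTEER, key=len, reverse=True)
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(n) for n in _NAMES) + r")\b", re.IGNORECASE
)


def tag_document(doc):
    """Return a copy of doc with place_name and "lat,lon" location fields added."""
    found = []
    for m in _PATTERN.finditer(doc.get("content", "")):
        name = m.group(1).lower()
        if name not in found:  # keep first-seen order, drop repeats
            found.append(name)
    tagged = dict(doc)
    tagged["place_name"] = [n.title() for n in found]
    tagged["location"] = ["%s,%s" % GAZETTEER[n] for n in found]
    return tagged


doc = tag_document({"id": "1", "content": "Offices in London and BRIGHTON; London again"})
print(doc["location"])  # ['51.5074,-0.1278', '50.8225,-0.1372']
```

Whole-word, case-insensitive matching keeps this fast, but town names that are also ordinary words (Reading, Bath) will give false positives, which is where an NER model or cue patterns can help. The location field must be declared in the schema with a spatial type, and whether it can hold several points per document depends on the field type chosen.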

I have been considering OpenNLP for the extraction, but I am not sure whether
it is the lightest way to do this.

Any hints or recommendations about 1) and 2) would be much appreciated.


Thank you.
