Hello. I have a crawler that sends documents to a Solr trunk instance via the ExtractingRequestHandler.
I can search on title and content, and everything works. The documents usually contain, among other things, text like:

- "...location: London, ..."
- "...in Brighton..."
- "...to Birmingham"

So there is location information in the body/content of the document. What I would like to do is:

1- Extract the location/town names from the document body and add them as a separate metadata field to the document, along with longitude/latitude or whatever other geospatial information is needed.
2- Be able to do geospatial search using UK town/city names or postcode + radius as input, and find the relevant documents.

I came across http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4 as well as the demo application at https://github.com/ryantxu/spatial-solr-sandbox

My first and main concern at the moment is point 1): how to extract the town/city names from the documents, map them to geospatial coordinates, and tag the documents accordingly. I have been thinking of using OpenNLP for the extraction, but I am not sure whether it is the lightest way to do this.

Any hint or recommendation about 1) and 2) would be much appreciated.

Thank you.
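
To make point 1) more concrete, here is a rough sketch of the simplest thing that might work: a plain gazetteer lookup instead of OpenNLP. Everything in it is an assumption for illustration only: the three-entry GAZETTEER (approximate coordinates), the function names, and the field names location_name and location_point are hypothetical and not part of Solr or of my current setup. The haversine function only illustrates what a "town + radius" filter has to compute; in practice Solr's spatial support would do that part.

```python
import math
import re

# Hypothetical mini-gazetteer: town name -> (latitude, longitude) in degrees.
# A real one would be loaded from a UK place-name / postcode data set.
GAZETTEER = {
    "London": (51.5074, -0.1278),
    "Brighton": (50.8225, -0.1372),
    "Birmingham": (52.4862, -1.8904),
}

# One alternation over all known names, matched as whole words only.
_TOWN_RE = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in GAZETTEER) + r")\b"
)


def extract_locations(text):
    """Return gazetteer towns mentioned in text, in order of first appearance."""
    found = []
    for match in _TOWN_RE.finditer(text):
        name = match.group(1)
        if name not in found:
            found.append(name)
    return found


def tag_document(text):
    """Build the extra fields to attach to a document (field names are made up)."""
    names = extract_locations(text)
    return {
        "location_name": names,
        # "lat,lon" strings, one per extracted town.
        "location_point": ["%s,%s" % GAZETTEER[name] for name in names],
    }


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


if __name__ == "__main__":
    body = "Meeting location: London, then a talk in Brighton."
    print(tag_document(body))
    print(haversine_km(GAZETTEER["London"], GAZETTEER["Brighton"]))
```

This deliberately ignores the hard parts: ambiguous names (towns that share a name, or names that are also ordinary words), spelling variants, and postcodes. Those are where a named-entity tool such as OpenNLP, or a much larger gazetteer, would presumably come in.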