and add the payloads. (But I am not able to analyze it.)
>
> My question is:
> Can I search a phrase, giving a higher boost to NOUN than to VERB?
> For example: if I am searching "sitting on blanket", I want to give a high
> boost to NOUN terms first, then VERB terms, as tagged by OpenNLP.
> How can I use payloads for boosting?
> What are the changes required in schema.xml?
>
> Please provide some pointers to move ahead.
>
> Thanks in advance
>
--
Lance Norskog
goks...@gmail.com
Yes, you should use a recent Java 7. Java 6 is end-of-life and no longer
supported by Oracle. Also, read up on the various garbage collectors. It
is a complex topic and there are many guides online.
In particular there is a problem in some Java 6 releases that causes a
massive memory leak in S
>
> And the field declared for this analyzer:
>
> <field name="Detail_Person" ... omitNorms="true" omitPositions="true"/>
>
> The problem is here: when I search over this field, Detail_Person, the
> results are not consistent.
>
> When I search Detail_Person:brett, it returns one document.
>
> But again, when I fire the same query, it returns zero documents.
>
> Searching is not stable on the OpenNLP field; sometimes it returns documents
> and sometimes not, but the documents are there.
>
> And if I search on non-OpenNLP fields, it works properly; the results are
> stable and correct.
>
> Please help me to make the Solr results consistent.
>
> Thanks in advance.
>
>
--
Lance Norskog
goks...@gmail.com
t if SolrEntityProcessor could stream static files, or if I
could specify the Solr result format while using the XPathEntityProcessor
(i.e. a useSolrResultSchema option).
Any other ideas?
On Mon, Oct 14, 2013 at 6:24 PM, Lance Norskog wrote:
On 10/13/2013 10:02 AM, Shawn Heisey wrote:
On 10/13/2013 10:16 AM, Josh Lincoln wrote:
I have a large Solr response in XML format and would like to import it into
a new Solr collection. I'm able to use DIH with SolrEntityProcessor, but
only if I first truncate the file to a small subset of the
Yes, Solr/Lucene works fine with other indexes this large. There are
many indexes with hundreds of gigabytes and hundreds of millions of
documents. My experience years ago was that at this scale, searching
worked great, sorting & facets less so, and the real problem was IT: a
200G blob of data
Solr does not by default generate unique IDs. It uses what you give as
your unique field, usually called 'id'.
What software do you use to index data from your RSS feeds? Maybe that
is creating a new 'id' field?
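A minimal schema.xml sketch of how the unique field is declared (assuming
the conventional name 'id'; adjust to your schema):

    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <uniqueKey>id</uniqueKey>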
There is no partial update, Solr (Lucene) always rewrites the complete
document.
You need to:
1) crawl the SVN database
2) index the files
3) make a UI that fetches the original file when you click on a search
result.
Solr only has #2. If you run a subversion web browser app, you can
download the developer-only version of the LucidWorks product and crawl
the SVN web view
Block-quoting and plagiarism are two different questions.
Block-quoting is simple: break the text apart into sentences or even
paragraphs and make them separate documents. Make facets of the
post-analysis text. Now just pull counts of facets and block quotes will
be clear.
Mahout has a scala
Cool!
On 08/05/2013 03:34 AM, Charlie Hull wrote:
On 03/08/2013 00:50, Mark wrote:
We have a set number of known terms we want to match against.
In Index:
"term one"
"term two"
"term three"
I know how to match all terms of a user query against the index but
we would like to know how/if we ca
Solr/Lucene does not automatically add fields when asked, the way DBMS
systems do. Instead, all data for a field is added at the same time. To get
the new field, you have to reload all of your data.
This is also true for deleting fields. If you remove a field, that data
does not go away until you re-
Are you feeding Graphite from Solr? If so, how?
On 07/19/2013 01:02 AM, Neil Prosser wrote:
That was overnight so I was unable to track exactly what happened (I'm
going off our Graphite graphs here).
I don't know about JVM crashes, but it is known that the Java 6 JVM had
various problems supporting Solr, including the 20-30 series. A lot of
people use the final JVM release (I think 6_30).
On 07/16/2013 12:25 PM, neoman wrote:
Hello Everyone,
We are using solrcloud with Tomcat in our produc
Norms stay in the index even if you delete all of the data. If you just
changed the schema, emptied the index, and tested again, you've still
got norms in there.
You can examine the index with Luke to verify this.
On 07/09/2013 08:57 PM, William Bell wrote:
I have a field that has omitNorms=t
Also, total index file size. At 200-300 GB, managing an index becomes a pain.
Lance
On 07/08/2013 07:28 AM, Jack Krupansky wrote:
Other that the per-node/per-collection limit of 2 billion documents
per Lucene index, most of the limits of Solr are performance-based
limits - Solr can handle it, bu
The MappingCharFilter allows you to map both characters to one character.
If you do this during indexing and querying, searching with one should find
the other. This is sort of like synonyms, but on a
character-by-character basis.
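A minimal analyzer sketch, assuming a mapping file named mapping.txt (the
field type name is also illustrative):

    <fieldType name="text_mapped" class="solr.TextField">
      <analyzer>
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
      </analyzer>
    </fieldType>

where mapping.txt contains one rule per line, for example:

    "ü" => "u"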
Lance
On 06/18/2013 11:08 PM, Yash Sharma wrote:
> Hi,
>
> we hav
This usually means the end server timed out.
On 06/30/2013 06:31 AM, Shahar Davidson wrote:
Hi all,
We're getting the below exception sporadically when using distributed search.
(using Solr 4.2.1)
Note that 'core_3' is one of the cores mentioned in the 'shards' parameter.
Any ideas anyone?
T
Solr HTTP caching also supports "ETags". These are unique keys for the
output of a query. If you send a query twice, and the index has not
changed, the return will be the same. The ETag is generated from the
query string and the index generation number.
If Varnish supports ETags, you can kee
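For reference, a sketch of the solrconfig.xml block that controls this
behavior (attribute values are illustrative):

    <requestDispatcher>
      <httpCaching never304="false" etagSeed="Solr">
        <cacheControl>max-age=43200, public</cacheControl>
      </httpCaching>
    </requestDispatcher>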
I do not know what causes the error. This setup will not work. You need
one or three zookeepers. SolrCloud demands that a majority of the ZK
servers agree. If you have two ZKs this will not work.
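For example, a three-node ensemble would be passed to Solr like this
(hostnames are placeholders):

    java -DzkHost=zk1:2181,zk2:2181,zk3:2181 -jar start.jar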
On 06/29/2013 05:47 AM, Sagar Chaturvedi wrote:
Hi,
I setup 2 solr instances on 2 different mach
Accumulo is a BigTable/Cassandra style distributed database. It is now
an Apache Incubator project. In the README we find this gem:
"Synchronize your accumulo conf directory across the cluster. As a
precaution against mis-configured systems, servers using different
configuration files will not
One small thing: German u-umlaut is often "flattened" as 'ue' instead of
'u'. And the same with o-umlaut, it can be 'oe' or 'o'. I don't know if
Lucene has a good solution for this problem.
On 06/16/2013 06:44 AM, adityab wrote:
Thanks for the explanation Steve. I now see it clearly. In my cas
No, they just learned a few features and then stopped because it was
"good enough", and they had a thousand other things to code.
As to REST - yes, it is worth having a coherent API. Solr is behind the
curve here. Look at the HATEOAS paradigm. It's ornate (and a really goofy
name) but it provide
In 4.x and trunk there is a close() method on Tokenizers and Filters. In
releases up to 4.3, there is instead a reset(stream) method,
which is how a Tokenizer & Filter is reset for a following document in
the same upload.
In both cases I had to track the first time the tokens are consumed, a
test patch LUCENE-2899-x.patch
uploaded on 6th June but still had the same problem.
Regards,
Patrick
-Original Message-
From: Lance Norskog [mailto:goks...@gmail.com]
Sent: Thursday, 6 June 2013 5:16 p.m.
To: solr-user@lucene.apache.org
Subject: Re: OPENNLP problems
Patrick-
I found
Patrick-
I found the problem with multiple documents. The problem was that the
API for the life cycle of a Tokenizer changed, and I only noticed part
of the change. You can now upload multiple documents in one post, and
the OpenNLPTokenizer will process each document.
You're right, the exampl
Distributed search does the actual search twice: once to get the scores
and again to fetch the documents with the top N scores. This algorithm
does not play well with "deep searches".
On 06/02/2013 07:32 PM, Niran Fajemisin wrote:
Thanks Daniel.
That's exactly what I thought as well. I did tr
Let's assume that the Solr record includes the database record's
timestamp field. You can make a more complex DIH stack that does a Solr
query with the SolrEntityProcessor. You can do a query that gets the
most recent timestamp in the index, and then use that in the DB update
command.
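A data-config.xml sketch of that idea; the URL, field, and entity names are
assumptions for illustration:

    <dataConfig>
      <document>
        <!-- outer entity: fetch the newest indexed timestamp from Solr -->
        <entity name="newest" processor="SolrEntityProcessor"
                url="http://localhost:8983/solr/collection1"
                query="*:*" rows="1" fl="timestamp">
          <!-- note: you need the newest document first; check whether your
               SolrEntityProcessor version supports a sort attribute,
               otherwise point it at a request handler that sorts by
               timestamp desc -->
          <!-- inner entity: use that timestamp in the database delta query -->
          <entity name="delta" processor="SqlEntityProcessor" dataSource="db"
                  query="SELECT * FROM docs WHERE updated &gt; '${newest.timestamp}'"/>
        </entity>
      </document>
    </dataConfig>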
On 06/02
I will look at these problems. Thanks for trying it out!
Lance Norskog
On 05/28/2013 10:08 PM, Patrick Mi wrote:
Hi there,
Checked out branch_4x and applied the latest patch,
LUCENE-2899-current.patch; however, I ran into 2 problems.
Followed the wiki page instructions and set up a field with
If the indexed data includes positions, it should be possible to
implement ^ and $ as the first and last positions.
On 05/22/2013 04:08 AM, Oussama Jilal wrote:
There is no ^ or $ in the solr regex since the regular expression will
match tokens (not the complete indexed text). So the results yo
This is great; data like this is rare. Can you tell us any hardware or
throughput numbers?
On 05/17/2013 12:29 PM, Rishi Easwaran wrote:
Hi All,
Its Friday 3:00pm, warm & sunny outside and it was a good week. Figured I'd
share some good news.
I work for AOL mail team and we use SOLR for our
If this is for the US, remove the age range feature before you get sued.
On 05/09/2013 08:41 PM, Kamal Palei wrote:
Dear SOLR experts
I might be asking a very silly question. As I am new to SOLR kindly guide
me.
I have a job site. Using SOLR to search resumes. When a HR user enters some
keywor
Great! Thank you very much Shawn.
On 05/04/2013 10:55 AM, Shawn Heisey wrote:
On 5/4/2013 11:45 AM, Shawn Heisey wrote:
Advance warning: this is a long reply.
I have condensed some relevant performance problem information into the
following wiki page:
http://wiki.apache.org/solr/SolrPerforman
Run checksums on all files in both master and slave, and verify that
they are the same.
TCP/IP has a checksum algorithm that was state-of-the-art in 1969.
On 04/18/2013 02:10 AM, Victor Ruiz wrote:
Also, I forgot to say... the same error started to happen again.. the index
is again corrupted :(
Outer distance AND NOT inner distance?
On 04/12/2013 09:02 AM, kfdroid wrote:
We currently do a radius search from a given Lat/Long point and it works
great. I have a new requirement to do a search on a larger radius from the
same point, but not include the smaller radius. Kind of a donut (toru
Seconded. Single-stepping really is the best way to follow the logic
chains and see how the data mutates.
On 04/05/2013 06:36 AM, Erick Erickson wrote:
Then there's my lazy method. Fire up the IDE and find a test case that
looks close to something you want to understand further. Step through
it
Wow! That's great. And it's a lot of work, especially getting it all
keyboard-complete. Thank you.
On 03/14/2013 01:29 AM, Chantal Ackermann wrote:
Hi all,
this is not a question. I just wanted to announce that I've written a blog post
on how to set up Maven for packaging and automatic testi
Thank you (and Hoss)! I have found this concept elusive, and you two
have nailed it. I will be able to understand it for the 5 minutes I will
need to code with it.
Lance
On 03/09/2013 10:57 AM, David Smiley (@MITRE.org) wrote:
Just finished:
http://wiki.apache.org/solr/SpatialForTimeDurations
Yes, the SolrEntityProcessor can be used for this.
If you stored the original document bodies in the Solr index!
You can also download the documents in JSON or CSV format and re-upload
those to the old Solr. I don't know if CSV will work for your docs. If CSV
works, you can directly upload what you
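The download side is a plain query; the host, core, and row count here are
placeholders:

    http://localhost:8983/solr/collection1/select?q=*:*&wt=csv&rows=100000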
"Do you use replication instead, or do you just have one instance?"
On 02/25/2013 07:55 PM, Otis Gospodnetic wrote:
Hi,
Quick poll to see what % of Solr users use SolrCloud vs. Master-slave setup:
http://blog.sematext.com/2013/02/25/poll-solr-cloud-or-not/
I have to say I'm surprised with the
Lucene and Solr have an aggressive upgrade schedule. From 3 to 4 there was
a major rewiring, and parts are orders of magnitude faster and smaller.
If you code against Lucene, you will never upgrade to newer versions.
(I supported Solr & Lucene customers for 3 years, and nobody ever did.)
Cheers,
Lance
I don't have the source handy. I believe that SolrCloud hard-codes 'id'
as the field name for defining shards.
On 02/04/2013 10:19 AM, Shawn Heisey wrote:
On 2/4/2013 10:58 AM, Lance Norskog wrote:
A side problem here is text analyzers: the analyzers have changed how
they split apart text for searching, and are matched pairs. That is, the
queries are created to match what the analyzer did when
indexing. If you do this binary upgrade sequence, the indexed data will
not match what
Thanks, Kai!
About removing non-nouns: the OpenNLP patch includes two simple
TokenFilters for manipulating terms with payloads. The
FilterPayloadFilter lets you keep or remove terms with given payloads.
In the demo schema.xml, there is an example type that keeps only
nouns & verbs.
There is a
It is possible to do this with IP Multicast. The query goes out on the
multicast and all query servers read it. The servers wait for a random
amount of time, then transmit the answer. Here's the trick: it's
multicast. All of the query servers listen to each other's responses,
and drop out when
For this second report, it's easy: switching from a single query server
to a sharded query is going to be slower. Virtual machines add jitter to
the performance and response time of the front-end vs the query shards.
Distributed search does 2 round-trips for each sharded query. Add these
all up
Try all of the links under the collection name in the lower left-hand
columns. There are several administration and monitoring tools you may find
useful.
On 01/14/2013 11:45 AM, hassancrowdc wrote:
OK, stats are changing, so the data is indexed. But how can I query with
this data, or how can I search
Will a field have different names in different languages? There is no
facility for 'aliases' for field names. Erick is right, this sounds like
you need query and update components to implement this. Also, you might
try using URL-encoding for the field names. This would save my sanity.
On 01/10/
This example may be out of date, if the RSS feeds from Slashdot have
changed. If you know XML and XPaths, try this:
Find an RSS feed from somewhere that works. Compare the xpaths in it
vs. the xpaths in the DIH script.
On 01/13/2013 07:38 PM, bibhor wrote:
Hi
I am trying to use the RSS tutori
At this scale, your indexing job is prone to break in various ways.
If you want this to be reliable, it should be able to restart in the
middle of an upload, rather than starting over.
On 01/08/2013 10:19 PM, vijeshnair wrote:
Yes Shawn, the batchSize is -1 only and I also have the mergeSchedu
What does group.query do? How is it different from q= and fq=?
Thanks.
Please start new mail threads for new questions. This makes it much
easier to research old mail threads. Old mail is often the only
documentation for some problems.
On 01/02/2013 10:04 AM, Benjamin, Roy wrote:
Will the existing 3.6 indexes work with the 4.0 binary?
Will 3.6 SolrJ clients work wit
Also, searching can be much faster if you put all of the shards on one
machine, along with the search distributor. That way, you search with multiple
simultaneous threads inside one machine. I've seen this make searches
several times faster.
On 01/03/2013 06:36 AM, Jack Krupansky wrote:
Ah... the mul
You did not include the stack trace. Oops.
Try using fewer threads with the concurrent uploader, or use the
single-threaded one.
On 01/01/2013 03:55 PM, uwe72 wrote:
the problem occurs when I add a lot of values to a multivalued field. If I
add just a few, then it works.
this is the full sta
Indexes will not work. I have not heard of an index upgrader. If you run
your 3.6 and new 4.0 Solr at the same time, you can upload all the data
with a DataImportHandler script using the SolrEntityProcessor.
How large are your indexes? 4.1 indexes will not match 4.0, so you will
have to upload
3 problems:
a- he wanted to read it locally.
b- crawling the open web is imperfect.
c- /browse needs to get at the files with the same URL as the uploader.
a and b- Try downloading the whole thing with 'wget'. It has a 'make
links point to the downloaded files' option. Wget is great.
I have do
A Solr facet query does a boolean query, caches the Lucene facet data
structure, and uses it as a Lucene filter. After that, until you do a
full commit, using the same fq string (you must match the string
exactly) fetches the cached data structure and uses it again as a Lucene
filter.
Have you
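For example (the field name is illustrative), the first two requests below
share one cached filter, while the third, logically identical but textually
different, creates a new cache entry:

    /select?q=laptop&fq=category:books
    /select?q=phone&fq=category:books
    /select?q=phone&fq=category:(books)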
Cool!
On 12/25/2012 08:03 AM, Robert Muir wrote:
25 December 2012, Apache Solr™ 3.6.2 available
The Lucene PMC and Santa Claus are pleased to announce the release of
Apache Solr 3.6.2.
Solr is the popular, blazing fast open source enterprise search
platform from the Apache Lucene project. Its
Maybe you could write a JavaScript snippet that downloads and runs your
external file?
On 12/26/2012 09:12 AM, Dyer, James wrote:
I'm not very familiar with using scripting languages with Java, but having seen the
DIH code for this, my guess is that all script code needs to be in the section o
lighten me?
On Sunday, December 23, 2012, Lance Norskog wrote:
Please start a new thread.
Thanks!
On 12/22/2012 11:03 AM, J Mohamed Zahoor wrote:
Hi
I have a word completion requirement where I need to pick results from two
indexed fields.
The trick is I need to pick the top 5 results from each field and display
them as suggestions.
If I set fq as field1:XXX AND
The only sure way to get the last searchable document is to use a
timestamp or sequence number in the document. I do not think that using
a timestamp with default=NOW will give a unique timestamp, so you need
your own sequence number.
On 12/19/2012 10:17 PM, Joe wrote:
I'm using SOLR 4 for an
To be clear: 1) is fine. Lucene index updates are carefully sequenced so
that the index is never in a bogus state. All data files are written and
flushed to disk, then the segments.* files are written that match the
data files. You can capture the files with a set of hard links to create
a back
Do you use rounding in your dates? You can index a date rounded to the
nearest minute, N minutes, hour or day. This way a range query has to
look at such a small number of terms that you may not need to tune the
precision step. Hunt for NOW/DAY or 5DAYS in the queries.
http://wiki.apache.org/s
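For example, a rounded range vs. an unrounded one (the field name is
illustrative):

    fq=timestamp:[NOW/DAY-7DAYS TO NOW/DAY]   (rounded to days: few terms to enumerate)
    fq=timestamp:[NOW-7DAYS TO NOW]           (millisecond precision: many more terms)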
om code to build, save, and query the bitmap
whereas working on top of existing functionality seems to me a lot more
maintainable on the user's part.
~ David
From: Lance Norskog-2 [via Lucene] [ml-node+s472066n4025579...@n3.nabble.com]
Sent: Sunday, December 09,
0 for minX (always), 6 for minY (start time), 8 for maxX
>> > > (end time), and the largest possible value for maxY. You wouldn't
>> actually
>> > > use 6 & 8, you'd use the number of 15 minute intervals since your
>> epoch for
>> > > this equivalent time span.
>> > >
>> > > You'll need to configure the field correctly: geo="false"
>> worldBounds="0 0
>> > > maxTime maxTime" substituting an appropriate value for maxTime based on
>> > > your unit of time (number of 15 minute intervals you need) and
>> > > distErrPct="0" (full precision).
>> > >
>> > > Let me know how this works for you.
>> > >
>> > > ~ David
>> > > Author:
>> > > http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
>> > Author:
>> http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
Lance Norskog
goks...@gmail.com
Maybe these are text encoding markers?
- Original Message -
| From: "Eva Lacy"
| To: solr-user@lucene.apache.org
| Sent: Thursday, November 29, 2012 3:53:07 AM
| Subject: Re: Downloading files from the solr replication Handler
|
| I tried downloading them with my browser and also with a
matext.com/spm/index.html
| Search Analytics - http://sematext.com/search-analytics/index.html
|
|
|
|
| On Sat, Nov 24, 2012 at 9:30 PM, Lance Norskog
| wrote:
|
| > sagarzond- you are trying to embed a recommendation system into
| > search.
| > Recommendations are inherently a matrix
You don't need the transformers.
I think the paths should be what is in the XML file.
forEach="/add"
And the paths need to use the syntax for name="fname" and name="number". I
think this is it, but you should make sure.
xpath="/add/doc/field[@name='fname']"
xpath="/add/doc/field[@name='number
sagarzond- you are trying to embed a recommendation system into search.
Recommendations are inherently a matrix problem, where Solr and other search
engines are one-dimensional databases. What you have is a sparse user-product
matrix. This book has a good explanation of recommender systems:
Mah
| dataSource="null"
I think this should not be here. The datasource should default to the
listing. And 'rootEntity=true' should be in the
XPathEntityProcessor block, because you are adding each file as one document.
- Original Message -
| From: "Spadez"
| To: solr-user@lucene.apache.
I think this means the pattern did not match any files:
0
The wiki example includes a '^' at the beginning of the filename pattern. This
matches a complete line.
http://wiki.apache.org/solr/DataImportHandler#Transformers_Example
More:
Add rootEntity="true". It cannot hurt to be explicit.
The d
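A sketch of the entity under discussion; baseDir and the filename pattern
are illustrative:

    <entity name="files" processor="FileListEntityProcessor"
            baseDir="/data/feeds" fileName="^.*\.xml$" rootEntity="false">
      <!-- the inner XPathEntityProcessor entity goes here, with
           rootEntity="true", since each file produces the documents -->
    </entity>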
LucidFind collects several sources of information in one searchable archive:
http://find.searchhub.org/?q=&sort=#%2Fp%3Asolr
- Original Message -
| From: "Dmitry Kan"
| To: solr-user@lucene.apache.org
| Sent: Sunday, November 11, 2012 2:24:21 AM
| Subject: Re: More references for configu
You can debug this with the 'Analysis' page in the Solr UI. You pick
'text_general' and then give words with umlauts in the text box for indexing
and queries.
Lance
- Original Message -
| From: "Daniel Brügge"
| To: solr-user@lucene.apache.org
| Sent: Wednesday, November 7, 2012 8:45:4
The question you meant to ask is: "Does MoreLikeThis support Distributed
Search?" and the answer apparently is no. This is the issue to get it working:
https://issues.apache.org/jira/browse/SOLR-788
("Distributed Search" is independent of SolrCloud.) If you want to make unit
tests, that would r
LucidFind is a searchable archive of Solr documentation and email lists:
http://find.searchhub.org/?q=solrcloud
- Original Message -
| From: "Jack Krupansky"
| To: solr-user@lucene.apache.org
| Sent: Monday, November 5, 2012 4:44:46 AM
| Subject: Re: Where to get more documents or refere
What is the Maven repo id & version for this?
- Original Message -
| From: "Mark Miller"
| To: solr-user@lucene.apache.org
| Sent: Friday, November 2, 2012 6:52:10 PM
| Subject: Re: trouble instantiating CloudSolrServer
|
| I think the maven jars must be out of whack?
|
| On Fri, Nov 2,
t. You might try logging the value somewhere
| in
| your PHP so you can post that and/or include it in your sample XML
| file...
|
| Best
| Erick
|
|
| On Fri, Nov 2, 2012 at 10:02 AM, Dotan Cohen
| wrote:
|
| > On Thu, Nov 1, 2012 at 9:28 PM, Lance Norskog
| > wrote:
| > > Have y
Have you uploaded data with that field populated? Solr is not like a relational
database. It does not automatically populate a new field when you add it to the
schema. If you sort on a field, a document with no data in that field comes
first or last (I don't know which).
- Original Message
1) Do you use compound files (CFS)? This adds a lot of overhead to merging.
2) Does ES use the same merge policy code as Solr?
In solrconfig.xml, here are the lines that control segment merging. You can
probably set mergeFactor to 20 and cut the amount of disk I/O.
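A sketch of the relevant solrconfig.xml lines (values are illustrative):

    <indexConfig>
      <useCompoundFile>false</useCompoundFile>
      <mergeFactor>20</mergeFactor>
    </indexConfig>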
Aha! Andrzej has not built a 4.0 release version. You need to check out the
source and compile your own.
http://code.google.com/p/luke/downloads/list
- Original Message -
| From: "Carrie Coy"
| To: solr-user@lucene.apache.org
| Sent: Friday, October 26, 2012 7:33:45 AM
| Subject: lukeal
nts that it's hard to know what the best
| facets are for the result set.
|
| Erik
|
|
| On Oct 27, 2012, at 04:09 , Lance Norskog wrote:
|
| > Nope! Each document comes back with its own list of stored fields.
| > If you want to find all fields in an index, you have to fetch
| > every l
rstand the real question here. What is the
| "metadata".
|
| I mean, q=x&fl=* gives you all the (stored) fields for documents
| matching
| the query.
|
| What else is there?
|
| -- Jack Krupansky
|
| -Original Message-
| From: Lance Norskog
| Sent: Friday, October 26,
A side point: in fact, the connection between MBA and grade is not lost. The
values in a multi-valued field are stored in order. You can have separate
multi-valued fields with matching entries, and the values will be fetched in
order and you can match them by counting. This is not database-ish,
Ah, there's the problem- what is a fast way to fetch all fields in a
collection, including dynamic fields?
- Original Message -
| From: "Otis Gospodnetic"
| To: solr-user@lucene.apache.org
| Sent: Friday, October 26, 2012 3:05:04 PM
| Subject: Re: Get metadata for query
|
| Hi,
|
| No.
ira/browse/SOLR-2141) which goes back to
| October 2010 and is flagged as "Resolved: Cannot Reproduce".
|
|
| 2012/10/20 Lance Norskog :
| > If it worked before and does not work now, I don't think you are
| > doing anything wrong :)
| >
| > Do you have a different versi
If it worked before and does not work now, I don't think you are doing anything
wrong :)
Do you have a different version of your JDBC driver?
Can you make a unit test with a minimal DIH script and schema?
Or, scan through all of the JIRA issues against the DIH from your old Solr
capture date.
Do other fields get added?
Do these fields have type problems? I.e. is 'attr1' a number and you are adding
a string?
There is a logging EP that I think shows the data found - I don't know how
to use it.
Is it possible to post the whole DIH script?
- Original Message -
| From: "Billy Newma
I do not know how to load an index from disk into a RAMDirectory in Solr.
- Original Message -
| From: "deniz"
| To: solr-user@lucene.apache.org
| Sent: Wednesday, October 17, 2012 12:15:52 AM
| Subject: Re: Flushing RAM to disk
|
| I heard about MMapDirectory - actually my test env is u
There is no "backed by disk" RamDirectory feature. The MMapDirectory uses the
operating system to do almost exactly the same thing, in a much better way.
That is why it is the default.
- Original Message -
| From: "deniz"
| To: solr-user@lucene.apache.org
| Sent: Tuesday, October 16, 20
CheckIndex prints these stats.
java -cp lucene-core-WHATEVER.jar org.apache.lucene.index.CheckIndex /path/to/index
- Original Message -
| From: "Shawn Heisey"
| To: solr-user@lucene.apache.org
| Sent: Monday, October 15, 2012 9:46:33 PM
| Subject: Re: How many documents in each Lucene segment?
|
| On
http://find.searchhub.org/?q=autosuggest+OR+autocomplete
- Original Message -
| From: "Rahul Paul"
| To: solr-user@lucene.apache.org
| Sent: Monday, October 15, 2012 9:01:14 PM
| Subject: Solr Autocomplete
|
| Hi,
| I am using MySQL for indexing data in Solr. I have two fields:
| "n
a query) to the slave.
--
Lance Norskog
goks...@gmail.com
WHERE c.source_id = ${blog.id}">
> clob="true" />
>
> Is that possible?
> How would the result look (can't test it until Monday)?
> In all examples I found, the "sub" entity just selects 1 column, but I need
> at least 2.
--
Lance Norskog
goks...@gmail.com
>> [ivy:resolve] ::
>> [ivy:resolve] ::
>> org.eclipse.jetty.orbit#javax.servlet;3.0.0.v201112011016!javax.servlet.orbit
>> [ivy:resolve] ::::::
>> [ivy:resolve]
>> [ivy:resolve]
>> [ivy:resolve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
>>
>> Can anybody point me to the source of this error or a workaround?
>>
>> Thanks,
>> Tricia
--
Lance Norskog
goks...@gmail.com
Study index merging. This is awesome.
http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
Jame- opening lots of segments is not a problem. A major performance problem
you will find is 'Large Pages'. This is an operating-system strategy for
managing servers with 10s of
Hapax legomena (terms with DF of 1) are very often typos. You can automatically
build a stopword file from these. If you want to be picky, you can use only
words with a very small distance from words with much larger DF.
- Original Message -
| From: "Robert Muir"
| To: solr-user@lucene.
I want an update processor that runs Translation Party.
http://translationparty.com/
http://downloadsquad.switched.com/2009/08/14/translation-party-achieves-hilarious-results-using-google-transl/
- Original Message -
| From: "SUJIT PAL"
| To: solr-user@lucene.apache.org
| Sent: Wednesda
Cool!
Who made the logo? It's nice.
- Original Message -
| From: "Tomás Fernández Löbbe"
| To: solr-user@lucene.apache.org
| Sent: Wednesday, October 10, 2012 3:57:32 PM
| Subject: [ANN] new SolrMeter release
|
| Hi everyone, I'm pleased to announce that SolrMeter 0.3.0 was
| released