...employing something like LSH clustering.
>
> On Thu, Nov 24, 2011 at 5:04 PM, Fred Zimmerman wrote:
>
> > I have a corpus that has a lot of identical or nearly identical
> > documents.
> > I'd like to return only the unique ones (excluding the "nearly identical"
I have a corpus that has a lot of identical or nearly identical documents.
I'd like to return only the unique ones (excluding the "nearly identical"
which are redirects). I notice that all the identical/nearly-identical
documents have identical Solr scores. How can I tell Solr to throw out all
but one of each set of duplicates?
Any options that do not require adding new software?
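One option that ships with Solr itself is the deduplication update
processor. A minimal solrconfig.xml sketch, assuming title and text are the
fields that make two documents "the same" (adjust to the real schema):

    <updateRequestProcessorChain name="dedupe">
      <processor class="solr.processor.SignatureUpdateProcessorFactory">
        <bool name="enabled">true</bool>
        <str name="signatureField">signature</str>
        <!-- overwriteDupes=true keeps only one document per signature -->
        <bool name="overwriteDupes">true</bool>
        <!-- TextProfileSignature is fuzzy, so near-duplicates collapse too -->
        <str name="signatureClass">solr.processor.TextProfileSignature</str>
        <str name="fields">title,text</str>
      </processor>
      <processor class="solr.LogUpdateProcessorFactory" />
      <processor class="solr.RunUpdateProcessorFactory" />
    </updateRequestProcessorChain>

The signature field has to exist as an indexed string field in schema.xml,
and the /update handler needs update.chain=dedupe so the chain actually runs.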
On Mon, Nov 7, 2011 at 11:11 AM, Nagendra Nagarajayya <
nnagaraja...@transaxtions.com> wrote:
> Shaun:
>
> You should try the NRT support available in Solr with RankingAlgorithm. You
> should be able to add docs in real time and also query them in real time.
> If you're crawling the data by yourself, you can just add the source
> to the document.
>
> If you're using DIH, you can specify the field as a constant. Or you
> could implement a custom Transformer that inserted it for you.
>
> Best
> Erick
>
> On Wed, Nov
I want to be able to limit some searches to particular sources, e.g. "wiki
only", "crawled only", etc. So I think I need to create a source field in
the schema.xml. However, the native data for these sources does not
contain source info (e.g. "crawled"). So I want to use (I think) a
transformer to add a string value at index time.
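For what it's worth, a sketch of the two pieces, with DIH's
TemplateTransformer stamping a constant; the field name "source" and the
value "crawled" are just illustrations:

    <!-- schema.xml -->
    <field name="source" type="string" indexed="true" stored="true" />

    <!-- DIH config: every doc from this entity gets source=crawled -->
    <entity name="page" transformer="TemplateTransformer" ...>
      <field column="source" template="crawled" />
    </entity>

Searches can then be restricted per source with a filter query, e.g.
&fq=source:crawled.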
I have a lot of fields: I count 31 without omitNorms values, which means
false by default.
Gak! 11,000,000 docs * 1 byte per norm * 31 fields = 341,000,000 bytes, or
roughly 340MB of RAM all by itself.
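If most of those fields don't need length normalization or index-time boosts
(IDs, dates, and plain string facets usually don't), norms can be switched
off per field in schema.xml; a sketch with a made-up field name:

    <field name="category" type="string" indexed="true" stored="true"
           omitNorms="true" />

Each field with norms enabled costs one byte per document, hence the
multiplication above.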
On Wed, Oct 26, 2011 at 1:01 PM, Fred Zimmerman wrote:
> More on what's happening. It seems to be timing out during the commit.
> ...on=2} hits=11576871 status=0 QTime=1
> *java.lang.OutOfMemoryError: Java heap space*
> Dumping heap to /home/bitnami/apache-solr-3.4.0/example/heaplog ...
> Heap dump file created [306866344 bytes in 32.376 secs]
On Wed, Oct 26, 2011 at 11:09 AM, Fred Zimmerman wrote
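The immediate workaround for an OOM during commit is more heap for the JVM
that runs Solr; a sketch (the 2g figure is an arbitrary example, size it to
the machine):

    java -Xmx2g -jar start.jar

If the box can't spare the memory, the omitNorms trimming discussed above is
the structural fix.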
It's a small indexing job coming from nutch.
2011-10-26 15:07:29,039 WARN mapred.LocalJobRunner - job_local_0011
java.io.IOException: org.apache.solr.client.solrj.SolrServerException: Error executi...
        at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat.getRec...
        at o...
It is not a multi-core setup. The solr.xml has null value for . ?
HTTP ERROR 404
Problem accessing /solr/admin/index.jsp. Reason:
missing core name in path
2011-10-26 13:40:21.182:WARN::/solr/admin/
java.lang.IllegalStateException: STREAM
at org.mortbay.jetty.Response.getWriter(Re
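For what it's worth, "missing core name in path" means Jetty found a
multi-core style solr.xml, in which case the core name belongs in the URL; a
minimal sketch (core name is illustrative):

    <solr persistent="true">
      <cores adminPath="/admin/cores">
        <core name="core0" instanceDir="core0" />
      </cores>
    </solr>

With that layout the admin UI lives at /solr/core0/admin/ rather than
/solr/admin/.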
what about something that's a bit less discovery-oriented? for my particular
application I am most concerned with bringing back a straightforward "top
ten" answer set and having users look at it. I actually don't want to bother
them with faceting, etc. at this juncture.
Fred
On Tue, Oct 25, 2011
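That is easy to get with plain query parameters, no faceting involved; a
hedged example (field names assumed):

    http://localhost:8983/solr/select?q=battleship&rows=10&fl=title,url,score&facet=false

rows=10 caps the answer set, and fl keeps the payload down to what users
actually see.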
So, basically, yes, it is a real problem and there is no designed solution?
e.g. optional sub-schema files that can be turned off and on?
On Sun, Oct 23, 2011 at 6:38 PM, Erik Hatcher wrote:
>
> On Oct 23, 2011, at 19:34 , Fred Zimmerman wrote:
> > it seems from my limited experie
Hi,
it seems from my limited experience thus far that as new data types are
added, schema.xml will tend to become bloated with many different field and
fieldtype definitions. Is this a problem in real life, and if so, what
strategies are used to address it?
FredZ
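One strategy that needs no extra software is splitting the schema with
XInclude, which Solr's XML config loader understands (worth a quick test on
a given version); a sketch, where fieldtype-text.xml is a hypothetical file
whose root element is a single <fieldType>:

    <types>
      <xi:include href="fieldtype-text.xml"
                  xmlns:xi="http://www.w3.org/2001/XInclude" />
      ...
    </types>

Commenting one xi:include in or out then turns a whole sub-schema file on or
off.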
Offhand, it looks as though you're trying to do something
> with DIH that it wasn't intended to do. But that's just a guess
> since the details of what you're trying to do are so sparse...
>
> Best
> Erick
>
> On Wed, Oct 19, 2011 at 10:49 PM, Fred Zimmerman
http://business.zimzaz.com/wordpress/2011/10/how-to-clone-wikipedia-mirror-and-index-wikipedia-with-solr/
Solr dataimport is reporting file not found when it looks for foo.xml.
Where is it looking for /data? Is this a URL off the apache2/htdocs on the
server, or is it a path within example/solr/...?
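In case it helps: with DIH's FileDataSource, the url attribute on the entity
is read from the local filesystem (relative paths resolve against the
dataSource's basePath), not from the web server's docroot, so an absolute
path is the safest bet; a sketch with an illustrative path:

    <dataSource type="FileDataSource" encoding="UTF-8" />
    <entity name="page" processor="XPathEntityProcessor"
            url="/home/bitnami/data/foo.xml" ... />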
dumb question ...
today I set up solr3.4/example, indexing to 8983 via post is working, so is
search; solr/dataimport reports 0 / 0 / 0 and then:
2011-10-19 18:13:57
Indexing failed. Rolled back all changes.
Google tells me to look at the exception logs to find out what's happening
... but I can't find the logs.
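A hedged suggestion: the example Jetty setup logs exceptions to stdout, so
if Solr was started in a terminal they scrolled by there; otherwise capture
them at startup, e.g.:

    cd apache-solr-3.4.0/example
    java -jar start.jar > solr.log 2>&1 &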
Hi,
I am getting ready to index a recent copy of Wikipedia's pages-articles
dump. I have two servers, foo and bar. On foo.com/mediawiki I have a
Mediawiki install serving up the pages. On bar.com/solr I have my solr
install. I have the pages-articles.xml file from Wikipedia and the solr
instruct
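For reference, the DataImportHandler recipe for a MediaWiki dump looks
roughly like this (a sketch along the lines of the example on the Solr wiki;
the file path is illustrative):

    <dataConfig>
      <dataSource type="FileDataSource" encoding="UTF-8" />
      <document>
        <entity name="page" processor="XPathEntityProcessor" stream="true"
                forEach="/mediawiki/page/"
                url="/data/pages-articles.xml">
          <field column="id"    xpath="/mediawiki/page/id" />
          <field column="title" xpath="/mediawiki/page/title" />
          <field column="text"  xpath="/mediawiki/page/revision/text" />
        </entity>
      </document>
    </dataConfig>

stream="true" matters here: pages-articles.xml is far too large to parse
into memory in one piece.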
Hi,
I want to include the search query in the output of wt=csv (or a duplicate
of it) so that the process that receives this output can do something with
the search terms. How would I accomplish this?
Fred
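As far as I can tell, wt=csv returns only the documents (the response
header, and with it any echoed params, is not rendered), so one workaround
is to have the fetching script prepend the query itself; a shell sketch with
illustrative URL and fields:

    Q="battleship"
    ( echo "query,$Q"
      curl -s "http://localhost:8983/solr/select?q=$Q&fl=title,url&wt=csv"
    ) > results.csv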
I did this
bin/nutch crawl urls -solr http://localhost:8983/solr/ -depth 3 -topN 5
per http://wiki.apache.org/nutch/NutchTutorial
On Fri, Oct 7, 2011 at 13:36, Andy Lindeman wrote:
> On Fri, Oct 7, 2011 at 13:32, Fred Zimmerman wrote:
> > I am running a big nutch job which is su
I am running a big nutch job which is supposed to be sending information to
solr for indexing, but it does not seem to be occurring. the number of docs
and max docs in solr statistics is not changing. how can I figure out what's
happening here?
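One way to narrow it down is to run the indexing step on its own rather than
inside the all-in-one crawl command, watching logs/hadoop.log while it runs;
a sketch assuming the usual tutorial directory layout:

    bin/nutch solrindex http://localhost:8983/solr/ crawl/crawldb \
        crawl/linkdb crawl/segments/*

If that command errors, the stack trace pinpoints whether the problem is on
the Nutch side or the Solr side.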
probably can't help, but pls keep the topic on list, as it is important for
me too!
On Wed, Oct 5, 2011 at 14:12, FionaY wrote:
> We have Solr integrated, but we are having some issues with search
> relevance
> and we need some help fine tuning the search results. Anyone think they can
> help?
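In case a concrete starting point helps: relevance tuning in Solr usually
begins with the dismax query parser and per-field boosts; a hedged example
where the field names and boost values are placeholders to experiment with:

    http://localhost:8983/solr/select?defType=dismax&q=battleship&qf=title^3+body&pf=title^5

qf weights matches per field, and pf adds a phrase boost when the whole
query appears together.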
Hi,
I followed the very simple instructions found at
http://wiki.apache.org/solr/Flare/HowTo
but run into a problem at step 4
Launch Solr:
cd ; java -Dsolr.solr.home= -jar start.jar
where Solr complains that it can't find solrconfig.xml in either the
classpath or the solr-ruby home dir. Can
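The placeholders in that launch step appear to have been eaten by the wiki
formatting; presumably it means something like the following, where both
paths are guesses to adapt, and solr.solr.home must point at a directory
containing conf/solrconfig.xml:

    cd /path/to/solr/example
    java -Dsolr.solr.home=/path/to/flare/solr -jar start.jar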
Hi,
for my application, I would like to be able to create web queries
(wget/curl) that get "more like this" for either a single arbitrarily
specified URL or for the first x terms in a search query. I want to return
the results to myself as a csv file using wt=csv. How can I accomplish the
MLT piece?
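A sketch of both flavors, using the standard MoreLikeThis parameters (field
names assumed). The dedicated handler plays best with wt=csv, because the
similar documents come back as the main result list; it needs one line in
solrconfig.xml:

    <requestHandler name="/mlt" class="solr.MoreLikeThisHandler" />

    # MLT for one known document, results straight to CSV
    curl "http://localhost:8983/solr/mlt?q=id:12345&mlt.fl=text&rows=10&wt=csv"

    # MLT bolted onto a normal search (top hits act as seeds); note the
    # moreLikeThis sections are NOT rendered by the CSV writer
    curl "http://localhost:8983/solr/select?q=battleship&mlt=true&mlt.fl=text&mlt.count=5"

id:12345 is a placeholder; any query that pins down the single seed document
works.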
got it.
curl "
http://zimzazsearch3-1.bitnamiapp.com:8983/solr/select/?indent=on&q=video&fl=name,id&wt=csv";
works like a champ.
On Tue, Oct 4, 2011 at 15:35, Fred Zimmerman wrote:
> This http request works as desired (bringing back a csv file)
>
>
> htt
This http request works as desired (bringing back a csv file)
http://zimzazsearch3-1.bitnamiapp.com:8983/solr/select?indent=on&version=2.2&q=battleship&wt=csv&;
but the same URL submitted via wget produces the 500 error reproduced below.
I want the wget to download the csv file. What's going on?
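Almost certainly the unquoted ampersands: the shell splits the command at
each & and backgrounds it, so Solr only ever sees the URL up to the first &.
Quoting the whole URL, as in the curl that worked, fixes it:

    wget -O results.csv "http://zimzazsearch3-1.bitnamiapp.com:8983/solr/select?indent=on&version=2.2&q=battleship&wt=csv"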
wrote:
> conf/velocity by default. See Solr's example configuration.
>
> Erik
>
> On Sep 23, 2011, at 12:37, Fred Zimmerman wrote:
>
> > ok, answered my own question, found velocity rw in solrconfig.xml. next
> > question:
> >
>
11:57, Fred Zimmerman wrote:
> This seems to be out of date. I am running Solr 3.4
>
> * the file structure of apachehome/contrib is different and I don't see
> velocity anywhere underneath
> * the page referenced below only talks about Solr 1.4 and 4.0
>
> ?
>
This seems to be out of date. I am running Solr 3.4
* the file structure of apachehome/contrib is different and I don't see
velocity anywhere underneath
* the page referenced below only talks about Solr 1.4 and 4.0
?
On Thu, Sep 22, 2011 at 19:51, Markus Jelsma wrote:
> Hi,
>
> Solr supports the
can you say a bit more about this? I see Velocity and will download it and
start playing around but I am not quite sure I understand all the steps that
you are suggesting. Fred
On Thu, Sep 22, 2011 at 19:51, Markus Jelsma wrote:
> Hi,
>
> Solr supports the Velocity template engine and has very g
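For anyone following along: in 3.x the Velocity response writer lives under
contrib/velocity, its templates sit in conf/velocity (as Erik notes above),
and using it is a matter of declaring the writer and passing the v.*
parameters; a sketch, template names borrowed from the example app:

    <queryResponseWriter name="velocity" class="solr.VelocityResponseWriter" />

    http://localhost:8983/solr/select?q=video&wt=velocity&v.template=browse&v.layout=layout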
Hi,
I would like to take the HTML documents that are the result of a Solr search
and combine them into a single HTML document that combines the body text of
each individual document. What is a good strategy for this? I am crawling
with Nutch and Carrot2 for clustering.
Fred
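A crude way to do the combining outside Solr, assuming the body text is
stored in a field called body: pull only that field back as CSV and wrap it
in a page; a shell sketch:

    ( echo "<html><body>"
      curl -s "http://localhost:8983/solr/select?q=battleship&fl=body&rows=50&wt=csv" \
        | tail -n +2        # drop the CSV header row
      echo "</body></html>"
    ) > combined.html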