Faffing round the edges Its easier than getting stuck in

15Jun/090

Collective Intelligence in Erlang

It has been a long, long tie since I posted. I have a fair few more Project Euler solutions to post, and I will post them soon. Work has got in the way of my Erlanging. I'm gutted. I'd ditch Java in a flash for paid Erlang work.

Anyway. I start reading Programming Collective Intelligence, 1st Edition and I am enjoying it. I decided (er, for fun) to work along with the code and re-write it in Erlang.

Here is the first stuff (upto chapter 2.4)

I know I need to learn how to write @spec Edoc declarations.

So you can try it with:

recomendations:top_matches(recomendations:data(), "Toby",
    fun(X, Y, Z) -> recomendations:pearson_similarity(X, Y, Z) end).
28Mar/090

Euler 22 – Name scores in a list

It has been a while since I Euler'd but I have been mega depressed what with working in Sheffield and trying to retro-fit web security as an after thought to a web app that I was thrown at last minute...so at home, to cheer up, it is Euler time again.

I've been looking at Nitrogen a lot too. Anyhoo, the problem:

Using names.txt (right click and ‘Save Link/Target As…’), a 46K text file containing over five-thousand first names, begin by sorting it into alphabetical order. Then working out the alphabetical value for each name, multiply this value by its alphabetical position in the list to obtain a name score.

For example, when the list is sorted into alphabetical order, COLIN, which is worth 3 + 15 + 12 + 9 + 14 = 53, is the 938th name in the list. So, COLIN would obtain a score of 938 × 53 = 49714.

What is the total of all the name scores in the file?

I don't think this is a very mathy problem, just a programming one. I edited the file a little so I could use erlang's file:consult/1 which allows you to read a file as a set of erlang terms. I merely inserted "{names, [" at the start and "]}." (full stop =:= very important!) at the end of the file of names.

With erlang treating strings as lists it is very easy to calculate the score for a name. I used lists:foldl/3 on each name in turn after sorting the list of names. For once I find erlangs handling of strings very, very useful. The code is below.

%%%%%%
% Prob 22
% Each letter has the value ASCII code value - 64 so A = 65 - 64 so A = 1
%%%%%
problem22() ->
	%%Read the file
	{ok, Tup} = file:consult("names.txt"),
	%%Get the name list
	[{names, L}] = Tup,
	%%Sort it
	Names = lists:sort(L),
	calculate_values(Names, 1, 0).

%% Calulate the values for the names in the file
%% list of names, current iteration, current accumulated value
calculate_values([], _Iteration, Total) -> Total;
calculate_values([Name|Names], Iteration, Total) ->
	calculate_values(Names, Iteration+1, Total + calculate_value(Name, Iteration)).

%%Calulate the value for a single name (That is sum of letter scores (EG A=1) * this names position in the list of names (Iter))
calculate_value(Name, Iteration) ->
	NameScore =lists:foldl(fun(X, Acc) -> Acc + (X - 64) end, 0, Name),
	NameScore * Iteration.

It is cheating since I used the lists:sort/1 function of erlang. When I have more time I'll implement the sort too, since that would seem to be the point of the exercise.

28Dec/080

Problem 14: Find the longest sequence

The problem is:

The following iterative sequence is defined for the set of positive integers:

n n/2 (n is even)
n 3n + 1 (n is odd)

Using the rule above and starting with 13, we generate the following sequence:
13 40 20 10 5 16 8 4 2 1
It can be seen that this sequence (starting at 13 and finishing at 1) contains 10 terms. Although it has not been proved yet (Collatz Problem), it is thought that all starting numbers finish at 1.

Which starting number, under one million, produces the longest chain?

NOTE: Once the chain starts the terms are allowed to go above one million.

I just brute forced it. I did take a guess that the number would be in the 500k to 1m range. It takes about 40 seconds. It runs in about 16 seconds if I use only odd numbers between 500001 and 1m. Is there a better way?

Anyway, the code:

%%
% Problem 14 (a bit slow (40 seconds ish) maybe I could add a cache??)
% Note1: optimised after reading the forum to only use the odd numbers
%%
prob14() ->
	L = lists:seq(500001, 1000000, 2),
	max(lists:map(fun(X) -> {X, length(get_next([X]))} end, L), {0, 0}).

max([], Max) -> Max;
max([{_X, Len}=H|T], {_MaxX, MaxLen}=Max) ->
	case Len > MaxLen of
		true ->
			max(T, H);
		false ->
			max(T, Max)
	end.

get_next([H|T]=L) when H rem 2 =:= 0 ->
	get_next([H div 2|L]);
get_next([H|T]=L) when H =:= 1 ->
	L;
get_next([H|T]=L) ->
	get_next([(3 * H) + 1|L]).

Some people on the forum are getting there results in ~1-5 seconds which is a whole load faster than mine. Some are doing that by using a cache and others by using what appears to be faster implementations. I guess my Erlang solution is slow because I don't know the right way to write Erlang.

I also think (for learning purposes) I'll try and do a multi-process version of this solution since it decomposes well into that idiom.

28Dec/080

Problem 13: Work out the first ten digits of the sum of the following one-hundred 50-digit numbers.

This is not a tough problem for lisp or erlang. Some bright spark in the forum simply states that he cut and paste the list and wrapped it with (+ ) and had the result. I think the size of the numbers presents problems for some languages and the optimization is to only sum the first eleven digits of the long numbers since we only need the first 10 digits of the result.

I am lazy so I went the cut and paste option...

prob13() ->
    BigNumbers = [ELIDED],
    lists:sublist(integer_to_list(lists:sum(BigNumbers)), 1, 10).

So sum the list and then take the first 10 digits of the result. If you want to see the list of numbers they are here.

28Dec/080

Problem 20: Sum of the digits in 100!

This is so similar to problem 16. The problem is:

n! means n (n 1) ... 3 2 1

Find the sum of the digits in the number 100!

Which is cool 'cos I have learned what n! means. My code is:

prob20(N) ->
	L = lists:seq(N, 1, -1),
	NBang = lists:foldl(fun(X, Acc) -> Acc * X end, 1, L),
	sum_the_digits(NBang).

sum_the_digits(N) ->
	L = integer_to_list(N),
	lists:foldl(fun(X, Acc) -> Acc + list_to_integer([X]) end, 0, L).

I reuse the sum_the_digits function from problem 16. These recent problems have caused me nothing like the trouble of the first 10. Either I am over some initial erlang/fp/maths hump or they admins have the difficulty order wrong.

Oh yeah. I find that I am using lists:foldl all the time.

28Dec/080

Problem 16: Sum of digits

I realised that you can sort the Project Euler problems by difficulty. So I did. Which is why the order gets a bit odd from now onwards. I have no time to do Euler problems so I am cherry picking what I can fit into the spare 5 and 10 minute slots I have. So...problem 16:

15 = 32768 and the sum of its digits is 3 + 2 + 7 + 6 + 8 = 26.

What is the sum of the digits of the number 21000?

Not a lot to say really so just the code.

prob16(N, Exp) ->
	sum_the_digits(round(math:pow(N, Exp))).

sum_the_digits(N) ->
	L = integer_to_list(N),
	lists:foldl(fun(X, Acc) -> Acc + list_to_integer([X]) end, 0, L).

The second function (integer_to_list) turns a number into a list of its digits and then folds the list to sum the digits. Can't just use lists:sum as each list element is now a "String" so needs to be turned back into an integer with list_to_integer([X]). This reeks of hack so if anyone knows a better way please let me know.

28Dec/080

Problem 10: Sum of the Primes

I haven't posted for a while. I'm contracting for a VoD producing company pathologically averse to normal quality measures which has taken all (ALL) my time.

Anyway, the problem:

The sum of the primes below 10 is 2 + 3 + 5 + 7 = 17.

Find the sum of all the primes below two million.

Since I already have a functional prime checker from problem 3 so the code is just:

prob10_alt(N) ->
	Primes = [A || A <- lists:seq(3, N, 2), is_prime2(A)],
	lists:sum([2|Primes]).

I know this isn't the Sieve of Eratosthenes but I had a crack at a sieve implmentation and it was pitifully slow compared to the above. I know it is not optimal but it is adequate. I did learn about The Sieve due to this problem so it was of benefit. I have also read and implemented a Java version of The Sieve (it really suits iteration) so education has occurred.

In case anyone reads this and can help me with my Sieve versions here it is:

prob10_sieve2(N) ->
	Limit = floor(math:sqrt(N)),
	Cans = array:new(N, {default, true}),
	Cans2 = seive_it(4, N, 2, Cans),
	Primes = seive2(Cans2, Limit),
	array:foldl(fun(X, Val, Acc) -> arr_fold_fun(X, Val, Acc) end, 0, Primes) - 1.

arr_fold_fun(X, Val, Acc) ->
	case Val of
		true ->
			X + Acc;
		false ->
			Acc
		end.

seive2(Array, Limit) ->
	for(3, Limit, 2, Array).

for(Start, Limit, _Step, Array) when Start >= Limit -> Array;
for(Start, Limit, Step, Array) ->
	case array:get(Start, Array) of
		true ->
			for(Start+Step, Limit, Step,
                             seive_it(Start*Start, array:size(Array), 2*Start, Array));
		false ->
			for(Start+Step, Limit, Step, Array)
		end.

seive_it(Start, Limit, _Step, Array) when Start >= Limit -> Array;
seive_it(Start, Limit, Step, Array) ->
	seive_it(Start+Step, Limit, Step, array:set(Start, false, Array)).
28Nov/080

Problem 1 again

So I read a comment on reddit about how bad my maths was (and how badly I abuse Erlang too). I started reading Algorithms in a Nutshell as I figured I might have enough maths now to understand it. Well I don't really but I did understand the Big O notation stuff at the start enough to understand the comment on reddit:

Ah, what a misuse of Erlang, especially considering that all the 3 programs presented are all O(n) far from the optimal O(1). The guy should learn some math rather than experimenting with 'creating a process for every number'. His next post about problem 2 is not much better (he does 'find the sum of fibs below 4mln' by evaluating fib(n) from scratch until it becomes bigger than 4mln), but at least there he confesses: 'I am really let down by my lack of maths at school.'

So I read the PDF again from Project Euler solution (which I earned the right to read by misusing myself to a solution).

I think I understand the optimal solution well enough, and I was certainly able to implement it:

problem1(Target) ->
	sumDivBy(3, Target) + sumDivBy(5, Target) - sumDivBy(15, Target).

sumDivBy(N, Target) ->
	P = Target div N,
	N * (P * (P+1)) div 2.

And that is much better.

Now I'm not tooooo cut up by the comments (not ready yet). I didn't get any maths education beyond 16, and what I got up to then was rubbish but at least I am trying to rectify it. My parallel processing attempt was educational for me. I can never do anything without a practical motivation.

The problem is: I understand how the above solution works and I can see that it is optimal but without being taught it I doubt I would have got anywhere near it on my own.

Tagged as: , No Comments
24Nov/080

Problem 9: Pythagorean triplets.

Project Euler problem 9 is back in proper maths mode with this beauty:

A Pythagorean triplet is a set of three natural numbers, a b c, for which,
a2 + b2 = c2
For example, 32 + 42 = 9 + 16 = 25 = 52.

There exists exactly one Pythagorean triplet for which a + b + c = 1000.
Find the product abc.

Wow. Numbers are strange. Their relationships inscrutable and powerful. Euclid sorted this one out some time ago so I just stood on his giant shoulders (well, Wikipedia's actually) and had this for generating triplets:

generate_triple(M, N) ->
	A = 2 * (M*N),
	B = (M*M) - (N*N),
	C = (M*M) + (N*N),
	{A, B, C}.

Now, my friend Nick at work managed to express C in terms of A and B and save himself the trouble of the above but he is smarter and better educated than me (though I doubt he can do a short 21 for a 10 mile TT so I can comfort myself there).

The rest of the work is more brute force but Erlang does the heavy lifting for me:

prob9() ->
	L = [generate_triple(M, N) || N <- lists:seq(1,100), M <- lists:seq(2, 100), M > N],
	[{A,B,C}] = lists:filter(fun({A,B,C}) -> A+B+C =:= 1000 end, L),
	A*B*C.

I love the second line. The power of pattern matching is clear and it creates a readable, uncluttered way to get at the values of A, B and C. The question "...There exists exactly one..." assures me that [{A,B,C}] will match.

Again, Nick's Java effort is more optimal (he doesn't do the 2 loops (he needs a blog I guess)).

24Nov/080

Euler 8: Text handling is not a strong suit

Problem 8 in Project Euler is not so much a maths problem as a text manipulation one (isn't it?):

Find the greatest product of five consecutive digits in the 1000-digit number.

73167176531330624919225119674426574742355349194934
96983520312774506326239578318016984801869478851843
85861560789112949495459501737958331952853208805511
12540698747158523863050715693290963295227443043557
66896648950445244523161731856403098711121722383113
62229893423380308135336276614282806444486645238749
30358907296290491560440772390713810515859307960866
70172427121883998797908792274921901699720888093776
65727333001053367881220235421809751254540594752243
52584907711670556013604839586446706324415722155397
53697817977846174064955149290862569321978468622482
83972241375657056057490261407972968652414535100474
82166370484403199890008895243450658541227588666881
16427171479924442928230863465674813919123162824586
17866458359124566529476545682848912883142607690042
24219022671055626321111109370544217506941658960408
07198403850962455444362981230987879927244284909188
84580156166097919133875499200524063689912560717606
05886116467109405077541002256983155200055935729725
71636269561882670428252483600823257530420752963450

Again it just makes sense to me to brute force it. That is try every sequence of 5 digits numbers and chose the highest product. To whit:

prob8() ->
	BigNum = "7316717653133062491922511967442657474235534....(elided)",
	calculate_prod(5, BigNum, 1, []).

calculate_prod(Bite, BigNum, Start, Products) when Start > (length(BigNum) - Bite)
	-> lists:max(Products);
calculate_prod(Bite, BigNum, Start, Products) ->
	Prod = lists:foldl(fun(X, Acc) -> Acc * list_to_integer([X]) end, 1, lists:sublist(BigNum, Start, Bite)),
	calculate_prod(Bite, BigNum, Start+1,[Prod|Products]).

I like thinking in Erlang. It is nice to know that the number is fine as a number in Elrang so that there is no need for hacks or special classes (Java's BigDecimal) to manipulate these big numbers.

I like recursion, I like "overloading" functions with pattern matching and guards. I know there is a lot of hype around Erlang right now but it is an ideal balance, in my mind, of functional programming and practicality. It is, after all, a language of industry not research and it shows.

 

February 2010
M T W T F S S
« Jun    
1234567
891011121314
15161718192021
22232425262728

Archives